Published Jun 10, 2026

FastAPI in Production: Settings, Auth, Middleware, and Project Structure

Part 5 takes the API from working to deployable. A toy app reads config from scattered constants and trusts every caller. A production app loads typed settings from the environment, authenticates requests, adds middleware for logging and CORS, and is laid out so it can grow. This part covers each of those.

What you will add

  • Typed settings loaded from the environment with pydantic-settings
  • Bearer token authentication as a reusable dependency
  • Middleware for request logging and CORS
  • A project structure that scales past a single file

1. Typed settings from the environment

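Never hardcode secrets or scatter os.environ reads across the codebase. pydantic-settings gives you a typed settings object that is validated when it is constructed, so a missing or malformed variable raises a clear error instead of surfacing later as a confusing runtime failure. Because get_settings() below is lazy, call it once while the app starts (for example in the app factory) so a bad config fails loudly before the app serves traffic.

from functools import lru_cache
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    # read a local .env file if present; ignore variables the app does not declare
    model_config = SettingsConfigDict(env_file=".env", extra="ignore")

    app_name: str = "LLM App"
    api_key: str              # required, no default
    anthropic_api_key: str    # required, no default
    request_timeout: float = 30.0

@lru_cache
def get_settings() -> Settings:
    # built once and cached; raises ValidationError if api_key or anthropic_api_key is missing
    return Settings()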

Warning

Keep secrets out of source control

Put real values in a .env file that is gitignored, and provide a .env.example with blank placeholders. Never commit live keys.

2. Authentication as a dependency

Build on the dependency pattern from Part 4. A bearer token dependency reads the Authorization header, compares it to the configured key, and rejects anything else. Apply it per route or to a whole router.

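import secrets

from fastapi import Depends, HTTPException
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer

from llm_app.config import Settings, get_settings

bearer = HTTPBearer()  # extracts the token from "Authorization: Bearer <token>"

async def require_token(
    creds: HTTPAuthorizationCredentials = Depends(bearer),
    settings: Settings = Depends(get_settings),
) -> None:
    # compare in constant time so response timing does not leak the key
    supplied, expected = creds.credentials.encode(), settings.api_key.encode()
    if not secrets.compare_digest(supplied, expected):
        raise HTTPException(status_code=401, detail="invalid token")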

3. Middleware for logging and CORS

Middleware wraps every request. Use the built-in CORS middleware to control which origins may call your API, and a small custom middleware to log method, path, status, and timing.

import time
from fastapi import FastAPI, Request
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://example.com"],
    allow_methods=["*"],
    allow_headers=["*"],
)

@app.middleware("http")
async def log_requests(request: Request, call_next):
    start = time.perf_counter()
    response = await call_next(request)
    ms = (time.perf_counter() - start) * 1000
    print(f"{request.method} {request.url.path} -> {response.status_code} ({ms:.0f}ms)")
    return response
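
To make "wraps every request" concrete, here is the same timing idea written as a plain ASGI middleware class using only the standard library. This is an alternative sketch, not the decorator form shown above; `TimingMiddleware` and its `sink` parameter are hypothetical names.

```python
import time


class TimingMiddleware:
    """Plain ASGI middleware: wraps the app and reports method, path, status, and duration."""

    def __init__(self, app, sink=print):
        self.app = app
        self.sink = sink  # hypothetical hook so output can go to a logger instead of stdout

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            # Pass websocket and lifespan events through untouched.
            await self.app(scope, receive, send)
            return

        start = time.perf_counter()
        status = {"code": None}

        async def send_wrapper(message):
            # The status code travels in the first response message.
            if message["type"] == "http.response.start":
                status["code"] = message["status"]
            await send(message)

        try:
            await self.app(scope, receive, send_wrapper)
        finally:
            ms = (time.perf_counter() - start) * 1000
            self.sink(f"{scope['method']} {scope['path']} -> {status['code']} ({ms:.0f}ms)")
```

It would be registered with app.add_middleware(TimingMiddleware), optionally passing sink=... as a keyword argument. The inner app is called between the start and end timestamps, which is all "wrapping" means here.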

4. A structure that scales

One main.py is fine until it is not. Split routes into routers, keep settings and dependencies in their own modules, and assemble them in a small app factory. This keeps imports clean and tests easy.

src/llm_app/
  main.py            # create_app(), wires routers + middleware
  config.py          # Settings, get_settings
  deps.py            # shared dependencies (auth, clients)
  routers/
    projects.py      # APIRouter for /projects
    chat.py          # APIRouter for /chat (added in later parts)

Checkpoint

Why load settings through a cached get_settings dependency instead of reading os.environ directly?

5. Global exception handling

Per-route try/except blocks scattered everywhere are a smell. A single exception handler turns any unhandled error into a clean JSON response with the right status code, and gives you one place to log the failure with context. Handle your own domain errors explicitly, and add a catch-all so an unexpected exception becomes a controlled 500 instead of a stack trace leaking to the client.

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()

class DomainError(Exception):
    def __init__(self, message: str, status: int = 400) -> None:
        super().__init__(message)  # keeps str(exc) and tracebacks informative
        self.message, self.status = message, status

@app.exception_handler(DomainError)
async def handle_domain_error(request: Request, exc: DomainError):
    return JSONResponse(status_code=exc.status, content={"error": exc.message})

@app.exception_handler(Exception)
async def handle_unexpected(request: Request, exc: Exception):
    # log exc with request context here, then return a safe message
    return JSONResponse(status_code=500, content={"error": "internal error"})

The rule is to never let an internal exception reach the client verbatim. A controlled error response is easier to handle on the frontend, and it does not leak file paths, query fragments, or library internals that help an attacker.

6. Health and readiness endpoints

Whatever runs your container (a load balancer, Kubernetes, a platform host) needs to know whether the app is alive and whether it is ready to serve traffic. A liveness check answers "is the process up?", while a readiness check answers "can it actually do its job?", for example by reaching its database or model API. Keep liveness trivial so it never fails for the wrong reason, and put real dependency checks in readiness.

from fastapi import APIRouter
from fastapi.responses import JSONResponse

router = APIRouter(tags=["health"])

@router.get("/healthz")        # liveness: is the process up
async def healthz() -> dict:
    return {"status": "ok"}

@router.get("/readyz")         # readiness: can it serve traffic
async def readyz() -> JSONResponse:
    # check critical dependencies here (db ping, etc.)
    checks = {"database": True}
    ok = all(checks.values())
    # probes act on the status code, so report 503 when any check fails
    return JSONResponse(
        status_code=200 if ok else 503,
        content={"ready": ok, "checks": checks},
    )
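
One way to fill in the readiness checks is to run them concurrently with a short timeout, so that a hung dependency reads as "not ready" instead of stalling the probe. The sketch below uses only asyncio; `run_checks`, `ping_database`, and `ping_model_api` are hypothetical names, and the two ping functions are stand-ins for real calls.

```python
import asyncio
from typing import Awaitable, Callable

Check = Callable[[], Awaitable[None]]  # a check passes by returning, fails by raising


async def run_checks(checks: dict[str, Check], timeout: float = 1.0) -> dict[str, bool]:
    """Run every check concurrently; a slow or failing check counts as not ready."""

    async def run_one(check: Check) -> bool:
        try:
            await asyncio.wait_for(check(), timeout=timeout)
            return True
        except Exception:  # includes the timeout raised by wait_for
            return False

    names = list(checks)
    results = await asyncio.gather(*(run_one(checks[name]) for name in names))
    return dict(zip(names, results))


# Hypothetical checks standing in for a real database ping or model API call.
async def ping_database() -> None:
    await asyncio.sleep(0)  # pretend the round trip succeeded


async def ping_model_api() -> None:
    raise ConnectionError("model API unreachable")
```

Inside the readiness route, the hard-coded dictionary would then be replaced by the result of awaiting run_checks with the app's real checks.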

7. Structured logging you can search

Print statements do not survive contact with production. You want logs that are easy to filter and that carry context: the request path, the status, the duration, and ideally a request id you can trace across services. Configure logging once at startup, log as structured data rather than free text where you can, and the middleware from earlier becomes a real request log instead of a debug print.

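import logging

# configure once at startup; note that %(message)s is inserted verbatim, so a
# message containing double quotes would make the line invalid JSON
logging.basicConfig(
    level=logging.INFO,
    format='{"level":"%(levelname)s","logger":"%(name)s","msg":"%(message)s"}',
)
log = logging.getLogger("llm_app")

# inside the logging middleware from earlier, replace print(...) with:
# log.info("request path=%s status=%s ms=%.0f", request.url.path, response.status_code, ms)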
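
The format-string approach inserts the message verbatim, and it has no slot for a request id. A slightly sturdier sketch, using only the standard library, serialises each record with json.dumps and pulls a request id from a context variable. All names here (`request_id_var`, `RequestIdFilter`, `JsonFormatter`, `build_logger`, the logger name) are hypothetical.

```python
import contextvars
import json
import logging

# Holds the id of the request being handled by the current task.
request_id_var = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Copy the current request id onto every record so the formatter can emit it."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id_var.get()
        return True


class JsonFormatter(logging.Formatter):
    """Serialise records with json.dumps so quotes in a message cannot break the line."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "request_id": getattr(record, "request_id", "-"),
            "msg": record.getMessage(),
        })


def build_logger(stream=None) -> logging.Logger:
    """Attach one JSON handler to a dedicated logger (stderr when stream is None)."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    handler.addFilter(RequestIdFilter())
    logger = logging.getLogger("llm_app.requests")
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

The request middleware would set request_id_var at the start of each request, for example from an incoming request-id header or a freshly generated UUID, and every log line emitted while handling that request then carries the same id.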

Tip

Rate limit the expensive routes

LLM endpoints cost real money per call, so a single abusive client can run up a bill fast. Put a rate limiter in front of model-backed routes, keyed by API key or IP, before you expose them publicly.
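
As a rough illustration of the idea, here is a minimal in-memory sliding-window limiter keyed by an arbitrary string such as an API key or IP. `SlidingWindowLimiter` is a hypothetical name, and the approach is an assumption for the sketch: state lives in one process, so with several workers or hosts you would need a shared store instead.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` calls per `window` seconds for each key."""

    def __init__(self, limit: int, window: float, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so tests do not need to sleep
        self.hits: dict[str, deque] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self.clock()
        hits = self.hits[key]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

A dependency on the model-backed routes could call allow() with the caller's key and raise an HTTPException with status 429 when it returns False.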

8. One app, many environments

The same code runs on your laptop, in CI, in staging, and in production, and each needs different values: a local database here, a managed one there, debug logging in one place and quiet logging in another. The typed settings object is where that difference lives. Local development reads a .env file, while production injects real environment variables from the platform or a secret manager, and the app does not care which, because it only ever sees a validated Settings instance.

from fastapi import Depends, FastAPI
from llm_app.config import Settings, get_settings

app = FastAPI()

@app.get("/config-check")
async def config_check(settings: Settings = Depends(get_settings)) -> dict:
    # never return secrets; expose only safe, non-sensitive values
    return {"app": settings.app_name, "timeout": settings.request_timeout}

Two rules keep this safe. Never return a secret from an endpoint, even a debug one, and never log the value of an API key. The settings object should be the only thing that ever touches a raw secret, and it hands the rest of the app exactly the typed values it needs and nothing more.

9. A deployment checklist

Before you expose the API, walk a short list:

  • Run behind a production ASGI server such as uvicorn with multiple workers, sized to your host.
  • Put it behind a reverse proxy or platform that terminates TLS, so traffic is encrypted.
  • Lock CORS to the origins that actually need access rather than a wildcard.
  • Set the log level to info or warning, not debug.
  • Confirm the readiness endpoint reflects real dependencies.
  • Make sure every secret comes from the environment, with nothing sensitive baked into the image or committed to git.
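
For the worker-count item, a starting value can be derived from the host's CPU count. The helper below is only a sketch: `worker_count`, the multiplier of 2, and the cap of 8 are assumptions to tune against real traffic, and the uvicorn call is left as a comment so that importing the module never starts a server.

```python
import os


def worker_count(per_core: int = 2, cap: int = 8) -> int:
    """Starting point only: a small multiple of CPU cores, capped."""
    cores = os.cpu_count() or 1  # cpu_count() can return None
    return max(1, min(cores * per_core, cap))


# Programmatic equivalent of `uvicorn llm_app.main:app --workers N`:
# uvicorn.run("llm_app.main:app", host="0.0.0.0", port=8000, workers=worker_count())
```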

What the checklist buys you

  • Typed settings fail fast on startup when config is wrong
  • Secrets stay in the environment, never in code or images
  • CORS, TLS, and rate limits are set before going public

What skipping it costs you

  • A wildcard CORS policy invites abuse from any origin
  • Debug logging in production leaks internals and slows the app
  • Hardcoded config means a redeploy for every environment change

The bottom line

Production readiness is mostly boring discipline: typed config, real auth, middleware you can see through, and a layout that does not collapse as the app grows. None of it is glamorous, and that is the point, because the glamorous bugs are the ones that come from skipping it. A config value validated on startup never becomes a confusing runtime failure. A bearer token checked in one dependency cannot be forgotten on a new route. A global exception handler means an unexpected error is a clean 500, not a stack trace handed to a stranger. And health endpoints plus structured logs mean that when something does go wrong, you can see it instead of guessing. With these in place your API is safe to expose to real traffic. Next we add the features that make LLM apps feel alive: streaming responses and background work.

Frequently asked questions

Where do I put database setup?

Behind a dependency in deps.py that yields a session per request. The same override trick used for settings lets tests swap in a test database.
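
A minimal sketch of such a dependency, using the standard library's sqlite3 purely for illustration (the series does not say which database it uses). `DATABASE_PATH` and `get_db` are hypothetical names.

```python
import sqlite3
from typing import Iterator

DATABASE_PATH = ":memory:"  # hypothetical; a real app would take this from Settings


def get_db() -> Iterator[sqlite3.Connection]:
    """Yield one connection per request and always close it afterwards."""
    # check_same_thread=False because the framework may run this generator and
    # the route handler on different threads.
    conn = sqlite3.connect(DATABASE_PATH, check_same_thread=False)
    try:
        yield conn
    finally:
        conn.close()
```

A route would receive the connection through Depends(get_db), and a test can point the same dependency at a throwaway database.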

Should I use OAuth instead of a bearer token?

For user-facing apps, yes. For service-to-service calls, a bearer token or signed key is often enough. The dependency shape is the same either way.

How many uvicorn workers should I run?

Start with a small multiple of CPU cores and tune against real traffic. Because most time is spent awaiting the model, async concurrency within each worker matters as much as the worker count.

Up next: Part 6, streaming and background work.
