Calling LLMs from Python: The Request Loop, Tokens, Cost, and Retries
Call language models from Python with the Claude SDK: the messages loop, tokens and cost, and a client with timeouts and retries you can inject and test.
Writing
Long-form writing on development, tooling, and lessons from the workbench, plus book reviews when a title sticks.
Use Pydantic v2 to validate data at the boundary: models, field constraints, custom validators, and clean serialization, the backbone of FastAPI and reliable LLM output.
Understand asyncio and the event loop, then fan out API and LLM calls concurrently with httpx, gather, and a semaphore to bound load.
Build real FastAPI endpoints: typed routing, Pydantic request and response models, dependency injection, and automatic docs.
Harden a FastAPI app for production: typed settings with pydantic-settings, bearer auth, logging and CORS middleware, and a scalable project structure.
Stream responses from FastAPI with server-sent events, run side effects with BackgroundTasks, and know when to move to a real task queue.
Test FastAPI with pytest and the test client: assert on validation, override dependencies to isolate from real services, and cover async and streaming code.
Stream a real model response to the browser: consume the model stream in Python, forward it through a FastAPI SSE endpoint, and render it live.