Calling LLMs from Python: The Request Loop, Tokens, Cost, and Retries
Call language models from Python with the Claude SDK: the messages loop, tokens and cost, and a client with timeouts and retries you can inject and test.
Category
AI and LLM-related content.
Call language models from Python with the Claude SDK: the messages loop, tokens and cost, and a client with timeouts and retries you can inject and test.
Get dependable JSON from language models with structured outputs and function calling, then validate with Pydantic so your code works with typed objects.
Build a RAG service end to end: chunk documents, embed and search by similarity, and answer grounded in retrieved context from a FastAPI endpoint.
Stream a real model response to the browser: consume the model stream in Python, forward it through a FastAPI SSE endpoint, and render it live.
Build a small AI agent API: the tool-calling loop, conversation memory, and the guardrails that keep an action-taking agent safe and bounded.