AI & LLM

Calling LLMs from Python: The Request Loop, Tokens, Cost, and Retries

Call language models from Python with the Claude SDK: the messages loop, tokens and cost, and a client with timeouts and retries you can inject and test.

1 month ago 85

AI & LLM

Structured Outputs and Function Calling: Getting Reliable JSON from LLMs

Get dependable JSON from language models with structured outputs and function calling, then validate with Pydantic so your code works with typed objects.

1 month ago 76

AI & LLM

Building a RAG Service with FastAPI: Chunking, Embeddings, and Vector Search

Build a RAG service end to end: chunk documents, embed them and search by similarity, and return answers grounded in retrieved context from a FastAPI endpoint.

1 month ago 102

AI & LLM

Streaming LLM Responses to the Browser with FastAPI and SSE

Stream a real model response to the browser: consume the model stream in Python, forward it through a FastAPI SSE endpoint, and render it live.

1 month ago 85

AI & LLM

Building an AI Agent API: Tool Calls, Memory, and Guardrails

Build a small AI agent API: the tool-calling loop, conversation memory, and the guardrails that keep an action-taking agent safe and bounded.