Part 12 is the capstone. An agent is a model that can call your tools in a loop: it decides an action, you run it, the result goes back, and it continues until the task is done. This part builds that loop on top of the function calling from Part 9, adds simple memory, and puts guardrails in place so the agent stays safe and bounded. Everything from the series comes together here.
What you will build
- The agent loop: model decides, you execute, results return
- Real tools the model can call, validated before they run
- Lightweight conversation memory across turns
- Guardrails: iteration limits, allowed tools, and safe execution
1. The agent loop
The loop is the whole idea. You send the conversation plus the tool definitions. If the model returns a tool_use block, you execute that tool, append the result as a tool_result, and call again. When the model stops asking for tools, you have your answer.
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

TOOLS = [{
    "name": "get_order_status",
    "description": "Look up the status of an order by id.",
    "input_schema": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
}]


def run_tool(name: str, args: dict) -> str:
    if name == "get_order_status":
        return f"Order {args['order_id']} shipped on 2026-06-01."
    raise ValueError(f"unknown tool: {name}")


# The optional client argument lets a route or a test inject its own client.
def agent(question: str, max_steps: int = 5, client: Anthropic = client) -> str:
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):  # guardrail: bounded loop
        resp = client.messages.create(
            model="claude-opus-4-8", max_tokens=1024, tools=TOOLS, messages=messages,
        )
        messages.append({"role": "assistant", "content": resp.content})
        if resp.stop_reason != "tool_use":
            # no tool requested: join the text blocks into the final answer
            return "".join(b.text for b in resp.content if b.type == "text")
        results = []
        for block in resp.content:
            if block.type == "tool_use":
                output = run_tool(block.name, block.input)
                results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": output,
                })
        # all tool results go back in a single user message
        messages.append({"role": "user", "content": results})
    return "Stopped: reached the step limit."
That loop is the entire agent. Notice how it reuses the request loop from Part 8 and the typed tool inputs from Part 9. Validate block.input against a Pydantic model before running the tool so a malformed call never reaches your code.
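One way to see the loop's contract without calling the API is to drive it with a stub. The sketch below is an illustration under assumptions, not the Anthropic SDK: `FakeClient`, `FakeMessages`, `fake_response`, and `run_loop` are hypothetical names, and the stub mimics only the fields the loop reads (`stop_reason`, `content`, and each block's `type`, `text`, `name`, `input`, `id`).

```python
from types import SimpleNamespace


def fake_response(stop_reason, *blocks):
    """Build an object exposing the two attributes the loop reads."""
    return SimpleNamespace(stop_reason=stop_reason, content=list(blocks))


class FakeMessages:
    """Hypothetical stand-in that replays scripted responses in order."""

    def __init__(self, scripted):
        self._scripted = list(scripted)
        self.calls = 0

    def create(self, **kwargs):
        self.calls += 1
        return self._scripted.pop(0)


class FakeClient:
    def __init__(self, scripted):
        self.messages = FakeMessages(scripted)


def run_tool(name, args):
    if name == "get_order_status":
        return f"Order {args['order_id']} shipped."
    return f"Unknown tool: {name}"


def run_loop(client, question, max_steps=5):
    """Same shape as the agent loop above, with the client passed in."""
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        resp = client.messages.create(messages=messages)
        messages.append({"role": "assistant", "content": resp.content})
        if resp.stop_reason != "tool_use":
            return "".join(b.text for b in resp.content if b.type == "text")
        results = [
            {"type": "tool_result", "tool_use_id": b.id,
             "content": run_tool(b.name, b.input)}
            for b in resp.content if b.type == "tool_use"
        ]
        messages.append({"role": "user", "content": results})
    return "Stopped: reached the step limit."


# Script one tool call followed by a final text answer.
tool_call = SimpleNamespace(type="tool_use", id="t1",
                            name="get_order_status", input={"order_id": "A1"})
final_text = SimpleNamespace(type="text", text="Your order has shipped.")
client = FakeClient([
    fake_response("tool_use", tool_call),
    fake_response("end_turn", final_text),
])
answer = run_loop(client, "Where is order A1?")
```

The scripted run makes two model calls: the first requests the tool, the second returns text and ends the loop. A script that only ever returns `tool_use` exercises the step limit instead.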
2. Memory across turns
The list of messages is the memory. To remember across separate requests, persist that list keyed by a conversation id and reload it on the next turn. For long conversations, summarize older turns so you do not send unbounded context and cost.
# minimal per-conversation memory
SESSIONS: dict[str, list] = {}


def history(conversation_id: str) -> list:
    return SESSIONS.setdefault(conversation_id, [])

# load before the loop, save the updated messages after it
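An in-memory dict disappears when the process restarts. As a minimal sketch of persisting the list keyed by a conversation id, the block below writes one JSON file per conversation. The names `load_history` and `save_history`, the directory layout, and the id format are all assumptions for illustration. It also assumes every message is a plain dict: SDK content blocks appended from a response would need converting to dicts before they can be serialized.

```python
import json
import re
from pathlib import Path

# Hypothetical id format: letters, digits, underscore, hyphen only.
_SAFE_ID = re.compile(r"^[A-Za-z0-9_-]+$")


def _path_for(store_dir: Path, conversation_id: str) -> Path:
    # The id becomes part of a file name, so reject anything that
    # could escape the store directory (for example "../").
    if not _SAFE_ID.match(conversation_id):
        raise ValueError("invalid conversation id")
    return store_dir / f"{conversation_id}.json"


def load_history(store_dir: Path, conversation_id: str) -> list:
    """Return the saved message list, or an empty list for a new conversation."""
    path = _path_for(store_dir, conversation_id)
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8"))


def save_history(store_dir: Path, conversation_id: str, messages: list) -> None:
    """Overwrite the stored message list for this conversation."""
    store_dir.mkdir(parents=True, exist_ok=True)
    path = _path_for(store_dir, conversation_id)
    path.write_text(json.dumps(messages), encoding="utf-8")
```

A file per conversation is enough to show the load-before, save-after pattern. A database or cache would fill the same role in a deployed app.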
3. Guardrails
An agent that can take actions needs limits. The loop above already caps steps. Add an allowlist so the model can only call tools you trust, validate every tool input, and require confirmation for anything destructive. Treat tool inputs as untrusted, exactly as you would a request body.
✓ Do
- Cap the number of loop steps
- Allowlist tools and validate every input
- Require confirmation for destructive actions
- Log every tool call for auditing
✕ Risks if you skip them
- An unbounded loop can run up cost and time
- Executing raw tool input is a code injection risk
- A tool with side effects and no confirmation can do real damage
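The allowlist and confirmation rules can sit in one small gate in front of tool execution. This is a sketch under assumptions: `guarded_call`, the `cancel_order` tool, and the `confirm` callback are hypothetical names introduced here, and a real confirmation step would involve the user rather than a function returning a boolean.

```python
from typing import Callable, Dict

ALLOWED_TOOLS = {"get_order_status", "cancel_order"}  # hypothetical tool names
DESTRUCTIVE_TOOLS = {"cancel_order"}                  # need explicit confirmation

Handler = Callable[[dict], str]


def guarded_call(
    name: str,
    args: dict,
    handlers: Dict[str, Handler],
    confirm: Callable[[str, dict], bool],
) -> str:
    """Run a tool only if it is allowlisted and, when destructive, confirmed."""
    if name not in ALLOWED_TOOLS or name not in handlers:
        # Return a message instead of raising so the loop can report it
        # back to the model as a tool_result.
        return f"Tool not allowed: {name}"
    if name in DESTRUCTIVE_TOOLS and not confirm(name, args):
        return f"Action declined: {name} requires confirmation."
    return handlers[name](args)
```

Returning a refusal string keeps the loop running and lets the model explain the refusal to the user. Raising an exception would also be a reasonable choice if you prefer to abort the run.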
Info
Wire it into FastAPI
Wrap agent() behind a POST endpoint, inject the client as a dependency from Part 5, stream the final answer with Part 11, and test the loop with the overrides from Part 7. The whole series plugs in here.
Checkpoint
What is the single most important guardrail on a tool calling agent?
4. Validate tool input before you run it
The model fills a tool input to match your schema, but you should still validate it into a Pydantic model before acting, for the same reason you validate a request body. The schema guarantees shape, your model can enforce real rules, and the typed object is safer to pass into your code. This is the bridge between Part 9 and a tool that touches anything that matters.
from pydantic import BaseModel, ValidationError


class OrderLookup(BaseModel):
    order_id: str


def run_tool(name: str, raw_input: dict) -> str:
    if name == "get_order_status":
        try:
            args = OrderLookup.model_validate(raw_input)  # validate, do not trust
        except ValidationError:
            return "Invalid tool input."
        return f"Order {args.order_id} shipped on 2026-06-01."
    return f"Unknown tool: {name}"
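The model above checks only that `order_id` is a string. To illustrate what enforcing real rules can look like, the sketch below adds a format constraint and rejects unexpected keys. The `ORD-` plus six digits format is invented for this example, and `parse_order_lookup` is a hypothetical helper name. The sketch assumes Pydantic v2.

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict, Field, ValidationError


class OrderLookup(BaseModel):
    # Reject keys the schema does not declare instead of silently ignoring them.
    model_config = ConfigDict(extra="forbid")

    # Hypothetical id format, assumed for illustration only.
    order_id: str = Field(pattern=r"^ORD-\d{6}$")


def parse_order_lookup(raw_input: dict) -> Optional[OrderLookup]:
    """Return a validated model, or None when the input breaks a rule."""
    try:
        return OrderLookup.model_validate(raw_input)
    except ValidationError:
        return None
```

With this in place, an input such as `{"order_id": "1; DROP TABLE orders"}` is rejected before any lookup code runs.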
5. Memory that does not grow forever
The message list is the agent memory, and left unchecked it grows every turn, which raises both cost and latency since the whole history is re-sent each call. The fix is to summarize. When the conversation passes a threshold, replace the oldest turns with a short summary the model writes, keeping recent turns verbatim. The agent keeps its context without paying to resend the entire past on every step.
def compact_history(client, messages: list, keep_recent: int = 6) -> list:
    if len(messages) <= keep_recent:
        return messages
    # Caveat: pick the cut so `recent` does not begin with a tool_result whose
    # matching tool_use ended up in `old`; the API rejects an orphaned tool_result.
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = client.messages.create(
        model="claude-opus-4-8", max_tokens=300,
        messages=[{"role": "user",
                   "content": f"Summarize this conversation briefly:\n{old}"}],
    ).content[0].text
    # the summary replaces the old turns; recent turns stay verbatim
    return [{"role": "user", "content": f"Summary so far: {summary}"}, *recent]
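A fixed slice can cut between a tool call and its result. A tool_result must follow the assistant message that contains its tool_use, so it is safer to start the kept portion on an ordinary user turn. The helper below is a hypothetical sketch of choosing that cut point. `split_for_compaction` and `is_tool_result_message` are names introduced here, and the sketch assumes user messages are plain dicts, as they are in the loop from section 1.

```python
def is_tool_result_message(message: dict) -> bool:
    """True for a user message whose content carries tool_result blocks."""
    content = message.get("content")
    return (
        message.get("role") == "user"
        and isinstance(content, list)
        and any(isinstance(b, dict) and b.get("type") == "tool_result"
                for b in content)
    )


def split_for_compaction(messages: list, keep_recent: int = 6) -> tuple:
    """Return (old, recent) so that recent starts on a plain user turn.

    The recent slice may be longer than keep_recent, because the cut only
    ever moves earlier, never later.
    """
    cut = max(len(messages) - max(keep_recent, 1), 0)
    # Walk the cut backwards past assistant turns and tool_result turns.
    while cut > 0 and (
        messages[cut].get("role") != "user"
        or is_tool_result_message(messages[cut])
    ):
        cut -= 1
    return messages[:cut], messages[cut:]
```

If the walk reaches the start of the list, `old` is empty and there is nothing to summarize yet. That is the conservative outcome.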
6. Trace every step
An agent that takes actions in a loop is hard to debug if you cannot see what it did. Log each turn: which tool it called, the input, the result, and the token usage. With that trace you can answer the questions that come up in production (why did it call that tool, where did it loop, what did the run cost) without re-running anything. It is the same operational hygiene as in the model client part, applied to a multi-step process.
import logging

log = logging.getLogger("llm_app.agent")

# `step` is the loop counter: write `for step in range(max_steps)` in the loop.
# inside the loop, after each tool_use block:
#     log.info("step=%s tool=%s input=%s", step, block.name, block.input)
# after each model call:
#     log.info("step=%s stop=%s out_tokens=%s", step, resp.stop_reason, resp.usage.output_tokens)
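If you want the trace to be machine-readable, one option is to wrap tool execution in a helper that emits one JSON record per call. This is a sketch, not a prescribed format: `traced_tool_call` and the record fields are assumptions chosen for illustration, and `run_tool` is passed in so the helper works with whichever version of it you use.

```python
import json
import logging
import time

log = logging.getLogger("llm_app.agent")


def traced_tool_call(step: int, name: str, tool_input: dict, run_tool) -> str:
    """Run one tool call and log a structured record describing it."""
    started = time.perf_counter()
    output = run_tool(name, tool_input)
    record = {
        "step": step,
        "tool": name,
        "input": tool_input,
        "output_chars": len(output),  # log the size, not the full payload
        "elapsed_ms": round((time.perf_counter() - started) * 1000, 1),
    }
    # default=str keeps logging from failing on an unexpected value type
    log.info("tool_call %s", json.dumps(record, default=str))
    return output
```

Logging the output size rather than the output itself is a deliberate choice here, since tool results can contain customer data. Adjust that to your own audit requirements.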
7. Wire the agent into FastAPI
The agent becomes a feature when it lives behind an endpoint, and at that point every part of this series is in the room. The route validates the request with a model, pulls the injected model client, runs the bounded loop, and can stream the final answer. Authentication, rate limiting, logging, and tests all apply unchanged, because the agent is just another handler that happens to call the model several times.
from anthropic import Anthropic
from fastapi import APIRouter, Depends
from pydantic import BaseModel

from llm_app.deps import get_model_client

router = APIRouter(prefix="/agent", tags=["agent"])


class AgentRequest(BaseModel):
    question: str


# agent() is the bounded loop from section 1, extended to accept the injected
# client as a keyword argument. It makes blocking SDK calls, so the route is a
# plain def: FastAPI runs sync handlers in a threadpool, not on the event loop.
@router.post("/ask")
def ask(body: AgentRequest, client: Anthropic = Depends(get_model_client)) -> dict:
    answer = agent(body.question, client=client)
    return {"answer": answer}
From here the upgrades are incremental and optional: stream the answer with Part 11, persist memory keyed by a conversation id, add more tools behind the same allowlist, and require confirmation before any destructive action. The core never changes: it is still a bounded loop with validated tools and a visible trace.
8. Do you need an agent framework?
Frameworks promise to handle the loop, the memory, and the tool plumbing for you. The honest answer is that you do not need one to start, and building the loop yourself first is the best way to understand what any framework is actually doing on your behalf. Reach for a framework when you genuinely outgrow the simple version: many tools to manage, multi step planning, shared tracing and evaluation infrastructure, or a team that benefits from a common structure. Adopt it with eyes open, because the loop you just wrote is small enough that the abstraction has to earn its place.
✓ Pros
- A hand built loop is small, readable, and fully under your control
- You understand exactly what every step costs and does
- No framework upgrades or hidden behavior to track
✕ Cons
- Many tools and complex planning get unwieldy by hand
- You reimplement tracing and evaluation others have solved
- A larger team may want one shared, conventional structure
The bottom line
An agent is a bounded loop around tool calls, with memory in the message list and guardrails around execution. You built every piece across this series: a clean Python base, Pydantic contracts, async, a production FastAPI app, a reliable model client, structured outputs, retrieval, and streaming. Put them together and you have a real, safe LLM application. That is the whole journey, from Python to production LLM apps.
Frequently asked questions
Do I need an agent framework?
Not to start. The loop here is small and clear. Reach for a framework only when you need many tools, multi step planning, or shared infrastructure, and you understand what it is doing for you.
How do I stop an agent from looping forever?
Cap the steps, as shown, and stop when the model returns no tool call. Also set a token or time budget for the whole task.
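A token or time budget can be tracked with a small object that the loop consults on each step. The sketch below is one possible shape: `RunBudget`, its field names, and the default limits are assumptions made for illustration.

```python
import time
from dataclasses import dataclass, field


@dataclass
class RunBudget:
    """Tracks output tokens and wall-clock time for one agent run."""

    max_output_tokens: int = 4000   # assumed default, tune per task
    max_seconds: float = 30.0       # assumed default, tune per task
    used_tokens: int = 0
    started: float = field(default_factory=time.monotonic)

    def charge(self, output_tokens: int) -> None:
        """Record the tokens spent by one model call."""
        self.used_tokens += output_tokens

    def exhausted(self) -> bool:
        """True once either the token or the time limit is reached."""
        over_tokens = self.used_tokens >= self.max_output_tokens
        over_time = time.monotonic() - self.started >= self.max_seconds
        return over_tokens or over_time
```

Inside the loop, you would call `budget.charge(resp.usage.output_tokens)` after each model call and return a "stopped" message as soon as `budget.exhausted()` is true, alongside the step cap.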
That completes the series. Revisit Part 1 any time, or jump back to the part you need.