Building an AI Agent API (Series Part 12)

Part 12 is the capstone. An agent is a model that can call your tools in a loop: it decides an action, you run it, the result goes back, and it continues until the task is done. This part builds that loop on top of the function calling from Part 9, adds simple memory, and puts guardrails in place so the agent stays safe and bounded. Everything from the series comes together here.

What you will build

The agent loop: model decides, you execute, results return
Real tools the model can call, validated before they run
Lightweight conversation memory across turns
Guardrails: iteration limits, allowed tools, and safe execution

1. The agent loop

The loop is the whole idea. You send the conversation plus the tool definitions. If the model returns a tool_use block, you execute that tool, append the result as a tool_result, and call again. When the model stops asking for tools, you have your answer.

from anthropic import Anthropic

client = Anthropic()

TOOLS = [{
    "name": "get_order_status",
    "description": "Look up the status of an order by id.",
    "input_schema": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
}]

def run_tool(name: str, args: dict) -> str:
    if name == "get_order_status":
        return f"Order {args['order_id']} shipped on 2026-06-01."
    raise ValueError(f"unknown tool: {name}")

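def agent(question: str, max_steps: int = 5, client: Anthropic = client) -> str:
    # client defaults to the module-level instance; pass another one to inject it
    messages = [{"role": "user", "content": question}]
    for step in range(max_steps):                   # guardrail: bounded loop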
        resp = client.messages.create(
            model="claude-opus-4-8", max_tokens=1024, tools=TOOLS, messages=messages,
        )
        messages.append({"role": "assistant", "content": resp.content})
        if resp.stop_reason != "tool_use":
            return "".join(b.text for b in resp.content if b.type == "text")
        results = []
        for block in resp.content:
            if block.type == "tool_use":
                output = run_tool(block.name, block.input)
                results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": output,
                })
        messages.append({"role": "user", "content": results})
    return "Stopped: reached the step limit."

That loop is the entire agent. Notice how it reuses the request loop from Part 8 and the typed tool inputs from Part 9. Validate block.input against a Pydantic model before running the tool so a malformed call never reaches your code.
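One gap worth noting: as written, an exception raised inside run_tool ends the whole run. A minimal sketch of one alternative, assuming you would rather report the failure back to the model so it can recover: wrap the call and return a tool_result marked with is_error. The helper name build_tool_result and the demo tool function are hypothetical, not part of the series code.

```python
from typing import Callable

def build_tool_result(tool_use_id: str, name: str, args: dict,
                      run_tool: Callable[[str, dict], str]) -> dict:
    """Run one tool call and always return a tool_result block."""
    try:
        output = run_tool(name, args)
        return {"type": "tool_result", "tool_use_id": tool_use_id, "content": output}
    except Exception as exc:  # report the failure instead of crashing the loop
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": f"Tool {name} failed: {exc}",
            "is_error": True,
        }

# Hypothetical stand-in for the run_tool function above.
def demo_run_tool(name: str, args: dict) -> str:
    if name == "get_order_status":
        return f"Order {args['order_id']} shipped."
    raise ValueError(f"unknown tool: {name}")

ok = build_tool_result("toolu_1", "get_order_status", {"order_id": "A1"}, demo_run_tool)
bad = build_tool_result("toolu_2", "delete_everything", {}, demo_run_tool)
```

Inside the loop, this helper would replace the direct run_tool call, so every tool_use block gets a matching tool_result whether or not the tool succeeded.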

2. Memory across turns

The list of messages is the memory. To remember across separate requests, persist that list keyed by a conversation id and reload it on the next turn. For long conversations, summarize older turns so that context size and cost stay bounded.

# minimal per-conversation memory
SESSIONS: dict[str, list] = {}

def history(conversation_id: str) -> list:
    return SESSIONS.setdefault(conversation_id, [])

# load before the loop, save the updated messages after it
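To make the load and save steps concrete, here is a small sketch. It assumes a variant of the agent loop that accepts an existing message list and appends to it, which differs from the agent(question) signature shown earlier. The names ask_with_memory, run_loop, and fake_loop are hypothetical, and the in-memory dict stands in for whatever store you actually use.

```python
from typing import Callable

SESSIONS: dict[str, list] = {}  # in-memory only; a real service would use a database

def history(conversation_id: str) -> list:
    return SESSIONS.setdefault(conversation_id, [])

def ask_with_memory(conversation_id: str, question: str,
                    run_loop: Callable[[list], str]) -> str:
    """Load the stored messages, run one turn, and save the updated list."""
    messages = list(history(conversation_id))   # copy, so a failed turn saves nothing
    messages.append({"role": "user", "content": question})
    answer = run_loop(messages)                 # assumed to append to messages in place
    SESSIONS[conversation_id] = messages        # save only after the loop succeeds
    return answer

# Hypothetical stand-in for the model loop: it just counts the user turns.
def fake_loop(messages: list) -> str:
    turns = sum(1 for m in messages if m["role"] == "user")
    reply = f"You have asked {turns} question(s)."
    messages.append({"role": "assistant", "content": reply})
    return reply

first = ask_with_memory("c1", "Where is order 42?", fake_loop)
second = ask_with_memory("c1", "And order 43?", fake_loop)
```

Working on a copy and saving only at the end means an exception in the middle of a turn leaves the stored conversation unchanged.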

3. Guardrails

An agent that can take actions needs limits. The loop above already caps steps. Add an allowlist so the model can only call tools you trust, validate every tool input, and require confirmation for anything destructive. Treat tool inputs as untrusted, exactly as you would a request body.

✓ Do

Cap the number of loop steps
Allowlist tools and validate every input
Require confirmation for destructive actions
Log every tool call for auditing

✕ Risks if you skip them

An unbounded loop can run up cost and time
Executing raw tool input is a code injection risk
A tool with side effects and no confirmation can do real damage
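The allowlist and the confirmation rule can be sketched in a few lines. This is an illustration under assumptions, not the series' implementation: the registry layout, the cancel_order tool, and the confirmed flag are all hypothetical, and input validation would still run before the function is called.

```python
from typing import Callable

def get_order_status(order_id: str) -> str:
    return f"Order {order_id} shipped."

def cancel_order(order_id: str) -> str:      # hypothetical destructive tool
    return f"Order {order_id} cancelled."

# Hypothetical registry: tool name -> (function, is_destructive)
ALLOWED_TOOLS: dict[str, tuple[Callable[..., str], bool]] = {
    "get_order_status": (get_order_status, False),
    "cancel_order": (cancel_order, True),
}

def guarded_run(name: str, args: dict, confirmed: bool = False) -> str:
    """Run a tool only if it is allowlisted and, when destructive, confirmed."""
    if name not in ALLOWED_TOOLS:
        return f"Tool not allowed: {name}"
    func, destructive = ALLOWED_TOOLS[name]
    if destructive and not confirmed:
        return f"Confirmation required before running {name}."
    return func(**args)
```

The confirmed flag should come from the user or from your own application logic, never from the model's tool input, otherwise the model could approve its own destructive calls.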

Wire it into FastAPI

Wrap agent() behind a POST endpoint, inject the client as a dependency from Part 5, stream the final answer with Part 11, and test the loop with the overrides from Part 7. The whole series plugs in here.

Checkpoint

What is the single most important guardrail on a tool-calling agent?

4. Validate tool input before you run it

The model fills in the tool input to match your schema, but you should still validate it into a Pydantic model before acting, for the same reason you validate a request body. The schema only describes the expected shape; your Pydantic model can enforce real rules such as formats and ranges, and the typed object is safer to pass into your code. This is the bridge between Part 9 and a tool that touches anything that matters.

from pydantic import BaseModel, ValidationError

class OrderLookup(BaseModel):
    order_id: str

def run_tool(name: str, raw_input: dict) -> str:
    if name == "get_order_status":
        try:
            args = OrderLookup.model_validate(raw_input)   # validate, do not trust
        except ValidationError:
            return "Invalid tool input."
        return f"Order {args.order_id} shipped on 2026-06-01."
    return f"Unknown tool: {name}"
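As an illustration of a "real rule" beyond the type check, the sketch below adds a format constraint and names the failing field in the message sent back to the model. The id format ("ORD-" plus six digits), the class name StrictOrderLookup, and the helper validate_order_input are assumptions made for the example, and it relies on Pydantic v2.

```python
from pydantic import BaseModel, Field, ValidationError

class StrictOrderLookup(BaseModel):
    # Hypothetical rule: ids look like "ORD-" followed by six digits.
    order_id: str = Field(pattern=r"^ORD-\d{6}$")

def validate_order_input(raw_input: dict) -> tuple[bool, str]:
    """Return (ok, message); the message is safe to send back as a tool_result."""
    try:
        args = StrictOrderLookup.model_validate(raw_input)
    except ValidationError as exc:
        # Name the failing fields so the model can correct its next call.
        fields = ", ".join(str(e["loc"][0]) for e in exc.errors())
        return False, f"Invalid tool input for field(s): {fields}"
    return True, args.order_id
```

Returning the field name rather than the raw value keeps the error useful to the model without echoing untrusted input back into the conversation.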

5. Memory that does not grow forever

The message list is the agent memory, and left unchecked it grows every turn, which raises both cost and latency since the whole history is re-sent each call. The fix is to summarize. When the conversation passes a threshold, replace the oldest turns with a short summary the model writes, keeping recent turns verbatim. The agent keeps its context without paying to resend the entire past on every step.

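def compact_history(client, messages: list, keep_recent: int = 6) -> list:
    if len(messages) <= keep_recent:
        return messages
    cut = len(messages) - keep_recent
    # Never split a tool_use / tool_result pair: move the cut back until the
    # recent slice starts with a plain-text user message.
    while cut > 0 and not (messages[cut]["role"] == "user"
                           and isinstance(messages[cut]["content"], str)):
        cut -= 1
    if cut == 0:
        return messages                      # no safe place to cut yet
    old, recent = messages[:cut], messages[cut:]
    summary = client.messages.create(
        model="claude-opus-4-8", max_tokens=300,
        messages=[{"role": "user",
                   "content": f"Summarize this conversation briefly:\n{old}"}],
    ).content[0].text
    return [{"role": "user", "content": f"Summary so far: {summary}"}, *recent]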
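The "threshold" that triggers compaction has to be measured somehow. One rough option, sketched here, is a character-based estimate; the four-characters-per-token ratio and the 2000-token budget are assumptions for illustration, and the input token count the API reports in the response usage is the more reliable signal when you have it. The function names are hypothetical.

```python
import json

def estimate_tokens(messages: list) -> int:
    """Very rough size estimate: about four characters per token (an assumption)."""
    chars = sum(len(json.dumps(m, default=str)) for m in messages)
    return chars // 4

def should_compact(messages: list, token_budget: int = 2000) -> bool:
    """True when the estimated history size exceeds the budget."""
    return estimate_tokens(messages) > token_budget

short = [{"role": "user", "content": "hi"}]
long = [{"role": "user", "content": "x" * 10_000}]
```

A check like should_compact(messages) before each model call decides whether to run compact_history first.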

6. Trace every step

An agent that takes actions in a loop is hard to debug if you cannot see what it did. Log each turn: which tool it called, the input, the result, and the token usage. With that trace you can answer the questions that come up in production (why did it call that tool, where did it loop, what did the run cost) without re-running anything. This is the same operational hygiene as in the model client part, applied to a multi-step process.

import logging
log = logging.getLogger("llm_app.agent")

# inside the loop, after each tool_use block:
# log.info(f"step={step} tool={block.name} input={block.input}")
# after each model call:
# log.info(f"step={step} stop={resp.stop_reason} out_tokens={resp.usage.output_tokens}")
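One way to turn the commented lines above into something you can query after a run is to collect the steps in a small object as well as logging them. This is a sketch, not the series' code: the RunTrace class and its field names are hypothetical.

```python
import logging
from dataclasses import dataclass, field
from typing import Optional

log = logging.getLogger("llm_app.agent")

@dataclass
class RunTrace:
    """Collects one record per step so a run can be inspected afterwards."""
    steps: list = field(default_factory=list)

    def record(self, step: int, stop_reason: str, output_tokens: int,
               tool: Optional[str] = None, tool_input: Optional[dict] = None) -> None:
        self.steps.append({
            "step": step,
            "stop_reason": stop_reason,
            "output_tokens": output_tokens,
            "tool": tool,
            "tool_input": tool_input,
        })
        # Lazy %-style arguments: the message is only formatted if INFO is enabled.
        log.info("step=%s stop=%s tool=%s out_tokens=%s",
                 step, stop_reason, tool, output_tokens)

    def total_output_tokens(self) -> int:
        return sum(s["output_tokens"] for s in self.steps)

trace = RunTrace()
trace.record(0, "tool_use", 40, tool="get_order_status", tool_input={"order_id": "A1"})
trace.record(1, "end_turn", 25)
```

In the real loop the values would come from resp.stop_reason, resp.usage.output_tokens, and each tool_use block.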

7. Wire the agent into FastAPI

The agent becomes a feature when it lives behind an endpoint, and at that point every part of this series is in the room. The route validates the request with a model, pulls the injected model client, runs the bounded loop, and can stream the final answer. Authentication, rate limiting, logging, and tests all apply unchanged, because the agent is just another handler that happens to call the model several times.

from anthropic import Anthropic
from fastapi import APIRouter, Depends
from pydantic import BaseModel
from llm_app.deps import get_model_client

router = APIRouter(prefix="/agent", tags=["agent"])

class AgentRequest(BaseModel):
    question: str

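@router.post("/ask")
def ask(body: AgentRequest, client: Anthropic = Depends(get_model_client)) -> dict:
    # Plain def: FastAPI runs it in a threadpool, so the blocking agent loop
    # does not stall the event loop. Assumes agent() accepts a client keyword.
    answer = agent(body.question, client=client)   # the bounded loop from above
    return {"answer": answer}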

From here the upgrades are incremental and optional: stream the answer with Part 11, persist memory keyed by a conversation id, add more tools behind the same allowlist, and require confirmation before any destructive action. The core never changes: it is still a bounded loop with validated tools and a visible trace.

8. Do you need an agent framework?

Frameworks promise to handle the loop, the memory, and the tool plumbing for you. The honest answer is that you do not need one to start, and building the loop yourself first is the best way to understand what any framework is actually doing on your behalf. Reach for a framework when you genuinely outgrow the simple version: many tools to manage, multi-step planning, shared tracing and evaluation infrastructure, or a team that benefits from a common structure. Adopt it with eyes open, because the loop you just wrote is small enough that the abstraction has to earn its place.

✓ Pros

A hand-built loop is small, readable, and fully under your control
You understand exactly what every step costs and does
No framework upgrades or hidden behavior to track

✕ Cons

Many tools and complex planning get unwieldy by hand
You reimplement tracing and evaluation others have solved
A larger team may want one shared, conventional structure

The bottom line

An agent is a bounded loop around tool calls, with memory in the message list and guardrails around execution. You built every piece across this series: a clean Python base, Pydantic contracts, async, a production FastAPI app, a reliable model client, structured outputs, retrieval, and streaming. Put them together and you have a real, safe LLM application. That is the whole journey, from Python to production LLM apps.

? Frequently asked questions

Do I need an agent framework?

Not to start. The loop here is small and clear. Reach for a framework only when you need many tools, multi-step planning, or shared infrastructure, and you understand what it is doing for you.

How do I stop an agent from looping forever?

Cap the steps, as shown, and stop when the model returns no tool call. Also set a token or time budget for the whole task.
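A token or time budget for the whole task can be sketched as a small object that the loop checks before each model call. The class name RunBudget and its limits are hypothetical; the token counts would come from the usage reported on each response.

```python
import time

class RunBudget:
    """Tracks tokens and wall-clock time for one agent task."""

    def __init__(self, max_tokens: int, max_seconds: float) -> None:
        self.max_tokens = max_tokens
        self.deadline = time.monotonic() + max_seconds
        self.used_tokens = 0

    def charge(self, input_tokens: int, output_tokens: int) -> None:
        """Add the usage reported by one model call."""
        self.used_tokens += input_tokens + output_tokens

    def exhausted(self) -> bool:
        """True once either the token limit or the deadline is reached."""
        return (self.used_tokens >= self.max_tokens
                or time.monotonic() >= self.deadline)

budget = RunBudget(max_tokens=1000, max_seconds=60)
budget.charge(input_tokens=300, output_tokens=100)
still_ok = not budget.exhausted()
budget.charge(input_tokens=500, output_tokens=200)
over_tokens = budget.exhausted()
```

In the loop, checking budget.exhausted() at the top of each step and returning a "stopped" message works alongside the step cap rather than replacing it.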

That completes the series. Revisit Part 1 any time, or jump back to the part you need.

Bishrul Haq