Published Jun 11, 2026

MCP Tool Design Agents Get Right: Schemas, Errors, and Structured Output

Two servers can expose the same database and behave like different products. With one, the agent picks the right tool, fills the arguments correctly, recovers from failures, and finishes the task. With the other, it loops, guesses, apologizes, and gives up. The difference is almost never the model; it is tool design. This part turns the naive tools from Part 2 into tools that agents reliably get right: descriptions written for a reader that decides in one pass, schemas that make invalid calls impossible, an error contract that teaches instead of punishes, and structured output that downstream code can consume without parsing prose.

What you will learn in Part 4

  • Writing tool names and descriptions for the model that has to choose
  • Constraining arguments with Pydantic fields, enums, and defaults
  • The two-layer error model: protocol errors vs isError tool results
  • Structured output: returning typed data, not just strings
  • Tool annotations that tell hosts what is safe to do without asking

1. Your API consumer reads, it does not browse

Every habit you have from designing REST APIs needs one adjustment here: the consumer of your tools reads the documentation in full, every request, and decides under time pressure. The model sees a flat list of names, descriptions, and schemas. It does not click through examples or remember yesterday's calls. So the design goal is choosability: from the description alone, can a reader tell exactly when this tool is the right one, and when it is not?

Three rules follow. First, name tools verb_noun and keep one job per tool: search_notes and create_note, never notes_api with a mode argument. Second, make the first sentence of the description the decision rule: what the tool does and when to reach for it. If a sibling tool overlaps, say explicitly when not to use this one. Third, prefer a handful of task-shaped tools over a mirror of your internal endpoints; the model is completing a task, not exploring your architecture. If your service has twenty endpoints, your server probably wants five tools.
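To make the rules concrete, here is a minimal sketch of one capability listed two ways, written as the plain dictionaries a client would receive in a tool listing. The tool names come from the text above; the argument names, the VERBS set, and the lint_tool helper are hypothetical, meant only to show what "choosable" might look like as a review checklist.

```python
# A vague tool: one name, many jobs, and a "mode" switch the model must guess.
VAGUE = {
    "name": "notes_api",
    "description": "Interface to the notes subsystem.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "mode": {"type": "string"},
            "payload": {"type": "string"},
        },
        "required": ["mode"],
    },
}

# A choosable tool: verb_noun name, decision rule first, sibling named explicitly.
CHOOSABLE = {
    "name": "search_notes",
    "description": (
        "Search saved notes by keyword across titles and bodies. "
        "Use when you do not know the exact title; "
        "to add a note, use create_note instead."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Keyword or phrase"},
            "limit": {"type": "integer", "minimum": 1, "maximum": 20, "default": 5},
        },
        "required": ["query"],
    },
}

# Hypothetical verb list for the verb_noun naming rule; extend it for your domain.
VERBS = {"search", "create", "update", "delete", "get", "list"}


def lint_tool(tool: dict) -> list:
    """Return design warnings for one tool definition (illustrative checks only)."""
    warnings = []
    verb = tool["name"].split("_")[0]
    if verb not in VERBS:
        warnings.append(f"{tool['name']}: name should start with a verb, e.g. search_notes")
    if "mode" in tool["inputSchema"].get("properties", {}):
        warnings.append(f"{tool['name']}: a 'mode' argument suggests several jobs in one tool")
    return warnings


for tool in (VAGUE, CHOOSABLE):
    print(tool["name"], lint_tool(tool))
```

A check like this will not judge whether a description is well written, but it catches the two structural smells named above before a model ever sees the listing.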

2. Schemas that make bad calls impossible

A description persuades; a schema enforces. FastMCP builds schemas from type hints, and the hints can carry much more than a bare type. Pydantic Field metadata adds per-argument descriptions and constraints, Literal turns a string into an enum the model cannot stray from, and defaults mark what is genuinely optional. Every constraint you encode is a class of failed calls that can no longer happen.

from typing import Annotated, Literal

from pydantic import Field

@mcp.tool()
def create_note(
    title: Annotated[str, Field(
        description="Unique title for the note",
        min_length=3, max_length=80,
    )],
    body: Annotated[str, Field(
        description="Note body, plain text or Markdown",
    )],
    visibility: Literal["private", "team"] = "team",
) -> str:
    """Create a new note. Use only for new titles; to change an
    existing note, use update_note instead."""
    notes = _load()
    if title in notes:
        return (f"A note titled {title!r} already exists. "
                f"Use update_note to change it, or pick a new title.")
    notes[title] = body
    _save(notes)
    return f"Created {visibility} note {title!r}."

The Literal is the quiet hero. Without it, the model will eventually send visibility values like public, shared, or Team, and your code grows defensive branches. With it, the schema advertises exactly two values and validation rejects everything else before your function runs. The same thinking applies to numeric ranges with ge and le, and to formats: if an argument is a date, say so in the field description and give the format you expect.
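The enforcement layer can be sketched in miniature with nothing but the standard library. The validator below is not the SDK's implementation (FastMCP delegates this to Pydantic); it is an assumed, simplified stand-in that handles only the constraints used by create_note, and the schema dictionary is a hand-written approximation of what the type hints would generate.

```python
# Hand-written approximation of the schema create_note's hints describe.
CREATE_NOTE_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string", "minLength": 3, "maxLength": 80},
        "body": {"type": "string"},
        "visibility": {"type": "string", "enum": ["private", "team"], "default": "team"},
    },
    "required": ["title", "body"],
}


def validate_args(schema: dict, args: dict) -> tuple:
    """Validate args against a small JSON Schema subset.

    Returns (clean_args, errors). clean_args has defaults filled in;
    the call should only proceed when errors is empty.
    """
    clean, errors = {}, []
    required = schema.get("required", [])
    for name, rule in schema["properties"].items():
        if name not in args:
            if name in required:
                errors.append(f"{name}: required argument is missing")
            elif "default" in rule:
                clean[name] = rule["default"]
            continue
        value = args[name]
        if rule.get("type") == "string" and not isinstance(value, str):
            errors.append(f"{name}: expected a string, got {type(value).__name__}")
            continue
        if "minLength" in rule and len(value) < rule["minLength"]:
            errors.append(f"{name}: must be at least {rule['minLength']} characters")
        if "maxLength" in rule and len(value) > rule["maxLength"]:
            errors.append(f"{name}: must be at most {rule['maxLength']} characters")
        if "enum" in rule and value not in rule["enum"]:
            errors.append(f"{name}: must be one of {rule['enum']}, got {value!r}")
        clean[name] = value
    return clean, errors


# A well-formed call gets the default filled in.
print(validate_args(CREATE_NOTE_SCHEMA, {"title": "Groceries", "body": "milk"}))
# A near-miss enum value is rejected before any tool code would run.
print(validate_args(CREATE_NOTE_SCHEMA,
                    {"title": "Groceries", "body": "milk", "visibility": "Team"}))
```

Note that the error strings name the argument and the allowed values, which is what lets a model correct the call on its next attempt.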

3. The error contract: teach, do not punish

MCP has two distinct failure layers, and using the right one is half of error design. A protocol error, the JSON-RPC error object from Part 1, means the machinery failed: unknown tool, invalid arguments, server crash. A tool error is different: the call worked, but the task could not be done, and the result carries isError true with content explaining why. The model only sees and reasons about the second kind, which means the text you put there is not logging, it is course correction.

Write tool errors the way you would write a code review comment for a capable colleague: state what failed, why, and what to do instead. The duplicate-title message in create_note above is the pattern: it names the conflict and offers both exits. In FastMCP, raising an exception inside a tool produces an isError result automatically, so the practical rule is: raise for genuine bugs and broken invariants, return guidance-shaped text for expected failures the model should route around.
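The raise-versus-return rule can be illustrated without the SDK. In this sketch, call_tool is a hypothetical dispatcher standing in for what the framework does around your function, and read_note, the NOTES dictionary, and the None body used to model a damaged record are all assumptions made for the example. The exact error text a real SDK produces will differ.

```python
import sys
import traceback

# Toy note store; the None body models a damaged record (assumption for the demo).
NOTES = {"Groceries": "milk, eggs", "Broken": None}


def text_result(text: str, is_error: bool = False) -> dict:
    """Build a tool result: readable content plus the isError flag."""
    return {"content": [{"type": "text", "text": text}], "isError": is_error}


def read_note(title: str) -> str:
    """Return the body of one note by exact title."""
    if title not in NOTES:
        # Expected failure: return guidance the model can act on.
        return (f"No note titled {title!r}. "
                f"Use search_notes to find the exact title, then retry.")
    body = NOTES[title]
    if not isinstance(body, str):
        # Broken invariant: raise and let the dispatcher flag it.
        raise TypeError(f"stored body for {title!r} is not text")
    return body


def call_tool(fn, **args) -> dict:
    """Hypothetical dispatcher: exceptions become isError results."""
    try:
        return text_result(fn(**args))
    except Exception as exc:
        # The full trace goes to the server log, not into the model's context.
        traceback.print_exc(file=sys.stderr)
        return text_result(f"{fn.__name__} failed: {exc}", is_error=True)


print(call_tool(read_note, title="Groceries"))
print(call_tool(read_note, title="Missing"))
print(call_tool(read_note, title="Broken"))
```

The missing-title case comes back as an ordinary result whose text points at the next step, while the damaged record comes back flagged, so the two can never be mistaken for each other.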

Checkpoint

An agent calls search_notes while the notes file is corrupted and unreadable. What should your tool do?

4. Structured output: results code can trust

Strings are fine when the consumer is only the model, but agents increasingly pipe tool results into further computation, and parsing prose is how that breaks. Since the 2025-06-18 revision, tools can declare an output schema and return structured content alongside the readable text. In FastMCP, you get this by returning a Pydantic model: the SDK derives the output schema, serializes your instance into structuredContent, and keeps a text rendering for hosts that only show text.

from pydantic import BaseModel

class NoteStats(BaseModel):
    title: str
    word_count: int
    headings: list[str]
    last_modified: str

@mcp.tool()
def note_stats(title: str) -> NoteStats:
    """Compute statistics for one note: word count, headings, and
    last modified time. Read-only."""
    notes = _load()
    if title not in notes:
        raise ValueError(f"No note titled {title!r}")
    body = notes[title]
    return NoteStats(
        title=title,
        word_count=len(body.split()),
        headings=[line.lstrip("# ") for line in body.splitlines()
                  if line.startswith("#")],
        last_modified=_mtime_iso(),
    )

Now a client that needs the word count reads structuredContent.word_count, validated against the declared schema, instead of regexing a sentence. Use structured output whenever the result has fields a program might care about, and keep returning plain strings for results that are inherently narrative. The two coexist happily in one server, and nothing about the model's experience gets worse: it still receives readable text.
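From the consuming side, the difference looks roughly like this. The result dictionary is a hand-built example of a call result carrying both text and structuredContent; the prose sentence, the field values, and both helper functions are invented for illustration.

```python
import re

# Hand-built example result: display text plus structuredContent.
result = {
    "content": [{"type": "text", "text": "The note 'Roadmap' has 1,204 words."}],
    "structuredContent": {
        "title": "Roadmap",
        "word_count": 1204,
        "headings": ["Goals", "Timeline"],
        "last_modified": "2026-06-11T09:30:00Z",
    },
    "isError": False,
}


def word_count_from_prose(res: dict) -> int:
    """Fragile approach: scrape the number out of the display text."""
    match = re.search(r"(\d+) words", res["content"][0]["text"])
    return int(match.group(1))


def word_count_from_structured(res: dict) -> int:
    """Sturdier approach: read the typed field and check its type."""
    value = res["structuredContent"]["word_count"]
    if not isinstance(value, int):
        raise TypeError("word_count should be an integer")
    return value


# The thousands separator silently breaks the regex version.
print(word_count_from_prose(result))       # 204
print(word_count_from_structured(result))  # 1204
```

The regex does not crash, it returns a plausible wrong number, which is the worst kind of failure for downstream code.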

5. Annotations: telling hosts what is safe

Hosts gate tool calls behind user approval, but they cannot read your code to know which calls are harmless. Tool annotations are standardized hints that carry exactly that knowledge: readOnlyHint marks tools with no side effects, destructiveHint warns that effects may be irreversible, idempotentHint says repeating a call changes nothing further, and openWorldHint distinguishes tools that touch the wider world from ones that stay inside your domain.

from mcp.types import ToolAnnotations

@mcp.tool(annotations=ToolAnnotations(
    readOnlyHint=True,
    openWorldHint=False,
))
def search_notes(query: str, limit: int = 5) -> str:
    """Search saved notes by keyword across titles and bodies."""
    ...

@mcp.tool(annotations=ToolAnnotations(
    destructiveHint=True,
    idempotentHint=False,
))
def delete_note(title: str) -> str:
    """Permanently delete one note by exact title. Cannot be undone."""
    ...

A host can use these to auto-approve read-only calls in a trusted session while always confirming destructive ones, which directly improves the agent's flow through your server. Two honesty rules: annotations are hints, not security, so Part 6 still applies in full; and a wrong annotation is worse than none, because a destructive tool marked read-only will eventually be auto-approved into disaster. Annotate what the code actually does.

Annotations on the notes server

Tool          readOnly  destructive  idempotent  Why
search_notes  true      n/a          n/a         Pure read, safe to auto-approve
note_stats    true      n/a          n/a         Computed read, no side effects
create_note   false     false        false       Adds data; repeat calls add more
delete_note   false     true         false       Irreversible; always confirm
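How a host might act on these hints can be sketched as a small policy function. Everything here is an assumption for illustration: real hosts define their own approval rules, the three policy labels are invented, and the fallback values in DEFAULTS reflect my reading of the specification's defaults for absent hints, which you should verify against the revision you target.

```python
from typing import Optional

# Assumed defaults for missing hints (conservative: unknown tools are
# treated as possibly destructive). Check these against the spec.
DEFAULTS = {
    "readOnlyHint": False,
    "destructiveHint": True,
    "idempotentHint": False,
    "openWorldHint": True,
}


def approval_policy(annotations: Optional[dict], trusted_session: bool) -> str:
    """Return 'auto', 'ask', or 'always_confirm' for one tool (hypothetical labels)."""
    hints = {**DEFAULTS, **(annotations or {})}
    if hints["readOnlyHint"]:
        # Reads can skip the prompt, but only in a session the user trusts.
        return "auto" if trusted_session else "ask"
    if hints["destructiveHint"]:
        return "always_confirm"
    return "ask"


# Annotations matching the notes server table.
TOOLS = {
    "search_notes": {"readOnlyHint": True, "openWorldHint": False},
    "note_stats": {"readOnlyHint": True},
    "create_note": {"readOnlyHint": False, "destructiveHint": False},
    "delete_note": {"destructiveHint": True, "idempotentHint": False},
    "unannotated_tool": None,
}

for name, annotations in TOOLS.items():
    print(name, approval_policy(annotations, trusted_session=True))
```

Under these assumptions a tool with no annotations lands in the strictest bucket, which is why accurate hints improve flow and inaccurate ones remove a safety net.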

6. Results that respect the context window

One last craft point: tool results land in the model's context, and context is the scarcest resource in the system. A search that returns five hundred full notes does not make the agent better informed, it makes it worse at everything that follows. Build limits into the schema with sane defaults, truncate long fields with an explicit marker and a way to get the rest, and when a result set is naturally large, return a page plus a cursor argument for the next call rather than everything at once. The grain to aim for: each result should carry just enough for the model to decide its next step, with a cheap path to drill in.
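A minimal sketch of those three habits together, default limit, truncation marker, and cursor, might look like the following. The function names, the 200-character cap, the offset-as-cursor encoding, and the read_note tool named in the marker are all assumptions; a production cursor would usually be opaque and validated.

```python
from typing import Optional

MAX_SNIPPET_CHARS = 200  # hypothetical cap per result


def truncate(text: str, limit: int = MAX_SNIPPET_CHARS) -> str:
    """Cut long text and say so, including how to get the rest."""
    if len(text) <= limit:
        return text
    remaining = len(text) - limit
    return (text[:limit]
            + f"... [truncated, {remaining} more characters; use read_note for the full body]")


def search_page(notes: dict, query: str, limit: int = 5,
                cursor: Optional[str] = None) -> dict:
    """Return one page of matches plus a cursor for the next page, if any."""
    needle = query.lower()
    matches = sorted(title for title, body in notes.items()
                     if needle in title.lower() or needle in body.lower())
    start = int(cursor) if cursor else 0
    page = matches[start:start + limit]
    next_start = start + limit
    return {
        "results": [{"title": t, "snippet": truncate(notes[t])} for t in page],
        "total_matches": len(matches),
        "next_cursor": str(next_start) if next_start < len(matches) else None,
    }


# Twelve long notes; walk the pages by feeding next_cursor back in.
notes = {f"note {i:02d}": "lorem " * 100 for i in range(12)}
page = search_page(notes, "note")
while True:
    print([r["title"] for r in page["results"]], page["next_cursor"])
    if page["next_cursor"] is None:
        break
    page = search_page(notes, "note", cursor=page["next_cursor"])
```

Including total_matches is a design choice rather than a requirement: it lets the model decide whether paging further is worth the context it will cost.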

Checkpoint

Which change most improves an agent's success rate with a tool that currently takes a free-form "options" string argument?

Common mistakes to avoid

  • Mirroring every API endpoint as a tool

    Design task-shaped tools. Five tools that match user intents beat twenty that match your router, and they cost a fraction of the context.

  • Descriptions that describe implementation, not the decision

    Lead with when to use it and when not to. The model is choosing between tools, not admiring your architecture.

  • Returning success-shaped text for failures

    Expected failures get guidance-shaped text; broken invariants get isError. Never let "no results" and "storage is broken" read the same.

  • Marking tools readOnlyHint=True optimistically

    Annotations feed auto-approval decisions. Audit what the function actually touches before you declare it safe.

  • Unbounded result sets

    Default limits, truncation markers, and cursors. The context window is the model's working memory; do not flood it.

The bottom line

Good MCP tools are choosable from one read, constrained so bad calls die at validation, honest about failure in text the model can act on, structured when programs consume the result, and annotated so hosts know what is safe. None of this needed new infrastructure, only the discipline to treat the model as the user it is. The server is now well designed but still local and wide open. Next we put it on the network with Streamable HTTP and mount it inside a FastAPI application, the bridge to running this in production.

Frequently asked questions

How many tools is too many for one server?

There is no hard limit, but every description rides along in context. Past roughly a dozen tools, look for task-shaped consolidation or split servers by domain so hosts can connect only what a workflow needs.

Should error text include stack traces?

No. Traces leak paths and internals and give the model nothing actionable. Log the trace server-side to stderr and return only the decision-relevant sentence to the caller.

Does structured output replace the text content?

No, results carry both: structuredContent for programs and text for display and for hosts that predate the field. Returning a Pydantic model from FastMCP produces the pair automatically.

Can I change a tool's name or schema after people use it?

Treat it like any public API: additive changes are cheap, renames and removals break configured hosts. The listChanged notification tells connected clients to refresh, but external configs and habits do not refresh themselves. Version deliberately; Part 8 covers release discipline.

Up next: Part 5, Streamable HTTP and mounting MCP in FastAPI.
