Published: Jun 11, 2026


Testing and Debugging MCP Servers: Inspector, pytest, and In-Memory Clients

MCP servers have a strange testing reputation. Manually, they feel untestable: the real consumer is a model inside a host you do not control, and the failure you care about happens three layers away from your code. Mechanically, they are among the easiest services to test well, because the protocol gives you a complete, scriptable contract: every interaction is a method call with a typed result, and the SDK will happily run a real client against your real server inside a single pytest process, no subprocess, no network, no mocks. This part builds that harness for the notes server, makes the Inspector a precise debugging instrument instead of a toy, and ends with the playbook for the failures every MCP developer eventually meets.

What you will learn in Part 7

  • A testing pyramid that makes sense for MCP servers
  • The Inspector as a debugging client, including CLI mode for scripts
  • pytest suites running a real client in-memory against your server
  • What to assert: behavior, error contract, and schema stability
  • The debugging playbook for hangs, 400s, and works-on-my-machine

1. A testing pyramid that fits the protocol

Map the layers before writing anything. At the base, your domain logic (the load, save, and search helpers) gets ordinary unit tests with no MCP in sight; if you keep tools thin, this layer carries most of the logic. Above that sits the layer this part is really about: protocol tests, where a genuine MCP client session runs the Part 1 lifecycle against your server and calls tools, reads resources, and renders prompts through the same code paths a host uses. Above that sits the Inspector for interactive exploration, and at the very top come occasional end-to-end checks inside a real host. The pyramid matters because the top is slow and flaky and the middle is neither: in-memory protocol tests run in milliseconds and catch almost everything the top would.

This is the same philosophy our FastAPI series argued in the testing part: test the promise, not the implementation. An MCP server's promise is precisely enumerable, which is a luxury. It promises a set of tools with stable schemas, sensible results for good calls, and instructive errors for bad ones.
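As a sketch of the pyramid's base layer, here is what a unit test with no MCP in sight can look like. The search_notes_in helper and the dict-shaped store are hypothetical stand-ins; the notes server's real helpers are not shown in this part and may differ.

```python
"""Base-layer unit tests: plain functions, no MCP imports."""


def search_notes_in(notes: dict[str, str], query: str) -> list[str]:
    """Hypothetical domain helper: titles whose title or body contains the query."""
    needle = query.lower()
    return sorted(
        title
        for title, body in notes.items()
        if needle in title.lower() or needle in body.lower()
    )


def test_search_is_case_insensitive():
    notes = {"standup": "Demo the server on Thursday.", "deploy": "checklist v2"}
    assert search_notes_in(notes, "thursday") == ["standup"]


def test_search_without_match_returns_empty_list():
    assert search_notes_in({"deploy": "checklist v2"}, "budget") == []
```

Tests like these stay valid no matter how the tool layer above them changes, which is why thin tools push most of the logic down here.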

2. The Inspector, used like an instrument

You met the Inspector in Part 2 as a quick poke-around tool. Used deliberately, it is a protocol-accurate client with a glass side: the Tools, Resources, and Prompts tabs exercise discovery and execution, the Notifications pane shows logging and progress arriving live, and the history pane shows every raw JSON-RPC message in both directions, which is exactly the view you need when a host misbehaves and you want to know which side is lying. Connect it to a local stdio command or a remote Streamable HTTP URL; it speaks both, with auth headers when Part 6's gate is up.

# Interactive: server plus Inspector UI
uv run mcp dev server.py

# Headless, for scripts and CI smoke checks: CLI mode
npx @modelcontextprotocol/inspector --cli uv run python server.py \
    --method tools/list

npx @modelcontextprotocol/inspector --cli uv run python server.py \
    --method tools/call --tool-name search_notes \
    --tool-arg query=deploy

CLI mode deserves more fame than it has: it turns any Inspector interaction into a shell command that prints JSON, which makes it the cheapest smoke test in your deploy pipeline and a precise way to share a reproduction in a bug report. But do not let the Inspector become your only safety net; it verifies what you try, while a test suite verifies what you promised, every time, including the cases you stopped thinking about.
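One way to turn the tools/list command into a pass/fail pipeline step is a small wrapper that parses the printed JSON. This is a minimal sketch, not a definitive script: it assumes the CLI prints the tools/list result as a JSON object with a "tools" array on stdout, and the function names and timeout are illustrative choices.

```python
"""Smoke check sketch: run the Inspector CLI and verify expected tools exist."""
import json
import subprocess
import sys

# Same command as the tools/list example above.
INSPECTOR_CMD = [
    "npx", "@modelcontextprotocol/inspector", "--cli",
    "uv", "run", "python", "server.py",
    "--method", "tools/list",
]
EXPECTED_TOOLS = {"add_note", "search_notes"}


def missing_tools(cli_output: str, expected: set[str]) -> set[str]:
    """Return the expected tool names absent from a tools/list JSON result."""
    # Assumption: the output is an object shaped like {"tools": [{"name": ...}]}.
    listed = {tool["name"] for tool in json.loads(cli_output).get("tools", [])}
    return expected - listed


def run_smoke_check(cmd: list[str] = INSPECTOR_CMD) -> int:
    """Run the CLI once; return 0 if every expected tool is listed."""
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=120)
    if proc.returncode != 0:
        print(proc.stderr, file=sys.stderr)
        return proc.returncode
    missing = missing_tools(proc.stdout, EXPECTED_TOOLS)
    if missing:
        print(f"missing tools: {sorted(missing)}", file=sys.stderr)
        return 1
    return 0
```

Ending a script with sys.exit(run_smoke_check()) gives the pipeline a conventional exit code. Keeping the parsing in missing_tools means it can be unit-tested without launching npx.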

3. pytest with an in-memory client

Here is the trick that makes MCP testing pleasant: the SDK can wire a client session and your server object together through in-memory streams, running the full protocol, handshake and all, inside one async test. No subprocess to spawn, no port to pick, no flakiness to apologize for. The session object below is the same ClientSession class a real host embeds.

"""tests/test_server.py: protocol tests for the notes server."""
import pytest

from mcp.shared.memory import (
    create_connected_server_and_client_session as client_session,
)

import server  # our FastMCP module


@pytest.fixture(autouse=True)
def isolated_store(tmp_path, monkeypatch):
    """Each test gets a fresh notes.json in a temp directory."""
    monkeypatch.setattr(server, "STORE", tmp_path / "notes.json")


@pytest.mark.anyio
async def test_add_then_search_roundtrip():
    async with client_session(server.mcp._mcp_server) as session:
        await session.call_tool(
            "add_note",
            {"title": "standup", "body": "Demo the server on Thursday."},
        )
        result = await session.call_tool(
            "search_notes", {"query": "thursday"}
        )
        assert result.isError is False
        assert "standup" in result.content[0].text


@pytest.mark.anyio
async def test_duplicate_title_returns_guidance_not_error():
    async with client_session(server.mcp._mcp_server) as session:
        await session.call_tool("add_note", {"title": "a", "body": "x"})
        result = await session.call_tool(
            "add_note", {"title": "a", "body": "y"}
        )
        # Expected failure: helpful text, not a protocol explosion
        assert result.isError is False
        assert "already exists" in result.content[0].text


@pytest.mark.anyio
async def test_note_resource_reads_back():
    async with client_session(server.mcp._mcp_server) as session:
        await session.call_tool(
            "add_note", {"title": "deploy", "body": "checklist v2"}
        )
        contents = await session.read_resource("notes://deploy")
        assert "checklist v2" in contents.contents[0].text

Three tests, three different promises pinned down: the happy path, the error contract from Part 4, and a resource read. The fixture is doing quiet, important work: state isolation per test, using the same monkeypatching discipline as any Python suite. Run it all with uv run pytest. If you later refactor storage from JSON to SQLite, every one of these tests should keep passing untouched, which is exactly the refactoring insurance protocol tests exist to provide.

4. Pin the schema, not just the behavior

One more category earns its place in CI: contract stability. Your tool schemas are a public interface; Part 4 taught that renames and type changes break configured hosts. A snapshot-style test makes such breaks loud at review time instead of quiet at deploy time. The playground below runs a miniature version of the idea, a fake tools/list against a stored snapshot, so you can feel the failure mode; in your real suite, the session's list_tools result plays the live role.

Python playground
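A minimal sketch of that miniature, under stated assumptions: both the stored snapshot and the "live" listing are hard-coded dicts, the rename from query to q is an invented break for illustration, and contract_breaks is a hypothetical helper rather than anything the SDK provides.

```python
"""Snapshot-style contract check: a fake tools/list against a stored snapshot."""

# Stored snapshot: tool name -> input schema as last reviewed.
SNAPSHOT = {
    "search_notes": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}

# Fake live listing in which someone renamed `query` to `q`.
LIVE_TOOLS = [
    {
        "name": "search_notes",
        "inputSchema": {
            "type": "object",
            "properties": {"q": {"type": "string"}},
            "required": ["q"],
        },
    },
]


def contract_breaks(snapshot: dict, live_tools: list[dict]) -> list[str]:
    """Describe every difference between the snapshot and the live tools."""
    live = {tool["name"]: tool["inputSchema"] for tool in live_tools}
    breaks = []
    for name in sorted(snapshot.keys() - live.keys()):
        breaks.append(f"tool removed: {name}")
    for name in sorted(live.keys() - snapshot.keys()):
        breaks.append(f"tool added without snapshot update: {name}")
    for name in sorted(snapshot.keys() & live.keys()):
        if snapshot[name] != live[name]:
            breaks.append(f"schema changed: {name}")
    return breaks


print(contract_breaks(SNAPSHOT, LIVE_TOOLS))
# prints ['schema changed: search_notes']
```

A test would assert that the returned list is empty. An intentional schema change then shows up as an explicit snapshot update in the same review, which is the point.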

Checkpoint

Why prefer the in-memory client session over launching the server as a subprocess in tests?

5. The debugging playbook

Every MCP developer accumulates the same scar tissue; here is the map so you can skip the scars. The most common failure is also the most confusing: the session that hangs or dies instantly on stdio. The cause, nine times out of ten, is a violation of the rule from Part 1: something wrote to stdout. It takes only a stray print, a library's progress bar, or a warning routed to the wrong stream, and the client's JSON parser meets a sentence. Step through the autopsy below; once you have seen it, you will recognize it forever.

Protocol walkthrough

Autopsy of a corrupted stdio session

Client → Server tools/call request

The host sends a normal call over stdin. Everything is healthy so far.

{ "jsonrpc": "2.0", "id": 4, "method": "tools/call",
  "params": { "name": "search_notes",
               "arguments": { "query": "deploy" } } }
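What happens next on the client side can be reproduced without a server. The sketch below assumes a simplified client that reads newline-delimited JSON from the server's stdout; the stray progress line and the read_messages helper are invented for illustration and are not the SDK's actual reader.

```python
"""Simulate a stdio stream polluted by a stray print."""
import io
import json

# What the server process wrote to stdout: one stray line, then the real reply.
server_stdout = io.StringIO(
    "Searching notes for deploy...\n"  # a leftover print() inside the tool
    '{"jsonrpc": "2.0", "id": 4, "result": {"content": []}}\n'
)


def read_messages(stream):
    """Parse each line as JSON-RPC; yield (message, error) pairs."""
    for line in stream:
        try:
            yield json.loads(line), None
        except json.JSONDecodeError as exc:
            yield None, f"not JSON: {line.strip()!r} ({exc.msg})"


for message, error in read_messages(server_stdout):
    print(error or f"ok: response to id {message['id']}")
```

The first line fails to parse and the second is a perfectly valid response. How a given host reacts to the bad line (dropping it, disconnecting, or waiting indefinitely) varies, which is why the same bug can look like a hang in one client and an instant disconnect in another.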

The rest of the playbook fits in a table. Keep it next to your terminal for the first weeks; after that it lives in your fingers.

Symptoms and causes
Symptom | Likely cause | First move
Works in terminal, fails in host | Relative path or PATH in host config | Absolute paths everywhere, Part 2
Hangs or instant disconnect on stdio | stdout pollution | Audit prints and logging streams
400 Bad Request on Streamable HTTP | Missing Mcp-Session-Id after initialize | Replay the Part 5 walkthrough with curl
404 on the mounted endpoint | Mount prefix plus endpoint path | Check for /mcp/mcp, Part 5
401 from a remote server | Token missing, expired, or wrong audience | Inspect with the Part 6 verifier logs
Tools missing after an SDK upgrade | Schema generation changed | Run the contract snapshot test, diff tools/list
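For the stdout pollution row, the usual first move is to send every diagnostic to stderr so that stdout carries only protocol messages. A minimal sketch using only the standard library; the logger name notes-server and the debug_print helper are placeholders, not names from the notes server.

```python
"""Keep stdout clean on stdio: route diagnostics to stderr."""
import logging
import sys

# Attach a handler that writes to stderr, never stdout.
handler = logging.StreamHandler(sys.stderr)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))

log = logging.getLogger("notes-server")  # placeholder logger name
log.setLevel(logging.INFO)
log.addHandler(handler)
log.propagate = False  # keep root-logger handlers from re-emitting these records


def debug_print(*args):
    """Replacement for stray print() calls while debugging."""
    print(*args, file=sys.stderr)


log.info("searching notes for %s", "deploy")
debug_print("store loaded")
```

Third-party libraries that print progress bars or banners need the same audit, since their output lands on the same stdout stream as your protocol messages.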

Checkpoint

A teammate reports your stdio server "randomly hangs" in Claude Desktop but the Inspector works fine for them. What do you suspect first?

Common mistakes to avoid

  • Testing only through the Inspector by hand

    The Inspector verifies what you try today; pytest verifies what you promised forever. Both, in their places.

  • Spawning subprocess servers in every test

    Use the in-memory session for the suite and keep one subprocess smoke test for the launch command itself.

  • Sharing one notes.json across tests

    Isolate state per test with tmp_path and monkeypatch, or your suite's order becomes a hidden dependency.

  • No test for the error contract

    Bad input deserves assertions as much as good input. The duplicate-title test above needs only two assertions to pin real behavior agents depend on.

  • Debugging remote transport issues from inside the host

    Drop a layer: curl or Inspector CLI against the endpoint shows you raw status codes and headers without the host's interpretation.

The bottom line

Test the layers separately and the whole stays honest: unit tests for logic, in-memory protocol tests for the contract, a schema snapshot for stability, the Inspector for eyes-on debugging, and a one-line CLI smoke check in CI. None of it is exotic; the SDK's in-memory session makes protocol testing cheaper than most REST testing you have done. The server now works and provably keeps working. One step remains: getting it out of your terminal and into the world, with a container, a real deployment, and a registry listing so hosts can find it.

Frequently asked questions

Do I need pytest-asyncio or anyio configuration for these tests?

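The SDK's test helpers run on anyio, and anyio ships its own pytest plugin, so pytest-asyncio is not required. Pin the backend once by defining an anyio_backend fixture that returns "asyncio" (conftest.py is the usual place), and mark tests with pytest.mark.anyio as shown.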

Is reaching into mcp._mcp_server acceptable?

It is the pragmatic seam at the time of writing: the in-memory helper wants the low-level server object that FastMCP wraps. Hide it behind one fixture so a future SDK that exposes a public seam costs you one line.

How do I test auth from Part 6?

Unit-test the verifier directly with crafted introspection responses, and run protocol tests with a fake TokenVerifier injected so the suite never needs a live identity provider. One integration test against a real authorization server in staging covers the wiring.

Should tests cover prompts too?

Yes, cheaply: render each prompt with sample arguments via the session and assert the assembled text contains its key structural markers. Prompts are templates, and templates rot quietly without a test.

Up next: Part 8, shipping your MCP server.
