MCP Streamable HTTP and FastAPI Mounting (Series Part 5)

Everything so far runs on stdio: the host launches your process and owns it. That is perfect for a personal tool and useless for a team. The moment your notes server should hold shared data, serve five people, or run next to your existing backend, it needs to become a web service. This part moves the server to Streamable HTTP, the protocol's remote transport, and then mounts it inside a FastAPI application so MCP becomes one more route in an app you already operate. The tool code from Parts 2 through 4 does not change at all, which is the whole point of transports being separate from the protocol.

What you will learn in Part 5

When stdio stops being enough and what Streamable HTTP changes
How the transport works: one endpoint, POSTs, SSE streams, session ids
Switching FastMCP to streamable-http in one line
Mounting the MCP server inside an existing FastAPI app
Progress and logging through the Context object

1. Why leave stdio at all

stdio gives you process isolation, zero network configuration, and the host's lifecycle management for free. What it cannot give you is sharing. Each user gets a private child process with private state, on their machine, with their filesystem. The decision point is simple: if the data or the compute belongs with the user, stay on stdio; if it belongs with the service, go remote. Our notes server has clearly outgrown notes.json on one laptop, so it is a service now.

Choosing a transport
	stdio	Streamable HTTP
Where it runs	Child process on the user's machine	A web service you operate
State	Per user, local	Shared, centralized
Auth	Inherits the user's environment	Yours to enforce, Part 6
Ops	None; the host owns the process	Deploys, monitoring, Part 8
Best for	Personal and developer tools	Team and product integrations

A note on history you will hit in older posts: the first remote transport, HTTP+SSE (from the 2024-11-05 revision), used two endpoints and was replaced by Streamable HTTP in the 2025-03-26 revision. If a tutorial shows a separate /sse endpoint, it is describing the deprecated design. Modern servers expose a single MCP endpoint, and that is what we build.

2. How Streamable HTTP actually works

The transport is plain HTTP arranged carefully. Every JSON-RPC message from the client arrives as a POST to one endpoint, conventionally /mcp. The server answers a request either with a single JSON body or, when it wants to stream messages before the final result, with a server-sent events (SSE) response that carries several messages and then closes. Sessions tie it together: a stateful server returns an Mcp-Session-Id header on the initialize response, and the client echoes that header on every request after it. There is also an optional GET on the same endpoint that opens a long-lived stream for server-initiated notifications. Step through a full exchange below.

Protocol walkthrough

One session over Streamable HTTP

Client → Server POST /mcp, initialize

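Same handshake as Part 1, now as an HTTP request. The Accept header offers both JSON and SSE so the server may choose per response. The later requests in this walkthrough omit Accept and Content-Type for brevity, but a client sends them on every POST.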

POST /mcp HTTP/1.1
Accept: application/json, text/event-stream
Content-Type: application/json

{ "jsonrpc": "2.0", "id": 1, "method": "initialize",
  "params": { "protocolVersion": "2025-06-18", ... } }

Server → Client 200 OK with session id

The response carries the session id in a header. The client must echo Mcp-Session-Id on every subsequent request.

HTTP/1.1 200 OK
Content-Type: application/json
Mcp-Session-Id: 3f7c1e2a90d44b1b

{ "jsonrpc": "2.0", "id": 1,
  "result": { "capabilities": { "tools": {} }, ... } }

Client → Server POST /mcp, initialized

Notifications get a bare acknowledgment, 202 Accepted, with no body, because notifications never have replies.

POST /mcp HTTP/1.1
Mcp-Session-Id: 3f7c1e2a90d44b1b

{ "jsonrpc": "2.0", "method": "notifications/initialized" }

-> HTTP/1.1 202 Accepted

Client → Server POST /mcp, tools/call

A tool call that will take a while: reindexing every note. The client attaches a progress token (the value here is illustrative) to ask for progress updates, and the server decides how to answer.

POST /mcp HTTP/1.1
Mcp-Session-Id: 3f7c1e2a90d44b1b

{ "jsonrpc": "2.0", "id": 2, "method": "tools/call",
  "params": { "name": "reindex_notes", "arguments": {},
              "_meta": { "progressToken": "reindex-1" } } }

Server → Client SSE stream: progress, then result

Because the request carried a progress token and the tool reports progress, the server answers with an event stream instead of a single JSON body: notifications first, the final response last, then the stream closes.

HTTP/1.1 200 OK
Content-Type: text/event-stream

event: message
data: { "jsonrpc": "2.0",
        "method": "notifications/progress",
        "params": { "progressToken": "reindex-1",
                    "progress": 40, "total": 120 } }

event: message
data: { "jsonrpc": "2.0", "id": 2,
        "result": { "content": [ { "type": "text",
          "text": "Reindexed 120 notes." } ] } }

Client → Server GET /mcp, listening channel

Optionally, the client opens a standing stream so the server can push notifications such as notifications/tools/list_changed between requests.

GET /mcp HTTP/1.1
Accept: text/event-stream
Mcp-Session-Id: 3f7c1e2a90d44b1b

-> an SSE stream that stays open
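
The client-side rules in this walkthrough fit in one small helper. What follows is a minimal sketch using only the standard library, not the SDK's client: build_post is a hypothetical name, it builds headers and a body without sending anything, and the MCP-Protocol-Version header is included on the assumption that the session negotiated 2025-06-18, the revision that asks clients to send it after initialization.

```python
import json
from typing import Optional, Tuple


def build_post(message: dict,
               session_id: Optional[str] = None,
               protocol_version: Optional[str] = None) -> Tuple[dict, bytes]:
    """Return (headers, body) for one JSON-RPC message POSTed to the MCP endpoint."""
    headers = {
        "Content-Type": "application/json",
        # Offer both forms so the server may answer with JSON or an SSE stream.
        "Accept": "application/json, text/event-stream",
    }
    if session_id is not None:
        # Echo the id the server returned on the initialize response.
        headers["Mcp-Session-Id"] = session_id
    if protocol_version is not None:
        # Assumption: sent after initialization when 2025-06-18 was negotiated.
        headers["MCP-Protocol-Version"] = protocol_version
    return headers, json.dumps(message).encode("utf-8")


# First request: no session exists yet. Params are abbreviated here;
# a real initialize also carries capabilities and clientInfo.
init_headers, init_body = build_post(
    {"jsonrpc": "2.0", "id": 1, "method": "initialize",
     "params": {"protocolVersion": "2025-06-18"}}
)

# Every later request echoes the session id from the initialize response.
call_headers, call_body = build_post(
    {"jsonrpc": "2.0", "id": 2, "method": "tools/call",
     "params": {"name": "reindex_notes", "arguments": {}}},
    session_id="3f7c1e2a90d44b1b",
    protocol_version="2025-06-18",
)
```

The returned headers and body can be handed to any HTTP client. Keeping the header logic in one place makes it harder to forget the session id on a later request.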

Checkpoint

In Streamable HTTP, how does the server tie a tools/call to the right session?

3. Flipping the transport in code

FastMCP makes the standalone version a one-line change. The decorated tools, resources, and prompts are untouched; only the run call and a couple of settings move.

# server.py, bottom of the file
if __name__ == "__main__":
    # Bind to loopback only; there is no auth until Part 6.
    mcp.settings.host = "127.0.0.1"
    mcp.settings.port = 8000
    # Serves the MCP endpoint at http://127.0.0.1:8000/mcp; run() blocks until shutdown.
    mcp.run(transport="streamable-http")

Run it with uv run python server.py and point the Inspector at the URL instead of a command: choose the Streamable HTTP transport type and enter http://127.0.0.1:8000/mcp. Claude Code connects the same way with claude mcp add --transport http notes http://127.0.0.1:8000/mcp. There is also a stateless mode, enabled by passing stateless_http=True to the FastMCP constructor, in which the server keeps no session memory between requests. It gives up server-initiated notifications but lets you scale behind a load balancer without sticky sessions, a trade-off we revisit in Part 8.

It is worth being explicit about what changed operationally the moment you ran that command, because the code diff hides it. On stdio, the host supervised your process: it started it, restarted it, and tore it down with the conversation. Now nothing supervises you but you. The process must already be running when a client connects, it keeps running between sessions, and when it crashes nobody relaunches it until Part 8 puts a process manager and health checks around it. The transport flip is one line; the responsibility flip is the real migration.
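
Because only the run call differs between transports, one entry point can serve both local stdio and remote Streamable HTTP. The sketch below is one way to arrange that, not a FastMCP feature: the --http flag and the pick_transport helper are hypothetical names, and the commented lines assume mcp is the FastMCP instance from the earlier parts.

```python
import argparse
from typing import List


def pick_transport(argv: List[str]) -> str:
    """Map command-line arguments to a transport name accepted by mcp.run()."""
    parser = argparse.ArgumentParser(description="notes MCP server")
    parser.add_argument(
        "--http",
        action="store_true",
        help="serve Streamable HTTP instead of stdio",  # hypothetical flag
    )
    args = parser.parse_args(argv)
    return "streamable-http" if args.http else "stdio"


# In server.py, assuming `mcp` is the FastMCP instance from earlier parts:
#
# if __name__ == "__main__":
#     import sys
#     mcp.run(transport=pick_transport(sys.argv[1:]))
```

With this in place, uv run python server.py keeps the stdio behavior a host expects, and uv run python server.py --http starts the web service.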

4. Reading SSE by hand once

Server-sent events is a deliberately simple format: a text stream where each event is a few lines, data lines carry the payload, and a blank line ends the event. You will meet it raw the first time something misbehaves behind a proxy, so it is worth parsing once yourself. The playground below implements the format from scratch on a captured stream like the one in the walkthrough.

The format also has a resilience feature that MCP puts to good use: events can carry an id line, and a client that loses its connection may reconnect with a Last-Event-ID header asking the server to resume from where the stream broke. Streamable HTTP builds its resumability story on exactly this, so a wobbly network does not have to mean a lost tool result. The SDK handles the mechanics, although replay only works if the server is set up to retain the events it has sent. Knowing the mechanism exists tells you which questions to ask when a long call seems to vanish: did the stream drop, did the client resume, and did a proxy strip the headers that make resumption possible?

Python playground
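
A from-scratch parser for the format might look like the following. It is a minimal sketch, not the SDK's implementation: parse_sse is a hypothetical name, and the id values and keep-alive comment in the captured stream are assumptions added to show those parts of the format. It handles data, event, and id fields, joins multi-line data with newlines, and skips comment lines.

```python
import json
import re

# A captured stream shaped like the walkthrough's SSE response.
# The id values and the keep-alive comment are illustrative assumptions.
CAPTURED = (
    "id: 1\n"
    "event: message\n"
    'data: {"jsonrpc": "2.0", "method": "notifications/progress",\n'
    'data: "params": {"progressToken": "reindex-1", "progress": 40, "total": 120}}\n'
    "\n"
    ": keep-alive\n"
    "\n"
    "id: 2\n"
    "event: message\n"
    'data: {"jsonrpc": "2.0", "id": 2, "result": {"content": '
    '[{"type": "text", "text": "Reindexed 120 notes."}]}}\n'
    "\n"
)


def parse_sse(text):
    """Yield one dict per complete event: {"id", "event", "data"}."""
    event_id, event_type, data_lines = None, "message", []
    # SSE lines end with LF, CRLF, or a bare CR.
    for line in re.split(r"\r\n|\n|\r", text):
        if line == "":
            # A blank line ends the event; events without data are dropped.
            if data_lines:
                yield {"id": event_id, "event": event_type,
                       "data": "\n".join(data_lines)}
            # The last seen id persists across events; type and data reset.
            event_type, data_lines = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # exactly one leading space is stripped
        if field == "data":
            data_lines.append(value)
        elif field == "event":
            event_type = value
        elif field == "id":
            event_id = value


events = list(parse_sse(CAPTURED))
for ev in events:
    message = json.loads(ev["data"])
    # Notifications have a method; the final response has a result.
    print(ev["id"], message.get("method", "result"))

# The value a client would send as Last-Event-ID when reconnecting.
last_event_id = events[-1]["id"]
```

An event that is cut off before its closing blank line is never yielded, which is exactly the case resumption exists for: the client reconnects with the last id it fully received.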

5. Mounting MCP inside a FastAPI application

Running a bare MCP process works, but most teams already operate a backend with auth, logging, deploys, and a health check, and the right place for an MCP endpoint is inside it. FastMCP exposes the transport as an ASGI application, so FastAPI can mount it like any sub-application. The one non-obvious requirement is the lifespan: the transport's session manager must be started, and with a mount you own the app's lifespan, so you wire it yourself.

"""app.py: an existing FastAPI app gaining an MCP endpoint."""
import contextlib

from fastapi import FastAPI

from server import mcp  # the FastMCP instance with all our tools


@contextlib.asynccontextmanager
async def lifespan(app: FastAPI):
    # The session manager owns transport state; it must run
    # for the mounted MCP app to accept connections.
    async with mcp.session_manager.run():
        yield


app = FastAPI(title="notes-backend", lifespan=lifespan)

# The MCP endpoint lives at /mcp/mcp (mount prefix + endpoint path).
app.mount("/mcp", mcp.streamable_http_app())


@app.get("/health")
def health() -> dict:
    return {"status": "ok"}

Start it with uv run uvicorn app:app and you have one process serving your normal routes and the MCP endpoint side by side. They share middleware, logging, settings, and deployment. If you followed our FastAPI series, this is the same composition story from the production FastAPI part, applied to a protocol endpoint; the MCP sub-app is just another ASGI citizen. Watch the mounted path carefully: a mount at /mcp plus the app's own endpoint path yields /mcp/mcp, a detail that has eaten many first connection attempts.

Warning: Origin checks and CORS

A remote MCP endpoint is a web endpoint. Validate the Origin header to keep malicious web pages from driving local or intranet servers, bind development servers to 127.0.0.1 rather than 0.0.0.0, and configure CORS to expose the Mcp-Session-Id header if browser-based clients must reach the server.
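
One way to enforce the Origin check is a small ASGI middleware in front of the mounted app. This is a minimal sketch under stated assumptions, not a built-in of FastAPI or the MCP SDK: OriginCheckMiddleware and the ALLOWED_ORIGINS values are hypothetical, and requests with no Origin header (typical of non-browser clients) are allowed through, which may not suit every deployment.

```python
import asyncio

# Assumed allow-list; replace with the origins your clients really use.
ALLOWED_ORIGINS = {"http://127.0.0.1:8000", "http://localhost:8000"}


class OriginCheckMiddleware:
    """Pure-ASGI middleware that rejects requests from unlisted origins."""

    def __init__(self, app, allowed_origins):
        self.app = app
        self.allowed = set(allowed_origins)

    async def __call__(self, scope, receive, send):
        if scope["type"] == "http":
            # ASGI delivers header names as lowercase bytes.
            headers = dict(scope.get("headers", []))
            origin = headers.get(b"origin")
            if origin is not None and origin.decode("latin-1") not in self.allowed:
                await send({"type": "http.response.start", "status": 403,
                            "headers": [(b"content-type", b"text/plain")]})
                await send({"type": "http.response.body",
                            "body": b"Forbidden origin"})
                return
        await self.app(scope, receive, send)


async def _demo(origin):
    """Drive the middleware with a fake request and return the status code."""
    sent = []

    async def inner(scope, receive, send):
        await send({"type": "http.response.start", "status": 200, "headers": []})
        await send({"type": "http.response.body", "body": b"ok"})

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http",
             "headers": [(b"origin", origin)] if origin else []}
    await OriginCheckMiddleware(inner, ALLOWED_ORIGINS)(scope, receive, send)
    return sent[0]["status"]


statuses = [asyncio.run(_demo(o))
            for o in (b"http://127.0.0.1:8000", b"https://evil.example", None)]

# In app.py this would be registered on the FastAPI app:
# app.add_middleware(OriginCheckMiddleware, allowed_origins=ALLOWED_ORIGINS)
```

Registered on the parent app, the middleware wraps the mounted MCP sub-application as well as the regular routes, so the check applies to /mcp/mcp without touching the tool code.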

6. Progress and logging with Context

Long-running tools should narrate. The SDK injects a Context object into any tool that declares a parameter with that type, and through it you reach the session: report_progress feeds progress bars, info and warning send structured log notifications, and read_resource lets a tool reuse your own resources. On Streamable HTTP these arrive as the SSE notifications you stepped through above; on stdio they flow over the same pipe. Transport-agnostic, again.

from mcp.server.fastmcp import Context

@mcp.tool()
async def reindex_notes(ctx: Context) -> str:
    """Rebuild the search index over every saved note."""
    notes = _load()
    total = len(notes)
    for done, title in enumerate(sorted(notes), start=1):
        _index_one(title, notes[title])
        await ctx.report_progress(done, total)
        await ctx.info(f"indexed {title}")
    return f"Reindexed {total} notes."

Checkpoint

Your mounted MCP endpoint returns 404 from the Inspector, but /health works. What do you check first?

Common mistakes to avoid

✕ Following pre-2025 tutorials onto the deprecated HTTP+SSE transport
✓ Modern remote MCP is Streamable HTTP: one endpoint, POSTs in, JSON or SSE out. If you see a dedicated /sse endpoint, the material is outdated.

✕ Forgetting the session manager lifespan when mounting
✓ Wrap mcp.session_manager.run() in the FastAPI lifespan. Without it the mounted app is a silent brick.

✕ Binding a development server to 0.0.0.0
✓ Bind 127.0.0.1 until Part 6 adds auth. An open MCP port is an open tool executor.

✕ Treating the transport as the security boundary
✓ Streamable HTTP authenticates nobody by itself. Until OAuth lands in the next part, anyone who can reach the port can call your tools.

The bottom line

Transports are pluggable, and you just proved it: the same tools, resources, and prompts now serve over a single HTTP endpoint with sessions in headers and streams when streaming helps, standalone or mounted inside FastAPI next to the rest of your backend. The server is finally a service. It is also, right now, a service with no authentication, which is not a detail but a blocker. The next part fixes it properly: OAuth 2.1, scopes, and the agent-specific attacks, prompt injection and tool poisoning, that security reviews of MCP servers actually find.

Frequently asked questions

Do I have to change tool code when changing transports?

No. Tools, resources, and prompts are transport-blind; only the run call or the mounting changes. That separation is what made this part short on new concepts.

When should I use stateless mode?

When you need horizontal scale behind a plain load balancer and can live without server-initiated notifications and subscriptions. Most single-instance deployments should stay stateful; it is simpler and fully featured.

Can the same server still run on stdio for local development?

Yes, keep mcp.run() behind a flag or a separate entry point. Local stdio plus remote Streamable HTTP from one codebase is a perfectly normal arrangement.

Does Streamable HTTP work through proxies and load balancers?

Yes, with care: SSE needs response buffering disabled, timeouts long enough for streams, and sticky sessions unless you run stateless. Part 8 covers the deployment specifics.

Up next: Part 6, securing MCP servers.