Published Jun 11, 2026

From stdio to the Web: Streamable HTTP and Mounting MCP in FastAPI

Everything so far runs on stdio: the host launches your process and owns it. That is perfect for a personal tool and useless for a team. The moment your notes server should hold shared data, serve five people, or run next to your existing backend, it needs to become a web service. This part moves the server to Streamable HTTP, the protocol's remote transport, and then mounts it inside a FastAPI application so MCP becomes one more route in an app you already operate. The tool code from Parts 2 through 4 does not change at all, which is the whole point of transports being separate from the protocol.

What you will learn in Part 5

  • When stdio stops being enough and what Streamable HTTP changes
  • How the transport works: one endpoint, POSTs, SSE streams, session ids
  • Switching FastMCP to streamable-http in one line
  • Mounting the MCP server inside an existing FastAPI app
  • Progress and logging through the Context object

1. Why leave stdio at all

stdio gives you process isolation, zero network configuration, and the host's lifecycle management for free. What it cannot give you is sharing. Each user gets a private child process with private state, on their machine, with their filesystem. The decision point is simple: if the data or the compute belongs with the user, stay on stdio; if it belongs with the service, go remote. Our notes server has clearly outgrown notes.json on one laptop, so it is a service now.

Choosing a transport

|               | stdio                               | Streamable HTTP              |
|---------------|-------------------------------------|------------------------------|
| Where it runs | Child process on the user's machine | A web service you operate    |
| State         | Per user, local                     | Shared, centralized          |
| Auth          | Inherits the user's environment     | Yours to enforce, Part 6     |
| Ops           | None; the host owns the process     | Deploys, monitoring, Part 8  |
| Best for      | Personal and developer tools        | Team and product integrations |

A note on history you will hit in older posts: the first remote transport, called HTTP plus SSE, used two endpoints and was replaced in the 2025-03-26 revision by Streamable HTTP. If a tutorial shows a separate /sse endpoint, it is describing the deprecated design. Modern servers expose a single MCP endpoint, and that is what we build.

2. How Streamable HTTP actually works

The transport is plain HTTP arranged carefully. Every JSON-RPC message from the client arrives as a POST to one endpoint, conventionally /mcp. The server answers a request either with a single JSON body or, when it wants to stream messages before the final result, with a server-sent events (SSE) response that carries several messages and then closes. Sessions tie it together: when the server assigns a session, the initialize response includes an Mcp-Session-Id header, and the client echoes that header on every request after. There is also an optional GET on the same endpoint that opens a long-lived stream for server-initiated notifications. Step through a full exchange below.

Protocol walkthrough

One session over Streamable HTTP

Client → Server POST /mcp, initialize

Same handshake as Part 1, now as an HTTP request. The Accept header offers both JSON and SSE so the server may choose per response.

POST /mcp HTTP/1.1
Accept: application/json, text/event-stream
Content-Type: application/json

{ "jsonrpc": "2.0", "id": 1, "method": "initialize",
  "params": { "protocolVersion": "2025-06-18", ... } }
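The rest of the exchange hinges on the session header described above. As a minimal sketch of the client's bookkeeping, assuming plain dictionaries in place of real HTTP traffic and a made-up session id (nothing here touches the network or the SDK, and the function names are hypothetical):

```python
import json
from typing import Optional

MCP_ENDPOINT = "/mcp"  # conventional path from the text; a server may use another


def build_post(message: dict, session_id: Optional[str] = None) -> dict:
    """Describe one client POST as a plain dict (no network involved)."""
    headers = {
        # Offer both response styles so the server may choose per response.
        "Accept": "application/json, text/event-stream",
        "Content-Type": "application/json",
    }
    if session_id is not None:
        # Echo the id the server handed out during initialize.
        headers["Mcp-Session-Id"] = session_id
    return {
        "method": "POST",
        "path": MCP_ENDPOINT,
        "headers": headers,
        "body": json.dumps(message),
    }


def session_id_from(response_headers: dict) -> Optional[str]:
    """HTTP header names are case-insensitive, so compare in lowercase."""
    for name, value in response_headers.items():
        if name.lower() == "mcp-session-id":
            return value
    return None


# The first request has no session yet. The params are abbreviated here,
# as in the walkthrough; a real handshake carries more fields.
init = build_post(
    {"jsonrpc": "2.0", "id": 1, "method": "initialize",
     "params": {"protocolVersion": "2025-06-18"}}
)

# Pretend the server answered with this header (the value is invented).
sid = session_id_from({"mcp-session-id": "hypothetical-session-123"})

# Every later request carries the id, which is how the server finds the session.
call = build_post(
    {"jsonrpc": "2.0", "id": 2, "method": "tools/call",
     "params": {"name": "reindex_notes", "arguments": {}}},
    session_id=sid,
)
```

The only state the client keeps between requests is that one string, which is why a proxy that drops the header breaks the session.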

Checkpoint

In Streamable HTTP, how does the server tie a tools/call to the right session?

3. Flipping the transport in code

FastMCP makes the standalone version a one-line change. The decorated tools, resources, and prompts are untouched; only the run call and a couple of settings move.

# server.py, bottom of the file
if __name__ == "__main__":
    mcp.settings.host = "127.0.0.1"
    mcp.settings.port = 8000
    mcp.run(transport="streamable-http")
    # The MCP endpoint is now http://127.0.0.1:8000/mcp

Run it with uv run python server.py and point the Inspector at the URL instead of a command: choose the Streamable HTTP transport type and enter http://127.0.0.1:8000/mcp. Claude Code connects the same way with claude mcp add --transport http notes http://127.0.0.1:8000/mcp. There is also a stateless mode, stateless_http=True, where the server keeps no session memory between requests; it sacrifices server-initiated notifications but lets you scale behind a load balancer without sticky sessions, a trade we revisit in Part 8.

It is worth being explicit about what changed operationally the moment you ran that command, because the code diff hides it. On stdio, the host supervised your process: it started it, restarted it, and tore it down with the conversation. Now nothing supervises you but you. The process must already be running when a client connects, it keeps running between sessions, and when it crashes nobody relaunches it until Part 8 puts a process manager and health checks around it. The transport flip is one line; the responsibility flip is the real migration.

4. Reading SSE by hand once

Server-sent events is a deliberately simple format: a text stream where each event is a few lines, data lines carry the payload, and a blank line ends the event. You will meet it raw the first time something misbehaves behind a proxy, so it is worth parsing once yourself. The playground below implements the format from scratch on a captured stream like the one in the walkthrough.

The format also has a resilience feature MCP puts to good use: events can carry an id line, and a client that loses its connection may reconnect with a Last-Event-ID header asking the server to resume from where the stream broke. Streamable HTTP builds its resumability story on exactly this, so a wobbly network does not have to mean a lost tool result. You get it from the SDK without writing anything, but knowing the mechanism exists tells you which questions to ask when a long call seems to vanish: did the stream drop, did the client resume, and did a proxy strip the headers that make resumption possible.

Python playground
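The interactive playground does not survive outside the original page, so here is a minimal offline sketch of the same idea. It is not the article's playground code: the parser follows the general SSE field rules (data, event, id, comment lines, blank-line dispatch), and the captured stream is invented for illustration, including the progress token and the result text.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SSEEvent:
    event: str
    data: str
    id: Optional[str]


def parse_sse(stream: str) -> List[SSEEvent]:
    """Parse a complete SSE text stream into events."""
    text = stream.replace("\r\n", "\n").replace("\r", "\n")
    lines = text.split("\n")
    # The last element is either the empty string after the final newline
    # or an unterminated partial line; neither is a complete line.
    lines.pop()

    events: List[SSEEvent] = []
    event_type, data_lines = "message", []
    last_id: Optional[str] = None  # persists across events, like Last-Event-ID

    for line in lines:
        if line == "":
            # A blank line ends the event; events without data are dropped.
            if data_lines:
                events.append(SSEEvent(event_type, "\n".join(data_lines), last_id))
            event_type, data_lines = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        name, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # strip exactly one leading space
        if name == "data":
            data_lines.append(value)
        elif name == "event":
            event_type = value
        elif name == "id":
            last_id = value
    return events


# A hypothetical captured response: one progress notification, then the result.
CAPTURED = (
    "id: 1\n"
    "event: message\n"
    'data: {"jsonrpc":"2.0","method":"notifications/progress",'
    '"params":{"progressToken":"t1","progress":1,"total":2}}\n'
    "\n"
    ": keep-alive\n"
    "id: 2\n"
    'data: {"jsonrpc":"2.0","id":2,"result":{"content":[]}}\n'
    "\n"
)

events = parse_sse(CAPTURED)
messages = [json.loads(e.data) for e in events]
```

The id of the last parsed event is the value a reconnecting client would send back in Last-Event-ID, which is the hook the resumability story relies on.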

5. Mounting MCP inside a FastAPI application

Running a bare MCP process works, but most teams already operate a backend with auth, logging, deploys, and a health check, and the right place for an MCP endpoint is inside it. FastMCP exposes the transport as an ASGI application, so FastAPI can mount it like any sub-application. The one non-obvious requirement is the lifespan: the transport's session manager must be started, and with a mount you own the app's lifespan, so you wire it yourself.

"""app.py: an existing FastAPI app gaining an MCP endpoint."""
import contextlib

from fastapi import FastAPI

from server import mcp  # the FastMCP instance with all our tools


@contextlib.asynccontextmanager
async def lifespan(app: FastAPI):
    # The session manager owns transport state; it must run
    # for the mounted MCP app to accept connections.
    async with mcp.session_manager.run():
        yield


app = FastAPI(title="notes-backend", lifespan=lifespan)

# The MCP endpoint lives at /mcp/mcp (mount prefix + endpoint path).
app.mount("/mcp", mcp.streamable_http_app())


@app.get("/health")
def health() -> dict:
    return {"status": "ok"}

Start it with uv run uvicorn app:app and you have one process serving your normal routes and the MCP endpoint side by side. They share middleware, logging, settings, and deployment. If you followed our FastAPI series, this is the same composition story from the production FastAPI part, applied to a protocol endpoint; the MCP sub-app is just another ASGI citizen. Watch the mounted path carefully: a mount at /mcp plus the app's own endpoint path yields /mcp/mcp, a detail that has eaten many first connection attempts.
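The path arithmetic is simple enough to write down. This is a small illustrative helper, not part of FastAPI or the SDK; the function name is hypothetical, and the base URL assumes the server listens on 127.0.0.1:8000.

```python
def mounted_endpoint(base_url: str, mount_prefix: str, endpoint_path: str = "/mcp") -> str:
    """Join the mount prefix and the sub-app's own endpoint path into one URL."""
    parts = [p.strip("/") for p in (mount_prefix, endpoint_path)]
    path = "/".join(p for p in parts if p)
    return f"{base_url.rstrip('/')}/{path}"


# Mounted under /mcp, with the sub-app's endpoint path also /mcp:
mounted = mounted_endpoint("http://127.0.0.1:8000", "/mcp")

# Standalone server from section 3: no mount prefix at all.
standalone = mounted_endpoint("http://127.0.0.1:8000", "")
```

Here mounted is the doubled /mcp/mcp URL to give the Inspector, while standalone is the single /mcp URL from section 3. Pasting the standalone URL into a client after mounting is the usual source of the 404.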

Warning

Origin checks and CORS

A remote MCP endpoint is a web endpoint. Validate the Origin header to keep malicious web pages from driving local or intranet servers, bind development servers to 127.0.0.1 rather than 0.0.0.0, and configure CORS to expose the Mcp-Session-Id header if browser-based clients must reach the server.
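One way to enforce the Origin rule is a small ASGI middleware in front of the whole app. The sketch below is an assumption-laden illustration, not something the SDK ships: the class name OriginGuard and the allowlist are hypothetical, and it deliberately lets requests without an Origin header through, on the assumption that non-browser clients such as command-line tools do not send one.

```python
# Hypothetical allowlist; replace with the origins you actually trust.
ALLOWED_ORIGINS = {"http://127.0.0.1:8000", "http://localhost:8000"}


class OriginGuard:
    """Pure ASGI middleware that rejects browser requests from unknown origins."""

    def __init__(self, app, allowed_origins):
        self.app = app
        self.allowed = set(allowed_origins)

    async def __call__(self, scope, receive, send):
        if scope["type"] == "http":
            # ASGI delivers header names as lowercase bytes.
            headers = dict(scope.get("headers", []))
            origin = headers.get(b"origin")
            if origin is not None and origin.decode("latin-1") not in self.allowed:
                await send({
                    "type": "http.response.start",
                    "status": 403,
                    "headers": [(b"content-type", b"text/plain")],
                })
                await send({"type": "http.response.body", "body": b"forbidden origin"})
                return
        # Allowed origin, no Origin header, or a non-HTTP scope: pass through.
        await self.app(scope, receive, send)
```

With the FastAPI app from section 5, this would be registered as app.add_middleware(OriginGuard, allowed_origins=ALLOWED_ORIGINS). For browser-based clients, FastAPI's CORSMiddleware accepts expose_headers=["Mcp-Session-Id"], which covers the CORS half of the warning.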

6. Progress and logging with Context

Long-running tools should narrate. The SDK injects a Context object into any tool that declares a parameter with that type, and through it you reach the session: report_progress feeds progress bars, info and warning send structured log notifications, and read_resource lets a tool reuse your own resources. On Streamable HTTP these arrive as the SSE notifications you stepped through above; on stdio they flow over the same pipe. Transport-agnostic, again.

from mcp.server.fastmcp import Context

@mcp.tool()
async def reindex_notes(ctx: Context) -> str:
    """Rebuild the search index over every saved note."""
    notes = _load()
    total = len(notes)
    for done, title in enumerate(sorted(notes), start=1):
        _index_one(title, notes[title])
        await ctx.report_progress(done, total)
        await ctx.info(f"indexed {title}")
    return f"Reindexed {total} notes."
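It helps to know roughly what those two calls put on the wire. The sketch below builds the JSON-RPC notifications by hand, assuming the protocol's notifications/progress and notifications/message methods; the helper names, the token value, and the note title are invented, and the SDK's actual output may carry extra fields.

```python
import json
from typing import Optional


def progress_notification(token: str, progress: float, total: Optional[float] = None) -> dict:
    """Approximate shape of what report_progress sends."""
    params = {"progressToken": token, "progress": progress}
    if total is not None:
        params["total"] = total
    # Notifications carry no "id": the receiver never replies to them.
    return {"jsonrpc": "2.0", "method": "notifications/progress", "params": params}


def log_notification(level: str, data: str) -> dict:
    """Approximate shape of what ctx.info / ctx.warning send."""
    return {
        "jsonrpc": "2.0",
        "method": "notifications/message",
        "params": {"level": level, "data": data},
    }


def as_sse_event(message: dict) -> str:
    """Frame one message as an SSE event: a data line, then a blank line."""
    return f"data: {json.dumps(message)}\n\n"


# One loop iteration of reindex_notes, as it might appear in the stream.
frames = [
    as_sse_event(progress_notification("t1", 1, 3)),
    as_sse_event(log_notification("info", "indexed groceries")),
]
```

The progress token comes from the client's request, so a client that never asked for progress updates will typically not receive any.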

Checkpoint

Your mounted MCP endpoint returns 404 from the Inspector, but /health works. What do you check first?

Common mistakes to avoid

  • Following tutorials that predate the 2025-03-26 revision onto the deprecated HTTP+SSE transport

    Modern remote MCP is Streamable HTTP: one endpoint, POSTs in, JSON or SSE out. If you see a dedicated /sse endpoint, the material is outdated.

  • Forgetting the session manager lifespan when mounting

    Wrap mcp.session_manager.run() in the FastAPI lifespan. Without it the session manager never starts, and the mounted app cannot serve MCP requests even though the rest of the backend looks healthy.

  • Binding a development server to 0.0.0.0

    Bind to 127.0.0.1 until Part 6 adds auth. An open MCP port is an open tool executor.

  • Treating the transport as the security boundary

    Streamable HTTP authenticates nobody by itself. Until OAuth lands in the next part, anyone who can reach the port can call your tools.

The bottom line

Transports are pluggable, and you just proved it: the same tools, resources, and prompts now serve over a single HTTP endpoint with sessions in headers and streams when streaming helps, standalone or mounted inside FastAPI next to the rest of your backend. The server is finally a service. It is also, right now, a service with no authentication, which is not a detail but a blocker. The next part fixes it properly: OAuth 2.1, scopes, and the agent-specific attacks, prompt injection and tool poisoning, that security reviews of MCP servers actually find.

Frequently asked questions

Do I have to change tool code when changing transports?

No. Tools, resources, and prompts are transport-blind; only the run call or the mounting changes. That separation is what made this part short on new concepts.

When should I use stateless mode?

When you need horizontal scale behind a plain load balancer and can live without server-initiated notifications and subscriptions. Most single-instance deployments should stay stateful; it is simpler and fully featured.

Can the same server still run on stdio for local development?

Yes. Keep mcp.run() behind a flag or a separate entry point. Local stdio plus remote Streamable HTTP from one codebase is a perfectly normal arrangement.
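A minimal sketch of the flag approach, assuming the FastMCP instance lives in server.py as in earlier parts. The flag names and the main function are hypothetical choices, and the import of server is kept inside main so the argument parsing can be exercised on its own.

```python
import argparse

TRANSPORTS = ("stdio", "streamable-http")


def parse_args(argv=None):
    """Parse the hypothetical command-line flags for the notes server."""
    parser = argparse.ArgumentParser(description="Run the notes MCP server.")
    parser.add_argument("--transport", choices=TRANSPORTS, default="stdio")
    parser.add_argument("--host", default="127.0.0.1")
    parser.add_argument("--port", type=int, default=8000)
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # Imported lazily: server.py holds the FastMCP instance with all the tools.
    from server import mcp

    if args.transport == "streamable-http":
        # Host and port only matter for the network transport.
        mcp.settings.host = args.host
        mcp.settings.port = args.port
    mcp.run(transport=args.transport)


# In a real entry point, finish the file with:
#     if __name__ == "__main__":
#         main()
```

Defaulting to stdio keeps existing host configurations working unchanged, while the remote deployment passes --transport streamable-http explicitly.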

Does Streamable HTTP work through proxies and load balancers?

Yes, with care: SSE needs response buffering disabled, timeouts long enough for streams, and sticky sessions unless you run stateless. Part 8 covers the deployment specifics.

Up next: Part 6, securing MCP servers.
