An LLM that takes 12 seconds to respond feels broken. The same response, streamed token-by-token, feels instant. Streaming is no longer a nice-to-have; it is the baseline UX for any AI feature that talks to users. The good news is that Laravel 12 ships every primitive you need: HTTP streaming responses, queued jobs, generator-friendly controllers, and Laravel Reverb for first-class WebSockets.
This guide builds a production-grade streaming chat: SSE for the simple case, Reverb for multi-tab and multi-device sync. We will use Claude as the model, but the pattern works for any provider that supports server-sent token streams: OpenAI, Mistral, Gemini, even self-hosted Ollama. The core insight: streaming is not a feature you bolt on at the end. It changes how you structure the controller, the queue, and the frontend reducer all at once.
Why streaming matters
Three reasons, in order of importance:
- Perceived latency drops to near zero. Users start reading at 300ms instead of waiting 8 seconds for a full reply. Total wall-clock time barely changes; the experience changes completely.
- Abandonment goes down. Without streaming, users hit refresh, type again, or leave. With streaming, they wait because they can see progress.
- You catch problems earlier. If the model starts answering the wrong question, the user can stop the stream at token 50 instead of waiting for token 800.
Three transports: when to pick which
| Transport | Direction | Best for | Avoid when |
|---|---|---|---|
| Polling | Pull | Background jobs, status updates | You need sub-second latency |
| SSE | Server → client | Single-user AI chat, live logs | Bidirectional or multi-consumer fan-out |
| WebSockets (Reverb) | Bidirectional | Shared conversations, collaborative editing | A single tab is the only consumer |
Two streaming modes, one backend
- SSE (Server-Sent Events): perfect when one browser tab is the consumer. Simple, no auth handshake, works with vanilla fetch, automatic reconnection in EventSource (we will use fetch instead because EventSource is GET-only).
- Reverb / WebSockets: required when the same conversation needs to stream to multiple devices, dashboards, or shared inboxes. Also the right call when the agent itself runs in a queue worker (the worker cannot hold an HTTP response open).
Step 1: Streaming SSE endpoint
Laravel's stream() response is the cleanest way to push tokens. It flushes the buffer per chunk, so the browser sees text as it arrives.
// routes/web.php
Route::post('/chat/stream', StreamChatController::class);
// app/Http/Controllers/StreamChatController.php
namespace App\Http\Controllers;
use App\Services\Ai\ClaudeStreamer;
use Illuminate\Http\Request;
use Symfony\Component\HttpFoundation\StreamedResponse;
class StreamChatController extends Controller
{
public function __invoke(Request $request, ClaudeStreamer $streamer): StreamedResponse
{
$data = $request->validate([
'messages' => ['required', 'array', 'max:50'],
'messages.*.role' => ['required', 'in:user,assistant'],
'messages.*.content' => ['required', 'string', 'max:4000'],
]);
return response()->stream(function () use ($streamer, $data) {
foreach ($streamer->stream($data['messages']) as $event) {
echo "event: {$event['type']}\n";
echo 'data: ' . json_encode($event['payload']) . "\n\n";
if (ob_get_level() > 0) ob_flush();
flush();
}
}, 200, [
'Content-Type' => 'text/event-stream',
'Cache-Control' => 'no-cache',
'X-Accel-Buffering' => 'no',
]);
}
}
The X-Accel-Buffering header is critical when you sit behind Nginx — without it the proxy buffers the response and the user sees nothing until the model finishes.
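A related failure mode in the same closure: if the user closes the tab mid-reply, PHP keeps pulling tokens from the provider until the stream ends. A minimal sketch of the loop with a dropped-connection check added (not part of the controller above):
// Inside the response()->stream() closure: stop early when the reader is gone
foreach ($streamer->stream($data['messages']) as $event) {
    // PHP only notices a dropped connection after it has tried to write,
    // so this check fires on the iteration after the last flush.
    if (connection_aborted()) {
        break;
    }
    echo "event: {$event['type']}\n";
    echo 'data: ' . json_encode($event['payload']) . "\n\n";
    if (ob_get_level() > 0) ob_flush();
    flush();
}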
Step 2: The streamer service
This is a generator. It yields events as they come from the Anthropic SDK; the controller decides what to do with them.
// app/Services/Ai/ClaudeStreamer.php
namespace App\Services\Ai;
use Anthropic\Anthropic;
use Generator;
class ClaudeStreamer
{
public function __construct(private Anthropic $client) {}
public function stream(array $messages): Generator
{
$stream = $this->client->messages->createStreamed([
'model' => 'claude-sonnet-4-6',
'max_tokens' => 1024,
'system' => 'You are a concise assistant. Use short paragraphs.',
'messages' => $messages,
]);
foreach ($stream as $event) {
if ($event->type === 'content_block_delta' && $event->delta?->type === 'text_delta') {
yield ['type' => 'token', 'payload' => ['text' => $event->delta->text]];
}
if ($event->type === 'message_stop') {
yield ['type' => 'done', 'payload' => ['usage' => $event->usage ?? null]];
}
}
}
}
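Because the service is a plain generator, nothing ties it to HTTP: the same loop works in a test or an Artisan command, which is handy for poking at prompts without a frontend. A minimal sketch (the command name and class are assumptions, not part of the app above):
// app/Console/Commands/ChatCli.php (sketch)
namespace App\Console\Commands;
use App\Services\Ai\ClaudeStreamer;
use Illuminate\Console\Command;
class ChatCli extends Command
{
    protected $signature = 'chat:cli {prompt}';
    protected $description = 'Stream a single reply to the terminal';
    public function handle(ClaudeStreamer $streamer): int
    {
        $messages = [['role' => 'user', 'content' => $this->argument('prompt')]];
        foreach ($streamer->stream($messages) as $event) {
            if ($event['type'] === 'token') {
                // Print tokens as they arrive instead of waiting for the full reply.
                $this->output->write($event['payload']['text']);
            }
        }
        $this->newLine();
        return self::SUCCESS;
    }
}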
Step 3: Browser consumer
Use the Fetch streaming API. EventSource only supports GET, which is wrong for chat; POST lets you send a message body cleanly.
// resources/js/chat.js
async function streamChat(messages, onToken) {
const res = await fetch('/chat/stream', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'X-CSRF-TOKEN': window.csrfToken,
},
body: JSON.stringify({ messages }),
});
const reader = res.body.getReader();
const decoder = new TextDecoder();
let buffer = '';
while (true) {
const { value, done } = await reader.read();
if (done) break;
buffer += decoder.decode(value, { stream: true });
let sep;
while ((sep = buffer.indexOf('\n\n')) !== -1) {
const chunk = buffer.slice(0, sep);
buffer = buffer.slice(sep + 2);
const event = parseSse(chunk);
if (event.type === 'token') onToken(event.data.text);
}
}
}
function parseSse(chunk) {
const lines = chunk.split('\n');
let type = 'message', data = {};
for (const line of lines) {
if (line.startsWith('event:')) type = line.slice(6).trim();
if (line.startsWith('data:')) data = JSON.parse(line.slice(5).trim());
}
return { type, data };
}
Step 4: Reverb for shared conversations
Once you need the same stream to land on multiple clients (think: customer in browser + agent in admin panel), switch to broadcast events. A queued job runs the LLM call and emits each token over a private channel.
composer require laravel/reverb
php artisan reverb:install
php artisan reverb:start
// app/Events/AiTokenStreamed.php
namespace App\Events;
use Illuminate\Broadcasting\Channel;
use Illuminate\Broadcasting\InteractsWithSockets;
use Illuminate\Broadcasting\PrivateChannel;
use Illuminate\Contracts\Broadcasting\ShouldBroadcastNow;
use Illuminate\Queue\SerializesModels;
// ShouldBroadcastNow pushes each token to Reverb synchronously. ShouldBroadcast
// would queue every token behind the very job that is emitting it, so tokens
// would only arrive once the reply is finished.
class AiTokenStreamed implements ShouldBroadcastNow
{
use InteractsWithSockets, SerializesModels;
public function __construct(
public int $conversationId,
public string $token,
public string $eventType = 'token',
) {}
public function broadcastOn(): Channel
{
return new PrivateChannel("conversations.{$this->conversationId}");
}
public function broadcastAs(): string
{
return $this->eventType;
}
}
// app/Jobs/StreamAiReply.php
public function handle(ClaudeStreamer $streamer): void
{
foreach ($streamer->stream($this->messages) as $event) {
broadcast(new AiTokenStreamed(
conversationId: $this->conversationId,
token: $event['payload']['text'] ?? '',
eventType: $event['type'],
));
}
}
// routes/channels.php
Broadcast::channel('conversations.{id}', function (User $user, int $id) {
return $user->canAccessConversation($id);
});
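The dispatch side is not shown above. The handle() method implies a constructor that receives the conversation id and the message history; assuming exactly that, plus a hypothetical SendMessageController and route, kicking off a reply looks roughly like this:
// app/Http/Controllers/SendMessageController.php (sketch)
namespace App\Http\Controllers;
use App\Jobs\StreamAiReply;
use Illuminate\Http\Request;
class SendMessageController extends Controller
{
    public function __invoke(Request $request, int $conversationId)
    {
        $data = $request->validate([
            'content' => ['required', 'string', 'max:4000'],
        ]);
        // A real app would persist the user message and load the prior history here.
        StreamAiReply::dispatch($conversationId, [
            ['role' => 'user', 'content' => $data['content']],
        ]);
        // Return immediately; the queue worker streams the reply over the private channel.
        return response()->json(['status' => 'queued'], 202);
    }
}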
Step 5: Echo client for Reverb
// resources/js/echo.js
import Echo from 'laravel-echo';
import Pusher from 'pusher-js';
window.Pusher = Pusher;
window.Echo = new Echo({
broadcaster: 'reverb',
key: import.meta.env.VITE_REVERB_APP_KEY,
wsHost: import.meta.env.VITE_REVERB_HOST,
wsPort: import.meta.env.VITE_REVERB_PORT,
forceTLS: false,
enabledTransports: ['ws', 'wss'],
});
Echo.private(`conversations.${conversationId}`)
.listen('.token', (e) => appendToken(e.token))
.listen('.done', () => finalizeMessage());
Proxy & server configuration
Streaming gets sabotaged at the infra layer more often than at the application layer. Configure these before you debug your code:
| Layer | Setting | Why |
|---|---|---|
| PHP-FPM | `output_buffering=Off` | Without this, PHP collects the whole response before flushing |
| Nginx | `proxy_buffering off; proxy_read_timeout 180s;` | Default is 60s; LLMs routinely exceed that |
| Cloudflare | Disable "Auto Minify" on the chat path | Minification batches output and breaks SSE framing |
| CDN | Bypass cache for `/chat/stream` | Streaming responses must never be cached |
| Browser | `Cache-Control: no-cache, no-transform` | Some intermediaries gzip and buffer otherwise |
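The PHP-FPM row is the one most often missed because it fails silently. If you want to catch it at runtime rather than during an incident, a small sanity check (a sketch, not something the stack requires) can log a warning from the streaming code path:
// e.g. at the top of the stream closure, or in a deploy smoke test (sketch)
use Illuminate\Support\Facades\Log;
$buffering = ini_get('output_buffering');
if ($buffering !== false && !in_array(strtolower((string) $buffering), ['', '0', 'off'], true)) {
    // Tokens will sit in PHP's output buffer instead of reaching the client promptly.
    Log::warning("output_buffering is set to '{$buffering}' on this pool; SSE will feel laggy.");
}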
Reconnection & cancellation
Real chat sessions break. The user closes the lid, the train enters a tunnel, the worker restarts mid-stream. Build for it:
- Persist partial replies. Save the assistant message to the DB when the stream starts, then update it on `message_stop`. Reload it if the user reconnects.
- Server-side cancellation flag. Store a Redis key like `cancel:{conversationId}`; the streamer checks it between deltas and stops cleanly (a sketch follows the client snippet below).
- Client-side AbortController. When the user types a new message, abort the inflight fetch — otherwise tokens from the old reply will keep arriving.
- Deduplicate by event ID. If you reconnect and the server replays from buffer, the client should skip events it has already rendered.
// Client cancellation
let controller = null;
function send(messages) {
controller?.abort();
controller = new AbortController();
fetch('/chat/stream', { signal: controller.signal, /* ... */ });
}
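On the server side, the flag check is a few lines. It can live in the streamer or in the job; the sketch below does it in the job's loop, where the conversation id already lives (the key would be set by a hypothetical POST /chat/{id}/cancel endpoint, which is not shown):
// app/Jobs/StreamAiReply.php — handle() with the cancellation check added (sketch)
use Illuminate\Support\Facades\Cache;
public function handle(ClaudeStreamer $streamer): void
{
    foreach ($streamer->stream($this->messages) as $event) {
        // Cache::pull reads and clears the flag in one call; back it with Redis in production.
        if (Cache::pull("cancel:{$this->conversationId}")) {
            return; // stop cleanly between deltas
        }
        broadcast(new AiTokenStreamed(
            conversationId: $this->conversationId,
            token: $event['payload']['text'] ?? '',
            eventType: $event['type'],
        ));
    }
}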
Production gotchas
- Disable PHP-FPM output buffering. Set `output_buffering=Off` for the streaming worker pool.
- Use a long timeout. A 30-second LLM call dies behind default proxy timeouts. Set `proxy_read_timeout` to at least 180 seconds.
- Send keep-alive comments. If the model thinks for >15 seconds before the first token, push `: ping\n\n` every 10 seconds to keep proxies happy.
- Persist on done, not on token. Buffer the assistant message in memory and write it to the DB once when the stream ends; DB writes per token will melt your database.
- Cancel inflight requests. When the user sends a new message before the previous one finishes, the client should abort the fetch and the job should detect a cancellation flag.
- Rate limit by user. A user holding the same SSE response open for hours is one socket per session. A throttle middleware on `/chat/stream` protects you from runaway tab-loops (see the sketch after this list).
- Fan-out via Redis when scaling Reverb. Multiple Reverb processes need a shared backplane to broadcast across them.
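A minimal sketch of that per-user throttle using Laravel's named rate limiters (the limiter name and the 10-per-minute figure are arbitrary choices, not requirements):
// app/Providers/AppServiceProvider.php
use Illuminate\Cache\RateLimiting\Limit;
use Illuminate\Http\Request;
use Illuminate\Support\Facades\RateLimiter;
public function boot(): void
{
    RateLimiter::for('chat-stream', function (Request $request) {
        // Key the limit by user when authenticated, by IP otherwise.
        return Limit::perMinute(10)->by($request->user()?->id ?: $request->ip());
    });
}
// routes/web.php
Route::post('/chat/stream', StreamChatController::class)
    ->middleware('throttle:chat-stream');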
Streaming is not about speed. It is about turning waiting into reading. The total time barely changes; the perceived latency drops to zero.
Start with SSE. Move to Reverb only when you need fan-out. Both pipelines share the same generator service, so the migration is a matter of swapping the delivery layer, not rewriting the model integration.