Streaming AI Chat with Laravel Reverb and Server-Sent Events

Published May 12, 2026

An LLM that takes 12 seconds to respond feels broken. The same response, streamed token-by-token, feels instant. Streaming is no longer a nice-to-have; it is the baseline UX for any AI feature that talks to users. The good news is that Laravel 12 ships every primitive you need: HTTP streaming responses, queued jobs, generator-friendly controllers, and Laravel Reverb for first-class WebSockets.

 

This guide builds a production-grade streaming chat: SSE for the simple case, Reverb for multi-tab and multi-device sync. We will use Claude as the model, but the pattern works for any provider that supports server-sent token streams: OpenAI, Mistral, Gemini, even self-hosted Ollama. The core insight: streaming is not a feature you bolt on at the end. It changes how you structure the controller, the queue, and the frontend reducer all at once.

Why streaming matters

Three reasons, in order of importance:

  1. Perceived latency drops to near zero. Users start reading at 300ms instead of waiting 8 seconds for a full reply. Total wall-clock time barely changes; the experience changes completely.
  2. Abandonment goes down. Without streaming, users hit refresh, type again, or leave. With streaming, they wait because they can see progress.
  3. You catch problems earlier. If the model starts answering the wrong question, the user can stop the stream at token 50 instead of waiting for token 800.

Three transports: when to pick which

Transport             Direction         Best for                                      Avoid when
Polling               Pull              Background jobs, status updates               You need sub-second latency
SSE                   Server → client   Single-user AI chat, live logs                Bidirectional or multi-consumer fan-out
WebSockets (Reverb)   Bidirectional     Shared conversations, collaborative editing   A single tab is the only consumer

Two streaming modes, one backend

  • SSE (Server-Sent Events): perfect when one browser tab is the consumer. Simple, no auth handshake, works with vanilla fetch; EventSource even reconnects automatically, but we will use fetch instead because EventSource is GET-only.
  • Reverb / WebSockets: required when the same conversation needs to stream to multiple devices, dashboards, or shared inboxes. Also the right call when the agent itself runs in a queue worker (the worker cannot hold an HTTP response open).

Step 1: Streaming SSE endpoint

Laravel's stream() response is the cleanest way to push tokens. It flushes the buffer per chunk, so the browser sees text as it arrives.

 

// routes/web.php
use App\Http\Controllers\StreamChatController;

Route::post('/chat/stream', StreamChatController::class);

// app/Http/Controllers/StreamChatController.php
namespace App\Http\Controllers;

use App\Services\Ai\ClaudeStreamer;
use Illuminate\Http\Request;
use Symfony\Component\HttpFoundation\StreamedResponse;

class StreamChatController extends Controller
{
    public function __invoke(Request $request, ClaudeStreamer $streamer): StreamedResponse
    {
        $data = $request->validate([
            'messages'           => ['required', 'array', 'max:50'],
            'messages.*.role'    => ['required', 'in:user,assistant'],
            'messages.*.content' => ['required', 'string', 'max:4000'],
        ]);

        return response()->stream(function () use ($streamer, $data) {
            foreach ($streamer->stream($data['messages']) as $event) {
                echo "event: {$event['type']}\n";
                echo 'data: ' . json_encode($event['payload']) . "\n\n";

                if (ob_get_level() > 0) ob_flush();
                flush();
            }
        }, 200, [
            'Content-Type'      => 'text/event-stream',
            'Cache-Control'     => 'no-cache',
            'X-Accel-Buffering' => 'no',
        ]);
    }
}

 

The X-Accel-Buffering header is critical when you sit behind Nginx — without it the proxy buffers the response and the user sees nothing until the model finishes.
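For reference, here is a minimal Nginx location block matching that advice. The directive names are standard, but the paths are placeholders to adapt to your setup; if PHP-FPM is reached over FastCGI rather than a proxied upstream, the `X-Accel-Buffering: no` header sent by the controller already disables buffering for this response.

```nginx
location /chat/stream {
    proxy_buffering off;        # flush each SSE chunk to the client immediately
    proxy_cache off;            # streaming responses must never be cached
    proxy_read_timeout 180s;    # LLM replies routinely exceed the 60s default
    gzip off;                   # compression buffers output and breaks SSE framing
    # proxy_pass / fastcgi_pass to your PHP upstream goes here
}
```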

Step 2: The streamer service

This is a generator. It yields events as they come from the Anthropic SDK; the controller decides what to do with them.

 

// app/Services/Ai/ClaudeStreamer.php
namespace App\Services\Ai;

use Anthropic\Anthropic;
use Generator;

class ClaudeStreamer
{
    public function __construct(private Anthropic $client) {}

    public function stream(array $messages): Generator
    {
        $stream = $this->client->messages->createStreamed([
            'model'      => 'claude-sonnet-4-6',
            'max_tokens' => 1024,
            'system'     => 'You are a concise assistant. Use short paragraphs.',
            'messages'   => $messages,
        ]);

        foreach ($stream as $event) {
            if ($event->type === 'content_block_delta' && $event->delta?->type === 'text_delta') {
                yield ['type' => 'token', 'payload' => ['text' => $event->delta->text]];
            }

            if ($event->type === 'message_stop') {
                yield ['type' => 'done', 'payload' => ['usage' => $event->usage ?? null]];
            }
        }
    }
}

Step 3: Browser consumer

Use the Fetch streaming API. EventSource only supports GET, which is wrong for chat; POST lets you send a message body cleanly.

 

// resources/js/chat.js
async function streamChat(messages, onToken) {
    const res = await fetch('/chat/stream', {
        method: 'POST',
        headers: {
            'Content-Type': 'application/json',
            'X-CSRF-TOKEN': window.csrfToken,
        },
        body: JSON.stringify({ messages }),
    });

    const reader = res.body.getReader();
    const decoder = new TextDecoder();
    let buffer = '';

    while (true) {
        const { value, done } = await reader.read();
        if (done) break;

        buffer += decoder.decode(value, { stream: true });

        let sep;
        while ((sep = buffer.indexOf('\n\n')) !== -1) {
            const chunk = buffer.slice(0, sep);
            buffer = buffer.slice(sep + 2);

            const event = parseSse(chunk);
            if (event.type === 'token') onToken(event.data.text);
        }
    }
}

function parseSse(chunk) {
    const lines = chunk.split('\n');
    let type = 'message', data = {};
    for (const line of lines) {
        if (line.startsWith('event:')) type = line.slice(6).trim();
        if (line.startsWith('data:')) data = JSON.parse(line.slice(5).trim());
    }
    return { type, data };
}
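The framing logic is easy to check in isolation. Here is a standalone sketch of the same parser (renamed `parseSseFrame` so it does not clash with the original), run against a frame shaped exactly like the controller's output:

```javascript
// Standalone copy of the frame parser above, exercised against a frame
// exactly as the Laravel controller emits it (event line + data line).
function parseSseFrame(chunk) {
    const lines = chunk.split('\n');
    let type = 'message', data = {};
    for (const line of lines) {
        if (line.startsWith('event:')) type = line.slice(6).trim();
        if (line.startsWith('data:')) data = JSON.parse(line.slice(5).trim());
    }
    return { type, data };
}

const parsed = parseSseFrame('event: token\ndata: {"text":"Hel"}');
// parsed.type → "token", parsed.data.text → "Hel"

// A keep-alive comment frame (": ping") matches neither prefix, so it
// falls through as a harmless default "message" with empty data.
const ping = parseSseFrame(': ping');
// ping.type → "message"
```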

Step 4: Reverb for shared conversations

Once you need the same stream to land on multiple clients (think: customer in browser + agent in admin panel), switch to broadcast events. A queued job runs the LLM call and emits each token over a private channel.

 

composer require laravel/reverb
php artisan reverb:install
php artisan reverb:start

 

// app/Events/AiTokenStreamed.php
namespace App\Events;

use Illuminate\Broadcasting\Channel;
use Illuminate\Broadcasting\InteractsWithSockets;
use Illuminate\Broadcasting\PrivateChannel;
use Illuminate\Contracts\Broadcasting\ShouldBroadcast;
use Illuminate\Queue\SerializesModels;

class AiTokenStreamed implements ShouldBroadcast
{
    use InteractsWithSockets, SerializesModels;

    public function __construct(
        public int $conversationId,
        public string $token,
        public string $eventType = 'token',
    ) {}

    public function broadcastOn(): Channel
    {
        return new PrivateChannel("conversations.{$this->conversationId}");
    }

    public function broadcastAs(): string
    {
        return $this->eventType;
    }
}

 

// app/Jobs/StreamAiReply.php
public function handle(ClaudeStreamer $streamer): void
{
    foreach ($streamer->stream($this->messages) as $event) {
        broadcast(new AiTokenStreamed(
            conversationId: $this->conversationId,
            token: $event['payload']['text'] ?? '',
            eventType: $event['type'],
        ));
    }
}
// routes/channels.php
Broadcast::channel('conversations.{id}', function (User $user, int $id) {
    return $user->canAccessConversation($id);
});

Step 5: Echo client for Reverb

// resources/js/echo.js
import Echo from 'laravel-echo';
import Pusher from 'pusher-js';

window.Pusher = Pusher;

window.Echo = new Echo({
    broadcaster: 'reverb',
    key: import.meta.env.VITE_REVERB_APP_KEY,
    wsHost: import.meta.env.VITE_REVERB_HOST,
    wsPort: import.meta.env.VITE_REVERB_PORT,
    forceTLS: false,
    enabledTransports: ['ws', 'wss'],
});

window.Echo.private(`conversations.${conversationId}`)
    .listen('.token', (e) => appendToken(e.token))
    .listen('.done', () => finalizeMessage());

Proxy & server configuration

Streaming gets sabotaged at the infra layer more often than at the application layer. Configure these before you debug your code:

Layer        Setting                                         Why
PHP-FPM      output_buffering=Off                            Without this, PHP collects the whole response before flushing
Nginx        proxy_buffering off; proxy_read_timeout 180s;   Default is 60s; LLMs routinely exceed that
Cloudflare   Disable "Auto Minify" on the chat path          Minification batches output and breaks SSE framing
CDN          Bypass cache for /chat/stream                   Streaming responses must never be cached
Browser      Cache-Control: no-cache, no-transform           Some intermediaries gzip and buffer otherwise

Reconnection & cancellation

Real chat sessions break. The user closes the lid, the train enters a tunnel, the worker restarts mid-stream. Build for it:

  • Persist partial replies. Save the assistant message to the DB when the stream starts, then update on each message_stop. Reload it if the user reconnects.
  • Server-side cancellation flag. Store a Redis key like cancel:{conversationId}; the streamer checks it between deltas and stops cleanly.
  • Client-side AbortController. When the user types a new message, abort the inflight fetch — otherwise tokens from the old reply will keep arriving.
  • Deduplicate by event ID. If you reconnect and the server replays from buffer, the client should skip events it has already rendered.
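Deduplication only works if events carry an identifier; the controller above does not emit one yet, so treat the `id` field here as an assumption you would add server-side (a monotonically increasing counter per reply). A minimal sketch:

```javascript
// Sketch: skip events the client has already rendered after a reconnect.
// Assumes each event payload carries a monotonically increasing `id`
// (the controller shown earlier would need to add one).
function makeDeduper() {
    let lastSeenId = -1;
    return function shouldRender(event) {
        if (event.id <= lastSeenId) return false; // replayed frame, skip it
        lastSeenId = event.id;
        return true;
    };
}

const shouldRender = makeDeduper();
shouldRender({ id: 0, text: 'Hel' }); // first delivery: render
shouldRender({ id: 1, text: 'lo' });  // render
shouldRender({ id: 1, text: 'lo' });  // replay after reconnect: skip
```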

 

// Client cancellation
let controller = null;
function send(messages) {
    controller?.abort();
    controller = new AbortController();
    fetch('/chat/stream', { signal: controller.signal, /* ... */ });
}
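The snippet above aborts the previous request but does not handle the rejection an aborted fetch produces. A slightly fuller sketch (same endpoint and payload shape as earlier; `isAbortError` is a helper introduced here, not part of the original code):

```javascript
let controllerRef = null;

// Distinguish a deliberate cancellation from a genuine network failure.
function isAbortError(err) {
    return Boolean(err) && err.name === 'AbortError';
}

async function sendSafe(messages) {
    controllerRef?.abort();                // cancel the previous reply, if any
    controllerRef = new AbortController();
    try {
        await fetch('/chat/stream', {
            method: 'POST',
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify({ messages }),
            signal: controllerRef.signal,
        });
    } catch (err) {
        if (!isAbortError(err)) throw err; // swallow only our own aborts
    }
}
```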

Production gotchas

  • Disable PHP-FPM output buffering. Set output_buffering=Off for the streaming worker pool.
  • Use a long timeout. A 30-second LLM call dies behind default proxy timeouts. Set proxy_read_timeout to at least 180 seconds.
  • Send keep-alive comments. If the model thinks for >15 seconds before the first token, push : ping\n\n every 10 seconds to keep proxies happy.
  • Persist on done, not on token. Buffer the assistant message in memory and write it to the DB once when the stream ends; per-token DB writes will melt your database.
  • Cancel inflight requests. When the user sends a new message before the previous one finishes, the client should abort the fetch and the job should detect a cancellation flag.
  • Rate limit by user. A user holding the same SSE response open for hours is one socket per session. A throttle middleware on /chat/stream protects you from runaway tab-loops.
  • Fan-out via Redis when scaling Reverb. Multiple Reverb processes need a shared backplane to broadcast across them.
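The persist-on-done rule from the list above reduces to a small accumulator: buffer token text in memory and invoke a single save callback when the done event arrives. `saveMessage` is a placeholder for your own persistence call:

```javascript
// Buffer tokens in memory; call the save callback exactly once, on `done`.
function makeMessageBuffer(saveMessage) {
    let text = '';
    return function onEvent(event) {
        if (event.type === 'token') text += event.data.text;
        if (event.type === 'done') saveMessage(text); // single write per reply
    };
}

const saved = [];
const onEvent = makeMessageBuffer((msg) => saved.push(msg));
onEvent({ type: 'token', data: { text: 'Hel' } });
onEvent({ type: 'token', data: { text: 'lo' } });
onEvent({ type: 'done', data: {} });
// saved now holds ['Hello'] — one DB write instead of three
```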

Streaming is not about speed. It is about turning waiting into reading. The total time barely changes; the perceived latency drops to zero.

Start with SSE. Move to Reverb only when you need fan-out. Both pipelines share the same generator service, so the migration is a single line of code.
