Multimodal AI with Claude Vision in Laravel (2026)

Text-only AI is yesterday's feature. The interesting workloads in 2026 (receipt parsing, ID verification, screenshot triage, dashboard captioning, design QA, medical-form digitization) all start with an image or a PDF. Multimodal models let you skip the separate OCR step: pass the image, get structured data back. No Tesseract pipeline. No regex on noisy text. No two-stage system that hands an OCR string to a smaller LLM. With Claude you do it all in one HTTP call guided by a strict JSON schema.

This guide builds a real multimodal feature: an expense receipt extractor that takes a phone photo or scanned PDF and returns clean line items, totals, currency, tax, and a confidence score. We will use Claude vision, base64-encoded images, and a validated data array so the rest of your app can trust the output.

Once the pipeline exists, the same pattern fits invoices, lab results, ID cards, screenshots of dashboards, and any other "extract structured data from a picture" job your product manager has been asking for.

Why multimodal beats classical OCR

Aspect	Classical OCR pipeline	Multimodal LLM
Layout handling	Brittle; needs tuned templates per merchant	Generalizes across formats out of the box
Rotated / skewed images	Pre-process or fail	Tolerates tilt, glare, and crumples
Math reconciliation	Separate logic stage	Performed during extraction
Multilingual	One language model per language	Works across major scripts
Confidence	Per-character, hard to calibrate	Per-field, reasoning-aware
Setup cost	Days of pipeline tuning	A prompt and a JSON schema

The tradeoff is cost-per-page. Classical OCR is essentially free at scale; vision LLM calls cost cents. For workloads under ~1M pages a year, the lower engineering cost of the multimodal route wins comfortably.

Real use cases for this pattern

Expense management: receipts to line items (this guide).
AP automation: invoices to PO matching.
Identity verification: ID document fields with confidence-routed manual review.
Healthcare intake: paper forms to FHIR-shaped JSON.
Logistics: bills of lading and delivery proofs.
Design QA: screenshot vs. mockup diff with structured callouts.
Dashboard captioning: turn a chart screenshot into a one-paragraph summary for an alert.

What we are building

User uploads receipt photo (jpg/png/pdf)
        ↓
Laravel resizes & encodes → Claude vision → structured JSON
        ↓
Validation → Expense model → review queue if confidence < 0.7

Step 1: Upload route with size and type guards

// app/Http/Controllers/ReceiptUploadController.php
namespace App\Http\Controllers;

use App\Jobs\ExtractReceiptDataJob;
use App\Models\Receipt;
use Illuminate\Http\Request;

class ReceiptUploadController extends Controller
{
    public function __invoke(Request $request)
    {
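        $data = $request->validate([
            // max is in kilobytes, so 10240 caps uploads at 10 MB.
            'file' => ['required', 'file', 'max:10240', 'mimes:jpg,jpeg,png,webp,pdf'],
        ]);

        // Assumes a non-public "private" disk is defined in config/filesystems.php.
        $path = $data['file']->store('receipts', 'private');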

        $receipt = Receipt::create([
            'user_id'  => $request->user()->id,
            'path'     => $path,
            'mime'     => $data['file']->getMimeType(),
            'status'   => 'queued',
        ]);

        ExtractReceiptDataJob::dispatch($receipt->id);

        return response()->json(['receipt_id' => $receipt->id]);
    }
}

Step 2: Image preprocessing

Phone photos are huge. Resize before sending; anything beyond 1600px on the long edge is wasted tokens. Use intervention/image for a Laravel-friendly API (the code below uses its v3 API).

// app/Services/Vision/ImagePreparer.php
namespace App\Services\Vision;

use Intervention\Image\ImageManager;
use Intervention\Image\Drivers\Gd\Driver;

class ImagePreparer
{
    public function prepare(string $absolutePath, string $mime): array
    {
        if ($mime === 'application/pdf') {
            return [
                'media_type' => 'application/pdf',
                'data'       => base64_encode(file_get_contents($absolutePath)),
            ];
        }

        $image = (new ImageManager(new Driver()))
            ->read($absolutePath)
            ->scaleDown(width: 1600, height: 1600);

        return [
            'media_type' => 'image/jpeg',
            'data'       => base64_encode((string) $image->toJpeg(quality: 85)),
        ];
    }
}
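
To make the resize rule concrete, here is a minimal sketch of the arithmetic behind a "scale down to fit" operation: keep the aspect ratio, cap the long edge, and never upscale. The function name fitWithin is hypothetical and is not part of intervention/image; it only illustrates the dimensions you should expect to come out of the preparer.

```php
<?php

declare(strict_types=1);

/**
 * Compute the dimensions an image would have after being scaled down to fit
 * inside a $maxEdge x $maxEdge box. Aspect ratio is preserved and images that
 * already fit are returned unchanged (never upscaled).
 *
 * Hypothetical helper for illustration; not part of intervention/image.
 *
 * @return array{0: int, 1: int} [width, height]
 */
function fitWithin(int $width, int $height, int $maxEdge = 1600): array
{
    $longEdge = max($width, $height);

    // Small images are left alone: upscaling adds bytes, not information.
    if ($longEdge <= $maxEdge) {
        return [$width, $height];
    }

    $ratio = $maxEdge / $longEdge;

    return [
        max(1, (int) round($width * $ratio)),
        max(1, (int) round($height * $ratio)),
    ];
}

// A 4000x3000 phone photo is reduced so that its long edge is 1600px.
[$w, $h] = fitWithin(4000, 3000);
printf("%dx%d\n", $w, $h);
```

A helper like this is mostly useful in tests and logs, for example to assert that nothing larger than the cap ever leaves the preparer.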

Step 3: The vision call

Claude accepts a content array that mixes images and text. The trick is giving it a strict JSON schema in the prompt and asking for a confidence score (one overall score in the schema below). The model is surprisingly good at telling you when it is guessing.

// app/Services/Vision/ReceiptExtractor.php
namespace App\Services\Vision;

use Anthropic\Anthropic;

class ReceiptExtractor
{
    public function __construct(
        private Anthropic $client,
        private ImagePreparer $preparer,
    ) {}

    public function extract(string $absolutePath, string $mime): array
    {
        $media = $this->preparer->prepare($absolutePath, $mime);

        $response = $this->client->messages->create([
            'model'      => 'claude-sonnet-4-6',
            'max_tokens' => 1500,
            'system'     => 'You extract structured data from receipts. Return strict JSON. Never invent values; use null when unsure.',
            'messages'   => [[
                'role' => 'user',
                'content' => [
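                    [
                        // PDFs must be sent as a "document" block; images as "image".
                        'type'   => $media['media_type'] === 'application/pdf' ? 'document' : 'image',
                        'source' => [
                            'type'       => 'base64',
                            'media_type' => $media['media_type'],
                            'data'       => $media['data'],
                        ],
                    ],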
                    [
                        'type' => 'text',
                        'text' => $this->prompt(),
                    ],
                ],
            ]],
        ]);

        $text = $response->content[0]['text'] ?? '{}';
        return json_decode($text, true) ?: [];
    }

    private function prompt(): string
    {
        return <<<TXT
        Extract this receipt as JSON.

        Schema:
        {
          "merchant": "string|null",
          "purchased_at": "YYYY-MM-DD|null",
          "currency": "ISO 4217|null",
          "subtotal": "number|null",
          "tax": "number|null",
          "total": "number|null",
          "items": [
            { "description": "string", "quantity": "number", "unit_price": "number", "total": "number" }
          ],
          "confidence": "number between 0 and 1"
        }

        Rules:
        - Numbers must be plain decimals, no currency symbol.
        - Use null when a field is unreadable, never invent values.
        - confidence reflects how clearly the receipt could be read.
        TXT;
    }
}
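
Even with a strict prompt, a reply may occasionally arrive wrapped in a Markdown code fence or preceded by a sentence of prose, and a bare json_decode() then fails. The sketch below is one defensive way to handle that, under the assumption that the reply contains at most one top-level JSON object. The name decodeModelJson and its null-on-failure contract are hypothetical, not part of any SDK.

```php
<?php

declare(strict_types=1);

/**
 * Pull the outermost JSON object out of a model reply and decode it.
 * Returns null when no valid JSON object can be recovered, so the caller
 * can tell "unparseable reply" apart from "empty receipt".
 *
 * Hypothetical helper for illustration.
 */
function decodeModelJson(string $text): ?array
{
    // Take everything from the first "{" to the last "}" and ignore
    // whatever surrounds it (code fences, leading prose, trailing notes).
    $start = strpos($text, '{');
    $end   = strrpos($text, '}');

    if ($start === false || $end === false || $end < $start) {
        return null;
    }

    try {
        $decoded = json_decode(
            substr($text, $start, $end - $start + 1),
            true,
            512,
            JSON_THROW_ON_ERROR
        );
    } catch (JsonException) {
        return null;
    }

    return is_array($decoded) ? $decoded : null;
}

// Leading prose is ignored; only the JSON object is decoded.
$reply = "Here is the receipt:\n{\"merchant\": \"Cafe Uno\", \"total\": 12.5}";
var_dump(decodeModelJson($reply));
```

Returning null rather than an empty array is a design choice: the job can mark the receipt as failed with a clear "unparseable reply" error instead of surfacing a less helpful validation message.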

Step 4: Validate and persist

// app/Services/Vision/ReceiptValidator.php
namespace App\Services\Vision;

use Illuminate\Support\Facades\Validator;
use Illuminate\Validation\ValidationException;

class ReceiptValidator
{
    public function validate(array $data): array
    {
        $validator = Validator::make($data, [
            'merchant'        => ['nullable', 'string', 'max:120'],
            'purchased_at'    => ['nullable', 'date_format:Y-m-d'],
            'currency'        => ['nullable', 'string', 'size:3'],
            'subtotal'        => ['nullable', 'numeric', 'min:0'],
            'tax'             => ['nullable', 'numeric', 'min:0'],
            'total'           => ['nullable', 'numeric', 'min:0'],
            // "present" guarantees the key exists in validated(), even as an empty array.
            'items'           => ['present', 'array', 'max:100'],
            'items.*.description' => ['required', 'string', 'max:200'],
            'items.*.quantity'    => ['required', 'numeric', 'min:0'],
            'items.*.unit_price'  => ['required', 'numeric', 'min:0'],
            'items.*.total'       => ['required', 'numeric', 'min:0'],
            'confidence'      => ['required', 'numeric', 'between:0,1'],
        ]);

        if ($validator->fails()) {
            throw ValidationException::withMessages($validator->errors()->toArray());
        }

        return $validator->validated();
    }
}

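// app/Jobs/ExtractReceiptDataJob.php
namespace App\Jobs;

use App\Models\Receipt;
use App\Services\Vision\ReceiptExtractor;
use App\Services\Vision\ReceiptValidator;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Facades\Storage;

class ExtractReceiptDataJob implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public function __construct(private int $receiptId) {}

    public function handle(ReceiptExtractor $extractor, ReceiptValidator $validator): void
    {
        $receipt = Receipt::findOrFail($this->receiptId);
        // path() only works for local disks; download to a temp file for remote storage.
        $absolute = Storage::disk('private')->path($receipt->path);

        try {
            $raw = $extractor->extract($absolute, $receipt->mime);
            $data = $validator->validate($raw);

            $receipt->update([
                'merchant'     => $data['merchant'],
                'purchased_at' => $data['purchased_at'],
                'currency'     => $data['currency'],
                'subtotal'     => $data['subtotal'],
                'tax'          => $data['tax'],
                'total'        => $data['total'],
                'items'        => $data['items'],
                'confidence'   => $data['confidence'],
                'status'       => $data['confidence'] >= 0.7 ? 'extracted' : 'needs_review',
            ]);
        } catch (\Throwable $e) {
            // Recorded instead of rethrown, so the queue will not retry this job on its own.
            $receipt->update(['status' => 'failed', 'error' => $e->getMessage()]);
        }
    }
}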

Step 5: Cross-check totals

The model can read all the line items correctly and still return a total that does not match. A simple arithmetic check catches more errors than a stronger prompt.

$lineSum = collect($data['items'])->sum('total');
$claimed = $data['subtotal'] ?? $data['total'];

if ($claimed && abs($lineSum - $claimed) > max(0.05, $claimed * 0.01)) {
    $receipt->update(['status' => 'needs_review', 'confidence' => min($data['confidence'], 0.5)]);
}
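
The same check can be pulled into a small pure function so it is easy to unit test away from the job. This is a sketch under the same tolerance rule as the snippet above (5 cents or 1% of the claimed amount, whichever is larger). The name hasMathMismatch is hypothetical, and treating a missing claimed amount as "no mismatch" is an assumption you may want to tighten.

```php
<?php

declare(strict_types=1);

/**
 * Return true when the line items do not add up to the claimed amount.
 * Tolerance: 5 cents or 1% of the claimed amount, whichever is larger.
 *
 * Hypothetical helper for illustration.
 *
 * @param array<int, array{total: int|float}> $items
 */
function hasMathMismatch(array $items, ?float $claimed): bool
{
    if ($claimed === null) {
        return false; // Nothing to compare against.
    }

    $lineSum   = array_sum(array_column($items, 'total'));
    $tolerance = max(0.05, abs($claimed) * 0.01);

    return abs($lineSum - $claimed) > $tolerance;
}

$items = [
    ['description' => 'Espresso', 'total' => 4.50],
    ['description' => 'Sandwich', 'total' => 5.50],
];

// 10.00 in line items against a claimed 10.50 is outside the tolerance.
var_dump(hasMathMismatch($items, 10.50));
```

One caveat worth keeping in mind: when the receipt has no subtotal and the comparison falls back to the grand total, tax-exclusive line items will legitimately differ from it, so expect some false positives in the review queue from that case.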

Step 6: Show progress while it runs

Vision calls often take 4 to 8 seconds. Use the same Reverb broadcast pattern from streaming chat: dispatch a status event when the job starts, completes, or fails.

broadcast(new ReceiptStatusChanged(
    receiptId: $receipt->id,
    status: $receipt->status,
));

Confidence-based routing

The single most important design choice in any vision pipeline is what you do with the confidence score. Auto-approving everything is reckless; queueing every receipt for human review defeats the point. Bucket the output:

Confidence	Action	SLA
≥ 0.95	Auto-approve, mark as AI-extracted	Instant
0.70 – 0.94	Show pre-filled form for one-click confirm	User-facing
< 0.70	Send to ops review queue	Same-day
Math mismatch	Force review even at high confidence	Same-day

The "math mismatch" row is the cheap superpower. If the line items do not sum to the claimed total, the extraction is suspect regardless of what the model says about confidence; flag it.
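
The routing table translates directly into a small function. The sketch below is one way to write it; the function name routeReceipt and the three status strings are hypothetical, and the thresholds are the ones from the table.

```php
<?php

declare(strict_types=1);

/**
 * Map a confidence score and a math-check result to a routing decision.
 * Status names are illustrative; thresholds follow the table above.
 */
function routeReceipt(float $confidence, bool $mathMismatch): string
{
    // A math mismatch forces review even at high confidence.
    if ($mathMismatch || $confidence < 0.70) {
        return 'ops_review';
    }

    return $confidence >= 0.95 ? 'auto_approve' : 'confirm';
}

foreach ([[0.98, false], [0.80, false], [0.98, true], [0.40, false]] as [$c, $m]) {
    printf("%.2f mismatch=%s => %s\n", $c, $m ? 'yes' : 'no', routeReceipt($c, $m));
}
```

Checking the mismatch flag first keeps the override explicit, so a later change to the thresholds cannot accidentally let a receipt with inconsistent totals slip into auto-approval.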

Image preprocessing: what actually helps

Resize to ≤1600px on the long edge. Anything bigger costs more tokens with no quality benefit.
Auto-orient using EXIF. Phone photos often arrive sideways; rotate before sending.
Convert HEIC to JPEG. Apple's default format is not universally supported; normalize at the boundary.
Avoid heavy contrast/threshold filters. They help classical OCR; they confuse vision LLMs by destroying gradient information.
Strip GPS metadata. Phone photos of receipts often carry location EXIF you do not want stored; clear it before persisting.

Production tips

Always resize. A 4MB receipt photo costs more than a 200KB one and reads no better.
Trust confidence, then verify. Below 0.7, route to a human. Above 0.95, auto-approve. The middle is where review queues earn their keep.
Strip PII when caching. Receipts often have card numbers and names. Hash the image bytes, not the text content, and avoid logging the raw extraction unless redacted.
Test with bad inputs. Crumpled receipts, glare, low light, screenshots of screenshots, fingers in the frame. The failure modes are the feature.
PDF first page only. Most receipt PDFs are one page; passing 20 pages of a billing PDF wastes tokens. Detect and split when the file actually contains multiple receipts.
Localize the prompt. If users upload non-English receipts, include a language hint in the prompt; accuracy on dates, currency, and tax labels improves.
Cap retries. Vision calls fail occasionally. Retry once, then mark failed with a clear error so the user can re-upload; never silently loop.
Log a redacted thumbnail. When debugging a wrong extraction, you need to see roughly what the model saw without storing the original sensitive image at full resolution.
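
As an illustration of the "hash the image bytes" tip, the sketch below derives a cache key from the file contents alone, so a re-uploaded identical file can reuse an earlier extraction without any receipt text ending up in the key. The function name receiptCacheKey and the key prefix are hypothetical.

```php
<?php

declare(strict_types=1);

/**
 * Build a cache key from the raw file bytes (SHA-256), never from the
 * extracted text, so the key itself carries no PII.
 *
 * Hypothetical helper for illustration.
 */
function receiptCacheKey(string $absolutePath): string
{
    if (!is_readable($absolutePath)) {
        throw new RuntimeException("Cannot read file: {$absolutePath}");
    }

    $hash = hash_file('sha256', $absolutePath);

    if ($hash === false) {
        throw new RuntimeException("Cannot hash file: {$absolutePath}");
    }

    return 'receipt-extraction:' . $hash;
}

// Two uploads with identical bytes produce the same key.
$tmp = tempnam(sys_get_temp_dir(), 'rcpt');
file_put_contents($tmp, 'fake image bytes');
echo receiptCacheKey($tmp), "\n";
unlink($tmp);
```

Hash the original upload rather than the resized copy if you want byte-identical re-uploads to match regardless of later changes to the preprocessing settings.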

Multimodal AI is the cheapest data-entry team you will ever hire, but only if you keep humans in the loop for the cases the model is unsure about.

The same pipeline applies to invoices, IDs, lab reports, dashboard screenshots, and design mocks. Swap the prompt, swap the schema, keep the validation. That is the whole pattern.
