Published: May 20, 2026

Multimodal AI in Laravel: Receipt Extraction with Claude Vision

Text-only AI is yesterday's feature. The interesting workloads in 2026 (receipt parsing, ID verification, screenshot triage, dashboard captioning, design QA, medical-form digitization) all start with an image or a PDF. Multimodal models let you skip the OCR step entirely: pass the image, get structured data back. No Tesseract pipeline. No regex on noisy text. No two-stage system that hands an OCR string to a smaller LLM. With Claude you do it all in one HTTP call with a strict JSON schema.

 

This guide builds a real multimodal feature: an expense receipt extractor that takes a phone photo or scanned PDF and returns clean line items, totals, currency, tax, and a confidence score. We will use Claude vision, base64-encoded images, and a validated DTO so the rest of your app can trust the output.

 

Once the pipeline exists, the same pattern fits invoices, lab results, ID cards, screenshots of dashboards, and any other "extract structured data from a picture" job your product manager has been asking for.

Why multimodal beats classical OCR

| Aspect | Classical OCR pipeline | Multimodal LLM |
|---|---|---|
| Layout handling | Brittle; needs tuned templates per merchant | Generalizes across formats out of the box |
| Rotated / skewed images | Pre-process or fail | Tolerates tilt, glare, and crumples |
| Math reconciliation | Separate logic stage | Performed during extraction |
| Multilingual | One language model per language | Works across major scripts |
| Confidence | Per-character, hard to calibrate | Per-field, reasoning-aware |
| Setup cost | Days of pipeline tuning | A prompt and a JSON schema |

The tradeoff is cost-per-page. Classical OCR is essentially free at scale; vision LLM calls cost cents. For workloads under ~1M pages a year, the lower engineering cost of the multimodal route wins comfortably.

Real use cases for this pattern

  • Expense management: receipts to line items (this guide).
  • AP automation: invoices to PO matching.
  • Identity verification: ID document fields with confidence-routed manual review.
  • Healthcare intake: paper forms to FHIR-shaped JSON.
  • Logistics: bills of lading and delivery proofs.
  • Design QA: screenshot vs. mockup diff with structured callouts.
  • Dashboard captioning: turn a chart screenshot into a one-paragraph summary for an alert.

What we are building

User uploads receipt photo (jpg/png/pdf)
        ↓
Laravel resizes & encodes → Claude vision → structured JSON
        ↓
Validation → Expense model → review queue if confidence < 0.7

Step 1: Upload route with size and type guards

// app/Http/Controllers/ReceiptUploadController.php
namespace App\Http\Controllers;

use App\Jobs\ExtractReceiptDataJob;
use App\Models\Receipt;
use Illuminate\Http\Request;

class ReceiptUploadController extends Controller
{
    public function __invoke(Request $request)
    {
        $data = $request->validate([
            'file' => ['required', 'file', 'max:10240', 'mimes:jpg,jpeg,png,webp,pdf'],
        ]);

        $path = $data['file']->store('receipts', 'private');

        $receipt = Receipt::create([
            'user_id'  => $request->user()->id,
            'path'     => $path,
            'mime'     => $data['file']->getMimeType(),
            'status'   => 'queued',
        ]);

        ExtractReceiptDataJob::dispatch($receipt->id);

        return response()->json(['receipt_id' => $receipt->id]);
    }
}

Step 2: Image preprocessing

Phone photos are huge. Resize before sending: anything beyond 1600px on the long edge is wasted tokens. Use intervention/image for a Laravel-friendly API.

 

// app/Services/Vision/ImagePreparer.php
namespace App\Services\Vision;

use Intervention\Image\ImageManager;
use Intervention\Image\Drivers\Gd\Driver;

class ImagePreparer
{
    public function prepare(string $absolutePath, string $mime): array
    {
        if ($mime === 'application/pdf') {
            return [
                'media_type' => 'application/pdf',
                'data'       => base64_encode(file_get_contents($absolutePath)),
            ];
        }

        $image = (new ImageManager(new Driver()))
            ->read($absolutePath)
            ->scaleDown(width: 1600, height: 1600);

        return [
            'media_type' => 'image/jpeg',
            'data'       => base64_encode((string) $image->toJpeg(quality: 85)),
        ];
    }
}

Step 3: The vision call

Claude accepts a content array that mixes images and text. The trick is giving it a strict JSON schema in the prompt and asking for a confidence score (the schema here uses one overall score). The model is surprisingly good at telling you when it is guessing.

 

// app/Services/Vision/ReceiptExtractor.php
namespace App\Services\Vision;

use Anthropic\Anthropic;

class ReceiptExtractor
{
    public function __construct(
        private Anthropic $client,
        private ImagePreparer $preparer,
    ) {}

    public function extract(string $absolutePath, string $mime): array
    {
        $media = $this->preparer->prepare($absolutePath, $mime);

        $response = $this->client->messages->create([
            'model'      => 'claude-sonnet-4-6',
            'max_tokens' => 1500,
            'system'     => 'You extract structured data from receipts. Return strict JSON. Never invent values; use null when unsure.',
            'messages'   => [[
                'role' => 'user',
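                'content' => [
                    [
                        // PDFs must be sent as a "document" block; photos and scans use "image".
                        'type'   => $media['media_type'] === 'application/pdf' ? 'document' : 'image',
                        'source' => [
                            'type'       => 'base64',
                            'media_type' => $media['media_type'],
                            'data'       => $media['data'],
                        ],
                    ],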
                    [
                        'type' => 'text',
                        'text' => $this->prompt(),
                    ],
                ],
            ]],
        ]);

        $text = $response->content[0]['text'] ?? '{}';
        return json_decode($text, true) ?: [];
    }

    private function prompt(): string
    {
        return <<<TXT
        Extract this receipt as JSON.

        Schema:
        {
          "merchant": "string|null",
          "purchased_at": "YYYY-MM-DD|null",
          "currency": "ISO 4217|null",
          "subtotal": "number|null",
          "tax": "number|null",
          "total": "number|null",
          "items": [
            { "description": "string", "quantity": "number", "unit_price": "number", "total": "number" }
          ],
          "confidence": "number between 0 and 1"
        }

        Rules:
        - Numbers must be plain decimals, no currency symbol.
        - Use null when a field is unreadable, never invent values.
        - confidence reflects how clearly the receipt could be read.
        TXT;
    }
}
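
The extractor above falls back to an empty array when json_decode fails, which hides the reason for the failure. Models also sometimes wrap JSON in a Markdown fence or add a sentence around it. The following is a minimal sketch of a stricter decoder, assuming you would rather throw and let the job record the error. The function name decodeModelJson is hypothetical and not part of any SDK.

```php
<?php

/**
 * Hypothetical helper: pull a JSON object out of model text that may be
 * wrapped in a Markdown code fence or surrounded by commentary.
 *
 * @throws InvalidArgumentException when no JSON object is present
 * @throws JsonException            when the JSON is malformed
 */
function decodeModelJson(string $text): array
{
    // Keep everything from the first "{" to the last "}".
    $start = strpos($text, '{');
    $end   = strrpos($text, '}');

    if ($start === false || $end === false || $end < $start) {
        throw new InvalidArgumentException('No JSON object found in model output.');
    }

    $json = substr($text, $start, $end - $start + 1);

    // JSON_THROW_ON_ERROR raises JsonException instead of returning null.
    $decoded = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

    if (!is_array($decoded)) {
        throw new InvalidArgumentException('Model output was not a JSON object.');
    }

    return $decoded;
}
```

In extract(), the last line could then become return decodeModelJson($text); so a malformed response surfaces as an exception the job already catches, instead of as a confusing validation error.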

Step 4: Validate and persist

// app/Services/Vision/ReceiptValidator.php
namespace App\Services\Vision;

use Illuminate\Support\Facades\Validator;
use Illuminate\Validation\ValidationException;

class ReceiptValidator
{
    public function validate(array $data): array
    {
        $validator = Validator::make($data, [
            'merchant'        => ['nullable', 'string', 'max:120'],
            'purchased_at'    => ['nullable', 'date_format:Y-m-d'],
            'currency'        => ['nullable', 'string', 'size:3'],
            'subtotal'        => ['nullable', 'numeric', 'min:0'],
            'tax'             => ['nullable', 'numeric', 'min:0'],
            'total'           => ['nullable', 'numeric', 'min:0'],
            'items'           => ['array', 'max:100'],
            'items.*.description' => ['required', 'string', 'max:200'],
            'items.*.quantity'    => ['required', 'numeric', 'min:0'],
            'items.*.unit_price'  => ['required', 'numeric', 'min:0'],
            'items.*.total'       => ['required', 'numeric', 'min:0'],
            'confidence'      => ['required', 'numeric', 'between:0,1'],
        ]);

        if ($validator->fails()) {
            throw ValidationException::withMessages($validator->errors()->toArray());
        }

        return $validator->validated();
    }
}

 

// app/Jobs/ExtractReceiptDataJob.php
public function handle(
    ReceiptExtractor $extractor,
    ReceiptValidator $validator,
): void {
    $receipt = Receipt::findOrFail($this->receiptId);
    $absolute = Storage::disk('private')->path($receipt->path);

    try {
        $raw = $extractor->extract($absolute, $receipt->mime);
        $data = $validator->validate($raw);

        $receipt->update([
            'merchant'     => $data['merchant'],
            'purchased_at' => $data['purchased_at'],
            'currency'     => $data['currency'],
            'subtotal'     => $data['subtotal'],
            'tax'          => $data['tax'],
            'total'        => $data['total'],
            'items'        => $data['items'],
            'confidence'   => $data['confidence'],
            'status'       => $data['confidence'] >= 0.7 ? 'extracted' : 'needs_review',
        ]);
    } catch (\Throwable $e) {
        $receipt->update(['status' => 'failed', 'error' => $e->getMessage()]);
    }
}

Step 5: Cross-check totals

The model can read all the line items correctly and still return a total that does not match. A simple arithmetic check catches more errors than a stronger prompt.

 

$lineSum = collect($data['items'])->sum('total');
$claimed = $data['subtotal'] ?? $data['total'];

if ($claimed && abs($lineSum - $claimed) > max(0.05, $claimed * 0.01)) {
    $receipt->update(['status' => 'needs_review', 'confidence' => min($data['confidence'], 0.5)]);
}
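
One caveat with the check above: when subtotal is missing it compares pre-tax line totals against a total that may include tax. Here is a hedged sketch of a more forgiving variant, written as a pure function so it can be unit tested. It accepts the extraction if the line items match any of subtotal, total, or total minus tax. The names totalsReconcile and toCents are hypothetical, and the cent-based arithmetic assumes a two-decimal currency.

```php
<?php

/** Hypothetical helper: convert a decimal amount to integer cents (two-decimal currencies only). */
function toCents(int|float|string $amount): int
{
    return (int) round(((float) $amount) * 100);
}

/**
 * Hypothetical helper: true when the line items agree with at least one
 * claimed amount (subtotal, total, or total minus tax), within a tolerance
 * of 5 cents or 1% of the claimed amount, whichever is larger.
 */
function totalsReconcile(array $data): bool
{
    // Sum in integer cents to avoid floating-point drift.
    $lineCents = 0;
    foreach ($data['items'] ?? [] as $item) {
        $lineCents += toCents($item['total'] ?? 0);
    }

    $candidates = [];
    if (isset($data['subtotal'])) {
        $candidates[] = toCents($data['subtotal']);
    }
    if (isset($data['total'])) {
        $candidates[] = toCents($data['total']); // tax-inclusive line prices
        if (isset($data['tax'])) {
            $candidates[] = toCents($data['total']) - toCents($data['tax']);
        }
    }

    if ($candidates === []) {
        return true; // nothing was claimed, so there is nothing to contradict
    }

    foreach ($candidates as $claimed) {
        $tolerance = max(5, (int) round(abs($claimed) * 0.01));
        if (abs($lineCents - $claimed) <= $tolerance) {
            return true;
        }
    }

    return false;
}
```

A job could call this after validation and force needs_review whenever it returns false, which keeps the arithmetic out of the Eloquent update.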

Step 6: Show progress while it runs

Vision calls take 4 to 8 seconds. Use the same Reverb broadcast pattern from streaming chat: dispatch a status event when the job starts, completes, or fails.

 

broadcast(new ReceiptStatusChanged(
    receiptId: $receipt->id,
    status: $receipt->status,
));

Confidence-based routing

The single most important design choice in any vision pipeline is what you do with the confidence score. Auto-approving everything is reckless; queueing every receipt for human review defeats the point. Bucket the output:

| Confidence | Action | SLA |
|---|---|---|
| ≥ 0.95 | Auto-approve, mark as AI-extracted | Instant |
| 0.70 – 0.94 | Show pre-filled form for one-click confirm | User-facing |
| < 0.70 | Send to ops review queue | Same-day |
| Math mismatch | Force review even at high confidence | Same-day |

The "math mismatch" row is the cheap superpower. If the line items do not sum to the claimed total, the extraction is suspect regardless of what the model says about its confidence, so flag it.
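
The buckets above can be expressed as a small pure function. This is a sketch only: the function name routeReceipt and the returned route labels are hypothetical, and the thresholds are the ones from the routing table.

```php
<?php

/**
 * Hypothetical helper: map a confidence score and the arithmetic check
 * to one of three routes.
 */
function routeReceipt(float $confidence, bool $mathMismatch): string
{
    // A failed arithmetic check overrides any confidence score.
    if ($mathMismatch) {
        return 'ops_review';
    }

    if ($confidence >= 0.95) {
        return 'auto_approve';
    }

    if ($confidence >= 0.70) {
        return 'user_confirm';
    }

    return 'ops_review';
}
```

Keeping the thresholds in one function makes them easy to tune later, for example after measuring how often auto-approved receipts get corrected by users.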

Image preprocessing: what actually helps

  • Resize to ≤1600px on the long edge. Anything bigger costs more tokens with no quality benefit.
  • Auto-orient using EXIF. Phone photos often arrive sideways; rotate before sending.
  • Convert HEIC to JPEG. Apple's default format is not universally supported; normalize at the boundary.
  • Avoid heavy contrast/threshold filters. They help classical OCR; they confuse vision LLMs by destroying gradient information.
  • Strip GPS metadata. Receipts often have location EXIF you do not want stored; clear it before persisting.
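
For reference, the arithmetic behind the 1600px rule is small enough to sketch. intervention/image's scaleDown already does this for you, so the function below (hypothetical name fitWithinLongEdge) is only an illustration: it is handy if you want to log or test the target size without loading the image.

```php
<?php

/**
 * Hypothetical helper: compute output dimensions so the long edge is at
 * most $maxEdge, preserving aspect ratio and never upscaling.
 *
 * @return array{0:int,1:int} [width, height]
 */
function fitWithinLongEdge(int $width, int $height, int $maxEdge = 1600): array
{
    if ($width <= 0 || $height <= 0 || $maxEdge <= 0) {
        throw new InvalidArgumentException('Dimensions must be positive.');
    }

    $longEdge = max($width, $height);

    // Small images pass through untouched.
    if ($longEdge <= $maxEdge) {
        return [$width, $height];
    }

    $scale = $maxEdge / $longEdge;

    return [
        max(1, (int) round($width * $scale)),
        max(1, (int) round($height * $scale)),
    ];
}
```

A typical 4032×3024 phone photo comes out at 1600×1200, which is roughly a sixth of the original pixel count.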

Production tips

  • Always resize. A 4MB receipt photo costs more than a 200KB one and reads no better.
  • Trust confidence, then verify. Below 0.7, route to a human. Above 0.95, auto-approve. The middle is where review queues earn their keep.
  • Strip PII when caching. Receipts often have card numbers and names. Hash the image bytes, not the text content, and avoid logging the raw extraction unless redacted.
  • Test with bad inputs. Crumpled receipts, glare, low light, screenshots of screenshots, fingers in the frame. The failure modes are the feature.
  • PDF first page only. Most receipt PDFs are one page; passing 20 pages of a billing PDF wastes tokens. Detect and split when the file actually contains multiple receipts.
  • Localize the prompt. If users upload non-English receipts, include a language hint in the prompt; accuracy on dates, currency, and tax labels improves.
  • Cap retries. Vision calls fail occasionally. Retry once, then mark the receipt failed with a clear error so the user can re-upload. Never silently loop.
  • Log a redacted thumbnail. When debugging a wrong extraction, you need to see roughly what the model saw without storing the original sensitive image at full resolution.
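
Two of the tips above, hashing the image bytes for cache keys and redacting before logging, can be sketched in plain PHP. Both function names are hypothetical. The card-number pattern is a deliberately broad assumption: it masks any run of 13 to 19 digits, which may over-mask other long numbers, and that is usually the safer failure mode for logs.

```php
<?php

/**
 * Hypothetical helper: cache key derived from the file bytes, so the same
 * photo uploaded twice maps to the same key without storing any text.
 */
function receiptCacheKey(string $absolutePath): string
{
    $hash = hash_file('sha256', $absolutePath);

    if ($hash === false) {
        throw new RuntimeException("Cannot read {$absolutePath}");
    }

    return 'receipt:' . $hash;
}

/**
 * Hypothetical helper: mask anything that looks like a payment card number
 * (13 to 19 digits, optionally separated by spaces or hyphens), keeping
 * only the last four digits.
 */
function redactCardNumbers(string $text): string
{
    $masked = preg_replace_callback(
        '/\b\d(?:[ -]?\d){12,18}\b/',
        static function (array $match): string {
            $digits = preg_replace('/\D/', '', $match[0]);

            return '[card ****' . substr($digits, -4) . ']';
        },
        $text
    );

    // preg_replace_callback returns null on a regex engine error.
    return $masked ?? $text;
}
```

Run redactCardNumbers over any raw model output before it reaches a log line, and use receiptCacheKey when deduplicating uploads.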

Multimodal AI is the cheapest data-entry team you will ever hire, but only if you keep humans in the loop for the cases the model is unsure about.

The same pipeline applies to invoices, IDs, lab reports, dashboard screenshots, and design mocks. Swap the prompt, swap the schema, keep the validation. That is the whole pattern.
