[Cover image: a local AI server processing invoice PDFs without sending data to the cloud]

On-Premise Invoice Processing with Ollama and LLaMA

Stanislav Kapustin · May 1, 2026 · case study · automation · n8n · ollama · llama · on-premise · invoice processing

Case summary

Quick scan before the full breakdown.

Goal

Extract invoice data automatically while satisfying a strict no-cloud data policy.

Stack

n8n, Ollama, LLaMA 3.2 Vision, Gotenberg, ntfy, Linux

Result

89% of invoices processed automatically, with zero data leakage incidents and a passed security audit.

Time saved

Reduced outsourced data entry contractor hours by approximately 80%.

A fully local invoice processing pipeline built on n8n, Ollama, and an open-source OCR stack — extracting structured data from supplier invoices using a self-hosted LLaMA model, with no document ever leaving the client’s own server.


The Problem

A Latvian manufacturing company processed 200–250 supplier invoices per month. Their security policy was explicit: no invoice data — pricing, volumes, or supplier identities — could be sent to any external API or cloud service. That ruled out OpenAI, Claude, Google Vision, and every other hosted AI service.

At the same time, they wanted the same outcome as any other invoice automation project: extract vendor name, invoice number, date, line items, amounts, and VAT from PDF attachments and push the structured data into their ERP via webhook.

The constraint was simple and non-negotiable: everything had to run on their own hardware.


What I Built

The full stack runs on a single on-premise Linux server (Ubuntu 22.04, 32GB RAM, an NVIDIA RTX 3080 with 10GB VRAM). No external API calls, no cloud dependencies.

Component 1 — Ollama with LLaMA 3.2 Vision

Ollama is a local LLM runtime that handles model management, serving, and hardware acceleration. I installed it on the server and pulled llama3.2-vision:11b — an 11-billion parameter multimodal model that can process both text and images, runs comfortably on the available VRAM, and produces structured JSON output reliably when prompted correctly.

Ollama exposes a local REST API at http://localhost:11434, which n8n calls directly via the HTTP Request node. No internet connection required after the initial model download.
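
A minimal sketch of that same request in Python (n8n makes it through the HTTP Request node). The file path and the short prompt here are placeholders — the full extraction prompt appears in Component 4:

import base64
import json
import requests

# Read the converted invoice page and base64-encode it for the API
with open("invoice_page1.png", "rb") as f:
    png_b64 = base64.b64encode(f.read()).decode("ascii")

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.2-vision:11b",
        "prompt": "<image>\nExtract the invoice fields as JSON.",  # placeholder; see Component 4
        "images": [png_b64],   # image input rides alongside the prompt
        "format": "json",      # constrain the output to valid JSON
        "stream": False,       # one complete response instead of a token stream
    },
    timeout=120,
)
extracted = json.loads(resp.json()["response"])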

Component 2 — PDF to Image Conversion

LLaMA Vision expects image input, not raw PDFs. I deployed Gotenberg (a Docker-based PDF processing server) on the same machine. The workflow sends each invoice PDF to Gotenberg's conversion endpoint and receives a PNG of the first page. Multi-page invoices run each page through separately; in practice, 94% of invoices in this dataset fit on one page.
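
For illustration, here is the same first-page PDF-to-PNG step done locally with poppler's pdftoppm — a stand-in sketch, not the Gotenberg deployment itself; paths and DPI are illustrative:

import subprocess
from pathlib import Path

def pdf_first_page_to_png(pdf_path: str, out_prefix: str, dpi: int = 200) -> Path:
    # -singlefile renders only the first page and writes <prefix>.png;
    # -r sets the rasterization DPI (higher helps low-quality scans)
    subprocess.run(
        ["pdftoppm", "-png", "-singlefile", "-r", str(dpi), pdf_path, out_prefix],
        check=True,
    )
    return Path(f"{out_prefix}.png")

png_path = pdf_first_page_to_png("invoice.pdf", "/tmp/invoice")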

Component 3 — n8n Workflow

The ingestion trigger is a folder watcher — n8n monitors a network share directory every 2 minutes. New PDF files dropped into the folder (by the email client’s auto-save rule, or manually) trigger the workflow.

Steps:

  1. Read the PDF file from the share
  2. POST to Gotenberg → receive PNG bytes
  3. POST the PNG (base64-encoded) to Ollama’s /api/generate endpoint with the extraction prompt
  4. Parse the JSON response
  5. Validate required fields — if any are null, route to the human review queue (see the routing sketch after this list)
  6. POST structured data to the ERP webhook endpoint
  7. Move the processed PDF to an archived subfolder, rename it to YYYY-MM-DD_Vendor_InvoiceNumber.pdf
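
Steps 5 and 7 carry most of the custom logic. A minimal sketch, assuming the parsed JSON from step 4 — field names follow the prompt in Component 4, and the 0.80 confidence threshold is covered in Component 5:

# Required fields from the extraction prompt; "confidence" is checked separately
REQUIRED = ["vendor_name", "invoice_number", "invoice_date",
            "currency", "subtotal", "vat_amount", "total", "line_items"]

def route(extracted: dict) -> str:
    # Any missing required field, or low model confidence, goes to review
    if any(extracted.get(field) is None for field in REQUIRED):
        return "review"
    if (extracted.get("confidence") or 0.0) < 0.80:
        return "review"
    return "erp"

def archive_name(extracted: dict) -> str:
    # Step 7's rename scheme: YYYY-MM-DD_Vendor_InvoiceNumber.pdf
    vendor = extracted["vendor_name"].replace(" ", "")
    return f"{extracted['invoice_date']}_{vendor}_{extracted['invoice_number']}.pdf"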

Component 4 — The Prompt

Getting consistent JSON output from a local model requires more careful prompting than with API models. The final prompt:

<image>
You are processing a supplier invoice. Extract the following fields and return ONLY a JSON object with no additional text.

Required fields:
- vendor_name (string)
- invoice_number (string)  
- invoice_date (YYYY-MM-DD)
- currency (ISO 4217 code)
- subtotal (number, no currency symbol)
- vat_amount (number)
- total (number)
- line_items (array of objects with: description, quantity, unit_price, line_total)
- confidence (number 0.0 to 1.0, your confidence in the extraction accuracy)

If a field cannot be found, set it to null. Return nothing except the JSON object.

The <image> tag at the start signals to LLaMA that image input follows in the API request body. Without it, the model ignores the image.

Component 5 — Confidence Scoring and Human Review

The local model is less consistent than hosted APIs on unusual invoice layouts — handwritten amounts, non-standard table structures, low-resolution scans. The confidence threshold is set at 0.80. Below this, the invoice is copied to a review folder and a desktop notification fires on the finance team’s shared computer (via a simple webhook to a local ntfy instance).
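
A minimal sketch of that notification call, assuming an ntfy host and topic name (both illustrative) — ntfy takes the message as the request body and metadata as headers:

import requests

def notify_review(filename: str, confidence: float) -> None:
    # Topic URL is an assumption; adjust to the local ntfy instance
    requests.post(
        "http://ntfy.internal/invoice-review",
        data=f"{filename} extracted at confidence {confidence:.2f}".encode("utf-8"),
        headers={"Title": "Invoice needs review", "Priority": "high"},
        timeout=10,
    )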

A small HTML form — served from the same n8n instance on an internal URL — lets the reviewer correct the extracted fields and re-submit. Corrected submissions bypass Gotenberg and Ollama and go straight to the ERP webhook.


Performance

On the RTX 3080, LLaMA 3.2 Vision 11B processes one invoice page in 6–9 seconds. For 250 invoices per month, that works out to roughly 25–40 minutes of total GPU time — well within acceptable limits for a batch process.

Compared to a cloud API:

  • Latency per invoice: 6–9 seconds (local) vs 1–2 seconds (Claude/OpenAI)
  • Accuracy on clean PDFs: comparable — around 92–94% field-level accuracy
  • Accuracy on low-quality scans: lower than hosted models — around 78%
  • Cost per invoice: effectively zero (server was already running)
  • Data exposure: none

Results

After 12 weeks in production:

  • 89% of invoices processed automatically without human review
  • 11% flagged for review — mostly older invoices with poor scan quality
  • Processing time: average 8 seconds per invoice from pickup to ERP webhook (the folder watcher adds up to 2 minutes of polling lag)
  • Previous process: 8–10 minutes per invoice (manual entry), outsourced to a part-time data entry contractor
  • Cost saving: contractor hours reduced by approximately 80%; remaining 20% handles the review queue and other tasks
  • Zero data leakage incidents — security audit passed

What I’d Do Differently

LLaMA 3.2 Vision 11B was the right choice for available VRAM. If the server had 24GB VRAM, I would test LLaMA 3.2 Vision 90B — the larger model is meaningfully more accurate on degraded scans, which would push the automatic processing rate closer to 95%.

The folder watcher trigger is simple but fragile — if n8n goes down, files accumulate silently. A better trigger is a filesystem event listener that pushes to n8n’s webhook, reducing polling lag and making missed files visible immediately.
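
A minimal sketch of that alternative with Python's watchdog library — the webhook URL and watch directory are assumptions:

import requests
from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

N8N_WEBHOOK = "http://n8n.internal:5678/webhook/invoice-dropped"  # assumed URL

class PdfHandler(FileSystemEventHandler):
    def on_created(self, event):
        # Ignore directories and anything that is not a PDF
        if event.is_directory or not event.src_path.lower().endswith(".pdf"):
            return
        # Push the new file's path to n8n; a failed POST raises here
        # instead of files silently accumulating in the folder
        requests.post(N8N_WEBHOOK, json={"path": event.src_path}, timeout=10).raise_for_status()

observer = Observer()
observer.schedule(PdfHandler(), "/mnt/invoices/inbox", recursive=False)
observer.start()
observer.join()

One caveat: inotify-based observers often miss events on SMB/NFS mounts, so for a network share watchdog's PollingObserver is the safer variant — still event-shaped from n8n's point of view, just polled locally.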


Stack

  • n8n (self-hosted, on-premise)
  • Ollama — local LLM runtime
  • LLaMA 3.2 Vision 11B — invoice extraction model
  • Gotenberg — PDF to image conversion (Docker, on-premise)
  • ntfy — local desktop push notifications for review queue
  • Custom HTML review form — served from n8n’s internal URL

Need invoice processing that never sends documents to the cloud? Get in touch.


Need a similar system in your business?

If you have a manual workflow between tools, I can help map the logic, design the system, and automate it in a way your team can actually use.
