AI · Financial Services & Insurance

Meridian DocAI

An intelligent document processing platform that extracts, classifies, and routes financial documents with 98% extraction accuracy, replacing weeks of manual review with minutes of automated processing.

Project Overview

Meridian DocAI was built for a mid-market insurance group that processed thousands of policy documents, claims forms, and compliance filings every week. Their existing workflow relied on manual data entry across three separate systems, leading to bottlenecks, human error, and a backlog that grew faster than their team could clear it.

We designed an end-to-end document intelligence platform that ingests scanned PDFs, handwritten forms, and digital submissions, then extracts structured data, classifies each document by type and urgency, and routes it to the correct downstream system — all without human intervention for the vast majority of cases.

Technical Architecture

The core extraction engine is built in Python using PyTorch and fine-tuned transformer models from Hugging Face. We trained custom named-entity recognition models on domain-specific financial terminology — policy numbers, coverage types, claim amounts — achieving extraction accuracy that generic OCR solutions could not match.
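To make the extraction step concrete, here is a minimal sketch of how token-level predictions from a named-entity model can be merged into whole entities. It assumes the model emits BIO tags (`B-` opens an entity, `I-` continues it, `O` is outside), which is a common convention but not something the project description confirms. The label names `POLICY_NUMBER` and `CLAIM_AMOUNT` and the helper `decode_bio` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    label: str
    text: str


def decode_bio(tokens: list[str], tags: list[str]) -> list[Entity]:
    """Merge token-level BIO tags into whole entities.

    Assumption: tags look like "B-POLICY_NUMBER", "I-POLICY_NUMBER" or "O".
    An "I-" tag that does not continue the currently open entity is dropped.
    """
    entities: list[Entity] = []
    current_label = None
    current_tokens: list[str] = []

    def flush() -> None:
        # Close the entity that is currently open, if any.
        nonlocal current_label, current_tokens
        if current_label is not None:
            entities.append(Entity(current_label, " ".join(current_tokens)))
        current_label, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_label, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_label == tag[2:]:
            current_tokens.append(token)
        else:
            flush()
    flush()
    return entities


# Hypothetical model output for one line of a claims form.
tokens = ["Policy", "PN-48213", "covers", "claim", "of", "$", "12,500"]
tags = ["O", "B-POLICY_NUMBER", "O", "O", "O", "B-CLAIM_AMOUNT", "I-CLAIM_AMOUNT"]
entities = decode_bio(tokens, tags)
```

In a real pipeline the tokens and tags would come from the fine-tuned transformer; the decoding step stays the same whichever model produces them.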

LangChain orchestrates multi-step reasoning chains that handle ambiguous documents: when a form contains conflicting information or missing fields, the system retrieves context from related documents before making a classification decision. This multi-step approach reduced the exception rate from 35% to under 4%.
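The fallback logic for ambiguous fields can be illustrated in plain Python, without LangChain. This is a simplified sketch under our own assumptions, not the production chain: the function `resolve_field`, its majority-vote rule, and the field names are hypothetical. It resolves a field from the document itself when possible, consults related documents when the value is missing or conflicting, and flags the case as an exception when that still does not settle it.

```python
from collections import Counter
from typing import Optional


def resolve_field(
    name: str,
    candidates: list[str],
    related_docs: list[dict[str, str]],
) -> tuple[Optional[str], bool]:
    """Return (value, needs_review) for one extracted field.

    candidates   -- values found for the field in the document itself
    related_docs -- already-extracted fields of related documents
    """
    distinct = set(candidates)
    if len(distinct) == 1:
        return candidates[0], False  # unambiguous, no lookup needed

    # Missing or conflicting: count the values related documents report.
    votes = Counter(doc[name] for doc in related_docs if name in doc)
    if distinct:
        # Conflict: only accept values the document itself contains.
        votes = Counter({v: n for v, n in votes.items() if v in distinct})

    ranked = votes.most_common(2)
    if not ranked:
        return None, True  # no supporting context at all
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None, True  # tie between the top values
    return ranked[0][0], False


# A form lists two policy numbers; one related document confirms "PN-2".
value, needs_review = resolve_field(
    "policy_number", ["PN-1", "PN-2"], [{"policy_number": "PN-2"}]
)
```

Cases that return `needs_review=True` correspond to the exceptions that still reach a human reviewer.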

Model training and inference run on AWS SageMaker, giving us auto-scaling GPU instances for batch processing during peak hours and cost-efficient spot instances for overnight retraining jobs. The entire ML pipeline — from data labelling to model deployment — is versioned and reproducible.

Downstream workflow automation is handled by n8n, which routes classified documents to the appropriate teams, triggers compliance checks, and updates the client's existing policy management system via REST APIs. The orchestration layer processes over 2,000 documents per day without manual intervention.
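The routing decision itself is simple to express in code. The sketch below shows the kind of rule a workflow node could apply, written in Python for illustration; it is not an n8n workflow definition. The queue names, document type keys, and the 0.9 confidence threshold are assumptions.

```python
from dataclasses import dataclass

# Hypothetical mapping from document type to the team queue that handles it.
ROUTES = {
    "claim_form": "claims-team",
    "policy_document": "policy-admin",
    "compliance_filing": "compliance",
}
FALLBACK_QUEUE = "manual-review"


@dataclass(frozen=True)
class ClassifiedDocument:
    doc_type: str
    urgency: str       # "high" or "normal"
    confidence: float  # classifier confidence in [0, 1]


def route(doc: ClassifiedDocument, min_confidence: float = 0.9) -> dict:
    """Pick a destination queue; unknown or low-confidence documents go to review."""
    queue = ROUTES.get(doc.doc_type)
    if queue is None or doc.confidence < min_confidence:
        queue = FALLBACK_QUEUE
    return {"queue": queue, "expedite": doc.urgency == "high"}


decision = route(ClassifiedDocument("claim_form", "high", 0.97))
```

Keeping the mapping in one table means a new document type can be added without touching the routing logic.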

Extracted data is stored in PostgreSQL with full audit trails, and the review dashboard is a Next.js application served through Cloudflare Workers for global edge performance. Infrastructure is provisioned with Terraform and deployed to Kubernetes clusters for zero-downtime releases.
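One way to keep a full audit trail is to write every change to an append-only log table in the same transaction as the change itself. The sketch below illustrates that pattern using Python's built-in `sqlite3` module as a stand-in for PostgreSQL, so it runs without a database server. The table and column names are hypothetical and do not reflect the production schema.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory database with a data table and an audit table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE extracted_field (
            doc_id TEXT NOT NULL,
            name   TEXT NOT NULL,
            value  TEXT,
            PRIMARY KEY (doc_id, name)
        );
        CREATE TABLE audit_log (
            id         INTEGER PRIMARY KEY AUTOINCREMENT,
            doc_id     TEXT NOT NULL,
            name       TEXT NOT NULL,
            old_value  TEXT,
            new_value  TEXT,
            actor      TEXT NOT NULL,
            changed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        );
        """
    )
    return conn


def set_field(conn: sqlite3.Connection, doc_id: str, name: str,
              value: str, actor: str) -> None:
    """Write a field value and its audit record in a single transaction."""
    with conn:  # commits on success, rolls back both writes on error
        row = conn.execute(
            "SELECT value FROM extracted_field WHERE doc_id = ? AND name = ?",
            (doc_id, name),
        ).fetchone()
        old_value = row[0] if row else None
        conn.execute(
            "INSERT OR REPLACE INTO extracted_field (doc_id, name, value) "
            "VALUES (?, ?, ?)",
            (doc_id, name, value),
        )
        conn.execute(
            "INSERT INTO audit_log (doc_id, name, old_value, new_value, actor) "
            "VALUES (?, ?, ?, ?, ?)",
            (doc_id, name, old_value, value, actor),
        )


conn = open_store()
set_field(conn, "doc-1", "claim_amount", "12500", actor="extractor")
set_field(conn, "doc-1", "claim_amount", "12050", actor="reviewer")
history = conn.execute(
    "SELECT old_value, new_value, actor FROM audit_log ORDER BY id"
).fetchall()
```

Because the log is only ever inserted into, the history of each field can be replayed to see who changed what, and in which order.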

Python, PyTorch, Hugging Face, LangChain, AWS SageMaker, n8n, PostgreSQL, Next.js, Cloudflare Workers, Terraform, Kubernetes

Results & Impact

98% extraction accuracy across 12 document types, up from 72% with the previous OCR system

Processing time reduced from an average of 4 days to under 15 minutes per document batch

85% reduction in manual data entry, freeing 6 full-time employees for higher-value work

Exception rate dropped from 35% to under 4%, with clear audit trails for every decision

System handles 2,000+ documents per day with auto-scaling during peak periods
