Meridian DocAI
An intelligent document processing platform that extracts, classifies, and routes financial documents with 98% accuracy — replacing weeks of manual review with minutes of automated processing.
Project Overview
Meridian DocAI was built for a mid-market insurance group that processed thousands of policy documents, claims forms, and compliance filings every week. Their existing workflow relied on manual data entry across three separate systems, leading to bottlenecks, human error, and a backlog that grew faster than their team could clear it.
We designed an end-to-end document intelligence platform that ingests scanned PDFs, handwritten forms, and digital submissions, then extracts structured data, classifies each document by type and urgency, and routes it to the correct downstream system — all without human intervention for the vast majority of cases.
Technical Architecture
The core extraction engine is built in Python using PyTorch and fine-tuned transformer models from Hugging Face. We trained custom named-entity recognition models on domain-specific financial terminology — policy numbers, coverage types, claim amounts — achieving extraction accuracy that generic OCR solutions could not match.
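As a rough illustration of what happens after a token-classification model has tagged a document, the sketch below merges token-level BIO tags into entity spans. It does not load any model. The label names (POLICY_NUMBER, COVERAGE_TYPE, CLAIM_AMOUNT), the `Entity` class, and the `decode_bio` helper are assumptions made for this example, not the project's actual code.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    label: str
    text: str


def decode_bio(tokens: list[str], tags: list[str]) -> list[Entity]:
    """Merge token-level BIO tags into entity spans (hypothetical helper)."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    entities: list[Entity] = []
    current_label = None
    current_tokens: list[str] = []

    def flush() -> None:
        # Close the open span, if any, and reset the accumulator.
        nonlocal current_label, current_tokens
        if current_label is not None:
            entities.append(Entity(current_label, " ".join(current_tokens)))
        current_label, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_label, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_label == tag[2:]:
            current_tokens.append(token)
        else:
            # "O" tags and I- tags that do not continue the open span
            # close it; the stray token itself is dropped.
            flush()
    flush()
    return entities


# Illustrative input only; these labels are assumed, not taken from the project.
tokens = ["Policy", "PN-48213", "covers", "water", "damage", "up", "to", "12,500"]
tags = ["O", "B-POLICY_NUMBER", "O", "B-COVERAGE_TYPE", "I-COVERAGE_TYPE",
        "O", "O", "B-CLAIM_AMOUNT"]
for entity in decode_bio(tokens, tags):
    print(entity.label, "->", entity.text)
```

In a real pipeline the tags would come from the fine-tuned model's predictions rather than a hand-written list.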
LangChain orchestrates multi-step reasoning chains that handle ambiguous documents: when a form contains conflicting information or is missing fields, the system pulls context from related documents before making a classification decision. This chain-of-thought approach reduced the exception rate from 35% to under 4%.
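The idea of consulting related documents before classifying can be sketched in plain Python, without any LangChain APIs. Everything here is an assumption for illustration: the required field names, the `Resolution` class, and the rule that a gap is filled only when the related documents agree on a single value.

```python
from dataclasses import dataclass, field

# Hypothetical required fields; the real schema is not described in this write-up.
REQUIRED_FIELDS = ("policy_number", "claim_amount")


@dataclass
class Resolution:
    fields: dict[str, str]
    conflicts: list[str] = field(default_factory=list)
    missing: list[str] = field(default_factory=list)

    @property
    def needs_review(self) -> bool:
        # Anything unresolved is treated as an exception for a human.
        return bool(self.conflicts or self.missing)


def resolve_fields(doc: dict[str, str], related: list[dict[str, str]]) -> Resolution:
    """Fill gaps from related documents and flag disagreements."""
    resolved = dict(doc)
    conflicts: list[str] = []
    missing: list[str] = []
    for name in REQUIRED_FIELDS:
        candidates = {r[name] for r in related if r.get(name)}
        value = resolved.get(name)
        if not value:
            if len(candidates) == 1:
                # Related documents agree on one value: use it.
                resolved[name] = next(iter(candidates))
            elif candidates:
                # Related documents disagree with each other.
                conflicts.append(name)
            else:
                missing.append(name)
        elif candidates and value not in candidates:
            # The form contradicts every related document.
            conflicts.append(name)
    return Resolution(resolved, conflicts, missing)


form = {"policy_number": "PN-48213", "claim_amount": ""}
history = [{"policy_number": "PN-48213", "claim_amount": "12500"}]
result = resolve_fields(form, history)
print(result.fields, result.needs_review)
```

Only documents where `needs_review` is true would count as exceptions under this sketch; the rest proceed straight to classification.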
Model training and inference run on AWS SageMaker, giving us auto-scaling GPU instances for batch processing during peak hours and cost-efficient spot instances for overnight retraining jobs. The entire ML pipeline — from data labelling to model deployment — is versioned and reproducible.
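One common way to make a training run traceable is to derive an identifier from everything that determines its output. The sketch below is a generic illustration of that idea, not the project's pipeline: the `run_fingerprint` function, the config keys, and the manifest layout are all hypothetical, and it makes no SageMaker calls.

```python
import hashlib
import json


def run_fingerprint(config: dict, data_manifest: dict[str, str], code_version: str) -> str:
    """Return a short, deterministic ID for a training run.

    config        -- hyperparameters (hypothetical keys)
    data_manifest -- maps each labelled-data file to its content hash
    code_version  -- e.g. a commit hash of the training code
    """
    payload = json.dumps(
        {"config": config, "data": data_manifest, "code": code_version},
        sort_keys=True,            # key order must not affect the ID
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


# Illustrative values only.
config = {"learning_rate": 3e-5, "epochs": 4}
manifest = {"claims_batch_01.jsonl": "ab12", "claims_batch_02.jsonl": "cd34"}
print(run_fingerprint(config, manifest, "commit-placeholder"))
```

Two runs with the same fingerprint used the same code, data, and settings, so a deployed model tagged with that ID can be traced back to its inputs.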
Downstream workflow automation is handled by n8n, which routes classified documents to the appropriate teams, triggers compliance checks, and updates the client's existing policy management system via REST APIs. The orchestration layer processes over 2,000 documents per day without manual intervention.
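The routing step can be pictured as a lookup from document type to destination, producing a JSON body that a workflow webhook could accept. This is a sketch under stated assumptions: the queue names, the payload fields, the fallback queue, and the rule for when a compliance check is requested are invented for illustration and do not describe the client's actual n8n workflows.

```python
import json
from dataclasses import dataclass

# Hypothetical routing table: document type -> destination queue.
ROUTES = {
    "claim_form": "claims-team",
    "policy_document": "underwriting",
    "compliance_filing": "compliance",
}
FALLBACK_QUEUE = "manual-review"  # assumed destination for unknown types


@dataclass(frozen=True)
class ClassifiedDocument:
    doc_id: str
    doc_type: str
    urgency: str  # assumed values: "low", "normal", "high"


def build_routing_payload(doc: ClassifiedDocument) -> str:
    """Build the JSON body that would be posted to a routing webhook."""
    payload = {
        "document_id": doc.doc_id,
        "queue": ROUTES.get(doc.doc_type, FALLBACK_QUEUE),
        "priority": doc.urgency == "high",
        # Assumption: only compliance filings trigger a compliance check.
        "run_compliance_check": doc.doc_type == "compliance_filing",
    }
    return json.dumps(payload, sort_keys=True)


print(build_routing_payload(ClassifiedDocument("doc-001", "claim_form", "high")))
```

Keeping the table as data rather than branching logic means a new document type only needs a new entry, and unknown types fall through to a human queue instead of being dropped.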
Extracted data is stored in PostgreSQL with full audit trails, and the review dashboard is a Next.js application served through Cloudflare Workers for global edge performance. Infrastructure is provisioned with Terraform, and the services are deployed to Kubernetes clusters for zero-downtime releases.
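An audit trail of this kind usually means every change to an extracted value is recorded alongside the change itself, in the same transaction. The sketch below shows that pattern using Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it runs without a database server. The table names, columns, and the `set_field` helper are assumptions, not the production schema.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema; sqlite3 stands in for PostgreSQL here.
SCHEMA = """
CREATE TABLE extracted_fields (
    document_id TEXT NOT NULL,
    field_name  TEXT NOT NULL,
    field_value TEXT NOT NULL,
    PRIMARY KEY (document_id, field_name)
);
CREATE TABLE audit_log (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    document_id TEXT NOT NULL,
    field_name  TEXT NOT NULL,
    old_value   TEXT,
    new_value   TEXT NOT NULL,
    changed_by  TEXT NOT NULL,
    changed_at  TEXT NOT NULL
);
"""


def open_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def set_field(conn: sqlite3.Connection, document_id: str,
              field_name: str, value: str, actor: str) -> None:
    """Write a field value and its audit record in one transaction."""
    with conn:  # commits both writes together, or rolls both back on error
        row = conn.execute(
            "SELECT field_value FROM extracted_fields "
            "WHERE document_id = ? AND field_name = ?",
            (document_id, field_name),
        ).fetchone()
        old_value = row[0] if row else None
        conn.execute(
            "INSERT INTO extracted_fields (document_id, field_name, field_value) "
            "VALUES (?, ?, ?) "
            "ON CONFLICT (document_id, field_name) "
            "DO UPDATE SET field_value = excluded.field_value",
            (document_id, field_name, value),
        )
        conn.execute(
            "INSERT INTO audit_log (document_id, field_name, old_value, "
            "new_value, changed_by, changed_at) VALUES (?, ?, ?, ?, ?, ?)",
            (document_id, field_name, old_value, value, actor,
             datetime.now(timezone.utc).isoformat()),
        )


conn = open_store()
set_field(conn, "doc-001", "claim_amount", "12000", "extractor")
set_field(conn, "doc-001", "claim_amount", "12500", "reviewer")
for row in conn.execute("SELECT old_value, new_value, changed_by FROM audit_log ORDER BY id"):
    print(row)
```

Because the audit row is written in the same transaction as the value, the log cannot drift out of step with the data it describes.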