kolvex labs

Case study / 2026 / AI engineering

Distill.

A document-intelligence engine that turns messy business paperwork into structured, verified data: extraction with per-field confidence, answers grounded in the source, and an agent that drafts only from what was actually found. It runs on prepared samples today and switches to live Claude calls as soon as an API key is connected.

Role
Architecture, AI engineering, build
Stack
Next.js / Claude API / TypeScript
Surface
Document intelligence
Status
Working sample, live-ready

Context

Most AI demos are toys. Real products refuse to guess.

Plenty of tools will read a document and hand back a confident answer. The problem is the confidence: a model that always answers will, often enough, answer wrong, and in a contract or an invoice a wrong number is worse than no number at all.

Distill is built around the opposite instinct. It extracts a fixed set of fields from a document, tags each with where it came from and how sure it is, and flags anything it cannot find rather than inventing a plausible value. A human can see exactly what to trust. The engine is a generalization of one I built for a production benefits tool, widened from a single insurance schema to any document type.

I built it as the kind of artifact a buyer can actually evaluate: a working demo that runs instantly on prepared documents, with the live engine wired behind the same interface and ready to switch on.

Approach

Four moves, one rule: never fabricate.

  • Extract

    Claude fills a strict, typed schema from the document. Every value carries the section it came from and a confidence, so the output is auditable, not a wall of prose.

  • Flag, don't guess

    Absent or ambiguous fields come back as not-found with a null value. A missing value surfaces honestly; it is never fabricated to look complete.

  • Answer, grounded

    Questions are answered only from the extracted fields, with the fields they drew on cited back. If the document doesn't contain the answer, the system says so.

  • Draft, from verified data

    A bounded tool-using agent assembles a finished artifact, pulling each value through a read-only lookup so it can only use data that was actually verified.
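The drafting step can be sketched as a bounded loop. This is a minimal illustration, not Distill's code: the model is abstracted as a `step` callback (in a live build that would be a Claude tool-use turn), and the names `runDraftAgent`, `DraftStep`, `getField`, and the cap `MAX_ROUNDS = 8` are hypothetical. What it shows is the shape of the guarantee: the only action besides finishing is a read-only field lookup, and the loop cannot run forever.

```typescript
// Hypothetical sketch of a bounded, read-only drafting loop.
type Confidence = "high" | "medium" | "low" | "not_found";

interface FieldValue {
  value: string | null;
  source: string | null;
  confidence: Confidence;
}

// The agent can either look up one field or finish with a draft.
type DraftStep =
  | { kind: "get_field"; field: string }
  | { kind: "final"; text: string };

type ToolResult = { field: string; result: FieldValue | "unknown_field" };

const MAX_ROUNDS = 8; // assumed value for the hard round cap

// Read-only lookup: own keys only, and a copy, so verified data cannot be mutated.
function getField(fields: Record<string, FieldValue>, name: string): FieldValue | "unknown_field" {
  return Object.prototype.hasOwnProperty.call(fields, name) ? { ...fields[name] } : "unknown_field";
}

function runDraftAgent(
  fields: Record<string, FieldValue>,
  step: (history: ToolResult[]) => DraftStep, // stand-in for a model turn
): string {
  const history: ToolResult[] = [];
  for (let round = 0; round < MAX_ROUNDS; round++) {
    const next = step(history);
    if (next.kind === "final") return next.text;
    history.push({ field: next.field, result: getField(fields, next.field) });
  }
  throw new Error(`draft agent exceeded ${MAX_ROUNDS} rounds`);
}

// Usage with illustrative values and a scripted "model".
const fields: Record<string, FieldValue> = {
  total: { value: "$4,200.00", source: "Line 12", confidence: "high" },
  po_number: { value: null, source: null, confidence: "not_found" },
};

const draft = runDraftAgent(fields, (history) => {
  if (history.length === 0) return { kind: "get_field", field: "total" };
  const r = history[0].result;
  const total = r !== "unknown_field" && r.value !== null ? r.value : "[not found]";
  return { kind: "final", text: `Amount due: ${total}` };
});
```

Checking own properties means a request for an inherited key such as `toString` comes back as `unknown_field` rather than leaking something that was never extracted.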

Under the hood

The engineering is the proof.

Every decision below exists to make the output trustworthy enough to act on. That is the whole job of an AI product: not to sound right, but to be checkable.

  • Structured output

    Claude fills a JSON schema rather than free text, so every value lands in a typed contract the UI can render directly. Numbers never come from open-ended generation.

  • Confidence over completeness

    The model is instructed to return not-found with a null value rather than guess. That discipline lives in the prompt; the schema's confidence enum makes not-found a legal, first-class answer, so the model is never forced to fill a field it cannot support.

  • Native document handling

    PDFs and images go to Claude as document and image blocks, so scanned and photographed pages work without bolting on a separate OCR pass.

  • Prompt caching

    The system prompt is byte-identical across requests, so repeat calls within the cache window read it from Anthropic's prompt cache at a reduced input cost instead of paying full price to reprocess the instructions each time.

  • Read-only drafting agent

    The draft step is a bounded tool-use loop with a single get_field tool and a hard round cap. The agent can read verified values and nothing else, so it can't wander.

  • Sample now, live later

    The API routes serve cached samples with no key and call Claude when one is present, with no client changes. A failed live call falls back to cache, so the demo never breaks.
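The native-document and prompt-caching points above come down to the shape of a Messages API request. A hedged sketch, assuming base64 upload and a caller-supplied `model` string; `buildExtractionRequest` and `SYSTEM_PROMPT` are illustrative names, and how the JSON schema itself reaches the model is left out because it is not specified here. The function only builds the request body; sending it would go through the Anthropic SDK or HTTP API.

```typescript
// Sketch: one extraction request body. Names are hypothetical.
type MediaType = "application/pdf" | "image/png" | "image/jpeg";

// A module-level constant, so the text is byte-identical on every request.
const SYSTEM_PROMPT =
  "Extract the requested fields. A missing value is acceptable; a wrong value is not. " +
  "If a field is absent or ambiguous, return a null value with confidence not_found.";

function buildExtractionRequest(mediaType: MediaType, base64Data: string, model: string) {
  // PDFs go in as a document block and images as an image block: no separate OCR pass.
  const fileBlock =
    mediaType === "application/pdf"
      ? { type: "document", source: { type: "base64", media_type: mediaType, data: base64Data } }
      : { type: "image", source: { type: "base64", media_type: mediaType, data: base64Data } };

  return {
    model,
    max_tokens: 2048,
    // cache_control marks the stable system prompt as a prompt-cache breakpoint.
    system: [{ type: "text", text: SYSTEM_PROMPT, cache_control: { type: "ephemeral" } }],
    messages: [
      {
        role: "user",
        content: [fileBlock, { type: "text", text: "Fill the extraction schema for this document." }],
      },
    ],
  };
}
```

Caching only applies above a model-specific minimum prompt length, and cached entries expire after a short idle period, so the real system prompt (instructions plus schema) has to be long and stable for the cache to pay off.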

The contract

Every field knows where it came from.

The whole system rests on one small type. A value is never just a value: it carries its source and its confidence, and a confidence of not-found is a first-class result, not an error.

lib/distill/types.ts
type Confidence = "high" | "medium" | "low" | "not_found";

interface FieldValue {
  value: string | null;   // null when not found, never a guess
  source: string | null;  // e.g. "Section 7.2"
  confidence: Confidence;
}

The model is told, in a cached system prompt: a missing value is acceptable; a wrong value is not. That single instruction is what separates a tool you can act on from one you have to double-check.
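The type alone cannot express the rule that a null value and `not_found` always go together. A hypothetical runtime check, assuming model output is parsed from JSON and cast to `FieldValue`, could enforce it before anything reaches the UI. `checkField` is an illustrative name, and requiring every found value to cite a source is an assumed reading of the rule that every value carries the section it came from.

```typescript
// Hypothetical runtime guard for the FieldValue contract.
type Confidence = "high" | "medium" | "low" | "not_found";

interface FieldValue {
  value: string | null;
  source: string | null;
  confidence: Confidence;
}

const CONFIDENCES: readonly Confidence[] = ["high", "medium", "low", "not_found"];

// Returns a list of problems; an empty list means the field honours the contract.
function checkField(name: string, f: FieldValue): string[] {
  const problems: string[] = [];
  // Only reachable for JSON that was cast rather than type-checked.
  if (!CONFIDENCES.includes(f.confidence)) {
    problems.push(`${name}: unknown confidence "${f.confidence}"`);
  }
  if (f.confidence === "not_found" && f.value !== null) {
    problems.push(`${name}: not_found must carry a null value`);
  }
  if (f.confidence !== "not_found" && f.value === null) {
    problems.push(`${name}: a null value must be reported as not_found`);
  }
  if (f.value !== null && f.source === null) {
    problems.push(`${name}: a found value must cite its source`);
  }
  return problems;
}
```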

By design

The decisions, stated plainly.

These are design facts, not performance claims. Accuracy, latency, and cost are worth measuring against real traffic; this build runs on prepared samples, so I am not going to put a number on the wall that I have not earned yet.

  • Sample documents

    3

    A vendor contract, an invoice, and a benefits summary, deliberately cross-industry so the engine reads as general-purpose, not a single-domain trick.

  • Fabricated values

    0

    Each sample leaves one field genuinely absent from the document, and the engine reports it as not-found rather than inventing a plausible answer.

  • Tools the agent can call

    1 (read-only)

    The drafting agent's entire world is one get_field lookup over verified data. A small surface is a safe surface.

  • To go live

    1 (API key)

    The same routes serve samples now and call Claude when a key is present. Going live is a credential, not a rewrite.

  • Extra OCR steps

    0

    Native document handling reads PDFs and images directly. There is no second pipeline to maintain or to drift out of sync.

  • If a live call fails

    Cache

    The routes fall back to the prepared result rather than showing an error, so a flaky network never turns into a broken demo in front of someone.
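The sample-or-live routing reduces to a small decision. A sketch under stated assumptions: `resolveExtraction`, `extractLive`, and the in-memory `samples` map are hypothetical stand-ins, and in a Next.js route handler the key would plausibly come from an environment variable such as `process.env.ANTHROPIC_API_KEY`.

```typescript
// Sketch of sample-or-live routing with a cache fallback. Names are hypothetical.
type ExtractionResult = {
  documentId: string;
  origin: "live" | "sample";
  fields: Record<string, unknown>;
};

async function resolveExtraction(
  documentId: string,
  apiKey: string | undefined,
  samples: Map<string, ExtractionResult>,
  extractLive: (id: string, key: string) => Promise<ExtractionResult>,
): Promise<ExtractionResult> {
  const cached = samples.get(documentId);
  if (!apiKey) {
    // No key: serve the prepared sample.
    if (!cached) throw new Error(`no sample for ${documentId}`);
    return cached;
  }
  try {
    return await extractLive(documentId, apiKey); // key present: call the model
  } catch (err) {
    if (cached) return cached; // live call failed: fall back to the prepared result
    throw err; // nothing to fall back to, so surface the real error
  }
}
```

Re-throwing when no sample exists keeps the fallback from masking genuine failures on documents that were never prepared.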

Your documents

Have a pile of paperwork that should be structured data?

Contracts, invoices, statements, forms: if a person is reading them by hand to pull out the same fields every time, that is work an engine like this can take over, with a human checking only what it flags.