Case study / 2026 / AI engineering

Mirror.

An AI-visibility scanner that scores any website for how legibly an AI assistant can read it, then shows, in plain language, what an assistant understands you do and what it gets wrong. The score is computed from the page itself, with no model in the loop; the plain-language read is the only part that calls Claude, and it degrades to an honest sample the moment it can't.

Role: Architecture, AI engineering, build
Stack: Next.js / Claude API / Upstash / TypeScript
Surface: AI visibility (AEO / GEO)
Status: Working sample, live-ready


Context

A score you can trust shouldn't need a model to compute it.

Customers increasingly ask an AI assistant about a business before they visit its site: what does this company do, and should I use them? The answer is assembled from whatever the page makes legible to a machine. A whole category of tools has appeared to grade that legibility, and most of them just ask a model how you are doing and hope the answer is stable from one run to the next.

Mirror splits the job in two so the trustworthy part stays trustworthy. The score is deterministic: it comes from concrete, checkable signals on the page (the title and description, schema.org markup, whether robots.txt lets AI crawlers in, social cards, a sitemap, the heading structure), so the same site scores the same every time, with or without an API key. Only the plain-language read calls a model, and even that is bounded and labeled.
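To make "concrete, checkable signals" tangible, here is a minimal sketch of one such check: whether a robots.txt body shuts a named AI crawler out of the whole site. The function name `allowsCrawler` is hypothetical, and the parsing is deliberately simplified (only a blanket `Disallow: /` counts as a block; path-level rules and `Allow` overrides are ignored), so treat it as an illustration of the idea rather than Mirror's actual check.

```typescript
// Simplified robots.txt check. Assumption: only "Disallow: /" counts as a block.
// A group that names the agent directly takes precedence over the "*" group.
function allowsCrawler(robotsTxt: string, agent: string): boolean {
  const target = agent.toLowerCase();
  let groupAgents: string[] = []; // user-agents of the group being read
  let inRules = false;            // true once the current group has a rule line
  let specificSeen = false;       // a group names the agent explicitly
  let specificBlocked = false;
  let wildcardBlocked = false;

  for (const raw of robotsTxt.split(/\r?\n/)) {
    const line = raw.replace(/#.*$/, "").trim(); // drop comments
    const idx = line.indexOf(":");
    if (idx === -1) continue;
    const field = line.slice(0, idx).trim().toLowerCase();
    const value = line.slice(idx + 1).trim();

    if (field === "user-agent") {
      // A user-agent line after rule lines starts a new group.
      if (inRules) {
        groupAgents = [];
        inRules = false;
      }
      groupAgents.push(value.toLowerCase());
      if (value.toLowerCase() === target) specificSeen = true;
    } else if (field === "disallow" || field === "allow") {
      inRules = true;
      if (field === "disallow" && value === "/") {
        if (groupAgents.includes(target)) specificBlocked = true;
        if (groupAgents.includes("*")) wildcardBlocked = true;
      }
    }
  }
  return specificSeen ? !specificBlocked : !wildcardBlocked;
}

// Usage: GPTBot and ClaudeBot are example crawler names passed in by the caller.
const robots = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nDisallow: /private";
const gptBotAllowed = allowsCrawler(robots, "GPTBot");
const claudeBotAllowed = allowsCrawler(robots, "ClaudeBot");
```

Because the check is a pure function of the file's text, it gives the same answer on every run, which is the property the score depends on.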

I built it as something a buyer can evaluate in seconds: a real legibility score for any URL on the spot, with the live AI read wired behind the same interface and ready the moment a key is connected.

Approach

Two layers, one rule: never dress up a guess as a measurement.

Score, deterministically
Six weighted signals, graded straight from the page with no model involved: identity, AI-crawler access, structured data, shareability, discoverability, and content structure. The number is real for any URL and does not drift between runs.
Reflect, in plain language
A single Claude call describes the business the way an assistant would to a customer, strictly from what the page says, then names what it gets wrong and the highest-impact fixes. It reads; it does not flatter.
Degrade honestly
No key, a spent daily budget, or a model hiccup all fall back to a templated read built from the real signals, and the result is labeled a sample. A guess is never passed off as a live AI answer.
Freeze and share
Every completed scan becomes a permanent link, so the read is something you can send to a colleague, not a session that disappears when the tab closes.
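The scoring layer above can be sketched as a weighted sum. The six signal names come from the text, but the identifiers, the 0 to 1 grading scale, and the weights below are assumptions chosen for illustration; the actual weights are not published here.

```typescript
// Hypothetical identifiers for the six signals named in the text.
type SignalId =
  | "identity"
  | "aiCrawlerAccess"
  | "structuredData"
  | "shareability"
  | "discoverability"
  | "contentStructure";

interface SignalGrade {
  id: SignalId;
  score: number; // assumed scale: 0 (absent) to 1 (fully present)
}

// Illustrative weights only; they sum to 100 so the result lands on a 0-100 scale.
const WEIGHTS: Record<SignalId, number> = {
  identity: 25,
  aiCrawlerAccess: 20,
  structuredData: 20,
  shareability: 10,
  discoverability: 10,
  contentStructure: 15,
};

// Pure function: same grades in, same number out. No model, no clock, no I/O.
function overallScore(grades: SignalGrade[]): number {
  let total = 0;
  for (const g of grades) {
    const clamped = Math.min(1, Math.max(0, g.score)); // guard against out-of-range grades
    total += clamped * WEIGHTS[g.id];
  }
  return Math.round(total);
}

// Usage: a page with a solid title and description but partial schema.org markup.
const example = overallScore([
  { id: "identity", score: 1 },
  { id: "structuredData", score: 0.5 },
]);
```

A signal that is missing from the input simply contributes nothing, so a page with no detectable signals scores 0 rather than failing.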

Under the hood

The trust is in the boundaries.

Mirror fetches whatever URL a stranger types and runs it past a model. Almost every decision below is about keeping that safe, honest, and cheap, because those are the parts that make a demo something you can actually put in front of someone.

Deterministic core
The scorer is a pure function: HTML in, scorecard out. The samples on the demo run through the exact same function as a live scan, so what you see can never drift from what the engine actually does.
SSRF-hardened fetcher
Because the URL comes from a stranger, the fetcher refuses private, loopback, link-local, and cloud-metadata addresses, re-checks the host on every redirect hop, and caps both wall-clock time and response size. The fetch is the security boundary, so it is treated like one.
Bounded structured AI
The read is a single call to a low-cost Claude Haiku model that fills a strict JSON schema with every field required, so the output is predictable and complete, not an open-ended essay to parse.
Prompt caching
The system prompt is byte-stable across requests, so repeat scans hit Anthropic's prompt cache instead of paying to re-read the instructions every time.
Budgeted on purpose
A hard daily ceiling caps live reads. Past it, scans still run because the score is free, and the read degrades to a sample, so a sudden spike of traffic cannot turn into a runaway bill.
Persisted and shareable
Results are frozen in Upstash Redis behind a permanent link, with an in-process fallback so the whole thing still runs locally with no credentials at all.
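As an illustration of the address check behind the SSRF-hardened fetcher, here is a minimal sketch that decides whether an IPv4 literal falls in a range a public fetcher should refuse. The function name `isBlockedIPv4` is hypothetical, and the sketch covers IPv4 only; a real fetcher would also need to resolve hostnames first, handle IPv6, and run the same check on the target of every redirect hop, as the text describes.

```typescript
// Returns true when the address should be refused. Anything that is not a
// well-formed dotted quad is refused as well: fail closed rather than guess.
function isBlockedIPv4(ip: string): boolean {
  const parts = ip.split(".");
  if (parts.length !== 4) return true;

  const octets = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  if (octets.some((o) => Number.isNaN(o) || o > 255)) return true;

  const [a = 0, b = 0] = octets;
  return (
    a === 0 ||                            // 0.0.0.0/8, "this network"
    a === 10 ||                           // 10.0.0.0/8, private
    a === 127 ||                          // 127.0.0.0/8, loopback
    (a === 100 && b >= 64 && b <= 127) || // 100.64.0.0/10, carrier-grade NAT
    (a === 169 && b === 254) ||           // 169.254.0.0/16, link-local (includes 169.254.169.254 metadata)
    (a === 172 && b >= 16 && b <= 31) ||  // 172.16.0.0/12, private
    (a === 192 && b === 168) ||           // 192.168.0.0/16, private
    a >= 224                              // multicast and reserved ranges
  );
}

// Usage: check the resolved address before connecting, and again per redirect.
const metadataBlocked = isBlockedIPv4("169.254.169.254");
const publicBlocked = isBlockedIPv4("8.8.8.8");
```

Checking the resolved address rather than the hostname string matters because a public-looking name can resolve to an internal address.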

The contract

The score is real. The read says when it isn't live.

Two small types carry the whole honesty story. The score never depends on a model, and whether the plain-language read came from Claude or from a sample is a first-class fact the UI can always show.

lib/mirror/types.ts

interface ScanResult {
  overall: number;        // 0-100, computed from the page, no model
  grade: string;          // e.g. "Mostly legible to AI"
  dimensions: Dimension[];
}

interface MirrorRecord {
  scan: ScanResult;       // always real, with or without a key
  narrative: AiNarrative; // the plain-language read
  narrativeLive: boolean; // false => an honest sample, never faked
}

The structural score is computed before any model is consulted, so it stands on its own. And because narrativeLive travels with every result, the interface never has to pretend a sample is a live read. Honesty is a type here, not a footnote.
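The fallback path this contract implies can be sketched as follows. The stand-in types are trimmed so the example compiles on its own; the `summary` field on `AiNarrative`, the `LiveReader` signature, and the names `sampleNarrative` and `buildRecord` are all assumptions, since the source does not show them.

```typescript
// Trimmed stand-ins for the types above. AiNarrative's real fields are not
// shown in the source; `summary` is an assumption.
interface AiNarrative {
  summary: string;
}
interface ScanResult {
  overall: number;
  grade: string;
}
interface MirrorRecord {
  scan: ScanResult;
  narrative: AiNarrative;
  narrativeLive: boolean;
}

// Hypothetical signature for whatever performs the live model call.
type LiveReader = (scan: ScanResult) => Promise<AiNarrative>;

// Templated read built only from the deterministic score.
function sampleNarrative(scan: ScanResult): AiNarrative {
  return { summary: `Sample read: this page scores ${scan.overall}/100 (${scan.grade}).` };
}

// The scan is passed in already computed, so the score never waits on a model.
async function buildRecord(
  scan: ScanResult,
  reader: LiveReader | null, // null stands for "no API key configured"
  budgetRemaining: number,   // live reads left in today's budget
): Promise<MirrorRecord> {
  if (reader !== null && budgetRemaining > 0) {
    try {
      const narrative = await reader(scan);
      return { scan, narrative, narrativeLive: true };
    } catch {
      // Model error: fall through to the labeled sample below.
    }
  }
  return { scan, narrative: sampleNarrative(scan), narrativeLive: false };
}
```

All three failure modes (no key, no budget, model error) converge on the same return statement, so there is only one place where `narrativeLive` can be set to `false` and no path that returns a sample marked as live.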

By design

The decisions, stated plainly.

These are design facts, not performance claims. How a real audience's sites score, and how often the live read changes minds, is worth measuring against real traffic; this build runs on prepared samples and on-demand scans, so I am not going to put a number on the wall that I have not earned yet.

Signals scored
6
Identity, AI access, structured data, shareability, discoverability, and content structure, each graded from the page and weighted into one number.
Cost to score a site
$0
The legibility score is deterministic. A model is only ever called for the optional plain-language read, never for the number itself.
Sample scorecards
3
A microbakery, a logistics SaaS, and a barely-there law firm, scored cross-industry on purpose so the engine reads as general, not a single-domain trick.
Internal hosts reachable
0
The fetcher refuses private and cloud-metadata addresses and re-validates every redirect, so a public URL cannot be bounced toward something internal.
To go live
1 API key
The same routes serve the structural score now and switch on the live AI read when a key is present. Going live is a credential, not a rewrite.
If the AI read fails
Sample
A missing key, a spent budget, or a model error falls back to an honest, labeled sample rather than an error, so the demo never breaks in front of someone.

Your website

Curious what an AI assistant makes of your site?

Paste a URL and Mirror returns the real legibility score in seconds, plus, with a key connected, the plain-language read of what an assistant thinks you do and what it cannot tell. If that read is wrong, that is exactly the thing worth fixing.

