How Clairfy works — vision LLMs that reason about images

The problem with a percentage

A bare score such as "78% AI" is not actionable. It doesn't tell a journalist which part of the image to scrutinize, doesn't help a moderator write a decision, and doesn't survive an appeal. We built Clairfy because explainability is the product: the verdict and the reasons arrive together or neither arrives at all.

The pipeline

1. The image is normalized: EXIF metadata is stripped, the resolution is capped, and the format is converted to one the vision model handles reliably.
2. A frontier vision LLM analyzes it against a carefully designed prompt (versioned, and evaluated against a labeled test set on every change). The prompt asks the model to look at specific signal classes: pixel-level statistics, optical signatures, anatomy, lighting consistency, frequency-domain artifacts, and the coherence of any text the image contains.
3. The model commits to one of three verdicts (ai_generated, real_photo, or uncertain) and emits a confidence score, a rationale paragraph, and a list of the specific evidence items it found.
4. The structured response is validated against a Pydantic schema before it's returned. Malformed responses are retried, not surfaced.
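The validation step can be sketched roughly as follows. This is a minimal illustration, not Clairfy's actual schema: only the three verdict labels come from the text above. The field names, the 0 to 1 confidence scale, the `analyze_with_retry` helper, the `max_attempts` limit, and the choice to raise an error once retries run out are all assumptions.

```python
import json
from typing import Callable, List, Literal

from pydantic import BaseModel, Field, ValidationError


class Verdict(BaseModel):
    # The three labels come from the pipeline description; field names are assumed.
    verdict: Literal["ai_generated", "real_photo", "uncertain"]
    confidence: float = Field(ge=0.0, le=1.0)  # assumed scale: 0.0 to 1.0
    rationale: str = Field(min_length=1)       # reasons must accompany the verdict
    evidence: List[str]


class MalformedResponseError(RuntimeError):
    """Raised when no attempt produced a schema-valid response (hypothetical)."""


def analyze_with_retry(call_model: Callable[[], str], max_attempts: int = 3) -> Verdict:
    """Call the model until its raw JSON reply validates, up to max_attempts times."""
    last_error = None
    for _ in range(max_attempts):
        raw = call_model()
        try:
            data = json.loads(raw)  # raises a ValueError subclass on invalid JSON
            if not isinstance(data, dict):
                raise ValueError("expected a JSON object")
            return Verdict(**data)  # raises ValidationError on schema mismatch
        except (ValueError, ValidationError) as exc:
            last_error = exc        # malformed output is retried, never returned
    raise MalformedResponseError(
        f"no valid response after {max_attempts} attempts"
    ) from last_error
```

Here `call_model` stands in for whatever function sends the normalized image and prompt to the vision model and returns its raw text. Keeping the rationale and evidence as required fields is one way to enforce that a verdict never arrives without its reasons.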

What the model looks for

Photographic signatures

Real cameras leave consistent fingerprints: fine-grained sensor noise with similar statistics across the frame, chromatic aberration at high-contrast edges, and depth-of-field falloff that matches the physics of a real lens. Diffusion models do not reproduce these fingerprints consistently.

Anatomy and geometry

Hands, ears, jewelry, eye reflections, and architectural perspective are where generators most commonly fail. The model is asked specifically to scrutinize these and to flag any structural inconsistencies it finds.

Texture coherence

Diffusion outputs tend to contain regular, repeating fine-detail patterns where real images show natural, irregular variation. Fabric weaves, skin pores, hair strands, and background bokeh are diagnostic.

Prose coherence in image text

Any text in the image — signs, jewelry inscriptions, license plates — gets read. Generators routinely produce text that looks correct from a distance but doesn't parse as any real language. This is one of the strongest signals when it's present.

Why we trust the verdict

Every time the prompt changes, the eval harness runs it against a labeled image set and produces a fresh accuracy report: overall accuracy, precision, recall, a per-class breakdown, and a list of misclassifications. Prompts that don't improve the score don't ship. The rationale isn't decoration either: because every verdict has to be backed by specific evidence, it's the constraint that keeps the model honest.
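The report described above can be computed with a few counters. This is a sketch under stated assumptions, not the actual harness: the `(image_id, true_label, predicted_label)` tuple layout, the `accuracy_report` and `should_ship` names, and the use of overall accuracy as "the score" are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

LABELS = ("ai_generated", "real_photo", "uncertain")


def accuracy_report(examples: List[Tuple[str, str, str]]) -> Dict:
    """Build a report from (image_id, true_label, predicted_label) tuples."""
    tp, fp, fn = Counter(), Counter(), Counter()
    misclassified = []
    for image_id, truth, predicted in examples:
        if truth == predicted:
            tp[truth] += 1
        else:
            fp[predicted] += 1  # predicted this label, but it was wrong
            fn[truth] += 1      # missed the true label
            misclassified.append((image_id, truth, predicted))

    per_class = {}
    for label in LABELS:
        predicted_n = tp[label] + fp[label]
        actual_n = tp[label] + fn[label]
        per_class[label] = {
            # None marks a metric that is undefined for this label
            "precision": tp[label] / predicted_n if predicted_n else None,
            "recall": tp[label] / actual_n if actual_n else None,
        }

    return {
        "accuracy": sum(tp.values()) / len(examples) if examples else None,
        "per_class": per_class,
        "misclassified": misclassified,
    }


def should_ship(candidate: Dict, baseline: Dict) -> bool:
    """Assumed gate: a prompt ships only if overall accuracy strictly improves."""
    return candidate["accuracy"] > baseline["accuracy"]
```

In this sketch an "uncertain" prediction on a labeled image counts as a miss for the true class, which is one possible convention. A harness could equally report abstentions separately.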

What it can't do

No detector is infallible. Adversarial post-processing — heavy compression, deliberate noise injection, screenshots of screenshots — degrades any signal. Generators improve every quarter and the methodology has to keep up. The verdict is a strong informed opinion, not a court judgment. Treat it like you'd treat a forensic analyst's report.
