The problem with a percentage
A score of "78% AI" is not actionable. It doesn't tell a journalist which part of
the image to scrutinize, doesn't help a moderator write a decision, and doesn't survive
an appeal. We built Clairfy because explainability is the product: the verdict and the
reasons arrive together or neither arrives at all.
The pipeline
- Image is normalized. EXIF stripped, resolution capped, format converted to something the vision model handles reliably.
- A frontier vision LLM analyzes the image against a carefully designed prompt (versioned, and evaluated against a labeled test set on every change). The prompt asks the model to look at specific signal classes: pixel-level statistics, optical signatures, anatomy, lighting consistency, frequency-domain artifacts, and prose coherence in any text the image contains.
- The model commits to one of three verdicts (ai_generated, real_photo, or uncertain) and emits a confidence score, a rationale paragraph, and a list of specific evidence items it found.
- The structured response is validated against a Pydantic schema before it's returned. Malformed responses are retried, not surfaced.
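The last step, validate then retry, can be sketched roughly as follows. This is a minimal illustration, not the production code: it swaps the Pydantic schema for a standard-library dataclass with hand-written checks so it runs without dependencies. The field names (`verdict`, `confidence`, `rationale`, `evidence`), the 0-to-1 confidence range, the `call_model` callable, and the limit of three attempts are all assumptions made for the example.

```python
import json
from dataclasses import dataclass
from typing import Callable, List

# The three verdicts named in the pipeline description.
ALLOWED_VERDICTS = {"ai_generated", "real_photo", "uncertain"}


class MalformedResponse(ValueError):
    """Raised when a model response does not match the expected shape."""


@dataclass(frozen=True)
class Analysis:
    verdict: str
    confidence: float  # assumed to lie in [0, 1]
    rationale: str
    evidence: List[str]


def parse_analysis(raw: str) -> Analysis:
    """Parse one raw JSON response and validate every field."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise MalformedResponse(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise MalformedResponse("top-level value must be a JSON object")

    verdict = data.get("verdict")
    if not isinstance(verdict, str) or verdict not in ALLOWED_VERDICTS:
        raise MalformedResponse(f"unknown verdict: {verdict!r}")

    confidence = data.get("confidence")
    # bool is a subclass of int, so it is rejected explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise MalformedResponse("confidence must be a number")
    if not 0.0 <= confidence <= 1.0:
        raise MalformedResponse("confidence must be between 0 and 1")

    rationale = data.get("rationale")
    if not isinstance(rationale, str) or not rationale.strip():
        raise MalformedResponse("rationale must be a non-empty string")

    evidence = data.get("evidence")
    if not isinstance(evidence, list) or not all(isinstance(e, str) for e in evidence):
        raise MalformedResponse("evidence must be a list of strings")

    return Analysis(verdict, float(confidence), rationale, list(evidence))


def analyze_with_retry(call_model: Callable[[], str], max_attempts: int = 3) -> Analysis:
    """Call the model until a response validates; never return a malformed one."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return parse_analysis(call_model())
        except MalformedResponse as exc:
            last_error = exc  # retry instead of surfacing the bad response
    raise RuntimeError(f"no valid response after {max_attempts} attempts") from last_error
```

The property that matters is that `analyze_with_retry` has only two outcomes: a fully validated `Analysis`, or an error. A partially filled result never reaches the caller, which matches the "verdict and reasons arrive together or neither arrives" rule.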
What the model looks for
Photographic signatures
Real cameras leave consistent fingerprints: fine-grained sensor noise whose statistics stay consistent across the frame, chromatic aberration at high-contrast edges, and depth-of-field falloff that matches the physics of a real lens. Diffusion models don't reproduce these consistently across an image.
Anatomy and geometry
Hands, ears, jewelry, eye reflections, and architectural perspective are common failure modes for generators. The model is asked specifically to scrutinize these and to flag any structural inconsistencies it finds.
Texture coherence
Diffusion outputs tend to produce regular, repeating fine-detail patterns where real images have natural sub-pixel variation. Fabric weaves, skin pores, hair strands, and background bokeh are diagnostic.
Prose coherence in image text
Any text in the image — signs, jewelry inscriptions, license plates — gets read. Generators routinely produce text that looks correct from a distance but doesn't parse as any real language. This is one of the strongest signals when it's present.
Why we trust the verdict
Every time the prompt changes, the eval harness runs it against a labeled image set and produces a fresh accuracy report — overall accuracy, precision, recall, per-class breakdown, and a list of misclassifications. Prompts that don't improve the score don't ship. The rationale isn't decoration — it's the constraint that keeps the model honest.
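As a rough illustration of what such a harness computes, the sketch below builds the report described above (overall accuracy, per-class precision and recall, and the list of misclassifications) and applies a simple ship gate. The function names `evaluate` and `should_ship` are hypothetical, and two things are assumptions rather than documented behavior: that "the score" means overall accuracy, and that per-class figures are reported for all three verdicts.

```python
from collections import Counter
from typing import Dict, List

CLASSES = ("ai_generated", "real_photo", "uncertain")


def evaluate(labels: List[str], predictions: List[str]) -> Dict:
    """Build an accuracy report for one prompt version on a labeled set."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    if not labels:
        raise ValueError("the labeled set is empty")

    # Count each (true label, predicted label) pair once.
    pairs = Counter(zip(labels, predictions))

    per_class = {}
    for cls in CLASSES:
        true_positive = pairs[(cls, cls)]
        predicted = sum(n for (_, p), n in pairs.items() if p == cls)
        actual = sum(n for (t, _), n in pairs.items() if t == cls)
        per_class[cls] = {
            # Report 0.0 when a class was never predicted or never present.
            "precision": true_positive / predicted if predicted else 0.0,
            "recall": true_positive / actual if actual else 0.0,
            "support": actual,
        }

    correct = sum(n for (t, p), n in pairs.items() if t == p)
    misclassified = [
        (index, truth, guess)
        for index, (truth, guess) in enumerate(zip(labels, predictions))
        if truth != guess
    ]
    return {
        "accuracy": correct / len(labels),
        "per_class": per_class,
        "misclassified": misclassified,
    }


def should_ship(candidate: Dict, baseline: Dict) -> bool:
    """Ship only on strict improvement; a tie does not ship (assumed gate)."""
    return candidate["accuracy"] > baseline["accuracy"]
```

A usage example on a four-image set:

```python
labels = ["ai_generated", "ai_generated", "real_photo", "real_photo"]
baseline = evaluate(labels, ["ai_generated", "real_photo", "real_photo", "uncertain"])
candidate = evaluate(labels, ["ai_generated", "ai_generated", "real_photo", "uncertain"])

print(baseline["accuracy"], candidate["accuracy"])  # 0.5 0.75
print(candidate["misclassified"])  # [(3, 'real_photo', 'uncertain')]
print(should_ship(candidate, baseline))  # True
```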
What it can't do
No detector is infallible. Adversarial post-processing — heavy compression, deliberate noise injection, screenshots of screenshots — degrades any signal. Generators improve every quarter and the methodology has to keep up. The verdict is a strong informed opinion, not a court judgment. Treat it like you'd treat a forensic analyst's report.