Verify AI Output

The new senior skill of the AI era: recognizing when a model is confidently wrong. This skill surfaces fabricated citations, hallucinated APIs, and unverifiable claims before you act on them, without training yourself to distrust everything AI produces.

About

A Claude Code skill that treats any AI output like the work of a fluent, confident junior with zero instinct for when they're outside their competence. Takes the original prompt, the AI's output, and the decision you're about to make based on it. Catalogues claims, recommendations, citations, and quantitative assertions, then flags three categories with exact quoted text: (1) fabrication risk — invented citations, API names, library functions, numbers; (2) over-generalization — 'studies show', 'most organizations', 'generally'; (3) unverifiable assertions that sound specific but were inferred from vibes. For each flag, names a concrete verification move with a time estimate: check the source, re-prompt with 'cite + mark uncertainty', run the code, ask a named expert. Ends with a safe-to-use verdict. Refuses to recommend 'verify everything' — triages, because verification has a cost.

The prompt

Paste-ready for Claude — fill in the <paste> blocks below.

<role>
You are a verification-taste coach for an engineering leader working with AI drafts. You treat AI output as the work of a confident, fluent junior employee with perfect memory for surface patterns and zero instinct for when they're outside their competence. Your job is to find the places in the output that are most likely hallucinated, unjustified, or smoothly wrong — especially the ones a skim won't catch. You are direct about which claims need verification before they can be acted on.
</role>

<instructions>
PHASE 1 — PARSE THE OUTPUT
Read the AI output. Catalogue its claims, recommendations, citations, and quantitative assertions.

PHASE 2 — FLAG THREE CATEGORIES
1. **Fabrication risk.** Specific facts, citations, quotes, API names, library functions, or numbers that the model could plausibly have invented. Name the exact line or phrase.
2. **Over-generalization risk.** Claims that are too broad, too confident, or apply a pattern beyond where the evidence supports it. Flag phrases like "studies show", "it's well known", "generally", "most organizations".
3. **Unverifiable assertion.** Claims that sound specific but can't actually be checked from the inputs — the model inferred them from vibes.

For each flag, name the exact text and the risk in one sentence.

PHASE 3 — VERIFICATION MOVES
For each high-risk flag, name the specific verification move: what would the leader do, in <15 minutes, to confirm or deny this claim? Options include: check the source, re-prompt with explicit "cite your sources and mark uncertainty", run the code, ask a named expert, compare to a known-good example.

PHASE 4 — SAFE-TO-USE VERDICT
One sentence: is this output safe to use as-is, safe after verification, or should it be discarded? If you recommend discarding, say why.

INPUTS:
- What I asked the AI to do: <paste the prompt>
- The AI's output: <paste the output>
- The decision or action I'm about to take based on this output: <paste>
</instructions>

<output>
Markdown document:

1. **Fabrication risks** — bullets, each: exact text in quotes + risk in one sentence.
2. **Over-generalization risks** — bullets, same format.
3. **Unverifiable assertions** — bullets, same format.
4. **Verification moves** — table: Flag | Move | Time estimate.
5. **Verdict** — safe as-is / safe after verification / discard, in one sentence with reason.

Total length ≤600 words.
</output>

<guardrails>
- Quote exact text when flagging. Paraphrasing the flag hides which part is problematic.
- Do not flag something as a risk unless you can explain why that specific kind of claim is prone to model error.
- If the output is clean, say so plainly. Defensive over-flagging trains the leader to ignore this prompt.
- When flagging citations, note that AI-generated citations are unusually prone to fabrication even when the surrounding claim is true. Always verify.
- If the stakes are high (code that will deploy, decision that will communicate externally), recommend verification even for medium-risk flags.
- Do not recommend "verify everything". Triage — verification has a cost.
</guardrails>
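
If you run this check often, you can drive the prompt from a script instead of pasting it by hand. Below is a minimal sketch using the Anthropic Python SDK; the model name, the file names, and the scrutinize helper are illustrative assumptions, not part of the skill.

```python
# Minimal sketch: run the verification prompt programmatically.
# Assumes the `anthropic` package is installed and ANTHROPIC_API_KEY is set.
from pathlib import Path

import anthropic

# The prompt above, saved verbatim to a local file (hypothetical file name).
PROMPT_TEMPLATE = Path("verify_ai_output_prompt.txt").read_text()

def scrutinize(original_prompt: str, ai_output: str, decision: str) -> str:
    """Fill the <paste> blocks and return the model's verification report."""
    filled = (
        PROMPT_TEMPLATE
        .replace("<paste the prompt>", original_prompt)
        .replace("<paste the output>", ai_output)
        .replace("<paste>", decision)
    )
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    response = client.messages.create(
        model="claude-sonnet-4-5",  # assumption: use whichever model you have access to
        max_tokens=1500,            # the report itself is capped at roughly 600 words
        messages=[{"role": "user", "content": filled}],
    )
    return response.content[0].text

if __name__ == "__main__":
    report = scrutinize(
        original_prompt="Summarize the Q3 incident postmortems.",
        ai_output=Path("draft.md").read_text(),
        decision="Sending this summary to the exec team tomorrow.",
    )
    print(report)
```

Keeping the template in a file rather than inline means edits to the guardrails never require touching the script.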

Permissions

- Web search (optional; for citation verification via MCP)
- Local filesystem (optional; for cross-checking against source files)
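
Because verification has a cost, a cheap local pre-pass can help triage a draft before you spend a model call on the full prompt. Here is a minimal sketch of a Phase 2 phrase scan in Python; the phrase list is illustrative and deliberately incomplete, so extend it for your domain.

```python
import re

# Over-generalization tells, drawn from Phase 2 of the prompt above.
# The last entry is an assumption (a common tell, not named in the prompt).
OVERGENERALIZATION_TELLS = [
    r"studies show",
    r"it'?s well known",
    r"\bgenerally\b",
    r"most organizations",
    r"experts agree",
]

def pre_scan(draft: str) -> list[tuple[int, str]]:
    """Return (line_number, matched_pattern) pairs for each tell found."""
    hits = []
    for lineno, line in enumerate(draft.splitlines(), start=1):
        for pattern in OVERGENERALIZATION_TELLS:
            if re.search(pattern, line, flags=re.IGNORECASE):
                hits.append((lineno, pattern))
    return hits

# Example: a draft line that trips two tells at once.
draft = "Studies show that most organizations under-invest in testing."
for lineno, pattern in pre_scan(draft):
    print(f"line {lineno}: matches {pattern!r}")
```

A hit here is a reason to run the full prompt, not a verdict on its own; the absence of these phrases proves nothing.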
AI Output Scrutiny, published by AIWise (curated AI skills for professionals)

Open source · Runs locally · No data collection · MCP native · Free forever

What engineering managers are saying

Mar 20, 2026

I was about to send an exec summary that cited two studies I'd taken from an AI draft. Ran the Scrutiny skill first — one of the studies doesn't exist. The citation looked perfect: journal, year, author initials, page number. None of it real. Now I run this on anything AI-drafted before it leaves my hands.

Marcus Rivera

VP Engineering, E-commerce

Mar 7, 2026

The triage is what makes this usable. It flags three levels and names a 15-minute verification move per flag — not 'verify everything, never trust AI'. My engineers actually run it because the cost is priced in.

Priya Raghavan

Engineering Director, Developer Tools

Feb 28, 2026

Over-generalization flags are where I learned the most. Phrases like 'most organizations' or 'it's well known' were in almost every AI draft I'd been sending to leadership. Tightened them up, the drafts got better immediately.

Yuki Tanaka

Engineering Manager, Platform

Feb 12, 2026

The safe-to-use verdict is the feature I share with my managers. It ends with one sentence: as-is / after verification / discard. Before, reviewers would send back vague feedback; now they send a verdict and the specific thing to verify.

Daniel Okafor

VP Engineering, B2B SaaS

Also recommended

1. Citation Verifier (AIWise): Extracts every citation, link, paper, and quote from an AI draft and checks whether the source actually exists and says what the model claims it says.

2. Code Output Verifier (AIWise): For AI-written code, checks the parts the reviewer can't eyeball, such as hallucinated library functions, made-up API signatures, and invented config keys.

3. Claim Grounding Check (AIWise): Takes a piece of AI output and re-prompts the model for grounded sources on each claim, then flags the ones the model can't back up when asked.

4. Exa (Exa): A neural search API designed as the grounding layer for LLMs; returns clean, source-backed results so you can fact-check an AI draft in seconds.
Exa