AI Skills for

Run Perf Season

Draft evidence-backed reviews, build defensible promo packets, and walk into calibration with briefs the room will take seriously. Three AI skills from the Engineering Leader AI Playbook that handle the evidence synthesis so you own the narrative and the delivery.


About

A one-shot Claude Code skill that takes tagged inputs (self-assessment, shipped projects, 1:1 notes, peer feedback, metrics) and drafts a four-section review: Impact, Scope & Leadership, Growth Areas, Rating Band. Every sentence cites an evidence tag. Anything without a tagged source gets marked [UNSUPPORTED — needs manager input] rather than softened into generic prose. Hype words are banned. Bias patterns in your inputs (80% of peer quotes from one sub-team, 1:1 notes clustered only in the last month) are surfaced before the narrative. The output is a defensible draft — you own the narrative edit and the delivery.
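The two bias patterns named above can also be checked on the raw inputs before pasting. The Python sketch below is one possible pre-check and is not part of the skill itself. The function names, the (team, quote) tuple shape, the 30-day window, and the sample data are all assumptions made for illustration, and both functions assume non-empty inputs.

```python
from collections import Counter
from datetime import date


def peer_source_share(peer_quotes):
    """Return (team, share) for the team that contributed the most peer quotes."""
    counts = Counter(team for team, _quote in peer_quotes)
    team, n = counts.most_common(1)[0]
    return team, n / len(peer_quotes)


def recent_note_share(note_dates, period_end, window_days=30):
    """Fraction of 1:1 notes dated within window_days of the review period end."""
    recent = [d for d in note_dates if (period_end - d).days <= window_days]
    return len(recent) / len(note_dates)


# Hypothetical sample data, for illustration only.
peer_quotes = [
    ("payments", "Unblocked our migration."),
    ("payments", "Reviewed my design doc in a day."),
    ("payments", "Paired with me on the rollout."),
    ("payments", "Caught a race condition in review."),
    ("growth", "Clear status updates."),
]
note_dates = [date(2026, 1, 10), date(2026, 3, 5), date(2026, 3, 19), date(2026, 3, 26)]

team, share = peer_source_share(peer_quotes)
recent = recent_note_share(note_dates, date(2026, 3, 31))
print(f"{share:.0%} of peer quotes come from {team}")
print(f"{recent:.0%} of 1:1 notes fall in the last 30 days")
```

A high share on either measure is a prompt to gather more evidence, not a verdict. What counts as too concentrated is left to the manager.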

The prompt

Paste-ready for Claude. Fill in the angle-bracket placeholders (for example, <paste level rubric>) in the INPUTS block before running.

<role>
You are a perf-review drafting partner for an engineering leader. You write the way a calibration panel talks among itself: specific, evidence-first, non-euphemistic, willing to name underperformance without personality labels. You refuse to produce prose that sounds good but can't be defended. When evidence is missing, you say so; you do not backfill with generic manager language.
</role>

<instructions>
You will receive a block of tagged raw inputs about one engineer. Produce a review narrative that maps every claim to evidence.

PHASE 1 — PARSE THE INPUTS
Read everything inside the tagged sections below and mentally catalogue:
- What outcomes (not activities) this person shipped
- Where they operated above or below level, citing the rubric
- What the peer feedback pattern actually says (look for consistency across sources)
- Where the evidence is thin, absent, or contradictory
- Any bias signal in the inputs themselves (recency-weighted, halo, affinity)

PHASE 2 — DRAFT THE NARRATIVE
Write in four sections, in this order:
1. Impact — outcomes, not activities. Lead with business or customer impact. Every sentence cites a tag, e.g., (SHIPPED: Project Foo) or (PEER: cross-team PM).
2. Scope & leadership — where they operated above level (with evidence); where they operated below level (also with evidence).
3. Growth areas — specific, observable behaviors only. No personality labels ("not collaborative", "low ownership"). Rephrase into behaviors ("did not surface blocker on Project Bar until week 4").
4. Rating band — suggested rating with three defensibility bullets. If calibration will push back, anticipate how.

PHASE 3 — FLAG WHAT'S MISSING
Before closing, append a short "Manager input needed" section listing:
- Claims you could not tie to tagged evidence → marked [UNSUPPORTED]
- Dimensions of the rubric the inputs don't cover at all
- Bias patterns you detected in the inputs (e.g., 80% of peer quotes are from one team)

INPUTS (paste before running):
- Engineer: <name>, level <L>, role <IC/lead>, team <team>, review period <dates>.
- Career-ladder expectations at this level (paste verbatim from the rubric):
  <paste level rubric>
- Raw evidence, tagged:
  [SELF] <self-assessment>
  [SHIPPED] <projects, links, scope, outcomes>
  [ONE_ON_ONES] <notes across the period>
  [PEER_FEEDBACK] <quotes with source>
  [METRICS] <quantitative signal: reliability, DORA, review throughput, etc.>
</instructions>

<output>
Produce a markdown document, ≤700 words total, with these sections:

1. IMPACT (prose, ≤200 words) — outcomes with inline evidence tags.
2. SCOPE & LEADERSHIP (prose, ≤150 words) — above-level and below-level observations with evidence.
3. GROWTH AREAS (bullets, 3–5) — each a specific behavior with evidence and a concrete next move.
4. RATING BAND (callout) — suggested band (Exceeds / Strong Meets / Meets / Below) with 3 defensibility bullets.
5. MANAGER INPUT NEEDED (bullets) — unsupported claims, rubric gaps, bias flags.

No adjectives without evidence. No sentences that could appear in any engineer's review.
</output>

<guardrails>
- Only draw from the tagged inputs. Do not invent projects, quotes, or metrics.
- If you cannot support a claim with a tagged source, mark it [UNSUPPORTED — needs manager input] rather than softening it into prose.
- Do not use personality descriptors ("proactive", "low energy", "strong communicator") without a specific observed behavior attached.
- Do not use the words "rockstar", "ninja", "superstar", "A-player", or any hype register.
- If the inputs contain bias signal (all peer feedback from one sub-team, 1:1 notes cluster only in the last month, self-assessment dominates the shipped-work list), flag it in the Manager Input section. Do not silently correct for it.
- If growth areas contradict peer feedback, surface the contradiction rather than picking a side.
- If the rating band is genuinely unclear from the inputs, say "insufficient evidence for a confident rating" instead of guessing.
</guardrails>
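A returned draft can be spot-checked against a few of these guardrails mechanically before the manager's own edit. The sketch below is a hypothetical helper (`lint_draft` is not part of the skill) that checks the 700-word cap, the banned hype words, and sentences with neither an evidence tag nor an [UNSUPPORTED marker. It assumes citations appear in parentheses as in the prompt's examples, and its sentence splitting is deliberately naive, so treat hits as things to look at, not as failures.

```python
import re

# Banned words come from the guardrails; the tag names come from the INPUTS block.
BANNED = ("rockstar", "ninja", "superstar", "a-player")
TAG = re.compile(
    r"\((?:SELF|SHIPPED|ONE_ON_ONES|PEER(?:_FEEDBACK)?|METRICS)\b[^)]*\)"
)


def lint_draft(draft, word_cap=700):
    """Return simple guardrail findings for a markdown draft."""
    words = len(draft.split())
    lowered = draft.lower()
    hype = [w for w in BANNED if re.search(rf"\b{re.escape(w)}\b", lowered)]
    # Skip blank lines and markdown headings, then split lines into sentences.
    body = [ln for ln in draft.splitlines()
            if ln.strip() and not ln.lstrip().startswith("#")]
    sentences = [s.strip() for ln in body
                 for s in re.split(r"(?<=[.!?])\s+", ln) if s.strip()]
    untagged = [s for s in sentences
                if not TAG.search(s) and "[UNSUPPORTED" not in s]
    return {"over_cap": words > word_cap, "hype_words": hype, "untagged": untagged}


# Hypothetical draft fragment, for illustration only.
sample = """# IMPACT
Cut p95 checkout latency from 900 ms to 400 ms (METRICS: latency dashboard).
She is a rockstar on incident response.
"""
report = lint_draft(sample)
print(report)
```

In this sample the second sentence is reported twice, once for the hype word and once for the missing evidence tag.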

Permissions

None (operates on pasted text; no external integrations required)
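Because the skill works on pasted text alone, the tagged INPUTS block can be assembled by hand or with a small script. The sketch below shows one way to do that. `build_inputs`, its parameters, and the sample values are hypothetical. Only the five tag names and the line layout are taken from the prompt. Tags with no evidence are written out as "(none provided)" so the gap stays visible to the model instead of disappearing.

```python
# Tag names as they appear in the prompt's INPUTS block.
TAGS = ("SELF", "SHIPPED", "ONE_ON_ONES", "PEER_FEEDBACK", "METRICS")


def build_inputs(engineer, rubric, evidence):
    """Format engineer details, rubric text, and tagged evidence for pasting."""
    unknown = set(evidence) - set(TAGS)
    if unknown:
        raise ValueError(f"unknown evidence tags: {sorted(unknown)}")
    lines = [
        f"- Engineer: {engineer}",
        "- Career-ladder expectations at this level (paste verbatim from the rubric):",
        f"  {rubric}",
        "- Raw evidence, tagged:",
    ]
    for tag in TAGS:
        # An empty or missing tag is kept visible rather than dropped.
        items = evidence.get(tag) or ["(none provided)"]
        lines += [f"  [{tag}] {item}" for item in items]
    return "\n".join(lines)


# Hypothetical values, for illustration only.
block = build_inputs(
    "Sam Lee, level L4, role IC, team Payments, review period H1",
    "Owns medium-sized projects end to end.",
    {"SHIPPED": ["Checkout retry queue, launched in March"],
     "PEER_FEEDBACK": ["'Unblocked our migration' (PM, Growth)"]},
)
print(block)
```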
Performance Reviews

Perf Review Draft Generator

🏆 #1 Skill for Engineering Managers

Turn raw 1:1s, shipped work, peer quotes, and self-assessments into an evidence-backed review mapped to your ladder — with unsupported claims flagged instead of papered over

AIWise

Curated AI skills for professionals. Free, open source, and built on Claude Code.

Evidence-First
Bias Flagging
Open Source
Runs Locally
Free Forever

What engineering managers are saying

Mar 28, 2026

First cycle I tried this I had an engineer I was 'pretty sure' was Strong Meets. The draft flagged that 70% of my peer quotes came from one squad and three of my shipped-work claims had no tagged evidence. I went back, asked two more people, and the rating shifted down by one band. That's not a feature, that's a decision aid.

Helena Moss

Director of Engineering, B2B SaaS

Mar 18, 2026

I run reviews for 14 reports. Before this, the bottleneck was staring at a blank doc at 10pm trying to remember what someone shipped in February. Now I paste tagged evidence into the prompt, it gives me a defensible draft in 90 seconds, and I spend my time where it matters — rewriting the delivery, not assembling the inputs.

Rahul Venkatesan

VP Engineering, Series C Fintech

Mar 5, 2026

The ban on 'rockstar', 'ninja', 'strong communicator' without an attached behavior is the whole product. I had drafted 'Alex is a proactive collaborator' twice before. This skill refused and asked for the specific observed behavior. Turns out I didn't have one — I had an impression. Saved me from a calibration embarrassment.

Kenji Watanabe

Engineering Manager, Developer Platform

Feb 22, 2026

Pairs well with the Promo Packet skill — run the perf draft first, then use the draft as input to the promo packet for anyone on track. Four-star because the 700-word cap is tight for my senior-staff reviews, but that's more about my writing than the prompt.

Sofia Castellanos

Engineering Manager, ML Platform

Also recommended

Promo Packet Evidence Synthesizer

Map 12–18 months of evidence against the next-level rubric, generate a packet a calibration panel will take seriously, and surface unmet dimensions with a 60–90 day fill plan

AIWise

Calibration Brief Builder

Per-report one-pagers plus a cross-team distribution view — bring this into the calibration room and hold the line on defensible ratings

AIWise

Coaching Rehearsal

Rehearse the hard review conversation out loud against a realistic report — with specific feedback on where you softened, where you talked too early, and one rephrased line

AIWise