claude-anvil

Evidence-first code review for Claude Code.

A Claude Code plugin that verifies before presenting — attacking its own output with multiple AI models, recording every check in SQL, and refusing to show broken code.

A Claude Code port of burkeholland/anvil, which Burke Holland wrote for GitHub Copilot CLI. Ported and maintained by Aleksei Lutkov.

Get Started

Install via the plugin marketplace, then run the setup wizard once to configure reviewers and API keys.

/plugin marketplace add allut/claude-anvil
/plugin install claude-anvil@allut-claude-anvil
/claude-anvil:anvil-setup

Then invoke on any task:

/anvil fix the login crash in auth/session.py

AI That Claims It Checked — but Didn't

Claude writes code, says "build passed ✅", and moves on — without running a single command. You ship the PR. CI fails. The bug was obvious. The self-reported verification was fiction.

claude-anvil makes that impossible: every verification step is a tool call with an exit code, recorded in a SQLite ledger. No INSERT, no claim.
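As a rough illustration of the "No INSERT, no claim" rule, here is a minimal Python sketch. The `checks` table, its columns, and the helpers `open_ledger`, `run_and_record`, and `claim` are hypothetical names chosen for this example; the plugin's actual schema and tooling may differ.

```python
import sqlite3
import subprocess
import sys

def open_ledger(path=":memory:"):
    """Open a ledger with a hypothetical `checks` table."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS checks ("
        "name TEXT, command TEXT, exit_code INTEGER)"
    )
    return db

def run_and_record(db, name, argv):
    """Run a real command and INSERT its exit code into the ledger."""
    result = subprocess.run(argv, capture_output=True, text=True)
    db.execute(
        "INSERT INTO checks (name, command, exit_code) VALUES (?, ?, ?)",
        (name, " ".join(argv), result.returncode),
    )
    db.commit()
    return result.returncode

def claim(db, name):
    """Report pass/fail only if a ledger row exists; otherwise refuse."""
    row = db.execute(
        "SELECT exit_code FROM checks WHERE name = ? "
        "ORDER BY rowid DESC LIMIT 1",
        (name,),
    ).fetchone()
    if row is None:
        raise LookupError(f"no ledger row for {name!r}: nothing was run")
    return "pass" if row[0] == 0 else "fail"

if __name__ == "__main__":
    ledger = open_ledger()
    # Stand-in for a real build command: a child process that exits 0.
    run_and_record(ledger, "build", [sys.executable, "-c", "raise SystemExit(0)"])
    print(claim(ledger, "build"))
```

The point of the design is that `claim` never trusts a statement, only a row: if nothing was executed, there is nothing to select, and the claim cannot be made.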

The Forge

multi-model

Adversarial review by Claude, GPT-4o, Gemini, and Ollama

After implementing, claude-anvil spawns up to three independent reviewers in parallel — different models, different blind spots. Findings get fixed before you see a line of code.
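The parallel review step might look something like the sketch below. It is only an illustration: `run_reviews`, `collect_findings`, and the two stand-in reviewer functions are hypothetical, and real reviewers would call model APIs rather than inspect strings.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_REVIEWERS = 3  # the text above says "up to three" reviewers

def run_reviews(diff, reviewers):
    """Run each reviewer on the same diff in parallel; return {name: findings}."""
    chosen = dict(list(reviewers.items())[:MAX_REVIEWERS])
    with ThreadPoolExecutor(max_workers=MAX_REVIEWERS) as pool:
        futures = {name: pool.submit(fn, diff) for name, fn in chosen.items()}
        return {name: future.result() for name, future in futures.items()}

def collect_findings(results):
    """Flatten all findings so they can be fixed before anything is presented."""
    return [(name, finding)
            for name, findings in results.items()
            for finding in findings]

# Hypothetical stand-in reviewers; real ones would query different models.
def reviewer_a(diff):
    return []

def reviewer_b(diff):
    unguarded = "user.name" in diff and "if user" not in diff
    return ["Missing null check"] if unguarded else []

if __name__ == "__main__":
    results = run_reviews(
        "print(user.name)",
        {"review-a": reviewer_a, "review-b": reviewer_b},
    )
    print(collect_findings(results))
```

Each reviewer sees the same diff independently, so one reviewer's verdict cannot influence another's.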

ledger

SQL-backed verification ledger

Every check — build exit codes, test results, linter output, reviewer verdicts — is INSERTed into ~/.claude-anvil/anvil.db. The evidence bundle is a SELECT query, not prose.
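To make "the evidence bundle is a SELECT query" concrete, here is a small Python sketch. The `ledger` table and its columns are assumptions for illustration only, and the example uses an in-memory database rather than the real `~/.claude-anvil/anvil.db`.

```python
import sqlite3

def make_ledger():
    """Create an in-memory ledger with a hypothetical schema."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE ledger ("
        "id INTEGER PRIMARY KEY, phase TEXT, check_name TEXT, "
        "tool TEXT, result TEXT, detail TEXT)"
    )
    return db

def record(db, phase, check_name, tool, result, detail):
    """INSERT one verification result."""
    db.execute(
        "INSERT INTO ledger (phase, check_name, tool, result, detail) "
        "VALUES (?, ?, ?, ?, ?)",
        (phase, check_name, tool, result, detail),
    )

def evidence_bundle(db):
    """Build the bundle purely from stored rows, in insertion order."""
    rows = db.execute(
        "SELECT phase, check_name, tool, result, detail "
        "FROM ledger ORDER BY id"
    ).fetchall()
    return [" | ".join(row) for row in rows]

if __name__ == "__main__":
    db = make_ledger()
    record(db, "baseline", "build", "npm", "pass", "exit 0")
    record(db, "after", "tests", "pytest", "pass", "47 passed in 3.2s")
    for line in evidence_bundle(db):
        print(line)
```

Because the bundle is generated from rows, a check that never ran simply does not appear in it.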

verification

Build / test / lint / IDE diagnostics — all tiers

Tier 1 runs IDE diagnostics. Tier 2 runs the project's own build and test commands, discovered dynamically from config files. If Tiers 1–2 yield no runtime signal, a Tier 3 smoke test runs instead.
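The tiering could be sketched as follows, assuming Tier 1 is IDE diagnostics and Tier 2 is the discovered build and test commands. The config-to-command mapping and the function names are hypothetical, and "no runtime signal" is interpreted here, as an assumption, to mean that no build or test command was discovered.

```python
import tempfile
from pathlib import Path

# Hypothetical mapping from a config file to (check name, command).
# A real discovery step would also read the file, e.g. to confirm
# that package.json actually defines a build script.
KNOWN_CONFIGS = {
    "package.json": ("build", ["npm", "run", "build"]),
    "pyproject.toml": ("tests", ["pytest"]),
    "Cargo.toml": ("build", ["cargo", "build"]),
}

def discover_commands(project_dir):
    """Return (check, argv) pairs for config files present in the project."""
    root = Path(project_dir)
    return [entry for name, entry in KNOWN_CONFIGS.items()
            if (root / name).exists()]

def plan_verification(project_dir, ide_diagnostics_available):
    """Plan Tier 1 and Tier 2 checks, adding Tier 3 only as a fallback."""
    plan = []
    if ide_diagnostics_available:
        plan.append(("tier1", "ide-diagnostics", None))
    for check, argv in discover_commands(project_dir):
        plan.append(("tier2", check, argv))
    # Assumption: without any build/test command there is no runtime signal.
    if not any(tier == "tier2" for tier, _, _ in plan):
        plan.append(("tier3", "smoke-test", None))
    return plan

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as project:
        Path(project, "pyproject.toml").write_text("")
        for step in plan_verification(project, ide_diagnostics_available=True):
            print(step)
```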

baseline

Baseline snapshots and regression detection

State is captured before any edit. After implementation, regressions — checks that went green→red — are surfaced explicitly in the evidence bundle.
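Regression detection of this kind reduces to comparing two snapshots. The sketch below assumes each snapshot is a plain mapping from check name to "pass" or "fail"; that representation and the name `find_regressions` are illustrative, not taken from the plugin.

```python
def find_regressions(baseline, after):
    """Return checks that passed in the baseline but fail afterwards."""
    return sorted(
        name
        for name, result in after.items()
        if baseline.get(name) == "pass" and result == "fail"
    )

if __name__ == "__main__":
    baseline = {"build": "pass", "tests": "pass", "lint": "fail"}
    after = {"build": "pass", "tests": "fail", "lint": "fail"}
    print(find_regressions(baseline, after))
```

A check that was already failing before the edit (here `lint`) is not counted as a regression, which is what makes the baseline snapshot useful.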

memory

Session memory of past failures

Lessons from previous sessions are recalled per-file before planning. If a past session touched this file and caused a regression, you'll know before a single line changes.
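Per-file recall could be as simple as a lookup keyed on file path. The sketch below assumes a hypothetical `lessons` table; how the plugin actually stores session memory is not described here.

```python
import sqlite3

def make_memory():
    """Create an in-memory store with a hypothetical `lessons` table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE lessons ("
        "id INTEGER PRIMARY KEY, file_path TEXT, lesson TEXT)"
    )
    return db

def remember(db, file_path, lesson):
    """Store a lesson learned while editing a file."""
    db.execute(
        "INSERT INTO lessons (file_path, lesson) VALUES (?, ?)",
        (file_path, lesson),
    )

def recall_lessons(db, files):
    """Return {file_path: [lessons]} for the files about to be edited."""
    files = list(files)
    if not files:
        return {}
    placeholders = ", ".join("?" for _ in files)
    rows = db.execute(
        "SELECT file_path, lesson FROM lessons "
        f"WHERE file_path IN ({placeholders}) ORDER BY id",
        files,
    ).fetchall()
    recalled = {}
    for file_path, lesson in rows:
        recalled.setdefault(file_path, []).append(lesson)
    return recalled

if __name__ == "__main__":
    db = make_memory()
    remember(db, "auth/session.py", "Changing token expiry broke the login tests")
    print(recall_lessons(db, ["auth/session.py"]))
```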

git

Git automation — branch, stash, rollback

Checks for dirty state and main-branch edits before starting. Creates a feature branch automatically for Medium and Large tasks. After presenting, commits with a structured message.
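The pre-flight checks might be sketched like this. The decision logic is kept separate from the git calls so it can be reasoned about on its own; the action names, the set of protected branch names, and the example branch name are all assumptions made for illustration.

```python
import subprocess

PROTECTED_BRANCHES = {"main", "master"}  # assumed set of protected names

def git(*args):
    """Run a git command and return its stdout; raises if git exits non-zero."""
    completed = subprocess.run(
        ["git", *args], capture_output=True, text=True, check=True
    )
    return completed.stdout.strip()

def preflight(status_porcelain, branch, task_size):
    """Decide pre-edit actions from already-collected git output."""
    actions = []
    if status_porcelain.strip():
        actions.append("warn-dirty")        # uncommitted changes present
    if branch in PROTECTED_BRANCHES:
        actions.append("warn-main-branch")  # editing directly on main
    if task_size in {"medium", "large"}:
        actions.append("create-branch")
    return actions

def preflight_from_repo(task_size):
    """Collect state from the current repository and plan actions."""
    return preflight(
        git("status", "--porcelain"),
        git("rev-parse", "--abbrev-ref", "HEAD"),
        task_size,
    )
```

In a real repository, a "create-branch" action could then be carried out with something like `git("switch", "-c", "anvil/fix-login-crash")`, where the branch name is made up for this example.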

What the Output Looks Like

Every Medium and Large task ends with a bundle pulled from SQL — not written by hand.

| Phase | Check | Tool | Result | Detail |
| --- | --- | --- | --- | --- |
| baseline | ide-diagnostics | mcp-ide | pass | 0 errors, 0 warnings |
| baseline | build | npm | pass | exit 0 |
| after | ide-diagnostics | mcp-ide | pass | 0 errors, 0 warnings |
| after | build | npm | pass | exit 0 |
| after | tests | pytest | pass | 47 passed in 3.2s |
| review | review-claude | anvil-review | pass | No issues found |
| review | review-gemini | anvil-review | fail | Missing null check; fixed before presenting |
| review | review-ollama | anvil-review | pass | No issues found |

Open Source

The plugin is open source. Contributions, issues, and feedback are welcome.

github.com/allut/claude-anvil