claude-anvil
Evidence-first code review for Claude Code.
A Claude Code plugin that verifies before presenting: it attacks its own output with multiple AI models, records every check in a SQLite ledger, and refuses to show broken code.
A Claude Code port of burkeholland/anvil, Burke Holland's original for GitHub Copilot CLI, ported and maintained by Aleksei Lutkov.
install
Get Started
Install via the plugin marketplace, then run the setup wizard once to configure reviewers and API keys.
Then invoke it on any task:
the problem
AI That Claims It Checked — but Didn't
Claude writes code, says "build passed ✅", and moves on without having run a single command.
You ship the PR. CI fails. The bug was obvious. The self-reported verification was fiction.
claude-anvil makes that impossible: every verification step is a tool call with an exit code, recorded in a SQLite ledger. No INSERT, no claim.
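To make "No INSERT, no claim" concrete, here is a minimal Python sketch of the idea, assuming a ledger table named `checks` with the same columns as the evidence table further down. The helper `record_check` and the table layout are hypothetical illustrations, not claude-anvil's actual code: the point is only that the pass/fail verdict comes from a real process exit code and is written to SQLite before anyone can cite it.

```python
import sqlite3
import subprocess
import sys


def record_check(db: sqlite3.Connection, phase: str, check_name: str,
                 tool: str, cmd: list) -> int:
    """Run `cmd` for real and INSERT its exit code (hypothetical helper).

    The verdict is derived from the process exit code, never from the
    model's own description of what happened.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True)
    result = "pass" if proc.returncode == 0 else "fail"
    db.execute(
        "INSERT INTO checks (phase, check_name, tool, result, detail) "
        "VALUES (?, ?, ?, ?, ?)",
        (phase, check_name, tool, result, f"exit {proc.returncode}"),
    )
    db.commit()
    return proc.returncode


# Usage with an in-memory database and a hypothetical `checks` table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE checks (phase TEXT, check_name TEXT, tool TEXT, "
           "result TEXT, detail TEXT)")
record_check(db, "after", "build", "python", [sys.executable, "-c", "pass"])
record_check(db, "after", "tests", "python",
             [sys.executable, "-c", "raise SystemExit(3)"])
print(db.execute("SELECT check_name, result, detail FROM checks").fetchall())
```

A failing command still produces a row; what cannot happen is a "pass" that no command ever produced.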
what it does
The Forge
Adversarial review by Claude, GPT-4o, Gemini, and local models via Ollama
After implementing, claude-anvil spawns up to three independent reviewers in parallel, each backed by a different model so that their blind spots differ. Findings are fixed before you see a line of code.
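A minimal sketch of the parallel fan-out, assuming each reviewer is a function that takes a diff and returns a list of findings (an empty list meaning "pass"). `run_reviewers` and the stub reviewers are hypothetical; in a real setup each stub would wrap a call to a different model, and the plugin's own orchestration may work differently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical reviewer type: diff in, findings out (empty list = pass).
Reviewer = Callable[[str], List[str]]


def run_reviewers(diff: str, reviewers: Dict[str, Reviewer],
                  limit: int = 3) -> Dict[str, List[str]]:
    """Run at most `limit` reviewers concurrently; collect findings per reviewer."""
    chosen = list(reviewers.items())[:limit]
    with ThreadPoolExecutor(max_workers=max(1, len(chosen))) as pool:
        futures = {name: pool.submit(review, diff) for name, review in chosen}
        return {name: fut.result() for name, fut in futures.items()}


# Stub reviewers standing in for real model calls.
reviewers = {
    "review-claude": lambda diff: [],
    "review-gemini": lambda diff: [] if "is None" in diff else ["Missing null check"],
    "review-ollama": lambda diff: [],
}
findings = run_reviewers("def f(x): return x.name", reviewers)
blocking = {name: found for name, found in findings.items() if found}
print(blocking)  # {'review-gemini': ['Missing null check']}
```

Anything left in `blocking` would have to be fixed, and the reviewers rerun, before the result is presented.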
SQL-backed verification ledger
Every check — build exit codes, test results, linter output, reviewer verdicts — is INSERTed into ~/.claude-anvil/anvil.db. The evidence bundle is a SELECT query, not prose.
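As an illustration of "the evidence bundle is a SELECT query", here is a sketch of one possible ledger schema and bundle query. The table name, columns, and `CHECK` constraints are assumptions made for this example; the real layout of `~/.claude-anvil/anvil.db` may differ.

```python
import sqlite3

# Hypothetical schema; the real ~/.claude-anvil/anvil.db layout may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS checks (
    id         INTEGER PRIMARY KEY,
    task_id    TEXT NOT NULL,
    phase      TEXT NOT NULL CHECK (phase IN ('baseline', 'after', 'review')),
    check_name TEXT NOT NULL,
    tool       TEXT NOT NULL,
    result     TEXT NOT NULL CHECK (result IN ('pass', 'fail')),
    detail     TEXT
)
"""

# The bundle is only what this query returns: no row, no line in the bundle.
BUNDLE_QUERY = """
SELECT phase, check_name, tool, result, detail
FROM checks
WHERE task_id = ?
ORDER BY CASE phase WHEN 'baseline' THEN 0 WHEN 'after' THEN 1 ELSE 2 END, id
"""


def evidence_bundle(db: sqlite3.Connection, task_id: str) -> list:
    return db.execute(BUNDLE_QUERY, (task_id,)).fetchall()


db = sqlite3.connect(":memory:")  # a real ledger would live on disk
db.executescript(SCHEMA)
db.executemany(
    "INSERT INTO checks (task_id, phase, check_name, tool, result, detail) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("t1", "review", "review-claude", "anvil-review", "pass", "No issues found"),
        ("t1", "baseline", "build", "npm", "pass", "exit 0"),
        ("t1", "after", "build", "npm", "pass", "exit 0"),
    ],
)
for row in evidence_bundle(db, "t1"):
    print(row)
```

The `CHECK` constraints make SQLite itself reject a verdict other than `pass` or `fail`, so a vague "looks fine" cannot enter the ledger.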
Build / test / lint / IDE diagnostics — all tiers
Tier 1 is IDE diagnostics; Tier 2 is the project's own build and test commands, discovered dynamically from its config files. If Tiers 1 and 2 yield no runtime signal, a Tier 3 smoke test runs instead.
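The tier ordering can be sketched as a small planning function. Everything here is a hypothetical illustration: the `KNOWN_PROJECTS` mapping from config file to commands, the check names, and the reading of "no runtime signal" as "no test command was discovered" are assumptions, not the plugin's actual discovery logic.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple

# Hypothetical config-file -> (build command, test command) table.
# None means "no separate build step" for that ecosystem.
KNOWN_PROJECTS = {
    "package.json": (["npm", "run", "build"], ["npm", "test"]),
    "Cargo.toml": (["cargo", "build"], ["cargo", "test"]),
    "go.mod": (["go", "build", "./..."], ["go", "test", "./..."]),
    "pyproject.toml": (None, ["pytest"]),
}


def plan_checks(root: Path, ide_available: bool) -> List[Tuple[int, str, List[str]]]:
    """Return (tier, check name, command) triples in the order they should run."""
    plan: List[Tuple[int, str, List[str]]] = []
    if ide_available:
        plan.append((1, "ide-diagnostics", []))  # read from the IDE, not a shell
    found_tests = False
    for marker, (build, test) in KNOWN_PROJECTS.items():
        if (root / marker).is_file():
            if build is not None:
                plan.append((2, "build", build))
            plan.append((2, "tests", test))
            found_tests = True
    if not found_tests:
        # Assumption: "no runtime signal" means no test command was discovered.
        plan.append((3, "smoke-test", []))  # e.g. run the changed entry point once
    return plan


with tempfile.TemporaryDirectory() as d:
    (Path(d) / "package.json").write_text("{}")
    for step in plan_checks(Path(d), ide_available=True):
        print(step)
```

A real implementation would also read the config files (for example, which scripts a `package.json` defines) rather than relying on file presence alone.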
Baseline snapshots and regression detection
State is captured before any edit. After implementation, regressions — checks that went green→red — are surfaced explicitly in the evidence bundle.
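The green→red rule is a simple comparison between the two snapshots. This sketch assumes each snapshot is a mapping from check name to `"pass"` or `"fail"`; `find_regressions` is a hypothetical helper, not the plugin's API.

```python
from typing import Dict, List


def find_regressions(baseline: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """Return checks that were green at baseline and are red after the edit.

    A check that was already failing is not a regression, and a check with
    no baseline entry is a new failure rather than a regression.
    """
    return sorted(
        name for name, result in after.items()
        if result == "fail" and baseline.get(name) == "pass"
    )


baseline = {"build": "pass", "tests": "pass", "lint": "fail"}
after = {"build": "pass", "tests": "fail", "lint": "fail", "typecheck": "fail"}
print(find_regressions(baseline, after))  # ['tests']
```

Capturing the baseline first is what lets a pre-existing lint failure be told apart from one the edit introduced.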
Session memory of past failures
Lessons from previous sessions are recalled per-file before planning. If a past session touched this file and caused a regression, you'll know before a single line changes.
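A per-file recall can be sketched as one parameterized query. The `lessons` table, its columns, and the sample rows below are hypothetical and exist only to show the lookup.

```python
import sqlite3
from typing import Iterable, List, Tuple


def recall_lessons(db: sqlite3.Connection,
                   files: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Fetch notes from past sessions that caused a regression in any of `files`."""
    files = list(files)
    if not files:
        return []
    # Only "?" placeholders are interpolated; file names are bound as parameters.
    placeholders = ", ".join("?" for _ in files)
    return db.execute(
        "SELECT file, session_id, note FROM lessons "
        f"WHERE caused_regression = 1 AND file IN ({placeholders}) "
        "ORDER BY file, session_id",
        files,
    ).fetchall()


# Illustrative data in a hypothetical `lessons` table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE lessons (file TEXT, session_id TEXT, note TEXT, "
           "caused_regression INTEGER)")
db.executemany("INSERT INTO lessons VALUES (?, ?, ?, ?)", [
    ("src/auth.py", "s1", "token refresh broke login tests", 1),
    ("src/auth.py", "s2", "renamed helper, no issues", 0),
    ("src/db.py", "s1", "migration dropped an index", 1),
])
print(recall_lessons(db, ["src/auth.py"]))
```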
Git automation — branch, stash, rollback
Checks for a dirty working tree and for edits on the main branch before starting. Creates a feature branch automatically for Medium and Large tasks. After presenting, commits the changes with a structured message.
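The pre-flight decision can be sketched as a pure function over two pieces of git output. The policy here is an assumption for illustration: stashing a dirty tree, the `anvil/<slug>` branch name, and the warning for Small tasks on `main` are hypothetical, not claude-anvil's documented behavior.

```python
from typing import List, Tuple


def preflight(porcelain: str, branch: str, size: str,
              slug: str) -> Tuple[List[List[str]], List[str]]:
    """Decide which git commands to run before editing (hypothetical policy).

    porcelain: output of `git status --porcelain`
    branch:    output of `git rev-parse --abbrev-ref HEAD`
    """
    commands: List[List[str]] = []
    warnings: List[str] = []
    if porcelain.strip():
        # Dirty tree: set uncommitted work aside so it can be restored later.
        commands.append(["git", "stash", "push", "--include-untracked",
                         "-m", f"anvil: before {slug}"])
    if size in ("Medium", "Large"):
        commands.append(["git", "switch", "-c", f"anvil/{slug}"])
    elif branch in ("main", "master"):
        warnings.append(f"Small task will edit {branch} directly")
    return commands, warnings


cmds, warns = preflight(" M src/app.ts\n", "main", "Medium", "fix-null-check")
for cmd in cmds:
    print(" ".join(cmd))
```

In a real repository the two inputs would come from running `git status --porcelain` and `git rev-parse --abbrev-ref HEAD` with `subprocess.run(..., capture_output=True, text=True)`. Keeping the decision pure makes it testable without a repository.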
evidence
What the Output Looks Like
Every Medium and Large task ends with a bundle pulled from SQL — not written by hand.
| Phase | Check | Tool | Result | Detail |
|---|---|---|---|---|
| baseline | ide-diagnostics | mcp-ide | pass | 0 errors, 0 warnings |
| baseline | build | npm | pass | exit 0 |
| after | ide-diagnostics | mcp-ide | pass | 0 errors, 0 warnings |
| after | build | npm | pass | exit 0 |
| after | tests | pytest | pass | 47 passed in 3.2s |
| review | review-claude | anvil-review | pass | No issues found |
| review | review-gemini | anvil-review | fail | Missing null check — fixed before presenting |
| review | review-ollama | anvil-review | pass | No issues found |
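A table like the one above could be rendered directly from the rows a bundle query returns. `render_bundle` is a hypothetical helper written for this sketch; it assumes each row is a `(phase, check, tool, result, detail)` tuple.

```python
from typing import Iterable, Sequence


def render_bundle(rows: Iterable[Sequence[object]]) -> str:
    """Format (phase, check, tool, result, detail) rows as a Markdown table."""
    lines = ["| Phase | Check | Tool | Result | Detail |", "|---|---|---|---|---|"]
    for row in rows:
        # Escape pipes so a detail such as "a | b" cannot break the layout.
        cells = (str(cell).replace("|", "\\|") for cell in row)
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


print(render_bundle([
    ("baseline", "build", "npm", "pass", "exit 0"),
    ("review", "review-gemini", "anvil-review", "fail", "Missing null check"),
]))
```

Because the renderer only formats rows, every line in the output traces back to a ledger entry.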
source
Open Source
The plugin is open source. Contributions, issues, and feedback are welcome.
github.com/allut/claude-anvil