Name: claude-anvil
Author: Aleksei Lutkov

claude-anvil

Evidence-first code review for Claude Code.

A Claude Code plugin that verifies before presenting — attacking its own output with multiple AI models, recording every check in SQL, and refusing to show broken code.

A port of burkeholland/anvil for GitHub Copilot CLI by Burke Holland — ported and maintained for Claude Code by Aleksei Lutkov.

install

Get Started

Install via the plugin marketplace, then run the setup wizard once to configure reviewers and API keys.

/ plugin marketplace add allut/claude-anvil

/ plugin install claude-anvil@allut-claude-anvil

/ claude-anvil:anvil-setup

Then invoke on any task:

/ anvil fix the login crash in auth/session.py

the problem

AI That Claims It Checked — but Didn't

Claude writes code, says "build passed ✅", and moves on — without running a single command. You ship the PR. CI fails. The bug was obvious. The self-reported verification was fiction.

claude-anvil makes that impossible: every verification step is a tool call with an exit code, recorded in a SQLite ledger. No INSERT, no claim.

what it does

The Forge

multi-model

Adversarial review by Claude, GPT-4o, Gemini, and Ollama

After implementing, claude-anvil spawns up to three independent reviewers in parallel — different models, different blind spots. Findings get fixed before you see a line of code.

ledger

SQL-backed verification ledger

Every check — build exit codes, test results, linter output, reviewer verdicts — is INSERTed into ~/.claude-anvil/anvil.db. The evidence bundle is a SELECT query, not prose.

verification

Build / test / lint / IDE diagnostics — all tiers

IDE diagnostics first, then the project's own build and test commands discovered dynamically from config files. If Tiers 1–2 yield no runtime signal, a Tier 3 smoke test runs instead.

baseline

Baseline snapshots and regression detection

State is captured before any edit. After implementation, regressions — checks that went green→red — are surfaced explicitly in the evidence bundle.

memory

Session memory of past failures

Lessons from previous sessions are recalled per-file before planning. If a past session touched this file and caused a regression, you'll know before a single line changes.

git

Git automation — branch, stash, rollback

Checks for dirty state and main-branch edits before starting. Creates a feature branch automatically for Medium and Large tasks. Commits after presenting with a structured message.

evidence

What the Output Looks Like

Every Medium and Large task ends with a bundle pulled from SQL — not written by hand.

Phase	Check	Tool	Result	Detail
baseline	ide-diagnostics	mcp-ide	pass	0 errors, 0 warnings
baseline	build	npm	pass	exit 0
after	ide-diagnostics	mcp-ide	pass	0 errors, 0 warnings
after	build	npm	pass	exit 0
after	tests	pytest	pass	47 passed in 3.2s
review	review-claude	anvil-review	pass	No issues found
review	review-gemini	anvil-review	fail	Missing null check — fixed before presenting
review	review-ollama	anvil-review	pass	No issues found

source

Open Source

The plugin is open source. Contributions, issues, and feedback are welcome.

github.com/allut/claude-anvil →