Verify a branch-fix
// GUIDE · BRANCH-FIX
Test a candidate fix end-to-end.
Vivarium pairs a recipe (the bug) with a comparison surface (your fix) and a verdict that flips when the fix actually steers around the broken code path. This guide walks the loop end-to-end — Layer 1 (in-browser) and Layer 2/3 (Docker).
// 0 · WHAT THIS LOOP IS FOR
AI-slop verification — does this fix actually work?
An AI agent (Claude Code, Cursor, Cline, Continue, …) hands you a candidate fix for an upstream bug. The honest question is "did it actually fix the bug, or does it just look like a fix?" Vivarium's branch-fix loop answers that with a mechanical verdict flip: feed the fix through the recipe's runtime and check whether the bug still triggers.
Verdict semantics.
reproduced = the bug still triggers (your fix did not avoid it). unreproduced = the bug did not trigger (your fix steered around the broken code path). A working fix flips the verdict from reproduced to unreproduced.
Two paths share the same wire shape (Contract v1) and the same comparison surface (/repro/compare):
Path A (Layer 1, in-browser). You paste an alternative reproduction script onto the recipe page. The recipe's WASM runtime (php-wasm, ruby.wasm, Pyodide) re-runs it in your tab and captures a verdict. No CI required, no Docker required.
Path B (Layer 2/3, Docker). You build and push a branch-fix Docker image. A GitHub Actions workflow pulls the image, runs it, captures a verdict, and publishes the artefact bundle for /repro/compare to render.
Layer dispatch is automatic: the recipe's catalogue entry already knows which layer it is, and the MCP verify_branch_fix tool returns the right scaffolding for both.
// 1 · PICK A RECIPE
Find the bug in the catalogue.
Two ways in, depending on whether you start from a known recipe or from a paste of the error:
Browse the gallery and click through to the recipe page. Note the slug (e.g. php-12167).
Use /repro/match (paste the error, get ranked candidates) — or call match_error from your AI agent (same scoring, same surface).
// 2 · PATH A — LAYER 1
Paste a fix on the recipe page; capture the verdict.
Layer 1 recipes that opt into Path A render a "Try a fix" panel underneath the baseline output. Today this is shipped on php-12167; others follow as the shape proves out.
The recipe runs the baseline reproduction first; the panel renders once the original verdict is captured (typically reproduced).
Three modes: (a) paste the fix into the textarea, (b) supply a publicly fetchable URL (raw GitHub / Gist), or (c) pick a file from disk. CORS errors on URL fetch fall back to paste.
The recipe's already-loaded WASM runtime executes the substituted source. The panel captures a Contract v1 verdict bundle: branch-fix-verdict.json + original-verdict.json.
The panel renders one download link per side. Both files conform to Contract v1, the same shape that /repro/compare consumes from a Layer 2/3 workflow run.
Open /repro/compare and drag both JSON files onto the drop zone. The page renders side-by-side evidence with the divergent fields highlighted.
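To get a feel for what "divergent fields" means before opening the page, the comparison can be approximated offline. This is a minimal sketch, not the /repro/compare implementation: it compares only top-level keys, and the field names in the sample payloads (verdict, slug, exit_code) are hypothetical stand-ins, since the Contract v1 schema is not reproduced in this guide.

```python
import json
from typing import Any, Dict, Tuple


def divergent_fields(
    original: Dict[str, Any], branch_fix: Dict[str, Any]
) -> Dict[str, Tuple[Any, Any]]:
    """Return the top-level keys whose values differ between two verdicts."""
    missing = object()  # sentinel for a key that is absent on one side
    diff: Dict[str, Tuple[Any, Any]] = {}
    for key in sorted(original.keys() | branch_fix.keys()):
        left = original.get(key, missing)
        right = branch_fix.get(key, missing)
        if left != right:
            diff[key] = (
                None if left is missing else left,
                None if right is missing else right,
            )
    return diff


# Hypothetical payloads; the real Contract v1 field names may differ.
original = json.loads('{"verdict": "reproduced", "slug": "php-12167", "exit_code": 1}')
branch_fix = json.loads('{"verdict": "unreproduced", "slug": "php-12167", "exit_code": 0}')

for field, (before, after) in divergent_fields(original, branch_fix).items():
    print(f"{field}: {before!r} -> {after!r}")
```

In practice you would load the two downloaded files with json.load instead of the inline strings.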
Agent-driven shortcut. If you call verify_branch_fix(slug, fix_url) from your AI agent, the tool returns a compare_url with ?fix_url= pre-loaded. Open that URL and the recipe page auto-runs the fix. The same applies to verify_branch_fix(slug, fix_source) (≤4 KiB inline; longer fixes go via fix_url).
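The choice between inline source and a URL can be made mechanically on the agent side. The sketch below is illustrative only: pick_fix_argument is a hypothetical helper, not part of the MCP tool, and it assumes the 4 KiB cap is measured in UTF-8 bytes, which this guide does not specify.

```python
from typing import Dict, Optional

# 4 KiB cap from the guide; counting UTF-8 bytes is an assumption.
INLINE_CAP_BYTES = 4 * 1024


def pick_fix_argument(fix_source: str, fix_url: Optional[str] = None) -> Dict[str, str]:
    """Pick the keyword argument to pass alongside the slug."""
    size = len(fix_source.encode("utf-8"))
    if size <= INLINE_CAP_BYTES:
        # Small enough to travel inline.
        return {"fix_source": fix_source}
    if fix_url is None:
        raise ValueError(
            f"fix is {size} bytes, over the {INLINE_CAP_BYTES}-byte inline cap; "
            "host it (for example as a Gist) and pass fix_url instead"
        )
    return {"fix_url": fix_url}
```

A short fix yields a fix_source argument; anything larger requires a hosted copy and yields fix_url.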
// 3 · PATH B — LAYER 2 / 3
Build an image; run a workflow; drop the artefact.
Layer 2 (Docker catalogue) and Layer 3 (record-replay) recipes can't run in a browser tab — the bug needs a real OS, real sockets, real filesystem. Path B shifts the build to your own infrastructure and uses a GitHub Actions workflow to capture the verdict against your pushed image.
Apply the AI's candidate fix to your fork, build a Docker image with the recipe's Dockerfile, and push to a registry the GitHub Actions runner can pull from (a public ghcr.io / Docker Hub repo). The image-as-input boundary is the contract — Vivarium does not build your source, you do.
Run gh workflow run branch-fix-verdict.yml --repo aletheia-works/vivarium -f slug=<slug> -f branch_image=<your-image-ref>. The workflow pulls your image, runs the reproduction inside it, captures a Contract v1 verdict, and uploads branch-fix-verdict-<slug>-<run_id> as an artefact.
From the workflow run page, grab the artefact zip. It contains branch-fix-verdict.json and (when a deployed snapshot exists) original-verdict.json.
Drop the zip onto /repro/compare. The page parses the zip client-side, validates both verdicts against the Contract v1 schema, and renders side-by-side evidence.
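If you want to inspect the artefact before dropping it onto the page, the zip can be read locally. This is a rough sketch under assumptions: read_verdicts is a hypothetical helper, it matches on file name in case the zip nests the files in a folder, and it does not validate anything against the Contract v1 schema.

```python
import io
import json
import zipfile
from typing import Any, Dict, Optional, Tuple


def read_verdicts(zip_bytes: bytes) -> Tuple[Dict[str, Any], Optional[Dict[str, Any]]]:
    """Return (branch_fix_verdict, original_verdict_or_None) from an artefact zip."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # Index members by bare file name so a nested folder does not matter.
        by_name = {name.rsplit("/", 1)[-1]: name for name in archive.namelist()}
        if "branch-fix-verdict.json" not in by_name:
            raise KeyError("branch-fix-verdict.json is missing from the artefact")
        branch_fix = json.loads(archive.read(by_name["branch-fix-verdict.json"]))
        # original-verdict.json is only present when a deployed snapshot exists.
        original = None
        if "original-verdict.json" in by_name:
            original = json.loads(archive.read(by_name["original-verdict.json"]))
    return branch_fix, original
```

Call it with the bytes of the downloaded zip, for example the result of pathlib.Path(...).read_bytes().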
Agent-driven shortcut. verify_branch_fix(slug) on a Layer 2/3 slug returns the gh_command ready to copy-paste. The agent still needs gh auth and registry access to actually run it, but the command itself is constructed for you.
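When scripting Path B yourself, the gh invocation shown above can be assembled as an argument list rather than a shell string, which avoids quoting problems in image refs. A minimal sketch: branch_fix_command is a hypothetical helper, and the slug and image ref below are placeholders, not real catalogue entries.

```python
import shlex
from typing import List


def branch_fix_command(slug: str, branch_image: str) -> List[str]:
    """Build the argv for the branch-fix-verdict workflow dispatch."""
    return [
        "gh", "workflow", "run", "branch-fix-verdict.yml",
        "--repo", "aletheia-works/vivarium",
        "-f", f"slug={slug}",
        "-f", f"branch_image={branch_image}",
    ]


# Placeholder values for illustration only.
argv = branch_fix_command("example-slug", "ghcr.io/example/fix:candidate-1")
print(shlex.join(argv))
```

Passing argv to subprocess.run(argv, check=True) would dispatch the workflow, provided gh is installed and authenticated.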
// 4 · READ THE VERDICT
What `reproduced` and `unreproduced` mean here.
The branch-fix verdict tells you exactly one thing: did the bug trigger when the fix was in place?
Original = reproduced, branch-fix = unreproduced. The fix avoided the bug. This is the typical "fix works" outcome.
Original = reproduced, branch-fix = reproduced. The fix is slop: it did not change the outcome. Iterate on the fix and re-run.
Original = unreproduced, branch-fix = reproduced. Regression: your change introduced the bug. Reverse the diff or check what you changed.
Original = unreproduced, branch-fix = unreproduced. No change either way. Either the recipe is inactive (upstream already fixed it), or your fix is a no-op against the current runtime.
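The four combinations above amount to a small lookup table. The sketch below encodes it; the outcome labels (fix-works, no-effect, regression, no-change) are names chosen here for illustration and are not part of Contract v1.

```python
from typing import Dict, Tuple

# (original verdict, branch-fix verdict) -> illustrative outcome label
OUTCOMES: Dict[Tuple[str, str], str] = {
    ("reproduced", "unreproduced"): "fix-works",
    ("reproduced", "reproduced"): "no-effect",
    ("unreproduced", "reproduced"): "regression",
    ("unreproduced", "unreproduced"): "no-change",
}


def classify(original: str, branch_fix: str) -> str:
    """Map a pair of verdicts to one of the four outcomes."""
    try:
        return OUTCOMES[(original, branch_fix)]
    except KeyError:
        raise ValueError(
            f"unexpected verdict pair: {original!r}, {branch_fix!r}"
        ) from None


print(classify("reproduced", "unreproduced"))
```

Only the first pair means the candidate fix changed the outcome in the direction you wanted.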
Path A is testing whether your userland fix sidesteps the buggy code path — it is not patching the upstream interpreter. Path B is testing whether your fully-rebuilt image fixes the bug at the binary level. Different weight, same wire format.
// 5 · EDGE CASES
When the loop doesn't go to plan.
The runtime errored before producing a verdict. On Path A, this surfaces as the panel's status line going red with the runtime error text. Common cause: a syntax error in the pasted fix. Fix the syntax and click Run again.
The schema rejected the verdict shape. /repro/compare validates each verdict against verdict.schema.json (Contract v1 rev3). The error region shows the JSON path of the bad field. Path A produces well-formed verdicts by construction; if you see this, you are likely dropping a hand-edited file.
CORS blocked the URL fetch. Public raw GitHub and Gist URLs return CORS-friendly headers. Self-hosted URLs often don't. Fall back to paste mode in that case.
Path B private-registry image. The runner pulls unauthenticated; private registries are out of scope for v1. Push to a public ref or wait for the pull-credential follow-up.
The fix is so long it doesn't fit in ?fix=. The inline URL-param cap is 4 KiB. Use ?fix_url= with a Gist or fork URL instead.
// 6 · WHAT'S NEXT
The agent loop you can build on top of this.
Pipeline: match_error → narrow to a slug → verify_branch_fix → open the compare_url → read the verdict. If reproduced, ask the agent for another candidate. If unreproduced, ship.
If your bug isn't in the catalogue yet, write the recipe. Path A then opts in with a one-liner.
The integrate-with-your-repo path catches verdict drift on every push, not just on demand.
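The match_error to verify_branch_fix pipeline described above is, at its core, a retry loop over candidate fixes. The following sketch shows only that control flow: run_verdict is a hypothetical callable standing in for whatever your agent does to obtain a verdict (calling verify_branch_fix, opening the compare_url, reading the result), and nothing here talks to Vivarium.

```python
from typing import Callable, Iterable, Optional


def first_working_fix(
    candidates: Iterable[str], run_verdict: Callable[[str], str]
) -> Optional[str]:
    """Return the first candidate whose verdict is 'unreproduced', else None."""
    for candidate in candidates:
        if run_verdict(candidate) == "unreproduced":
            return candidate  # the bug no longer triggers: ship this one
        # Verdict was 'reproduced': move on to the next candidate.
    return None


# Stub verdict source for illustration: only the second candidate "works".
def fake_verdict(fix: str) -> str:
    return "unreproduced" if "guard" in fix else "reproduced"


print(first_working_fix(["rename variable", "add guard clause"], fake_verdict))
```

A None result means every candidate still reproduced the bug, which is the cue to ask the agent for another one.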
Stuck somewhere in the loop? That's a bug in this guide. File an issue with the slug, the layer, the path (A or B), and the step number you got stuck on.