Verify a branch-fix

// GUIDE · BRANCH-FIX

Test a candidate fix end-to-end.

Vivarium pairs a recipe (the bug) with a comparison surface (your fix) and a verdict that flips when the fix actually steers around the broken code path. This guide walks the loop end-to-end — Layer 1 (in-browser) and Layer 2/3 (Docker).

// 0 · WHAT THIS LOOP IS FOR

AI-slop verification — does this fix actually work?

An AI agent (Claude Code, Cursor, Cline, Continue, …) hands you a candidate fix for an upstream bug. The honest question is "did it actually fix the bug, or does it just look like a fix?" Vivarium's branch-fix loop answers that with a mechanical verdict flip: feed the fix through the recipe's runtime and check whether the bug still triggers.

Verdict semantics. reproduced = the bug still triggers (your fix did not avoid it). unreproduced = the bug did not trigger (your fix steered around the broken code path). A working fix flips the verdict from reproduced to unreproduced.

Two paths share the same wire shape (Contract v1) and the same comparison surface (/repro/compare):

01
Path A — Layer 1 (in-browser source substitution).

You paste an alternative reproduction script onto the recipe page. The recipe's WASM runtime (php-wasm, ruby.wasm, Pyodide) re-runs it in your tab and captures a verdict. No CI required, no Docker required.

02
Path B — Layer 2/3 (Docker image rebuild).

You build and push a branch-fix Docker image. A GitHub Actions workflow pulls the image, runs it, captures a verdict, and publishes the artefact bundle for /repro/compare to render.

Layer dispatch is automatic: the recipe's catalogue entry already knows which layer it is, and the MCP verify_branch_fix tool returns the right scaffolding for either path.

// 1 · PICK A RECIPE

Find the bug in the catalogue.

Two ways in, depending on whether you start from a known recipe or from a paste of the error:

01
I know the recipe.

Browse the gallery and click through to the recipe page. Note the slug (e.g. php-12167).

02
I have an error message — find the recipe for me.

Use /repro/match (paste the error, get ranked candidates) — or call match_error from your AI agent (same scoring, same surface).

// 2 · PATH A — LAYER 1

Paste a fix on the recipe page; capture the verdict.

Layer 1 recipes that opt into Path A render a "Try a fix" panel underneath the baseline output. Today the panel ships on php-12167 only; other recipes will follow as the shape proves out.

01
Open the recipe page.

The recipe runs the baseline reproduction first; the panel renders once the original verdict is captured (typically reproduced).

02
Provide the fix source.

Three modes: (a) paste the fix into the textarea, (b) supply a publicly fetchable URL (raw GitHub / Gist), or (c) pick a file from disk. If the URL fetch fails with a CORS error, switch to paste mode.

03
Click Run.

The recipe's already-loaded WASM runtime executes the substituted source. The panel captures a Contract v1 verdict bundle: branch-fix-verdict.json + original-verdict.json.

04
Download both verdicts.

The panel renders one download link per side. Both files conform to Contract v1, the same shape that /repro/compare consumes from a Layer 2/3 workflow run.

05
Drop them on /repro/compare.

Open /repro/compare and drag both JSON files onto the drop zone. The page renders side-by-side evidence with the divergent fields highlighted.
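
If you want to check the two downloaded files locally before dropping them on the page, a rough sketch of the "divergent fields" idea might look like the following. It only compares top-level keys, and the `slug` and `verdict` keys in the usage note are placeholders: the real field names are defined by Contract v1, which this guide does not spell out.

```python
import json
from typing import Any, Dict, Tuple


def load_verdict(path: str) -> Dict[str, Any]:
    """Read one verdict JSON file (e.g. branch-fix-verdict.json) from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def divergent_fields(
    original: Dict[str, Any], branch_fix: Dict[str, Any]
) -> Dict[str, Tuple[Any, Any]]:
    """Return the top-level keys whose values differ between the two verdicts.

    Each value is an (original, branch_fix) pair. A key missing on one side
    is reported as None on that side, so "missing" and an explicit null are
    not distinguished here.
    """
    keys = sorted(set(original) | set(branch_fix))
    return {
        key: (original.get(key), branch_fix.get(key))
        for key in keys
        if original.get(key) != branch_fix.get(key)
    }
```

For example, `divergent_fields(load_verdict("original-verdict.json"), load_verdict("branch-fix-verdict.json"))` would list only the fields that changed. This is not how /repro/compare itself is implemented; it is just a quick local check.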

Agent-driven shortcut. If you call verify_branch_fix(slug, fix_url) from your AI agent, the tool returns a compare_url with ?fix_url= pre-loaded; open that URL and the recipe page auto-runs the fix. The same applies to verify_branch_fix(slug, fix_source), which accepts up to 4 KiB of inline source; longer fixes go via fix_url.

// 3 · PATH B — LAYER 2 / 3

Build an image; run a workflow; drop the artefact.

Layer 2 (Docker catalogue) and Layer 3 (record-replay) recipes can't run in a browser tab: the bug needs a real OS, real sockets, and a real filesystem. Path B shifts the build to your own infrastructure and uses a GitHub Actions workflow to capture the verdict against your pushed image.

01
Build and push your branch-fix image.

Apply the AI's candidate fix to your fork, build a Docker image with the recipe's Dockerfile, and push to a registry the GitHub Actions runner can pull from (a public ghcr.io / Docker Hub repo). The image-as-input boundary is the contract — Vivarium does not build your source, you do.

02
Trigger the comparison workflow.

Run gh workflow run branch-fix-verdict.yml --repo aletheia-works/vivarium -f slug=<slug> -f branch_image=<your-image-ref>. The workflow pulls your image, runs the reproduction inside it, captures a Contract v1 verdict, and uploads branch-fix-verdict-<slug>-<run_id> as an artefact.

03
Download the artefact zip.

From the workflow run page, grab the artefact zip. It contains branch-fix-verdict.json and (when a deployed snapshot exists) original-verdict.json.

04
Drop the zip on /repro/compare.

The page parses the zip client-side, validates both verdicts against the Contract v1 schema, and renders side-by-side evidence.
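
To inspect the artefact zip outside the browser, something along these lines should work. It is a sketch under assumptions: it matches the two file names from step 03 by basename (in case the zip nests them in a folder, which this guide does not specify), and it does not perform the Contract v1 schema validation that /repro/compare does.

```python
import io
import json
import zipfile
from typing import Any, Dict, Optional, Tuple

BRANCH_FIX_NAME = "branch-fix-verdict.json"
ORIGINAL_NAME = "original-verdict.json"


def read_verdicts(
    zip_bytes: bytes,
) -> Tuple[Dict[str, Any], Optional[Dict[str, Any]]]:
    """Extract (branch_fix, original) verdicts from an artefact zip.

    The original verdict is optional, since the zip only contains it when a
    deployed snapshot exists; None is returned in that case.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # Map basename -> full member name (assumption: files may be nested).
        members = {name.rsplit("/", 1)[-1]: name for name in archive.namelist()}
        if BRANCH_FIX_NAME not in members:
            raise ValueError(f"{BRANCH_FIX_NAME} not found in artefact zip")
        branch_fix = json.loads(archive.read(members[BRANCH_FIX_NAME]))
        original = None
        if ORIGINAL_NAME in members:
            original = json.loads(archive.read(members[ORIGINAL_NAME]))
    return branch_fix, original
```

Call it with the raw bytes of the downloaded zip, for instance `read_verdicts(open("artefact.zip", "rb").read())`, where `artefact.zip` is whatever name the download was saved under.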

Agent-driven shortcut. verify_branch_fix(slug) on a Layer 2/3 slug returns the gh_command ready to copy-paste. The agent still needs gh auth and registry access to actually run it — but the command itself is constructed for you.
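
If you script the trigger yourself rather than copy-pasting, the command from step 02 can be assembled as an argument list. This is only a sketch: the workflow name, repo, and the two -f fields come from step 02, while the function name is made up here and the output is not guaranteed to match the gh_command string that verify_branch_fix returns.

```python
import shlex
from typing import List

WORKFLOW = "branch-fix-verdict.yml"
REPO = "aletheia-works/vivarium"


def build_gh_command(slug: str, branch_image: str) -> List[str]:
    """Build the argv for the branch-fix comparison workflow trigger.

    Returning a list (not a shell string) lets you pass it straight to
    subprocess.run without worrying about quoting.
    """
    if not slug or not branch_image:
        raise ValueError("slug and branch_image are both required")
    return [
        "gh", "workflow", "run", WORKFLOW,
        "--repo", REPO,
        "-f", f"slug={slug}",
        "-f", f"branch_image={branch_image}",
    ]


def as_shell_string(argv: List[str]) -> str:
    """Render the argv as a copy-pasteable shell command."""
    return shlex.join(argv)
```

Running it for real still needs gh auth and a pullable image, for example `subprocess.run(build_gh_command(slug, image), check=True)` with your own slug and image reference.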

// 4 · READ THE VERDICT

What reproduced and unreproduced mean here.

The branch-fix verdict tells you exactly one thing: did the bug trigger when the fix was in place?

  • Original = reproduced, branch-fix = unreproduced. The fix avoided the bug. This is the typical "fix works" outcome.

  • Original = reproduced, branch-fix = reproduced. The fix is slop — it did not change the outcome. Iterate on the fix and re-run.

  • Original = unreproduced, branch-fix = reproduced. Regression: your change introduced the bug. Revert the change, or review the diff to find what triggered it.

  • Original = unreproduced, branch-fix = unreproduced. No change either way. Either the recipe is inactive (upstream already fixed it), or your fix is a no-op against the current runtime.
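
The four cases above reduce to a lookup on the (original, branch-fix) pair. A minimal sketch, where the short outcome labels are shorthand invented here and not part of Contract v1:

```python
from typing import Dict, Tuple

# Keys are (original verdict, branch-fix verdict); labels are illustrative.
OUTCOMES: Dict[Tuple[str, str], str] = {
    ("reproduced", "unreproduced"): "fix works",
    ("reproduced", "reproduced"): "no effect",
    ("unreproduced", "reproduced"): "regression",
    ("unreproduced", "unreproduced"): "no change",
}


def classify(original: str, branch_fix: str) -> str:
    """Map a pair of verdicts to one of the four outcomes listed above."""
    try:
        return OUTCOMES[(original, branch_fix)]
    except KeyError:
        raise ValueError(
            f"unexpected verdict pair: {original!r}, {branch_fix!r}"
        ) from None
```

Only the first pair counts as a working fix; everything else means iterate, revert, or check whether the recipe is still active.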

Path A is testing whether your userland fix sidesteps the buggy code path — it is not patching the upstream interpreter. Path B is testing whether your fully-rebuilt image fixes the bug at the binary level. Different weight, same wire format.

// 5 · EDGE CASES

When the loop doesn't go to plan.

  • The runtime errored before producing a verdict. On Path A, this surfaces as the panel's status line going red with the runtime error text. Common cause: a syntax error in the pasted fix. Fix the syntax and click Run again.

  • The schema rejected the verdict shape. /repro/compare validates each verdict against verdict.schema.json (Contract v1 rev3). The error region shows the JSON path of the bad field. Path A produces well-formed verdicts by construction; if you see this, you are likely dropping a hand-edited file.

  • CORS blocked the URL fetch. Public raw GitHub and Gist URLs return CORS-friendly headers. Self-hosted URLs often don't. Fall back to paste mode in that case.

  • Path B private-registry image. The runner pulls images without authentication; private registries are out of scope for v1. Push to a public ref or wait for the pull-credential follow-up.

  • The fix is so long it doesn't fit in ?fix=. The inline URL-param cap is 4 KiB. Use ?fix_url= with a Gist or fork URL instead.
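
The choice between ?fix= and ?fix_url= in the last bullet can be sketched as follows. The parameter names and the 4 KiB cap come from this guide; measuring the cap on the UTF-8 bytes of the source before URL-encoding is an assumption, and the function name is hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

INLINE_CAP_BYTES = 4 * 1024  # 4 KiB inline cap stated in this guide


def fix_query(
    fix_source: Optional[str] = None, fix_url: Optional[str] = None
) -> str:
    """Return the query string carrying the fix: fix_url wins if given."""
    if fix_url is not None:
        return urlencode({"fix_url": fix_url})
    if fix_source is None:
        raise ValueError("provide fix_source or fix_url")
    # Assumption: the cap applies to the raw source size, not the encoded URL.
    if len(fix_source.encode("utf-8")) > INLINE_CAP_BYTES:
        raise ValueError(
            "fix exceeds the 4 KiB inline cap; host it and pass fix_url"
        )
    return urlencode({"fix": fix_source})
```

Append the result to the recipe page URL after a `?`. If the real cap turns out to apply to the encoded length, the check would need to run on the encoded string instead.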

// 6 · WHAT'S NEXT

The agent loop you can build on top of this.

01
Loop the agent end-to-end.

Pipeline: match_error → narrow to a slug → verify_branch_fix → open the compare_url → read the verdict. If reproduced, ask the agent for another candidate. If unreproduced, ship.

02
Add the recipe for your bug.

If your bug isn't in the catalogue yet, write the recipe. Path A then opts in with a one-liner.

03
Wire the verdict into your CI.

The integrate-with-your-repo path catches verdict drift on every push, not just on demand.
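
The retry half of the pipeline in step 01 (ask for another candidate while the verdict is still reproduced) can be sketched as a small loop. Both callables are hypothetical stand-ins: `propose_fix` represents asking your agent for a candidate, `verify` represents whatever runs the branch-fix loop and returns the verdict string. The attempt limit is an addition here, not something the guide prescribes.

```python
from typing import Callable, Optional


def iterate_until_fixed(
    propose_fix: Callable[[int], str],
    verify: Callable[[str], str],
    max_attempts: int = 3,
) -> Optional[str]:
    """Return the first candidate whose verdict is 'unreproduced', else None.

    propose_fix receives the 1-based attempt number; verify receives the
    candidate and must return 'reproduced' or 'unreproduced'.
    """
    for attempt in range(1, max_attempts + 1):
        candidate = propose_fix(attempt)
        verdict = verify(candidate)
        if verdict == "unreproduced":
            return candidate  # the verdict flipped: this candidate ships
        if verdict != "reproduced":
            raise ValueError(f"unexpected verdict: {verdict!r}")
    return None  # every candidate still triggered the bug
```

A None result means the agent ran out of attempts without flipping the verdict, which is the point at which a human should look at the bug.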

Stuck somewhere in the loop? That's a bug in this guide. File an issue with the slug, the layer, the path (A or B), and the step number you got stuck on.

