Consumer workflow

A reusable GitHub Actions workflow that any repo can call with uses: to verify a Vivarium-hosted bug reproduction in its own CI, without copying any Vivarium internals.

The workflow lives at aletheia-works/.github/.github/workflows/vivarium-verdict.yml. It pulls the published ghcr.io/aletheia-works/vivarium-<slug> image, runs the recipe, captures a verdict.json matching Contract v1, validates it against the published JSON Schema, and asserts the captured verdict matches what the caller expected.

Five-line consumer example

jobs:
  bash-issue:
    uses: aletheia-works/.github/.github/workflows/vivarium-verdict.yml@main
    with:
      slug: bash-local-shadows-exit

That is the entire integration. A consumer repo's .github/workflows/check-bug.yml can carry many such jobs (one per recipe to track), each turning into a green / red signal in their own CI. Slugs are the directory names under src/layer2_docker/ (Layer 2 catalogue) and src/layer3_thirdway/ (Layer 3 catalogue, where the trace is baked into the image).
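When a repo tracks several recipes, one job per slug can also be written as a matrix. The following is a minimal sketch, not part of the published workflow: it relies on GitHub Actions allowing strategy.matrix on jobs that call a reusable workflow, and another-recipe-slug is a hypothetical placeholder rather than a real catalogue entry.

```yaml
jobs:
  tracked-bugs:
    strategy:
      fail-fast: false              # let every recipe report its own result
      matrix:
        slug:
          - bash-local-shadows-exit
          - another-recipe-slug     # hypothetical placeholder, replace with a real slug
    uses: aletheia-works/.github/.github/workflows/vivarium-verdict.yml@main
    with:
      slug: ${{ matrix.slug }}
```

Each matrix entry shows up as its own check, so the per-recipe green / red signal is preserved.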

Inputs

| Input | Type | Required | Default | Purpose |
| --- | --- | --- | --- | --- |
| slug | string | yes | (none) | Recipe slug, e.g. bash-local-shadows-exit. Used to derive the default image tag and to label artefacts and log lines. |
| image | string | no | ghcr.io/aletheia-works/vivarium-<slug>:latest | Image override. Useful when the consumer wants to pin a specific git-sha tag or test a private fork. |
| expected_verdict | string | no | "reproduced" | "reproduced" or "unreproduced". Job fails if the captured verdict differs. Use "unreproduced" only if you intentionally track a recipe whose upstream bug has been fixed (sentinel page). |
| timeout_minutes | number | no | 5 | Job timeout. Most Layer 2 recipes complete in seconds; the budget exists for image-pull on slow networks. |
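As an illustration of the optional inputs, a caller that wants a pinned image and a larger pull budget might write something like the sketch below. The <git-sha> tag is a placeholder to be replaced with a real published tag, and the value 10 is an arbitrary example, not a recommendation from the workflow's authors.

```yaml
jobs:
  bash-issue-pinned:
    uses: aletheia-works/.github/.github/workflows/vivarium-verdict.yml@main
    with:
      slug: bash-local-shadows-exit
      # Placeholder tag: substitute the git-sha tag you want to pin.
      image: ghcr.io/aletheia-works/vivarium-bash-local-shadows-exit:<git-sha>
      timeout_minutes: 10           # example value; the default is 5
```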

Verdict semantics

reproduced means the upstream bug reproduces in this run — the reproduction is doing its job. unreproduced means the bug does not reproduce, usually because the upstream project shipped a fix the bundled image picked up. See Contract v1: Verdict semantics for the full reasoning.

Consumers that want a "this bug is fixed" alert can therefore write:

jobs:
  fixed-detector:
    uses: aletheia-works/.github/.github/workflows/vivarium-verdict.yml@main
    with:
      slug: my-favourite-recipe
      expected_verdict: reproduced  # default; spelled out for clarity

…and the workflow flips red the moment the bug stops reproducing, which is exactly the upstream-fix-detected signal.
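The opposite case, tracking a recipe whose upstream bug is already fixed, inverts the expectation. A minimal sketch, assuming a hypothetical slug already-fixed-recipe:

```yaml
jobs:
  regression-sentinel:
    uses: aletheia-works/.github/.github/workflows/vivarium-verdict.yml@main
    with:
      slug: already-fixed-recipe        # hypothetical slug
      expected_verdict: unreproduced    # job fails if the bug reproduces again
```

Here a red job means the bug has come back rather than gone away.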

Artefact

The job uploads the captured verdict.json as a workflow artefact named verdict-<slug>-<run_id> with 30-day retention. Consumer-side badges and debug flows can fetch the artefact via the GitHub Actions API.
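Within the calling workflow itself, a follow-up job can also read the artefact without going through the REST API, because artefacts uploaded by a called workflow belong to the same run. The sketch below swaps in actions/download-artifact for the API and rests on several assumptions: that <run_id> in the artefact name is github.run_id, that the artefact contains a file named verdict.json at its root, and that the artefact is uploaded even when the verdict job fails.

```yaml
jobs:
  bash-issue:
    uses: aletheia-works/.github/.github/workflows/vivarium-verdict.yml@main
    with:
      slug: bash-local-shadows-exit

  inspect-verdict:
    needs: bash-issue
    if: always()                    # still run when the verdict job went red
    runs-on: ubuntu-latest
    steps:
      - uses: actions/download-artifact@v4
        with:
          # Assumes <run_id> in the artefact name is github.run_id.
          name: verdict-bash-local-shadows-exit-${{ github.run_id }}
      # Assumes the artefact unpacks to verdict.json in the working directory.
      - run: cat verdict.json
```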

What this workflow does not do

  • Layer 1 (WASM) verification. Layer 1 reproductions run in-page in a browser; the verdict surface is live DOM / JavaScript. CI consumer-side verification of Layer 1 is a separate problem and does not benefit from a reusable workflow — the Vivarium gallery's Playwright suite is the canonical Layer 1 regression check.
  • Layer 3 (rr replay) verification on hosted GHA runners. The replay step itself runs as part of the recipe's image CMD, so this workflow does drive Layer 3 from the consumer side, but only on runners that expose CPUID faulting to the guest. GitHub-hosted Ubuntu runners do not. Self-hosted runners on bare metal or PMU-exposing KVM are required for Layer 3 consumer verification.

See also