Use Vivarium from your AI agent

// GUIDE · AI AGENT

Drive the Vivarium MCP server from Claude Code, Cursor, or Cline.

Vivarium ships an MCP (Model Context Protocol) server with five tools that let an agent search the catalogue, read verdict snapshots, and scaffold branch-fix verification end-to-end. No HTML scraping of the docs site required.

// 0 · WHAT THIS GUIDE COVERS

Wiring an AI agent to Vivarium's catalogue via MCP.

The Vivarium MCP server (@aletheia-works/vivarium-mcp) speaks Model Context Protocol over stdio and exposes five tools: list_recipes, get_recipe, lookup_verdict, match_error, and verify_branch_fix. An agent can browse the catalogue, fetch metadata for a given recipe, pull deployed Layer 2 / 3 verdict snapshots, and scaffold the AI-slop verification loop end-to-end.

The first JSR / npm publish is intentionally on hold. Neither Vivarium as a whole nor the MCP server's feature surface is considered finished yet, and publishing v1 now would crystallise an interface that is still being iterated on. The maintainer will announce timing separately; until then, install from a local clone.

// 1 · BUILD LOCALLY

clone → bun install → bun run build.

git clone https://github.com/aletheia-works/vivarium.git
cd vivarium/packages/mcp-server
bun install
bun run build
# → produces dist/index.js (the entry point your client will spawn)

Note the absolute path to dist/index.js: your MCP client will spawn it with node.

// 2 · REGISTER WITH YOUR MCP CLIENT

Same shape everywhere: command + args.

Every MCP client takes essentially the same JSON snippet — a command and an args array:

{
  "mcpServers": {
    "vivarium": {
      "command": "node",
      "args": ["/abs/path/to/vivarium/packages/mcp-server/dist/index.js"]
    }
  }
}

What differs is where you put it:

01
Claude Code (CLI).

claude mcp add vivarium node /abs/path/to/...dist/index.js registers it from the command line. Equivalently, merge the vivarium entry from the JSON above into the mcpServers key of ~/.claude.json.

02
Cursor.

Drop the JSON into ~/.cursor/mcp.json (or .cursor/mcp.json at the project root for project-scoped registration).

03
Cline (VS Code extension).

VS Code → Cline sidebar → MCP Servers → Configure MCP Servers opens a JSON editor. Add the snippet there.

04
Continue.

Translate the command + args into the YAML mcpServers section of ~/.continue/config.yaml — same fields, just YAML syntax.

The authoritative client list is at modelcontextprotocol.io/clients. Per-client config paths drift over time; consult upstream when anything looks off.
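If you generate the snippet from a script (a dotfiles setup, say), it is worth rejecting paths that some clients won't expand before they ever reach the config. A minimal sketch: buildMcpConfig and isFullyQualified are hypothetical helpers, and the absolute-path check assumes POSIX paths or Windows drive-letter paths.

```typescript
// Sketch: build the mcpServers snippet for a given server entry point.
// buildMcpConfig / isFullyQualified are hypothetical helpers; the output
// shape mirrors the command + args JSON shown above.

interface McpServerEntry {
  command: string;
  args: string[];
}

interface McpConfig {
  mcpServers: Record<string, McpServerEntry>;
}

function isFullyQualified(path: string): boolean {
  // POSIX absolute path ("/Users/...") or Windows drive path ("C:\..." or "C:/...").
  return path.startsWith("/") || /^[A-Za-z]:[\\/]/.test(path);
}

function buildMcpConfig(entryPoint: string): McpConfig {
  // Some clients expand neither "~" nor "$HOME", and relative paths are
  // unreliable, so refuse anything that is not already fully qualified.
  if (!isFullyQualified(entryPoint) || entryPoint.includes("$")) {
    throw new Error(`expected a fully-qualified absolute path, got: ${entryPoint}`);
  }
  return {
    mcpServers: {
      vivarium: { command: "node", args: [entryPoint] },
    },
  };
}

console.log(
  JSON.stringify(
    buildMcpConfig("/Users/you/code/vivarium/packages/mcp-server/dist/index.js"),
    null,
    2,
  ),
);
```

Merge the printed object into whichever file your client reads, following the per-client notes above.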

// 3 · TOOLS

All five go through the same stdio transport.

  • list_recipes(layer?, project?, q?) — filtered enumeration of the catalogue. layer is the integer 1/2/3, project matches the upstream project (e.g. "pandas"), q is a substring search across slug, project, and title.

  • get_recipe(slug) — full metadata for one recipe (title, project, issue, page URL, verdict snapshot URL, GitHub source URL). Returns { found: false, error } on unknown slug.

  • lookup_verdict(slug) — Layer 1 returns { kind: "live", page_url, note } (the verdict is computed in-browser at view time). Layer 2 / 3 returns { kind: "snapshot", snapshot: { verdict, exit_code, image_digest, stdout, stderr_tail, ... } }.

  • match_error(text, limit?) — score the catalogue against a pasted error message or stack trace by mechanical token overlap. No LLM, no fuzzy matching — exact token hits against symptom / tags / project / slug, weighted per source.

  • verify_branch_fix(slug, fix_url? | fix_source?) — scaffolding helper for the AI-slop verification loop (NOT an execution engine). Layer 1 returns Path A: a recipe-page compare_url with the fix pre-loaded via ?fix_url= or ?fix=<base64url>. Layer 2 / 3 returns Path B: a /repro/compare deep-link plus the gh workflow run branch-fix-verdict.yml command the contributor runs. Full walkthrough: Verify a branch-fix.
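The list_recipes parameters read naturally as filters that narrow the catalogue together. A minimal sketch of that reading, assuming the filters combine with AND and that project and q match case-insensitively; the Recipe shape and the sample entries are hypothetical, and this is not the server's source.

```typescript
// Sketch of list_recipes filtering as described in the tool list.
// ASSUMPTIONS (not confirmed by the server source): filters combine with AND,
// project is compared case-insensitively, q is a case-insensitive substring.

interface Recipe {
  slug: string;
  project: string;
  title: string;
  layer: 1 | 2 | 3;
}

interface ListRecipesArgs {
  layer?: 1 | 2 | 3;
  project?: string;
  q?: string;
}

function listRecipes(catalogue: Recipe[], args: ListRecipesArgs = {}): Recipe[] {
  const project = args.project?.toLowerCase();
  const q = args.q?.toLowerCase();
  return catalogue.filter((r) => {
    if (args.layer !== undefined && r.layer !== args.layer) return false;
    if (project !== undefined && r.project.toLowerCase() !== project) return false;
    if (q !== undefined) {
      // q is searched across slug, project, and title.
      const haystacks = [r.slug, r.project, r.title].map((s) => s.toLowerCase());
      if (!haystacks.some((h) => h.includes(q))) return false;
    }
    return true;
  });
}

// Hypothetical catalogue entries, for illustration only.
const sample: Recipe[] = [
  { slug: "demo-pandas-1", project: "pandas", title: "Hypothetical pandas regression", layer: 1 },
  { slug: "demo-docker-1", project: "demo", title: "Hypothetical Docker repro", layer: 2 },
];

console.log(listRecipes(sample, { layer: 2 }).map((r) => r.slug));
```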

// 4 · SAMPLE PROMPTS

Three patterns that come up most often.

01
Paste an error, find candidate recipes.

"Use match_error to find Vivarium recipes matching this stack trace, top 5: ...paste..." → the agent calls match_error and shows ranked hits. Drill into one with get_recipe.

02
Check whether something still reproduces today.

"What does lookup_verdict say about pandas-56679?" → Layer 1 returns a kind: "live" URL the agent can offer for a browser visit; Layer 2 / 3 returns the snapshot's verdict / exit code directly.

03
Filter by layer or project.

"Use list_recipes to enumerate all Layer 2 recipes" → the agent calls it with { layer: 2 } and gets back only the Layer 2 (Docker) recipes.
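On the wire, a call like the one in the third prompt is a JSON-RPC 2.0 tools/call request written to the server's stdin. MCP's stdio transport delimits messages with newlines, so each request is one line of JSON. A sketch of that framing; frameToolCall and the id value are illustrative, not part of the Vivarium package.

```typescript
// Sketch: frame an MCP tools/call request for the stdio transport.
// MCP over stdio exchanges newline-delimited JSON-RPC 2.0 messages, so each
// message is serialized on a single line and terminated by "\n".
// frameToolCall is a hypothetical helper.

interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function frameToolCall(id: number, name: string, args: Record<string, unknown>): string {
  const request: ToolCallRequest = {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
  // JSON.stringify without indentation escapes any newlines inside strings,
  // so the serialized message stays on one line as the transport requires.
  return JSON.stringify(request) + "\n";
}

const line = frameToolCall(1, "list_recipes", { layer: 2 });
console.log(line.trimEnd());
// → {"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"list_recipes","arguments":{"layer":2}}}
```

Before any tools/call, a real client completes MCP's initialize handshake (an initialize request followed by an initialized notification). The official SDKs and the clients listed in §2 handle all of this for you.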

The match_error scoring is bit-identical to the error → recipe matcher page's, so MCP and the UI return the same ranked candidates.
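Mechanical token-overlap scoring fits in a few lines. Everything specific in this sketch is hypothetical: the tokenizer, the WEIGHTS values, the RecipeIndex shape, and the tie-break by slug. The real tokenizer and weights are whatever the server and the matcher page share, and this sketch does not reproduce them.

```typescript
// Sketch of mechanical token-overlap scoring (no LLM, no fuzzy matching).
// HYPOTHETICAL: tokenizer rule, WEIGHTS values, RecipeIndex shape, tie-break.

interface RecipeIndex {
  slug: string;
  project: string;
  symptom: string;
  tags: string[];
}

// Hypothetical per-source weights; the real values are not documented here.
const WEIGHTS = { symptom: 3, tags: 2, project: 2, slug: 1 } as const;

function tokenize(text: string): string[] {
  // Lowercase, split on anything that isn't a letter, digit, or underscore.
  const tokens = text.toLowerCase().split(/[^a-z0-9_]+/).filter((t) => t.length > 0);
  return tokens.filter((t, i) => tokens.indexOf(t) === i); // dedupe
}

function countHits(query: Set<string>, source: string): number {
  // Exact token hits only: a token either appears in the query or it doesn't.
  return tokenize(source).filter((t) => query.has(t)).length;
}

function matchError(text: string, catalogue: RecipeIndex[], limit = 5) {
  const query = new Set(tokenize(text));
  return catalogue
    .map((r) => ({
      slug: r.slug,
      score:
        WEIGHTS.symptom * countHits(query, r.symptom) +
        WEIGHTS.tags * countHits(query, r.tags.join(" ")) +
        WEIGHTS.project * countHits(query, r.project) +
        WEIGHTS.slug * countHits(query, r.slug),
    }))
    .filter((c) => c.score > 0)
    .sort((a, b) => b.score - a.score || a.slug.localeCompare(b.slug))
    .slice(0, limit);
}

// Hypothetical index entries, for illustration only.
const demo: RecipeIndex[] = [
  { slug: "demo-keyerror", project: "pandas", symptom: "KeyError raised by DataFrame.loc", tags: ["indexing"] },
  { slug: "demo-segfault", project: "numpy", symptom: "segmentation fault in ufunc", tags: ["crash"] },
];

console.log(matchError("KeyError: 'a' in pandas DataFrame.loc", demo));
```

Because scores come from exact token hits, the same input always yields the same ranking, which is what lets two front ends agree on results.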

// 5 · COMMON SNAGS

Most setups stall on one of these.

  • No dist/index.js. This means bun run build hasn't been run. Run bun install && bun run build once in packages/mcp-server/.

  • Path not found. Some clients expand neither the home-directory shortcut (~) nor environment variables such as $HOME, and relative paths are unreliable too. Use a fully-qualified absolute path: /Users/<you>/code/vivarium/.../dist/index.js, not $HOME/code/vivarium/....

  • Recipes seem stale. The server caches https://aletheia-works.github.io/vivarium/api/recipes.json with a 5-minute TTL and falls back to a build-time snapshot offline. Restart the MCP server to force a refetch.

  • "Layer 1 verdict isn't a snapshot." By design: Layer 1's source of truth is the browser at view time (Contract v1). The agent should treat kind: "live" as "open this URL for the human to verify."
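On the agent side, the two lookup_verdict result kinds are easiest to handle as a discriminated union on kind. A sketch under stated assumptions: field names come from the tool list in §3, the snapshot fields elided there as "..." are left open with an index signature, and the value types (string verdict, numeric exit code) and the describeVerdict helper are hypothetical.

```typescript
// Sketch: lookup_verdict results as a discriminated union on `kind`.
// Field names follow the documented result shapes; value types are assumptions.

interface LiveVerdict {
  kind: "live";
  page_url: string;
  note: string;
}

interface VerdictSnapshot {
  verdict: string; // assumed string label, e.g. "reproduced"
  exit_code: number;
  image_digest: string;
  stdout: string;
  stderr_tail: string;
  [extra: string]: unknown; // further fields elided in the docs
}

interface SnapshotVerdict {
  kind: "snapshot";
  snapshot: VerdictSnapshot;
}

type LookupVerdictResult = LiveVerdict | SnapshotVerdict;

// Hypothetical helper: what an agent might tell the human for each kind.
function describeVerdict(result: LookupVerdictResult): string {
  switch (result.kind) {
    case "live":
      // Layer 1: no stored verdict; the browser computes it at view time.
      return `Open ${result.page_url} to see the live verdict.`;
    case "snapshot":
      // Layer 2 / 3: report the deployed snapshot directly.
      return `Snapshot verdict: ${result.snapshot.verdict} (exit code ${result.snapshot.exit_code})`;
  }
}

const example: LookupVerdictResult = {
  kind: "live",
  page_url: "https://example.invalid/recipes/demo", // hypothetical URL
  note: "computed in-browser",
};
console.log(describeVerdict(example));
```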

// 6 · WHAT'S NEXT

Lines branching out from this surface.

01
Pair with consumer-side CI.

The Integrate with your own repo guide sets up the verdict-watch side in your own CI. Together, the agent side and the CI side close the loop.

02
Verify a branch fix end-to-end.

verify_branch_fix (Phase 7 B3) wires up, in one shot through the agent, the check "did my candidate fix flip reproduced → unreproduced?". Full walkthrough: Verify a branch-fix.
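For Layer 1, Path A boils down to a URL: the recipe page with the fix attached as ?fix_url= or as ?fix=<base64url>. A sketch of building such a link, assuming you already have the recipe's page URL (get_recipe returns one) and that the fix is encoded as unpadded base64url (RFC 4648 §5). The padding choice, the example URLs, and the buildCompareUrl / toBase64Url names are assumptions, not the tool's actual code.

```typescript
// Sketch: build a Layer 1 "Path A" compare link with a fix pre-loaded.
// ASSUMPTIONS: pageUrl comes from get_recipe; the fix source is UTF-8 text;
// the ?fix= value is unpadded base64url. Helper names are hypothetical.

type Fix = { fixUrl: string } | { fixSource: string };

const ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

function toBase64Url(text: string): string {
  const bytes = new TextEncoder().encode(text);
  let out = "";
  // Encode 3 bytes at a time into 4 URL-safe characters; omit "=" padding.
  for (let i = 0; i < bytes.length; i += 3) {
    const b1 = i + 1 < bytes.length ? bytes[i + 1] : 0;
    const b2 = i + 2 < bytes.length ? bytes[i + 2] : 0;
    const triple = (bytes[i] << 16) | (b1 << 8) | b2;
    out += ALPHABET[(triple >> 18) & 63] + ALPHABET[(triple >> 12) & 63];
    if (i + 1 < bytes.length) out += ALPHABET[(triple >> 6) & 63];
    if (i + 2 < bytes.length) out += ALPHABET[triple & 63];
  }
  return out;
}

function buildCompareUrl(pageUrl: string, fix: Fix): string {
  const url = new URL(pageUrl);
  if ("fixUrl" in fix) {
    url.searchParams.set("fix_url", fix.fixUrl); // percent-encoded by URLSearchParams
  } else {
    url.searchParams.set("fix", toBase64Url(fix.fixSource));
  }
  return url.toString();
}

console.log(buildCompareUrl("https://example.invalid/recipes/demo", { fixSource: "print('patched')" }));
```

For Layer 2 / 3 there is no URL-only shortcut: Path B hands back the /repro/compare deep-link and the gh workflow run command described in §3.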

If a step here didn't work, that's a bug in this guide — file an issue with your client name, OS, and the step number you got stuck on.

// NEXT

Integrate with your own repo

With the agent side wired, adding the consumer-CI side closes the other half of the loop.

VIVARIUM IS PART OF ALETHEIA-WORKS · SEE SOURCE ON GITHUB →