Getting started

v0.3.0 is on PyPI as the whatifd distribution (the brand stays whatifd; the bare PyPI slot was taken). To install the core package together with the Langfuse, Inspect AI, Phoenix, and Datadog packages, run: uv pip install whatifd whatifd-langfuse whatifd-inspect-ai whatifd-phoenix whatifd-datadog. Watch the GitHub repo for release notes.

What you’ll do

This section walks you from a blank machine to a verdict report:

  1. Install whatifd and verify the CLI is on your $PATH.

  2. Implement the runner contract for your agent: a single Python function.

  3. Run your first experiment — fork production failures, replay with a proposed change, read the verdict (failure_rescue shape).

  4. Or run a regression check — same flow, but pointed at a known-good baseline to verify a candidate change doesn’t introduce regressions (regression_check shape, v0.2).
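Steps 1 and 2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not whatifd's documented API: it assumes the installed executable is named whatifd, and the run_agent name, its parameters, and its return shape are placeholders chosen for this example. The real runner contract is defined in the whatifd documentation.

```python
import shutil
from typing import Any


def cli_on_path(executable: str = "whatifd") -> bool:
    """Return True if `executable` resolves on the current $PATH.

    Assumption: the CLI installs an executable named "whatifd".
    """
    return shutil.which(executable) is not None


# Hypothetical runner. The parameter names and the return shape are
# placeholders for illustration, not whatifd's actual contract.
def run_agent(trace_input: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Re-execute the agent on one recorded input with a proposed change applied."""
    prompt = overrides.get("system_prompt", "You are a helpful assistant.")
    # Stub: replace this line with a real call into your agent.
    answer = f"[{prompt}] {trace_input['question']}"
    return {"output": answer}


if __name__ == "__main__":
    print("whatifd on PATH:", cli_on_path())
    print(run_agent({"question": "ping"}, {"system_prompt": "Be terse."}))
```

The point of the sketch is the shape: one plain function that takes a recorded input plus the proposed change and returns the agent's output, with no server or hosted service involved.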

What you’ll need

  • Python 3.11+.

  • A tracer with at least 20 recent traces — Langfuse (shipped v0.1) or Arize Phoenix / OpenInference (shipped v0.2). LangSmith and OpenTelemetry GenAI are on the v0.3 roadmap.

  • Read API access to that tracer.

  • An agent you can re-execute programmatically — a function or class you can construct and call from Python. If your agent only runs as a hosted service, the runner contract pattern doesn’t fit yet (this is a known limitation; tracked for v0.3+).

  • An eval framework — Inspect AI is the default. v0.2 makes it reachable from whatifd.config.yaml via scorer.score_fn; the v0.1 programmatic path is preserved.
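Two of the requirements above are easy to check before you start: the Python version and the trace count. The following is a rough preflight sketch, not part of whatifd. The check_prerequisites helper is hypothetical, and it expects you to supply the trace count yourself (for example, from your tracer's UI), since no tracer API is called here.

```python
import sys

MIN_PYTHON = (3, 11)  # "Python 3.11+" from the list above
MIN_TRACES = 20       # "at least 20 recent traces" from the list above


def check_prerequisites(python_version: tuple[int, int], trace_count: int) -> list[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    problems: list[str] = []
    if python_version < MIN_PYTHON:
        problems.append(
            f"Python {python_version[0]}.{python_version[1]} found; "
            f"{MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required."
        )
    if trace_count < MIN_TRACES:
        problems.append(
            f"Only {trace_count} traces available; at least {MIN_TRACES} are needed."
        )
    return problems


if __name__ == "__main__":
    # trace_count is a value you look up yourself; 25 is just an example.
    for problem in check_prerequisites(sys.version_info[:2], trace_count=25):
        print("NOT READY:", problem)
```

Read API access and a re-executable agent still need to be confirmed by hand; this sketch only covers the two numeric thresholds.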

What you won’t need

  • A whatifd server or hosted account. It’s a CLI.

  • A new tracer. whatifd reads from yours.

  • A new SLO platform. whatifd’s output flows into yours.

Who this is for

  • SREs / platform engineers running LLM systems in production.

  • ML engineers shipping prompt or model changes weekly.

  • Anyone debating “is this prompt change actually better” in standup more than once a week.

If those don’t sound like you yet, Concepts might be a better starting point.