Getting started

v0.3.0 is on PyPI as the whatifd distribution (the brand stays whatifd; the bare PyPI slot was taken). Install via uv pip install whatifd whatifd-langfuse whatifd-inspect-ai whatifd-phoenix whatifd-datadog. Watch the GitHub repo for release notes.

What you’ll do

This section walks you from a blank machine to a verdict report:

Install whatifd and verify the CLI is on your $PATH.
Implement the runner contract for your agent: a single Python function.
Run your first experiment — fork production failures, replay with a proposed change, read the verdict (failure_rescue shape).
Or run a regression check — same flow, but pointed at a known-good baseline to verify a candidate change doesn’t introduce regressions (regression_check shape, v0.2).
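
To make the runner-contract step concrete, here is a minimal sketch of what "a single Python function" could look like. The name run_agent, its two arguments, and the returned dictionary shape are all hypothetical and chosen for illustration; the real signature is defined by the runner contract page, not by this snippet. ToyAgent is a stand-in for your own agent.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ToyAgent:
    """Stand-in for your real agent: anything you can construct and call from Python."""

    system_prompt: str = "You are a helpful assistant."

    def __call__(self, user_input: str) -> str:
        # A real agent would call a model here; this one only echoes its inputs.
        return f"[{self.system_prompt}] {user_input}"


def run_agent(trace_input: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical runner: rebuild the agent with a proposed change and replay one input.

    `trace_input` and `overrides` are illustrative names, not whatifd's actual API.
    """
    agent = ToyAgent(**overrides)  # apply the proposed change, e.g. a new system prompt
    output = agent(trace_input["user_input"])
    return {"output": output}
```

The point of the pattern is that the agent is rebuilt inside the function on every call, so a proposed change (here, a different system_prompt) can be applied without touching production code.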

What you’ll need

Python 3.11+.
A tracer with at least 20 recent traces — Langfuse (shipped v0.1) or Arize Phoenix / OpenInference (shipped v0.2). LangSmith and OpenTelemetry GenAI are on the v0.3 roadmap.
Read API access to that tracer.
An agent you can re-execute programmatically — a function or class you can construct and call from Python. If your agent only runs as a hosted service, the runner contract pattern doesn’t fit yet (this is a known limitation; tracked for v0.3+).
An eval framework — Inspect AI is the default. v0.2 makes it reachable from whatifd.config.yaml via scorer.score_fn; the v0.1 programmatic path is preserved.
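
The checkable items above (Python version, CLI on $PATH, trace count) can be verified with a short preflight script. This is a sketch using only the standard library; it assumes the installed executable is named whatifd, and the trace count is a value you look up in your tracer yourself, since no tracer API is called here.

```python
import shutil
import sys
from typing import Optional

MIN_PYTHON = (3, 11)  # "Python 3.11+"
MIN_TRACES = 20       # "at least 20 recent traces"


def preflight(python_version: tuple, cli_path: Optional[str], trace_count: int) -> list:
    """Return a list of unmet prerequisites; an empty list means you are ready."""
    problems = []
    if tuple(python_version) < MIN_PYTHON:
        problems.append("Python 3.11 or newer is required")
    if cli_path is None:
        problems.append("whatifd CLI not found on PATH")
    if trace_count < MIN_TRACES:
        problems.append("at least 20 recent traces are required")
    return problems


if __name__ == "__main__":
    # Assumption: the executable is called "whatifd". Replace trace_count with
    # the number of recent traces you see in your tracer.
    issues = preflight(sys.version_info[:2], shutil.which("whatifd"), trace_count=20)
    for issue in issues:
        print("missing:", issue)
    if not issues:
        print("all prerequisites met")
```

Read API access and a re-executable agent cannot be checked this generically, so they are left out of the script.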

What you won’t need

A whatifd server or hosted account. It’s a CLI.
A new tracer. whatifd reads from yours.
A new SLO platform. whatifd’s output flows into yours.

Who this is for

SREs / platform engineers running LLM systems in production.
ML engineers shipping prompt or model changes weekly.
Anyone debating “is this prompt change actually better” in standup more than once a week.

If those don’t sound like you yet, Concepts might be a better starting point.