Getting started
v0.3.0 is on PyPI as the whatifd distribution (the brand stays whatifd; the bare PyPI slot was taken). Install the core package and the integrations with:
uv pip install whatifd whatifd-langfuse whatifd-inspect-ai whatifd-phoenix whatifd-datadog
Watch the GitHub repo for release notes.
What you’ll do
This section walks you from a blank machine to a verdict report:
Install whatifd and verify the CLI is on your $PATH.
Implement the runner contract for your agent: a single Python function.
Run your first experiment: fork production failures, replay them with a proposed change, and read the verdict (failure_rescue shape).
Or run a regression check: the same flow, pointed at a known-good baseline to verify that a candidate change doesn’t introduce regressions (regression_check shape, v0.2).
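The runner-contract step and the two experiment shapes can be illustrated with a self-contained toy in plain Python. Everything named here is hypothetical: the Trace fields, the run_agent signature, ToyAgent, and the exact-match pass/fail rule are stand-ins, not whatifd's actual runner contract or verdict logic. The sketch shows only the idea. One function rebuilds the agent under a given config and replays a trace's input. The same replay loop serves both shapes. The only difference is whether it is pointed at known failures (failure_rescue) or at a known-good baseline (regression_check).

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-ins throughout: not whatifd's real trace format,
# runner signature, or verdict computation.


@dataclass
class Trace:
    trace_id: str
    user_input: str
    expected: str  # toy ground truth used by the exact-match check below


class ToyAgent:
    """Constructed from a system prompt, then called with one user input."""

    def __init__(self, system_prompt: str) -> None:
        self.system_prompt = system_prompt

    def __call__(self, user_input: str) -> str:
        # Toy behaviour: a "terse" prompt normalises the input, otherwise echo it.
        if "be terse" in self.system_prompt:
            return user_input.strip().lower()
        return user_input


def run_agent(user_input: str, system_prompt: str) -> str:
    """Hypothetical runner: build a fresh agent for this config and replay one input."""
    return ToyAgent(system_prompt)(user_input)


def replay(
    traces: list[Trace],
    runner: Callable[[str, str], str],
    baseline_prompt: str,
    candidate_prompt: str,
) -> dict[str, int]:
    """Replay each trace under both configs and count outcome flips."""
    rescued = regressed = 0
    for t in traces:
        before = runner(t.user_input, baseline_prompt) == t.expected
        after = runner(t.user_input, candidate_prompt) == t.expected
        if not before and after:
            rescued += 1  # failed before, passes with the change
        elif before and not after:
            regressed += 1  # passed before, fails with the change
    return {"rescued": rescued, "regressed": regressed, "total": len(traces)}


BASELINE = "be helpful"
CANDIDATE = "be helpful, be terse"

# failure_rescue-style run: replay pointed at known failures.
failures = [
    Trace("f1", "  Hello ", "hello"),
    Trace("f2", "WORLD", "world"),
    Trace("f3", "Foo", "bar"),
]
# regression_check-style run: same replay, pointed at a known-good baseline.
known_good = [
    Trace("g1", "ok", "ok"),
    Trace("g2", "Yes", "Yes"),
]

print(replay(failures, run_agent, BASELINE, CANDIDATE))
# → {'rescued': 2, 'regressed': 0, 'total': 3}
print(replay(known_good, run_agent, BASELINE, CANDIDATE))
# → {'rescued': 0, 'regressed': 1, 'total': 2}
```

In this toy, the candidate prompt rescues two failures but also breaks one previously good trace. Checking only the failure set would miss that regression.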
What you’ll need
Python 3.11+.
A tracer with at least 20 recent traces — Langfuse (shipped v0.1) or Arize Phoenix / OpenInference (shipped v0.2). LangSmith and OpenTelemetry GenAI are on the v0.3 roadmap.
Read access to that tracer’s API.
An agent you can re-execute programmatically — a function or class you can construct and call from Python. If your agent only runs as a hosted service, the runner contract pattern doesn’t fit yet (this is a known limitation; tracked for v0.3+).
An eval framework. Inspect AI is the default; v0.2 makes it reachable from whatifd.config.yaml via scorer.score_fn, and the v0.1 programmatic path is preserved.
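A YAML key such as scorer.score_fn can only refer to Python code through a string, and an import path is the usual way to do that. Below is a minimal sketch of resolving such a string with importlib. It assumes a "module:function" or "module.function" format. The format whatifd actually accepts for score_fn, and the signature it expects the scoring function to have, are not stated here, so treat both as assumptions and check the configuration reference.

```python
import importlib
from typing import Any, Callable


def resolve_callable(ref: str) -> Callable[..., Any]:
    """Resolve 'package.module:function' or 'package.module.function' to a callable.

    Assumed format only; not necessarily what whatifd's scorer.score_fn accepts.
    """
    if ":" in ref:
        module_name, _, attr_path = ref.partition(":")
    else:
        module_name, _, attr_path = ref.rpartition(".")
    if not module_name or not attr_path:
        raise ValueError(f"not an import path: {ref!r}")
    obj: Any = importlib.import_module(module_name)
    # Walk nested attributes, e.g. "module:Class.method".
    for part in attr_path.split("."):
        obj = getattr(obj, part)
    if not callable(obj):
        raise TypeError(f"{ref!r} resolved to a non-callable {type(obj).__name__}")
    return obj


if __name__ == "__main__":
    print(resolve_callable("math:sqrt")(16.0))  # → 4.0
```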
What you won’t need
A whatifd server or hosted account. It’s a CLI.
A new tracer. whatifd reads from yours.
A new SLO platform. whatifd’s output flows into yours.
Who this is for
SREs / platform engineers running LLM systems in production.
ML engineers shipping prompt or model changes weekly.
Anyone debating “is this prompt change actually better” in standup more than once a week.
If those don’t sound like you yet, Concepts might be a better starting point.