CLI reference

whatif --help

Commands

Command

Purpose

whatif fork

Fork production traces, replay with a proposed change, score, emit verdict.

whatif report

Re-render an existing JSON report as Markdown (without re-running).

whatif explain <trace-id>

Deep-dive a single trace from the last run-full diff, judge rationale, replay span tree.

whatif --version

Print version and exit.

whatif --help

Print top-level help.

whatif fork

The main verb. Ingests traces, replays, scores, emits report.

Required flags

Flag

Description

Example

- -source`

Tracer to ingest from.

langfuse

- -target`

Path to the user-supplied runner.

"python:my_agent.replay:run"

- -change`

The proposed change to apply.

"system_prompt=prompts/v3.txt"

- -score`

Scorer specification.

"inspect_ai:faithfulness"

Selection flags

Flag

Description

Default

- -failures`

Failure-cohort selector.

required (or - -config`)

- -baseline`

Baseline-cohort selector.

recommended; warning if omitted

- -mode`

failures_plus_baseline (default) or failures_only.

failures_plus_baseline

Selector grammar:

"score-below:0.6,since:24h,limit:20"
"score-above:0.8,since:24h,limit:20,sample:random,seed:42"
"tag:incident-triage,limit:50"

Replay flags

Flag

Description

Default

- -tool-cache`

use-original (v0.1), live (v0.3, opt-in), mock (v0.3+).

use-original

- -tool-allowlist`

When - -tool-cache live`, comma-separated list of tools allowed to call live.

empty

Output flags

Flag

Description

Default

- -report`

Path for the Markdown report.

./reports/whatif-<timestamp>.md

- -json`

Path for the machine-readable JSON report.

./reports/whatif-<timestamp>.json

- -quiet`

Suppress progress output (the report is still written).

false

Decision flags

Flag

Description

Default

- -fail-on-regression`

Exit 1 if the verdict is “Don’t Ship”.

true in v0.2+ via config; opt-in CLI flag in v0.1

- -regression-threshold`

Median score drop above which to fail.

0.1

Configuration mode (v0.2+)

whatif fork --config whatif.config.yaml --change system_prompt=prompts/v3.txt

When - -config is provided, all selection / replay / scoring / decision settings come from the YAML file. Only  - -change is supplied per-invocation. See the config reference.

Exit codes

Code

Meaning

0

Passed configured policy. Ship-grade.

1

Failed configured policy. Do not ship.

2

Inconclusive-setup, replay, or scoring failure.

whatif enforces your declared policy. Exit code 0 is not a safety certification; it means the run satisfied the policy you declared.

whatif report

Re-render a JSON report as Markdown. Useful for CI pipelines that want both formats but emit JSON first.

whatif report --json ./reports/run-1234.json --report ./reports/run-1234.md

whatif explain

Deep-dive a single trace from the most recent run.

whatif explain trace-id-abc123

Output: full diff, judge rationale, replay span tree, links back to the source tracer. Useful when the verdict report’s Evidence section flags a case as a regression and you want the underlying detail.