GitHub Actions
The CLI is the wedge; CI integration is the moat. This page covers two integration paths:
- The `whatifd-fork` composite GitHub Action (shipped in v0.2): recommended for GitHub workflows. See the dedicated section further down.
- The direct CLI pattern (works on any CI system): used by the `whatifd-fork` action under the hood, documented next for GitLab / Jenkins / CircleCI and for GitHub workflows that need finer-grained control.
Direct CLI pattern (any CI system)
whatifd fork is config-file driven and writes a JSON + Markdown report; the exit code is the gate. Use this pattern directly on non-GitHub CI, or on GitHub when you need to customize behavior beyond what the composite action exposes.
# .github/workflows/llm-regression.yml
name: llm-regression
on:
  pull_request:
    paths:
      - "prompts/**"
      - "src/agent/**"
      - "whatifd.config.yaml"
permissions:
  contents: read
  pull-requests: write  # for posting the report as a PR comment
jobs:
  whatifd:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v7
        with:
          python-version: "3.12"
      - run: uv pip install --system whatifd whatifd-langfuse whatifd-inspect-ai whatifd-phoenix
      - run: whatifd fork --config whatifd.config.yaml
        env:
          LANGFUSE_HOST: ${{ secrets.LANGFUSE_HOST }}
          LANGFUSE_PUBLIC_KEY: ${{ secrets.LANGFUSE_PUBLIC_KEY }}
          LANGFUSE_SECRET_KEY: ${{ secrets.LANGFUSE_SECRET_KEY }}
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
      - if: always()
        uses: actions/upload-artifact@v4
        with:
          name: whatifd-report
          path: reports/whatifd-fork-*.*
Exit codes propagate naturally: 0 (Ship) → green check; 1 (Don’t Ship) → red, blocks merge; 2 (Inconclusive or setup failure) → red.
Generic CI pattern
If you’re not on GitHub, the same pattern works:
# .gitlab-ci.yml example
whatifd:
  image: python:3.12-slim
  before_script:
    - pip install whatifd whatifd-langfuse whatifd-inspect-ai whatifd-phoenix
  script:
    - whatifd fork --config whatifd.config.yaml
  artifacts:
    when: always
    paths:
      - reports/
whatifd-fork GitHub Action (v0.2)
v0.2.0 ships a composite GitHub Action at .github/actions/whatifd-fork/ in the whatifd repo. It wraps whatifd fork --config <path>, captures ./reports/*.json and ./reports/*.md artifacts, posts the rendered Markdown verdict as a PR comment on pull_request events, and surfaces the verdict via a GitHub status annotation (::notice / ::warning / ::error per verdict).
Consume it from your workflow:
- uses: victoralfred/whatifd/.github/actions/whatifd-fork@v0.2.0
  with:
    config: whatifd.config.yaml
    profile: default  # or "forensic"; preserves the two-affirmation gate
    comment-on-pr: true
    fail-on-dont-ship: true  # true (default) fails the workflow on Don't Ship / Inconclusive
Outputs: verdict (ship / dont_ship / inconclusive), exit-code, report-json, report-md. Downstream steps can branch on the verdict instead of relying on workflow-level pass/fail.
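Branching on the verdict can be sketched as a small shell helper for a downstream step. This is an illustration only: `next_step_for_verdict` and the `proceed` / `block` / `rerun` policy are hypothetical names, and `VERDICT` is assumed to be wired from the action's `verdict` output by the workflow author.

```shell
# Hypothetical helper: decide what a downstream step should do for a verdict.
# VERDICT is assumed to come from the action's `verdict` output, for example
# via `env: VERDICT: ${{ steps.<step-id>.outputs.verdict }}` on the step.
next_step_for_verdict() {
  case "$1" in
    ship)         echo "proceed" ;;  # safe to continue, e.g. deploy a preview
    dont_ship)    echo "block" ;;    # regression detected
    inconclusive) echo "rerun" ;;    # example policy: retry with a larger cohort
    *)            echo "block" ;;    # fail closed on anything unexpected
  esac
}

# Usage inside a step script:
#   action="$(next_step_for_verdict "$VERDICT")"
```

A downstream step only runs after a failed step if it is marked `if: always()` (or if the action is configured not to fail the workflow), so this kind of branching is mostly relevant in those setups.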
Marketplace publication is on the v0.3 roadmap; today the action is consumable directly from the whatifd repo via uses: victoralfred/whatifd/.github/actions/whatifd-fork@<ref>.
Exit-code behavior
The CI integration relies entirely on exit codes:
| Exit code | CI behavior | Meaning |
|---|---|---|
| `0` | Job succeeds. PR check is green. | Verdict: Ship. |
| `1` | Job fails. PR check is red. | Verdict: Don’t Ship. |
| `2` | Job fails. PR check is red, but with an “Inconclusive” annotation. | Setup / replay / scoring failure. |
In Path Z, exit code 1 is what blocks the merge.
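On CI systems without the composite action, the exit-code contract can be made visible in job logs with a thin wrapper. The following is a minimal sketch, not part of whatifd: `verdict_for_exit_code` and `run_and_report` are hypothetical helper names, and the labels simply mirror the exit codes described in this section.

```shell
# Hypothetical helper: map a whatifd exit code to a verdict label.
verdict_for_exit_code() {
  case "$1" in
    0) echo "ship" ;;
    1) echo "dont_ship" ;;
    2) echo "inconclusive" ;;  # exit code 2 also covers setup failures
    *) echo "unknown" ;;
  esac
}

# Run a command, print its verdict to stderr, and return the original exit
# code unchanged so the CI system still gates on it.
run_and_report() {
  local code=0
  "$@" || code=$?
  echo "whatifd verdict: $(verdict_for_exit_code "$code") (exit code $code)" >&2
  return "$code"
}

# Usage in a CI script step:
#   run_and_report whatifd fork --config whatifd.config.yaml
```

Passing the exit code through unchanged keeps the gate intact; the wrapper only adds a readable log line.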
Posting the report as a PR comment
The Action’s comment-on-pr: true does this automatically. For generic CI, post the Markdown report yourself:
gh pr comment $PR_NUMBER --body-file whatifd-report.md
Or via the GitHub API directly:
curl -X POST \
  -H "Authorization: Bearer $GH_TOKEN" \
  -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/$REPO/issues/$PR_NUMBER/comments" \
  -d "$(jq -Rs '{body: .}' < whatifd-report.md)"
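The commands above take a fixed report filename, while the CI examples on this page collect reports under `reports/`. A small helper can pick the newest Markdown report to post. This is a sketch under assumptions: `latest_report` is a hypothetical name, and the `whatifd-fork-*.md` pattern is inferred from the artifact glob used earlier on this page rather than from a documented naming scheme.

```shell
# Hypothetical helper: print the most recently modified Markdown report in a
# directory (default: reports). Prints nothing if no report is found.
latest_report() {
  local dir="${1:-reports}"
  # ls -t sorts newest first; report names are assumed to contain no newlines.
  ls -t "$dir"/whatifd-fork-*.md 2>/dev/null | head -n 1
}

# Usage:
#   gh pr comment "$PR_NUMBER" --body-file "$(latest_report reports)"
```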
Required permissions
| Permission | Why |
|---|---|
| `contents: read` | To check out the PR. |
| `pull-requests: write` | To post the verdict report as a PR comment. |
| `id-token: write` | Only if your scorer or runner uses OIDC (e.g., for cloud LLM auth). |
Secrets
| Secret | When needed |
|---|---|
| `LANGFUSE_HOST`, `LANGFUSE_PUBLIC_KEY`, `LANGFUSE_SECRET_KEY` | For `--source langfuse`. |
| `ANTHROPIC_API_KEY` | For the runner’s LLM calls AND the Inspect AI judge. |
| Custom tracer creds | When using |
Use GitHub Environments for production credentials and require a manual approval before the job runs against prod data:
jobs:
  whatifd:
    environment: production-traces
    # the job won't run until a reviewer approves access to that environment
What runs when (recommendations)
| Trigger | Cohort size | Cost | Use case |
|---|---|---|---|
| | 20 + 20 | ~$0.20 | Per-PR check on prompt / model changes. |
| | 50 + 50 | ~$1 | Continuous regression catch on |
| | 100 + 100 | ~$3 | Release-gate: full sample before tagging. |
Cost estimates assume Claude Haiku 4.5 as the Inspect judge.
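The per-run figures above turn into a rough monthly budget by multiplying by how often each trigger fires. A minimal sketch, assuming hypothetical run counts (60 PR runs and 4 release gates a month are illustrations, not figures from whatifd); `monthly_cost` is a made-up helper name:

```shell
# Hypothetical helper: estimate monthly judge spend in dollars.
#   $1 = runs per month (an assumption you supply)
#   $2 = approximate cost per run, taken from the table above
monthly_cost() {
  awk -v runs="$1" -v per_run="$2" 'BEGIN { printf "%.2f\n", runs * per_run }'
}

# Usage:
#   monthly_cost 60 0.20   # 60 per-PR checks at ~$0.20 each
#   monthly_cost 4 3       # 4 release gates at ~$3 each
```

Both usage lines come out to about $12 a month, which suggests that PR volume, rather than cohort size, tends to dominate the bill at these rates.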
What this enables (Path Z, made concrete)
✗ whatifd / experiment-runner - Verdict: Don't Ship
Failures: 14/20 improved, but baseline regressed 6/20.
Median baseline Δ: -0.18 (CI: [-0.24, -0.12])
Top regression: trace t_492af ("agent now refuses requests it
previously handled correctly")
Full report → /artifacts/whatifd-report.md
That output, on a PR check, is the moment whatifd becomes infrastructure. See Path Z for the trajectory.