GitHub Actions

The CLI is the wedge; CI integration is the moat. This page covers two integration paths:

  1. The whatifd-fork composite GitHub Action (shipped in v0.2) — recommended for GitHub workflows. See the dedicated section further down.

  2. The direct CLI pattern (works on any CI system) — used by the whatifd-fork action under the hood, documented next for GitLab / Jenkins / CircleCI and for GitHub workflows that need finer-grained control.

Direct CLI pattern (any CI system)

whatifd fork is config-file driven and writes a JSON + Markdown report; the exit code is the gate. Use this pattern directly on non-GitHub CI, or on GitHub when you need to customize behavior beyond what the composite action exposes.

# .github/workflows/llm-regression.yml
name: llm-regression
on:
  pull_request:
    paths:
      - "prompts/**"
      - "src/agent/**"
      - "whatifd.config.yaml"

permissions:
  contents: read
  pull-requests: write   # for posting the report as a PR comment

jobs:
  whatifd:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v7
        with:
          python-version: "3.12"
      - run: uv pip install --system whatifd whatifd-langfuse whatifd-inspect-ai whatifd-phoenix
      - run: whatifd fork --config whatifd.config.yaml
        env:
          LANGFUSE_HOST: ${{ secrets.LANGFUSE_HOST }}
          LANGFUSE_PUBLIC_KEY: ${{ secrets.LANGFUSE_PUBLIC_KEY }}
          LANGFUSE_SECRET_KEY: ${{ secrets.LANGFUSE_SECRET_KEY }}
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
      - if: always()
        uses: actions/upload-artifact@v4
        with:
          name: whatifd-report
          path: reports/whatifd-fork-*.*

Exit codes propagate naturally: 0 (Ship) → green check; 1 (Don’t Ship) → red, blocks merge; 2 (Inconclusive or setup failure) → red.

Generic CI pattern

If you’re not on GitHub, the same pattern works:

# .gitlab-ci.yml example
whatifd:
  image: python:3.12-slim
  before_script:
    - pip install whatifd whatifd-langfuse whatifd-inspect-ai whatifd-phoenix
  script:
    - whatifd fork --config whatifd.config.yaml
  artifacts:
    when: always
    paths:
      - reports/

whatifd-fork GitHub Action (v0.2)

v0.2.0 ships a composite GitHub Action at .github/actions/whatifd-fork/ in the whatifd repo. It wraps whatifd fork --config <path>, captures ./reports/*.json and ./reports/*.md artifacts, posts the rendered Markdown verdict as a PR comment on pull_request events, and surfaces the verdict via a GitHub status annotation (::notice / ::warning / ::error per verdict).

Consume it from your workflow:

- uses: victoralfred/whatifd/.github/actions/whatifd-fork@v0.2.0
  with:
    config: whatifd.config.yaml
    profile: default            # or "forensic" — preserves the two-affirmation gate
    comment-on-pr: true
    fail-on-dont-ship: true     # true (default) fails the workflow on Don't Ship / Inconclusive

Outputs: verdict (ship / dont_ship / inconclusive), exit-code, report-json, report-md. Downstream steps can branch on the verdict instead of relying on workflow-level pass/fail.
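One way to act on the verdict output is a follow-up run step that gates in shell. This is a minimal sketch, assuming the action step runs with fail-on-dont-ship: false and that its verdict output is handed to the script (for example through an env var). The function name gate_on_verdict is hypothetical, and letting inconclusive through as a warning is only one possible policy:

```shell
# Hypothetical gate for a follow-up step. The verdict values (ship, dont_ship,
# inconclusive) are the action outputs listed above; the policy is illustrative.
gate_on_verdict() {
  case "$1" in
    ship)
      return 0 ;;
    inconclusive)
      # GitHub workflow command: shows up as a warning annotation on the run.
      echo "::warning::whatifd verdict was inconclusive"
      return 0 ;;
    dont_ship)
      echo "::error::whatifd verdict: Don't Ship"
      return 1 ;;
    *)
      # Fail closed on anything unexpected.
      echo "::error::unexpected whatifd verdict: $1"
      return 1 ;;
  esac
}

# Usage inside a run step (VERDICT is an assumed env var name):
#   gate_on_verdict "$VERDICT"
```

Failing closed on unknown values keeps a renamed or missing output from silently passing the check.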

Marketplace publication is on the v0.3 roadmap; today the action is consumable directly from the whatifd repo via uses: victoralfred/whatifd/.github/actions/whatifd-fork@<ref>.

Exit-code behavior

The CI integration relies entirely on exit codes:

| Exit code | CI behavior | Meaning |
| --- | --- | --- |
| 0 | Job succeeds. PR check is green. | Verdict: Ship. |
| 1 | Job fails. PR check is red. | Verdict: Don’t Ship. |
| 2 | Job fails. PR check is red, but with an “Inconclusive” annotation. | Setup / replay / scoring failure. |

In Path Z, exit code 1 is what blocks the merge.
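On CI systems without the composite action, a thin wrapper can turn the exit code into a label for logs or notifications while still passing the code through. This is a minimal sketch, assuming only the three exit codes documented in this section. The function name whatifd_verdict_label and the label strings are illustrative, not part of whatifd:

```shell
# Hypothetical helper: map a whatifd exit code to a short label.
# The codes 0 / 1 / 2 come from this page; the labels are made up for logging.
whatifd_verdict_label() {
  case "$1" in
    0) echo "ship" ;;
    1) echo "dont_ship" ;;
    2) echo "inconclusive_or_setup_failure" ;;
    *) echo "unknown" ;;
  esac
}

# Usage (commented out so the snippet can be sourced without running whatifd):
#   status=0
#   whatifd fork --config whatifd.config.yaml || status=$?
#   echo "whatifd: $(whatifd_verdict_label "$status")"
#   exit "$status"   # re-raise so the CI job still gates on the exit code
```

Capturing the status with `|| status=$?` keeps the wrapper working on runners that execute scripts with `set -e`.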

Posting the report as a PR comment

The Action’s comment-on-pr: true does this automatically. For generic CI, post the Markdown report yourself:

gh pr comment "$PR_NUMBER" --body-file whatifd-report.md

Or via the GitHub API directly:

curl -X POST \
  -H "Authorization: Bearer $GH_TOKEN" \
  -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/$REPO/issues/$PR_NUMBER/comments" \
  -d "$(jq -Rs '{body: .}' < whatifd-report.md)"
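If your run writes reports under reports/ with the whatifd-fork- prefix used in the upload step earlier on this page, the Markdown file name may not be known in advance. The sketch below picks the most recently modified match. The function name latest_whatifd_report is hypothetical, and the exact file naming is an assumption based on that glob:

```shell
# Hypothetical helper: print the newest Markdown report in a directory.
# Assumes reports are named whatifd-fork-*.md (inferred from the upload glob).
latest_whatifd_report() {
  # ls -t sorts by modification time, newest first; errors are silenced so a
  # missing directory or no match simply yields empty output.
  ls -t "${1:-reports}"/whatifd-fork-*.md 2>/dev/null | head -n 1
}

# Usage (commented out; needs gh and a PR number in the environment):
#   report="$(latest_whatifd_report)"
#   if [ -n "$report" ]; then
#     gh pr comment "$PR_NUMBER" --body-file "$report"
#   fi
```

Checking for an empty result before posting avoids a confusing gh error when whatifd failed before writing any report.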

Required permissions

| Permission | Why |
| --- | --- |
| contents: read | To check out the PR. |
| pull-requests: write | To post the verdict report as a PR comment. |
| id-token: write | Only if your scorer or runner uses OIDC (e.g., for cloud LLM auth). |

Secrets

| Secret | When needed |
| --- | --- |
| LANGFUSE_PUBLIC_KEY + LANGFUSE_SECRET_KEY | For --source langfuse. |
| ANTHROPIC_API_KEY (or other LLM API key) | For the runner’s LLM calls and the Inspect AI judge. |
| Custom tracer creds | When using a --source other than Langfuse (e.g., Phoenix / OpenInference in v0.2; LangSmith / OTel GenAI on the v0.3 roadmap). |

Use GitHub Environments for production credentials and require a manual approval before the job runs against prod data:

jobs:
  whatifd:
    environment: production-traces
    # the job won't run until a reviewer approves access to that environment

What runs when (recommendations)

| Trigger | Cohort size | Cost | Use case |
| --- | --- | --- | --- |
| pull_request (paths-filtered) | 20 + 20 | ~$0.20 | Per-PR check on prompt / model changes. |
| schedule (nightly) | 50 + 50 | ~$1 | Continuous regression catch on main. |
| release (manual / pre-tag) | 100 + 100 | ~$3 | Release-gate: full sample before tagging. |

Cost estimates assume Claude Haiku 4.5 as the Inspect judge.
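The per-run figures add up roughly as follows. This is a back-of-envelope sketch: the per-run costs are the estimates from this section (about $0.20 per PR run, $1 per nightly run, $3 per release run), and the run counts are hypothetical inputs you would replace with your own:

```shell
# Hypothetical budget helper: estimated monthly spend in dollars.
# Arguments: PR runs, nightly runs, release runs per month.
# Per-run costs are this page's rough estimates, not measured prices.
whatifd_monthly_cost() {
  LC_ALL=C awk -v pr="$1" -v nightly="$2" -v release="$3" \
    'BEGIN { printf "%.2f\n", pr * 0.20 + nightly * 1.00 + release * 3.00 }'
}

# Example: 100 PR runs, 30 nightly runs, 2 release runs in a month.
whatifd_monthly_cost 100 30 2   # prints 56.00
```

At these volumes the nightly schedule, not the per-PR check, is the largest line item.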

What this enables (Path Z, made concrete)

✗ whatifd / experiment-runner - Verdict: Don't Ship

  Failures: 14/20 improved, but baseline regressed 6/20.
  Median baseline Δ: -0.18  (CI: [-0.24, -0.12])

  Top regression: trace t_492af  ("agent now refuses requests it
                                   previously handled correctly")

  Full report → /artifacts/whatifd-report.md

That output, on a PR check, is the moment whatifd becomes infrastructure. See Path Z for the trajectory.