Arize Phoenix / OpenInference

Arize Phoenix and the broader OpenInference tracing standard are whatifd’s second supported trace source, shipped in v0.2. The adapter is tracer-neutral by construction: it consumes OpenInference-shaped span dictionaries from a caller-supplied provider, not a pinned Phoenix client. Anything that emits OpenInference spans — Phoenix proper, a custom OTLP collector, an offline JSONL dump — works with a ~5-line wiring callable.

What `whatifd` reads from Phoenix / OpenInference

For each selected trace:

- input.value: the agent’s user-facing input (wrapped as Sensitive[str] at the boundary per cardinal #5).
- output.value: the agent’s original output, which the verdict is diffed against (also wrapped).
- context.trace_id and parent_id: used to group spans into traces and to identify the root span of each trace.
- openinference.span.kind: used to tell LLM, tool, and retriever spans apart for the ToolCache.
- All other span attributes: passed through to RawTrace.metadata unwrapped (cardinal #5: only intentional user content is Sensitive).

whatifd is read-only on Phoenix. It never writes spans back.
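As a rough illustration of the grouping step described above, the sketch below buckets spans by `context.trace_id` and treats the span without a `parent_id` as the root. The flat key layout, the `context.span_id` key, and the sample values are assumptions made for this example; it is not the adapter's actual code.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List

Span = Dict[str, Any]

# Hypothetical sample spans. The flat key layout is an assumption of this
# sketch, and "context.span_id" is included only so parents can be referenced.
SAMPLE_SPANS: List[Span] = [
    {
        "context.trace_id": "t1",
        "context.span_id": "s1",
        "parent_id": None,
        "openinference.span.kind": "AGENT",
        "input.value": "What is the refund policy?",
        "output.value": "Refunds are accepted within 30 days.",
    },
    {
        "context.trace_id": "t1",
        "context.span_id": "s2",
        "parent_id": "s1",
        "openinference.span.kind": "TOOL",
        "input.value": '{"query": "refund policy"}',
        "output.value": '{"days": 30}',
    },
]


def group_by_trace(spans: Iterable[Span]) -> Dict[str, List[Span]]:
    """Bucket spans by their trace id."""
    traces: Dict[str, List[Span]] = defaultdict(list)
    for span in spans:
        traces[span["context.trace_id"]].append(span)
    return dict(traces)


def root_span(trace_spans: List[Span]) -> Span:
    """Return the single span that has no parent, or raise if ambiguous."""
    roots = [s for s in trace_spans if not s.get("parent_id")]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root span, found {len(roots)}")
    return roots[0]


traces = group_by_trace(SAMPLE_SPANS)
root = root_span(traces["t1"])
print(root["input.value"])
```

Raising on zero or multiple roots, rather than guessing, mirrors the page's preference for structural failures over silent skips.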

Install

uv pip install whatifd whatifd-langfuse whatifd-inspect-ai whatifd-phoenix

whatifd-phoenix itself is tracer-neutral and has no required SDK pin. If you want to read from a live Phoenix instance via arize-phoenix-client, install the optional extra:

uv pip install "whatifd-phoenix[live]"

The core whatifd package does not depend on Phoenix: you can use Langfuse, your own adapter, or the synthetic whatifd.adapters.stub.StubTraceSource without ever installing this package.

Wiring a `spans_provider`

The adapter accepts a spans_provider: Callable[[], Iterable[dict]] that yields OpenInference-shaped span dictionaries. The shape decouples whatifd from any single Phoenix client SDK or transport layer.

From `arize-phoenix-client` (live Phoenix instance)

# The arize-phoenix-client distribution installs the `phoenix.client` module.
from phoenix.client import Client
from whatifd_phoenix import PhoenixTraceSource

def spans_provider():
    client = Client(base_url="https://your-phoenix-host")
    # Iterate however your Phoenix deployment exposes spans;
    # the API surface varies by version, so treat get_spans() and
    # to_dict() below as placeholders for your client's equivalents.
    # The adapter cares only that each yielded item is a dict with
    # OpenInference attrs.
    for span in client.get_spans(project="my-project"):
        yield span.to_dict()

source = PhoenixTraceSource(spans_provider=spans_provider)

From a JSONL dump (offline / CI fixture)

import json
from pathlib import Path
from whatifd_phoenix import PhoenixTraceSource

def spans_provider():
    # One JSON-encoded span per line; blank lines are skipped.
    with Path("spans.jsonl").open(encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)

source = PhoenixTraceSource(spans_provider=spans_provider)

From an OpenTelemetry collector

If your spans land in any OpenInference-emitting OTLP destination, the same pattern applies — implement a spans_provider that pulls them out as dicts. The adapter never assumes Phoenix-specific transport.

OpenInference attribute mapping

The adapter reads the standard OpenInference span attribute conventions:

| OpenInference attribute | `whatifd` use | `Sensitive[str]`? |
| --- | --- | --- |
| `context.trace_id` | Group spans into traces | no |
| `parent_id` (or its absence) | Identify root span per trace | no |
| `openinference.span.kind` | Classify span (LLM / tool / retriever / agent) | no |
| `input.value` | Trace input (user message) | yes |
| `output.value` | Trace output (agent response) | yes |
| any other attribute | Passed through to `RawTrace.metadata` | no |

The Sensitive[str] wrapping is enforced at the adapter boundary; downstream code that needs the raw value must call .unwrap(reason=...) and the unwrap is audit-logged.
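To make the wrap-then-audited-unwrap pattern concrete, here is a minimal stand-in. `SensitiveValue` and the `sensitive.audit` logger name are hypothetical and written only for this illustration; whatifd's real `Sensitive` type may differ in every detail except the `.unwrap(reason=...)` call described above.

```python
import logging
from typing import Generic, TypeVar

T = TypeVar("T")

# Hypothetical logger name, used only in this sketch.
audit_log = logging.getLogger("sensitive.audit")


class SensitiveValue(Generic[T]):
    """Illustrative stand-in for a Sensitive[T] wrapper."""

    __slots__ = ("_value",)

    def __init__(self, value: T) -> None:
        self._value = value

    def __repr__(self) -> str:
        # Never leak the wrapped content through repr() or log formatting.
        return "SensitiveValue(<redacted>)"

    def unwrap(self, *, reason: str) -> T:
        """Return the raw value and record why it was needed."""
        if not reason:
            raise ValueError("unwrap requires a non-empty reason")
        audit_log.info("sensitive value unwrapped: reason=%s", reason)
        return self._value


wrapped = SensitiveValue("What is the refund policy?")
print(wrapped)  # the raw text does not appear in the printed form
raw = wrapped.unwrap(reason="diff against replayed output")
```

Making `reason` keyword-only keeps every call site self-documenting, which is the point of audit-logging the unwrap.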

Selectors

The Phoenix adapter supports the same selector grammar as Langfuse for cohort filtering — see the Langfuse selector grammar. Selectors that depend on tracer-specific concepts (e.g., Langfuse-style score-based filters) are interpreted by the adapter against whatever attribute Phoenix exposes for the equivalent signal.

If a selector references a concept Phoenix doesn’t have a direct equivalent for, the adapter declares it unsupported at config-load time (cardinal #1: structural failure, not silent skip).
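The fail-at-load-time behavior can be pictured with a small sketch. Every name here (`SUPPORTED_SELECTORS`, `UnsupportedSelectorError`, `validate_selectors`, and the selector names themselves) is invented for illustration; the real selector grammar is the one documented for Langfuse.

```python
from typing import Iterable, List

# Hypothetical selector names, invented for this sketch only.
SUPPORTED_SELECTORS = frozenset({"tag", "time_range"})


class UnsupportedSelectorError(ValueError):
    """Raised while loading config when a selector cannot be honored."""


def validate_selectors(requested: Iterable[str]) -> List[str]:
    """Return the selectors unchanged, or fail before any trace is read."""
    requested = list(requested)
    unsupported = sorted(set(requested) - SUPPORTED_SELECTORS)
    if unsupported:
        raise UnsupportedSelectorError(
            "unsupported selectors for this adapter: " + ", ".join(unsupported)
        )
    return requested


accepted = validate_selectors(["tag", "time_range"])
```

Rejecting the whole config up front means a cohort is never silently broadened because one filter was ignored.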

Tool-call cache

OpenInference tool spans surface as cache entries under whatifd’s default use-original policy. A span counts as a tool call when its openinference.span.kind identifies it as a tool span (the OpenInference convention writes this value as "TOOL"); the cache key is derived from that kind plus a hash of the span’s input.value. The cached tool output is read from output.value on the same span.

If a trace has incomplete tool spans (output missing, mismatched parent/child structure), whatifd records the trace as a structured replay failure (cardinal #1) in the report’s Replay validity section.
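A minimal sketch of that keying scheme, under stated assumptions: SHA-256 as the hash, a `(kind, digest)` tuple as the key, and a plain `ValueError` for incomplete spans are all choices made for this example. The adapter's real key layout and failure type are not specified on this page.

```python
import hashlib
from typing import Any, Dict, Iterable, Optional, Tuple

CacheKey = Tuple[str, str]


def tool_cache_key(span: Dict[str, Any]) -> Optional[CacheKey]:
    """Return a cache key for tool spans, or None for any other span kind."""
    kind = str(span.get("openinference.span.kind", "")).upper()
    if kind != "TOOL":
        return None
    tool_input = span.get("input.value")
    if tool_input is None:
        raise ValueError("incomplete tool span: input.value is missing")
    # Assumption: SHA-256 over the UTF-8 bytes of the serialized input.
    digest = hashlib.sha256(str(tool_input).encode("utf-8")).hexdigest()
    return (kind, digest)


def build_tool_cache(spans: Iterable[Dict[str, Any]]) -> Dict[CacheKey, Any]:
    """Map each tool span's key to its recorded output."""
    cache: Dict[CacheKey, Any] = {}
    for span in spans:
        key = tool_cache_key(span)
        if key is None:
            continue
        if span.get("output.value") is None:
            # Surface the gap instead of caching a hole.
            raise ValueError("incomplete tool span: output.value is missing")
        cache[key] = span["output.value"]
    return cache


demo_spans = [
    {"openinference.span.kind": "LLM", "input.value": "hi", "output.value": "hello"},
    {"openinference.span.kind": "TOOL", "input.value": '{"q": 1}', "output.value": '{"a": 2}'},
]
demo_cache = build_tool_cache(demo_spans)
```

With this shape, a replay that issues the same tool input finds the original output by recomputing the key.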

Scope of v0.2 support

| Feature | v0.2 | Notes |
| --- | --- | --- |
| Read OpenInference spans from any provider | ✅ | The `spans_provider` callable is the boundary. |
| Wrap user content as `Sensitive[str]` | ✅ | Per cardinal #5. |
| Conformance with `TraceSource` Protocol | ✅ | 14 conformance tests (5 inherited from the harness + 9 adapter-specific). |
| Live-Phoenix recorded-cassette smoke | ❌ (v0.3) | Parity with `whatifd-langfuse`’s cassette discipline. Today the adapter is verified against fixture spans only. |
| Phoenix-specific selector grammar | partial | Core selectors map cleanly; Phoenix-only signals (e.g., evaluation runs) are not yet first-class. |
| Multi-project Phoenix tenancy | ✅ | The `spans_provider` is per-instance, so point it at whichever project you need. |
| `cluster_key` for cluster-paired bootstrap (cardinal #10) | declared empty | `cluster_key_support()` returns `()`; v0.3’s cluster-paired bootstrap will widen this. |

Limitations

- The adapter doesn’t ship a Phoenix client wrapper. You supply the spans_provider. This is a deliberate design choice: pinning to a specific Phoenix SDK version would bind whatifd-phoenix to one transport, breaking neutrality. The trade-off is ~5 lines of wiring code at integration time.
- OpenInference attribute conformance. If your spans are emitted by a non-standard tracer that almost follows OpenInference, attribute mapping may surface gaps. The adapter validates the attributes it reads at trace-build time; gaps appear as structured RawTrace build errors.
- Live-Phoenix verification is not yet structurally pinned. v0.3 will land a pytest-recording-based cassette suite mirroring whatifd-langfuse’s discipline. Until then, validate your spans_provider against a fixture set before relying on it in CI.
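One way to do that fixture check before wiring a provider into CI is sketched below. `check_spans_provider`, the `REQUIRED_KEYS` tuple, and the specific rules (every item is a dict, required keys are present, each trace has exactly one root) are assumptions for illustration; the adapter's own validation at trace-build time remains the authority.

```python
from collections import Counter
from typing import Any, Callable, Dict, Iterable, List

# Assumed minimum for this sketch, based on the attribute table above.
REQUIRED_KEYS = ("context.trace_id", "openinference.span.kind")


def check_spans_provider(
    provider: Callable[[], Iterable[Dict[str, Any]]],
) -> List[str]:
    """Return human-readable problems; an empty list means the fixture passed."""
    problems: List[str] = []
    roots_per_trace: Counter = Counter()
    seen_traces = set()
    for index, span in enumerate(provider()):
        if not isinstance(span, dict):
            problems.append(f"item {index}: expected dict, got {type(span).__name__}")
            continue
        missing = [key for key in REQUIRED_KEYS if key not in span]
        if missing:
            problems.append(f"item {index}: missing {', '.join(missing)}")
        trace_id = span.get("context.trace_id")
        if trace_id is None:
            continue
        seen_traces.add(trace_id)
        if not span.get("parent_id"):
            roots_per_trace[trace_id] += 1
    for trace_id in sorted(seen_traces):
        count = roots_per_trace[trace_id]
        if count != 1:
            problems.append(f"trace {trace_id}: expected 1 root span, found {count}")
    return problems


def fixture_provider():
    # Hypothetical in-memory fixture standing in for a real spans_provider.
    return [
        {"context.trace_id": "t1", "openinference.span.kind": "AGENT", "parent_id": None},
        {"context.trace_id": "t1", "openinference.span.kind": "TOOL", "parent_id": "s1"},
    ]


fixture_problems = check_spans_provider(fixture_provider)
```

In a test suite this reduces to a single assertion that the returned list is empty, with the messages serving as the failure report.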