Automation rule filter: fields accepted by API are silently ignored at online-scoring evaluation time #6344

## Summary

Several `TraceField` values that the automation-rule filter **API** accepts are **silently ignored** by the online-scoring filter **evaluator**. The payload round-trips cleanly (create → list → read back), but at scoring time the filter does nothing, so the user cannot tell an unsupported field from a misconfigured filter.

The symptom is: you configure a filter on an automation rule expecting it to exclude certain traces from judge scoring, traces that should have been excluded are still scored, and nothing in the UI or response tells you the filter was ineffective.

## Repro

1. Create an LLM-as-Judge automation rule in any project with a filter on `error_info` (or any other field listed under "Missing from the switch" below). The REST API accepts the request:

```json
{
  "filters": [
    { "field": "error_info", "operator": "is_empty", "key": "", "value": "" }
  ],
  "type": "llm_as_judge",
  "name": "test",
  ...
}
```

2. List rules — the filter is there:

```
Filters (1):
  field='error_info' operator='is_empty' key='' value=''
```

3. Create traces with and without an `error_info` payload. Both get scored by the rule — the filter has no effect.

4. Check backend logs — you'll see: `WARN  Unsupported trace field for filter evaluation: ERROR_INFO`.

## Root cause (from source)

Two layers are inconsistent:

**1. `TraceField` enum** ([TraceField.java](https://github.com/comet-ml/opik/blob/main/apps/opik-backend/src/main/java/com/comet/opik/api/filter/TraceField.java)) declares many fields as filterable: `INPUT`, `OUTPUT`, `INPUT_JSON`, `OUTPUT_JSON`, `METADATA`, `TAGS`, `USAGE_*_TOKENS`, `TOTAL_ESTIMATED_COST`, `LLM_SPAN_COUNT`, `FEEDBACK_SCORES`, `SPAN_FEEDBACK_SCORES`, `DURATION`, `TTFT`, `THREAD_ID`, `GUARDRAILS`, `VISIBILITY_MODE`, `ERROR_INFO`, `ERROR_TYPE`, `CREATED_AT`, `LAST_UPDATED_AT`, `SOURCE`, `EXPERIMENT_ID`, `CUSTOM`, ...

**2. `TraceFilterEvaluationService.extractFieldValue(...)`** ([source](https://github.com/comet-ml/opik/blob/main/apps/opik-backend/src/main/java/com/comet/opik/domain/evaluators/TraceFilterEvaluationService.java)) only switches on a subset:

```java
case ID, NAME, START_TIME, END_TIME,
     INPUT, OUTPUT, INPUT_JSON, OUTPUT_JSON, METADATA, TAGS,
     TOTAL_ESTIMATED_COST, USAGE_COMPLETION_TOKENS, USAGE_PROMPT_TOKENS,
     USAGE_TOTAL_TOKENS, FEEDBACK_SCORES, DURATION, TTFT, THREAD_ID, CUSTOM
     -> (handled)
default -> {
    log.warn("Unsupported trace field for filter evaluation: {}", traceField);
    yield null;
}
```

Fields declared in `TraceField` but **not** in the switch fall through to `default`, which logs a warning and yields `null`. The operator comparison then never sees the trace's actual value, so the filter has no effect on which traces get scored.
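The fall-through can be reproduced in isolation. The following is a minimal sketch, not the Opik implementation: `FallThroughDemo`, its two-value `Field` enum, and the `Trace` class are hypothetical stand-ins, and only the shape of the switch (handled cases plus a `default` that warns and yields `null`) is taken from the excerpt above.

```java
public class FallThroughDemo {
    // Hypothetical two-value subset of TraceField, for illustration only.
    public enum Field { NAME, ERROR_INFO }

    // Hypothetical minimal trace; the real Trace object carries many more fields.
    public static final class Trace {
        final String name;
        final String errorInfo;

        public Trace(String name, String errorInfo) {
            this.name = name;
            this.errorInfo = errorInfo;
        }
    }

    // Same shape as the excerpt: handled fields return a value,
    // everything else logs a warning and yields null.
    public static Object extractFieldValue(Field field, Trace trace) {
        return switch (field) {
            case NAME -> trace.name;
            default -> {
                System.err.println("Unsupported trace field for filter evaluation: " + field);
                yield null;
            }
        };
    }

    public static void main(String[] args) {
        Trace failed = new Trace("checkout", "TimeoutError: upstream timed out");
        // Handled field: the trace's value comes back.
        System.out.println(extractFieldValue(Field.NAME, failed));
        // Unhandled field: null, even though the trace carries error info.
        System.out.println(extractFieldValue(Field.ERROR_INFO, failed));
    }
}
```

In this sketch the second call returns `null` although `errorInfo` is populated, which is the information loss the filter then operates on.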

Missing from the switch (at time of reporting):

- `ERROR_INFO`, `ERROR_TYPE`
- `GUARDRAILS`
- `VISIBILITY_MODE`
- `LLM_SPAN_COUNT`
- `SPAN_FEEDBACK_SCORES`
- `ANNOTATION_QUEUE_IDS`
- `EXPERIMENT_ID`
- `CREATED_AT`, `LAST_UPDATED_AT`
- `SOURCE`
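
The list above is a set difference between the enum and the switch, so a test could compute it and catch future drift. A rough sketch, assuming a hypothetical `EvaluatorGap` class whose local `Field` enum mirrors only the names quoted in this issue (the real `TraceField` may declare more), with the handled set copied by hand from the excerpt:

```java
import java.util.EnumSet;
import java.util.Set;

public class EvaluatorGap {
    // Hypothetical mirror of TraceField, limited to names quoted in this issue.
    public enum Field {
        ID, NAME, START_TIME, END_TIME, INPUT, OUTPUT, INPUT_JSON, OUTPUT_JSON,
        METADATA, TAGS, TOTAL_ESTIMATED_COST, USAGE_COMPLETION_TOKENS,
        USAGE_PROMPT_TOKENS, USAGE_TOTAL_TOKENS, FEEDBACK_SCORES, DURATION,
        TTFT, THREAD_ID, CUSTOM,
        ERROR_INFO, ERROR_TYPE, GUARDRAILS, VISIBILITY_MODE, LLM_SPAN_COUNT,
        SPAN_FEEDBACK_SCORES, ANNOTATION_QUEUE_IDS, EXPERIMENT_ID, CREATED_AT,
        LAST_UPDATED_AT, SOURCE
    }

    // Fields the evaluator's switch handles, copied from the excerpt above.
    public static final EnumSet<Field> EVALUATED = EnumSet.of(
        Field.ID, Field.NAME, Field.START_TIME, Field.END_TIME,
        Field.INPUT, Field.OUTPUT, Field.INPUT_JSON, Field.OUTPUT_JSON,
        Field.METADATA, Field.TAGS, Field.TOTAL_ESTIMATED_COST,
        Field.USAGE_COMPLETION_TOKENS, Field.USAGE_PROMPT_TOKENS,
        Field.USAGE_TOTAL_TOKENS, Field.FEEDBACK_SCORES, Field.DURATION,
        Field.TTFT, Field.THREAD_ID, Field.CUSTOM);

    // Declared fields minus evaluated fields = fields the API accepts but scoring ignores.
    public static Set<Field> unsupported() {
        return EnumSet.complementOf(EVALUATED);
    }

    public static void main(String[] args) {
        System.out.println(unsupported());
    }
}
```

A backend test along these lines would fail whenever a new enum value is added without a matching case in the evaluator, provided the handled set is derived from the evaluator rather than maintained by hand as it is here.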

## Why this matters

The `ERROR_INFO` case is especially load-bearing: restricting LLM judges to successful traces (`error_info is_empty`) is the right way to keep agent-runtime failures out of content-quality averages. Without it, judges score the empty or partial output of error traces and drag down every content-related metric, producing a "double-count" pattern where a single failed trace hurts both `empty_output_rate` and `response_quality`.

And because the traces-dashboard filter supports `Errors is_not_empty` correctly (`TracesTable` query path uses a different evaluator), users naturally reach for the same filter on their automation rules and are surprised when it doesn't work.

## Asks

1. **Fail loud, not silent** — reject filters at rule-create time if the field isn't in the evaluator's switch. A `400 Bad Request: "field X is accepted in the schema but not evaluated at scoring time"` is strictly better than silently no-op'ing.
2. **Close the gap for `ERROR_INFO` / `ERROR_TYPE`** — these are the highest-value missing fields for reliability work. Adding them to the switch looks straightforward (the values are on the `Trace` object).
3. **Document the support matrix** — until the gap is closed, the production-rules docs should list which fields actually work for automation-rule filters, separate from the traces-view filter list.
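
For ask 1, the create-time check could look roughly like the sketch below. Everything here is hypothetical: `RuleFilterValidator`, `unsupportedFields`, and `validate` are illustrative names, not existing Opik code, and the mapping from wire names to enum names by upper-casing is an assumption based only on `error_info` ↔ `ERROR_INFO` in the repro. A real implementation would return an HTTP 400 through the backend's own error handling rather than throw `IllegalArgumentException`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class RuleFilterValidator {
    // Field names the evaluator's switch handles, copied from the excerpt in this issue.
    static final Set<String> EVALUATED = Set.of(
        "ID", "NAME", "START_TIME", "END_TIME", "INPUT", "OUTPUT",
        "INPUT_JSON", "OUTPUT_JSON", "METADATA", "TAGS",
        "TOTAL_ESTIMATED_COST", "USAGE_COMPLETION_TOKENS",
        "USAGE_PROMPT_TOKENS", "USAGE_TOTAL_TOKENS", "FEEDBACK_SCORES",
        "DURATION", "TTFT", "THREAD_ID", "CUSTOM");

    // Returns the requested filter fields that the evaluator would ignore.
    // Assumption: wire names are the lower-case form of the enum names.
    public static List<String> unsupportedFields(List<String> requested) {
        List<String> rejected = new ArrayList<>();
        for (String field : requested) {
            if (!EVALUATED.contains(field.toUpperCase(Locale.ROOT))) {
                rejected.add(field);
            }
        }
        return rejected;
    }

    // Fails loudly at rule-create time instead of no-op'ing at scoring time.
    public static void validate(List<String> requested) {
        List<String> rejected = unsupportedFields(requested);
        if (!rejected.isEmpty()) {
            throw new IllegalArgumentException(
                "field(s) " + rejected
                + " accepted in the schema but not evaluated at scoring time");
        }
    }

    public static void main(String[] args) {
        validate(List.of("tags", "duration")); // passes silently
        validate(List.of("error_info"));       // throws IllegalArgumentException
    }
}
```

Keeping the allow-list next to the evaluator, or deriving it from the switch, would prevent the validator and the evaluator from drifting apart the way the schema and the evaluator have.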

## Environment

- Opik backend: latest as observed via the public `comet-ml/opik` main branch
- Reporting from: `consumer-agent` project, Python SDK `opik==1.10.x`
- Observed behavior verified against `TraceFilterEvaluationService.extractFieldValue` switch

Thanks for maintaining Opik — happy to PR the validation and/or the error_info case if that's a welcome path.
