
Automation rule filter: fields accepted by the API are silently ignored at online-scoring evaluation time #6344

@connectwithprakash


Summary

Several TraceField values that the automation-rule filter API accepts are silently ignored by the online-scoring filter evaluator. The payload round-trips cleanly (create → list → read back), but at scoring time the filter does nothing, so from the user's side an unsupported field is indistinguishable from a misconfigured filter.

The symptom: you configure a filter on an automation rule to exclude certain traces from judge scoring, the traces that should have been excluded are still scored, and nothing in the UI or the API response indicates that the filter had no effect.

Repro

  1. Create an LLM-as-Judge automation rule in any project with a filter on error_info (or any other field listed below). The REST API accepts it:
{
  "filters": [
    { "field": "error_info", "operator": "is_empty", "key": "", "value": "" }
  ],
  "type": "llm_as_judge",
  "name": "test",
  ...
}
  2. List the rules; the filter is persisted:
Filters (1):
  field='error_info' operator='is_empty' key='' value=''
  3. Create traces with and without an error_info payload. Both are scored by the rule; the filter has no effect.
  4. Check the backend logs; you will see: WARN Unsupported trace field for filter evaluation: ERROR_INFO.

Root cause (from source)

Two layers are inconsistent:

1. TraceField enum (TraceField.java) declares many fields as filterable: INPUT, OUTPUT, INPUT_JSON, OUTPUT_JSON, METADATA, TAGS, USAGE_*_TOKENS, TOTAL_ESTIMATED_COST, LLM_SPAN_COUNT, FEEDBACK_SCORES, SPAN_FEEDBACK_SCORES, DURATION, TTFT, THREAD_ID, GUARDRAILS, VISIBILITY_MODE, ERROR_INFO, ERROR_TYPE, CREATED_AT, LAST_UPDATED_AT, SOURCE, EXPERIMENT_ID, CUSTOM, ...

2. TraceFilterEvaluationService.extractFieldValue(...) only handles a subset of them in its switch:

case ID, NAME, START_TIME, END_TIME,
     INPUT, OUTPUT, INPUT_JSON, OUTPUT_JSON, METADATA, TAGS,
     TOTAL_ESTIMATED_COST, USAGE_COMPLETION_TOKENS, USAGE_PROMPT_TOKENS,
     USAGE_TOTAL_TOKENS, FEEDBACK_SCORES, DURATION, TTFT, THREAD_ID, CUSTOM
     -> (handled)
default -> {
    log.warn("Unsupported trace field for filter evaluation: {}", traceField);
    yield null;
}

Fields declared in TraceField but missing from the switch fall through to the default branch, which logs a warning and yields null. Every subsequent operator comparison against that null is then a no-op for real data.
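To make the failure mode concrete, here is a minimal, self-contained Java model of it. Everything in it is hypothetical: SilentFilterDemo, its three-value TraceField, the Trace record and especially the isEmpty helper, which assumes an is_empty operator treats a null field value as empty. That assumption is not taken from Opik's source; it is just one plausible way a null from the default branch lets an error trace pass an error_info is_empty filter.

```java
// Hypothetical, heavily simplified model of the failure mode described above.
// None of these types are Opik's real classes; names only mirror the issue text.
final class SilentFilterDemo {

    // Hypothetical subset of TraceField: two handled fields and one unhandled field.
    enum TraceField { NAME, OUTPUT, ERROR_INFO }

    // Hypothetical trace: errorInfo is null for successful traces.
    record Trace(String name, String output, String errorInfo) {}

    // Same shape as the switch excerpt: unhandled fields hit default, log, and yield null.
    static Object extractFieldValue(Trace trace, TraceField field) {
        return switch (field) {
            case NAME -> trace.name();
            case OUTPUT -> trace.output();
            default -> {
                System.err.println("WARN Unsupported trace field for filter evaluation: " + field);
                yield null;
            }
        };
    }

    // ASSUMPTION: an is_empty operator that treats null as empty.
    static boolean isEmpty(Object value) {
        return value == null || value.toString().isEmpty();
    }

    // True means the trace passes "error_info is_empty" and is sent to the judge.
    static boolean passesErrorInfoIsEmpty(Trace trace) {
        return isEmpty(extractFieldValue(trace, TraceField.ERROR_INFO));
    }

    public static void main(String[] args) {
        Trace ok = new Trace("ok", "answer", null);
        Trace failed = new Trace("failed", "", "TimeoutError: agent runtime");
        System.out.println(passesErrorInfoIsEmpty(ok));     // true
        System.out.println(passesErrorInfoIsEmpty(failed)); // true: errorInfo is never read
    }
}
```

Under that assumption both traces pass the filter and are scored, which matches step 3 of the repro.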

Missing from the switch (at time of reporting):

  • ERROR_INFO, ERROR_TYPE
  • GUARDRAILS
  • VISIBILITY_MODE
  • LLM_SPAN_COUNT
  • SPAN_FEEDBACK_SCORES
  • ANNOTATION_QUEUE_IDS
  • EXPERIMENT_ID
  • CREATED_AT, LAST_UPDATED_AT
  • SOURCE
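The gap itself can be derived mechanically instead of by eye. A sketch, assuming a hypothetical enum that copies only the TraceField names quoted in this issue (not the real TraceField.java) and a hypothetical EVALUATED set mirroring the case labels in the switch excerpt; in the real codebase the same idea could run as a unit test that fails whenever a TraceField is added without a matching case.

```java
import java.util.EnumSet;

// Hypothetical sketch: compute which declared fields the evaluator never reads.
final class FilterGapCheck {

    // Hypothetical enum: only the TraceField names quoted in this issue.
    enum TraceField {
        // Handled by the switch in extractFieldValue (per the excerpt in this issue).
        ID, NAME, START_TIME, END_TIME, INPUT, OUTPUT, INPUT_JSON, OUTPUT_JSON, METADATA, TAGS,
        TOTAL_ESTIMATED_COST, USAGE_COMPLETION_TOKENS, USAGE_PROMPT_TOKENS, USAGE_TOTAL_TOKENS,
        FEEDBACK_SCORES, DURATION, TTFT, THREAD_ID, CUSTOM,
        // Declared but falling through to default (per the list in this issue).
        ERROR_INFO, ERROR_TYPE, GUARDRAILS, VISIBILITY_MODE, LLM_SPAN_COUNT, SPAN_FEEDBACK_SCORES,
        ANNOTATION_QUEUE_IDS, EXPERIMENT_ID, CREATED_AT, LAST_UPDATED_AT, SOURCE
    }

    // Hypothetical mirror of the switch's case labels; must be kept in sync with the evaluator.
    static final EnumSet<TraceField> EVALUATED = EnumSet.of(
            TraceField.ID, TraceField.NAME, TraceField.START_TIME, TraceField.END_TIME,
            TraceField.INPUT, TraceField.OUTPUT, TraceField.INPUT_JSON, TraceField.OUTPUT_JSON,
            TraceField.METADATA, TraceField.TAGS, TraceField.TOTAL_ESTIMATED_COST,
            TraceField.USAGE_COMPLETION_TOKENS, TraceField.USAGE_PROMPT_TOKENS,
            TraceField.USAGE_TOTAL_TOKENS, TraceField.FEEDBACK_SCORES, TraceField.DURATION,
            TraceField.TTFT, TraceField.THREAD_ID, TraceField.CUSTOM);

    // Every declared field that is not evaluated, in enum declaration order.
    static EnumSet<TraceField> unevaluatedFields() {
        return EnumSet.complementOf(EVALUATED);
    }

    public static void main(String[] args) {
        EnumSet<TraceField> gap = unevaluatedFields();
        // A regression test would assert gap.isEmpty(); here it prints the 11 missing fields.
        System.out.println("Declared but not evaluated: " + gap);
    }
}
```

EnumSet.complementOf returns the result in declaration order, so the printed set lines up with the list above.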

Why this matters

The ERROR_INFO case is especially load-bearing: restricting LLM judges to successful traces (error_info is_empty) is the right way to keep agent-runtime failures out of content-quality averages. Without that working, judges score the empty or partial output of error traces and drag down every content-related metric, producing a "double-count" pattern in which a single failed trace hurts both empty_output_rate and response_quality.
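A worked example of the double count, with made-up numbers chosen only for illustration: 10 traces, 8 successful ones that a judge scores 0.9 and 2 failed ones whose empty output the judge scores 0.0. DoubleCountExample and its Scored record are hypothetical; the point is only the arithmetic (7.2 / 10 = 0.72 unfiltered versus 0.90 filtered, while the same two failures already show up as a 0.20 empty_output_rate).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the "double-count" pattern; the numbers are made up.
final class DoubleCountExample {

    // One judged trace: whether it errored, and the judge's response_quality score.
    record Scored(boolean hasError, double responseQuality) {}

    // Mean response_quality, optionally restricted to traces without errors
    // (the effect an "error_info is_empty" filter is supposed to have).
    static double meanQuality(List<Scored> traces, boolean successfulOnly) {
        return traces.stream()
                .filter(t -> !successfulOnly || !t.hasError())
                .mapToDouble(Scored::responseQuality)
                .average()
                .orElse(Double.NaN);
    }

    // In this example every failed trace has empty output, so this is the empty_output_rate.
    static double emptyOutputRate(List<Scored> traces) {
        return traces.stream().filter(Scored::hasError).count() / (double) traces.size();
    }

    static List<Scored> sample() {
        List<Scored> traces = new ArrayList<>();
        for (int i = 0; i < 8; i++) traces.add(new Scored(false, 0.9)); // successful traces
        for (int i = 0; i < 2; i++) traces.add(new Scored(true, 0.0));  // failed traces, empty output
        return traces;
    }

    public static void main(String[] args) {
        List<Scored> traces = sample();
        System.out.printf("empty_output_rate:             %.2f%n", emptyOutputRate(traces));      // 0.20
        System.out.printf("response_quality (unfiltered): %.2f%n", meanQuality(traces, false)); // 0.72
        System.out.printf("response_quality (filtered):   %.2f%n", meanQuality(traces, true));  // 0.90
    }
}
```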

Because the traces dashboard handles an Errors is_not_empty filter correctly (the TracesTable query path uses a different evaluator), users naturally reach for the same filter on their automation rules and are surprised when it has no effect.

Asks

  1. Fail loud, not silent: reject filters at rule-create time if the field isn't handled by the evaluator's switch. A 400 Bad Request such as "field X is accepted in the schema but not evaluated at scoring time" is strictly better than a silent no-op.
  2. Close the gap for ERROR_INFO / ERROR_TYPE: these are the highest-value missing fields for reliability work. Adding them to the switch looks straightforward, since the values are already on the Trace object.
  3. Document the support matrix: until the gap is closed, the production-rules docs should list which fields actually work in automation-rule filters, separately from the traces-view filter list.
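For the first ask, a sketch of create-time validation. RuleFilterValidator, its Filter record and the EVALUATED set are hypothetical; in particular, throwing IllegalArgumentException and having the REST layer map it to 400 Bad Request is an assumption about how such a check could be wired in, not a description of how Opik's resources are structured.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of "fail loud" validation when an automation rule is created.
final class RuleFilterValidator {

    // Hypothetical subset of TraceField.
    enum TraceField { NAME, OUTPUT, TAGS, ERROR_INFO, GUARDRAILS }

    // Hypothetical filter payload, mirroring the JSON fields in the repro.
    record Filter(TraceField field, String operator, String key, String value) {}

    // ASSUMPTION: the set of fields the scoring-time evaluator actually handles.
    static final Set<TraceField> EVALUATED =
            EnumSet.of(TraceField.NAME, TraceField.OUTPUT, TraceField.TAGS);

    // Rejects the first filter whose field the evaluator would silently ignore.
    // A REST layer would map the exception to 400 Bad Request (assumption).
    static void validate(List<Filter> filters) {
        for (Filter f : filters) {
            if (!EVALUATED.contains(f.field())) {
                throw new IllegalArgumentException("field " + f.field().name().toLowerCase(Locale.ROOT)
                        + " is accepted in the schema but not evaluated at scoring time");
            }
        }
    }

    public static void main(String[] args) {
        validate(List.of(new Filter(TraceField.TAGS, "contains", "", "prod"))); // accepted
        try {
            validate(List.of(new Filter(TraceField.ERROR_INFO, "is_empty", "", "")));
        } catch (IllegalArgumentException e) {
            // prints: 400 Bad Request: field error_info is accepted in the schema but not evaluated at scoring time
            System.out.println("400 Bad Request: " + e.getMessage());
        }
    }
}
```

Checking the validator and the evaluator against one shared set (or tying them together with a test) is what keeps them from drifting apart again.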

Environment

  • Opik backend: latest main branch of the public comet-ml/opik repository, as observed at the time of reporting
  • Reporting from: consumer-agent project, Python SDK opik==1.10.x
  • Observed behavior verified against TraceFilterEvaluationService.extractFieldValue switch

Thanks for maintaining Opik! Happy to open a PR for the validation and/or the error_info case if that's a welcome path.
