Summary
Several `TraceField` values that the automation-rule filter API accepts are silently ignored by the online-scoring filter evaluator. The payload round-trips cleanly (create → list → read back), but at scoring time the filter does nothing — which, from the user's point of view, makes a silently unsupported field indistinguishable from a mis-configured filter.
The symptom: you configure a filter on an automation rule expecting it to exclude certain traces from judge scoring; traces that should have been excluded are still scored; and nothing in the UI or API response tells you the filter was ineffective.
Repro
- Create an LLM-as-Judge automation rule in any project with a filter on `error_info` (or any other field listed below). The payload is accepted via REST (a scripted version of this step is sketched after this list):

  ```json
  {
    "filters": [
      { "field": "error_info", "operator": "is_empty", "key": "", "value": "" }
    ],
    "type": "llm_as_judge",
    "name": "test",
    ...
  }
  ```
- List rules — the filter is there:

  ```text
  Filters (1):
    field='error_info' operator='is_empty' key='' value=''
  ```
- Create traces with and without an `error_info` payload. Both get scored by the rule — the filter has no effect.
- Check the backend logs — you'll see `WARN Unsupported trace field for filter evaluation: ERROR_INFO`.
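For completeness, here is a scripted version of the rule-create step using Java's built-in HttpClient. The endpoint, port, and auth handling below are placeholders (they are not the documented Opik routes) and would need to be adapted to your deployment; only the JSON body is taken from the repro above.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CreateRuleRepro {

    public static void main(String[] args) throws Exception {
        // Placeholder endpoint: substitute the real automation-rule route and any auth headers
        // required by your deployment; this path is illustrative, not the documented one.
        String ruleEndpoint = System.getenv().getOrDefault(
                "OPIK_RULE_ENDPOINT", "http://localhost:8080/automation-rules");

        // Same payload as above; the judge configuration elided with "..." in the report
        // is omitted here as well.
        String rulePayload = """
                {
                  "type": "llm_as_judge",
                  "name": "test",
                  "filters": [
                    { "field": "error_info", "operator": "is_empty", "key": "", "value": "" }
                  ]
                }
                """;

        HttpResponse<String> response = HttpClient.newHttpClient().send(
                HttpRequest.newBuilder(URI.create(ruleEndpoint))
                        .header("Content-Type", "application/json")
                        .POST(HttpRequest.BodyPublishers.ofString(rulePayload))
                        .build(),
                HttpResponse.BodyHandlers.ofString());

        // The create succeeds and a follow-up list/read returns the filter unchanged,
        // yet traces that carry error_info are still scored (last two repro steps above).
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```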
Root cause (from source)
Two layers are inconsistent:
1. The `TraceField` enum (`TraceField.java`) declares many fields as filterable: INPUT, OUTPUT, INPUT_JSON, OUTPUT_JSON, METADATA, TAGS, USAGE_*_TOKENS, TOTAL_ESTIMATED_COST, LLM_SPAN_COUNT, FEEDBACK_SCORES, SPAN_FEEDBACK_SCORES, DURATION, TTFT, THREAD_ID, GUARDRAILS, VISIBILITY_MODE, ERROR_INFO, ERROR_TYPE, CREATED_AT, LAST_UPDATED_AT, SOURCE, EXPERIMENT_ID, CUSTOM, ...
2. `TraceFilterEvaluationService.extractFieldValue(...)` (source) only switches on a subset:

   ```java
   case ID, NAME, START_TIME, END_TIME,
        INPUT, OUTPUT, INPUT_JSON, OUTPUT_JSON, METADATA, TAGS,
        TOTAL_ESTIMATED_COST, USAGE_COMPLETION_TOKENS, USAGE_PROMPT_TOKENS,
        USAGE_TOTAL_TOKENS, FEEDBACK_SCORES, DURATION, TTFT, THREAD_ID, CUSTOM
        -> (handled)
   default -> {
       log.warn("Unsupported trace field for filter evaluation: {}", traceField);
       yield null;
   }
   ```
Fields declared in `TraceField` but not in the switch fall through to `default`, return `null`, and every subsequent operator comparison becomes a no-op against real data.
Missing from the switch (at time of reporting):
- `ERROR_INFO`, `ERROR_TYPE`
- `GUARDRAILS`
- `VISIBILITY_MODE`
- `LLM_SPAN_COUNT`
- `SPAN_FEEDBACK_SCORES`
- `ANNOTATION_QUEUE_IDS`
- `EXPERIMENT_ID`
- `CREATED_AT`, `LAST_UPDATED_AT`
- `SOURCE`
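For the two highest-value gaps, the fix looks like two extra switch arms. A minimal sketch, assuming the `Trace` object exposes its error payload via an `errorInfo()` accessor with an exception-type field (the exact accessor names may differ in the current sources):

```java
// Sketch only: the errorInfo() / exceptionType() accessors on Trace are assumptions,
// not verified against the current sources.
case ERROR_INFO -> trace.errorInfo() == null
        ? ""                                  // no error payload; should let is_empty behave like other string fields
        : trace.errorInfo().toString();       // or however other object-valued fields are stringified here
case ERROR_TYPE -> trace.errorInfo() == null
        ? ""
        : trace.errorInfo().exceptionType();
```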
Why this matters
The `ERROR_INFO` case is especially load-bearing: filtering LLM judges to fire only on successful traces (`error_info is_empty`) is the natural way to keep agent-runtime failures out of content-quality averages. Without that working, judges score the empty or partial output of errored traces and drag down every content-related metric, producing the "double-count" pattern where a single failed trace hurts both `empty_output_rate` and `response_quality`.
And because the traces-dashboard filter handles `Errors is_not_empty` correctly (the `TracesTable` query path uses a different evaluator), users naturally reach for the same filter on their automation rules and are surprised when it doesn't work.
Asks
- Fail loud, not silent — reject filters at rule-create time if the field isn't in the evaluator's switch. A `400 Bad Request` along the lines of "field X is accepted in the schema but not evaluated at scoring time" is strictly better than silently no-op'ing (a validation sketch follows this list).
- Close the gap for `ERROR_INFO` / `ERROR_TYPE` — these are the highest-value missing fields for reliability work. Adding them to the switch looks straightforward (the values are on the `Trace` object; see the sketch at the end of the root-cause section above).
- Document the support matrix — until the gap is closed, the production-rules docs should list which fields actually work for automation-rule filters, separate from the traces-view filter list.
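For the first ask, a minimal sketch of what create-time validation could look like. The class name, the `SUPPORTED_FIELDS` set, and the `TraceFilter.field()` accessor are invented for this illustration and not taken from the codebase; the supported set simply mirrors the evaluator switch quoted above.

```java
import java.util.List;
import java.util.Set;
import jakarta.ws.rs.BadRequestException;

// Illustrative only: names and wiring are hypothetical; the real hook would live
// wherever rule payloads are currently validated.
final class AutomationRuleFilterValidator {

    // Mirror of the fields the evaluator's switch handles today (see the excerpt above).
    private static final Set<TraceField> SUPPORTED_FIELDS = Set.of(
            TraceField.ID, TraceField.NAME, TraceField.START_TIME, TraceField.END_TIME,
            TraceField.INPUT, TraceField.OUTPUT, TraceField.INPUT_JSON, TraceField.OUTPUT_JSON,
            TraceField.METADATA, TraceField.TAGS, TraceField.TOTAL_ESTIMATED_COST,
            TraceField.USAGE_COMPLETION_TOKENS, TraceField.USAGE_PROMPT_TOKENS,
            TraceField.USAGE_TOTAL_TOKENS, TraceField.FEEDBACK_SCORES, TraceField.DURATION,
            TraceField.TTFT, TraceField.THREAD_ID, TraceField.CUSTOM);

    static void validate(List<TraceFilter> filters) {
        for (TraceFilter filter : filters) {
            if (!SUPPORTED_FIELDS.contains(filter.field())) {
                // Reject at create time instead of silently no-op'ing at scoring time.
                throw new BadRequestException(
                        "field '%s' is accepted in the schema but not evaluated at scoring time"
                                .formatted(filter.field()));
            }
        }
    }
}
```

Ideally the supported set would be derived from the evaluator itself rather than duplicated, so the two layers can't drift apart again.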
Environment
- Opik backend: latest as observed on the public `comet-ml/opik` main branch
- Reporting from: `consumer-agent` project, Python SDK `opik==1.10.x`
- Observed behavior verified against the `TraceFilterEvaluationService.extractFieldValue` switch
Thanks for maintaining Opik — happy to PR the validation and/or the error_info case if that's a welcome path.