Skip to content

fix(task_05_fmops): upgrade judge model from legacy Sonnet 4 to Sonnet 4.6#256

Open
el-pedrito wants to merge 1 commit into
aws-samples:mainfrom
el-pedrito:fix/task05-legacy-bedrock-model
Open

fix(task_05_fmops): upgrade judge model from legacy Sonnet 4 to Sonnet 4.6#256
el-pedrito wants to merge 1 commit into
aws-samples:mainfrom
el-pedrito:fix/task05-legacy-bedrock-model

Conversation

@el-pedrito

Copy link
Copy Markdown
Contributor

Problem

The QualitativeModelEvaluation pipeline step fails with:

Float cannot represent non numeric value: nan

This error occurs 6 times (3 metrics × 2 aggregations: mean + variance).

Root Cause

The LLM-as-a-judge model claude-sonnet-4-20250514-v1:0 has been marked as Legacy by Anthropic on Amazon Bedrock. When a model is Legacy and hasn't been used in the last 30 days, Bedrock rejects all invocations with:

ResourceNotFoundException: Access denied. This Model is marked by provider as Legacy
and you have not been actively using the model in the last 30 days.

This causes make_genai_metric to silently fail on every judge call → all scores become None → aggregated as NaN → the SageMaker Managed MLflow tracking server (GraphQL API) rejects NaN float values.

The downstream EvaluationGate step also fails with Incompatible types in BinaryCondition left type [String], right type [Float] because the qualitative result is not a valid float.

Fix

Replace all references to the legacy model ID with the active claude-sonnet-4-6:

  • bedrock:/global.anthropic.claude-sonnet-4-20250514-v1:0bedrock:/global.anthropic.claude-sonnet-4-6

Also removed commented-out references to the legacy claude-3-haiku-20240307-v1:0.

Files Changed

  • steps/qualitative_eval_step.py — 3 metric definitions + 1 log_param
  • 05.00_fmops_examples.ipynb — judge_llm variable + markdown description

Testing

Pipeline executed end-to-end successfully after the fix:

  • QualitativeModelEvaluation: ✅ Succeeded (real scores logged)
  • EvaluationGate: ✅ Succeeded
  • ModelRegistration: ✅ Succeeded

…t 4.6

claude-sonnet-4-20250514-v1:0 was marked as Legacy on Bedrock, causing
all LLM-as-a-judge calls to fail silently and produce NaN scores.

Replace with active model: global.anthropic.claude-sonnet-4-6
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant