fix(task_05_fmops): upgrade judge model from legacy Sonnet 4 to Sonnet 4.6 by el-pedrito · Pull Request #256 · aws-samples/generative-ai-on-amazon-sagemaker

el-pedrito · 2026-05-27T08:42:38Z

Problem

The QualitativeModelEvaluation pipeline step fails with:

Float cannot represent non numeric value: nan

This error occurs 6 times (3 metrics × 2 aggregations: mean + variance).

Root Cause

The LLM-as-a-judge model claude-sonnet-4-20250514-v1:0 has been marked as Legacy by Anthropic on Amazon Bedrock. When a model is Legacy and hasn't been used in the last 30 days, Bedrock rejects all invocations with:

ResourceNotFoundException: Access denied. This Model is marked by provider as Legacy
and you have not been actively using the model in the last 30 days.

As a result, every judge call made through make_genai_metric fails silently: all scores become None, the aggregations over them become NaN, and the SageMaker Managed MLflow tracking server (GraphQL API) rejects NaN float values.
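The failure chain above can be sketched in plain Python. This is only an illustration, not MLflow's actual aggregation code: `aggregate_scores`, `nan_metric_names`, and the metric names are hypothetical, and the sketch assumes that failed judge calls are represented as None and dropped before aggregation.

```python
import math
from typing import Optional, Sequence


def aggregate_scores(scores: Sequence[Optional[float]]) -> dict:
    """Aggregate judge scores, ignoring failed calls (None). Hypothetical helper."""
    valid = [s for s in scores if s is not None]
    if not valid:
        # No judge call succeeded, so there is nothing to average.
        return {"mean": math.nan, "variance": math.nan}
    mean = sum(valid) / len(valid)
    variance = sum((s - mean) ** 2 for s in valid) / len(valid)
    return {"mean": mean, "variance": variance}


def nan_metric_names(results: dict) -> list:
    """Return 'metric/aggregation' names whose value is NaN."""
    return [
        f"{metric}/{agg}"
        for metric, aggs in results.items()
        for agg, value in aggs.items()
        if math.isnan(value)
    ]


# Hypothetical metric names; every judge call failed, so every score is None.
failed_run = {
    name: aggregate_scores([None, None, None])
    for name in ("metric_a", "metric_b", "metric_c")
}
bad = nan_metric_names(failed_run)
if bad:
    print(f"{len(bad)} NaN values, refusing to log: {bad}")
```

With three metrics and two aggregations each, the guard reports six NaN values, which matches the six errors seen in the pipeline step. A check like this before logging would surface the judge failure directly instead of as a tracking-server type error.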

The downstream EvaluationGate step also fails with Incompatible types in BinaryCondition left type [String], right type [Float] because the qualitative result is not a valid float.

Fix

Replace all references to the legacy model ID with the active claude-sonnet-4-6:

bedrock:/global.anthropic.claude-sonnet-4-20250514-v1:0 → bedrock:/global.anthropic.claude-sonnet-4-6

Commented-out references to the legacy claude-3-haiku-20240307-v1:0 are also removed.

Files Changed

steps/qualitative_eval_step.py — 3 metric definitions + 1 log_param
05.00_fmops_examples.ipynb — judge_llm variable + markdown description
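One way to confirm that no legacy model ID is left in the changed files is a small text scan. The sketch below is an assumption about how such a check could look, not something included in this PR: `find_legacy_references` is a hypothetical helper, and the sample file names and contents are made up for the example.

```python
from typing import Dict, List, Sequence, Tuple

# Legacy model IDs named in this PR.
LEGACY_MODEL_IDS = (
    "claude-sonnet-4-20250514-v1:0",
    "claude-3-haiku-20240307-v1:0",
)


def find_legacy_references(
    sources: Dict[str, str],
    legacy_ids: Sequence[str] = LEGACY_MODEL_IDS,
) -> List[Tuple[str, int, str]]:
    """Return (filename, line number, model ID) for every legacy reference."""
    hits = []
    for filename, text in sources.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            for model_id in legacy_ids:
                if model_id in line:
                    hits.append((filename, lineno, model_id))
    return hits


# Made-up file contents, for illustration only.
sample = {
    "before.py": (
        'judge = "bedrock:/global.anthropic.claude-sonnet-4-20250514-v1:0"\n'
        "# old judge: claude-3-haiku-20240307-v1:0\n"
    ),
    "after.py": 'judge = "bedrock:/global.anthropic.claude-sonnet-4-6"\n',
}
hits = find_legacy_references(sample)
for filename, lineno, model_id in hits:
    print(f"{filename}:{lineno}: legacy model {model_id}")
```

The scan also catches commented-out references, which is why the second line of the made-up `before.py` is reported.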

Testing

Pipeline executed end-to-end successfully after the fix:

QualitativeModelEvaluation: ✅ Succeeded (real scores logged)
EvaluationGate: ✅ Succeeded
ModelRegistration: ✅ Succeeded

Commit bdffc40
fix(task_05_fmops): upgrade judge model from legacy Sonnet 4 to Sonnet 4.6

claude-sonnet-4-20250514-v1:0 was marked as Legacy on Bedrock, causing all LLM-as-a-judge calls to fail silently and produce NaN scores. Replace with active model: global.anthropic.claude-sonnet-4-6

el-pedrito wants to merge 1 commit into aws-samples:main from el-pedrito:fix/task05-legacy-bedrock-model
