Added a workshop for sagemaker automated inference benchmarking by dferguson992 · Pull Request #264 · aws-samples/generative-ai-on-amazon-sagemaker

dferguson992 · 2026-06-09T17:26:13Z

Add: SageMaker AI Automated Benchmarking & Inference Recommendations Workshop

Summary

Adds a new 4-lab workshop that walks users through the full lifecycle of benchmarking and optimizing generative AI model deployments on Amazon SageMaker AI — from initial deployment through automated inference recommendations and cost-per-token comparison.

What's included

| File | Description |
| --- | --- |
| `workshops/inference-benchmarking/README.md` | Workshop overview, architecture, prerequisites, and references |
| `lab0/lab0_setup.ipynb` | Environment setup: endpoint provisioning, dependencies, region config |
| `lab1/lab1_deploy_and_benchmark.ipynb` | Deploy Llama 3.1 8B via JumpStart and run a first benchmark with `CreateAIBenchmarkJob` |
| `lab2/lab2_benchmarking_nuances.ipynb` | Explore how concurrency, token lengths, request rates, and streaming impact TTFT/ITL/throughput |
| `lab3/lab3_inference_recommendations.ipynb` | Use `CreateAIRecommendationJob` to find optimal deployment configs across instance types |
| `lab4/lab4_deploy_and_compare.ipynb` | Deploy the recommended config and compare baseline vs. optimized performance + cost |
| `requirements.txt` | Python dependencies |
| `utils.py` | Shared helpers for metrics parsing, visualization, and comparison tables |

Key concepts covered

SageMaker AI Workload Configs and Benchmark Jobs (NVIDIA AIPerf)
Automated Inference Recommendations with speculative decoding and kernel tuning
TTFT, ITL, throughput, and latency percentile analysis
Cost-per-token comparison across configurations
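
For readers new to these metrics, the sketch below shows one common way to compute them from raw per-token timestamps. It is not taken from the workshop's `utils.py`; the function names, the nearest-rank percentile method, and the hourly price passed in are all assumptions made for illustration.

```python
import math
from statistics import mean


def time_to_first_token(request_start: float, token_times: list[float]) -> float:
    """TTFT: seconds between sending the request and receiving the first token."""
    return token_times[0] - request_start


def mean_inter_token_latency(token_times: list[float]) -> float:
    """ITL: average gap in seconds between consecutive output tokens."""
    gaps = [later - earlier for earlier, later in zip(token_times, token_times[1:])]
    return mean(gaps) if gaps else 0.0


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile (one of several valid definitions)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def cost_per_million_tokens(hourly_price_usd: float, tokens_per_second: float) -> float:
    """Cost of generating one million tokens at a sustained throughput.

    hourly_price_usd is whatever the endpoint costs per hour; the caller
    supplies it, since no instance prices are stated in this PR.
    """
    tokens_per_hour = tokens_per_second * 3600
    return hourly_price_usd / tokens_per_hour * 1_000_000
```

Comparing two configurations then comes down to calling `cost_per_million_tokens` with each configuration's hourly price and measured throughput, alongside the p50/p90/p99 values of TTFT and ITL.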

Prerequisites

Service quota for ml.g6.12xlarge (or ml.g5.12xlarge)
Hugging Face token stored in the `HF_TOKEN` environment variable for gated model access
Available in: us-east-1, us-east-2, us-west-2, ap-southeast-1, ap-northeast-1, eu-central-1, eu-west-1
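
A preflight check along these lines can catch a missing token or an unsupported region before any endpoint is provisioned. This is a hypothetical helper, not part of the workshop notebooks; it assumes the region is passed in by the caller and it does not verify the instance service quota.

```python
import os

# Regions listed in the prerequisites above.
SUPPORTED_REGIONS = {
    "us-east-1", "us-east-2", "us-west-2",
    "ap-southeast-1", "ap-northeast-1",
    "eu-central-1", "eu-west-1",
}


def check_prerequisites(env: dict, region: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if not env.get("HF_TOKEN"):
        problems.append("HF_TOKEN is not set; gated model download will fail")
    if region not in SUPPORTED_REGIONS:
        problems.append(f"region {region!r} is not in the supported list")
    return problems


if __name__ == "__main__":
    # Assumption: the region comes from the AWS_REGION environment variable.
    for problem in check_prerequisites(dict(os.environ), os.environ.get("AWS_REGION", "")):
        print("WARNING:", problem)
```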

Testing

Ran all labs end-to-end in us-west-2 on SageMaker Studio
Verified cleanup cells properly delete endpoints and inference components
Estimated total cost for full workshop run: ~$100–145

Estimated lab durations

| Lab | Time | Cost |
| --- | --- | --- |
| Lab 1 | ~45 min | ~$15–20 |
| Lab 2 | ~60 min | ~$25–35 |
| Lab 3 | ~90 min | ~$40–60 |
| Lab 4 | ~45 min | ~$20–30 |
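
As a quick consistency check, the per-lab estimates can be summed and compared with the total quoted under Testing. The figures below are copied from the table; Lab 0 has no estimate listed, so it is left out.

```python
# (minutes, low cost USD, high cost USD) per lab, copied from the table above.
LAB_ESTIMATES = {
    "Lab 1": (45, 15, 20),
    "Lab 2": (60, 25, 35),
    "Lab 3": (90, 40, 60),
    "Lab 4": (45, 20, 30),
}

total_minutes = sum(minutes for minutes, _, _ in LAB_ESTIMATES.values())
low = sum(lo for _, lo, _ in LAB_ESTIMATES.values())
high = sum(hi for _, _, hi in LAB_ESTIMATES.values())

print(f"Total time: ~{total_minutes} min, total cost: ~${low}-{high}")
```

The sums come to roughly 240 minutes and $100–145, which agrees with the estimated total cost for a full workshop run.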

frgud and others added 3 commits June 9, 2026 13:17

Added a workshop for sagemaker automated inference benchmarking

3f3431d

Merge branch 'aws-samples:main' into main

37f04c9

Merge branch 'aws-samples:main' into main

9000d01

dferguson992 wants to merge 3 commits into aws-samples:main from dferguson992:main
