Added a workshop for sagemaker automated inference benchmarking #264

Open
dferguson992 wants to merge 3 commits into aws-samples:main from dferguson992:main
Conversation

@dferguson992

Contributor

Add: SageMaker AI Automated Benchmarking & Inference Recommendations Workshop

Summary

Adds a new 4-lab workshop that walks users through the full lifecycle of benchmarking and optimizing generative AI model deployments on Amazon SageMaker AI — from initial deployment through automated inference recommendations and cost-per-token comparison.

What's included

| File | Description |
| --- | --- |
| workshops/inference-benchmarking/README.md | Workshop overview, architecture, prerequisites, and references |
| lab0/lab0_setup.ipynb | Environment setup: endpoint provisioning, dependencies, region config |
| lab1/lab1_deploy_and_benchmark.ipynb | Deploy Llama 3.1 8B via JumpStart and run a first benchmark with CreateAIBenchmarkJob |
| lab2/lab2_benchmarking_nuances.ipynb | Explore how concurrency, token lengths, request rates, and streaming impact TTFT/ITL/throughput |
| lab3/lab3_inference_recommendations.ipynb | Use CreateAIRecommendationJob to find optimal deployment configs across instance types |
| lab4/lab4_deploy_and_compare.ipynb | Deploy the recommended config and compare baseline vs. optimized performance + cost |
| requirements.txt | Python dependencies |
| utils.py | Shared helpers for metrics parsing, visualization, and comparison tables |
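For reviewers unfamiliar with the metrics named in Lab 2, the following is a minimal sketch of how TTFT (time to first token), ITL (inter-token latency), and output throughput are commonly derived from the arrival times of streamed tokens. It is not the contents of `utils.py`; the function names, the input shape (a request start time plus a list of token timestamps), and the nearest-rank percentile method are all assumptions made for illustration.

```python
from statistics import mean


def latency_metrics(request_start: float, token_times: list[float]) -> dict[str, float]:
    """Derive TTFT, mean ITL, and output throughput for one streamed response.

    Hypothetical inputs: request_start and token_times are seconds measured
    on the same clock; token_times holds one arrival time per output token.
    """
    if not token_times:
        raise ValueError("need at least one token timestamp")
    # TTFT: delay between sending the request and receiving the first token.
    ttft = token_times[0] - request_start
    # ITL: average gap between consecutive tokens after the first one.
    gaps = [later - earlier for earlier, later in zip(token_times, token_times[1:])]
    itl = mean(gaps) if gaps else 0.0
    # Throughput: output tokens divided by total wall-clock time for the request.
    total = token_times[-1] - request_start
    return {"ttft_s": ttft, "itl_s": itl, "tokens_per_s": len(token_times) / total}


def percentile(values: list[float], pct: int) -> float:
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    if not values or not 1 <= pct <= 100:
        raise ValueError("need non-empty values and 1 <= pct <= 100")
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100 avoids floating-point rounding surprises.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]
```

Applying `percentile` to the per-request `ttft_s` values from a benchmark run gives figures such as p50 or p99 TTFT. The benchmark jobs in the labs may report these numbers using different definitions, so treat this only as a reading aid.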

Key concepts covered

  • SageMaker AI Workload Configs and Benchmark Jobs (NVIDIA AIPerf)
  • Automated Inference Recommendations with speculative decoding and kernel tuning
  • TTFT, ITL, throughput, and latency percentile analysis
  • Cost-per-token comparison across configurations
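The cost-per-token comparison comes down to simple arithmetic: divide the hourly instance price by the number of tokens the endpoint produces in an hour. The sketch below illustrates that calculation under assumed inputs. The function names are hypothetical, and the prices and throughput figures in the usage note are placeholders, not real SageMaker pricing or workshop results.

```python
def cost_per_million_tokens(hourly_price_usd: float, tokens_per_second: float) -> float:
    """Cost in USD to generate one million tokens at a sustained throughput."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return hourly_price_usd / tokens_per_hour * 1_000_000


def percent_saving(baseline_cost: float, optimized_cost: float) -> float:
    """Relative cost reduction of the optimized config versus the baseline, in percent."""
    if baseline_cost <= 0:
        raise ValueError("baseline_cost must be positive")
    return (baseline_cost - optimized_cost) / baseline_cost * 100
```

For example, with a placeholder price of 3.60 USD per hour, a baseline of 1,000 tokens per second costs about 1.00 USD per million tokens, and an optimized configuration reaching 2,000 tokens per second on the same instance costs about 0.50 USD, a saving of roughly 50%.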

Prerequisites

  • Service quota for ml.g6.12xlarge (or ml.g5.12xlarge)
  • HuggingFace token stored as env var (HF_TOKEN) for gated model access
  • Available in: us-east-1, us-east-2, us-west-2, ap-southeast-1, ap-northeast-1, eu-central-1, eu-west-1

Testing

  • Ran all labs end-to-end in us-west-2 on SageMaker Studio
  • Verified cleanup cells properly delete endpoints and inference components
  • Estimated total cost for full workshop run: ~$100–145

Estimated lab durations

| Lab | Time | Cost |
| --- | --- | --- |
| Lab 1 | ~45 min | ~$15–20 |
| Lab 2 | ~60 min | ~$25–35 |
| Lab 3 | ~90 min | ~$40–60 |
| Lab 4 | ~45 min | ~$20–30 |
