Skip to content

Latest commit

 

History

History
220 lines (166 loc) · 9.29 KB

File metadata and controls

220 lines (166 loc) · 9.29 KB

Deepgram SageMaker Text-to-Speech Stress Test Client

A Python client for stress testing Deepgram Text-to-Speech (TTS) endpoints deployed on AWS SageMaker. Streams text phrases to multiple simultaneous bidirectional connections for load testing, with audio playback from a single selectable connection.

Prerequisites

  • Python 3.12+
  • uv package manager
  • AWS credentials configured (CLI, environment variables, or IAM role)
  • A deployed Amazon SageMaker endpoint running a Deepgram TTS model
  • PyAudio for audio playback:
    • macOS: brew install portaudio
    • Linux: sudo apt-get install portaudio19-dev

Installation

cd python-tts
uv sync

tts_stress.py

Streams text phrases from a file to multiple simultaneous bidirectional connections to a Deepgram TTS endpoint on SageMaker. Audio from one selected connection is played back through local speakers; all other connections receive and discard audio.

Prepare a text input file

Create a plain text file with one phrase per line (default: tts-input.txt):

Hello, this is a test of the Deepgram text-to-speech system.
Welcome to the future of voice synthesis.

Phrases are cycled repeatedly for the duration of the test.

Examples

Basic usage (single connection, 30-second test):

uv run tts_stress.py your-endpoint-name

With a specific AWS region:

uv run tts_stress.py your-endpoint-name --region us-west-2

Multiple simultaneous connections (load testing):

uv run tts_stress.py your-endpoint-name --connections 5

Select which connection plays audio to speakers:

uv run tts_stress.py your-endpoint-name --connections 5 --playback 3

With a different Deepgram TTS voice:

uv run tts_stress.py your-endpoint-name --voice aura-2-orion-en

Custom duration and text file:

uv run tts_stress.py your-endpoint-name --duration 60 --text-file my-phrases.txt

Full example with all options:

uv run tts_stress.py your-endpoint-name \
  --connections 5 \
  --playback 2 \
  --duration 120 \
  --voice aura-2-thalia-en \
  --text-file tts-input.txt \
  --region us-east-2 \
  --log-level DEBUG

Options

Option Description Default
endpoint_name SageMaker endpoint name (required)
--connections N Number of simultaneous streaming connections 1
--playback N Connection ID whose audio is played to speakers (1-based, ≤ connections; 0 = headless) 1
--no-playback Run headless — no speaker playback (sets --playback 0). Required when PortAudio/pyaudio is unavailable (CI / e2e) off
--duration SECONDS How long to run the test 30
--once Send each phrase exactly once then stop (deterministic), instead of cycling for --duration off
--voice VOICE Deepgram TTS voice model aura-2-thalia-en
--text-file PATH Path to a text file with phrases to synthesize, one per line tts-input.txt
--text "..." Inline text to synthesize (overrides --text-file; single phrase)
--extra "k=v&k2=v2" Extra /v1/speak query params appended verbatim (e.g. encoding=mulaw&sample_rate=8000&speed=1.2)
--summary-jsonl PATH Write a per-connection JSON summary (audio bytes/RMS/duration, Flushed acks, warnings, errors) — consumed by the e2e driver
--save-audio-dir DIR Save each connection's synthesized audio to DIR (WAV for linear16, raw bytes otherwise)
--region REGION AWS region us-east-2
--log-level LEVEL DEBUG, INFO, WARNING, ERROR, CRITICAL INFO

How it works

  1. Text phrases are read from the input file and cycled repeatedly for the duration of the test.
  2. Each phrase is sent as a Speak message followed by a Flush to trigger audio synthesis.
  3. The script waits for a Flushed acknowledgement from every active connection before sending the next phrase, keeping all connections in sync and avoiding server-side rate limiting.
  4. The duration timer starts when the first audio chunk is received from the playback connection.
  5. On shutdown, the script waits up to 30 seconds for each connection to finish receiving remaining synthesized audio.

End-to-end correctness drivers (e2e/)

Run-everything correctness gates that exercise a TTS endpoint across its full parameter surface and validate the synthesized audio self-contained — TTS has no transcript to score against, so each scenario checks the audio itself (non-empty bytes, correct container/codec, non-silent RMS for linear16, the requested sample rate, and — for speed control — that duration changes with speed). No second (STT) endpoint is required.

A TTS endpoint can be invoked three ways. The batch driver covers REST sync and async (--mode); the streaming driver covers the websocket transport — mirroring the STT split.

e2e/e2e_test_batch.py — REST sync + async (invoke_endpoint / invoke_endpoint_async)

The bulk of parameter coverage. --mode sync (default) calls invoke_endpoint with the /v1/speak path + query in CustomAttributes and a JSON {"text": "..."} body. --mode async uploads that JSON to S3, calls invoke_endpoint_async, and polls the S3 output (audio) / failure prefixes. Either way the returned audio is validated self-contained. A given endpoint serves one transport — whichever its config was created with (async needs an AsyncInferenceConfig).

cd python-tts
# sync endpoint:
uv run e2e/e2e_test_batch.py your-tts-endpoint --region us-east-2

# async endpoint (config has AsyncInferenceConfig):
uv run e2e/e2e_test_batch.py your-async-tts-endpoint --mode async \
  --bucket your-async-bucket --region us-east-2

uv run e2e/e2e_test_batch.py --list
Scenario What it checks
basic / concurrent_5 linear16/wav non-empty, non-silent audio; 5-way concurrency
default_format records the server's default output format (empirically mp3 on current bundles)
voice_aura2_orion / voice_aura1_asteria alternate Aura-2 / legacy Aura-1 voices (PASS-WITH-NOTE if unbundled)
encoding_linear16_wav / encoding_linear16_raw WAV container vs bare PCM (container=none)
encoding_mp3 / encoding_flac / encoding_opus_ogg / encoding_mulaw_wav each codec's container magic bytes
encoding_aac AAC (raw codec stream — bytes-only check)
sample_rate_48000 / sample_rate_16000 WAV sample rate matches the request
bit_rate_mp3_32000 bit_rate accepted for mp3
speed_duration synth at 0.7 / 1.0 / 1.5 — duration must shrink as speed rises; PASS-WITH-NOTE if the bundle lacks speed
pronunciation_ipa well-formed inline IPA override; PASS-WITH-NOTE if the bundle lacks inline controls
text_limit_exceeded text > 2000 chars → expects rejection (413 / Payload Too Large)
mip_opt_out / tag passthrough flags accepted (smoke)

e2e/e2e_test_streaming.py — websocket (bidirectional /v1/speak)

Streaming-specific behavior. Drives tts_stress.py headless (--no-playback --once --summary-jsonl) and validates each connection's audio + protocol acks from the summary JSON.

uv run e2e/e2e_test_streaming.py your-tts-endpoint --region us-east-2
uv run e2e/e2e_test_streaming.py --list
Scenario What it checks
basic / concurrent_5 Speak→audio + Flushed; 5-way concurrency
multi_phrase_flush 3 phrases — multiple Flushed acks (Speak→Flush→Flushed loop)
encoding_linear16_24k explicit linear16 @ 24 kHz — non-silent audio
encoding_mulaw_8k streaming-supported companded codec (bytes-only check)
speed_fast speed=1.4 over the websocket — non-silent audio when supported; PASS-WITH-NOTE on bundles that silently drop speed (no audio → flush-ack timeout)
voice_alt alternate voice (PASS-WITH-NOTE if unbundled)
mip_opt_out passthrough flag (smoke)

Both drivers: exit code 0 = all pass; tolerated_error_substring scenarios PASS-WITH-NOTE when the endpoint returns a known "not supported by this bundle" error. Per-scenario logs + aggregated results.json land under /tmp/dg-sagemaker-e2e/tts-batch|tts-streaming/<timestamp>/. Parameter coverage is scoped to the TTS docs (https://developers.deepgram.com/docs/tts-media-output-settings, https://developers.deepgram.com/docs/tts-voice-controls) as of the June 2026 audit.


Troubleshooting

Audio not playing

  • Verify --playback is between 1 and the number of --connections
  • Check system audio output settings
  • Verify PyAudio is installed: python -c "import pyaudio; print('OK')"
  • macOS: brew install portaudio && pip install pyaudio

Connection errors

  • Verify the endpoint name is correct and matches the target --region
  • Confirm AWS credentials are configured: aws sts get-caller-identity
  • Confirm the endpoint is InService in the AWS Console or via aws sagemaker describe-endpoint --endpoint-name your-endpoint-name
  • Check CloudWatch Logs for the SageMaker endpoint for server-side errors

No audio output

  • Ensure the --voice is valid for the deployed TTS model — see the Voices documentation
  • Check system volume and speaker settings
  • Enable debug logging: --log-level DEBUG