A Python client for stress testing Deepgram Text-to-Speech (TTS) endpoints deployed on AWS SageMaker. Streams text phrases to multiple simultaneous bidirectional connections for load testing, with audio playback from a single selectable connection.
- Python 3.12+
- uv package manager
- AWS credentials configured (CLI, environment variables, or IAM role)
- A deployed Amazon SageMaker endpoint running a Deepgram TTS model
- PyAudio for audio playback:
  - macOS: `brew install portaudio`
  - Linux: `sudo apt-get install portaudio19-dev`

Install the dependencies:

    cd python-tts
    uv sync

Streams text phrases from a file to multiple simultaneous bidirectional connections to a Deepgram TTS endpoint on SageMaker. Audio from one selected connection is played back through local speakers; all other connections receive and discard audio.
Create a plain text file with one phrase per line (default: tts-input.txt):

    Hello, this is a test of the Deepgram text-to-speech system.
    Welcome to the future of voice synthesis.

Phrases are cycled repeatedly for the duration of the test.
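
The cycling behavior can be sketched as follows. This is a minimal illustration rather than the script's actual code: the helper names `load_phrases` and `phrase_stream` are hypothetical, and skipping blank lines is an assumption.

```python
import itertools
from pathlib import Path
from typing import Iterator


def load_phrases(path: str) -> list[str]:
    """Read one phrase per line, skipping blank lines (an assumption)."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def phrase_stream(phrases: list[str]) -> Iterator[str]:
    """Return an iterator that yields phrases in order, wrapping around forever."""
    if not phrases:
        raise ValueError("input file contains no phrases")
    return itertools.cycle(phrases)
```

A caller would pull from `phrase_stream(...)` with `next()` until the test duration elapses.
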
Basic usage (single connection, 30-second test):

    uv run tts_stress.py your-endpoint-name

With a specific AWS region:

    uv run tts_stress.py your-endpoint-name --region us-west-2

Multiple simultaneous connections (load testing):

    uv run tts_stress.py your-endpoint-name --connections 5

Select which connection plays audio to speakers:

    uv run tts_stress.py your-endpoint-name --connections 5 --playback 3

With a different Deepgram TTS voice:

    uv run tts_stress.py your-endpoint-name --voice aura-2-orion-en

Custom duration and text file:

    uv run tts_stress.py your-endpoint-name --duration 60 --text-file my-phrases.txt

Full example with all options:

    uv run tts_stress.py your-endpoint-name \
      --connections 5 \
      --playback 2 \
      --duration 120 \
      --voice aura-2-thalia-en \
      --text-file tts-input.txt \
      --region us-east-2 \
      --log-level DEBUG

| Option | Description | Default |
|---|---|---|
| `endpoint_name` | SageMaker endpoint name (required) | none |
| `--connections N` | Number of simultaneous streaming connections | `1` |
| `--playback N` | Connection ID whose audio is played to speakers (1-based, ≤ connections; 0 = headless) | `1` |
| `--no-playback` | Run headless with no speaker playback (sets `--playback 0`). Required when PortAudio/pyaudio is unavailable (CI / e2e) | off |
| `--duration SECONDS` | How long to run the test | `30` |
| `--once` | Send each phrase exactly once, then stop (deterministic), instead of cycling for `--duration` | off |
| `--voice VOICE` | Deepgram TTS voice model | `aura-2-thalia-en` |
| `--text-file PATH` | Path to a text file with phrases to synthesize, one per line | `tts-input.txt` |
| `--text "..."` | Inline text to synthesize (overrides `--text-file`; single phrase) | none |
| `--extra "k=v&k2=v2"` | Extra `/v1/speak` query params appended verbatim (e.g. `encoding=mulaw&sample_rate=8000&speed=1.2`) | none |
| `--summary-jsonl PATH` | Write a per-connection JSON summary (audio bytes/RMS/duration, Flushed acks, warnings, errors); consumed by the e2e driver | none |
| `--save-audio-dir DIR` | Save each connection's synthesized audio to DIR (WAV for linear16, raw bytes otherwise) | none |
| `--region REGION` | AWS region | `us-east-2` |
| `--log-level LEVEL` | `DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL` | `INFO` |
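
Since `--extra` is a URL query string that is appended verbatim, it can help to see how such a string is assembled and inspected. The sketch below is illustrative only: `build_speak_query` is a hypothetical helper, and passing the voice as the `model` parameter is an assumption about how the script maps `--voice` onto the request.

```python
from urllib.parse import parse_qsl


def build_speak_query(voice: str, extra: str = "") -> str:
    """Return a /v1/speak query string with `extra` appended verbatim."""
    query = f"model={voice}"  # assumed mapping of --voice to a query parameter
    if extra:
        # Appended as-is: nothing is re-encoded, so the caller controls escaping.
        query += "&" + extra.lstrip("&")
    return query


query = build_speak_query(
    "aura-2-thalia-en", "encoding=mulaw&sample_rate=8000&speed=1.2"
)
print(query)
# Decode the pairs only for inspection; the string itself is sent unchanged.
print(parse_qsl(query))
```

Note that values come back from `parse_qsl` as strings, so `sample_rate` is `"8000"`, not `8000`.
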
- Text phrases are read from the input file and cycled repeatedly for the duration of the test.
- Each phrase is sent as a `Speak` message followed by a `Flush` to trigger audio synthesis.
- The script waits for a `Flushed` acknowledgement from every active connection before sending the next phrase, keeping all connections in sync and avoiding server-side rate limiting.
- The duration timer starts when the first audio chunk is received from the playback connection.
- On shutdown, the script waits up to 30 seconds for each connection to finish receiving remaining synthesized audio.
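
The lockstep behavior described in the list above can be sketched with `asyncio`. This is not the script's implementation: `FakeConnection` is a hypothetical stand-in that simulates the server's acknowledgement with a sleep, the message shapes are assumptions, and the 5-second ack timeout is an arbitrary illustrative value.

```python
import asyncio


class FakeConnection:
    """Stand-in for one bidirectional stream (hypothetical, for illustration)."""

    def __init__(self, conn_id: int, latency: float) -> None:
        self.conn_id = conn_id
        self.latency = latency
        self.flushed_count = 0
        self.sent: list[dict] = []

    async def send(self, message: dict) -> None:
        self.sent.append(message)

    async def wait_flushed(self) -> None:
        # A real client would block here until the server's Flushed message arrives.
        await asyncio.sleep(self.latency)
        self.flushed_count += 1


async def speak_in_lockstep(connections, phrases, ack_timeout=5.0):
    """Send each phrase to all connections, then wait for every Flushed ack."""
    for phrase in phrases:
        for conn in connections:
            await conn.send({"type": "Speak", "text": phrase})  # assumed shape
            await conn.send({"type": "Flush"})
        # Barrier: no connection receives the next phrase until all have acked.
        await asyncio.wait_for(
            asyncio.gather(*(c.wait_flushed() for c in connections)),
            timeout=ack_timeout,
        )


conns = [FakeConnection(i + 1, latency=0.01 * (i + 1)) for i in range(3)]
asyncio.run(speak_in_lockstep(conns, ["Hello.", "Welcome."]))
print([c.flushed_count for c in conns])
```

Gathering the acks behind a single timeout means the slowest connection sets the pace for all of them, which is what keeps the connections in sync.
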
Run-everything correctness gates that exercise a TTS endpoint across its full
parameter surface and validate the synthesized audio on its own terms. TTS
has no transcript to score against, so each scenario checks the audio itself
(non-empty bytes, correct container/codec, non-silent RMS for linear16, the
requested sample rate, and, for speed control, that duration changes with
speed). No second (STT) endpoint is required.
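
As a rough illustration of these checks for linear16 WAV output, the sketch below uses only the standard library. It is not the drivers' code: the function names are hypothetical, the `min_rms` threshold of 50 is an arbitrary assumption, and it presumes a WAV header with valid size fields.

```python
import io
import math
import struct
import wave


def inspect_wav(data: bytes) -> dict:
    """Return sample rate, duration, and RMS for 16-bit PCM WAV bytes."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit (linear16) samples")
        rate = wav.getframerate()
        frames = wav.getnframes()
        raw = wav.readframes(frames)
    count = len(raw) // 2
    samples = struct.unpack(f"<{count}h", raw[: count * 2])
    rms = math.sqrt(sum(s * s for s in samples) / count) if count else 0.0
    return {"sample_rate": rate, "duration_s": frames / rate, "rms": rms}


def check_audio(data: bytes, expected_rate: int, min_rms: float = 50.0) -> list[str]:
    """Return a list of failures; an empty list means the audio passed."""
    if not data:
        return ["empty audio"]
    info = inspect_wav(data)
    failures = []
    if info["sample_rate"] != expected_rate:
        failures.append(f"sample rate {info['sample_rate']} != {expected_rate}")
    if info["rms"] < min_rms:
        failures.append(f"audio looks silent (rms={info['rms']:.1f})")
    return failures


def durations_shrink_with_speed(durations: dict[float, float]) -> bool:
    """True if duration strictly decreases as the speed setting increases."""
    ordered = [durations[speed] for speed in sorted(durations)]
    return all(a > b for a, b in zip(ordered, ordered[1:]))
```

For the speed check, a caller would synthesize the same text at each speed and pass the measured durations, for example `durations_shrink_with_speed({0.7: 4.1, 1.0: 3.0, 1.5: 2.1})`.
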
A TTS endpoint can be invoked three ways. The batch driver covers REST sync
and async (`--mode`); the streaming driver covers the websocket
transport, mirroring the STT split.
The batch driver carries the bulk of parameter coverage. `--mode sync`
(default) calls `invoke_endpoint` with the `/v1/speak` path + query in
`CustomAttributes` and a JSON `{"text": "..."}` body. `--mode async` uploads
that JSON to S3, calls `invoke_endpoint_async`, and polls the S3 output (audio)
and failure prefixes. Either way, the returned audio is validated on its own,
with no reference transcript. A given endpoint serves one transport: whichever
its config was created with (async needs an `AsyncInferenceConfig`).
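
A sync call along these lines might look like the sketch below. It is an illustration under stated assumptions, not the driver's code: `synthesize_sync` is a hypothetical helper, and the exact way the path and query are encoded in `CustomAttributes` is assumed. The `client` argument is expected to be a boto3 `sagemaker-runtime` client.

```python
import json


def synthesize_sync(client, endpoint_name: str, text: str, query: str) -> bytes:
    """Call a sync TTS endpoint and return the raw audio bytes."""
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        # Assumed encoding of the /v1/speak path + query.
        CustomAttributes=f"/v1/speak?{query}",
        Body=json.dumps({"text": text}).encode("utf-8"),
    )
    # The response body is a stream; read it fully to get the audio.
    return response["Body"].read()


# Example usage (requires boto3 and AWS credentials):
#   import boto3
#   client = boto3.client("sagemaker-runtime", region_name="us-east-2")
#   audio = synthesize_sync(
#       client, "your-tts-endpoint", "Hello.",
#       "model=aura-2-thalia-en&encoding=linear16&container=wav",
#   )
```

Taking the client as a parameter keeps the function easy to exercise with a stub, without touching a live endpoint.
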

    cd python-tts
    # sync endpoint:
    uv run e2e/e2e_test_batch.py your-tts-endpoint --region us-east-2
    # async endpoint (config has AsyncInferenceConfig):
    uv run e2e/e2e_test_batch.py your-async-tts-endpoint --mode async \
      --bucket your-async-bucket --region us-east-2
    uv run e2e/e2e_test_batch.py --list

| Scenario | What it checks |
|---|---|
| `basic` / `concurrent_5` | linear16/wav non-empty, non-silent audio; 5-way concurrency |
| `default_format` | records the server's default output format (empirically mp3 on current bundles) |
| `voice_aura2_orion` / `voice_aura1_asteria` | alternate Aura-2 / legacy Aura-1 voices (PASS-WITH-NOTE if unbundled) |
| `encoding_linear16_wav` / `encoding_linear16_raw` | WAV container vs bare PCM (`container=none`) |
| `encoding_mp3` / `encoding_flac` / `encoding_opus_ogg` / `encoding_mulaw_wav` | each codec's container magic bytes |
| `encoding_aac` | AAC (raw codec stream, bytes-only check) |
| `sample_rate_48000` / `sample_rate_16000` | WAV sample rate matches the request |
| `bit_rate_mp3_32000` | `bit_rate` accepted for mp3 |
| `speed_duration` | synthesis at 0.7 / 1.0 / 1.5, where duration must shrink as speed rises; PASS-WITH-NOTE if the bundle lacks speed |
| `pronunciation_ipa` | well-formed inline IPA override; PASS-WITH-NOTE if the bundle lacks inline controls |
| `text_limit_exceeded` | text > 2000 chars → expects rejection (413 / Payload Too Large) |
| `mip_opt_out` / `tag` | passthrough flags accepted (smoke) |
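
The "container magic bytes" checks in the table above can be illustrated with a small detector. This is a sketch, not the driver's validation code, and `detect_container` is a hypothetical name. It relies on well-known file signatures: `RIFF`...`WAVE` for WAV, `fLaC` for FLAC, `OggS` for Ogg, and either an `ID3` tag or an MPEG frame sync for MP3.

```python
def detect_container(data: bytes) -> str:
    """Guess the audio container from its leading bytes."""
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    if data[:4] == b"fLaC":
        return "flac"
    if data[:4] == b"OggS":
        return "ogg"
    if data[:3] == b"ID3":
        return "mp3"
    # MPEG audio frames start with an 11-bit sync word (all ones). The layer
    # bits must be non-zero, which also excludes ADTS AAC (layer bits are 00).
    if (
        len(data) >= 2
        and data[0] == 0xFF
        and (data[1] & 0xE0) == 0xE0
        and (data[1] & 0x06) != 0
    ):
        return "mp3"
    return "unknown"
```

A mu-law WAV still reports `"wav"` here, because the codec is recorded inside the `fmt ` chunk rather than in the leading signature.
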
The streaming driver covers streaming-specific behavior. It drives
`tts_stress.py` headless (`--no-playback --once --summary-jsonl`) and validates
each connection's audio and protocol acks from the summary JSON.

    uv run e2e/e2e_test_streaming.py your-tts-endpoint --region us-east-2
    uv run e2e/e2e_test_streaming.py --list

| Scenario | What it checks |
|---|---|
| `basic` / `concurrent_5` | Speak→audio + Flushed; 5-way concurrency |
| `multi_phrase_flush` | 3 phrases, multiple Flushed acks (Speak→Flush→Flushed loop) |
| `encoding_linear16_24k` | explicit linear16 @ 24 kHz; non-silent audio |
| `encoding_mulaw_8k` | streaming-supported companded codec (bytes-only check) |
| `speed_fast` | `speed=1.4` over the websocket: non-silent audio when supported; PASS-WITH-NOTE on bundles that silently drop speed (no audio → flush-ack timeout) |
| `voice_alt` | alternate voice (PASS-WITH-NOTE if unbundled) |
| `mip_opt_out` | passthrough flag (smoke) |

For both drivers, exit code 0 means all scenarios pass. Scenarios that set
`tolerated_error_substring` report PASS-WITH-NOTE when the endpoint returns a
known "not supported by this bundle" error. Per-scenario logs and an aggregated
`results.json` land under
`/tmp/dg-sagemaker-e2e/tts-batch|tts-streaming/<timestamp>/`. Parameter coverage
is scoped to the TTS docs
(https://developers.deepgram.com/docs/tts-media-output-settings,
https://developers.deepgram.com/docs/tts-voice-controls) as of the June 2026 audit.
- Verify `--playback` is between 1 and the number of `--connections`
- Check system audio output settings
- Verify PyAudio is installed: `python -c "import pyaudio; print('OK')"`
- macOS: `brew install portaudio && pip install pyaudio`
- Verify the endpoint name is correct and matches the target `--region`
- Confirm AWS credentials are configured: `aws sts get-caller-identity`
- Confirm the endpoint is `InService` in the AWS Console or via `aws sagemaker describe-endpoint --endpoint-name your-endpoint-name`
- Check CloudWatch Logs for the SageMaker endpoint for server-side errors
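
The endpoint status check above can also be scripted. The sketch below wraps the SageMaker `describe_endpoint` call; `require_in_service` is a hypothetical helper name, and `client` is expected to be a boto3 `sagemaker` client.

```python
def require_in_service(client, endpoint_name: str) -> None:
    """Raise RuntimeError unless the endpoint's status is InService."""
    desc = client.describe_endpoint(EndpointName=endpoint_name)
    status = desc["EndpointStatus"]
    if status != "InService":
        # FailureReason is only present when the endpoint has failed.
        reason = desc.get("FailureReason", "no failure reason reported")
        raise RuntimeError(f"{endpoint_name} is {status}: {reason}")


# Example usage (requires boto3 and AWS credentials):
#   import boto3
#   client = boto3.client("sagemaker", region_name="us-east-2")
#   require_in_service(client, "your-endpoint-name")
```
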
- Ensure the `--voice` is valid for the deployed TTS model (see the Voices documentation)
- Check system volume and speaker settings
- Enable debug logging: `--log-level DEBUG`