Welcome to the Pipecat Audio Transcription Example!
This project showcases how to integrate the awesome pipecat library with a neat textual interface (powered by Textual) to select audio devices and perform real-time speech-to-text (STT) transcription using Whisper.
Note: Although the script allows you to select both input and output audio devices, this example only utilizes the audio input for transcription.
- Interactive Audio Device Selection: Choose your preferred audio input device using a cool, textual UI.
- State-of-the-Art Transcription: Leverage Whisper's large model (running on CUDA) for high-quality, real-time STT.
- Live Transcription Logging: Watch your spoken words transform into text on your console instantly.
- Easy Setup: Everything you need is in the requirements.txt.
Get a quick glimpse of the app in action!
(Don't worry – I'll be adding a GIF demo here soon!)
Install dependencies:
    uv sync
Run the main script:
    uv run bot.py
When the app launches, you'll see a textual interface that lets you select your audio input device. Once selected, the app will begin capturing audio and transcribing it using Whisper.
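To give a feel for what the device picker has to work with, here is a minimal sketch of narrowing a device list down to input-capable entries. It assumes device records shaped like the dictionaries PyAudio returns (with `index`, `name`, and `maxInputChannels` keys); the `input_devices` helper and the sample data are hypothetical and are not taken from `bot.py`.

```python
from typing import Iterable


def input_devices(devices: Iterable[dict]) -> list[tuple[int, str]]:
    """Return (index, name) pairs for devices that can capture audio."""
    return [
        (d["index"], d["name"])
        for d in devices
        if d.get("maxInputChannels", 0) > 0  # output-only devices report 0
    ]


# Hypothetical sample data; real values would come from the audio backend.
SAMPLE_DEVICES = [
    {"index": 0, "name": "Built-in Microphone", "maxInputChannels": 2, "maxOutputChannels": 0},
    {"index": 1, "name": "Built-in Speakers", "maxInputChannels": 0, "maxOutputChannels": 2},
    {"index": 2, "name": "USB Headset", "maxInputChannels": 1, "maxOutputChannels": 2},
]

if __name__ == "__main__":
    # Print the choices a selection screen might offer.
    for index, name in input_devices(SAMPLE_DEVICES):
        print(f"[{index}] {name}")
```

The index of the chosen entry is what would typically be handed to the audio transport as its input device.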
- LocalAudioTransport: Captures audio from your chosen input device.
- WhisperSTTService: Processes the audio stream using Whisper's large model for speech-to-text conversion.
- TranscriptionLogger: Logs the transcribed text to the console as soon as it's processed.
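The way these three stages hand data to each other can be sketched with plain Python async generators. This is a conceptual stand-in only: it does not use pipecat's classes or API, and `capture`, `transcribe`, `log_transcripts`, `fake_stt`, and `run_demo` are hypothetical names invented for illustration. A real run would replace `fake_stt` with Whisper and the byte strings with microphone audio.

```python
import asyncio
from typing import AsyncIterator, Callable, Iterable


async def capture(chunks: Iterable[bytes]) -> AsyncIterator[bytes]:
    """Stand-in for the audio transport: yields raw audio chunks."""
    for chunk in chunks:
        yield chunk
        await asyncio.sleep(0)  # let other tasks run between chunks


async def transcribe(
    audio: AsyncIterator[bytes], stt: Callable[[bytes], str]
) -> AsyncIterator[str]:
    """Stand-in for the STT service: turns chunks into text, skipping empty results."""
    async for chunk in audio:
        text = stt(chunk).strip()
        if text:
            yield text


async def log_transcripts(
    texts: AsyncIterator[str], sink: Callable[[str], None] = print
) -> None:
    """Stand-in for the logger: emits each transcript as soon as it arrives."""
    async for text in texts:
        sink(f"Transcription: {text}")


def fake_stt(chunk: bytes) -> str:
    """Hypothetical transcriber for this demo; a real one would run Whisper."""
    return chunk.decode("utf-8")


async def run_demo(chunks: Iterable[bytes], sink: Callable[[str], None] = print) -> None:
    # Chain the stages: capture -> transcribe -> log.
    await log_transcripts(transcribe(capture(chunks), fake_stt), sink)


if __name__ == "__main__":
    asyncio.run(run_demo([b"hello", b"   ", b"world"]))
```

Each stage only consumes what the previous one yields, which is why text can be logged while later audio is still being captured.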
The project relies on:
- pipecat – For building the audio processing pipeline.
- Textual – For the interactive terminal UI.
- Whisper – For state-of-the-art STT transcription.
I plan to improve this example with local LLM calls and audio output.
