Production First and Production Ready End-to-End Speech Recognition Toolkit
Updated May 11, 2026 - Python
OpenAI Whisper ASR Webservice API
Automatic Speech Recognition (ASR), Speaker Verification, Speech Synthesis, Text-to-Speech (TTS), Language Modelling, Singing Voice Synthesis (SVS), Voice Conversion (VC)
End-to-end Automatic Speech Recognition for Mandarin and English in TensorFlow
🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.
Frontier CoreML audio models in your apps — text-to-speech, speech-to-text, voice activity detection, and speaker diarization. In Swift, powered by SOTA open source.
Voice Activity Detector (VAD): low-latency, high-performance, and lightweight
Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR benchmarks, while also offering outstanding singing lyrics recognition capability.
PORORO: Platform Of neuRal mOdels for natuRal language prOcessing
⚡ TensorFlowASR: Almost State-of-the-art Automatic Speech Recognition in TensorFlow 2. Supports languages that can use characters or subwords.
Evaluate your speech-to-text system with similarity measures such as word error rate (WER)
Open STT
[AutoArk] GPA (General Purpose Audio) can do ASR, TTS and voice conversion with one tiny model!
Collection of resources on the applications of Large Language Models (LLMs) in Audio AI.
Offline Speech Recognition with OpenAI Whisper and TensorFlow Lite for Android
On-device streaming speech-to-text engine powered by deep learning
End-to-end ASR/LM implementation with PyTorch
A SOTA Industrial-Grade All-in-One ASR system with ASR, VAD, LID, and Punc modules. FireRedASR2 supports Chinese (Mandarin, 20+ dialects/accents), English, code-switching, and both speech and singing ASR. FireRedVAD supports speech/singing/music in 100+ langs. FireRedLID supports 100+ langs and 20+ zh dialects. FireRedPunc supports zh and en.
A list of features, scripts, blogs, and resources for making better use of Kaldi (http://kaldi-asr.org/)
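
One entry above mentions evaluating speech-to-text output with word error rate (WER). For readers unfamiliar with the metric, here is a minimal sketch, assuming the usual definition: the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the reference. The function name `word_error_rate` is hypothetical and is not the API of any toolkit listed here. Real evaluation tools typically also normalize case and punctuation, which this sketch does not.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Hypothetical helper for illustration; splits on whitespace only
    and applies no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.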