Speech-to-Text Transcription Script

This script uses the Vosk speech recognition toolkit to transcribe audio files into text and saves the transcription in a Word document (.docx).

Prerequisites

Python: Ensure you have Python installed on your system.
Vosk Model: Download a Vosk speech recognition model from the Vosk Models page (https://alphacephei.com/vosk/models).
Required Python Libraries: Install the necessary Python libraries using the following command:
```
pip install vosk python-docx
```

How It Works

The script checks if the Vosk model exists in the specified path.
It opens the provided audio file and reads the audio data.
The audio data is processed using Vosk's speech recognition to transcribe the speech into text.
The transcribed text is saved into a new Word document (transcript.docx).
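
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function names `extract_text` and `transcribe_to_docx`, the 4000-frame chunk size, and the default output name are assumptions made for the example. It relies on Vosk returning each result as a JSON string with a `text` field.

```python
import json
import os
import wave


def extract_text(result_json):
    """Pull the "text" field out of one Vosk JSON result string."""
    return json.loads(result_json).get("text", "").strip()


def transcribe_to_docx(audio_path, model_path, out_path="transcript.docx"):
    """Hypothetical helper: transcribe a WAV file and save the text as .docx."""
    # Third-party imports stay local so extract_text works without them.
    from vosk import Model, KaldiRecognizer
    from docx import Document

    # Step 1: check that the model exists at the given path.
    if not os.path.exists(model_path):
        raise FileNotFoundError(f"Vosk model not found at {model_path}")

    pieces = []
    # Step 2: open the audio file and read it in chunks.
    with wave.open(audio_path, "rb") as wf:
        rec = KaldiRecognizer(Model(model_path), wf.getframerate())
        while True:
            data = wf.readframes(4000)  # chunk size is an arbitrary choice
            if len(data) == 0:
                break
            # Step 3: feed audio to the recognizer; a truthy return value
            # means a finished utterance is available via Result().
            if rec.AcceptWaveform(data):
                pieces.append(extract_text(rec.Result()))
        # Flush whatever is still buffered at the end of the file.
        pieces.append(extract_text(rec.FinalResult()))

    text = " ".join(p for p in pieces if p)

    # Step 4: write the transcription into a new Word document.
    doc = Document()
    doc.add_paragraph(text)
    doc.save(out_path)
    return text
```

Calling `transcribe_to_docx("file.wav", model_path)` would need both `vosk` and `python-docx` installed and a real model directory. `extract_text` can be tried on its own with any JSON string.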

Usage

Install Dependencies:
```
pip install vosk python-docx
```
Download a Vosk Model: Download a Vosk model (e.g., vosk-model-en-us-0.42-gigaspeech) from the Vosk website and extract it to a directory.
Update the Model Path: Replace the placeholder in the script with the path to your downloaded Vosk model:
```
model_path = "{your path of the downloaded model}/vosk-model-en-us-0.42-gigaspeech"
```
Run the Script:
- Ensure your audio file is accessible.
- Run the script and provide the path to your audio file when prompted.
- The script will transcribe the audio and save the transcription as transcript.docx.
```
python speech_to_text_using_vosk_model.py
```
Example:
```
Enter the path to your audio file: /path/to/your/audio/file.wav
```
After successful transcription, a file named transcript.docx will be created in the same directory as the script.
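
If you are unsure whether the model path from the "Update the Model Path" step is right, a small check like the one below can confirm it before running the script. It is only a sketch: `resolve_model_dir` is a hypothetical helper, not part of the script, and the demo uses a throwaway directory in place of a real download location.

```python
import tempfile
from pathlib import Path

MODEL_NAME = "vosk-model-en-us-0.42-gigaspeech"


def resolve_model_dir(base_dir, model_name=MODEL_NAME):
    """Return the extracted model directory, or raise if it is missing."""
    model_dir = Path(base_dir).expanduser() / model_name
    if not model_dir.is_dir():
        raise FileNotFoundError(f"No Vosk model directory at {model_dir}")
    return model_dir


# Demo: a temporary directory stands in for wherever the model was extracted.
with tempfile.TemporaryDirectory() as base:
    (Path(base) / MODEL_NAME).mkdir()
    found = resolve_model_dir(base)
    print(found.name)  # vosk-model-en-us-0.42-gigaspeech
```

The string form of the returned path, `str(found)`, is what you would assign to `model_path`.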

Error Handling

Vosk Model Not Found: Ensure the model path is correct and the model exists at the specified location.
Audio File Not Found: Verify the audio file path is correct.
File Permission Error: Ensure you have write permissions in the directory where the script is saving the transcription.
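
The three cases above can all be checked up front, before any audio is processed. The sketch below shows one way to do that with the standard library only. `check_inputs` and its messages are illustrative assumptions, not the script's own error handling.

```python
import os


def check_inputs(model_path, audio_path, out_dir="."):
    """Return a list of problems; an empty list means it is safe to proceed."""
    problems = []
    if not os.path.isdir(model_path):
        problems.append(f"Vosk model not found: {model_path}")
    if not os.path.isfile(audio_path):
        problems.append(f"Audio file not found: {audio_path}")
    # os.access reports whether the current user may write to out_dir.
    if not os.access(out_dir, os.W_OK):
        problems.append(f"No write permission in: {out_dir}")
    return problems


# Demo with paths that do not exist, so every check fails.
for problem in check_inputs("/no/such/model", "/no/such/audio.wav", "/no/such/dir"):
    print(problem)
```

Collecting all problems in one pass, instead of stopping at the first, lets the user fix every path in a single round.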

Notes

Adjust the sample rate in the KaldiRecognizer instantiation if your audio file's sample rate differs from 16000 Hz.
The script currently handles basic error scenarios. For production use, consider adding more comprehensive error handling and logging.
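
To find out whether the sample rate note applies to your file, you can inspect the WAV header with Python's built-in `wave` module. The snippet below is a sketch: `wav_format` is a hypothetical helper, and the demo writes a short silent file because no real recording is available here. Vosk's own example scripts expect mono 16-bit PCM WAV input, so the channel count and sample width are worth checking as well.

```python
import os
import tempfile
import wave


def wav_format(path):
    """Return (channels, sample_width_bytes, sample_rate_hz) for a WAV file."""
    with wave.open(path, "rb") as wf:
        return wf.getnchannels(), wf.getsampwidth(), wf.getframerate()


# Demo: write one second of 8 kHz mono 16-bit silence, then inspect it.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(8000)
    wf.writeframes(b"\x00\x00" * 8000)

channels, width, rate = wav_format(demo_path)
print(channels, width, rate)  # 1 2 8000
```

Here the reported rate is 8000 Hz, not 16000 Hz, so this file is one where the value passed to `KaldiRecognizer` would need adjusting.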

License

This project is licensed under the MIT License. See the LICENSE file for more details.

Feel free to customize and extend this script to better fit your specific needs!
