Purpose: real-time and REST-driven voice commands using a shared core pipeline:
- STT (optional) → intent parse → action execute → TTS (optional)
- Actions can route to MCP tools, workflows, custom handlers, or LLM chat
Core implementation:
- REST + WebSocket endpoints: tldw_Server_API/app/api/v1/endpoints/voice_assistant.py
- Core module: tldw_Server_API/app/core/VoiceAssistant/
Process a text command as if it were spoken (bypasses STT).
Request body (VoiceCommandRequest):
- text (required): transcribed text to process
- session_id (optional): reuse an existing voice session
- include_tts (default true): include base64 TTS in the response
- tts_provider, tts_voice, tts_format (optional overrides)
Response (VoiceCommandResponse):
- session_id: resolved session identifier
- intent: parsed intent (action type + metadata)
- action_result: action result (success, response text, error)
- output_audio / output_audio_format: base64 TTS (when enabled)
- processing_time_ms: end-to-end processing time
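A minimal sketch of assembling a request body from the fields above and decoding the base64 audio from a response. The helper names are hypothetical, no HTTP call is made, and the sample response is made-up illustration data rather than real server output.

```python
import base64
from typing import Any, Dict, Optional


def build_command_payload(
    text: str,
    session_id: Optional[str] = None,
    include_tts: bool = True,
    **tts_overrides: str,
) -> Dict[str, Any]:
    """Build a VoiceCommandRequest-shaped dict (hypothetical helper)."""
    allowed = {"tts_provider", "tts_voice", "tts_format"}
    unknown = set(tts_overrides) - allowed
    if unknown:
        raise ValueError(f"unsupported overrides: {sorted(unknown)}")
    payload: Dict[str, Any] = {"text": text, "include_tts": include_tts}
    if session_id is not None:
        payload["session_id"] = session_id
    payload.update(tts_overrides)
    return payload


def decode_output_audio(response: Dict[str, Any]) -> Optional[bytes]:
    """Return raw audio bytes from a response dict, or None when TTS is absent."""
    encoded = response.get("output_audio")
    if not encoded:
        return None
    return base64.b64decode(encoded)


payload = build_command_payload(
    "search for notes about rag reranking", tts_format="mp3"
)

# Illustrative response only; real responses come from the server.
sample_response = {
    "session_id": "example-session",
    "output_audio": base64.b64encode(b"fake-audio").decode("ascii"),
    "output_audio_format": "mp3",
}
audio = decode_output_audio(sample_response)
```

Omitting session_id lets the server resolve one; the response's session_id can then be passed back on later calls to reuse the session.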
Voice commands can be system-level (YAML defaults, user_id=0) or user-level
(DB-backed).
- GET /api/v1/voice/commands
  - Query:
    - include_system (default true)
    - include_disabled (default false)
- POST /api/v1/voice/commands
  - Creates a user command
- GET /api/v1/voice/commands/{command_id}
  - Fetch a specific command (user or system)
- PUT /api/v1/voice/commands/{command_id}
  - Update a user command (system commands are forbidden)
- POST /api/v1/voice/commands/{command_id}/toggle
  - Enable/disable a user command
- DELETE /api/v1/voice/commands/{command_id}
  - Soft-delete a user command
Command definition notes:
- phrases are matched as prefixes.
- Conflicts are resolved by match score, then priority.
- See tldw_Server_API/Config_Files/voice_commands.yaml for system defaults.
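The matching rules above can be illustrated with a small sketch. The scoring formula (fraction of the utterance covered by the phrase) and the choice that a higher priority value wins a tie are assumptions made for illustration; the actual matcher in the core module may score and order candidates differently. The Command class here is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Command:
    """Hypothetical stand-in for a voice command definition."""
    name: str
    phrases: List[str]
    priority: int = 0


def match_command(text: str, commands: List[Command]) -> Optional[Command]:
    """Pick the command whose phrase is the best prefix of `text`."""
    normalized = text.strip().lower()
    best: Optional[Tuple[float, int, Command]] = None
    for command in commands:
        for phrase in command.phrases:
            prefix = phrase.strip().lower()
            if not prefix or not normalized.startswith(prefix):
                continue
            # Assumed score: fraction of the utterance covered by the phrase.
            score = len(prefix) / len(normalized)
            candidate = (score, command.priority, command)
            # Compare by score first, then by priority (assumed: higher wins).
            if best is None or candidate[:2] > best[:2]:
                best = candidate
    return best[2] if best else None


commands = [
    Command("Search", ["search"], priority=1),
    Command("Search Media", ["search for"], priority=0),
    Command("Find Notes", ["search for"], priority=5),
]
winner = match_command("search for notes about rag reranking", commands)
```

Here both "search for" commands outscore the shorter "search" phrase, and the tie between them falls to priority.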
- GET /api/v1/voice/sessions
  - Query:
    - active_only (default true): uses the session timeout window
    - limit (default 100, max 1000)
- GET /api/v1/voice/sessions/{session_id}
  - Session detail snapshot
- DELETE /api/v1/voice/sessions/{session_id}
  - End a session (204 on success)
Analytics:
- GET /api/v1/voice/analytics
  - Query:
    - days (default 7, range 1..365)
  - Returns aggregate metrics, top commands, and daily usage
- GET /api/v1/voice/commands/{command_id}/usage
  - Query:
    - days (default 30, range 1..365)
  - Returns per-command usage stats
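As a client-side convenience, the documented days range can be checked before a request is sent. This is a minimal sketch with hypothetical helper names; it only builds URL paths and assumes nothing about the response shape.

```python
from urllib.parse import quote, urlencode


def _check_days(days: int) -> int:
    """Validate the documented range for the `days` query parameter."""
    if not 1 <= days <= 365:
        raise ValueError("days must be in the range 1..365")
    return days


def analytics_path(days: int = 7) -> str:
    """Path and query string for the aggregate analytics endpoint."""
    return "/api/v1/voice/analytics?" + urlencode({"days": _check_days(days)})


def command_usage_path(command_id: str, days: int = 30) -> str:
    """Path and query string for per-command usage stats."""
    safe_id = quote(str(command_id), safe="")
    query = urlencode({"days": _check_days(days)})
    return f"/api/v1/voice/commands/{safe_id}/usage?{query}"


weekly = analytics_path()
monthly_usage = command_usage_path("cmd-123")
```

Validating locally gives a clearer error than a rejected request, though the server remains the authority on accepted values.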
- GET /api/v1/voice/workflows/templates
  - Lists voice-oriented workflow templates
- GET /api/v1/voice/workflows/{run_id}/status
  - Workflow run status for the current user
- POST /api/v1/voice/workflows/{run_id}/cancel
  - Best-effort cancellation
Purpose: low-latency voice turns with streamed STT input and streamed TTS output.
High-level protocol:
- Client sends auth
- Server sends auth_ok (or closes)
- Client sends config
- Client streams audio and/or sends text
- Client sends commit to finalize an utterance
- Server sends transcription → intent → action_* → tts_*
First message must be auth.
Auth:
{"type": "auth", "token": "YOUR_API_KEY_OR_JWT"}
Config:
{
  "type": "config",
  "stt_model": "parakeet",
  "stt_language": "en",
  "tts_provider": "kokoro",
  "tts_voice": "af_heart",
  "tts_format": "mp3",
  "sample_rate": 16000
}
Audio (base64 PCM float32 frames):
{"type": "audio", "data": "<base64_float32_pcm>", "sequence": 1}
Commit:
{"type": "commit"}
Text (bypass STT):
{"type": "text", "text": "search for notes about rag reranking"}
Other controls:
- {"type":"cancel"} clears buffered audio and pending confirmations
- {"type":"workflow_subscribe","run_id":"..."}
- {"type":"workflow_cancel","run_id":"..."}
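The client messages above can be produced with a few lines of standard-library code. This sketch only builds the JSON strings and does not open a socket. It assumes little-endian float32 samples and a sequence counter that starts at 1 and increments per frame; neither is stated here, so check them against the server implementation. The function names are hypothetical.

```python
import base64
import json
import struct
from typing import Any, Dict, Iterable, Iterator, Sequence


def encode_audio_frame(samples: Iterable[float], sequence: int) -> str:
    """Serialize one audio message from float samples in [-1.0, 1.0]."""
    values = list(samples)
    # Assumption: little-endian 32-bit floats.
    raw = struct.pack(f"<{len(values)}f", *values)
    return json.dumps({
        "type": "audio",
        "data": base64.b64encode(raw).decode("ascii"),
        "sequence": sequence,
    })


def utterance_messages(
    token: str,
    config: Dict[str, Any],
    frames: Sequence[Iterable[float]],
) -> Iterator[str]:
    """Yield the client messages for one utterance, in protocol order.

    A real client should wait for auth_ok before sending config.
    """
    yield json.dumps({"type": "auth", "token": token})
    yield json.dumps({"type": "config", **config})
    for sequence, frame in enumerate(frames, start=1):
        yield encode_audio_frame(frame, sequence)
    yield json.dumps({"type": "commit"})


messages = list(utterance_messages(
    "YOUR_API_KEY_OR_JWT",
    {"stt_model": "parakeet", "sample_rate": 16000},
    [[0.0, 0.5], [-1.0]],
))
```

Each yielded string would be sent as one text frame over the WebSocket connection.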
State changes:
{"type": "state_change", "state": "listening"}
{"type": "state_change", "state": "processing"}
Transcription:
{"type": "transcription", "text": "search for rag benchmarks", "is_final": true}
Intent + action:
{"type": "intent", "action_type": "mcp_tool", "command_name": "Search Media"}
{"type": "action_start", "action_type": "mcp_tool"}
{"type": "action_result", "success": true, "response_text": "I found 3 results."}
TTS streaming:
{"type": "tts_chunk", "sequence": 1, "format": "mp3", "data": "<base64_audio>"}
{"type": "tts_end", "total_chunks": 4, "total_bytes": 58231}
On completion, the server returns to idle:
{"type": "state_change", "state": "idle"}
Voice routes are mounted under the voice-assistant and voice-assistant-ws route keys in tldw_Server_API/app/main.py.
Important for tests:
- When MINIMAL_TEST_APP=1, voice routes are not mounted.
- For voice endpoint tests, set:
  - MINIMAL_TEST_APP=0
  - ULTRA_MINIMAL_APP=0
  - and reload tldw_Server_API.app.main if already imported
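One way to apply the settings above in a test helper is sketched below. The function name is hypothetical; the environment variable names and module path come from the notes above. The module is reloaded only when it has already been imported.

```python
import importlib
import os
import sys

MAIN_MODULE = "tldw_Server_API.app.main"


def enable_voice_routes_for_tests() -> bool:
    """Set the flags that let voice routes mount, then reload main if needed.

    Returns True when the main module was already imported and got reloaded.
    """
    os.environ["MINIMAL_TEST_APP"] = "0"
    os.environ["ULTRA_MINIMAL_APP"] = "0"
    module = sys.modules.get(MAIN_MODULE)
    if module is None:
        # Not imported yet: the next import will pick up the new flags.
        return False
    importlib.reload(module)
    return True
```

Inside a pytest fixture, monkeypatch.setenv is usually preferable to writing os.environ directly, since it restores the previous values after the test.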
Voice persistence and analytics live in the ChaChaNotes DB:
- voice_commands: command definitions
- voice_sessions: session snapshots
- voice_command_events: per-turn analytics events
Analytics endpoints are powered by aggregate queries in
tldw_Server_API/app/core/VoiceAssistant/db_helpers.py.
Targeted test commands:
- REST endpoints:
  python -m pytest -q tldw_Server_API/tests/VoiceAssistant/test_rest_endpoints.py
- Pipeline behaviors:
  python -m pytest -q tldw_Server_API/tests/VoiceAssistant/test_e2e_pipeline.py