kizuna-ai-lab/sokuji


Real-time speech translation — cloud or fully offline on your device

English | 日本語 | 中文


Why Sokuji?

Built by Kizuna AI Lab — we use AI to break language and accessibility barriers, creating genuine human connections. "Kizuna" (絆) means "bond" in Japanese, and Sokuji (即時) is our flagship tool to make real-time communication possible across any language.

Sokuji is a cross-platform live speech translation app for desktop and browser. It supports Local Inference — on-device ASR, translation, and TTS powered by WASM and WebGPU, with no API key required, no expensive GPU needed, fully offline, and completely private. It also integrates with cloud providers including OpenAI, Google Gemini, Palabra.ai, Kizuna AI, Doubao AST 2.0, and OpenAI-compatible APIs.


How It Works

graph LR
    A["🗣️ You speak<br/>(any language)"] --> B["🎙️ Sokuji"]
    B --> C{"Choose one"}
    C -->|"☁️ Cloud"| D["OpenAI · Gemini<br/>Palabra · Doubao..."]
    C -->|"🖥️ Local"| E["On-device AI<br/>ASR → Translate → TTS<br/>(fully offline, no GPU)"]
    D --> F["🔊 Translated voice"]
    E --> F
    F --> G["💻 Zoom · Teams · Meet<br/>Discord · Any app"]

    style A fill:#4a9eff,stroke:#357abd,color:#fff
    style B fill:#10a37f,stroke:#0d8a6a,color:#fff
    style C fill:#ff9f43,stroke:#e88a2e,color:#fff
    style D fill:#6c5ce7,stroke:#5a4bd1,color:#fff
    style E fill:#00b894,stroke:#009d7e,color:#fff
    style F fill:#fd79a8,stroke:#e56b96,color:#fff
    style G fill:#636e72,stroke:#525c60,color:#fff

| | |
|---|---|
| Providers | 7: OpenAI, Gemini, Palabra.ai, Kizuna AI, Doubao AST 2.0, OpenAI Compatible, Local Inference |
| Local Models | 50 ASR models, 55+ translation pairs, 136 TTS voices |
| Languages | 99+ (speech recognition) · 55+ (translation) · 53 (text-to-speech) |
| Platforms | Linux · Windows · macOS · Chrome · Edge |
| Privacy | Local Inference = 100% on-device, no API key, no internet |
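
The local path in the diagram chains three stages: ASR, then translation, then TTS. A minimal sketch of that composition follows. The stage types (`Asr`, `Translator`, `Tts`) and the `translateUtterance` function are hypothetical names used for illustration only; Sokuji's actual internal interfaces may look different.

```typescript
// Hypothetical stage signatures; not taken from the Sokuji codebase.
type Asr = (audio: Float32Array) => Promise<string>;
type Translator = (text: string, from: string, to: string) => Promise<string>;
type Tts = (text: string, lang: string) => Promise<Float32Array>;

interface LocalPipeline {
  asr: Asr;
  translate: Translator;
  tts: Tts;
}

interface TranslationResult {
  transcript: string;
  translation: string;
  audio: Float32Array;
}

// Runs one utterance through ASR -> translation -> TTS, in that order.
async function translateUtterance(
  pipeline: LocalPipeline,
  audio: Float32Array,
  from: string,
  to: string,
): Promise<TranslationResult> {
  const transcript = await pipeline.asr(audio);
  const translation = await pipeline.translate(transcript, from, to);
  const synthesized = await pipeline.tts(translation, to);
  return { transcript, translation, audio: synthesized };
}
```

Keeping each stage behind a plain function type means a cloud provider or a different local model could be swapped in without touching the orchestration code.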

Demo

demo.mp4

Install

Sokuji is available as a Desktop App and a Browser Extension — same features, different reach.

| | Desktop App | Browser Extension |
|---|---|---|
| Features | All features identical | All features identical |
| Use with | Any app with mic input: Zoom, Teams, Discord, Slack, games, OBS, and more | Web-based meeting platforms: Google Meet, Teams, Zoom, Discord, Slack, Gather.town, Whereby |
| Install | Download & install | Zero install: add from the store |
| Platforms | Windows · macOS · Linux | Chrome · Edge · Brave (coming soon) |

Desktop App

Download from the Releases page:

| Platform | Package |
|---|---|
| Windows | Sokuji-x.y.z.Setup.exe |
| macOS (Apple Silicon) | Sokuji-x.y.z-arm64.pkg |
| macOS (Intel) | Sokuji-x.y.z-x64.pkg |
| Linux (Debian/Ubuntu x64) | sokuji_x.y.z_amd64.deb |
| Linux (Debian/Ubuntu ARM64) | sokuji_x.y.z_arm64.deb |

Browser Extension

Available on the Chrome Web Store and Microsoft Edge Add-ons.

Install extension in Developer Mode
  1. Download sokuji-extension.zip from the Releases page
  2. Extract the zip file
  3. Go to chrome://extensions/ and enable "Developer mode"
  4. Click "Load unpacked" and select the extracted folder

Build from Source

git clone https://github.com/kizuna-ai-lab/sokuji.git
cd sokuji && npm install
npm run electron:dev        # Development
npm run electron:build      # Production

Features

Local Inference (Edge AI)

Run everything on your device — no API keys, no internet, no expensive GPU, complete privacy. Powered by WASM and WebGPU, Sokuji runs efficiently on any modern browser using your existing CPU and integrated graphics.

  • 50 ASR models (32 offline + 10 streaming + 8 WebGPU including Whisper, Cohere Transcribe, Voxtral Mini 4B) covering 99+ languages
  • 55+ translation pairs via Opus-MT + 5 multilingual LLMs (Qwen 2.5 / 3 / 3.5, GemmaTranslate) with WebGPU
  • 136 TTS voices across 53 languages (Piper, Piper-Plus, Coqui, Mimic3, Matcha engines)
  • One-click model download with IndexedDB caching
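
The download-once-then-cache behavior in the last bullet can be sketched as a cache-first lookup. This is an illustration under assumptions, not Sokuji's code: `ModelStore` is a hypothetical interface that an IndexedDB wrapper would implement in the browser, `MemoryModelStore` is an in-memory stand-in, and `download` represents whatever function fetches the model bytes.

```typescript
// Hypothetical async key-value store; in a browser this would wrap IndexedDB.
interface ModelStore {
  get(key: string): Promise<Uint8Array | undefined>;
  put(key: string, value: Uint8Array): Promise<void>;
}

// In-memory stand-in for the store, useful for tests and non-browser runs.
class MemoryModelStore implements ModelStore {
  private readonly items = new Map<string, Uint8Array>();

  async get(key: string): Promise<Uint8Array | undefined> {
    return this.items.get(key);
  }

  async put(key: string, value: Uint8Array): Promise<void> {
    this.items.set(key, value);
  }
}

// Returns cached bytes when present; otherwise downloads once and stores them.
async function loadModel(
  store: ModelStore,
  modelId: string,
  download: (modelId: string) => Promise<Uint8Array>,
): Promise<Uint8Array> {
  const cached = await store.get(modelId);
  if (cached !== undefined) return cached;
  const bytes = await download(modelId);
  await store.put(modelId, bytes);
  return bytes;
}
```

With this shape, a second call for the same model ID is served from the store and never reaches the network, which is what lets a downloaded model keep working offline.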

Cloud Providers

| Provider | Key Feature |
|---|---|
| OpenAI | gpt-realtime-mini / gpt-realtime-1.5 · 10 voices · configurable turn detection (Normal / Semantic / Disabled) · noise reduction · 60+ languages |
| Google Gemini | Dynamic model selection (audio/live models) · 30 voices · built-in turn detection · 34 language variants |
| Palabra.ai | WebRTC low-latency · voice cloning · auto sentence segmentation · partial transcription translation · 60+ source / 40+ target languages |
| Kizuna AI | Sign in and go: API key managed by the backend · same OpenAI models with optimized defaults |
| Doubao AST 2.0 | Speech-to-speech with speaker voice cloning · bidirectional Chinese↔English · Ogg Opus audio output |
| OpenAI Compatible | Bring your own endpoint: any OpenAI Realtime API-compatible service (Electron only) |
| Local Inference | Fully offline · ASR → Translation → TTS on-device · no API key · no GPU required |
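
The three turn detection modes listed for OpenAI plausibly correspond to the `turn_detection` field of an OpenAI Realtime API session, which accepts a `server_vad` object, a `semantic_vad` object, or `null`. The sketch below shows one possible mapping. Which API value each Sokuji mode actually selects is an assumption here, and `toTurnDetection` is a hypothetical helper, not a function from the codebase.

```typescript
// UI-level modes named in the provider table.
type TurnDetectionMode = "Normal" | "Semantic" | "Disabled";

// Minimal subset of the Realtime API `turn_detection` session field.
type TurnDetection =
  | { type: "server_vad" }
  | { type: "semantic_vad" }
  | null;

// Assumed mapping: "Normal" -> server VAD, "Semantic" -> semantic VAD,
// "Disabled" -> null, leaving the client to decide when a turn ends.
function toTurnDetection(mode: TurnDetectionMode): TurnDetection {
  switch (mode) {
    case "Normal":
      return { type: "server_vad" };
    case "Semantic":
      return { type: "semantic_vad" };
    case "Disabled":
      return null;
  }
}
```

The result would be sent as part of a session update; tuning fields such as silence duration are left out to keep the sketch small.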

Audio

  • Translate your voice — speak in your language, others hear the translation as if you spoke it natively
  • Translate others' voice — capture meeting audio (extension) or any system audio (desktop) and get real-time translated subtitles
  • Virtual Microphone — route translated audio to Zoom, Meet, Teams, or any app
  • Real-time Passthrough — monitor your own voice while recording
  • AI Noise Suppression — removes background noise, keyboard sounds, and other distractions
  • Echo Cancellation — built-in with modern Web Audio API
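
In browsers, echo cancellation and basic noise suppression are usually requested through audio track constraints passed to `navigator.mediaDevices.getUserMedia`. The sketch below builds such a constraints object. Whether Sokuji sets these exact constraints is an assumption; `buildAudioConstraints` and `CaptureOptions` are hypothetical names, and the browser's `noiseSuppression` flag is separate from the AI Noise Suppression feature listed above.

```typescript
// Local mirror of the relevant MediaTrackConstraints fields, so the sketch
// type-checks outside a browser as well.
interface AudioCaptureConstraints {
  deviceId?: { exact: string };
  echoCancellation: boolean;
  noiseSuppression: boolean;
  autoGainControl: boolean;
}

// Hypothetical user-facing options for microphone capture.
interface CaptureOptions {
  deviceId?: string;
  echoCancellation?: boolean;
  noiseSuppression?: boolean;
}

// Builds audio constraints, defaulting echo cancellation and noise
// suppression to on. Auto gain is off here purely as an illustrative default.
function buildAudioConstraints(
  options: CaptureOptions = {},
): AudioCaptureConstraints {
  const constraints: AudioCaptureConstraints = {
    echoCancellation: options.echoCancellation ?? true,
    noiseSuppression: options.noiseSuppression ?? true,
    autoGainControl: false,
  };
  if (options.deviceId !== undefined) {
    // Pin capture to one specific input device when the caller names it.
    constraints.deviceId = { exact: options.deviceId };
  }
  return constraints;
}
```

In a browser, the result would be used as `navigator.mediaDevices.getUserMedia({ audio: buildAudioConstraints() })`.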

Interface

  • 30 languages — fully localized UI
  • Simple Mode — streamlined setup for non-technical users
  • Advanced Mode — waveform display and detailed controls

Privacy

Your audio stays on your device — if you choose Local Inference, nothing ever leaves.

  • Cloud mode connects directly to provider APIs — no intermediary servers
  • API keys stored locally only, never transmitted to us
  • Local Inference processes everything on-device with zero network requests
  • Anonymous usage analytics via PostHog

Tech Stack


Contributing

We welcome contributions! Please read our Contributing Guidelines before getting started.


License

AGPL-3.0

Support

Acknowledgments

For detailed model licenses, see THIRD_PARTY_NOTICES.md.
