GitHub - kizuna-ai-lab/sokuji: Live speech translation powered by on-device AI and cloud providers — OpenAI, Google Gemini, Palabra.ai, Kizuna AI, Volcengine, and more

Real-time speech translation — cloud or fully offline on your device

Why Sokuji?

Built by Kizuna AI Lab — we use AI to break language and accessibility barriers, creating genuine human connections. "Kizuna" (絆) means "bond" in Japanese, and Sokuji (即時) is our flagship tool to make real-time communication possible across any language.

Sokuji is a cross-platform live speech translation app for desktop and browser. It supports Local Inference — on-device ASR, translation, and TTS powered by WASM and WebGPU, with no API key required, no expensive GPU needed, fully offline, and completely private. It also integrates with cloud providers including OpenAI, Google Gemini, Palabra.ai, Kizuna AI, Doubao AST 2.0, and OpenAI-compatible APIs.

How It Works

graph LR
    A["🗣️ You speak<br/>(any language)"] --> B["🎙️ Sokuji"]
    B --> C{"Choose one"}
    C -->|"☁️ Cloud"| D["OpenAI · Gemini<br/>Palabra · Doubao..."]
    C -->|"🖥️ Local"| E["On-device AI<br/>ASR → Translate → TTS<br/>(fully offline, no GPU)"]
    D --> F["🔊 Translated voice"]
    E --> F
    F --> G["💻 Zoom · Teams · Meet<br/>Discord · Any app"]

    style A fill:#4a9eff,stroke:#357abd,color:#fff
    style B fill:#10a37f,stroke:#0d8a6a,color:#fff
    style C fill:#ff9f43,stroke:#e88a2e,color:#fff
    style D fill:#6c5ce7,stroke:#5a4bd1,color:#fff
    style E fill:#00b894,stroke:#009d7e,color:#fff
    style F fill:#fd79a8,stroke:#e56b96,color:#fff
    style G fill:#636e72,stroke:#525c60,color:#fff


Providers	7 — OpenAI, Gemini, Palabra.ai, Kizuna AI, Doubao AST 2.0, OpenAI Compatible, Local Inference
Local Models	48 ASR models, 55+ translation pairs, 136 TTS voices
Languages	99+ (speech recognition) · 55+ (translation) · 53 (text-to-speech)
Platforms	Linux · Windows · macOS · Chrome · Edge
Privacy	Local Inference = 100% on-device, no API key, no internet

Demo

demo.mp4

Install

Sokuji is available as a Desktop App and a Browser Extension — same features, different reach.

	Desktop App	Browser Extension
Features	All features identical	All features identical
Use with	Any app with mic input — Zoom, Teams, Discord, Slack, games, OBS, and more	Web-based meeting platforms — Google Meet, Teams, Zoom, Discord, Slack, Gather.town, Whereby
Install	Download & install	Zero install — add from store
Platforms	Windows · macOS · Linux	Chrome · Edge · Brave (coming soon)

Desktop App

Download from the Releases page:

Platform	Package
Windows	`Sokuji-x.y.z.Setup.exe`
macOS (Apple Silicon)	`Sokuji-x.y.z-arm64.pkg`
macOS (Intel)	`Sokuji-x.y.z-x64.pkg`
Linux (Debian/Ubuntu x64)	`sokuji_x.y.z_amd64.deb`
Linux (Debian/Ubuntu ARM64)	`sokuji_x.y.z_arm64.deb`

Browser Extension

Install extension in Developer Mode

Download sokuji-extension.zip from the Releases page
Extract the zip file
Go to chrome://extensions/ and enable "Developer mode"
Click "Load unpacked" and select the extracted folder

Build from Source

git clone https://github.com/kizuna-ai-lab/sokuji.git
cd sokuji && npm install
npm run electron:dev        # Development
npm run electron:build      # Production

Features

Local Inference (Edge AI)

Run everything on your device — no API keys, no internet, no expensive GPU, complete privacy. Powered by WASM and WebGPU, Sokuji runs efficiently on any modern browser using your existing CPU and integrated graphics.

50 ASR models (32 offline + 10 streaming + 8 WebGPU including Whisper, Cohere Transcribe, Voxtral Mini 4B) covering 99+ languages
55+ translation pairs via Opus-MT + 5 multilingual LLMs (Qwen 2.5 / 3 / 3.5, GemmaTranslate) with WebGPU
136 TTS voices across 53 languages (Piper, Piper-Plus, Coqui, Mimic3, Matcha engines)
One-click model download with IndexedDB caching

Cloud Providers

Provider	Key Feature
OpenAI	`gpt-realtime-mini` / `gpt-realtime-1.5` · 10 voices · configurable turn detection (Normal / Semantic / Disabled) · noise reduction · 60+ languages
Google Gemini	Dynamic model selection (audio/live models) · 30 voices · built-in turn detection · 34 language variants
Palabra.ai	WebRTC low-latency · voice cloning · auto sentence segmentation · partial transcription translation · 60+ source / 40+ target languages
Kizuna AI	Sign in and go — API key managed by backend · same OpenAI models with optimized defaults
Doubao AST 2.0	Speech-to-speech with speaker voice cloning · bidirectional Chinese↔English · Ogg Opus audio output
OpenAI Compatible	Bring your own endpoint — any OpenAI Realtime API-compatible service (Electron only)
Local Inference	Fully offline · ASR → Translation → TTS on-device · no API key · no GPU required

Audio

Translate your voice — speak in your language, others hear the translation as if you spoke it natively
Translate others' voice — capture meeting audio (extension) or any system audio (desktop) and get real-time translated subtitles
Virtual Microphone — route translated audio to Zoom, Meet, Teams, or any app
Real-time Passthrough — monitor your own voice while recording
AI Noise Suppression — removes background noise, keyboard sounds, and other distractions
Echo Cancellation — built-in with modern Web Audio API

Interface

30 languages — fully localized UI
Simple Mode — streamlined setup for non-technical users
Advanced Mode — waveform display and detailed controls

Privacy

Your audio stays on your device — if you choose Local Inference, nothing ever leaves.

Cloud mode connects directly to provider APIs — no intermediary servers
API keys stored locally only, never transmitted to us
Local Inference processes everything on-device with zero network requests
Anonymous usage analytics via PostHog

Tech Stack

Desktop: Electron (Windows, macOS, Linux)
Extension: Chrome/Edge Manifest V3
UI: React + TypeScript + Zustand
Local AI: sherpa-onnx (WASM) · Transformers.js · WebGPU
Audio: Web Audio API · AudioWorklet · WebRTC
i18n: i18next (30 languages)

Contributing

We welcome contributions! Please read our Contributing Guidelines before getting started.

License

AGPL-3.0

Support

Issues — Bug reports
Discussions — Questions & ideas

Acknowledgments

Cloud APIs: OpenAI, Google Gemini, Volcengine
ASR: sherpa-onnx, OpenAI Whisper, SenseVoice, Moonshine, Cohere Transcribe, Voxtral Mini 4B
TTS: Piper, Piper-Plus, Matcha-TTS, Coqui TTS, Mimic 3
Translation: Opus-MT, Qwen, GemmaTranslate
Infra: Transformers.js, ONNX Runtime, Electron, React

For detailed model licenses, see THIRD_PARTY_NOTICES.md.

Name		Name	Last commit message	Last commit date
Latest commit History 1,140 Commits
.github		.github
.signpath/artifact-configurations		.signpath/artifact-configurations
BlackHole @ 4fdd55c		BlackHole @ 4fdd55c
assets		assets
benchmark		benchmark
docs		docs
electron		electron
evals		evals
extension		extension
model-packs		model-packs
pkg-scripts		pkg-scripts
public		public
resources		resources
screenshots		screenshots
scripts		scripts
shared		shared
src		src
.env.example		.env.example
.gitignore		.gitignore
.gitmodules		.gitmodules
.npmrc		.npmrc
CHANGELOG.md		CHANGELOG.md
CLAUDE.md		CLAUDE.md
LICENSE		LICENSE
README.md		README.md
THIRD_PARTY_NOTICES.md		THIRD_PARTY_NOTICES.md
build-pkg.sh		build-pkg.sh
forge.config.js		forge.config.js
index.html		index.html
package-lock.json		package-lock.json
package.json		package.json
tsconfig.json		tsconfig.json
tsconfig.node.json		tsconfig.node.json
vite.config.ts		vite.config.ts
vitest.config.ts		vitest.config.ts

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Real-time speech translation — cloud or fully offline on your device

Why Sokuji?

How It Works

Demo

Install

Desktop App

Browser Extension

Build from Source

Features

Local Inference (Edge AI)

Cloud Providers

Audio

Interface

Privacy

Tech Stack

Contributing

License

Support

Acknowledgments

About

Uh oh!

Releases 125

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Real-time speech translation — cloud or fully offline on your device

Why Sokuji?

How It Works

Demo

Install

Desktop App

Browser Extension

Build from Source

Features

Local Inference (Edge AI)

Cloud Providers

Audio

Interface

Privacy

Tech Stack

Contributing

License

Support

Acknowledgments

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases 125

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages