refactor(voice): move mic ownership from clx to otoji#135

Open
snomiao wants to merge 8 commits into main from feat/otoji-owns-mic-permission

snomiao commented May 2, 2026

Summary

  • clx was capturing audio via cpal/VPIO and piping raw WAV to otoji listen --plain -
  • This meant clx held the macOS microphone permission and AEC logic, not otoji
  • Switch to otoji listen --plain [--aec] so otoji owns both the mic permission AND the AEC

Changes

otoji (submodule):

  • src/audio/vpio.rs: new VPIO AudioUnit backend — echo-cancelled mic capture, 30x gain, 48kHz→16kHz resample, emits AudioChunk to AudioTx
  • src/audio/mod.rs: expose vpio module on macOS
  • src/main.rs: --aec flag on listen; run_listen_vpio function that uses VPIO instead of cpal
  • build.rs: link AudioToolbox framework
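
The gain and resample steps listed for src/audio/vpio.rs can be sketched as below. This is only an illustration under stated assumptions: the PR does not show how vpio.rs implements them, so the function names `apply_gain` and `downsample_48k_to_16k`, the clamping to [-1.0, 1.0], and the choice of averaging each group of three samples (rather than linear interpolation or a proper low-pass filter) are all hypothetical.

```rust
/// Multiply every sample by `gain`, clamping to [-1.0, 1.0] so boosted
/// samples stay in the normalized f32 range. (Hypothetical helper.)
fn apply_gain(samples: &[f32], gain: f32) -> Vec<f32> {
    samples.iter().map(|s| (s * gain).clamp(-1.0, 1.0)).collect()
}

/// Reduce 48 kHz mono audio to 16 kHz by averaging each group of three
/// consecutive samples. Trailing samples that do not fill a group are dropped.
/// (Hypothetical helper; a real resampler may filter differently.)
fn downsample_48k_to_16k(samples: &[f32]) -> Vec<f32> {
    samples
        .chunks_exact(3)
        .map(|c| (c[0] + c[1] + c[2]) / 3.0)
        .collect()
}

fn main() {
    // Seven quiet input samples at 48 kHz; the last one has no full group.
    let input = [0.01_f32, 0.02, 0.03, 0.1, 0.1, 0.1, 0.5];
    let boosted = apply_gain(&input, 30.0);
    let out = downsample_48k_to_16k(&boosted);
    println!("{} samples in, {} samples out: {:?}", input.len(), out.len(), out);
}
```

Because 48000 / 16000 is exactly 3, a fixed group size works here; other ratios would need a fractional resampler.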

clx:

  • voice_otoji.rs: restore aec_enabled param to start(); pass --aec to otoji args on macOS when aec_mode = always/dual-only
  • voice.rs: restore aec_enabled computation at call site
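
A minimal sketch of how the `--aec` flag might be appended on the clx side. The function name `otoji_listen_args` and the explicit `is_macos` parameter are assumptions made for illustration and testability; the real voice_otoji.rs builds a longer argument list and is not reproduced here. Only `listen`, `--plain`, and `--aec` come from this PR.

```rust
/// Build the base argument list for the `otoji` subprocess.
/// `is_macos` is passed in explicitly (hypothetical) so the logic can be
/// exercised on any platform; real code could use cfg!(target_os = "macos").
fn otoji_listen_args(aec_enabled: bool, is_macos: bool) -> Vec<String> {
    // No trailing "-" argument: otoji opens the microphone itself.
    let mut args: Vec<String> = vec!["listen".into(), "--plain".into()];
    // AEC runs inside otoji, and only on macOS.
    if is_macos && aec_enabled {
        args.push("--aec".into());
    }
    args
}

fn main() {
    let args = otoji_listen_args(true, cfg!(target_os = "macos"));
    println!("otoji {}", args.join(" "));
}
```

Keeping the platform check next to the flag means callers on other platforms can pass `aec_enabled` unconditionally without otoji ever seeing `--aec`.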

Result

  • macOS permission dialog now attributed to otoji (not clx)
  • AEC (echo cancellation) fully preserved — runs inside otoji via VoiceProcessingIO
  • clx no longer touches audio at all

Test plan

  • Build succeeds (./build.sh)
  • Space+V: mic permission dialog attributed to otoji
  • Voice transcription works end-to-end
  • With AEC enabled: speaker bleed is cancelled (no echo in transcription)

🤖 Generated with Claude Code

snomiao and others added 2 commits April 29, 2026 03:23
Broaden the existing `.claude/settings.local.json` rule to the whole
`.claude/` directory (Claude Code stores per-project session caches
there), and add `.DS_Store` plus the bin/ artifacts (`clx-prompt`,
`clx-prompt-slint`) compiled by build.sh from tracked source.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The Windows adapter's bin name in Cargo.toml is `clx-rust`, so cargo
emits `rs/target/release/clx-rust.exe`. The packaging step expected
`clx.exe`, so the v2.0.0-beta.3 Windows build failed at the
Copy-Item step ("Cannot find path ... clx.exe because it does not
exist") even though the cargo build itself succeeded in 11 minutes.

Source from `clx-rust.exe`, rename to `clx.exe` in the staging dir so
distribution still ships a clean `clx.exe`. The back-commit-to-main
step at line 122 already does this rename in reverse, so its inputs
remain consistent.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 2, 2026 08:51
Copilot AI left a comment

Pull request overview

This PR shifts microphone ownership from clx to the external otoji subprocess by switching from stdin-audio piping (otoji listen --plain -) to having otoji open the microphone directly (otoji listen --plain), so macOS mic permission is attributed to otoji.

Changes:

  • Update voice_otoji to spawn otoji listen --plain without stdin audio piping (remove cpal/VPIO capture + WAV/resample helpers).
  • Update voice to stop computing/passing AEC settings into the otoji backend.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| rs/core/src/modules/voice_otoji.rs | Drops stdin WAV streaming + mic capture threads; launches otoji to open mic itself and reads JSON events from stdout. |
| rs/core/src/modules/voice.rs | Removes AEC gating/pass-through when launching the otoji backend. |

Comment on lines 558 to 559
// VPIO AEC enable for the otoji subprocess mic path.
// "always" → always on

Comment on lines 283 to 288
  let mut cmd = Command::new("otoji");
  let ctx_path = super::voice_ptt::ptt_context_file_path();
  let mut args: Vec<String> = vec![
-     "listen".into(), "--plain".into(), "-".into(),
+     "listen".into(), "--plain".into(),
      // "openai" route goes through OpenAiPolisher which honors the
      // OTOJI_POLISH_BASE_URL / _API_KEY / _MODEL env vars. Default
snomiao and others added 6 commits May 7, 2026 22:58
`gh release edit` doesn't accept `--generate-notes` (only create does).
When the release already exists (re-tag scenario), the fallback path
fired with the wrong flag set and crashed with "unknown flag", which
also skipped all dependent build jobs because of `needs: create-release`.

Use distinct flag sets for the two paths.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
capslockx-windows builds without the `stt` feature (whisper-rs doesn't
compile on Windows), so sherpa-rs's build.rs is never invoked and no
runtime DLLs are produced. clx.exe runs as a STT-stub in that case;
the DLLs would only be needed if `stt` were enabled.

Bundle DLLs when present, skip silently when not, instead of throwing
"No DLLs found ... sherpa-rs build.rs may have failed".

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Pairs with the workflow fix in 08e1d64. NSIS aborts with
"File \"*.dll\" -> no files found" when capslockx-windows builds
without the stt feature (the current default — whisper-rs doesn't
compile on Windows). Add /nonfatal so the installer build succeeds
without DLLs; clx.exe runs as a stt-stub in that configuration.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The header itself flagged it as `# @depreacted use semantic-release`,
and release-rust.yml has been the canonical release path since the
Rust rewrite. The deprecated workflow kept firing on every `v*` tag
and failing because `github-release-from-cc-changelog` couldn't find
a matching CHANGELOG entry, polluting the actions tab with red runs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
clx was capturing audio via cpal/VPIO and piping raw WAV to otoji's
stdin. This meant clx held the macOS microphone permission, not otoji.

Switch to `otoji listen --plain` (no `-` argument) so otoji opens the
mic itself and requests the permission on its own behalf. clx no longer
needs mic access — it only reads JSON-line AsrEvents from otoji's stdout.

Removes ~240 lines: the otoji-mic thread, cpal capture, VPIO path,
write_wav_header, resample_linear, and the aec_enabled parameter from
OtojiBackend::start(). AEC can be added to otoji directly when needed.
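
The stdout-only role described above can be sketched as a line reader. This is an illustration, not the code in voice_otoji.rs: the function name `read_event_lines` is hypothetical, the event field names in the sample input are made up, and the lines are returned as raw strings rather than parsed into AsrEvent values.

```rust
use std::io::{BufRead, Cursor};

/// Collect non-empty lines from a reader. Each line is assumed to carry one
/// JSON-encoded event; parsing is left out to keep the sketch std-only.
fn read_event_lines<R: BufRead>(reader: R) -> Vec<String> {
    reader
        .lines()
        .map_while(Result::ok) // stop at the first read error
        .map(|line| line.trim().to_string())
        .filter(|line| !line.is_empty())
        .collect()
}

fn main() {
    // Stand-in for the child's stdout. In real code this would be something
    // like BufReader::new(child.stdout.take().unwrap()).
    let fake_stdout = Cursor::new("{\"kind\":\"partial\"}\n\n{\"kind\":\"final\"}\n");
    for line in read_event_lines(fake_stdout) {
        println!("event: {line}");
    }
}
```

Taking any `BufRead` keeps the reader independent of the subprocess, so it can be tested without spawning otoji.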

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
OtojiBackend::start() gets aec_enabled back so CLX can pass --aec to
otoji on macOS when voice.aec_mode = always/dual-only. AEC now runs
inside otoji (VoiceProcessingIO) instead of clx.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
snomiao force-pushed the feat/otoji-owns-mic-permission branch from 77e6a9a to a940782 on May 7, 2026 14:44