Audio architecture

A 3-thread, lock-free pipeline. The core contract is that the cpal callback never blocks, allocates, or logs; everything else follows from that.

```text
┌─ Tauri commands (tokio)        ┌─ Decoder thread (std)               ┌─ cpal callback (real-time)
│  player_play, pause, seek      │  symphonia FormatReader +           │  pop f32 from SPSC ring
│  → crossbeam::Sender ─────────►│  Decoder + rubato Resampler         │  × volume × normalization
│                                │  push f32 → rtrb::Producer ────────►│  mono downmix (if enabled)
│                                │  emit position/state events         │  → device native format
└────────────────────────────────┴─────────────────────────────────────┴──────────────────────────
```

Threads

| Thread | Owner | Responsibilities |
| --- | --- | --- |
| Tokio runtime | Tauri | Command dispatch. player_* commands send AudioCmd enum variants over a crossbeam::Sender to the decoder. |
| waveflow-audio-decoder | audio::decoder::spawn_decoder_thread | Owns the rtrb::Producer<f32> and the active ActiveStream (symphonia + rubato). Polls commands between packets so pause / stop / seek feel responsive. |
| waveflow-audio-output | audio::output::spawn_output_thread | Owns the cpal::Stream (which is !Send on Windows because WASAPI / COM handles can't cross threads). Parks on a shutdown channel for the engine's lifetime. |
| cpal callback | cpal-managed (WASAPI / ALSA / CoreAudio worker) | Pops samples from rtrb::Consumer, applies volume / normalization / mono downmix, writes to the device buffer. |
| waveflow-wasapi-exclusive ¹ | audio::wasapi_exclusive::spawn_exclusive_output_thread | Windows-only alternate output backend (opt-in). Owns the WASAPI IAudioClient + event handle. Blocks on the OS event between buffer periods (zero CPU when idle). Drives the same rtrb::Consumer<f32> as the cpal thread does in shared mode. |

¹ Mutually exclusive with the cpal output thread: only one of the two runs at a time, chosen by output::spawn_output_with_mode based on the persisted audio.wasapi_exclusive setting.
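The waveflow-audio-output thread exists because of the !Send cpal::Stream: the stream has to be created, used and dropped on a single thread. Below is a minimal std-only sketch of that pattern. NonSendStream is a hypothetical stand-in for cpal::Stream, and the spawn_output_thread signature and the std::sync::mpsc shutdown channel are assumptions, not the engine's actual code.

```rust
use std::marker::PhantomData;
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for cpal::Stream. The raw-pointer PhantomData makes
/// the type !Send, mimicking a stream that wraps thread-bound COM handles.
struct NonSendStream {
    _not_send: PhantomData<*const ()>,
}

impl NonSendStream {
    fn open() -> Self {
        NonSendStream { _not_send: PhantomData }
    }
}

/// Hypothetical helper: build the stream *inside* the thread so it never has
/// to cross a thread boundary, then park until shutdown is requested.
fn spawn_output_thread() -> (Sender<()>, JoinHandle<&'static str>) {
    let (shutdown_tx, shutdown_rx) = mpsc::channel::<()>();
    let handle = thread::Builder::new()
        .name("waveflow-audio-output".into())
        .spawn(move || {
            let _stream = NonSendStream::open(); // lives only on this thread
            // Blocks without using CPU; also returns if every sender is dropped.
            let _ = shutdown_rx.recv();
            // _stream is dropped here, on the thread that created it.
            "stopped"
        })
        .expect("failed to spawn output thread");
    (shutdown_tx, handle)
}

fn main() {
    let (shutdown_tx, handle) = spawn_output_thread();
    shutdown_tx.send(()).unwrap();
    println!("{}", handle.join().unwrap());
}
```

Only the Send parts (the shutdown receiver, the thread's return value) cross threads, so the compiler enforces the COM threading rule for free.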

Shared state

SharedPlayback: an Arc<...> of atomics plus the rtrb consumer half. It is read on the hot path and mutated by the decoder, the command layer and (for samples_played) the cpal callback. There are no locks anywhere in the pipeline.

| Atomic | Written by | Read by |
| --- | --- | --- |
| samples_played | cpal callback | UI for position display |
| base_offset_ms | decoder (on seek / new track / speed change) | UI |
| volume, normalize_enabled, mono_enabled | command layer | cpal callback |
| paused_output, drain_silent | command layer / decoder | cpal callback |
| crossfade_ms, replaygain_enabled | command layer | decoder |
| playback_speed_bits, speed_dirty | command layer / decoder | decoder + UI position math |
| current_track_id, seek_generation | decoder | UI |

playback_speed_bits is read on every position computation (UI polling at 4 Hz, plus analytics); see current_position_ms and playback / Playback speed. speed_dirty is a one-shot flag that the decoder consumes once per iteration of its 'pkt loop to trigger a resampler rebuild.
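A sketch of how these atomics could fit together. It assumes the speed is an f32 stored via f32::to_bits (suggested by the _bits suffix, not confirmed here), that samples_played counts interleaved samples, and that position is base_offset_ms plus speed-scaled output time since the last reset. The struct is a hypothetical subset of SharedPlayback and this current_position_ms formula is illustrative, not the project's.

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, AtomicU64, Ordering};

/// Hypothetical subset of SharedPlayback; names follow the table above,
/// types and the position formula are assumptions.
struct SharedPlayback {
    samples_played: AtomicU64,      // interleaved samples, written by the callback
    base_offset_ms: AtomicU64,      // written by the decoder on seek / track / speed change
    playback_speed_bits: AtomicU32, // f32 speed stored via to_bits()
    speed_dirty: AtomicBool,        // one-shot "rebuild the resampler" flag
    sample_rate: u32,
    channels: u16,
}

impl SharedPlayback {
    fn new(sample_rate: u32, channels: u16) -> Self {
        Self {
            samples_played: AtomicU64::new(0),
            base_offset_ms: AtomicU64::new(0),
            playback_speed_bits: AtomicU32::new(1.0f32.to_bits()),
            speed_dirty: AtomicBool::new(false),
            sample_rate,
            channels,
        }
    }

    /// Command layer: publish the new speed, then raise the flag. The Release
    /// store pairs with the Acquire swap in take_speed_dirty.
    fn set_speed(&self, speed: f32) {
        self.playback_speed_bits.store(speed.to_bits(), Ordering::Relaxed);
        self.speed_dirty.store(true, Ordering::Release);
    }

    fn speed(&self) -> f32 {
        f32::from_bits(self.playback_speed_bits.load(Ordering::Relaxed))
    }

    /// Decoder: swap() reads and clears the flag atomically, so several
    /// speed changes between iterations coalesce into one rebuild.
    fn take_speed_dirty(&self) -> bool {
        self.speed_dirty.swap(false, Ordering::Acquire)
    }

    /// Offset at the last reset + output time since then, scaled by speed
    /// (at 2x, each output second covers two seconds of the source).
    fn current_position_ms(&self) -> u64 {
        let frames = self.samples_played.load(Ordering::Relaxed) / self.channels as u64;
        let output_ms = frames as f64 * 1000.0 / self.sample_rate as f64;
        self.base_offset_ms.load(Ordering::Relaxed) + (output_ms * self.speed() as f64) as u64
    }
}

fn main() {
    let s = SharedPlayback::new(48_000, 2);
    s.samples_played.store(48_000, Ordering::Relaxed); // 24_000 frames = 0.5 s
    println!("{} ms", s.current_position_ms()); // 500 ms
    s.set_speed(1.5);
    assert!(s.take_speed_dirty()); // the decoder sees the change once...
    assert!(!s.take_speed_dirty()); // ...and not again
}
```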

WASAPI Exclusive Mode (Windows opt-in)

audio/wasapi_exclusive.rs is an alternative output backend alongside the default cpal shared-mode backend. It is engaged via the audio.wasapi_exclusive profile setting (toggle in Settings → Audio). When it is on:

  1. output::spawn_output_with_mode tries the exclusive backend first via the wasapi crate.
  2. The backend opens the device in event-driven exclusive mode at the device's mix-format sample rate (whatever the user picked in the Windows Sound control panel). The stream format is 32-bit float stereo, identified by the KSDATAFORMAT_SUBTYPE_IEEE_FLOAT subformat.
  3. If init fails (device busy with another exclusive app, no float-32 support, COM apartment conflict), the engine logs a warning and falls back transparently to the cpal shared backend so the user keeps hearing audio.
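The fallback in step 3 amounts to a try-then-degrade dispatch. Here is a hedged sketch: the two closures stand in for the real exclusive and shared spawners, and OutputBackend and InitError are hypothetical types. The actual spawn_output_with_mode signature is not documented here.

```rust
use std::fmt;

/// Which backend ended up running (hypothetical names).
#[derive(Debug, PartialEq)]
enum OutputBackend {
    WasapiExclusive,
    CpalShared,
}

#[derive(Debug)]
struct InitError(String);

impl fmt::Display for InitError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

/// Sketch of the dispatch: try exclusive first when enabled; on any init
/// error, warn and fall back to shared mode so audio keeps playing.
fn spawn_output_with_mode<E, S>(
    wasapi_exclusive: bool,
    try_exclusive: E,
    spawn_shared: S,
) -> Result<OutputBackend, InitError>
where
    E: FnOnce() -> Result<(), InitError>,
    S: FnOnce() -> Result<(), InitError>,
{
    if wasapi_exclusive {
        match try_exclusive() {
            Ok(()) => return Ok(OutputBackend::WasapiExclusive),
            // Stand-in for the engine's warning log.
            Err(e) => eprintln!("warn: exclusive init failed ({e}); falling back to shared"),
        }
    }
    spawn_shared()?;
    Ok(OutputBackend::CpalShared)
}

fn main() {
    let backend = spawn_output_with_mode(
        true,
        || Err(InitError("device busy".into())),
        || Ok(()),
    );
    println!("{:?}", backend.unwrap()); // CpalShared, after a warning on stderr
}
```

Because set_output_device reuses the same dispatch, a hot-swapped device gets the same try-then-fallback behaviour.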

Trade-offs:

  • Bit-perfect to the DAC at the chosen rate. No Windows mixer sits between us and the hardware: no automatic resampling, no system-sound mixing, no per-app volume DSP.
  • One app at a time. While exclusive is engaged, system sounds (notifications, Discord, browser audio) are silenced. By design.
  • Mode survives device hot-swaps. engine::set_output_device reuses the same spawn_output_with_mode dispatch so picking a new output keeps the chosen mode.
  • No per-track rate switching yet. The decoder's rubato resampler still converts every source to the device's mix rate. True bit-perfect playback at the source rate is a future phase (it would require reinitialising the WASAPI client on every rate change).

Dependency footprint: the wasapi crate + a slim slice of windows-rs features (Win32_Foundation, Win32_System_Com, Win32_System_Threading) target-gated to cfg(target_os = "windows"). Adds ~5-10 MB to the NSIS / MSI Windows binary; the Linux + macOS bundles are untouched.

Ring buffer sizing

RING_CAPACITY = 96_000 f32 samples. At 48 kHz stereo this is 1 s of audio: plenty of headroom for the decoder while keeping latency low. With more channels the headroom shrinks proportionally (8-channel surround holds 250 ms at 48 kHz, ~272 ms at 44.1 kHz), which mostly matters for the seek drain time (see playback).
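The headroom figures follow from dividing the capacity by channel count and sample rate. A small helper (hypothetical, for illustration only) reproduces them:

```rust
/// Capacity from the text; the helper itself is illustrative.
const RING_CAPACITY: usize = 96_000; // interleaved f32 samples

/// Audio held by a full ring, in milliseconds.
fn ring_headroom_ms(capacity: usize, sample_rate: u32, channels: u16) -> f64 {
    let frames = capacity as f64 / channels as f64;
    frames * 1000.0 / sample_rate as f64
}

fn main() {
    for (rate, ch) in [(48_000, 2), (48_000, 8), (44_100, 8)] {
        println!("{rate} Hz x {ch} ch -> {:.0} ms", ring_headroom_ms(RING_CAPACITY, rate, ch));
    }
}
```

Headroom is inversely proportional to both channel count and rate, so the same capacity that holds a full second of 48 kHz stereo holds only a quarter second of 48 kHz 8-channel.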

Drain modes

Two reasons to suppress audio output without tearing the stream down:

| Flag | Behaviour | Use |
| --- | --- | --- |
| paused_output | Callback writes silence and doesn't pop the ring. | Pause: resume picks back up exactly where we stopped. |
| drain_silent | Callback bulk-pops the entire ring AND writes silence. | Track switch / seek: flushes the tail of the previous position so it never reaches the device. |

Bulk-popping in drain_silent (instead of the previous approach of one pop per output slot) is what makes seeks feel instant on multi-channel output devices.
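A sketch of the callback's branch order, assuming interleaved stereo output and volume stored as f32 bits in an AtomicU32. SampleRing is a hypothetical trait standing in for the SPSC consumer so the example runs with std only; with rtrb, pop would correspond to Consumer::pop and drain_all to something like read_chunk(slots()) followed by commit_all(). Normalization is omitted; it would be one more gain factor next to volume.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};

/// Hypothetical minimal view of the SPSC consumer half.
trait SampleRing {
    fn pop(&mut self) -> Option<f32>;
    fn drain_all(&mut self);
}

/// Demo stand-in only: VecDeque is neither lock-free nor SPSC-safe.
impl SampleRing for VecDeque<f32> {
    fn pop(&mut self) -> Option<f32> {
        self.pop_front()
    }
    fn drain_all(&mut self) {
        self.clear();
    }
}

struct Flags {
    paused_output: AtomicBool,
    drain_silent: AtomicBool,
    volume_bits: AtomicU32, // f32 gain stored as bits (assumed encoding)
    mono_enabled: AtomicBool,
}

/// Fills one interleaved stereo device buffer using only atomic loads,
/// ring pops and arithmetic: no locks, no allocation, no logging.
fn fill_output<R: SampleRing>(out: &mut [f32], ring: &mut R, flags: &Flags) {
    if flags.paused_output.load(Ordering::Acquire) {
        out.fill(0.0); // silence; ring untouched so resume is seamless
        return;
    }
    if flags.drain_silent.load(Ordering::Acquire) {
        ring.drain_all(); // one bulk drop instead of one pop per slot
        out.fill(0.0);
        return;
    }
    let gain = f32::from_bits(flags.volume_bits.load(Ordering::Relaxed));
    for s in out.iter_mut() {
        // On underrun, output silence rather than waiting for the decoder.
        *s = ring.pop().unwrap_or(0.0) * gain;
    }
    if flags.mono_enabled.load(Ordering::Relaxed) {
        for frame in out.chunks_exact_mut(2) {
            let m = 0.5 * (frame[0] + frame[1]);
            frame[0] = m;
            frame[1] = m;
        }
    }
}

fn main() {
    let flags = Flags {
        paused_output: AtomicBool::new(false),
        drain_silent: AtomicBool::new(false),
        volume_bits: AtomicU32::new(0.5f32.to_bits()),
        mono_enabled: AtomicBool::new(false),
    };
    let mut ring: VecDeque<f32> = VecDeque::from(vec![1.0, 0.5, 1.0, 0.5]);
    let mut out = [0.0f32; 4];
    fill_output(&mut out, &mut ring, &flags);
    println!("{out:?}"); // [0.5, 0.25, 0.5, 0.25]
}
```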

Crossfade dual-decoder

When the user enables crossfade, the decoder maintains a pending_next: Option<ActiveStream>, set by an AudioCmd::SetNextTrack from the command layer. On each iteration it tops up the persistent primary_resampled and secondary_resampled buffers (one packet each), then mixes as many samples as both buffers hold (the shorter of the two lengths) using equal_power_gains(t). The window is clamped to min(user_ms, primary.duration / 2), so the fade never covers more than half of a short track (a 30 s clip starts mixing no earlier than 15 s in).

Per-stream ReplayGain is applied before the mix so the loudness of the two tracks doesn't drift mid-fade.
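A std-only sketch of the mixing step. It assumes the textbook cos/sin equal-power law for equal_power_gains (the project's curve may differ), uses VecDeque<f32> for the persistent resampled buffers, and treats ReplayGain as a dB value converted to a linear factor. mix_crossfade and db_to_linear are hypothetical names, and t is held constant per call for simplicity.

```rust
use std::collections::VecDeque;
use std::f32::consts::FRAC_PI_2;

/// Equal-power gains for fade progress t in [0, 1]: g_out^2 + g_in^2 == 1,
/// so combined power stays constant through the fade.
fn equal_power_gains(t: f32) -> (f32, f32) {
    let t = t.clamp(0.0, 1.0);
    ((t * FRAC_PI_2).cos(), (t * FRAC_PI_2).sin())
}

/// Window clamp from the text: never fade over more than half the track.
fn crossfade_window_ms(user_ms: u64, primary_duration_ms: u64) -> u64 {
    user_ms.min(primary_duration_ms / 2)
}

/// ReplayGain in dB to a linear amplitude factor.
fn db_to_linear(db: f32) -> f32 {
    10f32.powf(db / 20.0)
}

/// Mixes as many samples as both buffers currently hold, applying each
/// stream's ReplayGain before the crossfade gains. Leftover samples stay
/// buffered for the next iteration.
fn mix_crossfade(
    primary: &mut VecDeque<f32>,
    secondary: &mut VecDeque<f32>,
    primary_rg: f32,
    secondary_rg: f32,
    t: f32,
    out: &mut Vec<f32>,
) {
    let n = primary.len().min(secondary.len());
    let (g_out, g_in) = equal_power_gains(t);
    for (a, b) in primary.drain(..n).zip(secondary.drain(..n)) {
        out.push(a * primary_rg * g_out + b * secondary_rg * g_in);
    }
}

fn main() {
    let window = crossfade_window_ms(20_000, 30_000);
    let mut primary: VecDeque<f32> = VecDeque::from(vec![0.8; 4]);
    let mut secondary: VecDeque<f32> = VecDeque::from(vec![0.8; 3]);
    let mut out = Vec::new();
    mix_crossfade(&mut primary, &mut secondary, db_to_linear(-3.0), db_to_linear(-6.0), 0.5, &mut out);
    println!("window {window} ms, mixed {} samples, {} left in primary", out.len(), primary.len());
}
```

Applying ReplayGain per stream before the equal-power gains means each track's loudness is normalized first, so the fade curve only governs the balance between them.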

Why not async for the decoder?

The decoder is a tight CPU + I/O loop with no benefit from Future polling. Spawning it as a std::thread keeps it off the tokio runtime (so a stuck packet read can't starve other tasks) and lets it own its Producer<f32> and ActiveStream without Send + Sync gymnastics. The interface to the rest of the app is a single crossbeam::channel.
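A minimal sketch of the loop shape this implies: a plain std::thread that drains pending commands between packets under a 'pkt label. std::sync::mpsc stands in for crossbeam::channel so the example has no dependencies, the AudioCmd variants shown are a hypothetical subset, and packet decoding is reduced to a log line.

```rust
use std::sync::mpsc::{self, Receiver, TryRecvError};
use std::thread;

/// Hypothetical subset of AudioCmd.
enum AudioCmd {
    Pause,
    Resume,
    Seek(u64),
    Stop,
}

/// Drains pending commands before each packet so pause / stop / seek are
/// handled within roughly one packet's worth of work.
fn decoder_loop(rx: Receiver<AudioCmd>, total_packets: u32) -> Vec<String> {
    let mut log = Vec::new();
    let mut paused = false;
    let mut packet = 0;
    'pkt: while packet < total_packets {
        loop {
            let cmd = if paused {
                // Nothing to decode while paused: block instead of spinning.
                rx.recv().map_err(|_| TryRecvError::Disconnected)
            } else {
                rx.try_recv()
            };
            match cmd {
                Ok(AudioCmd::Pause) => paused = true,
                Ok(AudioCmd::Resume) => paused = false,
                Ok(AudioCmd::Seek(ms)) => log.push(format!("seek {ms}")),
                Ok(AudioCmd::Stop) | Err(TryRecvError::Disconnected) => break 'pkt,
                Err(TryRecvError::Empty) => break, // no pending commands: decode
            }
        }
        // Stand-in for: read packet, decode, resample, push into the ring.
        log.push(format!("packet {packet}"));
        packet += 1;
    }
    log
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let decoder = thread::Builder::new()
        .name("waveflow-audio-decoder".into())
        .spawn(move || decoder_loop(rx, 3))
        .unwrap();
    tx.send(AudioCmd::Seek(1_000)).unwrap(); // may land before, between or after packets
    println!("{:?}", decoder.join().unwrap());
}
```

Blocking on recv() while paused keeps the thread idle, and try_recv() while playing bounds command latency by the cost of one packet rather than by any runtime's scheduling.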