CHANGELOG.md (8 additions, 0 deletions)
@@ -4,6 +4,14 @@ Notable changes to this project will be documented in this file.
Per-PR attribution and contributor credits are published automatically on the corresponding GitHub release page; this file is the curated, human-readable summary.
## [v0.5.0] - 2026-06-06
### Added
- `POST /dev/unload` releases the model from VRAM without stopping the container; the model reloads lazily on the next request. Useful for freeing a shared GPU while idle. Reclaim scales with load (~0.7 GB; ~1.6 GB in the long-form test on a 4060 Ti). (#474)
### Fixed
- Web UI long playback no longer breaks around the 10-minute mark; the in-browser audio buffer is now bounded ahead of `currentTime` with trailing eviction behind it, so long generations stop overflowing the SourceBuffer.
- Web UI stays responsive on extended sessions; waveform animation is transition-gated and `PlayerState` short-circuits no-op updates, so controls don't drift into lag after 10+ minutes of playback.
- Web UI MP3 seek/scrub works after the stream completes; pausing or reaching the end of playback auto-swaps to the full server file, allowing timeline navigation.
## [v0.4.0] - 2026-05-24
### Added
- GPU image variants for Blackwell / RTX 50-series (`:latest-cu128`, `:vX.Y.Z-cu128`, amd64 only) with PyTorch cu128 wheels (#443). Default `:latest` and new `:latest-cu126` alias stay on cu126 for Maxwell/Pascal compatibility.
- Realtime Speed: Ranges between 35x-100x (generation time to output audio length)
- Average Processing Rate: 137.67 tokens/second (cl100k_base)
### Model Unload / VRAM Reclaim
`POST /dev/unload` frees the model from VRAM and reloads it lazily on the next request. Reclaim scales with load (the activation pool, not just the weights) but plateaus because chunks cap at 450 tokens. The long-form test is ~30 paragraphs. Same setup as above.
Floor is host + CUDA context. Reproduce with `uv run --extra benchmarks assorted_checks/benchmarks/benchmark_model_unload.py` from `examples/`.
### Transcription roundtrip (WER/CER)
End-to-end roundtrip: synthesize with Kokoro, transcribe the result back with [`faster-whisper`](https://github.com/SYSTRAN/faster-whisper), compare to the source text. Scripts and data live under `examples/assorted_checks/test_transcription/`.
@@ -548,6 +566,19 @@ except Exception as e:
See `examples/phoneme_examples/generate_phonemes.py` for a sample script.
</details>
<details>
<summary>Inline Control Tokens</summary>
Two tokens can be embedded in the `input` text and are parsed server-side (API, WebUI, or any client):
- **Pause**: `[pause:1.5s]` inserts that much silence. Must be exactly this form (colon, trailing `s`, case-insensitive). `[pause=1.5]`, `[PAUSE 1.0]`, and SSML `<break/>` are not recognized and get read aloud.
- **Pronunciation**: `[Worcester](/wˈʊstər/)` speaks the IPA between the slashes instead of the word. English only; use `/dev/phonemize` to find the IPA.
```text
The city of [Worcester](/wˈʊstər/) is easy. [pause:1s] See?
```
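
If you build `input` strings in code, small helpers can keep the tokens in the exact documented form. The following is a minimal sketch: `pause`, `pronounce`, `find_pauses`, and `PAUSE_RE` are hypothetical names, and the regular expression only mirrors the form described above. It is an assumption, not the server's actual parser.

```python
import re

# Assumption: this pattern mirrors the documented form only
# (colon, trailing "s", case-insensitive); it is not the server's parser.
PAUSE_RE = re.compile(r"\[pause:(\d+(?:\.\d+)?)s\]", re.IGNORECASE)


def pause(seconds: float) -> str:
    """Return a pause token such as "[pause:1.5s]"."""
    if seconds < 0:
        raise ValueError("pause length must be non-negative")
    # Fixed-point formatting avoids scientific notation; trailing zeros are trimmed.
    number = f"{seconds:.3f}".rstrip("0").rstrip(".")
    return f"[pause:{number}s]"


def pronounce(word: str, ipa: str) -> str:
    """Return a pronunciation token such as "[Worcester](/wˈʊstər/)"."""
    return f"[{word}](/{ipa}/)"


def find_pauses(text: str) -> list[float]:
    """List the pause lengths (in seconds) that match the documented form."""
    return [float(m.group(1)) for m in PAUSE_RE.finditer(text)]


# Rebuilds the example sentence shown above.
text = f"The city of {pronounce('Worcester', 'wˈʊstər')} is easy. {pause(1)} See?"
```

Running `find_pauses` over a draft before sending it is a cheap way to spot tokens such as `[pause=1.5]` that would otherwise be read aloud.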
</details>
<details>
<summary>Debug Endpoints</summary>
@@ -556,6 +587,7 @@ Monitor system state and resource usage with these endpoints:
- `/debug/threads` - Get thread information and stack traces
- `/debug/storage` - Monitor temp file and output directory usage
- `/debug/system` - Get system information (CPU, memory, GPU)
- `POST /dev/unload` - Release model from VRAM; reloads lazily on next request
Useful for debugging resource exhaustion or performance issues.
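
As an illustration, `POST /dev/unload` can be called from a script that runs when a shared GPU goes idle. This is a minimal sketch using only the standard library. The base URL `http://localhost:8880` is an assumption about your deployment, the helper names are hypothetical, and the response body is not interpreted because its shape is not documented here.

```python
import urllib.request

# Assumption: the server is reachable at this address; adjust for your deployment.
BASE_URL = "http://localhost:8880"


def build_unload_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for /dev/unload."""
    url = base_url.rstrip("/") + "/dev/unload"
    # An empty body is sent; no request payload is assumed for this endpoint.
    return urllib.request.Request(url, data=b"", method="POST")


def unload_model(base_url: str = BASE_URL, timeout: float = 30.0) -> int:
    """Send the unload request and return the HTTP status code."""
    request = build_unload_request(base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # The model reloads lazily on the next request, so no explicit reload call is needed.
    print(unload_model())
```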