Commit d54895b

Merge master into release for v0.5.0
2 parents 41b2d18 + 4129920 commit d54895b

48 files changed

Lines changed: 1738 additions & 184 deletions

.github/workflows/ci.yml

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ jobs:
       fail-fast: false
 
     steps:
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@v5
 
       # Match Dockerfile dependencies
       - name: Install Dependencies

.github/workflows/release.yml

Lines changed: 5 additions & 5 deletions
@@ -36,7 +36,7 @@ jobs:
           fi
 
       - name: Checkout repository
-        uses: actions/checkout@v4
+        uses: actions/checkout@v5
         with:
           ref: ${{ steps.resolve-ref.outputs.source_ref }}
           fetch-depth: 0
@@ -100,7 +100,7 @@ jobs:
     runs-on: ${{ matrix.runs_on }}
     steps:
       - name: Checkout repository
-        uses: actions/checkout@v4
+        uses: actions/checkout@v5
         with:
           ref: ${{ needs.prepare-release.outputs.source_ref }}
 
@@ -115,14 +115,14 @@ jobs:
           df -h
 
       - name: Set up Docker Buildx
-        uses: docker/setup-buildx-action@v3 # Use v3
+        uses: docker/setup-buildx-action@v4
         with:
           driver-opts: |
             image=moby/buildkit:v0.21.1
             network=host
 
       - name: Log in to GitHub Container Registry
-        uses: docker/login-action@v3 # Use v3
+        uses: docker/login-action@v3
         with:
           registry: ghcr.io
           username: ${{ github.actor }}
@@ -276,7 +276,7 @@ jobs:
       contents: write
     steps:
       - name: Checkout repository
-        uses: actions/checkout@v4
+        uses: actions/checkout@v5
         with:
           fetch-depth: 0
 

.github/workflows/test_build.yml

Lines changed: 2 additions & 2 deletions
@@ -79,7 +79,7 @@ jobs:
     runs-on: ${{ matrix.runs_on }}
     steps:
       - name: Checkout repository
-        uses: actions/checkout@v4
+        uses: actions/checkout@v5
         with:
           ref: ${{ inputs.branch_name }}
 
@@ -88,7 +88,7 @@ jobs:
           sudo rm -rf /usr/share/dotnet /usr/local/lib/android /opt/ghc /opt/hostedtoolcache
 
       - name: Set up Docker Buildx
-        uses: docker/setup-buildx-action@v3
+        uses: docker/setup-buildx-action@v4
         with:
           driver-opts: image=moby/buildkit:v0.21.1
 

.github/workflows/test_client_image.yml

Lines changed: 2 additions & 2 deletions
@@ -44,7 +44,7 @@ jobs:
       OWNER: ${{ vars.OWNER || 'remsky' }}
       IMAGE_NAME: ${{ vars.TEST_CLIENT_IMAGE_NAME || 'tts-api-test-client' }}
     steps:
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@v5
 
       - name: Resolve image refs
         id: refs
@@ -61,7 +61,7 @@ jobs:
           echo "latest=${BASE}:latest" >> "$GITHUB_OUTPUT"
 
       - name: Set up Docker Buildx
-        uses: docker/setup-buildx-action@v3
+        uses: docker/setup-buildx-action@v4
 
       - name: Log in to GHCR
         if: github.event_name != 'workflow_dispatch' || inputs.push

.gitignore

Lines changed: 7 additions & 0 deletions
@@ -23,6 +23,7 @@ build/
 # Environment
 # .env
 .venv/
+node_modules/
 env/
 venv/
 ENV/
@@ -76,6 +77,10 @@ examples/assorted_checks/test_transcription/output_long_form/*.transcript.txt
 examples/assorted_checks/test_transcription/output_long_form/*.wav
 examples/assorted_checks/test_transcription/output_long_form/*.synth_meta.json
 examples/assorted_checks/test_transcription/output_multilingual/*.wav
+examples/assorted_checks/benchmarks/output_data/model_unload_stats.txt
+examples/assorted_checks/benchmarks/output_data/model_unload_results.json
+examples/assorted_checks/benchmarks/output_plots/model_unload_longform.png
+examples/assorted_checks/benchmarks/output_plots/model_unload_short.png
 uv.lock
 !docker/test-client/uv.lock
 
@@ -87,3 +92,5 @@ pyproject.toml.bkp
 # Local scratch notes, scan outputs, anything not meant to ship
 .local/
 examples/assorted_checks/test_silence/out/*
+playwright-report/
+test-results/

CHANGELOG.md

Lines changed: 8 additions & 0 deletions
@@ -4,6 +4,14 @@ Notable changes to this project will be documented in this file.
 
 Per-PR attribution and contributor credits are published automatically on the corresponding GitHub release page; this file is the curated, human-readable summary.
 
+## [v0.5.0] - 2026-06-06
+### Added
+- `POST /dev/unload` release model from VRAM without stopping container; lazy reload on next request. For freeing a shared GPU while idle. Reclaim scale with load (~0.7 GB; ~1.6 GB via long-form test on 4060Ti). (#474)
+### Fixed
+- Web UI long-playback bugfix around the 10-minute mark; in-browser audio buffer is now bounded ahead of `currentTime` with trailing eviction behind it, so long generations stop overflowing the SourceBuffer.
+- Web UI stays responsive on extended sessions; waveform animation is transition-gated and `PlayerState` short-circuits no-op updates, so controls don't drift into lag after 10+ minutes of playback.
+- Web UI MP3 seek/scrub works after stream completes; pausing or playback end auto-swaps to the full server file, allowing timeline navigation.
+
 ## [v0.4.0] - 2026-05-24
 ### Added
 - GPU image variants for Blackwell / RTX 50-series (`:latest-cu128`, `:vX.Y.Z-cu128`, amd64 only) with PyTorch cu128 wheels (#443). Default `:latest` and new `:latest-cu126` alias stay on cu126 for Maxwell/Pascal compatibility.
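
Reviewer note: the v0.5.0 changelog entry above adds `POST /dev/unload`. A minimal sketch of calling it with Python's standard library follows. It assumes the API is reachable at `http://localhost:8880` (an assumed address, substitute your own) and that a POST with an empty body is accepted. `build_unload_request` and `unload_model` are hypothetical helper names, not part of the project, and since this commit does not document the response body, the sketch only returns the HTTP status.

```python
import urllib.request

# Assumed address for illustration only; point this at your own deployment.
DEFAULT_BASE_URL = "http://localhost:8880"


def build_unload_request(base_url: str = DEFAULT_BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /dev/unload endpoint."""
    url = base_url.rstrip("/") + "/dev/unload"
    # An empty body plus an explicit method makes this a POST.
    return urllib.request.Request(url, data=b"", method="POST")


def unload_model(base_url: str = DEFAULT_BASE_URL, timeout: float = 30.0) -> int:
    """Send the unload request and return the HTTP status code."""
    request = build_unload_request(base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Requires a running server; the model reloads lazily on the next request.
    print("status:", unload_model())
```

Keeping request construction separate from sending makes the URL and method easy to check without a running server.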

README.md

Lines changed: 32 additions & 0 deletions
@@ -79,6 +79,8 @@ Configuration via environment variables, see `core/config.py`. The `:latest` and
 # The Docker GPU image is CUDA-only and won't run on Apple Silicon. With Docker, use `docker/cpu`.
 # For native MPS (Apple GPU) acceleration, run directly via UV with `./start-gpu_mac.sh`.
 
+cd ../.. # back to repo root for the paths below
+
 # Models will auto-download, but if needed you can manually download:
 python docker/scripts/download_model.py --output api/src/models/v1_0
 
@@ -386,6 +388,22 @@ Key Performance Metrics:
 - Realtime Speed: Ranges between 35x-100x (generation time to output audio length)
 - Average Processing Rate: 137.67 tokens/second (cl100k_base)
 
+### Model Unload / VRAM Reclaim
+
+`POST /dev/unload` frees the model from VRAM and reloads lazily on the next request. Reclaim scales with load (the activation pool, not just weights) but plateaus: chunks cap at 450 tokens. Long-form = ~30 paragraphs. Same setup as above.
+
+<p align="center">
+<img src="assets/gpu_model_unload_short.png" width="45%" alt="Short workload" style="border: 2px solid #333; padding: 10px; margin-right: 1%;">
+<img src="assets/gpu_model_unload_longform.png" width="45%" alt="Long-form workload" style="border: 2px solid #333; padding: 10px;">
+</p>
+
+| Workload | Loaded | Floor | Reclaimed | Reload |
+| --- | --- | --- | --- | --- |
+| Short (6s audio) | 3.11 GB | 2.37 GB | 758 MiB | +4.9s |
+| Long-form (7.5m) | 3.98 GB | 2.37 GB | 1,656 MiB | +5.1s |
+
+Floor is host + CUDA context. Reproduce with `uv run --extra benchmarks assorted_checks/benchmarks/benchmark_model_unload.py` from `examples/`.
+
 ### Transcription roundtrip (WER/CER)
 
 End-to-end roundtrip: synthesize with Kokoro, transcribe the result back with [`faster-whisper`](https://github.com/SYSTRAN/faster-whisper), compare to the source text. Scripts and data live under `examples/assorted_checks/test_transcription/`.
@@ -548,6 +566,19 @@ except Exception as e:
 See `examples/phoneme_examples/generate_phonemes.py` for a sample script.
 </details>
 
+<details>
+<summary>Inline Control Tokens</summary>
+
+Two tokens can be embedded in the `input` text and are parsed server-side (API, WebUI, or any client):
+
+- **Pause**: `[pause:1.5s]` inserts that much silence. Must be exactly this form (colon, trailing `s`, case-insensitive). `[pause=1.5]`, `[PAUSE 1.0]`, and SSML `<break/>` are not recognized and get read aloud.
+- **Pronunciation**: `[Worcester](/wˈʊstər/)` speaks the IPA between the slashes instead of the word. English only; use `/dev/phonemize` to find the IPA.
+
+```text
+The city of [Worcester](/wˈʊstər/) is easy. [pause:1s] See?
+```
+</details>
+
 <details>
 <summary>Debug Endpoints</summary>
 
@@ -556,6 +587,7 @@ Monitor system state and resource usage with these endpoints:
 - `/debug/threads` - Get thread information and stack traces
 - `/debug/storage` - Monitor temp file and output directory usage
 - `/debug/system` - Get system information (CPU, memory, GPU)
+- `POST /dev/unload` - Release model from VRAM; reloads lazily on next request
 
 Useful for debugging resource exhaustion or performance issues.
 </details>
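
Reviewer note: the "Inline Control Tokens" section added to the README above says the pause token must have the exact form `[pause:1.5s]` (colon, trailing `s`, case-insensitive). The sketch below illustrates that rule with a regular expression. It is not the server's parser: `PAUSE_RE` and `split_pauses` are hypothetical names, and details the README does not specify (for example whether inner whitespace is tolerated) are assumed to be rejected.

```python
import re

# Assumed grammar: "[pause:" + decimal seconds + "s]", matched case-insensitively.
PAUSE_RE = re.compile(r"\[pause:(\d+(?:\.\d+)?)s\]", re.IGNORECASE)


def split_pauses(text: str) -> list[tuple[str, object]]:
    """Split text into ("text", str) and ("pause", seconds) segments."""
    parts: list[tuple[str, object]] = []
    pos = 0
    for match in PAUSE_RE.finditer(text):
        if match.start() > pos:
            # Plain text between the previous token and this one.
            parts.append(("text", text[pos:match.start()]))
        parts.append(("pause", float(match.group(1))))
        pos = match.end()
    if pos < len(text):
        # Trailing text after the last token (or the whole input if none matched).
        parts.append(("text", text[pos:]))
    return parts


if __name__ == "__main__":
    print(split_pauses("The city is easy. [pause:1s] See?"))
    # Unrecognized forms stay in the text and would be read aloud.
    print(split_pauses("Wait [pause=1.5] here"))
```

Forms such as `[pause=1.5]` or `[PAUSE 1.0]` fall through as ordinary text, which matches the README's statement that they are read aloud.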

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-0.4.0
+0.5.0

api/src/core/config.py

Lines changed: 4 additions & 1 deletion
@@ -1,4 +1,7 @@
-from importlib.metadata import PackageNotFoundError, version as _pkg_version
+from importlib.metadata import (
+    PackageNotFoundError,
+    version as _pkg_version,
+)
 from pathlib import Path
 
 import torch

api/src/core/paths.py

Lines changed: 10 additions & 0 deletions
@@ -330,6 +330,16 @@ async def get_content_type(path: str) -> str:
         ".gif": "image/gif",
         ".svg": "image/svg+xml",
         ".ico": "image/x-icon",
+        # audio downloads: serve a real media type so the webui can play the file
+        # directly (the player swaps to this URL once generation finishes, #150).
+        ".mp3": "audio/mpeg",
+        ".wav": "audio/wav",
+        ".opus": "audio/opus",
+        ".flac": "audio/flac",
+        ".aac": "audio/aac",
+        ".m4a": "audio/mp4",
+        ".ogg": "audio/ogg",
+        ".pcm": "audio/pcm",
     }.get(ext, "application/octet-stream")
 
 
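
Reviewer note: the hunk above extends an extension-to-media-type lookup that falls back to `application/octet-stream`. The standalone sketch below shows the same lookup pattern for the audio entries only. `AUDIO_TYPES` and `guess_audio_content_type` are hypothetical names. How the real `get_content_type` derives `ext` is not visible in this diff, so taking the lowercased suffix via `pathlib` is an assumption.

```python
from pathlib import Path

# Audio entries copied from the diff above; the real table also covers other types.
AUDIO_TYPES = {
    ".mp3": "audio/mpeg",
    ".wav": "audio/wav",
    ".opus": "audio/opus",
    ".flac": "audio/flac",
    ".aac": "audio/aac",
    ".m4a": "audio/mp4",
    ".ogg": "audio/ogg",
    ".pcm": "audio/pcm",
}


def guess_audio_content_type(path: str) -> str:
    """Map a file path to a media type, defaulting to a generic binary type."""
    # Assumption: the extension is compared case-insensitively.
    ext = Path(path).suffix.lower()
    return AUDIO_TYPES.get(ext, "application/octet-stream")


if __name__ == "__main__":
    for name in ("speech.mp3", "speech.M4A", "notes.bin"):
        print(name, "->", guess_audio_content_type(name))
```

A browser `<audio>` element is more likely to play a download served with a specific audio type than one served as generic binary, which is the motivation stated in the diff comment.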
