fix(export): stream and paginate /export/dump for large databases#249
ThaiTrevor wants to merge 2 commits into
Resolves OOM/timeout failures when dumping large databases by switching the /export/dump response to a ReadableStream and paginating per-table SELECTs with LIMIT/OFFSET instead of loading the entire result set into memory at once. Closes outerbase#59
|
Hi! Just a gentle check-in on this PR — happy to address any feedback, rebase, or split it up if that helps land it. No rush, just wanted to make sure it didn't slip through the cracks. Thanks for maintaining this project! |
|
Ready for review: fix verified end-to-end against a real 50,000-row database

Summary for reviewers: this replaces the in-memory buffer in […]

Automated tests: […] (the 25 existing cases + new pagination/NULL cases; […]

End-to-end proof on real data. To make sure this isn't just green mocks, I backed the route's […]

So every row is dumped, peak in-memory footprint stays flat (~one page) regardless of table size, and the output re-imports cleanly. Repro below if you'd like to run it yourself.

Repro script: drop in […]
|
Purpose
Fixes #59: /export/dump previously failed on large databases because it loaded every row of every table into a single in-memory string before responding. On any non-trivial database this exceeds the Worker's memory budget and/or wall-clock limit, so the dump never completes.

Changes
src/export/dump.ts: the response is now produced via a ReadableStream, so chunks are flushed to the client as they are generated instead of being concatenated in memory.
Each table is read in pages with SELECT * FROM <table> LIMIT 1000 OFFSET <n>, and the loop stops as soon as a page returns fewer rows than the page size. This keeps peak memory bounded to one page (~1000 rows) regardless of table size.
NULL/undefined values are now emitted as the SQL NULL keyword (previously they were stringified to the literal text null, which only parsed correctly by accident).
New tests cover LIMIT/OFFSET queries when a table is larger than the page size, and serialisation of NULL values as the NULL keyword.

The public route, headers, and dump format are unchanged; this is a drop-in fix.
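The changes listed above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `QueryFn`, `toSqlLiteral`, `paginate`, and `dumpTable` are hypothetical names, the query function is assumed to return rows as plain objects, table names are assumed to be trusted, and only numbers, strings, and NULL/undefined are handled.

```typescript
// Hypothetical query function: runs SQL and resolves to rows as plain objects.
type Row = Record<string, unknown>;
type QueryFn = (sql: string) => Promise<Row[]>;

const PAGE_SIZE = 1000;

// Serialise one value as a SQL literal (assumed rules, for illustration only).
function toSqlLiteral(value: unknown): string {
  // NULL and undefined become the bare NULL keyword, never a quoted string.
  if (value === null || value === undefined) return "NULL";
  if (typeof value === "number" || typeof value === "bigint") return String(value);
  // Everything else is single-quoted, with embedded quotes doubled.
  return `'${String(value).replace(/'/g, "''")}'`;
}

// Yield one page of rows at a time; stop after the first short page.
async function* paginate(
  query: QueryFn,
  table: string,
  pageSize: number = PAGE_SIZE,
): AsyncGenerator<Row[]> {
  for (let offset = 0; ; offset += pageSize) {
    const rows = await query(
      `SELECT * FROM "${table}" LIMIT ${pageSize} OFFSET ${offset}`,
    );
    if (rows.length > 0) yield rows;
    if (rows.length < pageSize) return;
  }
}

// Stream INSERT statements for one table; each pull fetches and emits one page,
// so at most one page of rows is held in memory at a time.
function dumpTable(
  query: QueryFn,
  table: string,
  pageSize: number = PAGE_SIZE,
): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  const pages = paginate(query, table, pageSize);
  return new ReadableStream<Uint8Array>({
    async pull(controller) {
      const result = await pages.next();
      if (result.done) {
        controller.close();
        return;
      }
      const sql = result.value
        .map((row) => {
          const values = Object.values(row).map(toSqlLiteral).join(", ");
          return `INSERT INTO "${table}" VALUES (${values});\n`;
        })
        .join("");
      controller.enqueue(encoder.encode(sql));
    },
  });
}
```

Under these assumptions, a route handler could return something like `new Response(dumpTable(query, "users"))`, and the runtime would pull pages only as fast as the client consumes them.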
Tasks
[…] SELECT * in one shot

Demo
The fix is server-side streaming: the observable behaviour is that /export/dump responds immediately and flushes SQL statements as they are generated rather than timing out. Vitest confirms the pagination and NULL-serialisation logic.

Full suite: npx vitest run src/export/ → 25 passed across 4 files. No new TypeScript errors (npx tsc --noEmit).

Verify
npx vitest run src/export/ → 25 passed (4 files)
npx vitest run src/export/dump.test.ts → 7 passed (5 original + 2 new)
npx tsc --noEmit → no new errors in src/export/dump.ts

Closes #59
/claim #59