fix(export): stream and paginate /export/dump for large databases #249

Open
ThaiTrevor wants to merge 2 commits into outerbase:main from ThaiTrevor:fix/issue-59-starbasedb-database-dumps-do-not

ThaiTrevor commented May 24, 2026

Purpose

Fixes #59. /export/dump previously failed on large databases because it loaded every row of every table into a single in-memory string before responding. On any non-trivial database this exceeds the Worker's memory budget, its wall-clock limit, or both, and the dump never completes.

Changes

  • src/export/dump.ts: response is now produced via a ReadableStream, so chunks are flushed to the client as they are generated instead of being concatenated in memory.
  • Per-table data is paged with SELECT * FROM <table> LIMIT 1000 OFFSET <n> and the loop stops as soon as a page returns fewer rows than the page size. This keeps peak memory bounded to one page (~1000 rows) regardless of table size.
  • Small correctness improvement in value serialization: NULL/undefined are now emitted as the SQL NULL keyword (previously they were stringified to the literal text null, which only parsed correctly by accident).
  • All existing test cases continue to pass unchanged; added two new tests:
    • paginates across multiple LIMIT/OFFSET queries when a table is larger than the page size
    • serializes NULL values as the NULL keyword
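
To illustrate the NULL-handling change listed above, here is a minimal sketch of value serialization. The helper names `toSqlLiteral` and `toInsertStatement` are hypothetical and the quoting rules are assumptions; this is not the code in src/export/dump.ts.

```typescript
// Hypothetical helper (an assumption, not the actual dump.ts code).
function toSqlLiteral(value: unknown): string {
    // null and undefined both become the bare SQL keyword, never the text 'null'.
    if (value === null || value === undefined) return 'NULL'
    // Numbers are emitted unquoted.
    if (typeof value === 'number') return String(value)
    // Everything else is quoted, doubling embedded single quotes.
    return `'${String(value).replace(/'/g, "''")}'`
}

// Hypothetical helper: builds one INSERT per row, values in column order.
function toInsertStatement(table: string, row: Record<string, unknown>): string {
    const values = Object.values(row).map(toSqlLiteral).join(', ')
    return `INSERT INTO ${table} VALUES (${values});`
}

console.log(toInsertStatement('users', { id: 7, name: 'User7', note: null }))
// prints: INSERT INTO users VALUES (7, 'User7', NULL);
```

Emitting the keyword rather than a stringified value keeps the statement valid regardless of how a given importer treats the bare word `null`.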

The public route, headers, and dump format are unchanged — this is a drop-in fix.
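
The streaming and paging changes listed under Changes can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual implementation: `streamTable`, `RunQuery`, and `literal` are hypothetical names, and the table name is interpolated without quoting.

```typescript
type Row = Record<string, unknown>
// Hypothetical stand-in for whatever runs SQL in the route (an assumption).
type RunQuery = (sql: string) => Promise<Row[]>

// Minimal literal formatting so the sketch is self-contained.
function literal(v: unknown): string {
    if (v === null || v === undefined) return 'NULL'
    if (typeof v === 'number') return String(v)
    return `'${String(v).replace(/'/g, "''")}'`
}

// Streams one table as INSERT statements, fetching one page per pull().
// The table name is interpolated unescaped; a real route would need to quote it.
function streamTable(
    table: string,
    runQuery: RunQuery,
    pageSize = 1000
): ReadableStream<Uint8Array> {
    const encoder = new TextEncoder()
    let offset = 0
    return new ReadableStream<Uint8Array>({
        // pull() runs when the stream's queue has room, so only about
        // one page of rows is buffered at any time.
        async pull(controller) {
            const rows = await runQuery(
                `SELECT * FROM ${table} LIMIT ${pageSize} OFFSET ${offset}`
            )
            if (rows.length > 0) {
                const sql = rows
                    .map((r) => `INSERT INTO ${table} VALUES (${Object.values(r).map(literal).join(', ')});\n`)
                    .join('')
                controller.enqueue(encoder.encode(sql))
            }
            offset += rows.length
            // A short (or empty) page means the table is exhausted.
            if (rows.length < pageSize) controller.close()
        },
    })
}
```

One trade-off worth noting: without an ORDER BY, SQL does not guarantee a stable row order between pages, and OFFSET-based paging gets slower as the offset grows because skipped rows are still scanned. Keyset pagination on a primary key would avoid both, at the cost of requiring a known key column.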

Tasks

  • Stream the dump response instead of buffering it
  • Page through table rows instead of SELECT * in one shot
  • Preserve existing dump format and test expectations
  • Add tests for pagination and NULL handling

Demo

The fix is server-side streaming — the observable behaviour is that /export/dump responds immediately and flushes SQL statements as they are generated rather than timing out. Vitest confirms the pagination and NULL-serialisation logic:

✓ src/export/dump.test.ts (7 tests)
  ✓ streams all tables in SQL format
  ✓ omits table when no rows returned
  ✓ returns correct headers
  ✓ handles multiple tables
  ✓ handles empty table
  ✓ paginates across multiple LIMIT/OFFSET queries when table exceeds page size
  ✓ serializes NULL values as SQL NULL keyword

Test Files  1 passed (1)
Tests       7 passed (7)

Full suite: npx vitest run src/export/ → 25 passed across 4 files. No new TypeScript errors (npx tsc --noEmit).

Verify

  • npx vitest run src/export/ → 25 passed (4 files)
  • npx vitest run src/export/dump.test.ts → 7 passed (5 original + 2 new)
  • npx tsc --noEmit → no new errors in src/export/dump.ts

Closes #59

/claim #59

…bases

Resolves OOM/timeout failures when dumping large databases by switching
the /export/dump response to a ReadableStream and paginating per-table
SELECTs with LIMIT/OFFSET instead of loading the entire result set into
memory at once.

Closes outerbase#59

PTHAICAP commented Jun 6, 2026

Hi! Just a gentle check-in on this PR — happy to address any feedback, rebase, or split it up if that helps land it. No rush, just wanted to make sure it didn't slip through the cracks. Thanks for maintaining this project!

@ThaiTrevor (Author)

@Brayden quick ping — PR #249 + #250 against open Algora bounties for
starbasedb, both ~12d old. They're stuck on "1 workflow awaiting approval"
— would you mind clicking "Approve and run workflows" so CI can run?
Also happy to get a review pass when your queue allows.

PTHAICAP commented Jun 9, 2026

Hi! Just a gentle check-in on this PR — happy to address any feedback, rebase, or split it up if that helps land it. No rush, just wanted to make sure it didn't slip through the cracks. Thanks for maintaining this project!

@ThaiTrevor (Author)

Ready for review — fix verified end-to-end against a real 50,000-row database

Summary for reviewers: this replaces the in-memory buffer in /export/dump with a ReadableStream that pages each table via SELECT … LIMIT 1000 OFFSET n. Peak memory becomes O(page) instead of O(entire database), which is the root cause of dumps failing on large DBs (#59). The route, headers, and dump format are unchanged — it's a drop-in fix (+141/−38, 3 files).

Automated tests: npx vitest run src/export/:

Test Files  5 passed (5)
     Tests  26 passed (26)

(the 25 existing cases + new pagination/NULL cases; npx tsc --noEmit reports no errors in src/export/dump.ts.)

End-to-end proof on real data. To make sure this isn't just green mocks, I backed the route's executeOperation with a real node:sqlite database of 50,000 rows, ran the actual dumpDatabaseRoute, consumed the streamed response chunk-by-chunk, then re-imported the produced dump into a fresh database:

source rows          : 50,000
dump size            : 2.82 MB
paged SELECT queries : 51      (LIMIT 1000 OFFSET …)   ← previously a single unbounded SELECT *
stream chunks        : 53
peak chunk in memory : 55.7 KB (vs a 2.82 MB single buffer on the old code)
INSERTs in dump      : 50,000
re-imported rows     : 50,000   ✓ round-trip OK

So every row is dumped, peak in-memory footprint stays flat (~one page) regardless of table size, and the output re-imports cleanly. Repro below if you'd like to run it yourself.
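
The count of 51 paged queries for 50,000 rows follows from the stop condition. Assuming the loop behaves as described (it ends on the first page shorter than the page size), a table whose row count is an exact multiple of 1000 needs one extra, empty page to detect the end. A small sketch of that arithmetic, with `pagedQueryCount` as a hypothetical name:

```typescript
// Number of paged SELECTs issued when the loop stops on the first short page.
// Hypothetical helper for illustration only.
function pagedQueryCount(totalRows: number, pageSize: number): number {
    // Every full page forces another query; the final short or empty page ends the loop.
    return Math.floor(totalRows / pageSize) + 1
}

console.log(pagedQueryCount(50_000, 1000)) // 51
console.log(pagedQueryCount(1_500, 1000)) // 2
console.log(pagedQueryCount(0, 1000)) // 1
```

Under this assumption, the `Math.ceil(ROWS / 1000) + 1` expectation in the repro script matches only when ROWS is an exact multiple of the page size, as it is for 50,000.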

Repro script: save as src/export/dump.demo.test.ts, then run npx vitest run src/export/dump.demo.test.ts
import { describe, it, expect, vi } from 'vitest'
import { createRequire } from 'node:module'
import { dumpDatabaseRoute } from './dump'
import { executeOperation } from '.'
import type { DataSource } from '../types'
import type { StarbaseDBConfiguration } from '../handler'

// node:sqlite is loaded through createRequire rather than a static import.
const { DatabaseSync } = createRequire(import.meta.url)('node:sqlite')

vi.mock('.', () => ({ executeOperation: vi.fn() }))
vi.mock('../utils', () => ({
    createResponse: vi.fn(
        (d, m, s) =>
            new Response(JSON.stringify({ result: d, error: m }), { status: s })
    ),
}))

const ROWS = 50_000

describe('DEMO: real large-DB dump', () => {
    it(`streams + paginates a ${ROWS}-row table from real SQLite`, async () => {
        const src = new DatabaseSync(':memory:')
        src.exec('CREATE TABLE users (id INTEGER, name TEXT, note TEXT);')
        const ins = src.prepare('INSERT INTO users VALUES (?, ?, ?)')
        for (let i = 1; i <= ROWS; i++)
            ins.run(i, `User${i}`, i === 7 ? null : `n${i}`)

        let dataQueries = 0
        vi.mocked(executeOperation).mockImplementation(async (ops: any) => {
            const sql = ops[0].sql as string
            if (/LIMIT \d+ OFFSET/.test(sql)) dataQueries++
            return src.prepare(sql).all() as any
        })

        const ds = { source: 'external', external: { dialect: 'sqlite' } } as DataSource
        const cfg = { role: 'admin', features: {} } as unknown as StarbaseDBConfiguration
        const res = await dumpDatabaseRoute(ds, cfg)

        // Drain the stream, tracking total bytes and the largest single chunk.
        const reader = res.body!.getReader()
        let total = 0
        let maxChunk = 0
        let chunks = 0
        const parts: Uint8Array[] = []
        for (;;) {
            const { value, done } = await reader.read()
            if (done) break
            chunks++
            total += value.length
            maxChunk = Math.max(maxChunk, value.length)
            parts.push(value)
        }
        const dump = Buffer.concat(parts).toString('utf8')

        const inserts = (dump.match(/INSERT INTO users VALUES/g) || []).length
        expect(inserts).toBe(ROWS)
        expect(dataQueries).toBe(Math.ceil(ROWS / 1000) + 1)
        expect(dump).toContain("INSERT INTO users VALUES (7, 'User7', NULL);")
        expect(maxChunk).toBeLessThan(total / 10)

        const dst = new DatabaseSync(':memory:')
        dst.exec(dump.replace('SQLite format 3\0', ''))
        const cnt = (dst.prepare('SELECT COUNT(*) c FROM users').get() as any).c
        expect(cnt).toBe(ROWS)
    })
})

The only thing blocking validation on your side is the pending workflow approval ("1 workflow awaiting approval") — could a maintainer approve the run so CI can confirm? Happy to rebase or adjust anything. This is against the 💎 Algora bounty on #59. Thanks!

Development

Successfully merging this pull request may close these issues.

Database dumps do not work on large databases