
Straight-forward exposure pruning-related API #51

Open
citizen-stig wants to merge 10 commits into main from nikolai/versioned-prune

@citizen-stig (Member) commented May 18, 2026

Prerequisite for Sovereign-Labs/sovereign-sdk#2886

This PR adds pruning support for VersionedDB: callers can collect and commit archival cleanup batches that remove old historical rows, clear processed pruning-index entries, record a pruning watermark, and optionally compact affected archival column families to reclaim disk.

Why

Versioned tables currently keep historical data indefinitely. This PR lets users bound retained history while preserving the invariant that live keys still have a surviving historical anchor for recent queries.
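To make the invariant concrete, here is a minimal in-memory sketch. It is not the VersionedDB implementation: `History`, `collect_prunable`, `prune_to`, and `read_at` are hypothetical names, a `BTreeMap` keyed by (key, version) stands in for the archival column family, and deletions of live keys are not modeled. The assumed rule is that a row is prunable only when the same key has a newer row at or below the watermark, so the newest such row survives as the anchor.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for a versioned table: (key, version) -> value.
/// Hypothetical model, not the real on-disk layout.
type History = BTreeMap<(Vec<u8>, u64), Vec<u8>>;

/// Collect at most `max_batch_size` rows that can be deleted for `watermark`.
/// A row (key, v) is prunable when the same key has a newer row (key, v')
/// with v' <= watermark. The newest row at or below the watermark is the
/// anchor and is never collected.
fn collect_prunable(
    history: &History,
    watermark: u64,
    max_batch_size: usize,
) -> Vec<(Vec<u8>, u64)> {
    let mut out = Vec::new();
    let mut iter = history.keys().peekable();
    while let Some((key, version)) = iter.next() {
        if out.len() == max_batch_size {
            break;
        }
        // Keys iterate in (key, version) order, so the next entry is the
        // next-newer version of the same key, if one exists.
        if let Some((next_key, next_version)) = iter.peek() {
            if next_key == key && *next_version <= watermark {
                out.push((key.clone(), *version));
            }
        }
    }
    out
}

/// Drain the backlog in capped passes; returns the number of non-empty passes.
fn prune_to(history: &mut History, watermark: u64, max_batch_size: usize) -> usize {
    let mut passes = 0;
    loop {
        let batch = collect_prunable(history, watermark, max_batch_size);
        if batch.is_empty() {
            return passes;
        }
        for row in &batch {
            history.remove(row);
        }
        passes += 1;
    }
}

/// Value of `key` as of `version`: the newest row with version <= `version`.
fn read_at<'a>(history: &'a History, key: &[u8], version: u64) -> Option<&'a Vec<u8>> {
    history
        .range((key.to_vec(), 0)..=(key.to_vec(), version))
        .next_back()
        .map(|(_, value)| value)
}
```

In this model, a key with versions 1, 2, 3, and 7 pruned to watermark 5 keeps versions 3 and 7: version 3 still answers queries at versions 3 through 6, while queries below 3 no longer have data.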

Main Decisions

  • merge is specialized from the generic SchemaBatch<K, V> to SchemaBatch<SchemaKey, V>, via a new RangeDeleteKey trait whose "next key" is the key with 0x00 appended. This drops generality, but only Vec keys are used and the range-delete split needs byte-successor semantics.

  • Hot-path key encoding is duplicated as free functions (encode_archival_key / encode_pruning_key) instead of routing through KeyWithVersionPrefixAndSuffix. The layout-drift risk is pinned by a unit test (encode_helpers_match_key_with_version_layout).

  • Capped passes buffer and sort collected entries in memory (bounded by max_batch_size) rather than streaming them. The pruning tombstone's upper bound is the exact successor of the last collected entry, which is tighter than the old break-point bound and removes the stranded-entry / orphaned-row risk.

  • Raw lookups stay non-watermark-aware by design: get_historical_value / get_version_for_key can return a "survivor" below the watermark; watermark-aware reads go through VersionedDeltaReader. Documented and locked by a test.

  • Pruning is explicit and batched via collect_pruning_batch; callers commit the returned SchemaBatch.

  • The pruning CF is cleared with range tombstones for efficiency.

  • Historical CF deletes remain point deletes because rows are scattered by key.

  • Capped pruning is supported with max_batch_size; large backlogs can be drained over multiple passes.

  • PrunedVersion is a reader watermark, not proof that every old row is gone. Raw historical lookups may still see survivor rows below it; VersionedDeltaReader enforces the watermark.

  • SchemaBatch::merge now preserves range deletes and splits earlier range deletes around later puts to maintain last-write-wins semantics.
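The byte-successor and split behavior described in the list above can be illustrated with a small standalone sketch. It is not the code in this PR: `successor`, `RangeDelete`, and `split_around` are hypothetical names, and the sketch assumes half-open `[start, end)` range deletes over raw byte keys compared lexicographically.

```rust
/// Smallest key strictly greater than `key` in bytewise lexicographic order:
/// the same bytes with 0x00 appended.
fn successor(key: &[u8]) -> Vec<u8> {
    let mut next = key.to_vec();
    next.push(0x00);
    next
}

/// Half-open range delete covering [start, end). Hypothetical type.
#[derive(Debug, Clone, PartialEq, Eq)]
struct RangeDelete {
    start: Vec<u8>,
    end: Vec<u8>,
}

/// Split an earlier range delete around a later put so the put survives.
/// Returns the pieces of `range` that still apply once `put_key` is excluded.
fn split_around(range: &RangeDelete, put_key: &[u8]) -> Vec<RangeDelete> {
    // The put lies outside the range: the delete is unaffected.
    if put_key < range.start.as_slice() || put_key >= range.end.as_slice() {
        return vec![range.clone()];
    }
    let mut parts = Vec::new();
    // Left piece: [start, put_key), skipped when it would be empty.
    if range.start.as_slice() < put_key {
        parts.push(RangeDelete {
            start: range.start.clone(),
            end: put_key.to_vec(),
        });
    }
    // Right piece: [successor(put_key), end), skipped when it would be empty.
    let after = successor(put_key);
    if after < range.end {
        parts.push(RangeDelete {
            start: after,
            end: range.end.clone(),
        });
    }
    parts
}
```

Under these assumptions, a delete of `[a, d)` followed by a put at `b` becomes two deletes, `[a, b)` and `[b\x00, d)`, so the later write wins regardless of the order in which the deletes and the put are applied. The same `successor` gives an exclusive upper bound that covers exactly the last collected entry when building a range tombstone.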

@citizen-stig citizen-stig marked this pull request as ready for review June 26, 2026 10:59

2 participants