Straightforward exposure of pruning-related API #51
Open
citizen-stig wants to merge 10 commits into
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
bkolad
approved these changes
Jun 26, 2026
Prerequisite for Sovereign-Labs/sovereign-sdk#2886
This PR adds pruning support for VersionedDB: callers can collect and commit archival cleanup batches that remove old historical rows, clear processed pruning-index entries, record a pruning watermark, and optionally compact affected archival column families to reclaim disk.
Why
Versioned tables currently keep historical data indefinitely. This PR lets users bound retained history while preserving the invariant that live keys still have a surviving historical anchor for recent queries.
Main Decisions
merge specialized from generic SchemaBatch<K, V> to SchemaBatch<SchemaKey, V>, via a new RangeDeleteKey trait whose "next key" = append 0x00. Drops generality, but only Vec keys are used and the split needs byte-successor semantics.
Hot-path key encoding duplicated as free functions (encode_archival_key / encode_pruning_key) instead of routing through KeyWithVersionPrefixAndSuffix. Layout-drift risk is pinned by a unit test (encode_helpers_match_key_with_version_layout).
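A minimal sketch of the byte-successor idea, for readers who want to see why appending 0x00 works. The trait below is a hypothetical stand-in: the description names RangeDeleteKey but does not show its methods, so `next_key` and `point_range` are assumed names. Under lexicographic byte ordering, `key ++ [0x00]` is the smallest key strictly greater than `key`, so the half-open range `[key, next_key)` covers exactly one key.

```rust
/// Hypothetical stand-in for the PR's `RangeDeleteKey` trait.
/// The method name `next_key` is an assumption, not the PR's actual API.
pub trait RangeDeleteKey: Ord + Clone {
    /// Returns the smallest key strictly greater than `self`.
    fn next_key(&self) -> Self;
}

impl RangeDeleteKey for Vec<u8> {
    fn next_key(&self) -> Self {
        // Appending 0x00 yields the immediate successor under
        // lexicographic byte ordering: no key sorts between the two.
        let mut next = Vec::with_capacity(self.len() + 1);
        next.extend_from_slice(self);
        next.push(0x00);
        next
    }
}

/// Half-open range `[key, key.next_key())` that contains only `key`.
/// Illustrative helper, not part of the PR.
pub fn point_range<K: RangeDeleteKey>(key: &K) -> (K, K) {
    (key.clone(), key.next_key())
}
```

This also shows why a fully generic key type is awkward here: an arbitrary `K` has no obvious "immediate successor", while a byte string always does.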
Capped passes buffer and sort collected entries in memory (bounded by max_batch_size) rather than streaming. The pruning tombstone's upper bound is the exact successor of the last collected entry, which is tighter than the old break-point bound and removes the stranded-entry / orphaned-row risk.
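The capped pass can be modelled roughly as follows. This is a sketch under stated assumptions, not the PR's implementation: the pruning index is represented by an in-memory `BTreeSet`, and `PruningPass`, `collect_pass`, and `apply_tombstone` are hypothetical names. It illustrates that a tombstone ending at the successor of the last collected entry removes exactly what was collected and nothing beyond it.

```rust
use std::collections::BTreeSet;

/// Result of one capped pass (hypothetical type for illustration).
pub struct PruningPass {
    /// Index entries collected in this pass, in key order.
    pub collected: Vec<Vec<u8>>,
    /// Exclusive upper bound for the range tombstone.
    pub tombstone_end: Vec<u8>,
}

/// Collects at most `max_batch_size` entries from the front of the index.
/// Returns `None` when nothing was collected.
pub fn collect_pass(index: &BTreeSet<Vec<u8>>, max_batch_size: usize) -> Option<PruningPass> {
    // Buffer the capped set of entries in memory, then sort.
    // (The sort is a no-op for a BTreeSet; it mirrors the described flow.)
    let mut collected: Vec<Vec<u8>> = index.iter().take(max_batch_size).cloned().collect();
    collected.sort();
    // Upper bound = exact byte successor of the last collected entry.
    let mut tombstone_end = collected.last()?.clone();
    tombstone_end.push(0x00);
    Some(PruningPass { collected, tombstone_end })
}

/// Models a range tombstone over `[.., end)` on the in-memory index.
pub fn apply_tombstone(index: &mut BTreeSet<Vec<u8>>, end: &[u8]) {
    index.retain(|key| key.as_slice() >= end);
}
```

Calling `collect_pass` and `apply_tombstone` in a loop until `collect_pass` returns `None` drains a backlog over multiple passes, with each pass bounded by `max_batch_size`.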
Raw lookups stay non-watermark-aware by design: get_historical_value / get_version_for_key can return a "survivor" below the watermark; watermark-aware reads go through VersionedDeltaReader. Documented and locked by a test.
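The difference between a raw lookup and a watermark-aware read can be sketched with a toy model. Everything here is an assumption made for illustration: `Store`, `raw_get`, `watermark_get`, and `BelowWatermark` are hypothetical names, and the rule that versions strictly below the watermark are rejected is a guess at the semantics, not taken from VersionedDeltaReader.

```rust
use std::collections::BTreeMap;

/// Error for reads below the pruning watermark (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct BelowWatermark {
    pub requested: u64,
    pub pruned: u64,
}

/// Toy versioned store: (key, version) -> value, plus a watermark.
pub struct Store {
    pub history: BTreeMap<(Vec<u8>, u64), Vec<u8>>,
    pub pruned_version: Option<u64>,
}

impl Store {
    /// Raw lookup: newest row for `key` at or below `version`.
    /// Deliberately ignores the watermark, so it can return a survivor row.
    pub fn raw_get(&self, key: &[u8], version: u64) -> Option<&Vec<u8>> {
        let lo = (key.to_vec(), 0u64);
        let hi = (key.to_vec(), version);
        self.history.range(lo..=hi).next_back().map(|(_, value)| value)
    }

    /// Watermark-aware read: assumed to reject versions below the watermark.
    pub fn watermark_get(
        &self,
        key: &[u8],
        version: u64,
    ) -> Result<Option<&Vec<u8>>, BelowWatermark> {
        match self.pruned_version {
            Some(pruned) if version < pruned => Err(BelowWatermark {
                requested: version,
                pruned,
            }),
            _ => Ok(self.raw_get(key, version)),
        }
    }
}
```

With a row at version 3 and a watermark of 5, `raw_get(key, 4)` still returns the version-3 survivor, while `watermark_get(key, 4)` returns an error in this model.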
Pruning is explicit and batched via collect_pruning_batch; callers commit the returned SchemaBatch.
The pruning CF is cleared with range tombstones for efficiency.
Historical CF deletes remain point deletes because rows are scattered by key.
Capped pruning is supported with max_batch_size; large backlogs can be drained over multiple passes.
PrunedVersion is a reader watermark, not proof that every old row is gone. Raw historical lookups may still see survivor rows below it; VersionedDeltaReader enforces the watermark.
SchemaBatch::merge now preserves range deletes and splits earlier range deletes around later puts to maintain last-write-wins semantics.
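To illustrate the merge rule in the last point, here is a rough sketch of splitting an earlier range delete around a later put. It is not the code from SchemaBatch::merge: `KeyRange`, `successor`, `split_around`, and `carve_puts` are hypothetical names, and ranges are assumed to be half-open `[start, end)` over byte keys.

```rust
/// Half-open range delete `[start, end)` over byte keys (assumed shape).
pub type KeyRange = (Vec<u8>, Vec<u8>);

/// Byte successor: the smallest key strictly greater than `key`.
pub fn successor(key: &[u8]) -> Vec<u8> {
    let mut next = key.to_vec();
    next.push(0x00);
    next
}

/// Splits `range` so that it no longer covers `key`.
/// Returns zero, one, or two ranges; empty pieces are dropped.
pub fn split_around(range: &KeyRange, key: &[u8]) -> Vec<KeyRange> {
    let (start, end) = range;
    if key < start.as_slice() || key >= end.as_slice() {
        // The put is outside the range, so the delete is kept as-is.
        return vec![range.clone()];
    }
    let mut pieces = Vec::new();
    if start.as_slice() < key {
        pieces.push((start.clone(), key.to_vec()));
    }
    let next = successor(key);
    if next < *end {
        pieces.push((next, end.clone()));
    }
    pieces
}

/// Carves every later put key out of the earlier range deletes, so the
/// later puts win regardless of the order in which ops are applied.
pub fn carve_puts(ranges: Vec<KeyRange>, later_puts: &[Vec<u8>]) -> Vec<KeyRange> {
    later_puts.iter().fold(ranges, |acc, key| {
        acc.iter().flat_map(|r| split_around(r, key)).collect()
    })
}
```

For example, carving a later put at `m` out of an earlier delete over `[a, z)` leaves `[a, m)` and `[m\x00, z)`, which is where the byte-successor requirement comes from.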