Skip to content

Re-enable pruning#2886

Open
citizen-stig wants to merge 12 commits into
devfrom
nikolai/re-enable-pruning
Open

Re-enable pruning#2886
citizen-stig wants to merge 12 commits into
devfrom
nikolai/re-enable-pruning

Conversation

@citizen-stig

@citizen-stig citizen-stig commented May 18, 2026

Copy link
Copy Markdown
Member

Description

This PR re-enables NOMT historical-state pruning by moving user/kernel pruning to rockbound::VersionedDB::collect_pruning_batch (PR Sovereign-Labs/rockbound#51), keeping accessory pruning through the existing pruner, and committing both delete batches duringfinalization. It replaces flat pruning config fields with PrunerConfig::{Off, Periodic, OnceAtStartup}, adds a startup-only prune hook to HierarchicalStorageManager, and optionally compacts pruned CFs after startup pruning.

Main tradeoffs: defaulting to Off protects fresh nodes from accidental history deletion but makes config migration more important; OnceAtStartup avoids live read/write contention but can make startup expensive; range tombstones reduce pruning-index write volume but need compaction to reclaim space.


  • I have updated CHANGELOG.md with a new entry if my PR makes any breaking changes or fixes a bug. If my PR removes a feature or changes its behavior, I provide help for users on how to migrate to the new behavior.
  • I have carefully reviewed all my Cargo.toml changes before opening the PRs. (Are all new dependencies necessary? Is any module dependency leaked into the full-node (hint: it shouldn't)?)

Testing

New tests have been added

Docs

No updates

Comment thread crates/full-node/sov-db/src/storage_manager/nomt_based/groups.rs Outdated

@devin-ai-integration devin-ai-integration Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 4 additional findings.

Open in Devin Review

Comment thread crates/full-node/sov-db/src/storage_manager/nomt_based/mod.rs Outdated
Comment thread crates/module-system/sov-modules-rollup-blueprint/src/native_only/mod.rs Outdated
@citizen-stig citizen-stig marked this pull request as ready for review June 26, 2026 13:52

@devin-ai-integration devin-ai-integration Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 2 new potential issues.

Open in Devin Review

Comment on lines +183 to +201
fn resolve_unbound_read(
result: anyhow::Result<Option<SlotValue>>,
source: &'static str,
) -> Option<SlotValue> {
match result {
Ok(value) => value,
Err(error) => {
debug_assert!(
!matches!(
error.downcast_ref::<HistoricalValueError>(),
Some(HistoricalValueError::PrunedVersion { .. })
),
"unbound reads should never observe PrunedVersion (source={source})"
);
tracing::error!(?error, %source, "unbound read failed; returning None");
None
}
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚩 Unbound read error handling changed from panic to silent None — deliberate but significant behavioral shift

The old code panicked on ANY error from get_user_value_option_by_key_unbound / get_kernel_value_option_by_key_unbound / accessory get_value_option (via .expect()). The new resolve_unbound_read (prover_storage.rs:183-201) logs the error and returns None. This prevents crashes from transient I/O issues but means a genuine database corruption or hardware fault could silently cause a key to appear absent. Callers of get_unbound (the Storage trait method) would see None and proceed, potentially computing an incorrect state update. The debug_assert! catches PrunedVersion in debug builds, but other error types (e.g., RocksDB corruption) would be silently swallowed in release mode. The comment documents this as intentional resilience — verify this matches the team's crash-vs-silent-corruption tradeoff preference.

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Comment on lines +1013 to +1025
if !pruner_present {
debug_assert_eq!(config.pruner, PrunerConfig::Off);
// Match the periodic pruning used by the fresh example configs so migrated prod nodes
// that didn't configure pruning still prune (an absent policy leaves pruning disabled).
config.pruner = PrunerConfig::Periodic {
block_interval: 100,
versions_to_keep: NonZeroU64::new(20).expect("20 is non-zero"),
max_batch_size: None,
};
notes.push(
"storage.pruner absent; defaulted to periodic pruning (every 100 DA blocks, keep 20 versions)"
.to_string(),
);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚩 Migration script silently enables periodic pruning for nodes that previously had no pruning configured

In mockda_to_celestia.rs:1013-1025, when pruner_present is false (no [storage.pruner] section in the operator's config), the migration defaults to PrunerConfig::Periodic { block_interval: 100, versions_to_keep: 20 }. This means an archival node that previously ran without pruning (by simply not setting pruner_block_interval) will silently start pruning historical state after migration. The old code had the same pruner_versions_to_keep = Some(20) default, but pruning only ran when pruner_block_interval was explicitly set. The migration adds a note to the report, but operators of archival nodes should be warned that they need to explicitly set pruner = "off" in their config to preserve full history.

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

/// unwrapped to plain `usize` and `max_batch_size` is resolved to its concrete default. Each
/// variant carries exactly the parameters that mode needs, so there are no unused/dummy fields.
#[derive(Debug, Clone, Copy)]
enum PrunerSchedule {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This type is very similar to PrunerConfig; why do we need a new type instead of using the serde machinery to handle default values?

.unwrap_or(false);
let mut pruning_commit_time = None;

if is_pruner_ready {

@bkolad bkolad Jun 27, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of the is_pruner_ready flag and unwrap, let's use if let Some(..) = pattern

// Only default-enable pruning when the operator left the section out entirely. An explicit
// `[storage.pruner]` (including `pruner = "off"`, e.g. an archival node) is respected so the
// migration never silently starts deleting history the operator chose to retain.
if !pruner_present {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pruner override has no effect:

  1. The migration is an offline tool; it never runs the node's finalize loop, and Periodic pruning only fires from finalize. So nothing is pruned while it runs.
  2. The config is never written back to disk, so the override is discarded on exit — after restart the node re-reads the original TOML.

Lets don't deal with this setting in the migration tool.

}
let pruning_commit_time = self
.pruning
.on_finalize(&mut self.db_group, block_header.height())?;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any pruning error makes finalize return Err, which propagates through the runner and shuts the node down. If that's intended, let's ensure this produces a very clear error message so operators know pruning — not finalization — failed.

/// observe the latest committed version and can therefore *never* surface
/// [`HistoricalValueError::PrunedVersion`] — only genuine I/O faults. We encode that invariant
/// with a `debug_assert!`, and in production we log-and-return `None` rather than panicking, so
/// a transient I/O hiccup cannot crash block processing.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn’t seem correct. None is a valid storage result meaning “the key is absent”, but here we also return None for an I/O/read error. That makes callers act on potentially false absence. In particular, this can produce incorrect latest-state reads or invalid storage-proof responses instead of surfacing the underlying DB error. Let's keep this path fallible, retry and then return an error, or preserve the previous fail-fast behavior.

"unbound reads should never observe PrunedVersion (source={source})"
);
tracing::error!(?error, %source, "unbound read failed; returning None");
None

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the problem

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants