Re-enable pruning#2886
Conversation
| fn resolve_unbound_read( | ||
| result: anyhow::Result<Option<SlotValue>>, | ||
| source: &'static str, | ||
| ) -> Option<SlotValue> { | ||
| match result { | ||
| Ok(value) => value, | ||
| Err(error) => { | ||
| debug_assert!( | ||
| !matches!( | ||
| error.downcast_ref::<HistoricalValueError>(), | ||
| Some(HistoricalValueError::PrunedVersion { .. }) | ||
| ), | ||
| "unbound reads should never observe PrunedVersion (source={source})" | ||
| ); | ||
| tracing::error!(?error, %source, "unbound read failed; returning None"); | ||
| None | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
🚩 Unbound read error handling changed from panic to silent None — deliberate but significant behavioral shift
The old code panicked on ANY error from get_user_value_option_by_key_unbound / get_kernel_value_option_by_key_unbound / accessory get_value_option (via .expect()). The new resolve_unbound_read (prover_storage.rs:183-201) logs the error and returns None. This prevents crashes from transient I/O issues but means a genuine database corruption or hardware fault could silently cause a key to appear absent. Callers of get_unbound (the Storage trait method) would see None and proceed, potentially computing an incorrect state update. The debug_assert! catches PrunedVersion in debug builds, but other error types (e.g., RocksDB corruption) would be silently swallowed in release mode. The comment documents this as intentional resilience — verify this matches the team's crash-vs-silent-corruption tradeoff preference.
Was this helpful? React with 👍 or 👎 to provide feedback.
| if !pruner_present { | ||
| debug_assert_eq!(config.pruner, PrunerConfig::Off); | ||
| // Match the periodic pruning used by the fresh example configs so migrated prod nodes | ||
| // that didn't configure pruning still prune (an absent policy leaves pruning disabled). | ||
| config.pruner = PrunerConfig::Periodic { | ||
| block_interval: 100, | ||
| versions_to_keep: NonZeroU64::new(20).expect("20 is non-zero"), | ||
| max_batch_size: None, | ||
| }; | ||
| notes.push( | ||
| "storage.pruner absent; defaulted to periodic pruning (every 100 DA blocks, keep 20 versions)" | ||
| .to_string(), | ||
| ); |
There was a problem hiding this comment.
🚩 Migration script silently enables periodic pruning for nodes that previously had no pruning configured
In mockda_to_celestia.rs:1013-1025, when pruner_present is false (no [storage.pruner] section in the operator's config), the migration defaults to PrunerConfig::Periodic { block_interval: 100, versions_to_keep: 20 }. This means an archival node that previously ran without pruning (by simply not setting pruner_block_interval) will silently start pruning historical state after migration. The old code had the same pruner_versions_to_keep = Some(20) default, but pruning only ran when pruner_block_interval was explicitly set. The migration adds a note to the report, but operators of archival nodes should be warned that they need to explicitly set pruner = "off" in their config to preserve full history.
Was this helpful? React with 👍 or 👎 to provide feedback.
| /// unwrapped to plain `usize` and `max_batch_size` is resolved to its concrete default. Each | ||
| /// variant carries exactly the parameters that mode needs, so there are no unused/dummy fields. | ||
| #[derive(Debug, Clone, Copy)] | ||
| enum PrunerSchedule { |
There was a problem hiding this comment.
This type is very similar to PrunerConfig; why do we need a new type instead of using the serde machinery to handle default values?
| .unwrap_or(false); | ||
| let mut pruning_commit_time = None; | ||
|
|
||
| if is_pruner_ready { |
There was a problem hiding this comment.
Instead of the is_pruner_ready flag and unwrap, let's use if let Some(..) = pattern
| // Only default-enable pruning when the operator left the section out entirely. An explicit | ||
| // `[storage.pruner]` (including `pruner = "off"`, e.g. an archival node) is respected so the | ||
| // migration never silently starts deleting history the operator chose to retain. | ||
| if !pruner_present { |
There was a problem hiding this comment.
This pruner override has no effect:
- The migration is an offline tool; it never runs the node's finalize loop, and Periodic pruning only fires from finalize. So nothing is pruned while it runs.
- The config is never written back to disk, so the override is discarded on exit — after restart the node re-reads the original TOML.
Lets don't deal with this setting in the migration tool.
| } | ||
| let pruning_commit_time = self | ||
| .pruning | ||
| .on_finalize(&mut self.db_group, block_header.height())?; |
There was a problem hiding this comment.
Any pruning error makes finalize return Err, which propagates through the runner and shuts the node down. If that's intended, let's ensure this produces a very clear error message so operators know pruning — not finalization — failed.
| /// observe the latest committed version and can therefore *never* surface | ||
| /// [`HistoricalValueError::PrunedVersion`] — only genuine I/O faults. We encode that invariant | ||
| /// with a `debug_assert!`, and in production we log-and-return `None` rather than panicking, so | ||
| /// a transient I/O hiccup cannot crash block processing. |
There was a problem hiding this comment.
This doesn’t seem correct. None is a valid storage result meaning “the key is absent”, but here we also return None for an I/O/read error. That makes callers act on potentially false absence. In particular, this can produce incorrect latest-state reads or invalid storage-proof responses instead of surfacing the underlying DB error. Let's keep this path fallible, retry and then return an error, or preserve the previous fail-fast behavior.
| "unbound reads should never observe PrunedVersion (source={source})" | ||
| ); | ||
| tracing::error!(?error, %source, "unbound read failed; returning None"); | ||
| None |
Description
This PR re-enables NOMT historical-state pruning by moving user/kernel pruning to
rockbound::VersionedDB::collect_pruning_batch(PR Sovereign-Labs/rockbound#51), keeping accessory pruning through the existing pruner, and committing both delete batches duringfinalization. It replaces flat pruning config fields withPrunerConfig::{Off, Periodic, OnceAtStartup}, adds a startup-only prune hook to HierarchicalStorageManager, and optionally compacts pruned CFs after startup pruning.Main tradeoffs: defaulting to Off protects fresh nodes from accidental history deletion but makes config migration more important;
OnceAtStartupavoids live read/write contention but can make startup expensive; range tombstones reduce pruning-index write volume but need compaction to reclaim space.CHANGELOG.mdwith a new entry if my PR makes any breaking changes or fixes a bug. If my PR removes a feature or changes its behavior, I provide help for users on how to migrate to the new behavior.Cargo.tomlchanges before opening the PRs. (Are all new dependencies necessary? Is any module dependency leaked into the full-node (hint: it shouldn't)?)Testing
New tests have been added
Docs
No updates