[mono-move] Global storage support#19870
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking. |
d7cec7d to
d668799
Compare
d668799 to
b58e9da
Compare
b58e9da to
f72341d
Compare
f72341d to
d68fad8
Compare
There was a problem hiding this comment.
Stale comment
Aptos Security Bugbot has reviewed your changes, there are still 2 issues that need to be addressed from previous scan.
Fixed since last run: Finding 1 (stale
prev_writeUAF after GC inBorrowGlobalMut/MoveFrom) is fully addressed — the handlers now re-readentry.writefrom the map afterdeep_copyreturns, correctly capturing any GC relocation of the pointer before journaling.Still open:
[MEDIUM]
MoveTo:NonNull::new_uncheckedon unverified pointer — UB in release buildsFile:
third_party/move/mono-move/runtime/src/interpreter.rs, line 1466–1467The
MoveTohandler callsread_ptr(fp, src)and then wraps the raw pointer withNonNull::new_unchecked. The only guard is adebug_assert!(!src_ptr.is_null()), which is compiled out in release builds. The Move verifier'scheck_frame_access_8checks slot bounds but does not guarantee the slot is initialized to a non-null pointer. If a frame slot is default-zeroed or a compiler/optimizer bug leaves it null, callingnew_uncheckedon a null pointer is immediate undefined behavior under the Rust memory model — the optimizer is free to assume it never happens and eliminate surrounding branches accordingly.
[LOW] Epoch counter saturates silently at
u64::MAX; rollback becomes incorrectFile:
third_party/move/mono-move/runtime/src/global_storage.rs(checkpoint logic)
checkpoint()usessaturating_add(1)to advancecurrent_epoch. Once the counter reachesu64::MAX, every subsequent checkpoint stores the same epoch value. New mutations within those checkpoints callis_at_epoch(u64::MAX)→true, sojournal_first_transitionskips recording the pre-mutation state. A subsequentrollback()restores journal entries only up to the savedjournal_len— mutations made after saturation are never journaled and are not undone. The inline comment acknowledges this but does not enforce the invariant; there is no error path for the saturated case.Sent by Cursor Automation: Security Review Bot
d68fad8 to
d01c60c
Compare
There was a problem hiding this comment.
Stale comment
Aptos Security Bugbot has reviewed your changes, there are still 2 issues that need to be addressed from previous scan.
Findings from prior scans still open
Finding 1 (MEDIUM) —
MoveTo:NonNull::new_uncheckedon unverified src pointer
interpreter.rs,MicroOp::MoveTohandler. Thesrcslot value is read withread_ptr, guarded bydebug_assert!(!src_ptr.is_null(), …), then immediately passed toNonNull::new_unchecked. In a release build the assert is stripped, so a null or garbagesrcpointer — reachable if the verifier fails to ensure the slot is initialized — constitutes undefined behavior. The verifier'scheck_frame_access_8(pc, src)only validates the slot is within frame bounds; it does not enforce that the slot holds an initialized, non-null pointer beforeMoveToexecutes.Finding 2 (LOW) — Epoch counter silently saturates at
u64::MAX
global_storage.rs,ResourceReadWriteSet::checkpoint. The epoch counter is bumped withsaturating_add(1). Once it reachesu64::MAX, further checkpoints do not advance the epoch; every subsequent write in every checkpoint shares the same epoch, sojournal_first_transitiontreats them all as "already journaled in current epoch" and skips journaling entirely. Arollbackthen silently restores nothing for those keys. The comment acknowledges this is "unreachable in practice" given bounded session depth, but no session-depth limit is enforced or asserted before the checkpoint call.
Note: The journal-ordering GC-safety issue (previously Finding 1 MEDIUM) has been addressed in this push —
entry.writeis now re-read from the working-map entry afterdeep_copyreturns, so the value passed tojournal_first_transitionreflects any GC-driven pointer relocation that occurred during the copy.Sent by Cursor Automation: Security Review Bot
d01c60c to
c85f16c
Compare
c85f16c to
889ddf7
Compare
889ddf7 to
4c63fb3
Compare
There was a problem hiding this comment.
Stale comment
Aptos Security Bugbot has reviewed your changes, there are still 2 issues that need to be addressed from previous scan.
Finding 2 (MEDIUM — still open):
NonNull::new_unchecked(ptr)in theMoveTohandler (interpreter.rsline 1463) is guarded only by adebug_assert!, which is stripped in release builds. If the source slot holds a null pointer (uninitialized), this is unconditional undefined behavior at runtime.Finding 3 (LOW — still open):
self.current_epoch = self.current_epoch + 1inglobal_storage.rs(line 210) is an unchecked addition. Ifcurrent_epochreachesu64::MAX, subsequent checkpoints do not advance the epoch; mutations skip journaling; and rollback silently fails to restore state.Sent by Cursor Automation: Security Review Bot
4c63fb3 to
692ee98
Compare
692ee98 to
1dbf196
Compare
1dbf196 to
38b6b22
Compare
67263a1 to
3891811
Compare
There was a problem hiding this comment.
Stale comment
Aptos Security Bugbot has reviewed your changes, there are still 2 issues that need to be addressed from previous scan.
Finding 2 (MEDIUM) —
NonNull::new_uncheckedon unverified frame slot pointer inMoveToThe
MoveTohandler reads a raw pointer from a frame slot viaread_ptr(fp, src)and immediately passes it toNonNull::new_unchecked(ptr). The only null guard isdebug_assert!(!ptr.is_null(), ...), which is stripped in release builds. The verifier'scheck_frame_access_8(pc, src)validates the slot is within-bounds but does not verify that the slot was initialised with a live heap pointer. A malformed instruction stream that passes verification but leavessrczero-initialised (or targets a slot holding a non-pointer value) produces UB in release builds.Finding 3 (LOW) — Epoch counter overflow in
checkpoint()
self.current_epoch += 1incheckpoint()is an unchecked addition; the code comment acknowledges the risk with "Note: this can never overflow in practice." Ifcurrent_epochwraps to 0, every subsequentis_at_epochcomparison produces false positives: mutations are silently skipped inrecord_write_to_journal, and rollback restores to epoch 0 entries that were already superseded. The fix ischecked_add(1).expect(…)orwrapping_addwith an explicit saturation guard.Sent by Cursor Automation: Security Review Bot
There was a problem hiding this comment.
Stale comment
Aptos Security Bugbot has reviewed your changes, there are still 2 issues that need to be addressed from previous scan.
Finding 2 (MEDIUM —
NonNull::new_uncheckedon unverified pointer inMoveTohandler)
interpreter.rsline ~1383:read_ptr(fp, src)reads an 8-byte frame slot and the result is passed directly toNonNull::new_unchecked(ptr). The only null guard isdebug_assert!(!ptr.is_null(), "MoveTo: null object pointer"), which is stripped in release builds. The runtime verifier (check_frame_access_8) validates that the frame slot is within bounds but does not verify that the stored pointer value is non-null. If any code path can leave a zero byte-pattern in that slot — through an uninitialized slot read or a future refactor — this becomes undefined behavior in release builds with no runtime trap. UseNonNull::new(ptr).ok_or(...)or a checked helper that returns aRuntimeErroron null.Finding 3 (LOW — unchecked epoch increment in
checkpoint())
global_storage.rscheckpoint():self.current_epoch += 1has no overflow guard. The comment acknowledges the risk ("can never overflow in practice"). Atu64::MAXthe next increment wraps to 0 in debug builds (panic) or silently to 0 in release builds (with overflow checks disabled). After wrapping, epoch 0 matches every write originally tagged at epoch 0;record_write_to_journalskips journaling those writes (!write.is_at_epoch(0)returns false), rollback silently restores the wrong state. Useself.current_epoch = self.current_epoch.checked_add(1).expect("epoch overflow")orsaturating_add.Sent by Cursor Automation: Security Review Bot
c2cb35d to
c6632e7
Compare
918d010 to
b40dab9
Compare
c6632e7 to
1e42abd
Compare
1e42abd to
d274cea
Compare
b40dab9 to
ad862a1
Compare
0786289 to
ad862a1
Compare
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ad862a1 to
0786289
Compare
e63ba97 to
5076653
Compare
| self.read_write_set.move_to( | ||
| self.exec_ctx.resource_provider(), | ||
| StorageKey::Resource(address, ty), | ||
| NonNull::new_unchecked(ptr), |
There was a problem hiding this comment.
[MEDIUM] NonNull::new_unchecked on unvalidated frame pointer in MoveTo handler
The ptr read from the frame slot via read_ptr(fp, src) is passed to NonNull::new_unchecked(ptr) in release builds without a null check. The debug_assert!(!ptr.is_null(), ...) on the line above is stripped in release builds, leaving undefined behaviour if ptr is null (which can happen if the source slot was never written, or was explicitly zeroed).
The Move verifier's check_frame_access_8(pc, src) only validates that src is a valid frame offset within bounds — it does not guarantee the pointed-to slot contains a non-null heap pointer at runtime. A conforming verifier bypass or a lowering bug that emits MoveTo before the source slot is populated would trigger the UB path.
Concrete impact: UB (null-pointer dereference) in release mode on the MoveTo opcode path. No current exploit given pre-production status, but the pattern is unsafe by design.
| epoch: self.current_epoch, | ||
| }); | ||
| // Note: this can never overflow in practice. | ||
| self.current_epoch += 1; |
There was a problem hiding this comment.
[LOW] Unchecked integer overflow on current_epoch increment in checkpoint
self.current_epoch += 1 uses the default (debug: panic, release: wrapping) arithmetic on a u64. The inline comment says "this can never overflow in practice", but no enforcement exists. After 2^64 checkpoints the epoch silently wraps to 0 in release mode, causing is_at_epoch comparisons to incorrectly classify stale LocalHeap writes as current-epoch writes. This would allow a rolled-back mutation to be treated as writable, potentially producing stale-pointer reads through the write set.
Concrete impact: silent epoch aliasing after 2^64 checkpoints; no current exploit. Replace with self.current_epoch = self.current_epoch.checked_add(1).expect("epoch overflow") or a saturating variant with an explicit error path.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 5076653. Configure here.
| self.exec_ctx.resource_provider(), | ||
| StorageKey::Resource(address, ty), | ||
| NonNull::new_unchecked(ptr), | ||
| )?; |
There was a problem hiding this comment.
Unsafe NonNull::new_unchecked guarded only by debug_assert
Medium Severity
In the MoveTo handler, NonNull::new_unchecked(ptr) is called with only a debug_assert null check. In release builds, if the frame slot at src contains a null pointer (e.g., from a zero-initialized frame with zero_frame: true before the slot is populated), this is instant undefined behavior — NonNull::new_unchecked requires a non-null pointer unconditionally. Using NonNull::new(ptr) with an error on None would be safer.
Reviewed by Cursor Bugbot for commit 5076653. Configure here.
There was a problem hiding this comment.
Stale comment
Aptos Security Bugbot has reviewed your changes, there are still 2 issues that need to be addressed from previous scan.
New in this run: The
move_fromrefactor intotry_move_from+commit_move_from(withcommit_borrow_global_mutsymmetry) correctly addresses the stale-pointer use-after-GC issue reported in earlier runs. The error-type consolidation is clean. No new findings.Still open:
[MEDIUM]
MoveTo:NonNull::new_uncheckedon unverified frame-slot pointerFile:
third_party/move/mono-move/runtime/src/interpreter.rs, line 1377The
MoveTohandler reads a raw pointer from the frame slot withread_ptr(fp, src)and passes it directly toNonNull::new_unchecked. The only null guard isdebug_assert!(!ptr.is_null()), which is compiled out in release builds. The verifier'scheck_frame_access_8(pc, src)validates that the slot lies within the frame bounds but does not guarantee the slot was written with a non-null pointer. A default-zeroed or uninitialised slot reachesnew_uncheckedwith a null value, which is immediate undefined behavior under the Rust memory model.
[LOW] Unchecked
u64overflow oncurrent_epochincheckpointFile:
third_party/move/mono-move/runtime/src/global_storage.rs, line 321
self.current_epoch += 1uses plain Rust integer arithmetic. In release builds this wraps silently to zero. Once the counter wraps, every existingLocalHeapentry has a stored epoch greater than the newcurrent_epoch, sois_at_epoch(current_epoch)returnsfalsefor all of them. Subsequent mutations skip the journal viarecord_write_to_journal(which only journals writes from older epochs), so the pre-mutation state is never recorded androllbackcannot undo those writes. The inline comment acknowledges the overflow is theoretically possible but provides no enforcement path.Sent by Cursor Automation: Security Review Bot
| self.read_write_set.move_to( | ||
| self.exec_ctx.resource_provider(), | ||
| StorageKey::Resource(address, ty), | ||
| NonNull::new_unchecked(ptr), |
There was a problem hiding this comment.
[MEDIUM] NonNull::new_unchecked on unvalidated frame pointer in MoveTo handler
read_ptr(fp, src) returns a raw *mut u8; the only null guard is debug_assert! which is absent in release builds. The verifier's check_frame_access_8 checks slot bounds but not initialisation. A null or uninitialised slot value reaches new_unchecked, triggering undefined behavior under the Rust memory model. Replace with NonNull::new(ptr).ok_or(/* error */) or a checked helper that returns a RuntimeError on null.
| epoch: self.current_epoch, | ||
| }); | ||
| // Note: this can never overflow in practice. | ||
| self.current_epoch += 1; |
There was a problem hiding this comment.
[LOW] Unchecked integer overflow on current_epoch increment in checkpoint
self.current_epoch += 1 wraps silently to zero in release builds after u64::MAX checkpoints. When the counter wraps, all existing LocalHeap journal entries have a stored epoch larger than the new current_epoch, causing record_write_to_journal to skip journaling subsequent mutations. Those mutations become irrevocable; rollback cannot undo them. Use self.current_epoch = self.current_epoch.checked_add(1).ok_or(/* invariant violation */)? or treat saturation as a hard invariant error.
| self.read_write_set.move_to( | ||
| self.exec_ctx.resource_provider(), | ||
| StorageKey::Resource(address, ty), | ||
| NonNull::new_unchecked(ptr), |
There was a problem hiding this comment.
[MEDIUM] NonNull::new_unchecked without a SAFETY comment in MoveTo handler
ptr comes from read_ptr(fp, src), which reads a raw pointer out of the frame slot. The verifier checks that src falls within the frame's accessible region (8 bytes), but it does not verify that the slot actually holds a non-null value at runtime. A MoveTo micro-op whose source slot contains a null pointer would invoke undefined behaviour through NonNull::new_unchecked(null_ptr).
The crate's coding convention (AGENTS.md) requires a // SAFETY: comment on every unsafe operation. No such comment is present here, in contrast to the analogous call in global_storage.rs (scan) which documents // SAFETY: gc_copy_object never produces a null pointer. The debug_assert! guard fires only in debug builds and does not substitute for a proper safety argument.
Concrete impact: in a debug build the assert aborts cleanly; in a release build a null ptr passed to NonNull::new_unchecked is immediate UB. Pre-production, no current exploit.
| epoch: self.current_epoch, | ||
| }); | ||
| // Note: this can never overflow in practice. | ||
| self.current_epoch += 1; |
There was a problem hiding this comment.
[LOW] Unchecked integer arithmetic on current_epoch
self.current_epoch += 1 uses Rust's default wrapping-on-overflow (in release mode) or panic-on-overflow (in debug mode). The comment added in this commit states "this can never overflow in practice", but the crate's coding convention (AGENTS.md) requires checked arithmetic for all arithmetic operations unless correctness is proven. A comment is not a proof, and the convention calls for checked_add, saturating_add, or a documented invariant backed by a bounds check rather than relying on infeasibility arguments.
Concrete impact: u64 overflow would require 2^64 checkpoints in a single transaction, which is practically unreachable today; however, the missing check still violates the crate's own safety standard and could become a real issue if checkpoint use expands.
There was a problem hiding this comment.
Aptos Security Bugbot has reviewed your changes, there are still 2 issues that need to be addressed from previous scan.
The "Address comments" commit correctly fixed the MoveFrom deep-copy failure bug (entry was permanently marked Deleted if deep_copy errored). The two issues below remain unaddressed.
Sent by Cursor Automation: Security Review Bot
| self.read_write_set.move_to( | ||
| self.exec_ctx.resource_provider(), | ||
| StorageKey::Resource(address, ty), | ||
| NonNull::new_unchecked(ptr), |
There was a problem hiding this comment.
[MEDIUM] MoveTo: NonNull::new_unchecked on unverified frame-slot pointer
The MoveTo handler reads a raw pointer from a frame slot with read_ptr(fp, src) and passes it directly to NonNull::new_unchecked(ptr). The only null guard is debug_assert!(!ptr.is_null()), which is a no-op in release builds. Calling NonNull::new_unchecked on a null pointer is immediate undefined behavior in Rust, regardless of whether the resulting NonNull is later dereferenced.
The verifier confirms that the src frame offset is in-bounds (8 bytes), but it does not verify that the pointer value stored at that slot is non-null. A compiler or specializer bug emitting a zero-initialized slot for src would silently produce UB in release builds.
Concrete impact: no current exploit path given the experimental, pre-production status of mono-move, but the pattern is unsound. Use NonNull::new(ptr).ok_or_else(|| ...) to surface a recoverable error on null, consistent with how other pointer dereferences in this codebase are guarded.
Severity: MEDIUM
Component: mono-move experimental runtime (not yet connected to production transaction execution)
| epoch: self.current_epoch, | ||
| }); | ||
| // Note: this can never overflow in practice. | ||
| self.current_epoch += 1; |
There was a problem hiding this comment.
[LOW] Unchecked u64 overflow in checkpoint
self.current_epoch += 1 uses plain Rust integer arithmetic. In debug builds this panics on overflow; in release builds with overflow-checks = false (the default for release profiles) it wraps silently to zero. If the epoch counter wraps, is_at_epoch comparisons silently return incorrect results: entries written at old epoch 0 would appear to be at the current epoch, breaking the journal's ability to distinguish which writes to undo on rollback.
The comment "this can never overflow in practice" is correct for realistic transaction counts (a u64 exhausted at one checkpoint per nanosecond would take ~584 years), but the arithmetic itself panics in debug mode and wraps silently in release. Replace with self.current_epoch = self.current_epoch.wrapping_add(1) to make the intent explicit and safe across build profiles.
Severity: LOW
Component: mono-move experimental runtime (not yet connected to production transaction execution)
vgao1996
left a comment
There was a problem hiding this comment.
Looks mostly good, but a critical issue in the micro-op verifier.
| BorrowGlobal { | ||
| addr: FrameOffset, | ||
| ty: InternedType, | ||
| dst: FrameOffset, | ||
| }, | ||
| /// Stores a reference to the destination offset pointing to the resource | ||
| /// of the given type exists at the given address. The reference points to | ||
| /// the object owned by the current transaction's heap. Aborts if resource | ||
| /// does not exist. | ||
| /// | ||
| /// May trigger GC. | ||
| BorrowGlobalMut { | ||
| addr: FrameOffset, | ||
| ty: InternedType, | ||
| dst: FrameOffset, | ||
| }, |
There was a problem hiding this comment.
No need to change anything now, but wondering if we could just merge these two and have a is_mut: bool field.
| MicroOp::Exists { addr, ty: _, dst } | ||
| | MicroOp::BorrowGlobal { addr, ty: _, dst } | ||
| | MicroOp::BorrowGlobalMut { addr, ty: _, dst } | ||
| | MicroOp::MoveFrom { addr, ty: _, dst } => { | ||
| self.check_frame_access(Some(pc), addr, 32); | ||
| self.check_frame_access_8(pc, dst); | ||
| }, |
There was a problem hiding this comment.
This doesn't seem right.
- Exists stores a bool at
dstso this should be 1B? (Thought I'm not sure if we have done supporting <8B alignment in the specializer or not) BorrowGlobalandBorrowGlobalMutstore a reference, which is a 16B fat pointerMoveFromstores an 8B owned heap pointer
| /// TODO: move to execution context | ||
| pub fn checkpoint(&mut self) { | ||
| self.read_write_set.checkpoint(); | ||
| } | ||
|
|
||
| /// TODO: move to execution context | ||
| pub fn rollback(&mut self, n: usize) -> RuntimeResult<()> { | ||
| self.read_write_set.rollback(n) | ||
| } | ||
|
|
||
| /// TODO: move to execution context | ||
| pub fn checkpoint_depth(&self) -> usize { | ||
| self.read_write_set.checkpoint_depth() | ||
| } | ||
|
|
||
| /// TODO: move to execution context | ||
| pub fn current_epoch(&self) -> u64 { | ||
| self.read_write_set.current_epoch() | ||
| } | ||
|
|
||
| /// TODO: move to execution context | ||
| pub fn journal_len(&self) -> usize { | ||
| self.read_write_set.journal_len() | ||
| } |
There was a problem hiding this comment.
Yes, as you've pointed out, these really aren't interpreter APIs. Rather, we want to ensure that these aren't called when an interpreter is alive.
| fn deep_copy(&mut self, root: NonNull<u8>) -> RuntimeResult<NonNull<u8>> { | ||
| let root_guard = self.pinned_roots.pin(root); | ||
| match self.heap.try_deep_copy(self.exec_ctx, root_guard.get()) { | ||
| Ok(ptr) => Ok(ptr), | ||
| Err(AllocationError::RuntimeError(err)) => Err(err), | ||
| Err(AllocationError::OutOfHeapMemory { .. }) => { | ||
| gc_collect!(self)?; | ||
| // Re-read the root pointer from the pin, as its address have | ||
| // been changed by the GC. | ||
| self.heap | ||
| .try_deep_copy(self.exec_ctx, root_guard.get()) | ||
| .map_err(AllocationError::into_runtime_error) | ||
| }, | ||
| } | ||
| } |
There was a problem hiding this comment.
Ah this is a good example showing why it's helpful to let the caller decide whether to trigger GC.
There was a problem hiding this comment.
I traced through the deepy_copy algorithm both myself and with claude and can verify that the current implementation is likely sound -- GC cannot run while we are in the middle of copying a value.
| /// Returns the data-region pointer of the freshly-allocated root copy, | ||
| /// or [`AllocationError::OutOfHeapMemory`] on the first allocation that | ||
| /// does not fit. There is no GC during copying and the caller has to | ||
| /// handle memory. |
There was a problem hiding this comment.
Nit: "caller has to handle memory" => "caller may have to run GC and retry if the initial clone fails due to insufficient memory"
Also should this function be unsafe? I mean, I could feed it with any non-zero pointer I guess?
| })?; | ||
|
|
||
| // Walk pointer fields. The pointer values are pointing to old data, | ||
| // so they need to be deep-copied and the source has to be patched. |
There was a problem hiding this comment.
Comment slightly misleading -- it's not the source that needs to be patched, but the corresponding slot in the cloned parent object.
| { | ||
| EntryPtr::Writable(ptr) => ptr, | ||
| EntryPtr::NonWritable(ptr) => { | ||
| let ptr = self.deep_copy(ptr)?; |
There was a problem hiding this comment.
How feasible is it to get this logic encapsulated in global_storage.rs as well?
A key invariant we have to maintain is that an existing global value needs to be CoW'ed the next time it's borrowed mutably in a new epoch. It's not immediately clear this is indeed handled by just looking at that file.
| /// transaction write). | ||
| // TODO: | ||
| // Replace with Block-STM transaction index and incarnation pair. | ||
| pub type Version = u64; |
There was a problem hiding this comment.
Could we qualify this name better? Version is too generic.
| // Replace with Block-STM transaction index and incarnation pair. | ||
| pub type Version = u64; | ||
|
|
||
| /// Key to (in-memory) global storage. |
There was a problem hiding this comment.
Should we add some notes/invariants about the lifetime of this value, given at it includes InternedType?
| fn get_resource(&self, key: StorageKey) -> Result<StorageRead, ResourceProviderError>; | ||
| } | ||
|
|
||
| /// Empty storage with no resources. |
There was a problem hiding this comment.
Should the trait implementations move to their own files/modules?
| // Copyright (c) Aptos Foundation | ||
| // Licensed pursuant to the Innovation-Enabling Source Code License, available at https://github.com/aptos-labs/aptos-core/blob/main/LICENSE | ||
|
|
||
| //! Per-transaction global-storage state: read cache with pending writes and |
There was a problem hiding this comment.
Maybe nice to give a bit more information here (kind of a summary map of the design, what different terms mean, and why the different features/complexities are needed). Some of this may already be in the corresponding design doc, may be we can migrate some of those here?
|
|
||
| /// Global state can be saved during execution by recording checkpoints. Epoch | ||
| /// is a counter incremented by every checkpoint. | ||
| pub type Epoch = u64; |
There was a problem hiding this comment.
Based on the comment description, Epoch is a bit opaque, we can just call it CheckpointCounter or CheckpointVersion?
| /// Useful for testing GC correctness. | ||
| ForceGC, | ||
|
|
||
| /// Stores a boolean value at destination offset if resource of the given |
There was a problem hiding this comment.
Right now, all these instructions seem to be under the debugging section, which they are not.
| /// Stores a boolean value at destination offset if resource of the given | ||
| /// type exists at the given address. | ||
| Exists { | ||
| addr: FrameOffset, |
There was a problem hiding this comment.
The general convention for these instructions is to have destination first. Comment applies to all these new instructions.
| ForceGC, | ||
|
|
||
| /// Stores a boolean value at destination offset if resource of the given | ||
| /// type exists at the given address. |
There was a problem hiding this comment.
Also, let's just say destination or destination slot instead of destination offset. Here and for other new instructions.
| dst: FrameOffset, | ||
| }, | ||
| /// Stores a reference to the destination offset pointing to the resource | ||
| /// of the given type exists at the given address. Aborts if resource does |
There was a problem hiding this comment.
This sentence is very confusing: it reads as if we are storing &destination_offset. Rewrite. Same with the following instruction.
| ty: InternedType, | ||
| dst: FrameOffset, | ||
| }, | ||
| /// Stores a reference to the destination offset pointing to the resource |
There was a problem hiding this comment.
There is very little difference in the documentation about BorrowGlobal vs. BorrowGlobalMut. If there is any difference, it should reflect in the name (is it the ownership in the current transaction's heap)? Other borrows don't currently differentiate between mut and immut refs, so also worth describing why we do it here.





Description
Introduces per-transaction global storage to mono-move: a read-cache with pending writes (the working map), an undo journal, and a checkpoint stack — backed by copy-on-write semantics on the local heap. The interpreter gains five resource micro-ops and the GC is extended to trace working-map / journal pointers and an auxiliary pinned-roots set.
Conceptual change
Per-transaction state. Each transaction now owns a
ResourceReadWriteSet— aHashMap<StorageKey, Entry>of working-map cache, a linear undo journal, and a stack of checkpoints. EachEntryrecords the immutableStorageReadtaken at first touch plus aStorageWriteenum (NotModified,LocalHeap { ptr, epoch },Deleted { epoch }). Epochs are incremented on everycheckpoint()so older mutations are observable as not directly writable.Copy-on-write on the local heap. External-storage reads return zero-copy pointers; the first mutation triggers a deep copy into the local heap (
EntryPtr::WritablevsEntryPtr::NonWritable). Subsequent mutations in the same epoch hit the writable local copy directly. The deep copy is a single-pass recursion (Heap::try_deep_copy) wrapped inInterpreterContext::deep_copy, which adds a one-shot OOM retry — same shape asalloc_or_gc.srcis pinned inPinnedRootsacross the retry so the relocated address is observed via the pin.Rollback.
checkpoint()snapshotsjournal_len + epoch.rollback(n)walks the journal back to the saved length, replaying old writes into their entries; the failed-copy intermediates become unreachable and are reclaimed by the next GC.What's supported
Exists { addr, ty, dst }— read-only existence check.BorrowGlobal { addr, ty, dst }— immutable borrow (returns external or local pointer, no copy).BorrowGlobalMut { addr, ty, dst }— mutable borrow; deep-copies to the local heap if the entry isNonWritable.MoveFrom { addr, ty, dst }— extracts the resource, flipsentry.write = Deleted, journals the old write; deep-copies first if needed.MoveTo { addr, ty, src }— publishes a local-heap pointer; aborts onResourceAlreadyExists.ForceGC— unchanged; now also scans working-map / journal.checkpoint()/rollback(n)— exposed onInterpreterContextfor transactional sub-scopes.What's not supported (yet)
StorageWrite::LocalHeapcarries raw heap pointers; we have no path that re-serializes them to BCS for external storage yet.Updated GC semantics
gc_collectnow traces three root sources before the Cheney BFS:frame_layout.heap_ptr_offsets∪safe_point_layouts[pc_top].heap_ptr_offsets; caller frames useframe_layoutalone (perSafePointEntry's top-frame-only contract).PinnedRoots— new auxiliary root set. Slots managed by RAIIPinGuard. Used by the deep-copy wrapper for the one-shot OOM retry, and available for any future multi-allocation op that needs to keep a heap pointer alive across a subsequent allocation.UnsafeCell-backed sopintakes&selfand multiple guards coexist.ResourceReadWriteSet(new) —rws.scanwalksentries.values().writeandjournal.iter().write, relocating everyLocalHeap { ptr, .. }. External-read pointers are not in the heap soRootScanner::relocatereturnsNoneand skips them.RootScanner::relocate(ptr)returnsOption<*mut u8>—Nonefor null or out-of-heap pointers,Some(new)for in-heap pointers (after copying to to-space). Callers write back only onSome, which keeps non-heap pointers (external reads, frame metadata) untouched.from-spaceis freed at the swap at the end ofgc_collect. Any Rust local holding a from-space pointer across the call is dangling — hence the pin-at-the-wrapper pattern inInterpreterContext::deep_copy.Implementation notes
try_deep_copyallocates viaheap_alloconly — noalloc_or_gccalls inside the recursion, so GC never fires mid-walk.src,new, and every parent slot stay valid throughout. A failed attempt's intermediate allocations are unreachable from any GC root (the interpreter only patches the result into rws after the wrapper returns), so the wrapper's GC reclaims them before the retry.heap_allocwrites the object header (descriptor id + aligned size) and zero-initializes the data region.try_deep_copymemcpy's only the payload (src_size − OBJECT_HEADER_SIZE); the header is already correct.AllocationError { OutOfHeapMemory { requested }, RuntimeError(_) }andAllocationResult<T>for the no-GC fail path.LocalRuntimeContextbundlesLocalExecutionContext+ObjectDescriptorTableand implements bothExecutionContextandDescriptorProvider, so resource-provider lookups thread through the interpreter cleanly.How Has This Been Tested?
New integration test files under
runtime/tests/:global_storage_read.rs—Exists(absent / present / post-MoveFrom), zero-copyBorrowGlobal, abort on missing,ForceGCdoesn't disturb external reads.global_storage_mut.rs—BorrowGlobalMutexternal + same-epoch fast path,MoveTo(publish + already-exists abort),MoveFromexternal deep copy, GC relocation ofLocalHeapwrites, multi-key independence.global_storage_nested.rs— deep-copy of nested structs / vectors / enums;deep_copy_survives_gc_mid_walkexercises OOM-then-retry forBorrowGlobalMut;move_from_deep_copies_external_resource_survives_gc_mid_walkexercises the same forMoveFrom(the rws entry is flipped toDeletedbefore the copy, so the pin inPinnedRootsis what keepssrcalive across GC).global_storage_checkpoint.rs— checkpoint stack advance, rollback no-ops / underflow / restore-pre-mutation,MoveFrom→MoveToround-trip restore-to-deleted, same-epoch no-journal-growth.All 296 runtime tests pass.
cargo +nightly fmtandcargo clippy -p mono-move-runtimeare clean for this crate.Type of Change
Which Components or Systems Does This Change Impact?
Checklist
Note
High Risk
Touches VM interpreter, GC roots, and global state semantics; correctness and rollback invariants are subtle though heavily tested in new integration tests.
Overview
Adds per-transaction global resource storage to mono-move: new
MicroOps (Exists,BorrowGlobal,BorrowGlobalMut,MoveFrom,MoveTo), a pluggableResourceProviderinmono-move-core, and runtimeResourceReadWriteSet(read cache, pending writes, epoch-based journal,checkpoint/rollback).Execution path:
ExecutionContext/TransactionContext/ test contexts thread a resource provider; the interpreter dispatches the new ops through the read-write set. Immutable borrows can stay zero-copy on external heap pointers; first mutation deep-copies into the local heap (with OOM → GC → retry viaPinnedRoots).MoveTo/ missing resource errors are surfaced as newRuntimeErrorvariants.GC / heap:
alloc_or_gcandgc_collectalso scan working-map/journalLocalHeappointers;RootScannerreplaces the old unsafe pinned-root callback. Descriptor-awaretry_deep_copysupports nested struct/vector/enum graphs.Loader:
ModuleProvidermoves to core (loader drops duplicate deps). Testsuite and newruntime/tests/global_storage_*.rscover reads, mutations, nested CoW, checkpoints, and GC interaction. Publication to real storage, Block-STM versions, and specializer lowering are explicitly deferred.Reviewed by Cursor Bugbot for commit 5076653. Bugbot is set up for automated code reviews on this repo. Configure here.