[mono-move] Global storage support by georgemitenkov · Pull Request #19870 · aptos-labs/aptos-core

georgemitenkov · 2026-05-22T13:34:48Z

Description

Introduces per-transaction global storage to mono-move: a read-cache with pending writes (the working map), an undo journal, and a checkpoint stack — backed by copy-on-write semantics on the local heap. The interpreter gains five resource micro-ops and the GC is extended to trace working-map / journal pointers and an auxiliary pinned-roots set.

Conceptual change

Per-transaction state. Each transaction now owns a ResourceReadWriteSet — a HashMap<StorageKey, Entry> of working-map cache, a linear undo journal, and a stack of checkpoints. Each Entry records the immutable StorageRead taken at first touch plus a StorageWrite enum (NotModified, LocalHeap { ptr, epoch }, Deleted { epoch }). Epochs are incremented on every checkpoint() so older mutations are observable as not directly writable.

Copy-on-write on the local heap. External-storage reads return zero-copy pointers; the first mutation triggers a deep copy into the local heap (EntryPtr::Writable vs EntryPtr::NonWritable). Subsequent mutations in the same epoch hit the writable local copy directly. The deep copy is a single-pass recursion (Heap::try_deep_copy) wrapped in InterpreterContext::deep_copy, which adds a one-shot OOM retry — same shape as alloc_or_gc. src is pinned in PinnedRoots across the retry so the relocated address is observed via the pin.

Rollback. checkpoint() snapshots journal_len + epoch. rollback(n) walks the journal back to the saved length, replaying old writes into their entries; the failed-copy intermediates become unreachable and are reclaimed by the next GC.

What's supported

Exists { addr, ty, dst } — read-only existence check.
BorrowGlobal { addr, ty, dst } — immutable borrow (returns external or local pointer, no copy).
BorrowGlobalMut { addr, ty, dst } — mutable borrow; deep-copies to the local heap if the entry is NonWritable.
MoveFrom { addr, ty, dst } — extracts the resource, flips entry.write = Deleted, journals the old write; deep-copies first if needed.
MoveTo { addr, ty, src } — publishes a local-heap pointer; aborts on ResourceAlreadyExists.
ForceGC — unchanged; now also scans working-map / journal.
checkpoint() / rollback(n) — exposed on InterpreterContext for transactional sub-scopes.

What's not supported (yet)

Publication phase (design doc §"Publication of writes"). End-of-transaction commit back to external storage, storage-metadata updates (slot deposits, refunds), and Aptos VM integration are out of scope for this PR. The journal and working map exist; they're not yet drained on commit.
Per-allocation lazy CoW and dirty-bit tracking (design doc §"Extensions"). The current copy is always whole-resource on first mutation.
Resource serialization on commit — StorageWrite::LocalHeap carries raw heap pointers; we have no path that re-serializes them to BCS for external storage yet.

Updated GC semantics

gc_collect now traces three root sources before the Cheney BFS:

Call stack — top frame uses frame_layout.heap_ptr_offsets ∪ safe_point_layouts[pc_top].heap_ptr_offsets; caller frames use frame_layout alone (per SafePointEntry's top-frame-only contract).
PinnedRoots — new auxiliary root set. Slots managed by RAII PinGuard. Used by the deep-copy wrapper for the one-shot OOM retry, and available for any future multi-allocation op that needs to keep a heap pointer alive across a subsequent allocation. UnsafeCell-backed so pin takes &self and multiple guards coexist.
ResourceReadWriteSet (new) — rws.scan walks entries.values().write and journal.iter().write, relocating every LocalHeap { ptr, .. }. External-read pointers are not in the heap so RootScanner::relocate returns None and skips them.

RootScanner::relocate(ptr) returns Option<*mut u8> — None for null or out-of-heap pointers, Some(new) for in-heap pointers (after copying to to-space). Callers write back only on Some, which keeps non-heap pointers (external reads, frame metadata) untouched.

from-space is freed at the swap at the end of gc_collect. Any Rust local holding a from-space pointer across the call is dangling — hence the pin-at-the-wrapper pattern in InterpreterContext::deep_copy.

Implementation notes

try_deep_copy allocates via heap_alloc only — no alloc_or_gc calls inside the recursion, so GC never fires mid-walk. src, new, and every parent slot stay valid throughout. A failed attempt's intermediate allocations are unreachable from any GC root (the interpreter only patches the result into rws after the wrapper returns), so the wrapper's GC reclaims them before the retry.
heap_alloc writes the object header (descriptor id + aligned size) and zero-initializes the data region. try_deep_copy memcpy's only the payload (src_size − OBJECT_HEADER_SIZE); the header is already correct.
New helper crate-internal types: AllocationError { OutOfHeapMemory { requested }, RuntimeError(_) } and AllocationResult<T> for the no-GC fail path.
LocalRuntimeContext bundles LocalExecutionContext + ObjectDescriptorTable and implements both ExecutionContext and DescriptorProvider, so resource-provider lookups thread through the interpreter cleanly.

How Has This Been Tested?

New integration test files under runtime/tests/:

global_storage_read.rs — Exists (absent / present / post-MoveFrom), zero-copy BorrowGlobal, abort on missing, ForceGC doesn't disturb external reads.
global_storage_mut.rs — BorrowGlobalMut external + same-epoch fast path, MoveTo (publish + already-exists abort), MoveFrom external deep copy, GC relocation of LocalHeap writes, multi-key independence.
global_storage_nested.rs — deep-copy of nested structs / vectors / enums; deep_copy_survives_gc_mid_walk exercises OOM-then-retry for BorrowGlobalMut; move_from_deep_copies_external_resource_survives_gc_mid_walk exercises the same for MoveFrom (the rws entry is flipped to Deleted before the copy, so the pin in PinnedRoots is what keeps src alive across GC).
global_storage_checkpoint.rs — checkpoint stack advance, rollback no-ops / underflow / restore-pre-mutation, MoveFrom→MoveTo round-trip restore-to-deleted, same-epoch no-journal-growth.

All 296 runtime tests pass. cargo +nightly fmt and cargo clippy -p mono-move-runtime are clean for this crate.

Type of Change

New feature

Which Components or Systems Does This Change Impact?

Move/Aptos Virtual Machine

Checklist

I have read and followed the CONTRIBUTING doc
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I tested both happy and unhappy path of the functionality

Note

High Risk
Touches VM interpreter, GC roots, and global state semantics; correctness and rollback invariants are subtle though heavily tested in new integration tests.

Overview
Adds per-transaction global resource storage to mono-move: new MicroOps (Exists, BorrowGlobal, BorrowGlobalMut, MoveFrom, MoveTo), a pluggable ResourceProvider in mono-move-core, and runtime ResourceReadWriteSet (read cache, pending writes, epoch-based journal, checkpoint / rollback).

Execution path: ExecutionContext / TransactionContext / test contexts thread a resource provider; the interpreter dispatches the new ops through the read-write set. Immutable borrows can stay zero-copy on external heap pointers; first mutation deep-copies into the local heap (with OOM → GC → retry via PinnedRoots). MoveTo / missing resource errors are surfaced as new RuntimeError variants.

GC / heap: alloc_or_gc and gc_collect also scan working-map/journal LocalHeap pointers; RootScanner replaces the old unsafe pinned-root callback. Descriptor-aware try_deep_copy supports nested struct/vector/enum graphs.

Loader: ModuleProvider moves to core (loader drops duplicate deps). Testsuite and new runtime/tests/global_storage_*.rs cover reads, mutations, nested CoW, checkpoints, and GC interaction. Publication to real storage, Block-STM versions, and specializer lowering are explicitly deferred.

^{Reviewed by Cursor Bugbot for commit 5076653. Bugbot is set up for automated code reviews on this repo. Configure here.}

georgemitenkov · 2026-05-22T13:35:03Z

[mono-move] Type pointers as descriptors #19929
[mono-move] Global storage support #19870 👈 (View in Graphite)
[mono-move] Global storage design #19536
main

This stack of pull requests is managed by Graphite. Learn more about stacking.

cursor

Stale comment

Aptos Security Bugbot has reviewed your changes, there are still 3 issues that need to be addressed from previous scan.

_{Sent by Cursor Automation: Security Review Bot}

cursor

Stale comment

Aptos Security Bugbot has reviewed your changes, there are still 3 issues that need to be addressed from previous scan.

_{Sent by Cursor Automation: Security Review Bot}

cursor

Stale comment

Aptos Security Bugbot has reviewed your changes, there are still 2 issues that need to be addressed from previous scan.

Fixed since last run: Finding 1 (stale prev_write UAF after GC in BorrowGlobalMut/MoveFrom) is fully addressed — the handlers now re-read entry.write from the map after deep_copy returns, correctly capturing any GC relocation of the pointer before journaling.

Still open:

[MEDIUM] MoveTo: NonNull::new_unchecked on unverified pointer — UB in release builds

File: third_party/move/mono-move/runtime/src/interpreter.rs, line 1466–1467

The MoveTo handler calls read_ptr(fp, src) and then wraps the raw pointer with NonNull::new_unchecked. The only guard is a debug_assert!(!src_ptr.is_null()), which is compiled out in release builds. The Move verifier's check_frame_access_8 checks slot bounds but does not guarantee the slot is initialized to a non-null pointer. If a frame slot is default-zeroed or a compiler/optimizer bug leaves it null, calling new_unchecked on a null pointer is immediate undefined behavior under the Rust memory model — the optimizer is free to assume it never happens and eliminate surrounding branches accordingly.

[LOW] Epoch counter saturates silently at u64::MAX; rollback becomes incorrect

File: third_party/move/mono-move/runtime/src/global_storage.rs (checkpoint logic)

checkpoint() uses saturating_add(1) to advance current_epoch. Once the counter reaches u64::MAX, every subsequent checkpoint stores the same epoch value. New mutations within those checkpoints call is_at_epoch(u64::MAX) → true, so journal_first_transition skips recording the pre-mutation state. A subsequent rollback() restores journal entries only up to the saved journal_len — mutations made after saturation are never journaled and are not undone. The inline comment acknowledges this but does not enforce the invariant; there is no error path for the saturated case.

_{Sent by Cursor Automation: Security Review Bot}

cursor

Stale comment

Aptos Security Bugbot has reviewed your changes, there are still 2 issues that need to be addressed from previous scan.

Findings from prior scans still open

Finding 1 (MEDIUM) — MoveTo: NonNull::new_unchecked on unverified src pointer
interpreter.rs, MicroOp::MoveTo handler. The src slot value is read with read_ptr, guarded by debug_assert!(!src_ptr.is_null(), …), then immediately passed to NonNull::new_unchecked. In a release build the assert is stripped, so a null or garbage src pointer — reachable if the verifier fails to ensure the slot is initialized — constitutes undefined behavior. The verifier's check_frame_access_8(pc, src) only validates the slot is within frame bounds; it does not enforce that the slot holds an initialized, non-null pointer before MoveTo executes.

Finding 2 (LOW) — Epoch counter silently saturates at u64::MAX
global_storage.rs, ResourceReadWriteSet::checkpoint. The epoch counter is bumped with saturating_add(1). Once it reaches u64::MAX, further checkpoints do not advance the epoch; every subsequent write in every checkpoint shares the same epoch, so journal_first_transition treats them all as "already journaled in current epoch" and skips journaling entirely. A rollback then silently restores nothing for those keys. The comment acknowledges this is "unreachable in practice" given bounded session depth, but no session-depth limit is enforced or asserted before the checkpoint call.

Note: The journal-ordering GC-safety issue (previously Finding 1 MEDIUM) has been addressed in this push — entry.write is now re-read from the working-map entry after deep_copy returns, so the value passed to journal_first_transition reflects any GC-driven pointer relocation that occurred during the copy.

_{Sent by Cursor Automation: Security Review Bot}

cursor

Stale comment

Aptos Security Bugbot has reviewed your changes, there are still 2 issues that need to be addressed from previous scan.

_{Sent by Cursor Automation: Security Review Bot}

cursor

Stale comment

Aptos Security Bugbot has reviewed your changes, there are still 2 issues that need to be addressed from previous scan.

_{Sent by Cursor Automation: Security Review Bot}

cursor

Stale comment

Aptos Security Bugbot has reviewed your changes, there are still 2 issues that need to be addressed from previous scan.

Finding 2 (MEDIUM — still open): NonNull::new_unchecked(ptr) in the MoveTo handler (interpreter.rs line 1463) is guarded only by a debug_assert!, which is stripped in release builds. If the source slot holds a null pointer (uninitialized), this is unconditional undefined behavior at runtime.

Finding 3 (LOW — still open): self.current_epoch = self.current_epoch + 1 in global_storage.rs (line 210) is an unchecked addition. If current_epoch reaches u64::MAX, subsequent checkpoints do not advance the epoch; mutations skip journaling; and rollback silently fails to restore state.

_{Sent by Cursor Automation: Security Review Bot}

cursor

Stale comment

Aptos Security Bugbot has reviewed your changes, there are still 2 issues that need to be addressed from previous scan.

_{Sent by Cursor Automation: Security Review Bot}

cursor

Stale comment

Aptos Security Bugbot has reviewed your changes, there are still 2 issues that need to be addressed from previous scan.

_{Sent by Cursor Automation: Security Review Bot}

cursor

Stale comment

Aptos Security Bugbot has reviewed your changes, there are still 2 issues that need to be addressed from previous scan.

Finding 2 (MEDIUM) — NonNull::new_unchecked on unverified frame slot pointer in MoveTo

The MoveTo handler reads a raw pointer from a frame slot via read_ptr(fp, src) and immediately passes it to NonNull::new_unchecked(ptr). The only null guard is debug_assert!(!ptr.is_null(), ...), which is stripped in release builds. The verifier's check_frame_access_8(pc, src) validates the slot is within-bounds but does not verify that the slot was initialised with a live heap pointer. A malformed instruction stream that passes verification but leaves src zero-initialised (or targets a slot holding a non-pointer value) produces UB in release builds.

Finding 3 (LOW) — Epoch counter overflow in checkpoint()

self.current_epoch += 1 in checkpoint() is an unchecked addition; the code comment acknowledges the risk with "Note: this can never overflow in practice." If current_epoch wraps to 0, every subsequent is_at_epoch comparison produces false positives: mutations are silently skipped in record_write_to_journal, and rollback restores to epoch 0 entries that were already superseded. The fix is checked_add(1).expect(…) or wrapping_add with an explicit saturation guard.

_{Sent by Cursor Automation: Security Review Bot}

cursor

Stale comment

Aptos Security Bugbot has reviewed your changes, there are still 2 issues that need to be addressed from previous scan.

Finding 2 (MEDIUM — NonNull::new_unchecked on unverified pointer in MoveTo handler)

interpreter.rs line ~1383: read_ptr(fp, src) reads an 8-byte frame slot and the result is passed directly to NonNull::new_unchecked(ptr). The only null guard is debug_assert!(!ptr.is_null(), "MoveTo: null object pointer"), which is stripped in release builds. The runtime verifier (check_frame_access_8) validates that the frame slot is within bounds but does not verify that the stored pointer value is non-null. If any code path can leave a zero byte-pattern in that slot — through an uninitialized slot read or a future refactor — this becomes undefined behavior in release builds with no runtime trap. Use NonNull::new(ptr).ok_or(...) or a checked helper that returns a RuntimeError on null.

Finding 3 (LOW — unchecked epoch increment in checkpoint())

global_storage.rs checkpoint(): self.current_epoch += 1 has no overflow guard. The comment acknowledges the risk ("can never overflow in practice"). At u64::MAX the next increment wraps to 0 in debug builds (panic) or silently to 0 in release builds (with overflow checks disabled). After wrapping, epoch 0 matches every write originally tagged at epoch 0; record_write_to_journal skips journaling those writes (!write.is_at_epoch(0) returns false), rollback silently restores the wrong state. Use self.current_epoch = self.current_epoch.checked_add(1).expect("epoch overflow") or saturating_add.

_{Sent by Cursor Automation: Security Review Bot}

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cursor

Stale comment

Aptos Security Bugbot has reviewed your changes, there are still 2 issues that need to be addressed from previous scan.

_{Sent by Cursor Automation: Security Review Bot}

cursor · 2026-05-29T17:00:08Z

+                    self.read_write_set.move_to(
+                        self.exec_ctx.resource_provider(),
+                        StorageKey::Resource(address, ty),
+                        NonNull::new_unchecked(ptr),


[MEDIUM] NonNull::new_unchecked on unvalidated frame pointer in MoveTo handler

The ptr read from the frame slot via read_ptr(fp, src) is passed to NonNull::new_unchecked(ptr) in release builds without a null check. The debug_assert!(!ptr.is_null(), ...) on the line above is stripped in release builds, leaving undefined behaviour if ptr is null (which can happen if the source slot was never written, or was explicitly zeroed).

The Move verifier's check_frame_access_8(pc, src) only validates that src is a valid frame offset within bounds — it does not guarantee the pointed-to slot contains a non-null heap pointer at runtime. A conforming verifier bypass or a lowering bug that emits MoveTo before the source slot is populated would trigger the UB path.

Concrete impact: UB (null-pointer dereference) in release mode on the MoveTo opcode path. No current exploit given pre-production status, but the pattern is unsafe by design.

cursor · 2026-05-29T17:00:08Z

+            epoch: self.current_epoch,
+        });
+        // Note: this can never overflow in practice.
+        self.current_epoch += 1;


[LOW] Unchecked integer overflow on current_epoch increment in checkpoint

self.current_epoch += 1 uses the default (debug: panic, release: wrapping) arithmetic on a u64. The inline comment says "this can never overflow in practice", but no enforcement exists. After 2^64 checkpoints the epoch silently wraps to 0 in release mode, causing is_at_epoch comparisons to incorrectly classify stale LocalHeap writes as current-epoch writes. This would allow a rolled-back mutation to be treated as writable, potentially producing stale-pointer reads through the write set.

Concrete impact: silent epoch aliasing after 2^64 checkpoints; no current exploit. Replace with self.current_epoch = self.current_epoch.checked_add(1).expect("epoch overflow") or a saturating variant with an explicit error path.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit 5076653. Configure here.}

cursor · 2026-05-29T17:00:41Z

+                        self.exec_ctx.resource_provider(),
+                        StorageKey::Resource(address, ty),
+                        NonNull::new_unchecked(ptr),
+                    )?;


Unsafe NonNull::new_unchecked guarded only by debug_assert

Medium Severity

In the MoveTo handler, NonNull::new_unchecked(ptr) is called with only a debug_assert null check. In release builds, if the frame slot at src contains a null pointer (e.g., from a zero-initialized frame with zero_frame: true before the slot is populated), this is instant undefined behavior — NonNull::new_unchecked requires a non-null pointer unconditionally. Using NonNull::new(ptr) with an error on None would be safer.

^{Reviewed by Cursor Bugbot for commit 5076653. Configure here.}

cursor

Stale comment

Aptos Security Bugbot has reviewed your changes, there are still 2 issues that need to be addressed from previous scan.

New in this run: The move_from refactor into try_move_from + commit_move_from (with commit_borrow_global_mut symmetry) correctly addresses the stale-pointer use-after-GC issue reported in earlier runs. The error-type consolidation is clean. No new findings.

Still open:

[MEDIUM] MoveTo: NonNull::new_unchecked on unverified frame-slot pointer

File: third_party/move/mono-move/runtime/src/interpreter.rs, line 1377

The MoveTo handler reads a raw pointer from the frame slot with read_ptr(fp, src) and passes it directly to NonNull::new_unchecked. The only null guard is debug_assert!(!ptr.is_null()), which is compiled out in release builds. The verifier's check_frame_access_8(pc, src) validates that the slot lies within the frame bounds but does not guarantee the slot was written with a non-null pointer. A default-zeroed or uninitialised slot reaches new_unchecked with a null value, which is immediate undefined behavior under the Rust memory model.

[LOW] Unchecked u64 overflow on current_epoch in checkpoint

File: third_party/move/mono-move/runtime/src/global_storage.rs, line 321

self.current_epoch += 1 uses plain Rust integer arithmetic. In release builds this wraps silently to zero. Once the counter wraps, every existing LocalHeap entry has a stored epoch greater than the new current_epoch, so is_at_epoch(current_epoch) returns false for all of them. Subsequent mutations skip the journal via record_write_to_journal (which only journals writes from older epochs), so the pre-mutation state is never recorded and rollback cannot undo those writes. The inline comment acknowledges the overflow is theoretically possible but provides no enforcement path.

_{Sent by Cursor Automation: Security Review Bot}

cursor · 2026-05-29T17:02:33Z

+                    self.read_write_set.move_to(
+                        self.exec_ctx.resource_provider(),
+                        StorageKey::Resource(address, ty),
+                        NonNull::new_unchecked(ptr),


[MEDIUM] NonNull::new_unchecked on unvalidated frame pointer in MoveTo handler

read_ptr(fp, src) returns a raw *mut u8; the only null guard is debug_assert! which is absent in release builds. The verifier's check_frame_access_8 checks slot bounds but not initialisation. A null or uninitialised slot value reaches new_unchecked, triggering undefined behavior under the Rust memory model. Replace with NonNull::new(ptr).ok_or(/* error */) or a checked helper that returns a RuntimeError on null.

cursor · 2026-05-29T17:02:33Z

+            epoch: self.current_epoch,
+        });
+        // Note: this can never overflow in practice.
+        self.current_epoch += 1;


[LOW] Unchecked integer overflow on current_epoch increment in checkpoint

self.current_epoch += 1 wraps silently to zero in release builds after u64::MAX checkpoints. When the counter wraps, all existing LocalHeap journal entries have a stored epoch larger than the new current_epoch, causing record_write_to_journal to skip journaling subsequent mutations. Those mutations become irrevocable; rollback cannot undo them. Use self.current_epoch = self.current_epoch.checked_add(1).ok_or(/* invariant violation */)? or treat saturation as a hard invariant error.

cursor

Stale comment

Aptos Security Bugbot has reviewed your changes, there are still 2 issues that need to be addressed from previous scan.

_{Sent by Cursor Automation: Security Review Bot}

cursor · 2026-05-29T17:02:56Z

+                    self.read_write_set.move_to(
+                        self.exec_ctx.resource_provider(),
+                        StorageKey::Resource(address, ty),
+                        NonNull::new_unchecked(ptr),


[MEDIUM] NonNull::new_unchecked without a SAFETY comment in MoveTo handler

ptr comes from read_ptr(fp, src), which reads a raw pointer out of the frame slot. The verifier checks that src falls within the frame's accessible region (8 bytes), but it does not verify that the slot actually holds a non-null value at runtime. A MoveTo micro-op whose source slot contains a null pointer would invoke undefined behaviour through NonNull::new_unchecked(null_ptr).

The crate's coding convention (AGENTS.md) requires a // SAFETY: comment on every unsafe operation. No such comment is present here, in contrast to the analogous call in global_storage.rs (scan) which documents // SAFETY: gc_copy_object never produces a null pointer. The debug_assert! guard fires only in debug builds and does not substitute for a proper safety argument.

Concrete impact: in a debug build the assert aborts cleanly; in a release build a null ptr passed to NonNull::new_unchecked is immediate UB. Pre-production, no current exploit.

cursor · 2026-05-29T17:02:56Z

+            epoch: self.current_epoch,
+        });
+        // Note: this can never overflow in practice.
+        self.current_epoch += 1;


[LOW] Unchecked integer arithmetic on current_epoch

self.current_epoch += 1 uses Rust's default wrapping-on-overflow (in release mode) or panic-on-overflow (in debug mode). The comment added in this commit states "this can never overflow in practice", but the crate's coding convention (AGENTS.md) requires checked arithmetic for all arithmetic operations unless correctness is proven. A comment is not a proof, and the convention calls for checked_add, saturating_add, or a documented invariant backed by a bounds check rather than relying on infeasibility arguments.

Concrete impact: u64 overflow would require 2^64 checkpoints in a single transaction, which is practically unreachable today; however, the missing check still violates the crate's own safety standard and could become a real issue if checkpoint use expands.

cursor

Aptos Security Bugbot has reviewed your changes, there are still 2 issues that need to be addressed from previous scan.

The "Address comments" commit correctly fixed the MoveFrom deep-copy failure bug (entry was permanently marked Deleted if deep_copy errored). The two issues below remain unaddressed.

_{Sent by Cursor Automation: Security Review Bot}

cursor · 2026-05-29T17:05:38Z

+                    self.read_write_set.move_to(
+                        self.exec_ctx.resource_provider(),
+                        StorageKey::Resource(address, ty),
+                        NonNull::new_unchecked(ptr),


[MEDIUM] MoveTo: NonNull::new_unchecked on unverified frame-slot pointer

The MoveTo handler reads a raw pointer from a frame slot with read_ptr(fp, src) and passes it directly to NonNull::new_unchecked(ptr). The only null guard is debug_assert!(!ptr.is_null()), which is a no-op in release builds. Calling NonNull::new_unchecked on a null pointer is immediate undefined behavior in Rust, regardless of whether the resulting NonNull is later dereferenced.

The verifier confirms that the src frame offset is in-bounds (8 bytes), but it does not verify that the pointer value stored at that slot is non-null. A compiler or specializer bug emitting a zero-initialized slot for src would silently produce UB in release builds.

Concrete impact: no current exploit path given the experimental, pre-production status of mono-move, but the pattern is unsound. Use NonNull::new(ptr).ok_or_else(|| ...) to surface a recoverable error on null, consistent with how other pointer dereferences in this codebase are guarded.

Severity: MEDIUM
Component: mono-move experimental runtime (not yet connected to production transaction execution)

cursor · 2026-05-29T17:05:38Z

+            epoch: self.current_epoch,
+        });
+        // Note: this can never overflow in practice.
+        self.current_epoch += 1;


[LOW] Unchecked u64 overflow in checkpoint

self.current_epoch += 1 uses plain Rust integer arithmetic. In debug builds this panics on overflow; in release builds with overflow-checks = false (the default for release profiles) it wraps silently to zero. If the epoch counter wraps, is_at_epoch comparisons silently return incorrect results: entries written at old epoch 0 would appear to be at the current epoch, breaking the journal's ability to distinguish which writes to undo on rollback.

The comment "this can never overflow in practice" is correct for realistic transaction counts (a u64 exhausted at one checkpoint per nanosecond would take ~584 years), but the arithmetic itself panics in debug mode and wraps silently in release. Replace with self.current_epoch = self.current_epoch.wrapping_add(1) to make the intent explicit and safe across build profiles.

Severity: LOW
Component: mono-move experimental runtime (not yet connected to production transaction execution)

vgao1996

Looks mostly good, but a critical issue in the micro-op verifier.

vgao1996 · 2026-05-29T21:25:20Z

+    BorrowGlobal {
+        addr: FrameOffset,
+        ty: InternedType,
+        dst: FrameOffset,
+    },
+    /// Stores a reference to the destination offset pointing to the resource
+    /// of the given type exists at the given address. The  reference points to
+    /// the object owned by the current transaction's heap. Aborts if resource
+    /// does not exist.
+    ///
+    /// May trigger GC.
+    BorrowGlobalMut {
+        addr: FrameOffset,
+        ty: InternedType,
+        dst: FrameOffset,
+    },


No need to change anything now, but wondering if we could just merge these two and have a is_mut: bool field.

vgao1996 · 2026-05-29T21:29:03Z

+            MicroOp::Exists { addr, ty: _, dst }
+            | MicroOp::BorrowGlobal { addr, ty: _, dst }
+            | MicroOp::BorrowGlobalMut { addr, ty: _, dst }
+            | MicroOp::MoveFrom { addr, ty: _, dst } => {
+                self.check_frame_access(Some(pc), addr, 32);
+                self.check_frame_access_8(pc, dst);
+            },


This doesn't seem right.

Exists stores a bool at dst so this should be 1B? (Thought I'm not sure if we have done supporting <8B alignment in the specializer or not)

BorrowGlobal and BorrowGlobalMut store a reference, which is a 16B fat pointer

MoveFrom stores an 8B owned heap pointer

vgao1996 · 2026-05-29T21:33:00Z

+    /// TODO: move to execution context
+    pub fn checkpoint(&mut self) {
+        self.read_write_set.checkpoint();
+    }
+
+    /// TODO: move to execution context
+    pub fn rollback(&mut self, n: usize) -> RuntimeResult<()> {
+        self.read_write_set.rollback(n)
+    }
+
+    /// TODO: move to execution context
+    pub fn checkpoint_depth(&self) -> usize {
+        self.read_write_set.checkpoint_depth()
+    }
+
+    /// TODO: move to execution context
+    pub fn current_epoch(&self) -> u64 {
+        self.read_write_set.current_epoch()
+    }
+
+    /// TODO: move to execution context
+    pub fn journal_len(&self) -> usize {
+        self.read_write_set.journal_len()
+    }


Yes, as you've pointed out, these really aren't interpreter APIs. Rather, we want to ensure that these aren't called when an interpreter is alive.

vgao1996 · 2026-05-29T21:51:52Z

+    fn deep_copy(&mut self, root: NonNull<u8>) -> RuntimeResult<NonNull<u8>> {
+        let root_guard = self.pinned_roots.pin(root);
+        match self.heap.try_deep_copy(self.exec_ctx, root_guard.get()) {
+            Ok(ptr) => Ok(ptr),
+            Err(AllocationError::RuntimeError(err)) => Err(err),
+            Err(AllocationError::OutOfHeapMemory { .. }) => {
+                gc_collect!(self)?;
+                // Re-read the root pointer from the pin, as its address have
+                // been changed by the GC.
+                self.heap
+                    .try_deep_copy(self.exec_ctx, root_guard.get())
+                    .map_err(AllocationError::into_runtime_error)
+            },
+        }
+    }


Ah this is a good example showing why it's helpful to let the caller decide whether to trigger GC.

I traced through the deepy_copy algorithm both myself and with claude and can verify that the current implementation is likely sound -- GC cannot run while we are in the middle of copying a value.

vgao1996 · 2026-05-29T21:59:01Z

+    /// Returns the data-region pointer of the freshly-allocated root copy,
+    /// or [`AllocationError::OutOfHeapMemory`] on the first allocation that
+    /// does not fit. There is no GC during copying and the caller has to
+    /// handle memory.


Nit: "caller has to handle memory" => "caller may have to run GC and retry if the initial clone fails due to insufficient memory"

Also should this function be unsafe? I mean, I could feed it with any non-zero pointer I guess?

vgao1996 · 2026-05-29T22:00:58Z

+        })?;
+
+        // Walk pointer fields. The pointer values are pointing to old data,
+        // so they need to be deep-copied and the source has to be patched.


Comment slightly misleading -- it's not the source that needs to be patched, but the corresponding slot in the cloned parent object.

vgao1996 · 2026-05-29T22:12:25Z

+                    {
+                        EntryPtr::Writable(ptr) => ptr,
+                        EntryPtr::NonWritable(ptr) => {
+                            let ptr = self.deep_copy(ptr)?;


How feasible is it to get this logic encapsulated in global_storage.rs as well?

A key invariant we have to maintain is that an existing global value needs to be CoW'ed the next time it's borrowed mutably in a new epoch. It's not immediately clear this is indeed handled by just looking at that file.

vineethk

Review 1/n

vineethk · 2026-05-29T20:07:11Z

+/// transaction write).
+// TODO:
+//   Replace with Block-STM transaction index and incarnation pair.
+pub type Version = u64;


Could we qualify this name better? Version is too generic.

vineethk · 2026-05-29T20:08:47Z

+//   Replace with Block-STM transaction index and incarnation pair.
+pub type Version = u64;
+
+/// Key to (in-memory) global storage.


Should we add some notes/invariants about the lifetime of this value, given at it includes InternedType?

vineethk · 2026-05-29T20:30:12Z

+    fn get_resource(&self, key: StorageKey) -> Result<StorageRead, ResourceProviderError>;
+}
+
+/// Empty storage with no resources.


Should the trait implementations move to their own files/modules?

vineethk · 2026-05-29T20:34:09Z

+// Copyright (c) Aptos Foundation
+// Licensed pursuant to the Innovation-Enabling Source Code License, available at https://github.com/aptos-labs/aptos-core/blob/main/LICENSE
+
+//! Per-transaction global-storage state: read cache with pending writes and


Maybe nice to give a bit more information here (kind of a summary map of the design, what different terms mean, and why the different features/complexities are needed). Some of this may already be in the corresponding design doc, may be we can migrate some of those here?

vineethk · 2026-05-29T20:40:37Z

+
+/// Global state can be saved during execution by recording checkpoints. Epoch
+/// is a counter incremented by every checkpoint.
+pub type Epoch = u64;


Based on the comment description, Epoch is a bit opaque, we can just call it CheckpointCounter or CheckpointVersion?

vineethk · 2026-05-29T22:22:26Z

    /// Useful for testing GC correctness.
    ForceGC,
+
+    /// Stores a boolean value at destination offset if resource of the given


Right now, all these instructions seem to be under the debugging section, which they are not.

vineethk · 2026-05-29T22:24:38Z

+    /// Stores a boolean value at destination offset if resource of the given
+    /// type exists at the given address.
+    Exists {
+        addr: FrameOffset,


The general convention for these instructions is to have destination first. Comment applies to all these new instructions.

vineethk · 2026-05-29T22:26:18Z

    ForceGC,
+
+    /// Stores a boolean value at destination offset if resource of the given
+    /// type exists at the given address.


Also, let's just say destination or destination slot instead of destination offset. Here and for other new instructions.

vineethk · 2026-05-29T22:28:02Z

+        dst: FrameOffset,
+    },
+    /// Stores a reference to the destination offset pointing to the resource
+    /// of the given type exists at the given address. Aborts if resource does


This sentence is very confusing: it reads as if we are storing &destination_offset. Rewrite. Same with the following instruction.

vineethk · 2026-05-29T22:32:16Z

+        ty: InternedType,
+        dst: FrameOffset,
+    },
+    /// Stores a reference to the destination offset pointing to the resource


There is very little difference in the documentation about BorrowGlobal vs. BorrowGlobalMut. If there is any difference, it should reflect in the name (is it the ownership in the current transaction's heap)? Other borrows don't currently differentiate between mut and immut refs, so also worth describing why we do it here.

georgemitenkov mentioned this pull request May 22, 2026

[mono-move] Global storage design #19536

Merged

22 tasks

georgemitenkov added the mono-move label May 22, 2026

georgemitenkov force-pushed the george/global-storage-impl branch 2 times, most recently from d7cec7d to d668799 Compare May 22, 2026 13:52