Skip to content

perf(throw_error): cheaper error construction and hook installation#4777

Open
sabify wants to merge 3 commits into
leptos-rs:leptos_0.9from
sabify:perf/any_error-optimizations
Open

perf(throw_error): cheaper error construction and hook installation#4777
sabify wants to merge 3 commits into
leptos-rs:leptos_0.9from
sabify:perf/any_error-optimizations

Conversation

@sabify

@sabify sabify commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Two small optimizations to throw_error hot paths, with no API breakage or behavior change.

1. Single-allocation Error::new for concrete error types

The only way to construct an Error was the blanket From impl, which boxes the value into a Box<dyn Error> and then re-allocates it behind an Arc (the refcount header prevents reusing the box's allocation) — two allocations plus a memcpy per error.

This adds an additive inherent constructor that places a concrete error directly into the Arc in one allocation (~1.7x faster in isolated benches), and routes the two call sites that already hold a concrete type through it:

  • server_fn: From<ServerFnError> for Error (runs on every server-fn error conversion)
  • hydration_context: serialized error deserialization during hydration

The &str/String/anyhow paths keep using From (they need the box anyway).

2. Swap instead of clone in ErrorHookFuture::poll

ErrorHookFuture::poll cloned its captured hook Arc on every poll — two atomic refcount ops per poll on a future that wraps every Suspense body and is polled once per await-point resumption.

Since the future only needs to lend the hook to the thread-local slot for the duration of the poll, it now mem::swaps it in and back out via a panic-safe drop guard: zero atomic ops per poll (~4.1ns → ~1.7ns per poll in isolated benches). Observable semantics are unchanged; a regression test asserts the captured hook overrides the ambient hook during the poll and the ambient hook is restored afterwards.

sabify added 2 commits June 7, 2026 13:16
…crete sites

Error construction on the throw path performed two heap allocations and a
memcpy where one allocation suffices.

Problem
-------
The only public constructor is the blanket conversion:

    impl<T> From<T> for Error
    where
        T: Into<Box<dyn error::Error + Send + Sync + 'static>>,
    {
        fn from(value: T) -> Self {
            Error(Arc::from(value.into()))
        }
    }

For a concrete, sized error, `value.into()` first materialises a
`Box<dyn Error + Send + Sync>` (allocation #1). `Arc::from(Box<T>)` for an
unsized payload cannot reuse the box's allocation, because the `Arc` must
prepend its strong/weak refcount header in front of the value: it allocates a
fresh, larger block (allocation #2), memcpys the error out of the box, and
frees the box. Net cost: 2 allocations + 1 free + 1 memcpy.

`From` itself cannot be narrowed to `T: error::Error` without breaking
`Error::from("str")` / `Error::from(String)` (neither implements
`std::error::Error`); the two bounds overlap, so a second `From` impl would
violate coherence.

Profiling
---------
Isolated bench, mean of 2 runs:

    fn bench<F: FnMut()>(name: &str, iters: u64, mut f: F) {
        for _ in 0..(iters / 10).max(1) { f(); }     // warmup
        let t = Instant::now();
        for _ in 0..iters { f(); }
        println!("{name:<44} {:>9.3} ns/op",
            t.elapsed().as_nanos() as f64 / iters as f64);
    }

    bench("Error::from(DummyErr) [Box->Arc realloc]", 5_000_000,
        || { black_box(&Error::from(black_box(DummyErr { _pad: [0; 32] }))); });
    bench("Error::new(DummyErr)  [single alloc]",     5_000_000,
        || { black_box(&Error::new(black_box(DummyErr { _pad: [0; 32] }))); });

    Error::from(DummyErr) [Box->Arc realloc]   ~35.3 ns/op
    Error::new(DummyErr)  [single alloc]       ~20.5 ns/op   (~1.7x faster)

The 2x-ish ratio is the fingerprint of the eliminated second allocation: if
`Arc::from(Box)` reused the box's memory the two would land together.

Solution
--------
Add an additive, non-breaking inherent constructor that places a concrete
error directly into the `Arc` (one allocation, no copy):

    pub fn new<E>(err: E) -> Self
    where
        E: error::Error + Send + Sync + 'static,
    {
        Error(Arc::new(err))      // Arc<E> coerces to Arc<dyn Error> in place
    }

Route the hot construction sites that already hold a concrete error type
through it:

  - server_fn `From<ServerFnError> for Error`: `ServerFnErrorWrapper` derives
    `std::error::Error`, so it can skip the box. Runs on every server-fn
    error conversion.
  - hydration_context error deserialization: `SerializedError` is a concrete
    `std::error::Error`. Runs per serialized error during hydration.

The `&str` / `String` / `anyhow` paths keep using `From` (they are unsized on
arrival and need the box anyway).
…:poll

`ErrorHookFuture::poll` cloned the captured hook `Arc` on every single poll.

Problem
-------
The wrapper installs its captured hook into the thread-local slot for the
duration of each poll:

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let this = self.project();
        let _hook = ResetErrorHookOnDrop(ERROR_HOOK.with_borrow_mut(|cur| {
            std::mem::replace(cur, this.hook.clone())   // atomic inc every poll
        }));
        this.inner.poll(cx)                              // ...atomic dec on drop
    }

Every poll clones `this.hook` (`Option<Arc<dyn ErrorHook>>`): one atomic
increment to install it, then one atomic decrement when the swapped-in clone is
dropped through the guard. That is two atomic RMW operations per poll even
though the hook being installed is the value the future already owns in its own
field — it only needs to lend it, not own a second copy.

`leptos` wraps each `Suspense` body future in `ErrorHookFuture`, which is polled
once per await-point resumption for the life of the suspense, so this is a
genuinely repeated path. Under cross-thread sharing the atomics also risk
cache-line contention beyond the single-thread cost.

Profiling
---------
opt-level=3 + lto, mean of 2 runs. `CloneWrapper`/`SwapWrapper` mirror the old
and new poll bodies over a thread-local slot to isolate the atomic delta:

    // clone (old):
    let prev = SLOT.with_borrow_mut(|c| mem::replace(c, this.hook.clone()));
    let r = Pin::new(&mut this.inner).poll(cx);
    SLOT.with_borrow_mut(|c| *c = prev);

    // swap (new):
    SLOT.with_borrow_mut(|c| mem::swap(c, &mut this.hook));
    let r = Pin::new(&mut this.inner).poll(cx);
    SLOT.with_borrow_mut(|c| mem::swap(c, &mut this.hook));

    poll bare future          [baseline]      ~0.40 ns/op
    poll wrapper              [clone hook]     ~4.11 ns/op
    poll wrapper              [swap hook]      ~0.60 ns/op   (~3.5 ns saved)
    poll ErrorHookFuture      [real, swap]     ~1.65 ns/op   (clone was ~4.13)

Solution
--------
Lend the future's own hook to the slot with a swap (no clone) and swap it back
with a panic-safe guard, so zero atomic refcount ops run per poll:

    ERROR_HOOK.with_borrow_mut(|cur| std::mem::swap(cur, this.hook));
    struct Restore<'a>(&'a mut Option<Arc<dyn ErrorHook>>);
    impl Drop for Restore<'_> {
        fn drop(&mut self) {
            let _ = ERROR_HOOK.try_with(|cell| {
                if let Ok(mut cur) = cell.try_borrow_mut() {
                    std::mem::swap(&mut *cur, self.0);
                }
            });
        }
    }
    let _restore = Restore(this.hook);
    this.inner.poll(cx)

`this.hook` and `this.inner` are disjoint projected fields, so holding
`&mut this.hook` in the guard while polling `inner` is sound. Observable
semantics are unchanged: during the poll the slot holds the captured hook
(including the `None` case, which correctly clears the slot); after the poll the
slot holds the previous ambient hook and `this.hook` is restored — exactly as
the clone version, minus the atomics. The guard mirrors `ResetErrorHookOnDrop`'s
best-effort, panic-safe restore. Adds a regression test asserting the captured
hook overrides the ambient hook during the poll and is restored afterwards.
@sabify sabify changed the title ## perf(throw_error): cheaper error construction and hook installation perf(throw_error): cheaper error construction and hook installation Jun 11, 2026
Comment thread any_error/src/lib.rs Outdated
@gbj

gbj commented Jun 12, 2026

Copy link
Copy Markdown
Collaborator

9cddc6c seems like micro-optimization churn. It's talking about saving nanoseconds in the context of doing async web request/response work...

@sabify

sabify commented Jun 13, 2026

Copy link
Copy Markdown
Contributor Author

Yes, a 2.41x faster optimization in hot path + panic-safe drop guard.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants