[web] Fix use-after-free of model buffer with use_ort_model_bytes_for_initializers#29033
Open
yrliou wants to merge 1 commit into
Open
[web] Fix use-after-free of model buffer with use_ort_model_bytes_for_initializers#29033yrliou wants to merge 1 commit into
yrliou wants to merge 1 commit into
Conversation
…_initializers `createSession()` copies the model data into a WASM heap buffer (`modelDataOffset`) and unconditionally `_free()`s it in its `finally` block, right after the session is built. That is safe by default, where ORT copies every weight initializer into separate session-owned memory. But when `session.use_ort_model_bytes_for_initializers` is set, the initializer tensors point *directly* into that buffer instead of being copied, so freeing it after creation is a use-after-free: later allocations overwrite the weights and inference produces garbage. This makes the buffer a per-session resource, freed alongside the input/ output name buffers that `releaseSession()` already cleans up: - `SessionMetadata` gains a `modelDataOffsetToFree` field (0 when there is nothing to defer). - When the option is set, `createSession()` stores `modelDataOffset` in that field and the `finally` skips the free; the error path leaves the field 0 so the buffer is still freed and never leaks. - `releaseSession()` frees the stored offset after `_OrtReleaseSession()`. - The two `activeSessions.set()` rebuilds in `run()` preserve the field so it survives IO-binding updates. Default sessions (no option) are unaffected: the buffer is still freed immediately in `createSession()`. Fixes microsoft#19491. `use_ort_model_bytes_for_initializers` avoids a full second copy of the weights in the WASM heap (measured ~halved steady memory for an int4 model), but is unusable on web until the buffer outlives session creation. Web is the only platform that both allocates and frees this buffer internally; native APIs leave ownership with the caller, who keeps it alive for the session lifetime. Verified manually in a browser: with the option set, inference now produces correct output and steady WASM memory drops as expected, and sessions create/run/release without crashing.
Contributor
|
@yrliou please read the following Contributor License Agreement(CLA). If you agree with the CLA, please reply with the following information.
Contributor License AgreementContribution License AgreementThis Contribution License Agreement (“Agreement”) is agreed to by the party signing below (“You”),
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
Seeing inference result becomes garbage when optimizing memory footprints as suggested by https://onnxruntime.ai/docs/performance/model-optimizations/ort-format-models.html#load-ort-format-model-from-an-in-memory-byte-array.
The root cause is that
createSession()copies the model data into a WASM heap buffer (modelDataOffset) and unconditionally_free()s it in itsfinallyblock, right after the session is built. That is safe by default, where ORT copies every weight initializer into separate session-owned memory.But when
session.use_ort_model_bytes_for_initializersis set, the initializer tensors point directly into that buffer instead of being copied, so freeing it after creation is a use-after-free: later allocations overwrite the weights and inference produces garbage.This PR makes the buffer a per-session resource, freed alongside the input/ output name buffers that
releaseSession()already cleans up:SessionMetadatagains amodelDataOffsetToFreefield (0 when there is nothing to defer).createSession()storesmodelDataOffsetin that field and thefinallyskips the free; the error path leaves the field 0 so the buffer is still freed and never leaks.releaseSession()frees the stored offset after_OrtReleaseSession().activeSessions.set()rebuilds inrun()preserve the field so it survives IO-binding updates.Default sessions (no option) are unaffected: the buffer is still freed immediately in
createSession().Motivation and Context
Fixes #19491.
use_ort_model_bytes_for_initializersavoids a full second copy of the weights in the WASM heap (measured ~halved steady memory for an int4 model), but is unusable on web until the buffer outlives session creation. Web is the only platform that both allocates and frees this buffer internally; native APIs leave ownership with the caller, who keeps it alive for the session lifetime.Verified manually in a browser: with the option set, inference now produces correct output and steady WASM memory drops as expected, and sessions create/run/release without crashing.