[js/web] Forward WebGPU EP buffer cache mode options from JS #29017

Open
ssam18 wants to merge 2 commits into microsoft:main from ssam18:feat/webgpu-buffer-cache-mode-js-29016

ssam18 commented Jun 11, 2026


Description

The native WebGPU EP already supports the buffer cache mode options (ep.webgpuexecutionprovider.storageBufferCacheMode and friends), but onnxruntime-web never forwarded them from executionProviders, so they were unreachable from JS. This adds storageBufferCacheMode, uniformBufferCacheMode, queryResolveBufferCacheMode and defaultBufferCacheMode to WebGpuExecutionProviderOption and forwards them to the EP the same way validationMode is forwarded today, with the values validated against the set the native side accepts. The options ride the existing SessionOptionsAppendExecutionProvider path, which prefixes each key into exactly the config entry the EP reads, so no native changes are needed.
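The forwarding described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `BufferCacheModeOptions`, `VALID_MODES` and `toConfigEntries` are hypothetical names, the value set `'disabled' | 'lazyRelease' | 'simple' | 'bucket'` is an assumption about what the native side accepts (only `simple` and the bucket default are named in this thread), and the sketch models the end result (validated value, prefixed key) rather than which layer performs each step.

```typescript
// Assumed set of accepted values; only 'simple' and 'bucket' are named in this PR thread.
type BufferCacheMode = 'disabled' | 'lazyRelease' | 'simple' | 'bucket';

// Hypothetical mirror of the four fields added to WebGpuExecutionProviderOption.
interface BufferCacheModeOptions {
  storageBufferCacheMode?: BufferCacheMode;
  uniformBufferCacheMode?: BufferCacheMode;
  queryResolveBufferCacheMode?: BufferCacheMode;
  defaultBufferCacheMode?: BufferCacheMode;
}

const VALID_MODES: readonly string[] = ['disabled', 'lazyRelease', 'simple', 'bucket'];

const OPTION_NAMES = [
  'storageBufferCacheMode',
  'uniformBufferCacheMode',
  'queryResolveBufferCacheMode',
  'defaultBufferCacheMode',
] as const;

// Prefix taken from the config key quoted in the description.
const EP_PREFIX = 'ep.webgpuexecutionprovider.';

// Validates each option that is set and returns the [key, value] config entries
// the EP would end up reading. Unset options produce no entry.
function toConfigEntries(options: BufferCacheModeOptions): Array<[string, string]> {
  const entries: Array<[string, string]> = [];
  for (const name of OPTION_NAMES) {
    const value: string | undefined = options[name];
    if (value === undefined) {
      continue;
    }
    // Runtime check for callers that bypass the type system (plain JS, casts).
    if (VALID_MODES.indexOf(value) === -1) {
      throw new Error(`Invalid ${name}: ${value}`);
    }
    entries.push([EP_PREFIX + name, value]);
  }
  return entries;
}
```

Under these assumptions, `toConfigEntries({ storageBufferCacheMode: 'simple' })` yields a single entry keyed `ep.webgpuexecutionprovider.storageBufferCacheMode`, and an unrecognized value throws before anything reaches the EP.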

Motivation and Context

Fixes #29016. For static-shape models, storageBufferCacheMode: 'simple' reuses exact-size buffers across runs instead of allocating new bucket-sized ones, which the issue's repro shows cutting peak WebGPU memory by about 27 percent. Verified locally with tsc builds of js/common and js/web, prettier and eslint, the js/common unit tests, and type-level checks that the new options compile and that invalid values are rejected.
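For a static-shape model, the session options might look like the sketch below. The interfaces are local stand-ins so the snippet type-checks on its own (the real types come from onnxruntime-common), `staticShapeSessionOptions` is a hypothetical helper name, and the listed mode values other than `simple` and `bucket` are assumptions.

```typescript
// Assumed value set; the real union is defined by this PR in js/common.
type BufferCacheMode = 'disabled' | 'lazyRelease' | 'simple' | 'bucket';

// Local stand-in for the WebGPU entry of executionProviders.
interface WebGpuProviderSketch {
  name: 'webgpu';
  storageBufferCacheMode?: BufferCacheMode;
  uniformBufferCacheMode?: BufferCacheMode;
  queryResolveBufferCacheMode?: BufferCacheMode;
  defaultBufferCacheMode?: BufferCacheMode;
}

// Local stand-in for the relevant part of the session options.
interface SessionOptionsSketch {
  executionProviders: WebGpuProviderSketch[];
}

// Builds options that request exact-size storage buffer reuse and leave the
// other three cache modes at their defaults.
function staticShapeSessionOptions(): SessionOptionsSketch {
  return {
    executionProviders: [{ name: 'webgpu', storageBufferCacheMode: 'simple' }],
  };
}

// With a build that includes this PR, the object would be passed along the lines of:
//   const session = await ort.InferenceSession.create(modelUrl, staticShapeSessionOptions());
```

Only the storage mode is set here because it is the one the linked issue measured; whether `simple` helps a given model depends on its shapes staying fixed across runs.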

The native WebGPU EP already understands the storage, uniform, query resolve and default buffer cache mode options, but onnxruntime-web never forwarded them from executionProviders, so web users could not configure them. This adds the four fields to WebGpuExecutionProviderOption and passes them through to the EP like the existing validationMode option, with the same value validation the native side performs. For static-shape models, setting storageBufferCacheMode to simple lets exact-size buffers be reused across runs, which can noticeably cut peak GPU memory compared to the default bucket mode.

Fixes microsoft#29016
@popelenkow


Thanks for the quick fix.

One concern: should defaultBufferCacheMode be exposed in this PR?

The native side declares/parses ep.webgpuexecutionprovider.defaultBufferCacheMode, but it does not appear to be wired into the active BufferManager.

It is parsed here together with the other buffer cache modes:

WebGpuBufferCacheConfig& buffer_cache_config = config.buffer_cache_config;
parse_buffer_cache_mode(kStorageBufferCacheMode, buffer_cache_config.storage.mode);
parse_buffer_cache_mode(kUniformBufferCacheMode, buffer_cache_config.uniform.mode);
parse_buffer_cache_mode(kQueryResolveBufferCacheMode, buffer_cache_config.query_resolve.mode);
parse_buffer_cache_mode(kDefaultBufferCacheMode, buffer_cache_config.default_entry.mode);

But when the main WebGPU BufferManager is created, only storage/uniform/queryResolve modes are passed:

buffer_mgr_ = BufferManagerFactory::Create(*this,
                                           config.buffer_cache_config.storage.mode,
                                           config.buffer_cache_config.uniform.mode,
                                           config.buffer_cache_config.query_resolve.mode);

BufferManager itself also accepts only those three modes, and default_cache_ is hardcoded to BufferCacheMode::Disabled:

BufferManager::BufferManager(WebGpuContext& context, BufferCacheMode storage_buffer_cache_mode, BufferCacheMode uniform_buffer_cache_mode, BufferCacheMode query_resolve_buffer_cache_mode)
    : context_{context},
      storage_cache_{CreateBufferCacheManager(storage_buffer_cache_mode)},
      uniform_cache_{CreateBufferCacheManager(uniform_buffer_cache_mode)},
      query_resolve_cache_{CreateBufferCacheManager(query_resolve_buffer_cache_mode)},
      default_cache_{CreateBufferCacheManager(BufferCacheMode::Disabled)} {

The factory also has no fourth/default cache mode parameter:

std::unique_ptr<BufferManager> BufferManagerFactory::Create(WebGpuContext& context, BufferCacheMode storage_buffer_cache_mode, BufferCacheMode uniform_buffer_cache_mode, BufferCacheMode query_resolve_buffer_cache_mode) {
  return std::make_unique<BufferManager>(context, storage_buffer_cache_mode, uniform_buffer_cache_mode, query_resolve_buffer_cache_mode);
}

So exposing defaultBufferCacheMode from JS may make it look configurable even though it is effectively ignored unless the native BufferManager wiring is updated as well.

The JS forwarding side of this PR already exposed defaultBufferCacheMode, but the native BufferManager hardcoded its default_cache_ to Disabled, so the option had no effect end to end. Extend BufferManager and BufferManagerFactory::Create to take a fourth mode, plumb config.buffer_cache_config.default_entry.mode from webgpu_context into the main BufferManager, and preserve the current behavior of the initializer and per-graph managers by passing Disabled there. Addresses popelenkow's review comment on microsoft#29017.

Signed-off-by: Samaresh Kumar Singh <ssam3003@gmail.com>

Development

Successfully merging this pull request may close these issues.

[Web] Expose WebGPU EP buffer cache mode options in JS
