Describe the issue
Problem:
The native WebGPU EP already supports buffer cache mode options in C++, for example ep.webgpuexecutionprovider.storageBufferCacheMode.
However, onnxruntime-web does not currently forward these options from JS, so web users cannot configure the WebGPU buffer cache mode from executionProviders.
This can matter for static-shape models. In our repro, the default bucket mode uses significantly more WebGPU buffer memory than simple mode:
| Metric | bucket (default) | simple |
|---|---|---|
| Weights after session create | 706 MB | 706 MB |
| Peak live bytes via WebGPU API | 5.13 GB | 3.73 GB (-27.3%) |
| New allocations on run 1 | 4.42 GB | 3.02 GB |
| New allocations on run 2 | 3.04 GB | 0 MB |
The model is static-shape, so the second run is the most important comparison. With the default bucket mode, the second run still creates about 3.04 GB of new WebGPU buffers. With simple mode, exact-size buffers are reused and the second run creates no new buffers.
The simple result above uses the same onnxruntime-web@1.26.0 WebGPU bundle with a small JS patch that forwards storageBufferCacheMode.
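One plausible mechanism for the run-2 difference can be shown with a toy model. The sketch below is a hypothetical simplification, not ONNX Runtime's cache code: simple mode keys idle buffers by exact size and keeps all of them, while bucket mode rounds each request up to a bucket size, keeps only a limited number of idle buffers per bucket, and does not cache requests larger than the biggest bucket. The class `BufferCacheModel`, the function `simulateRun`, the bucket table, and the per-bucket limits are all assumptions chosen for illustration.

```typescript
// Hypothetical, simplified model of two buffer-cache policies. This is NOT the
// ONNX Runtime implementation; it only counts how many buffers each policy creates.

type CacheMode = "simple" | "bucket";

class BufferCacheModel {
  private readonly mode: CacheMode;
  // Hypothetical bucket table, sorted ascending: [bucket size in bytes, max idle buffers kept].
  private readonly buckets: ReadonlyArray<readonly [number, number]>;
  // Cache key -> number of idle (released but retained) buffers.
  private readonly idle = new Map<number, number>();
  created = 0;
  createdBytes = 0;

  constructor(mode: CacheMode, buckets: ReadonlyArray<readonly [number, number]> = []) {
    this.mode = mode;
    this.buckets = buckets;
  }

  // Exact size in simple mode; smallest fitting bucket in bucket mode;
  // undefined if no bucket fits (such requests are never cached).
  private keyFor(size: number): number | undefined {
    if (this.mode === "simple") return size;
    return this.buckets.find(([bucket]) => bucket >= size)?.[0];
  }

  // Maximum number of idle buffers retained for a key.
  private limitFor(key: number): number {
    if (this.mode === "simple") return Infinity;
    return this.buckets.find(([bucket]) => bucket === key)?.[1] ?? 0;
  }

  acquire(size: number): void {
    const key = this.keyFor(size);
    if (key !== undefined) {
      const n = this.idle.get(key) ?? 0;
      if (n > 0) {
        this.idle.set(key, n - 1); // cache hit: reuse a retained buffer
        return;
      }
    }
    this.created += 1; // cache miss: a new GPU buffer would be created
    this.createdBytes += key ?? size;
  }

  release(size: number): void {
    const key = this.keyFor(size);
    if (key === undefined) return; // uncached size: the buffer would be destroyed
    const n = this.idle.get(key) ?? 0;
    if (n < this.limitFor(key)) this.idle.set(key, n + 1); // over the limit: destroyed
  }
}

// One run of a static-shape model: all buffers are live at once, then all are
// released. Returns the number of buffers created during this run.
function simulateRun(cache: BufferCacheModel, sizes: readonly number[]): number {
  const before = cache.created;
  for (const size of sizes) cache.acquire(size);
  for (const size of sizes) cache.release(size);
  return cache.created - before;
}

const sizes = [1000, 1000, 1000, 5000, 20000];
const simple = new BufferCacheModel("simple");
const bucket = new BufferCacheModel("bucket", [[1024, 2], [8192, 1]]);
console.log(simulateRun(simple, sizes), simulateRun(simple, sizes)); // 5 0
console.log(simulateRun(bucket, sizes), simulateRun(bucket, sizes)); // 5 2
```

With these made-up numbers, simple mode creates 5 buffers on run 1 and none on run 2, while bucket mode still creates 2 on run 2: one for the third 1000-byte request that exceeded its bucket's limit, and one for the uncached 20000-byte request. Bucket rounding also makes run 1 allocate more bytes (31264 vs 28000 here), which loosely mirrors the run-1 gap in the table.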
Proposal:
Forward the existing WebGPU EP buffer cache mode options from JS to the WebGPU EP.
Example:
executionProviders: [{
name: 'webgpu',
storageBufferCacheMode: 'simple',
}]
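A minimal sketch of the forwarding step, assuming the JS side only needs to translate the execution-provider entry into the native EP's session-config key/value pairs. `toNativeBufferCacheConfig` is a hypothetical helper, not an existing onnxruntime-web function. Only `ep.webgpuexecutionprovider.storageBufferCacheMode` and the `simple`/`bucket` values are confirmed by this issue; the other three option names and the `disabled`/`lazyRelease` values are assumed to match the native WebGPU EP's provider options and should be checked against the C++ source. How the resulting pairs reach the WASM session options is left out.

```typescript
// Hypothetical sketch of the proposed JS-side mapping. Only storageBufferCacheMode
// is confirmed by this issue; the other names and values are assumptions.

type BufferCacheMode = "disabled" | "lazyRelease" | "simple" | "bucket";

interface WebGpuBufferCacheOptions {
  storageBufferCacheMode?: BufferCacheMode;
  uniformBufferCacheMode?: BufferCacheMode;       // assumed option name
  queryResolveBufferCacheMode?: BufferCacheMode;  // assumed option name
  defaultBufferCacheMode?: BufferCacheMode;       // assumed option name
}

const KEY_PREFIX = "ep.webgpuexecutionprovider.";
const VALID_MODES: ReadonlySet<string> = new Set(["disabled", "lazyRelease", "simple", "bucket"]);
const FORWARDED: ReadonlyArray<keyof WebGpuBufferCacheOptions> = [
  "storageBufferCacheMode",
  "uniformBufferCacheMode",
  "queryResolveBufferCacheMode",
  "defaultBufferCacheMode",
];

// Collects the buffer-cache options set on a JS execution-provider entry and
// returns them as native config key/value pairs. Unset options are omitted so
// the native EP keeps its own default; unknown values are rejected early.
function toNativeBufferCacheConfig(
  ep: { name: string } & WebGpuBufferCacheOptions,
): Map<string, string> {
  const config = new Map<string, string>();
  for (const option of FORWARDED) {
    const value = ep[option];
    if (value === undefined) continue;
    if (!VALID_MODES.has(value)) {
      throw new Error(`Invalid ${option}: ${String(value)}`);
    }
    config.set(KEY_PREFIX + option, value);
  }
  return config;
}

// The entry from the proposal: nativeConfig maps
// "ep.webgpuexecutionprovider.storageBufferCacheMode" to "simple".
const nativeConfig = toNativeBufferCacheConfig({ name: "webgpu", storageBufferCacheMode: "simple" });
```

Omitting unset options, instead of filling in a JS-side default, keeps the current behavior (default bucket mode) for existing users and makes the new options strictly opt-in.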
To reproduce
- Open https://musetric.github.io/onnxruntime-webgpu-buffer-cache-repro/ in Chrome or Edge with WebGPU enabled.
- Click "Run both".
- Wait for the model files to download from Hugging Face. The first run downloads about 741 MB.
- Compare "Official bucket" and "Patched simple" in the results table.
- Open the "Bucket memory" and "Simple memory" tabs to inspect the peak live-buffer histograms.
The demo runs the same static-shape model twice per mode. The only intended difference is:
- official mode: original onnxruntime-web@1.26.0 WebGPU bundle, default bucket mode
- patched mode: same bundle with JS forwarding for storageBufferCacheMode, run with storageBufferCacheMode: 'simple'
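For anyone reproducing the numbers without the demo page, one way to collect "new allocations per run" and "peak live bytes" is to wrap the device's `createBuffer()` and each buffer's `destroy()`. This is a hypothetical sketch, not the repro's actual instrumentation; `instrumentDevice` and the `DeviceLike`/`BufferLike` interfaces are illustrative names, and obtaining the `GPUDevice` that onnxruntime-web uses is outside its scope. It only relies on `createBuffer()`, `size`, and `destroy()`, so it can wrap a real `GPUDevice` or a test stub.

```typescript
// Hypothetical allocation counter: patches createBuffer() on a device-like object
// and destroy() on every buffer it returns. Not the repro page's actual code.

interface BufferLike {
  readonly size: number;
  destroy(): void;
}

interface DeviceLike {
  createBuffer(descriptor: { size: number; usage: number }): BufferLike;
}

interface AllocationStats {
  liveBytes: number;     // bytes created and not yet destroyed
  peakLiveBytes: number; // maximum liveBytes observed
  newBytes: number;      // bytes created since the last resetNewBytes()
}

function instrumentDevice(device: DeviceLike): { stats: AllocationStats; resetNewBytes(): void } {
  const stats: AllocationStats = { liveBytes: 0, peakLiveBytes: 0, newBytes: 0 };
  const originalCreate = device.createBuffer.bind(device);

  device.createBuffer = (descriptor) => {
    const buffer = originalCreate(descriptor);
    const size = buffer.size;
    stats.liveBytes += size;
    stats.newBytes += size;
    stats.peakLiveBytes = Math.max(stats.peakLiveBytes, stats.liveBytes);

    const originalDestroy = buffer.destroy.bind(buffer);
    let destroyed = false;
    buffer.destroy = () => {
      // destroy() may be called more than once; only the first call frees bytes.
      if (!destroyed) {
        destroyed = true;
        stats.liveBytes -= size;
      }
      originalDestroy();
    };
    return buffer;
  };

  return {
    stats,
    resetNewBytes: () => {
      stats.newBytes = 0;
    },
  };
}
```

Calling `resetNewBytes()` before each `session.run()` and reading `stats.newBytes` afterwards gives a per-run allocation figure. Buffers that are dropped without `destroy()` remain counted as live, so this can overstate live bytes if the runtime relies on garbage collection.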
Urgency
No response
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.26.0
Execution Provider
'webgpu' (WebGPU)