[Web] Expose WebGPU EP buffer cache mode options in JS #29016

@popelenkow

Describe the issue

Problem:
The native WebGPU EP already supports buffer cache mode options in C++, for example ep.webgpuexecutionprovider.storageBufferCacheMode.

However, onnxruntime-web does not currently forward these options from JS, so web users cannot configure the WebGPU buffer cache mode from executionProviders.

This can matter for static-shape models. In our repro, the default bucket mode uses significantly more WebGPU buffer memory than simple mode:

| Metric | bucket (default) | simple |
| --- | --- | --- |
| Weights after session create | 706 MB | 706 MB |
| Peak live bytes via WebGPU API | 5.13 GB | 3.73 GB (-27.3%) |
| New allocations on run 1 | 4.42 GB | 3.02 GB |
| New allocations on run 2 | 3.04 GB | 0 MB |

The model is static-shape, so the second run is the most important comparison. With the default bucket mode, the second run still creates about 3.04 GB of new WebGPU buffers. With simple mode, exact-size buffers are reused and the second run creates no new buffers.
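To illustrate why exact-size reuse leads to zero new allocations on a repeated static-shape run, here is a toy free-list cache keyed by byte size. It is only a sketch of the general technique, not the ONNX Runtime implementation; the class name, the `runOnce` helper, and the buffer sizes are all hypothetical.

```typescript
// Toy exact-size buffer cache: a free list keyed by byte size.
// Illustration only; NOT the ONNX Runtime implementation.
class ExactSizeCache {
  private free = new Map<number, number[]>(); // size -> ids of idle buffers
  private nextId = 0;
  allocatedBytes = 0; // bytes of buffers that had to be newly created

  acquire(size: number): number {
    const idle = this.free.get(size);
    if (idle && idle.length > 0) {
      return idle.pop()!; // reuse an idle buffer of exactly this size
    }
    this.allocatedBytes += size; // nothing to reuse: count a new allocation
    return this.nextId++;
  }

  release(id: number, size: number): void {
    const idle = this.free.get(size) ?? [];
    idle.push(id);
    this.free.set(size, idle);
  }
}

// One "run" of a static-shape model: the same request sizes every time.
// Returns the number of bytes newly allocated during this run.
function runOnce(cache: ExactSizeCache, sizes: number[]): number {
  const before = cache.allocatedBytes;
  const held = sizes.map((size) => ({ id: cache.acquire(size), size }));
  held.forEach((b) => cache.release(b.id, b.size));
  return cache.allocatedBytes - before;
}

const cache = new ExactSizeCache();
const sizes = [1024, 4096, 4096, 65536]; // hypothetical per-run buffer sizes
const run1 = runOnce(cache, sizes); // every buffer is new
const run2 = runOnce(cache, sizes); // every request hits the free list
```

In this toy model the first run allocates every buffer and the second run allocates nothing, which matches the shape of the simple-mode column above. It says nothing about how bucket mode behaves internally.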

The simple result above uses the same onnxruntime-web@1.26.0 WebGPU bundle with a small JS patch that forwards storageBufferCacheMode.

Proposal:
Forward the existing WebGPU EP buffer cache mode options from JS to the WebGPU EP.

Example:

executionProviders: [{
  name: 'webgpu',
  storageBufferCacheMode: 'simple',
}]
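A minimal sketch of what the forwarding could look like on the JS side, assuming the option is passed through as a key/value entry using the native key quoted above. The `WebGpuEpOptions` type, the `toEpConfigEntries` helper, and the restriction to the two modes named in this issue are assumptions for illustration; they are not the current onnxruntime-web typings or internals.

```typescript
// Hypothetical shape of the proposed JS option (not the current onnxruntime-web typings).
// Only the two modes named in this issue are listed; the native EP may accept others.
type BufferCacheMode = 'bucket' | 'simple';

interface WebGpuEpOptions {
  name: 'webgpu';
  storageBufferCacheMode?: BufferCacheMode;
}

// Native option key, taken from the issue text.
const STORAGE_BUFFER_CACHE_MODE_KEY =
  'ep.webgpuexecutionprovider.storageBufferCacheMode';

// Hypothetical helper: turn JS EP options into key/value entries for the native EP.
// An unset option produces no entry, so the native default stays in effect.
function toEpConfigEntries(ep: WebGpuEpOptions): Array<[string, string]> {
  const entries: Array<[string, string]> = [];
  if (ep.storageBufferCacheMode !== undefined) {
    entries.push([STORAGE_BUFFER_CACHE_MODE_KEY, ep.storageBufferCacheMode]);
  }
  return entries;
}

const entries = toEpConfigEntries({
  name: 'webgpu',
  storageBufferCacheMode: 'simple',
});
const defaultEntries = toEpConfigEntries({ name: 'webgpu' });
```

Leaving the entry out when the option is unset keeps existing sessions on the default mode, so the change would be opt-in.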

To reproduce

  1. Open https://musetric.github.io/onnxruntime-webgpu-buffer-cache-repro/ in Chrome or Edge with WebGPU enabled.
  2. Click "Run both".
  3. Wait for the model files to download from Hugging Face. The first run downloads about 741 MB.
  4. Compare "Official bucket" and "Patched simple" in the results table.
  5. Open the "Bucket memory" and "Simple memory" tabs to inspect the peak live-buffer histograms.

The demo runs the same static-shape model twice per mode. The only intended difference is:

  • official mode: original onnxruntime-web@1.26.0 WebGPU bundle, default bucket
  • patched mode: same bundle with JS forwarding for storageBufferCacheMode, then storageBufferCacheMode: 'simple'
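For readers who want to take similar measurements themselves, one possible approach is to wrap `createBuffer` and `destroy` on the device and count bytes. This is a sketch under that assumption and not necessarily how the repro page measures memory; `DeviceLike`, `BufferLike`, `instrument`, and the stand-in `fakeDevice` are hypothetical names so the example runs without a real `GPUDevice`.

```typescript
// Minimal structural types so the sketch runs without a real GPUDevice.
interface BufferLike { size: number; destroy(): void; }
interface DeviceLike {
  createBuffer(desc: { size: number; usage: number }): BufferLike;
}

interface Stats {
  liveBytes: number;     // bytes in buffers created and not yet destroyed
  peakLiveBytes: number; // highest value liveBytes has reached
  createdBytes: number;  // total bytes passed to createBuffer
}

// Wrap createBuffer and each buffer's destroy to keep running byte counts.
function instrument(device: DeviceLike): Stats {
  const stats: Stats = { liveBytes: 0, peakLiveBytes: 0, createdBytes: 0 };
  const originalCreate = device.createBuffer.bind(device);
  device.createBuffer = (desc) => {
    const buffer = originalCreate(desc);
    stats.createdBytes += desc.size;
    stats.liveBytes += desc.size;
    stats.peakLiveBytes = Math.max(stats.peakLiveBytes, stats.liveBytes);
    const originalDestroy = buffer.destroy.bind(buffer);
    let destroyed = false;
    buffer.destroy = () => {
      if (!destroyed) { // count each buffer's release only once
        destroyed = true;
        stats.liveBytes -= desc.size;
      }
      originalDestroy();
    };
    return buffer;
  };
  return stats;
}

// Stand-in device for demonstration; a real page would pass its GPUDevice.
const fakeDevice: DeviceLike = {
  createBuffer: (desc) => ({ size: desc.size, destroy: () => {} }),
};
const stats = instrument(fakeDevice);
const first = fakeDevice.createBuffer({ size: 100, usage: 0 });
fakeDevice.createBuffer({ size: 50, usage: 0 });
first.destroy();
fakeDevice.createBuffer({ size: 30, usage: 0 });
```

Reading `createdBytes` before and after each run would give a "new allocations on run N" figure, and `peakLiveBytes` a peak live-bytes figure, under the assumptions above.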

Urgency

No response

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.26.0

Execution Provider

'webgpu' (WebGPU)

Labels

ep:WebGPU (ort-web webgpu provider), platform:web (issues related to ONNX Runtime web; typically submitted using template)