Skip to content

fix(onnx): fall back when an execution provider's shared lib fails to load#1706

Open
tsushanth wants to merge 1 commit into
huggingface:mainfrom
tsushanth:fix/auto-device-cuda-fallback
Open

fix(onnx): fall back when an execution provider's shared lib fails to load#1706
tsushanth wants to merge 1 commit into
huggingface:mainfrom
tsushanth:fix/auto-device-cuda-fallback

Conversation

@tsushanth

Copy link
Copy Markdown

Closes #1642.

Summary

On Linux x64 the auto-fallback chain pushes cuda first. ONNX Runtime fails the whole session when any provider in executionProviders fails to register, so a box with no CUDA install throws

OrtSessionOptionsAppendExecutionProvider_Cuda: Failed to load shared library

and never reaches webgpu / cpudevice: 'auto' degenerates into a hard error instead of falling back. Same shape for any other optional provider whose shared library isn't on the host.

Fix

createInferenceSession in backends/onnx.js:

  • Tracks providers that have failed to register in a module-level Set so subsequent sessions skip them up front rather than re-probing on every call.
  • On InferenceSession.create failure, parses the failing provider name out of the ORT error (OrtSessionOptionsAppendExecutionProvider_<Name>), drops it from the list, logs which provider got disabled, and retries with the remaining providers — but only when there's something left to fall back to.
  • Leaves caller-supplied non-array executionProviders and unrelated failures untouched: the retry is gated on a regex match of the ORT-specific load-failure message and the presence of more than one provider, so non-provider-load errors (model corruption, etc.) still surface immediately.

Notes

I didn't add a unit test — the existing test suite is integration-style (loads real models from HF Hub) and there's no module-level mock pattern for onnxruntime-node's InferenceSession.create. The fix is auditable as a self-contained change in onnx.js and was structured to be a no-op in the happy path (no caller-supplied providers, single-provider chains, non-Linux platforms).

… load

Closes huggingface#1642.

On Linux x64 the auto-fallback chain pushes `cuda` first. ONNX Runtime
fails the whole session when ANY provider fails to register, so a box
with no CUDA install throws `OrtSessionOptionsAppendExecutionProvider_
Cuda: Failed to load shared library` and never reaches webgpu/cpu — the
"auto" behavior degenerates into a hard error.

`createInferenceSession` now:
- Tracks providers that have failed to load in a module-level Set so
  subsequent sessions skip them up front rather than re-probing.
- On `InferenceSession.create` failure, detects the failing provider name
  from the ORT error message, drops it from the list, logs which provider
  was disabled, and retries with the remaining providers when there's
  something left to fall back to.
- Leaves caller-supplied non-list `executionProviders` and unrelated
  failures untouched — the retry is gated on a regex match of the
  ORT-specific load-failure message and the presence of >1 provider.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Auto device on Linux x64 fails hard when CUDA shared library is unavailable instead of falling back

1 participant