fix(onnx): fall back when an execution provider's shared lib fails to load by tsushanth · Pull Request #1706 · huggingface/transformers.js

tsushanth · 2026-06-10T20:51:37Z

Closes #1642.

Summary

On Linux x64 the auto-fallback chain pushes cuda first. ONNX Runtime fails the whole session when any provider in executionProviders fails to register, so a box with no CUDA install throws

OrtSessionOptionsAppendExecutionProvider_Cuda: Failed to load shared library

and never reaches webgpu / cpu — device: 'auto' degenerates into a hard error instead of falling back. Same shape for any other optional provider whose shared library isn't on the host.

Fix

createInferenceSession in backends/onnx.js:

- Tracks providers that have failed to register in a module-level Set so subsequent sessions skip them up front rather than re-probing on every call.
- On InferenceSession.create failure, parses the failing provider name out of the ORT error (OrtSessionOptionsAppendExecutionProvider_<Name>), drops it from the list, logs which provider got disabled, and retries with the remaining providers, but only when there's something left to fall back to.
- Leaves caller-supplied non-array executionProviders and unrelated failures untouched: the retry is gated on a regex match of the ORT-specific load-failure message and the presence of more than one provider, so non-provider-load errors (model corruption, etc.) still surface immediately.
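
For readers skimming without the diff open, the three behaviours above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: `createWithFallback`, `failedProviders`, `providerName`, and `LOAD_FAILURE_RE` are hypothetical names, the session factory is passed in as `create` so the sketch runs without onnxruntime-node installed, and lowercasing the name parsed from the error (`Cuda` to `cuda`) is an assumption about how ORT error names map to provider ids.

```javascript
// Hypothetical sketch; names and structure are illustrative, not taken from the PR diff.

// Matches the ORT load-failure message quoted in the summary and captures the provider name.
const LOAD_FAILURE_RE =
  /OrtSessionOptionsAppendExecutionProvider_(\w+): Failed to load shared library/;

// Module-level record of providers that already failed to load in this process.
const failedProviders = new Set();

// Execution providers may be given as plain strings or as objects with a `name` field.
function providerName(ep) {
  return typeof ep === 'string' ? ep : ep.name;
}

async function createWithFallback(create, model, options = {}) {
  const requested = options.executionProviders;

  // Non-array provider settings are passed through untouched.
  if (!Array.isArray(requested)) return create(model, options);

  // Skip providers known to be broken, unless that would leave nothing to try
  // (keeping the original list in that case is an assumption of this sketch).
  let providers = requested.filter(
    (ep) => !failedProviders.has(providerName(ep).toLowerCase()),
  );
  if (providers.length === 0) providers = requested;

  for (;;) {
    try {
      return await create(model, { ...options, executionProviders: providers });
    } catch (err) {
      const match = LOAD_FAILURE_RE.exec(String(err?.message ?? err));

      // Unrelated errors, or nothing left to fall back to: surface immediately.
      if (!match || providers.length <= 1) throw err;

      const failed = match[1].toLowerCase();
      const remaining = providers.filter(
        (ep) => providerName(ep).toLowerCase() !== failed,
      );

      // The named provider is not in the list, so dropping it cannot help.
      if (remaining.length === providers.length || remaining.length === 0) throw err;

      failedProviders.add(failed);
      console.warn(`Execution provider "${failed}" failed to load; retrying without it.`);
      providers = remaining; // strictly shorter each round, so the loop terminates
    }
  }
}
```

In the happy path the first `create` call succeeds and the `catch` branch never runs, which is what keeps a wrapper like this a no-op when every provider loads.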

Notes

I didn't add a unit test — the existing test suite is integration-style (loads real models from HF Hub) and there's no module-level mock pattern for onnxruntime-node's InferenceSession.create. The fix is auditable as a self-contained change in onnx.js and was structured to be a no-op in the happy path (no caller-supplied providers, single-provider chains, non-Linux platforms).

… load

Closes huggingface#1642.

On Linux x64 the auto-fallback chain pushes `cuda` first. ONNX Runtime fails the whole session when ANY provider fails to register, so a box with no CUDA install throws `OrtSessionOptionsAppendExecutionProvider_Cuda: Failed to load shared library` and never reaches webgpu/cpu: the "auto" behavior degenerates into a hard error.

`createInferenceSession` now:

- Tracks providers that have failed to load in a module-level Set so subsequent sessions skip them up front rather than re-probing.
- On `InferenceSession.create` failure, detects the failing provider name from the ORT error message, drops it from the list, logs which provider was disabled, and retries with the remaining providers when there's something left to fall back to.
- Leaves caller-supplied non-list `executionProviders` and unrelated failures untouched: the retry is gated on a regex match of the ORT-specific load-failure message and the presence of >1 provider.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

tsushanth wants to merge 1 commit into huggingface:main from tsushanth:fix/auto-device-cuda-fallback
