fix(onnx): fall back when an execution provider's shared lib fails to load #1706
Open
tsushanth wants to merge 1 commit into
… load

Closes huggingface#1642.

On Linux x64 the auto-fallback chain pushes `cuda` first. ONNX Runtime fails the whole session when ANY provider fails to register, so a box with no CUDA install throws `OrtSessionOptionsAppendExecutionProvider_Cuda: Failed to load shared library` and never reaches webgpu/cpu. The "auto" behavior degenerates into a hard error.

`createInferenceSession` now:
- Tracks providers that have failed to load in a module-level Set so subsequent sessions skip them up front rather than re-probing.
- On `InferenceSession.create` failure, detects the failing provider name from the ORT error message, drops it from the list, logs which provider was disabled, and retries with the remaining providers when there's something left to fall back to.
- Leaves caller-supplied non-list `executionProviders` and unrelated failures untouched: the retry is gated on a regex match of the ORT-specific load-failure message and the presence of >1 provider.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes #1642.
Summary
On Linux x64 the auto-fallback chain pushes `cuda` first. ONNX Runtime fails the whole session when any provider in `executionProviders` fails to register, so a box with no CUDA install throws `OrtSessionOptionsAppendExecutionProvider_Cuda: Failed to load shared library` and never reaches `webgpu`/`cpu`. As a result, `device: 'auto'` degenerates into a hard error instead of falling back. The same failure mode applies to any other optional provider whose shared library isn't on the host.

Fix
`createInferenceSession` in `backends/onnx.js`:
- Tracks providers that have failed to load in a module-level `Set` so subsequent sessions skip them up front rather than re-probing on every call.
- On `InferenceSession.create` failure, parses the failing provider name out of the ORT error (`OrtSessionOptionsAppendExecutionProvider_<Name>`), drops it from the list, logs which provider got disabled, and retries with the remaining providers, but only when there's something left to fall back to.
- Leaves caller-supplied non-list `executionProviders` and unrelated failures untouched: the retry is gated on a regex match of the ORT-specific load-failure message and the presence of more than one provider, so non-provider-load errors (model corruption, etc.) still surface immediately.

Notes
I didn't add a unit test: the existing test suite is integration-style (it loads real models from the HF Hub) and there's no module-level mock pattern for `onnxruntime-node`'s `InferenceSession.create`. The fix is auditable as a self-contained change in `onnx.js` and was structured to be a no-op in the happy path (no caller-supplied providers, single-provider chains, non-Linux platforms).
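
For reviewers who want to reason about the retry behavior described under Fix, here is a minimal sketch of that logic. It is not the code from this PR. Every name in it (`createWithFallback`, `failedProviders`, `LOAD_FAILURE_RE`, `providerName`) is hypothetical, and the regex assumes the error has the shape quoted in the Summary. It also assumes the `<Name>` suffix lowercases to the provider key, which may not hold for every provider. The session factory is passed in as an argument, so the sketch runs without `onnxruntime-node`.

```javascript
// Illustrative sketch only; names are hypothetical and not taken from the PR.

// Providers that already failed to load in this process (module-level state).
const failedProviders = new Set();

// Assumed shape of the ORT load-failure message, based on the error quoted above.
const LOAD_FAILURE_RE =
  /OrtSessionOptionsAppendExecutionProvider_(\w+).*Failed to load shared library/s;

// Providers may be given as strings or as objects with a `name` field.
const providerName = (p) => (typeof p === 'string' ? p : p.name);

/**
 * @param {(providers: any) => Promise<any>} createSession injected session factory
 * @param {any} executionProviders provider list (or any non-list value)
 * @param {(msg: string) => void} log logger for disabled providers
 */
async function createWithFallback(createSession, executionProviders, log = console.warn) {
  // Non-list values are passed through untouched.
  if (!Array.isArray(executionProviders)) {
    return createSession(executionProviders);
  }

  // Skip providers already known to be broken, but never filter down to nothing.
  let providers = executionProviders.filter(
    (p) => !failedProviders.has(providerName(p).toLowerCase()),
  );
  if (providers.length === 0) providers = executionProviders;

  while (true) {
    try {
      return await createSession(providers);
    } catch (err) {
      const match = LOAD_FAILURE_RE.exec(String(err?.message ?? err));
      // Unrelated errors, or nothing left to fall back to: surface immediately.
      if (!match || providers.length <= 1) throw err;

      const failed = match[1].toLowerCase();
      const remaining = providers.filter(
        (p) => providerName(p).toLowerCase() !== failed,
      );
      // Rethrow if the named provider was not in the list (retrying would loop)
      // or if removing it would leave an empty list.
      if (remaining.length === 0 || remaining.length === providers.length) throw err;

      failedProviders.add(failed);
      log(`Execution provider "${failed}" failed to load; retrying with: ` +
          remaining.map(providerName).join(', '));
      providers = remaining;
    }
  }
}
```

Injecting the factory also suggests one way a unit test could be written without a module-level mock: a stub that throws the quoted error whenever `cuda` is in the list should make `createWithFallback(stub, ['cuda', 'cpu'])` resolve with the `cpu`-only session, and a later call should skip `cuda` up front.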