Update dependency com.microsoft.onnxruntime:onnxruntime to v1.26.0 #3790

Merged
epugh merged 2 commits into apache:main from solrbot:renovate/onnx
Jun 3, 2026

Conversation

@solrbot commented Oct 17, 2025

Collaborator

This PR contains the following updates:

| Package | Type | Update | Change |
| --- | --- | --- | --- |
| com.microsoft.onnxruntime:onnxruntime (source) | dependencies | minor | 1.24.3 -> 1.26.0 |
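
The "minor" label in the table above follows from comparing the two version strings component by component. The sketch below is only an illustration of that comparison, assuming plain three-part dotted versions; `parse_version` and `classify_update` are hypothetical helpers, not Renovate's actual logic.

```python
def parse_version(text):
    """Split a dotted version such as '1.24.3' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def classify_update(current, new):
    """Name the first version component that differs (assumes x.y.z)."""
    cur, nxt = parse_version(current), parse_version(new)
    for label, a, b in zip(("major", "minor", "patch"), cur, nxt):
        if a != b:
            return label
    return "none"


# The versions from the table above.
print(classify_update("1.24.3", "1.26.0"))  # prints "minor"
```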

Release Notes

microsoft/onnxruntime (com.microsoft.onnxruntime:onnxruntime)

v1.26.0: 1.26.0

n.b. The following was generated via LLM from Git history. Only the contributor list has been verified.

ONNX Runtime Release 1.26.0

Announcement - Breaking Changes

  • Support for CUDA 12 will be removed in 1.27.0.
    • CUDA 13 will continue to be published as onnxruntime-<os>-<arch>-gpu_cuda13-<version>.<ext>
  • CUDA runtime will be moving soon to a dedicated Execution Provider (EP) instead of a published package from ORT core.

Highlights

  • Added optional memory mapping for .ort model loads (#​28164).
  • Added RISC-V Vector (RVV) support for CPU EP (#​28261).
  • OpenVINO EP upgraded for 1.26.0 development release (#​28297).
  • WebGPU gained GridSample support (#​28264) and Split-K improvements (#​28151).
  • CUDA plugin EP gained graph support (#​28002) and a profiling API (#​28216).

Security and Reliability Hardening

  • Replaced unrestricted Python setattr configuration with an allowlist (#​28083).
  • Hardened multiple OOB and overflow scenarios across ML and core ops:
  • Fixed session logger use-after-free during EP teardown under verbose logging (#​28274).
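
As a rough illustration of the allowlist idea in the first bullet above, the sketch below rejects any option name that is not explicitly permitted before calling `setattr`. It is not the ONNX Runtime change itself: `SessionConfig`, `ALLOWED_KEYS`, and `apply_options` are hypothetical names chosen for this example.

```python
class SessionConfig:
    """Hypothetical configuration object; not an ONNX Runtime class."""

    def __init__(self):
        self.log_level = 2
        self.thread_count = 1


# Only these attribute names may be set from caller-supplied options.
ALLOWED_KEYS = frozenset({"log_level", "thread_count"})


def apply_options(config, options):
    """Copy options onto config, rejecting any key outside the allowlist."""
    for key, value in options.items():
        if key not in ALLOWED_KEYS:
            raise ValueError(f"unsupported option: {key!r}")
        setattr(config, key, value)
    return config


config = apply_options(SessionConfig(), {"thread_count": 4})
```

With an unrestricted `setattr`, a key such as `__class__` would be applied like any other; with the allowlist it raises `ValueError` instead.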

CUDA, Attention, and MLAS

  • Filled CUDA opset/operator gaps and extended support:
    • Transpose opset 23 -> 25 (#​27740).
    • QuantizeLinear/DequantizeLinear opset 25 (#​28046).
    • CUDA TopK INT8/INT16/UINT8 support (#​27862).
    • LabelEncoder CUDA support for numeric types (#​28045).
  • Attention/GQA improvements:
    • Fixed ONNX Attention min-bias alignment crash on SM<80 and masked-batch NaN behavior (#​27831).
    • Added FP32 QK accumulation path for unfused GQA attention (#​28198).
    • Added CUDART_VERSION reduction compatibility in GQA attention (#​28296).
    • Fixed CUDA 13 build error in GQA unfused attention (#​28309).
    • PagedAttention fallback for SM<80 fp16 (#​28200).
  • MLAS updates:
    • FP16 Gelu enablement (#​26815).
    • Arm64 BF16 fast-math conv kernels for NCHW/NCHWc paths (#​27878).

WebGPU, WebNN, and JavaScript

  • WebGPU feature and correctness updates:
    • Added GridSample (#​28264).
    • Split-K support for batch size > 1 (#​28151).
    • MatMulNBits refactor and batching improvements (#​28109, #​28197).
    • MHA correctness fix when present outputs are not requested (#​28027).
    • Buffer upload overflow fix (#​27948).
    • Position ID bounds validation in WebGPU/JS RotaryEmbedding (#​28214).
  • WebNN change:
    • Renamed pool2d property roundingType -> outputShapeRounding (#​28172).
  • JavaScript ecosystem maintenance:
    • Multiple dependency bumps.

Plugin EP and EP Ecosystem

Contributors

@​tianleiwu, @​yuslepukhin, @​edgchen1, @​vraspar, @​hariharans29, @​skottmckay, @​eserscor, @​xadupre, @​sanaa-hamel-microsoft, @​elwhyjay, @​Rishi-Dave, @​titaiwangms, @​adrianlizarraga, @​jatinwadhwa921, @​jchen10, @​Jiawei-Shao, @​maxwbuckley, @​preetha-intel, @​qjia7, @​qti-hungjuiw, @​RajeevSekar, @​umangb-09, @​adrastogi, @​akote123, @​amd-genmingz, @​ankitm3k, @​apsonawane, @​bachelor-dou, @​baijumeswani, @​bopeng1234, @​chilo-ms, @​chwarr, @​Craigacp, @​dccarmo, @​derdeljan-msft, @​ericcraw, @​fdwr, @​fs-eire, @​gaugarg-nv, @​gblong1, @​GopalakrishnanN, @​Honry, @​intbf, @​ishwar-raut1, @​Jaswanth51, @​javier-intel, @​JonathanC-ARM, @​julia-thorn, @​justinchuby, @​jwludzik, @​Kevin-Taha, @​Kotomi-Du, @​MayureshV1, @​mdvoretc-intel, @​miaobin, @​milpuz01, @​mingyueliuh, @​mklimenk, @​n1harika, @​prathikr, @​psakhamoori, @​qti-yuduo, @​quic-calvnguy, @​RyanMetcalfeInt8, @​sfatimar, @​sgbihu, @​ShirasawaSama, @​ssam18, @​susbhere, @​sushraja-msft, @​TejalKhade28, @​theHamsta, @​TomCrypto, @​TsofnatMaman, @​velonica0, @​vthaniel, @​wenqinI, @​xhan65, @​xhcao

v1.25.1: ONNX Runtime v1.25.1

n.b. This changelog is LLM generated. Only the contributor listing has been verified.

ONNX Runtime Release 1.25.1

📢 Announcements & Breaking Changes

ONNX Op Updates
  • Enhanced ONNX operator support with new opset versions: Reshape (opset 25), Transpose (opset 24) (#​27752)

✨ New Features

📊 New ONNX Ops & Model Support
  • LinearAttention and CausalConvState operators for Qwen3.5 model support (#​27907)
  • RotaryEmbedding (RotEMB) and RMSNorm operators added (#​27752)
  • Linear Attention signature support (#​27842)

🌐 Web & JavaScript

WebGPU EP
  • Qwen3.5 model support on WebGPU execution provider (#​27996)
  • QMoE 1-token decode path optimization — fused operations to reduce GPU dispatches for improved performance (#​27998)

🐛 Bug Fixes

Core Runtime Fixes
  • Improved filesystem error messages during Linux device discovery for better debugging experience (#​27289)
  • Fixed missing include for SetRawDataInTensorProto in NVIDIA TensorRT RTX tests (#​28065)

🙏 Contributors

Thanks to our 7 contributors for this release:
@​guschmue, @​sanaa-hamel-microsoft, @​apsonawane, @​eserscor, @​ishwar-raut1, @​qjia7, @​theHamsta

Full Changelog: microsoft/onnxruntime@v1.25.0...v1.25.1

v1.25.0: ONNX Runtime v1.25.0

📢 Announcements & Breaking Changes

Build & Platform
  • C++20 is now required to build ONNX Runtime from source. Minimum toolchains: MSVC 19.29+, GCC 10+, Clang 10+. Users of prebuilt packages are unaffected. (#​27178)
  • CUDA minimum version raised to 12.0 — CUDA 11.x is no longer supported. Users pinned to CUDA 11.x should stay on ORT 1.24.x or upgrade their CUDA toolkit/driver. (#​27570)
  • ONNX upgraded to 1.21.0 (#​27601)
  • sympy is now an optional dependency for Python builds. (#​27200)
Execution Provider Changes
  • ArmNN EP has been removed. Users should remove any --use_armnn build flags and migrate to the MLAS/KleidiAI-backed CPU EP or QNN EP for Qualcomm hardware. (#​27447)
API Version

🔒 Security Fixes

  • Fixed potential integer truncation leading to heap out-of-bounds read/write (#​27544)
  • Addressed Pad Reflect vulnerability (#​27652)
  • Security fix for transpose optimizer (#​27555)
  • Upgraded minimatch 3.1.2 → 3.1.4 for CVE-2026-27904 (#​27667)
  • Hardened shell command handling for constant strings (#​27840)
  • Added validation of onnx::TensorProto data size before allocation (#​27547)
  • Cleaned up external data path validation (#​27539)
  • Fixed misaligned address reads for tensor attributes from raw data buffers (#​27312)
  • Fixed CPU Attention overflow issue (#​27822)
  • Fixed CPU LRN integer overflow issues (#​27886)
  • Additional input validation hardening:

✨ New Features

🔌 Execution Provider Plugin API & CUDA Plugin EP

ORT 1.25.0 introduces the CUDA Plugin EP — the first core implementation that enables third-party CUDA-backed EPs to be delivered as dynamically loaded plugins without rebuilding ORT.

  • CUDA Plugin EP: Core implementation (#​27816)
  • CUDA Plugin EP: BFC-style arena and CUDA mempool allocators for stream-aware memory management (#​27931)
  • Plugin EP Sync API for synchronous execution (#​27538)
  • Plugin EP event profiling APIs (#​27649)
  • Plugin EP APIs to retrieve ONNX operator schemas (#​27713)
  • Annotation-based graph partitioning with resource accounting (#​27595, #​27972)
  • EP API adapter improvements: header-only adapter, OpKernelInfo::GetConfigOptions, LoggingManager::HasDefaultLogger() (#​26879, #​26919, #​27540, #​27541, #​27587)
  • WebGPU EP made compatible with EP API (#​26907)
🔧 Core APIs
  • Per-session thread pool work callbacks API (#​27253)
  • enable_profiling in RunOptions (#​26846)
  • KernelInfo string-array attribute APIs for C and C++ (#​27599)
  • OrtModel input support for Compile API (#​27332)
  • Session config to create weightless EPContext models during compilation (#​27197)
  • Compiled model compatibility APIs in example plugin EP (#​27088)
  • Model Package support (preview): Initial infrastructure for automatically selecting compiled EPContext model variants from a packaged collection based on EP, device, and hardware constraints. The directory structure is not yet finalized. (#​27786)
📊 New ONNX Ops & Opset Coverage

🖥️ Execution Provider Updates

NVIDIA CUDA EP
  • GQA with XQA and quantized KV cache, including FP8 (E4M3) KV cache support (#​27246, #​27321)
  • CUDA graph capture compatibility for LLM ops and pre-compiled paths (#​27484, #​27477)
  • Volumetric (3-D) GridSample support (#​27201)
  • Optimized 3D nearest resize kernel for 5D tensors (#​27578)
  • Optional router_weights input to QMoE (#​27687)
NVIDIA TensorRT RTX EP
  • D3D12 external resource import support (#​26948)
Qualcomm QNN EP
  • Disabled file mapping for embedded cache (#​27627)
  • Fixed use-after-free of logger object (#​27804)
  • Fixed wheel build issues on WSL and Linux SDK version propagation (#​27730, #​27800)
Other EPs
  • VitisAI EP: Added PE version info to provider DLL (#​27626)
  • DML EP: Fixed overflow in DmlGraphFusionHelper::ProcessInputData (#​27815), fixed new-delete mismatch in QuantizeLinear (#​27823)

🌐 Web & JavaScript

WebGPU EP — Performance
WebGPU EP — New Op Support
WebGPU EP — Stability
  • Fixed device destroyed on session release breaking recreation (#​27634)
  • Fixed static destruction crash on exit (#​27470, #​27569)
  • Backward compat: Legacy WebGPU/WebNN memory info names are now accepted again (#​27637)
  • Deterministic Split-K handling (#​27086), buffer segment alignment fix (#​27853)
  • Binary size reduction for WebAssembly builds (#​27370, #​27371)
WebNN EP
Node.js & React Native
  • Fixed float16 tensor support in Node.js and React Native (#​27327, #​27549)
  • Added 16KB page size alignment for Android (required for Android 15+) (#​27523)
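
The release notes do not say how the 16KB alignment was implemented. The sketch below only shows the underlying arithmetic, assuming "16KB" means 16384 bytes; `align_up` and `is_aligned` are hypothetical helpers for illustration.

```python
PAGE_SIZE_16K = 16 * 1024  # 16384 bytes


def align_up(value, alignment=PAGE_SIZE_16K):
    """Round a non-negative value up to the next multiple of alignment."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    # Adding (alignment - 1) and masking off the low bits rounds upward.
    return (value + alignment - 1) & ~(alignment - 1)


def is_aligned(value, alignment=PAGE_SIZE_16K):
    """True when value is an exact multiple of alignment."""
    return value % alignment == 0


print(align_up(16385))   # prints 32768
print(is_aligned(4096))  # prints False: a 4KB boundary is not 16KB-aligned
```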

🧠 CPU & Core Optimizations

MLAS / KleidiAI / Quantization
  • KleidiAI BF16 SME2 kernel integration (#​26773), asymmetric 4-bit MatMulNBits on ARM64 (#​27751)
  • Fused Silu and Gelu kernels for AVX512 (#​27690)
  • Depthwise conv kernel for NCHW on AVX512 (#​27874)
  • ARM64 NCHWc NEON asm kernels (#​27099, #​27788), BF16 KAI SBGemm on NCHWc ARM (#​27703)
  • POWER10 Sgemm PackA optimization (#​27575)
  • Improved pre-packing for 2-bit LUT kernels (#​27131)
  • Backend kernel selector config in MLAS, allowing users to opt out of KleidiAI kernels on ARM platforms (#​27136)
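
For readers unfamiliar with the Silu and Gelu activations mentioned in the list above, these are their scalar definitions. This is a plain reference sketch, not the AVX512 kernels; it uses the erf-based form of Gelu, and the notes do not say which variant the fused kernels cover.

```python
import math


def silu(x):
    """SiLU (swish): x * sigmoid(x), written as x / (1 + e^-x)."""
    return x / (1.0 + math.exp(-x))


def gelu(x):
    """Erf-based GELU: 0.5 * x * (1 + erf(x / sqrt(2)))."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


print(silu(0.0), gelu(0.0))  # prints 0.0 0.0
```
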
DQ→MatMulNBits Fusion

Extended to cover significantly more quantized LLM inference scenarios on CPU:

  • 2-bit and 8-bit weights with Cast(fp16→fp32) patterns (#​27614)
  • FP16 models on CPU EP (#​27640), fp16 8-bit on ARM64 (#​27692)
  • Gemm + per-tensor/per-channel quantization (#​27769)
  • FP16 quantized weight compatibility: models with HQNBIT quantized weights now route through the FP32 MLAS path for broader CPU compatibility (#​27820)
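
Assuming "DQ" in the heading above refers to ONNX DequantizeLinear, the operation being fused is the linear mapping from quantized integers back to real values. The sketch below shows only that per-tensor formula; it does not model bit packing, block-wise scales, or the fusion itself, and `dequantize` is a hypothetical helper.

```python
def dequantize(q_values, scale, zero_point=0):
    """Per-tensor linear dequantization: (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in q_values]


# Illustrative 4-bit values (range 0..15) with an assumed zero point of 8.
print(dequantize([0, 8, 15], scale=0.5, zero_point=8))  # prints [-4.0, 0.0, 3.5]
```
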
Model Optimizer & Fusions
  • Qwen3 model type support and RotaryEmbedding fusion for Qwen3 RoPE patterns (#​27556, #​27590)
  • MobileClip attention fusion for both attention block patterns (#​27883)
  • Nemotron speech conformer encoder MHA fusion (#​27764)
  • Fixed GPT-2 no-past attention fusion for transformers ≥ 4.27 (#​27449)
  • Fixed BART attention fusion for SDPA pattern from transformers ≥ 4.49 (#​27458)
  • Pre-layer normalization support in attention fusion (#​27418)
  • SkipLayerNorm fusion with bias Add (#​27765), broadcasting skip shapes (#​27489)
  • SpaceToDepth fusion pattern (#​27747)
  • NCHWc transformer: more patterns and ONNX-domain Gelu/HardSigmoid activations (#​27691, #​27821)
  • Optimized qMoE code path for single-token execution (#​27383)
  • ONNX Attention KV cache optimization with ConcatNewToPast (#​27613)

🔌 Language Bindings

Python
  • Exposed OrtDeviceVendorId enum for vendor-aware OrtDevice aliases (#​27594)
  • Added bindings for GetCompatibilityInfoFromModel / GetCompatibilityInfoFromModelBytes (#​27565)
  • Fixed OrtValue.from_dlpack rejecting zero-size tensors as non-contiguous (#​27451)
C#
  • Added bindings for GetCompatibilityInfoFromModel / GetCompatibilityInfoFromModelBytes (#​27565)
Java
  • Avoid provider resource extraction when library already exists in onnxruntime.native.path (#​27668)

🐛 Bug Fixes

Critical Fixes
  • Fixed CPU Attention overflow issue (#​27822)
  • Fixed CPU LRN integer overflow issues (#​27886)
  • Fixed incorrect pad indices in AveragePool count_include_pad computation — silent correctness issue (#​27375)
  • Fixed integer division/modulo by zero in CPU EP Div and Mod operators (#​27693, #​27833)
  • Fixed non-ASCII Unicode model path crash (#​27724)
  • Fixed arithmetic overflow in Det operator (#​27070)
  • Fixed narrow-to-wide string conversion bugs in DLL load error reporting (#​27777)
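
The AveragePool item above concerns the `count_include_pad` attribute, which decides whether padded positions count toward the divisor at the edges. The notes do not describe the faulty indices, so the sketch below shows only the intended semantics for a 1-D case with symmetric zero padding; `average_pool_1d` is a hypothetical reference function, not ONNX Runtime code.

```python
def average_pool_1d(values, kernel, stride, pad, count_include_pad):
    """1-D average pooling with symmetric zero padding; assumes pad < kernel."""
    padded_len = len(values) + 2 * pad
    outputs = []
    for start in range(0, padded_len - kernel + 1, stride):
        total = 0.0
        valid = 0  # number of window positions that fall on real input
        for offset in range(kernel):
            index = start + offset - pad  # index into the unpadded input
            if 0 <= index < len(values):
                total += values[index]
                valid += 1
        # Padded zeros add nothing to the sum; they only change the divisor.
        divisor = kernel if count_include_pad else valid
        outputs.append(total / divisor)
    return outputs


data = [1.0, 2.0, 3.0, 4.0]
print(average_pool_1d(data, kernel=3, stride=1, pad=1, count_include_pad=False))
# prints [1.5, 2.0, 3.0, 3.5]
```

With `count_include_pad=True` the two edge outputs become 3/3 and 7/3, while the interior outputs are unchanged.
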
Operator & Graph Fixes
  • Fixed 3D attention mask broadcasting in MHA (#​27464)
  • Fixed GQA shape inference for present outputs (#​27250)
  • Fixed Einsum bugs for reduction and empty input cases (#​27225, #​27226)
  • Prevented cross-EP Cast fusion in RemoveDuplicateCastTransformer (#​27363)
  • Fixed ConvTranspose bias input validation on CPU/CUDA (#​27209)
  • Fixed Cast node naming collisions in float16 conversion (#​27469)
  • Fixed concat/slice elimination and unsqueeze elimination against optional attrs and invalid models (#​27638)
  • Improved EPContext error message when node is not assigned to an EP (#​27474)
EP-Specific Fixes
  • Fixed MiGraphX EP double allocation (#​27551)
  • Fixed MLAS qgemm dispatch and kernel regressions in quantized conv tests (#​27671)
  • Fixed run-level profiling for subgraph operators (#​27870)
  • Fixed --build_wasm_static_lib implicitly enabling --build_wasm (#​27342)

🙏 Contributors

Thanks to our 72 contributors for this release!

@​tianleiwu, @​fs-eire, @​edgchen1, @​titaiwangms, @​hariharans29, @​eserscor, @​Rishi-Dave, @​guschmue, @​adrianlizarraga, @​jambayk, @​qjia7, @​skottmckay, @​adrastogi, @​sanaa-hamel-microsoft, @​yuslepukhin, @​ingyukoh, @​Jiawei-Shao, @​vraspar, @​xhcao, @​chilo-ms, @​Honry, @​JonathanC-ARM, @​kunal-vaishnavi, @​ShirasawaSama, @​chaya2350, @​derdeljan-msft, @​gedoensmax, @​HectorSVC, @​milpuz01, @​quic-calvnguy, @​xenova, @​akholodnamdcom, @​AlekseiNikiforovIBM, @​amd-genmingz, @​ashrit-ms, @​bachelor-dou, @​BODAPATIMAHESH, @​Colm-in-Arm, @​daijh, @​dodokw, @​fanchenkong1, @​ivarusic-amd, @​JanSellner, @​jchen10, @​jiafatom, @​jnagi-intel, @​johannes-rehm-snkeos, @​justinchuby, @​keshavv27, @​Kevin-Taha, @​kevinlam92, @​kpkbandi, @​Laan33, @​melkap01-Arm, @​miaobin, @​n-v-k, @​nico-martin, @​patryk-kaiser-ARM, @​praneshgo, @​prathikr, @​qc-tbhardwa, @​sagarbhure-msft, @​sdotpeng, @​the0cp, @​TsofnatMaman, @​umangb-09, @​walidbr, @​wenqinI, @​xadupre, @​xhan65, @​xiaofeihan1


Full Changelog: v1.24.4...v1.25.0


Configuration

📅 Schedule: (UTC)

  • Branch creation
    • Only on Sunday (* * * * 0)
  • Automerge
    • At any time (no schedule defined)
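
In the schedule `* * * * 0` above, the fifth field is the day of week, with 0 meaning Sunday. The sketch below checks only that field and only handles `*` or a single number; it is an illustration of the field's meaning, not how Renovate evaluates schedules, and both helper names are hypothetical.

```python
import datetime


def cron_day_of_week(day):
    """Map a date to cron's day-of-week number, where 0 is Sunday."""
    # date.weekday() uses Monday=0 .. Sunday=6, so shift by one and wrap.
    return (day.weekday() + 1) % 7


def matches_day_field(expression, day):
    """Check only the fifth (day-of-week) field of a five-field cron string."""
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError("expected five cron fields")
    dow = fields[4]
    return dow == "*" or int(dow) == cron_day_of_week(day)


# Oct 19, 2025 is a Sunday; Oct 17, 2025 (when this PR was opened) is a Friday.
print(matches_day_field("* * * * 0", datetime.date(2025, 10, 19)))  # prints True
print(matches_day_field("* * * * 0", datetime.date(2025, 10, 17)))  # prints False
```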

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.


  • If you want to rebase/retry this PR, check this box

This PR has been generated by Renovate Bot

@solrbot added the exempt-stale label Oct 17, 2025
@solrbot force-pushed the renovate/onnx branch 3 times, most recently from ed07788 to 4c47e95 Oct 23, 2025 02:05
@solrbot changed the title from "Update dependency com.microsoft.onnxruntime:onnxruntime to v1.23.1" to "Update dependency com.microsoft.onnxruntime:onnxruntime to v1.23.2" Oct 30, 2025
@solrbot force-pushed the renovate/onnx branch 3 times, most recently from d77242d to b4ec439 Dec 18, 2025 20:38
@solrbot changed the title from "Update dependency com.microsoft.onnxruntime:onnxruntime to v1.23.2" to "Update dependency com.microsoft.onnxruntime:onnxruntime to v1.24.1" Feb 10, 2026
@solrbot force-pushed the renovate/onnx branch 2 times, most recently from 6e75ae5 to d588b69 Feb 24, 2026 20:50
@solrbot changed the title from "Update dependency com.microsoft.onnxruntime:onnxruntime to v1.24.1" to "Update dependency com.microsoft.onnxruntime:onnxruntime to v1.24.2" Feb 24, 2026
@solrbot changed the title from "Update dependency com.microsoft.onnxruntime:onnxruntime to v1.24.2" to "Update dependency com.microsoft.onnxruntime:onnxruntime to v1.24.3" Apr 20, 2026
@solrbot changed the title from "Update dependency com.microsoft.onnxruntime:onnxruntime to v1.24.3" to "Update dependency com.microsoft.onnxruntime:onnxruntime to v1.25.0" Apr 23, 2026
@solrbot force-pushed the renovate/onnx branch 3 times, most recently from 3278039 to 8d883c9 Apr 28, 2026 11:00
@solrbot changed the title from "Update dependency com.microsoft.onnxruntime:onnxruntime to v1.25.0" to "Update dependency com.microsoft.onnxruntime:onnxruntime to v1.25.1" May 3, 2026
@solrbot changed the title from "Update dependency com.microsoft.onnxruntime:onnxruntime to v1.25.1" to "Update dependency com.microsoft.onnxruntime:onnxruntime to v1.26.0" May 13, 2026
@solrbot force-pushed the renovate/onnx branch 3 times, most recently from d4905bf to da3d1f1 May 18, 2026 22:54
@epugh commented Jun 3, 2026

Contributor

I dug in a bit on why onnx is part of the core gradle.lockfile and found out that we have some maybe icky test structure. I will open a JIRA for this.

@epugh merged commit 9e33867 into apache:main Jun 3, 2026
5 checks passed
epugh pushed a commit that referenced this pull request Jun 3, 2026
@solrbot deleted the renovate/onnx branch Jun 3, 2026 19:13

Labels

dependencies (Dependency upgrades), exempt-stale (Prevent a PR from going stale), module:sql, tool:build

2 participants