[gfx1150] RT-DETR fp16 model: wrong GPU output (label flips) on develop; non-MLIR path aborts with INVALID_ISA
File at: https://github.com/ROCm/AMDMIGraphX/issues/new
Summary
On gfx1150, an fp16 RT-DETR-style object-detection model produces incorrect results with the GPU target: migraphx-driver verify --gpu fails on every output, and the integer class-label output flips to wrong classes. MLIR is the only working code path on gfx1150: disabling MLIR aborts at runtime with HSA_STATUS_ERROR_INVALID_ISA, so the divergence cannot be A/B-isolated against the non-MLIR path on this device.
Environment
| | |
|---|---|
| MIGraphX | develop @ f54ca35 (2.16.0) |
| GPU | gfx1150 (AMD Ryzen AI 9 HX PRO 370 / Radeon 890M, Strix Point iGPU) |
| ROCm | 6.4.4-129 |
| OS | Ubuntu 24.04.3 LTS, kernel 6.14.0-29-generic |
| Model | RT-DETR-style object detector, fp16, 512x512 input (3 outputs: scores {1,150}, boxes {1,150,4}, labels {1,150}) |
Repro
migraphx-driver verify object-detector-fast-512p-fp16.onnx --onnx --gpu
(Prerequisite: develop currently aborts earlier in simplify_reshapes for this model — see the note at the bottom. The results below are with that compile-time abort worked around so the model reaches execution.)
Actual result (MLIR enabled, default)
[ERROR] verify_args.cpp:56 FAILED: object-detector-fast-512p-fp16.onnx
[ERROR] verify_args.cpp:57 RMS Error: 0.336279
[ERROR] verify_args.cpp:68 Max diff: 74
[ERROR] verify_args.cpp:73 Mismatch at 2: 11 != 2 # labels output: GPU=11, ref=2
[ERROR] verify_args.cpp:56 FAILED: object-detector-fast-512p-fp16.onnx
[ERROR] verify_args.cpp:57 RMS Error: 0.312404
[ERROR] verify_args.cpp:68 Max diff: 0.480432
[ERROR] verify_args.cpp:73 Mismatch at 0: -0.229004 != -0.225708 # float output
The float outputs are mostly close (max abs diff ~0.48), but the differences are large enough to flip low-confidence detection classes in the integer label output.
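To illustrate how a small float error can change an integer label, here is a minimal sketch. It assumes the label is chosen by an argmax over per-class scores; the model's actual post-processing is not shown in this report, and the score values below are hypothetical, picked only to be on the scale of the mismatch logged above.

```python
def argmax(values):
    """Return the index of the largest value (first index wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])

# Hypothetical class scores for one low-confidence detection: the top two
# classes are only about 0.004 apart in the reference result.
ref_scores = [-0.2257, -0.2300, -1.5000]
gpu_scores = [-0.2290, -0.2250, -1.5000]  # same scores with a small error

max_abs_diff = max(abs(g - r) for g, r in zip(gpu_scores, ref_scores))
ref_label = argmax(ref_scores)
gpu_label = argmax(gpu_scores)

print(f"max abs diff = {max_abs_diff:.4f}")
print(f"ref label = {ref_label}, gpu label = {gpu_label}")
```

In this sketch the scores differ by at most about 0.005, yet the winning class changes. That matches the pattern in the logs, where a large "Max diff" on the labels output is a difference between class indices rather than a numeric error of that size.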
What was ruled out
- adjust_allocation "output buffer doesn't match" warnings are benign. During compile these fire for several gpu::precompile_op (pooling, MLIR convolutions, channelwise_conv). Instrumenting src/adjust_allocation.cpp shows every one is alias = slice with identical lens, identical meaningful strides, and identical byte size; they differ only in the stride of a size-1 outer dimension, e.g.
alias_shape = half_type, {1,16,256,256}, {2097152,65536,256,1}
ins_shape = half_type, {1,16,256,256}, {1048576,65536,256,1}
alias_bytes = 2097152 ins_bytes = 2097152
These are legitimate "write into a concat slice" aliases; the pass correctly leaves them alone. They are not the cause of the wrong output. (Minor: the shape-equality check / warning could ignore size-1-dim strides to avoid the false positive.)
- Non-MLIR path is unusable on gfx1150. With MIGRAPHX_DISABLE_MLIR=1 the program aborts at runtime:
:0:rocdevice.cpp :2992: Callback: Queue aborting with error : HSA_STATUS_ERROR_INVALID_ISA: The instruction set architecture is invalid. code: 0x100f
So on gfx1150 the divergence cannot be compared against the non-MLIR fallback.
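The size-1-dimension point in the first item can be checked with a short sketch. The helpers `same_layout_ignoring_unit_dims` and `element_offsets` are hypothetical names written for this illustration; they are not MIGraphX functions and do not reproduce the comparison in src/adjust_allocation.cpp. The sketch only shows that a stride on a length-1 dimension never contributes to an element offset, so a relaxed comparison could skip it.

```python
def same_layout_ignoring_unit_dims(lens_a, strides_a, lens_b, strides_b):
    """True if both shapes have equal lens and equal strides on every
    dimension whose length is greater than 1."""
    if list(lens_a) != list(lens_b):
        return False
    return all(sa == sb
               for n, sa, sb in zip(lens_a, strides_a, strides_b)
               if n != 1)

def element_offsets(lens, strides):
    """Enumerate every element offset that a strided shape addresses."""
    offsets = [0]
    for n, s in zip(lens, strides):
        offsets = [base + i * s for base in offsets for i in range(n)]
    return offsets

# Shapes taken from the instrumented output in this report.
lens = (1, 16, 256, 256)
alias_strides = (2097152, 65536, 256, 1)
ins_strides = (1048576, 65536, 256, 1)
relaxed_match = same_layout_ignoring_unit_dims(lens, alias_strides,
                                               lens, ins_strides)
print("relaxed match:", relaxed_match)

# Small hypothetical shape: only the stride of the length-1 dim differs,
# so both layouts address exactly the same offsets.
small_lens = (1, 2, 3)
same_offsets = (element_offsets(small_lens, (12, 3, 1))
                == element_offsets(small_lens, (6, 3, 1)))
print("same offsets:", same_offsets)
```

Both checks succeed for these inputs, which is consistent with treating the warnings as false positives rather than as the source of the wrong output.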
Likely area
The remaining divergence is consistent with an fp16 codegen/accuracy issue in the MLIR-generated convolutions on gfx1150 (rather than a buffer/aliasing or shape bug in MIGraphX core). Pointers for triage: confirm whether the gfx1150 non-MLIR kernels are expected to be built (the INVALID_ISA suggests they are not), and check rocMLIR fp16 conv accuracy on gfx1150 vs gfx1100/gfx1101.
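For the cross-device comparison suggested above, it may help to compute the same error figures on outputs dumped from each GPU. The sketch below is a plain-Python helper for that; `compare_outputs` is a hypothetical name, and its RMS and max-diff definitions are the textbook ones, which may not match what migraphx-driver computes internally.

```python
import math

def compare_outputs(result, reference):
    """Return (rms_error, max_abs_diff, first_mismatch_index) for two
    flat sequences of numbers with the same length."""
    if len(result) != len(reference):
        raise ValueError("outputs must have the same number of elements")
    if not result:
        raise ValueError("outputs must not be empty")
    diffs = [abs(r - g) for r, g in zip(result, reference)]
    rms = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    # Index of the first element that differs, or None if all match.
    first = next((i for i, d in enumerate(diffs) if d != 0), None)
    return rms, max(diffs), first

# Hypothetical label vectors: one label differs at index 2.
rms, max_diff, first = compare_outputs([1, 2, 11], [1, 2, 2])
print(f"rms = {rms:.4f}, max diff = {max_diff}, first mismatch at {first}")
```

Running the same helper on each output from gfx1150 and from gfx1100/gfx1101 would give figures that can be compared directly across devices.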
Note: separate simplify_reshapes compile-time abort
This model also hits a separate simplify_reshapes abort on develop (find_reshape_dot building an invalid reshape: "Reshape: Wrong number of elements ... 524288 ... 32768"). Fix proposed in branch ycastill2-amd:fix-reshape-dot-element-count. That fix is required just to reach execution for this report.