# [gfx1150] RT-DETR fp16 model: wrong GPU output (label flips) on develop; non-MLIR path aborts with INVALID_ISA

> File at: https://github.com/ROCm/AMDMIGraphX/issues/new

## Summary

On **gfx1150**, an fp16 RT-DETR-style object-detection model produces incorrect results with the GPU target: `migraphx-driver verify --gpu` fails on every output, and the integer class-label output flips to wrong classes. MLIR is the only code path that runs to completion on gfx1150: disabling MLIR aborts at runtime with `HSA_STATUS_ERROR_INVALID_ISA`, so the divergence cannot be A/B-isolated against the non-MLIR path on this device.

## Environment

| | |
|---|---|
| MIGraphX | `develop` @ `f54ca35` (2.16.0) |
| GPU | gfx1150 (AMD Ryzen AI 9 HX PRO 370 / Radeon 890M, Strix Point iGPU) |
| ROCm | 6.4.4-129 |
| OS | Ubuntu 24.04.3 LTS, kernel 6.14.0-29-generic |
| Model | RT-DETR-style object detector, fp16, 512x512 input (3 outputs: scores `{1,150}`, boxes `{1,150,4}`, labels `{1,150}`) |

## Repro

```
migraphx-driver verify object-detector-fast-512p-fp16.onnx --onnx --gpu
```

(Prerequisite: `develop` currently aborts earlier in `simplify_reshapes` for this model — see the note at the bottom. The results below are with that compile-time abort worked around so the model reaches execution.)

## Actual result (MLIR enabled, default)

```
[ERROR] verify_args.cpp:56 FAILED: object-detector-fast-512p-fp16.onnx
[ERROR] verify_args.cpp:57 RMS Error: 0.336279
[ERROR] verify_args.cpp:68 Max diff: 74
[ERROR] verify_args.cpp:73 Mismatch at 2: 11 != 2          # labels output: GPU=11, ref=2

[ERROR] verify_args.cpp:56 FAILED: object-detector-fast-512p-fp16.onnx
[ERROR] verify_args.cpp:57 RMS Error: 0.312404
[ERROR] verify_args.cpp:68 Max diff: 0.480432
[ERROR] verify_args.cpp:73 Mismatch at 0: -0.229004 != -0.225708   # float output
```

The float outputs are mostly close (max abs diff ~0.48), but the differences are large enough to flip the class of low-confidence detections in the integer label output.
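
To illustrate how a small float difference can change an integer label, here is a minimal sketch. It assumes the label is chosen by an argmax-like selection over per-class scores, which this report has not verified against the model's actual post-processing. The scores and the helper `pick_label` are hypothetical and are not taken from the model.

```python
def pick_label(class_scores):
    """Return the index of the highest score (the first one wins on ties)."""
    return max(range(len(class_scores)), key=lambda i: class_scores[i])


# Hypothetical per-class scores for one low-confidence detection:
# the top two classes are separated by only 0.002.
ref_scores = [0.10, 0.05, 0.300, 0.298]

# The same scores with a small hypothetical numerical error on two entries.
gpu_scores = [0.10, 0.05, 0.297, 0.299]

print(pick_label(ref_scores))  # 2
print(pick_label(gpu_scores))  # 3
```

When the top two scores are this close, an error of a few thousandths is enough to change the selected class, while a high-confidence detection would be unaffected by the same error.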

## What was ruled out

1. **`adjust_allocation` "output buffer doesn't match" warnings are benign.** During compilation these fire for several `gpu::precompile_op` instructions (pooling, MLIR convolutions, channelwise_conv). Instrumenting `src/adjust_allocation.cpp` shows every one is `alias = slice` with **identical lens, identical meaningful strides, and identical byte size**; they differ only in the stride of a **size-1 outer dimension**, e.g.

   ```
   alias_shape = half_type, {1,16,256,256}, {2097152,65536,256,1}
   ins_shape   = half_type, {1,16,256,256}, {1048576,65536,256,1}
   alias_bytes = 2097152   ins_bytes = 2097152
   ```

   These are legitimate "write into a concat slice" aliases; the pass correctly leaves them alone. They are **not** the cause of the wrong output. (Minor: the shape-equality check / warning could ignore size-1-dim strides to avoid the false positive.)

2. **Non-MLIR path is unusable on gfx1150.** With `MIGRAPHX_DISABLE_MLIR=1` the program aborts at runtime:

   ```
   :0:rocdevice.cpp :2992: Callback: Queue aborting with error : HSA_STATUS_ERROR_INVALID_ISA: The instruction set architecture is invalid. code: 0x100f
   ```

   So on gfx1150 the divergence cannot be compared against the non-MLIR fallback.
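
The stride comparison suggested in item 1 can be sketched as follows. This is an illustration in Python, not the MIGraphX implementation: the helper names `strides_match_ignoring_unit_dims` and `packed_bytes` are hypothetical, and only the shape values are copied from the instrumented output above.

```python
from math import prod


def strides_match_ignoring_unit_dims(lens, strides_a, strides_b):
    """Return True when both stride lists address the same elements for `lens`.

    The index of a length-1 dimension is always 0, so its stride never
    contributes to an element offset and is skipped in the comparison.
    """
    if not (len(lens) == len(strides_a) == len(strides_b)):
        return False
    return all(
        sa == sb
        for n, sa, sb in zip(lens, strides_a, strides_b)
        if n != 1
    )


def packed_bytes(lens, elem_size):
    """Bytes occupied by the elements themselves, excluding any gaps."""
    return prod(lens) * elem_size


# Values from the instrumented output; half_type is 2 bytes per element.
lens = [1, 16, 256, 256]
alias_strides = [2097152, 65536, 256, 1]
ins_strides = [1048576, 65536, 256, 1]

print(alias_strides == ins_strides)  # False
print(strides_match_ignoring_unit_dims(lens, alias_strides, ins_strides))  # True
print(packed_bytes(lens, 2))  # 2097152
```

A plain stride comparison reports a mismatch for this pair, while the relaxed comparison treats the two shapes as equivalent, which matches the observation that the warning is a false positive here.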

## Likely area

The remaining divergence is consistent with an fp16 codegen/accuracy issue in the MLIR-generated convolutions on gfx1150 (rather than a buffer/aliasing or shape bug in MIGraphX core). Pointers for triage: confirm whether the gfx1150 non-MLIR kernels are expected to be built (the `INVALID_ISA` suggests they are not), and check rocMLIR fp16 conv accuracy on gfx1150 vs gfx1100/gfx1101.

## Note: separate `simplify_reshapes` compile-time abort

This model also hits a separate `simplify_reshapes` abort on `develop` (`find_reshape_dot` building an invalid reshape: "Reshape: Wrong number of elements ... 524288 ... 32768"). Fix proposed in branch `ycastill2-amd:fix-reshape-dot-element-count`. That fix is required just to reach execution for this report.
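
The invariant behind that error message can be sketched as below: a reshape is only valid when the source and target shapes have the same number of elements. The helper `reshape_is_valid` and both shapes are hypothetical. The shapes were chosen only so that their element counts equal the two numbers in the message; the actual shapes involved in `find_reshape_dot` are not shown in this report.

```python
from math import prod


def reshape_is_valid(src_lens, dst_lens):
    """A reshape must preserve the total number of elements."""
    return prod(src_lens) == prod(dst_lens)


# Hypothetical shapes, picked only to reproduce the two element counts.
src_lens = [1, 8, 256, 256]  # 524288 elements
dst_lens = [1, 8, 64, 64]    # 32768 elements

print(prod(src_lens), prod(dst_lens))        # 524288 32768
print(reshape_is_valid(src_lens, dst_lens))  # False
```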

