Collecting environment information...
PyTorch version: 2.13.0.dev20260422+xpu
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: Ubuntu 24.04.2 LTS (x86_64)
GCC version: (Ubuntu 13.3.0-6ubuntu2~24.04.1) 13.3.0
Clang version: 18.1.3 (1ubuntu1)
CMake version: Could not collect
Libc version: glibc-2.39
Python version: 3.10.20 | packaged by conda-forge | (main, Mar 5 2026, 16:42:22) [GCC 14.3.0] (64-bit runtime)
Python platform: Linux-6.8.0-110-generic-x86_64-with-glibc2.39
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
Is XPU available: True
XPU used to build PyTorch: 20250302
Intel GPU driver version:
* intel-opencl-icd: 25.18.33578.51-1146~24.04
* libze1: 1.24.0.0-1146~24.04
Intel GPU models onboard:
* Intel(R) Data Center GPU Max 1550
Intel GPU models detected:
* [0] _XpuDeviceProperties(name='Intel(R) Data Center GPU Max 1550', platform_name='Intel(R) oneAPI Unified Runtime over Level-Zero', type='gpu', device_id=0xBD5, uuid=8680d50b-2f00-0000-9a00-000000000001, driver_version='1.6.33578+51', total_memory=65520MB, local_mem_size=128KB, max_compute_units=512, memory_clock_rate=3200MHz, memory_bus_width=64-bit, gpu_eu_count=512, gpu_subslice_count=64, max_work_group_size=1024, max_num_sub_groups=64, sub_group_sizes=[16 32], has_fp16=1, has_fp64=1, has_atomic64=1)
* [1] _XpuDeviceProperties(name='Intel(R) Data Center GPU Max 1550', platform_name='Intel(R) oneAPI Unified Runtime over Level-Zero', type='gpu', device_id=0xBD5, uuid=8680d50b-2f00-0000-9a00-000000000002, driver_version='1.6.33578+51', total_memory=65520MB, local_mem_size=128KB, max_compute_units=512, memory_clock_rate=3200MHz, memory_bus_width=64-bit, gpu_eu_count=512, gpu_subslice_count=64, max_work_group_size=1024, max_num_sub_groups=64, sub_group_sizes=[16 32], has_fp16=1, has_fp64=1, has_atomic64=1)
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Caching allocator config: N/A
Versions of relevant libraries:
[pip3] dpcpp-cpp-rt==2025.3.2
[pip3] impi-rt==2021.17.2
[pip3] intel-cmplr-lib-rt==2025.3.2
[pip3] intel-cmplr-lib-ur==2025.3.2
[pip3] intel-cmplr-lic-rt==2025.3.2
[pip3] intel-opencl-rt==2025.3.2
[pip3] intel-openmp==2025.3.2
[pip3] intel-pti==0.16.0
[pip3] intel-sycl-rt==2025.3.2
[pip3] mkl==2025.3.1
[pip3] numpy==2.2.6
[pip3] oneccl==2021.17.2
[pip3] oneccl-devel==2021.17.2
[pip3] onemkl-license==2025.3.1
[pip3] onemkl-sycl-blas==2025.3.1
[pip3] onemkl-sycl-dft==2025.3.1
[pip3] onemkl-sycl-lapack==2025.3.1
[pip3] onemkl-sycl-rng==2025.3.1
[pip3] onemkl-sycl-sparse==2025.3.1
[pip3] tbb==2022.3.1
[pip3] tcmlib==1.4.1
[pip3] torch==2.13.0.dev20260422+xpu
[pip3] torchaudio==2.11.0.dev20260422+xpu
[pip3] torchvision==0.27.0.dev20260422+xpu
[pip3] triton-xpu==3.7.1+git21033c4e
[pip3] umf==1.0.3
🐛 Describe the bug
On torch `2.13.0.dev20260422+xpu`, `torch.compile(..., backend='inductor')` on XPU returns the wrong result for `adaptive_avg_pool2d(x, 7).flatten(1).sum(dim=-1)`, while both CPU eager and XPU eager are correct. The repro matches the backend-semantic pattern fixed upstream by the Inductor contiguous-check / exact-stride patch from pytorch/pytorch#180898.
Reproducer:
Observed output:
Additional context:
* Related: `torch.compile` produces wrong results when fusing `adaptive_avg_pool2d` with `flatten + sum` (pytorch/pytorch#180848)
* Related: `torch.compile` produces wrong results for `adaptive_avg_pool2d` + `flatten` + `sum` fusion (pytorch/pytorch#180956)
* To run the repro: `scripts/run_with_xpu_python.sh artifacts/latest-repro.py`

Versions
Collected with torch/utils/collect_env.py