On MI300, executing a simple or empty circuit with >= 32 qubits causes Kokkos to hang while allocating the statevector. Running with AMD_LOG_LEVEL=5 shows:
:3:hip_module.cpp :825 : 1898311765 us: [pid:20227 tid: 0x714957d1e140] hipLaunchKernel ( 0x7107123dfe80, {4194304,1,1}, {1,1024,1}, 0x7ffe68498760, 0, 0xc0b3630 )
:5:hip_device.cpp :192 : 1898311769 us: [pid:20227 tid: 0x714957d1e140] Waiting on nullstream 0xc87c2f0
:3:rocdevice.cpp :2805: 1898311771 us: [pid:20227 tid: 0x714957d1e140] Check HW event = 0x7147f93f7f80
:5:command.cpp :355 : 1898311773 us: [pid:20227 tid: 0x714957d1e140] Command (KernelExecution) enqueued: 0xca68360 to queue: 0xc0b3630
:4:rocvirtual.cpp :975 : 1898311781 us: [pid:20227 tid: 0x714957d1e140] Arg0: = 0xd0 aa 0b 0c 00 00 00 00 a0 c5 0b 0c 00 00 00 00 00 01 80 f7 f6 70 00 00 00 00 00 00 01 00 00 00 d0 87 49 68 fe 7f 00 00 05 00 00 00 00 00 00 00 64 61 74 61 5f 00 00 00 00 00 00 00 00 00 00 00 ... (size:0x78)
:3:rocvirtual.cpp :3596: 1898311793 us: [pid:20227 tid: 0x714957d1e140] ShaderName : void Kokkos::Impl::hip_parallel_launch_local_memory<Kokkos::Impl::ParallelFor<Kokkos::Impl::ViewValueFunctor<Kokkos::Device<Kokkos::HIP, Kokkos::HIPSpace>, Kokkos::complex<double> >, Kokkos::RangePolicy<Kokkos::HIP, Kokkos::IndexType<long>, Kokkos::Impl::ViewValueFunctor<Kokkos::Device<Kokkos::HIP, Kokkos::HIPSpace>, Kokkos::complex<double> >::ConstructTag>, Kokkos::HIP>, 1024u, 1u>(Kokkos::Impl::ParallelFor<Kokkos::Impl::ViewValueFunctor<Kokkos::Device<Kokkos::HIP, Kokkos::HIPSpace>, Kokkos::complex<double> >, Kokkos::RangePolicy<Kokkos::HIP, Kokkos::IndexType<long>, Kokkos::Impl::ViewValueFunctor<Kokkos::Device<Kokkos::HIP, Kokkos::HIPSpace>, Kokkos::complex<double> >::ConstructTag>, Kokkos::HIP>)
:5:rocvirtual.cpp :3792: 1898311797 us: [pid:20227 tid: 0x714957d1e140] KernargSegmentByteSize = 376 KernargSegmentAlignment = 128
:4:rocvirtual.cpp :1177: 1898311801 us: [pid:20227 tid: 0x714957d1e140] SWq=0x7137e7afa000, HWq=0x710704c00000, id=4, Dispatch Header = 0xb02 (type=2, barrier=1, acquire=1, release=1), setup=3, grid=[4194304, 1024, 1], workgroup=[1, 1024, 1], private_seg_size=0, group_seg_size=0, kernel_obj=0x7127e776ea80, kernarg_address=0x7106f9400000, completion_signal=0x0, correlation_id=0, rptr=3, wptr=3
:3:hip_module.cpp :826 : 1898311806 us: [pid:20227 tid: 0x714957d1e140] hipLaunchKernel: Returned hipSuccess : : duration: 41 us
:3:hip_stream.cpp :403 : 1898311810 us: [pid:20227 tid: 0x714957d1e140] hipStreamSynchronize ( 0xc0b3630 )
:5:commandqueue.cpp :185 : 1898311812 us: [pid:20227 tid: 0x714957d1e140] finish() called with batch size: 1, cpu_wait: 0, fence dirty: 1
:5:command.cpp :355 : 1898311814 us: [pid:20227 tid: 0x714957d1e140] Command (InternalMarker) enqueued: 0xc901140 to queue: 0xc0b3630
:4:rocvirtual.cpp :1554: 1898311817 us: [pid:20227 tid: 0x714957d1e140] SWq=0x7137e7afa000, HWq=0x710704c00000, id=4, BarrierAND Header = 0x1503 (type=3, barrier=1, acquire=2, release=2), dep_signal=[0x0, 0x0, 0x0, 0x0, 0x0], completion_signal=0x7147f93f4d00, rptr=3, wptr=4
:4:rocvirtual.cpp :708 : 1898311819 us: [pid:20227 tid: 0x714957d1e140] Host wait on completion_signal=0x7147f93f4d00
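For reference, the sizes implied by the log can be worked out with a few lines of arithmetic. This is only a sketch: the variable names are mine, the 16-byte element size assumes Kokkos::complex<double> is two packed doubles, and the grid/block values are copied from the hipLaunchKernel line above. It shows that 32 qubits is exactly where the element count (and the launched thread count) stops fitting in an unsigned 32-bit integer. Whether that boundary is what causes the hang is an assumption, not something the log confirms.

```python
# Sizes implied by a 32-qubit statevector and by the launch geometry in the log.
NUM_QUBITS = 32
BYTES_PER_AMPLITUDE = 16  # assumption: Kokkos::complex<double> = two 8-byte doubles

amplitudes = 1 << NUM_QUBITS                      # 2**32 complex amplitudes
statevector_bytes = amplitudes * BYTES_PER_AMPLITUDE
statevector_gib = statevector_bytes / 2**30

# Copied from the hipLaunchKernel log line: grid {4194304,1,1}, block {1,1024,1}
grid = (4194304, 1, 1)
block = (1, 1024, 1)
total_threads = (grid[0] * grid[1] * grid[2]) * (block[0] * block[1] * block[2])

UINT32_MAX = 2**32 - 1

print(f"amplitudes        = {amplitudes}")
print(f"statevector size  = {statevector_gib:.0f} GiB")
print(f"threads launched  = {total_threads}")
print(f"exceeds uint32    = {total_threads > UINT32_MAX}")
```

The initialization kernel launches one thread per amplitude (4194304 blocks of 1024 threads), so the thread count equals the element count; at 31 qubits both would be 2**31 and still fit in 32 bits.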