Running 32 qubits with Lightning-Kokkos on AMDGPU fails to allocate memory #1338

@josephleekl

Description

On MI300, executing a simple or empty circuit with >= 32 qubits hangs in Kokkos while the statevector is being allocated. Running with AMD_LOG_LEVEL=5 shows:

:3:hip_module.cpp           :825 : 1898311765 us: [pid:20227 tid: 0x714957d1e140] hipLaunchKernel ( 0x7107123dfe80, {4194304,1,1}, {1,1024,1}, 0x7ffe68498760, 0, 0xc0b3630 )
:5:hip_device.cpp           :192 : 1898311769 us: [pid:20227 tid: 0x714957d1e140] Waiting on nullstream 0xc87c2f0
:3:rocdevice.cpp            :2805: 1898311771 us: [pid:20227 tid: 0x714957d1e140] Check HW event = 0x7147f93f7f80
:5:command.cpp              :355 : 1898311773 us: [pid:20227 tid: 0x714957d1e140] Command (KernelExecution) enqueued: 0xca68360 to queue: 0xc0b3630
:4:rocvirtual.cpp           :975 : 1898311781 us: [pid:20227 tid: 0x714957d1e140] Arg0:   = 0xd0 aa 0b 0c 00 00 00 00 a0 c5 0b 0c 00 00 00 00 00 01 80 f7 f6 70 00 00 00 00 00 00 01 00 00 00 d0 87 49 68 fe 7f 00 00 05 00 00 00 00 00 00 00 64 61 74 61 5f 00 00 00 00 00 00 00 00 00 00 00 ... (size:0x78)
:3:rocvirtual.cpp           :3596: 1898311793 us: [pid:20227 tid: 0x714957d1e140] ShaderName : void Kokkos::Impl::hip_parallel_launch_local_memory<Kokkos::Impl::ParallelFor<Kokkos::Impl::ViewValueFunctor<Kokkos::Device<Kokkos::HIP, Kokkos::HIPSpace>, Kokkos::complex<double> >, Kokkos::RangePolicy<Kokkos::HIP, Kokkos::IndexType<long>, Kokkos::Impl::ViewValueFunctor<Kokkos::Device<Kokkos::HIP, Kokkos::HIPSpace>, Kokkos::complex<double> >::ConstructTag>, Kokkos::HIP>, 1024u, 1u>(Kokkos::Impl::ParallelFor<Kokkos::Impl::ViewValueFunctor<Kokkos::Device<Kokkos::HIP, Kokkos::HIPSpace>, Kokkos::complex<double> >, Kokkos::RangePolicy<Kokkos::HIP, Kokkos::IndexType<long>, Kokkos::Impl::ViewValueFunctor<Kokkos::Device<Kokkos::HIP, Kokkos::HIPSpace>, Kokkos::complex<double> >::ConstructTag>, Kokkos::HIP>)
:5:rocvirtual.cpp           :3792: 1898311797 us: [pid:20227 tid: 0x714957d1e140] KernargSegmentByteSize = 376 KernargSegmentAlignment = 128
:4:rocvirtual.cpp           :1177: 1898311801 us: [pid:20227 tid: 0x714957d1e140] SWq=0x7137e7afa000, HWq=0x710704c00000, id=4, Dispatch Header = 0xb02 (type=2, barrier=1, acquire=1, release=1), setup=3, grid=[4194304, 1024, 1], workgroup=[1, 1024, 1], private_seg_size=0, group_seg_size=0, kernel_obj=0x7127e776ea80, kernarg_address=0x7106f9400000, completion_signal=0x0, correlation_id=0, rptr=3, wptr=3
:3:hip_module.cpp           :826 : 1898311806 us: [pid:20227 tid: 0x714957d1e140] hipLaunchKernel: Returned hipSuccess : : duration: 41 us
:3:hip_stream.cpp           :403 : 1898311810 us: [pid:20227 tid: 0x714957d1e140] hipStreamSynchronize ( 0xc0b3630 )
:5:commandqueue.cpp         :185 : 1898311812 us: [pid:20227 tid: 0x714957d1e140] finish() called with batch size: 1, cpu_wait: 0, fence dirty: 1
:5:command.cpp              :355 : 1898311814 us: [pid:20227 tid: 0x714957d1e140] Command (InternalMarker) enqueued: 0xc901140 to queue: 0xc0b3630
:4:rocvirtual.cpp           :1554: 1898311817 us: [pid:20227 tid: 0x714957d1e140] SWq=0x7137e7afa000, HWq=0x710704c00000, id=4, BarrierAND Header = 0x1503 (type=3, barrier=1, acquire=2, release=2), dep_signal=[0x0, 0x0, 0x0, 0x0, 0x0], completion_signal=0x7147f93f4d00, rptr=3, wptr=4
:4:rocvirtual.cpp           :708 : 1898311819 us: [pid:20227 tid: 0x714957d1e140] Host wait on completion_signal=0x7147f93f4d00
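
For context on the sizes involved, here is a rough sizing sketch. It assumes each amplitude is a `Kokkos::complex<double>` of 16 bytes (two doubles), and it takes the grid and block dimensions directly from the `hipLaunchKernel` line in the log above. The helper names `statevector_bytes` and `launch_threads` are made up for illustration and are not part of Lightning-Kokkos or Kokkos.

```python
# Rough sizing for the 32-qubit case in the log above.
# Assumption: one amplitude is a Kokkos::complex<double>, i.e. two 8-byte doubles.
BYTES_PER_AMPLITUDE = 16


def statevector_bytes(num_qubits: int) -> int:
    """Bytes needed for a dense statevector with 2**num_qubits amplitudes."""
    return (1 << num_qubits) * BYTES_PER_AMPLITUDE


def launch_threads(grid, block) -> int:
    """Total threads in a launch: product of grid and block extents."""
    total = 1
    for g, b in zip(grid, block):
        total *= g * b
    return total


# Grid and block dimensions as printed by hipLaunchKernel in the log.
grid = (4194304, 1, 1)
block = (1, 1024, 1)

size = statevector_bytes(32)
threads = launch_threads(grid, block)

print(f"statevector size: {size} bytes ({size // 2**30} GiB)")
print(f"threads launched: {threads}")
print(f"fits in an unsigned 32-bit integer: {threads <= 2**32 - 1}")
```

Under these assumptions the statevector needs 64 GiB, and the view-initialization kernel is launched with 2^32 threads in total, one more than the largest value an unsigned 32-bit integer can hold. Whether that is related to the hang has not been established here.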
