DRAFT: OpenMP Target Build Scripts for El Cap#1937

Open
rchen20 wants to merge 5 commits into develop from task/chen59/omptargetelcap

@rchen20 rchen20 commented Nov 4, 2025

Summary

  • This PR is a feature
  • It does the following:
    • Adds OpenMP Target build scripts for the El Cap platform (requested by the author)

@rchen20 rchen20 self-assigned this Nov 4, 2025
@rchen20
Member Author

rchen20 commented Nov 4, 2025

The following linking error occurs:

[ 21%] Linking CXX executable ../../../test-forall-atomic-basic-OpenMPTarget.exe
cd /p/vast1/chen59/allraja/rajaomptarget/raja_git_omptargetelcap/build_lc_toss4-amdclang-omptarget-6.4.2-gfx942/test/functional/forall/atomic-basic && /usr/tce/backend/installations/linux-rhel8-x86_64/gcc-8.3.1/cmake-3.24.2-ywx52e32uh6gkxzuyubpwkulzgdvxyh6/bin/cmake -E cmake_link_script CMakeFiles/test-forall-atomic-basic-OpenMPTarget.exe.dir/link.txt --verbose=1
/opt/rocm-6.4.2/llvm/bin/amdclang++  -Wall -Wextra      --gcc-toolchain=/opt/rh/gcc-toolset-13/root/usr -O2 -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx942 "CMakeFiles/test-forall-atomic-basic-OpenMPTarget.exe.dir/test-forall-atomic-basic-OpenMPTarget.cpp.o" -o ../../../test-forall-atomic-basic-OpenMPTarget.exe  ../../../../lib/libgtest_main.a ../../../../lib/libgtest.a -lpthread ../../../../lib/libRAJA.a -lpthread ../../../../lib/libcamp.a -ldl 
/opt/rocm-6.4.2/lib/llvm/bin/clang-offload-packager: warning: Multiple inputs match to a single file, '/var/tmp/chen59/libRAJA-openmp-amdgcn-amd-amdhsa-unpackaged-f17783.bc'
lld: error: undefined symbol: memcpy
>>> referenced by /var/tmp/chen59/libcamp-openmp-amdgcn-amd-amdhsa-2f0fcc.o:(__omp_offloading_4e_55a8c43a__ZN4RAJA6policy3omp11forall_implILm8ERNS_17TypedRangeSegmentIllEEZ25ForallAtomicBasicTestImplINS1_28omp_target_parallel_for_execILm8EEENS_11auto_atomicEN4camp9resources2v13OmpElS4_dEvT2_EUllE_NS_4expt15ForallParamPackIJEEEEENSt9enable_ifIXsr6all_ofINSG_11type_traits18is_ForallParamPackISE_EENSK_24is_ForallParamPack_emptyISE_EEEE5valueENSC_10EventProxyISD_EEE4typeESD_RKNS7_IXT_EEEOT0_OT1_SE__l137)
>>> referenced by /var/tmp/chen59/libcamp-openmp-amdgcn-amd-amdhsa-2f0fcc.o:(__omp_offloading_4e_55a8c43a__ZN4RAJA6policy3omp11forall_implILm8ERNS_17TypedRangeSegmentIllEEZ25ForallAtomicBasicTestImplINS1_28omp_target_parallel_for_execILm8EEENS_11auto_atomicEN4camp9resources2v13OmpElS4_dEvT2_EUllE_NS_4expt15ForallParamPackIJEEEEENSt9enable_ifIXsr6all_ofINSG_11type_traits18is_ForallParamPackISE_EENSK_24is_ForallParamPack_emptyISE_EEEE5valueENSC_10EventProxyISD_EEE4typeESD_RKNS7_IXT_EEEOT0_OT1_SE__l137)
>>> referenced by /var/tmp/chen59/libcamp-openmp-amdgcn-amd-amdhsa-2f0fcc.o:(__omp_offloading_4e_55a8c43a__ZN4RAJA6policy3omp11forall_implILm8ERNS_17TypedRangeSegmentIllEEZ25ForallAtomicBasicTestImplINS1_28omp_target_parallel_for_execILm8EEENS_11auto_atomicEN4camp9resources2v13OmpElS4_dEvT2_EUllE_NS_4expt15ForallParamPackIJEEEEENSt9enable_ifIXsr6all_ofINSG_11type_traits18is_ForallParamPackISE_EENSK_24is_ForallParamPack_emptyISE_EEEE5valueENSC_10EventProxyISD_EEE4typeESD_RKNS7_IXT_EEEOT0_OT1_SE__l137)
>>> referenced 969 more times
clang++: error: linker command failed with exit code 1 (use -v to see invocation)

After consultation, John G. found in his own tests that this error occurs when memcpy is called within an OpenMP Target region, which shouldn't be allowed there. @trws or @MrBurmark, would you have any insight into how this might happen?
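For reference, the pattern described above can be sketched as a small standalone case. This is a hypothetical reproducer, not code from RAJA or from John G.'s tests: the `Payload` type and `copy_in_target_region` function are made up for illustration. Whether it actually triggers the undefined-symbol error probably depends on the compiler version and optimization level, since a small fixed-size memcpy may be expanded inline.

```cpp
// Hypothetical minimal case; names are illustrative, not taken from RAJA.
#include <cstring>

struct Payload {
  double value;
  long index;
};

// Copies `src` into a fresh Payload using memcpy inside an OpenMP target region.
// When built without OpenMP offload flags the pragma is ignored and the copy
// simply runs on the host.
Payload copy_in_target_region(const Payload& src)
{
  Payload in = src;
  Payload out{0.0, 0};

#pragma omp target map(to: in) map(from: out)
  {
    // This is the call that may be left as an unresolved device-side symbol.
    std::memcpy(&out, &in, sizeof(Payload));
  }

  return out;
}
```

Building this with the same flags as the failing link line (`-fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx942`) and calling `copy_in_target_region` from `main` would be one way to check whether the problem is independent of RAJA.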

@MrBurmark
Member

It looks like this occurs within an atomic test. This might come from use of memcpy in the implementation of atomics for types that are not natively supported.
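To illustrate the hypothesis, here is a host-only sketch of how an atomic add on a `double` can be built from a 64-bit integer compare-and-swap, with memcpy used to reinterpret the bits. The names `bit_copy` and `atomic_add_via_cas` are hypothetical and `std::atomic` stands in for whatever the backend provides; RAJA's actual implementation may differ.

```cpp
// Hypothetical sketch; not RAJA's API or implementation.
#include <atomic>
#include <cstdint>
#include <cstring>

// Reinterpret the bits of `from` as type To via memcpy.
template <typename To, typename From>
To bit_copy(const From& from)
{
  static_assert(sizeof(To) == sizeof(From), "sizes must match");
  To to;
  std::memcpy(&to, &from, sizeof(To));
  return to;
}

// Atomic add on a double whose bits are stored in a 64-bit integer atomic.
// Returns the value held before the addition.
double atomic_add_via_cas(std::atomic<std::uint64_t>& storage, double operand)
{
  std::uint64_t expected = storage.load();
  for (;;) {
    const double old_value = bit_copy<double>(expected);
    const std::uint64_t desired = bit_copy<std::uint64_t>(old_value + operand);
    // On failure, compare_exchange_weak refreshes `expected` with the current bits.
    if (storage.compare_exchange_weak(expected, desired)) {
      return old_value;
    }
  }
}
```

If the device-side atomics follow a similar shape, every CAS loop iteration would contain two memcpy calls, which would be consistent with the symbol being referenced many times in the link error.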

-DHIP_ROOT_DIR="/opt/rocm-${COMP_VER}/hip" \
-DHIP_PATH=/opt/rocm-${COMP_VER}/llvm/bin \
-DENABLE_CLANGFORMAT=On \
-DCLANGFORMAT_EXECUTABLE=/opt/rocm-5.2.3/llvm/bin/clang-format \
Member

@rhornung67 rhornung67 Nov 4, 2025

This should point at regular clang for clang-format

set(CMAKE_CXX_FLAGS_RELEASE "-O2" CACHE STRING "")
set(CMAKE_CXX_FLAGS_RELWITHDEBINFO "-O2 -g" CACHE STRING "")
set(CMAKE_CXX_FLAGS_DEBUG "-O0 -g" CACHE STRING "")
set(CMAKE_CXX_FLAGS_RELEASE "--gcc-toolchain=/opt/rh/gcc-toolset-13/root/usr -O2" CACHE STRING "")
Member

We should be using the hip_4 host-config file

@rhornung67
Member

@rchen20 does the regular clang compiler work?

Looking through the code, I don't see anything obviously wrong. Could the memcpy be related to the camp EventProxy?

@rchen20
Member Author

rchen20 commented Nov 4, 2025

> @rchen20 does the regular clang compiler work?

The regular clang compiler is not available on our El Cap platforms, so I should remove that particular script.

> Looking through the code, I don't see anything obviously wrong. Could the memcpy be related to the camp EventProxy?

I don't see any memcpy calls in Camp::EventProxy, but there are implementations of memcpy in Camp::resource. After removing those implementations, the same memcpy linking error occurs. I may need to do as Jason suggests and dig into our atomic implementations to see if the memcpy is coming from there.

@rhornung67
Member

@rchen20 I skimmed the RAJA atomics and did not see any memcpy calls. One thought: what happens if you remove the CAS atomic check here? https://github.com/LLNL/RAJA/blob/develop/test/functional/forall/atomic-basic/tests/test-forall-atomic-basic.hpp#L105

@rchen20
Member Author

rchen20 commented Nov 4, 2025

> @rchen20 I skimmed the RAJA atomics and did not see any memcpy calls. A thought....what happens if you remove the CAS atomic check here https://github.com/LLNL/RAJA/blob/develop/test/functional/forall/atomic-basic/tests/test-forall-atomic-basic.hpp#L105?

That's a good idea, but it still gave the same error after removing the CAS operation.

@rhornung67
Member

Maybe try eliminating entries in the list of data types we test to see if that can help pin down the issue?

Comment on lines +36 to +47
void * custom_memcpy( void * dest, const void * src, size_t len )
{
  char * customdest = (char *) dest;
  const char * customsrc = (const char *) src;

  for ( size_t ii = 0; ii < len; ++ii )
  {
    customdest[ii] = customsrc[ii];
  }

  return dest;
}
Member Author

We shouldn't need to do this. Asking AMD about this.

Member

Did you hear anything back? It shouldn't be required, which makes me wonder whether a missing header or builtin is causing problems.

Member Author

They reproduced it a couple of weeks ago and are looking into solving it. The Cray compiler has these symbols in libu.a, where they apparently keep a hodgepodge collection of these implementations. I don't know where AMD's implementations live, but they probably just need to add these memory functions somewhere.

Member

Thanks @rchen20. If you don't get satisfaction on this, let me know. I'm going to try to mention it at our check-in with AMD engineering this afternoon to keep it on their radar.

@jonesholger
Contributor

No luck with ROCm 7.2.1 either; the device-side memcpy symbol error is still there. ROCm omptarget is still a bit basic that way (there is no requirement to expose memcpy), so custom_memcpy is reasonable.

AFAIK CUDA/HIP "lower" memcpy to a compiler builtin, so I locally modified reinterp_A_as_B:

template<typename A, typename B>
RAJA_INLINE RAJA_HOST_DEVICE constexpr B reinterp_A_as_B(A const& a)
{
  static_assert(sizeof(A) == sizeof(B), "A and B must be the same size");

  B b;
#if defined(__AMDGCN__) && defined(_OPENMP)
  __builtin_memcpy(&b, &a, sizeof(A));
#else
  memcpy(&b, &a, sizeof(A));
#endif
  return b;
}
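The same idea can be tried on the host in isolation. The sketch below is a hypothetical standalone variant, not the RAJA source: `SKETCH_MEMCPY` and `reinterpret_bits` are illustrative names, and the `__has_builtin` guard is an added assumption so that the code still compiles with compilers that lack the builtin. Note that a builtin is typically expanded inline for small, constant sizes, but the compiler is not obliged to avoid a library call in every case.

```cpp
// Hypothetical standalone sketch of the workaround; not the RAJA source.
#include <cstdint>
#include <cstring>

// Prefer the compiler builtin when it is available, else fall back to std::memcpy.
#if defined(__has_builtin)
#  if __has_builtin(__builtin_memcpy)
#    define SKETCH_MEMCPY __builtin_memcpy
#  endif
#endif
#ifndef SKETCH_MEMCPY
#  define SKETCH_MEMCPY std::memcpy
#endif

// Return the object representation of `a` reinterpreted as type B.
template <typename B, typename A>
B reinterpret_bits(const A& a)
{
  static_assert(sizeof(A) == sizeof(B), "A and B must be the same size");
  B b;
  SKETCH_MEMCPY(&b, &a, sizeof(A));
  return b;
}
```

For example, on a platform with IEEE 754 floats, `reinterpret_bits<std::uint32_t>(1.0f)` yields `0x3f800000`, and converting back recovers `1.0f`.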

More worrying are the test results on tuo (ctest -j 16 -R OpenMPTarget):
67% tests passed, 20 tests failed out of 60

Total Test time (real) = 1503.94 sec

The following tests FAILED:
94 - test-forall-IndexSet-OpenMPTarget.exe (Subprocess aborted)
100 - test-forall-IndexSetView-OpenMPTarget.exe (Subprocess aborted)
109 - test-forall-RangeSegment-OpenMPTarget.exe (Failed)
110 - test-forall-RangeStrideSegment-OpenMPTarget.exe (Subprocess aborted)
198 - test-forall-ResourceIndexSet-OpenMPTarget.exe (Subprocess aborted)
207 - test-forall-resource-RangeSegment-OpenMPTarget.exe (Failed)
208 - test-forall-resource-RangeStrideSegment-OpenMPTarget.exe (Subprocess aborted)
217 - test-forall-atomic-basic-OpenMPTarget.exe (Timeout)
223 - test-forall-AtomicMultiView-OpenMPTarget.exe (Subprocess aborted)
237 - test-forall-AtomicRefAdd-OpenMPTarget.exe (Subprocess aborted)
238 - test-forall-AtomicRefSub-OpenMPTarget.exe (Subprocess aborted)
239 - test-forall-AtomicRefLoadStore-OpenMPTarget.exe (Subprocess aborted)
240 - test-forall-AtomicRefCAS-OpenMPTarget.exe (Subprocess aborted)
241 - test-forall-AtomicRefMinMax-OpenMPTarget.exe (Timeout)
242 - test-forall-AtomicRefLogical-OpenMPTarget.exe (Subprocess aborted)
254 - test-kernel-resource-basic-single-loop-Segments-OpenMPTarget.exe (Subprocess aborted)
255 - test-kernel-basic-single-loop-Segments-OpenMPTarget.exe (Subprocess aborted)
258 - test-kernel-basic-single-icount-loop-OpenMPTarget.exe (Subprocess aborted)
715 - test-resource-Depends-OpenMPTarget.exe (Failed)
716 - test-resource-MultiStream-OpenMPTarget.exe (Subprocess aborted)

@jonesholger
Contributor

Interesting: running a single test on its own gives
[ RUN ] OpenMPTarget/ForallIndexSetTest/1.IndexSetForall
AMDGPU fatal error 1: "unknown or internal error" received error in queue 0x1555553f2000: HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION: The agent attempted to access memory beyond the largest legal address.
Aborted (core dumped)


Put it under a microscope with libomptarget debug output enabled:
LIBOMPTARGET_DEBUG=1 LIBOMPTARGET_INFO=16 LIBOMPTARGET_PLUGIN=AMDGPU ./test-forall-IndexSet-OpenMPTarget.exe

[ OK ] OpenMPTarget/ForallIndexSetTest/5.IndexSetForall (5 ms)
[----------] 1 test from OpenMPTarget/ForallIndexSetTest/5 (5 ms total)

[----------] Global test environment tear-down
[==========] 6 tests from 6 test suites ran. (166 ms total)
[ PASSED ] 6 tests.
