
Failing CUDA tests #861

Open
luraess wants to merge 10 commits into master from lr/fix-tests

luraess commented Aug 12, 2024

Contributor

Also bumps AMDGPU to the 1.0 release (addresses #860 for the tests as well).
[EDIT] AMDGPU compat is handled in #869, to focus on CUDA tests here.

luraess commented Aug 12, 2024

Contributor Author
  • It seems that the ubuntu-latest tests for test_datatype.jl fail on nightly with both default and openmpi-jll.
  • CUDA tests on >= v1.9 struggle with test_allreduce, test_allgather, and other collectives (I tried excluding them for now).

luraess commented Aug 13, 2024

Contributor Author

On CPU, we get a failing test on ubuntu-latest, [EDIT] Julia 1.9 and 1.10, for PrimitiveType = Primitive80 on:

Any hint as to what could be going wrong there?

@giordano

Member

#853

@giordano

Member

> On CPU, we get failing test in Ubuntu-latest, Julia 1.9 and 1.10

Where do you see failures with Julia 1.9 and 1.10? It looks to me like only Julia nightly is failing.

luraess commented Aug 13, 2024

Contributor Author

> Where do you see failures with Julia 1.9 and 1.10? It looks to me like only Julia nightly is failing.

Correct, only nightly is failing. Julia 1.9 and 1.10 fail with CUDA MPI.

giordano requested a review from vchuravy on August 13, 2024, 10:09

luraess commented Aug 13, 2024

Contributor Author

Now the CUDA tests segfault on test_basic.jl (https://buildkite.com/julialang/mpi-dot-jl/builds/1520#01914b09-c528-4d9b-9c31-d8273912270d/286-489), which suggests the problem is not related to collectives but to something else that breaks CUDA-aware MPI in CI. I will revert the excluded tests; someone will need to dig further.
