NCCL backend segfaults on new_comm when built against PyTorch source

When torchcomms is built from source against a local PyTorch source build, any call to `new_comm("nccl", ...)` immediately segfaults inside `ncclCommInitRankConfig`.

Repro:
```python
# Run with: torchrun --nproc_per_node=2 repro.py
import faulthandler
import os

import torch

faulthandler.enable()  # dump a Python traceback on SIGSEGV

rank = int(os.environ["RANK"])
local_rank = int(os.environ["LOCAL_RANK"])
device = torch.device(f"cuda:{local_rank}")

# Initialize CUDA and create a context on this rank's device first
torch.cuda.set_device(local_rank)
torch.cuda.init()
torch.zeros(1, device=device)

from torchcomms import new_comm

comm = new_comm("nccl", device, name="test_comm")  # SIGSEGV here
```

Environment:
- OS: Fedora 41 (Container Image)
- Python: 3.13.9
- PyTorch: 2.12.0a0+git54d8d2a (built from source)
- CUDA toolkit: 12.8
- CUDA driver: 580.82.07 (CUDA 13.0)
- NCCL: 2.28.9
- GPU: 2x NVIDIA H200
- torchcomms: 0.2.0 (built with USE_NCCLX=OFF USE_TRANSPORT=OFF)

According to Claude, the issue is that the PyTorch-source-build path in the NCCL CMakeLists links `libnccl_static.a`, which bundles hidden-visibility stubs for `cudaGetDriverEntryPoint` that shadow the real `libcudart.so` at link time. As a result, NCCL's CUDA driver function pointers never get resolved, and `ncclCommInitRankConfig` segfaults when it calls `cuCtxGetCurrent` through a NULL pointer. Switching to `libnccl.so` (as the other build paths in CMakeLists currently do) fixes it.

I have verified the above in my environment and am happy to submit the fix.
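
For anyone who wants to check which NCCL and CUDA runtime libraries a process has actually mapped, here is a minimal sketch. It assumes Linux (it reads `/proc/self/maps`), and the helper names `loaded_libraries` and `report` are hypothetical, not part of torchcomms or PyTorch. It only shows whether a shared `libnccl.so` is mapped. A statically linked NCCL will not appear, and a mapped `libnccl.so` does not by itself prove that the torchcomms backend is the one using it.

```python
import os
import re


def loaded_libraries(maps_text, pattern):
    """Return sorted unique mapped file paths whose basename matches `pattern`.

    `maps_text` is the content of a /proc/<pid>/maps file (Linux only).
    """
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no backing file
        path = fields[5].strip()
        if re.search(pattern, os.path.basename(path)):
            paths.add(path)
    return sorted(paths)


def report(maps_path="/proc/self/maps"):
    """Summarize which shared NCCL / CUDA runtime libraries are mapped."""
    with open(maps_path) as f:
        text = f.read()
    return {
        "nccl": loaded_libraries(text, r"^libnccl\.so"),
        "cudart": loaded_libraries(text, r"^libcudart"),
    }
```

Calling `report()` in the repro script just before `new_comm(...)` (after `import torch`) would list the shared objects mapped at that point. An empty `"nccl"` list would be consistent with NCCL having been linked statically, but is not conclusive on its own.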

cc: @d4l3k 





NCCL backend segfaults on new_comm when built against PyTorch source #2406
