Problem
On machine 1:
Running "nvidia-smi" on the host gives:
NVIDIA-SMI 560.28.03 Driver Version: 560.28.03 CUDA Version: 12.6
Then I start the NeMo2602 or NeMo2604 image on this machine and run "nvidia-smi" inside the container. The output is:
NVIDIA-SMI 560.28.03 Driver Version: 560.28.03 CUDA Version: 12.6
Then I run "torch.cuda.is_available()" inside the container. The output is:
python
Python 3.12.3 (main, Jan 22 2026, 20:57:42) [GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.cuda.is_available())
/usr/local/lib/python3.12/dist-packages/torch/cuda/__init__.py:180: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 12060). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at /opt/pytorch/pytorch/c10/cuda/CUDAFunctions.cpp:119.)
return torch._C._cuda_getDeviceCount() > 0
False
>>> print(torch.version.cuda)
13.1
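For reference, the "found version 12060" in the warning above is the CUDA version supported by the driver, encoded as an integer (1000 * major + 10 * minor), so it corresponds to the 12.6 shown by nvidia-smi. The sketch below decodes that value and compares it with the toolkit version PyTorch was built against. The helper names are hypothetical, and the comparison is deliberately simplistic: it ignores CUDA minor-version compatibility and forward-compatibility packages, which may change the outcome on a real system.

```python
def decode_cuda_version(code: int) -> tuple[int, int]:
    """Split a CUDA integer version (1000 * major + 10 * minor) into (major, minor)."""
    return code // 1000, (code % 1000) // 10


def driver_at_least(driver_code: int, toolkit: str) -> bool:
    """Return True if the driver's CUDA version is >= the toolkit version.

    Hypothetical helper: a plain version comparison only. It does not model
    minor-version compatibility or the CUDA forward-compatibility package.
    """
    major, minor = (int(part) for part in toolkit.split(".")[:2])
    return decode_cuda_version(driver_code) >= (major, minor)


# Values taken from the machine 1 transcript above.
print(decode_cuda_version(12060))      # (12, 6)
print(driver_at_least(12060, "13.1"))  # False
```

With the machine 1 values, the driver reports CUDA 12.6 while torch.version.cuda is 13.1, which matches the "driver is too old" warning.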
On machine 2:
Running "nvidia-smi" on the host gives:
NVIDIA-SMI 550.90.12 Driver Version: 550.90.12 CUDA Version: 12.4
Then I start the NeMo2602 or NeMo2604 image on this machine and run "nvidia-smi" inside the container. The output is:
NVIDIA-SMI 550.90.12 Driver Version: 550.90.12 CUDA Version: 13.1
Then I run "torch.cuda.is_available()" inside the container. The output is:
>>> import torch
>>> print(torch.cuda.is_available())
True
>>> print(torch.version.cuda)
13.1
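One visible difference between the two machines is the CUDA version that nvidia-smi reports inside the container: it changes from 12.4 to 13.1 on machine 2 but stays at 12.6 on machine 1. The sketch below is one way to compare the host and container header lines programmatically. The function name and regular expression are assumptions written for the header format shown in this report, not part of any NVIDIA tool.

```python
import re

# Matches the nvidia-smi header line as it appears in this report.
HEADER_RE = re.compile(
    r"NVIDIA-SMI\s+(?P<smi>[\d.]+)\s+"
    r"Driver Version:\s+(?P<driver>[\d.]+)\s+"
    r"CUDA Version:\s+(?P<cuda>[\d.]+)"
)


def parse_smi_header(line: str) -> dict[str, str]:
    """Extract the nvidia-smi, driver, and CUDA versions from a header line."""
    match = HEADER_RE.search(line)
    if match is None:
        raise ValueError(f"not an nvidia-smi header line: {line!r}")
    return match.groupdict()


# Header lines copied from the machine 2 output above.
host = parse_smi_header(
    "NVIDIA-SMI 550.90.12 Driver Version: 550.90.12 CUDA Version: 12.4"
)
container = parse_smi_header(
    "NVIDIA-SMI 550.90.12 Driver Version: 550.90.12 CUDA Version: 13.1"
)

print(host["driver"] == container["driver"])  # True
print(host["cuda"] != container["cuda"])      # True
```

Running the same comparison on the machine 1 lines would show identical driver and CUDA versions on the host and in the container.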
Minimal repro
Older NeMo images (v25.09, v25.04) worked fine on hosts with any of these CUDA versions (12.2, 12.4, or 12.6). The problem only started appearing with the 26.xx images.
Expected behavior
The images should work on a host with CUDA 12.6, since they work on a host with CUDA 12.4.
Affected area
area:build
Regression?
Yes
Environment
No response
Logs