fix(models): audit fp32 protected tensors by yuhezhang-ai · Pull Request #2598 · NVIDIA-NeMo/Automodel

yuhezhang-ai · 2026-06-16T16:29:02Z

Summary

Audit the model-specific fp32-protected tensors from issue #2570 (Audit model-specific fp32 protected tensors during dtype casting) and keep the remaining model-specific buffers/params protected through dtype casts.
Preserve MoE gate/rotary fp32 buffers for MiniMax, HY, Qwen3-Omni, and Qwen3-VL paths.
Move Nemotron V3 Mamba A_log/dt_bias/D into fp32 holder modules, with HF-compatible state-dict adapter routing for the public checkpoint keys.
Keep callable fp32 holders usable under real FSDP2 by materializing full tensors from FSDP DTensor params and keeping strict holder subtrees unresharded during the parent forward.
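The HF-compatible routing in the Nemotron V3 item can be pictured as a key rename between the public checkpoint layout and the holder layout. This is a minimal sketch, assuming the holder is a submodule named `_fp32_params` under each `mixer` and that public keys look like `backbone.layers.N.mixer.A_log`. Both layouts and all function names here are hypothetical, not the PR's adapter API. Values pass through untouched, so the fp32 data itself is never cast.

```python
# Hypothetical key routing for a Nemotron V3-style Mamba mixer whose A_log,
# dt_bias and D live in an fp32 holder submodule. Names are illustrative only.
HOLDER = "_fp32_params"                  # assumed holder attribute name
FP32_LEAVES = ("A_log", "dt_bias", "D")


def to_internal_key(hf_key: str) -> str:
    """Map '...mixer.A_log' (public checkpoint key) to '...mixer._fp32_params.A_log'."""
    prefix, _, leaf = hf_key.rpartition(".")
    if leaf in FP32_LEAVES and prefix.endswith(".mixer"):
        return f"{prefix}.{HOLDER}.{leaf}"
    return hf_key


def to_hf_key(internal_key: str) -> str:
    """Inverse mapping: drop the holder segment so saved checkpoints keep public keys."""
    parts = internal_key.split(".")
    if len(parts) >= 2 and parts[-2] == HOLDER and parts[-1] in FP32_LEAVES:
        return ".".join(parts[:-2] + parts[-1:])
    return internal_key


def from_hf(state_dict: dict) -> dict:
    """Rename keys on load; tensor values are passed through unchanged."""
    return {to_internal_key(k): v for k, v in state_dict.items()}


def to_hf(state_dict: dict) -> dict:
    """Rename keys on save; tensor values are passed through unchanged."""
    return {to_hf_key(k): v for k, v in state_dict.items()}


if __name__ == "__main__":
    public = {
        "backbone.layers.0.mixer.A_log": [0.5],
        "backbone.layers.0.mixer.in_proj.weight": [1.0],
    }
    internal = from_hf(public)
    assert "backbone.layers.0.mixer._fp32_params.A_log" in internal
    assert to_hf(internal) == public   # round trip preserves the public keys
```

Keeping the rename symmetric means a checkpoint saved from the holder layout loads unchanged into a model that never heard of holders.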

Why

Setting model.torch_dtype=bf16, or applying any other broad dtype cast, can round small tensors that are part of a model's numerical or checkpoint contract. Separately, trainable fp32 params that live directly on mixed-dtype modules cannot be isolated cleanly by FSDP, so this PR follows the _fp32_params holder pattern used by the stacked Qwen GDN branch.
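To see why a blanket bf16 cast is lossy for these tensors: bf16 keeps only 8 significant bits, so nearby fp32 values collapse onto the same bf16 value. The sketch below emulates bf16 rounding in pure Python (round-to-nearest-even on the fp32 bit pattern, NaN handling omitted) and applies a cast that skips protected names. The pattern list, the function names, and the use of Python lists as stand-ins for tensors are illustrative assumptions, not the PR's actual protection table.

```python
import struct
from fnmatch import fnmatchcase


def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (ties to even); NaN handling omitted."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]   # fp32 bit pattern
    bits += 0x7FFF + ((bits >> 16) & 1)                    # round-to-nearest-even bias
    bits &= 0xFFFF0000                                     # keep the top 16 bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# Illustrative protection list (hypothetical names): Mamba params, an MoE gate
# bias buffer, and a rotary inverse-frequency buffer.
PROTECTED = (
    "*.A_log",
    "*.dt_bias",
    "*.mixer.D",
    "*.gate.e_score_correction_bias",
    "*.rotary_emb.inv_freq",
)


def cast_to_bf16(state: dict, protected=PROTECTED) -> dict:
    """Round every tensor to bf16 except those whose name matches a protected pattern."""
    out = {}
    for name, values in state.items():
        if any(fnmatchcase(name, pat) for pat in protected):
            out[name] = list(values)                # protected: original values untouched
        else:
            out[name] = [to_bf16(v) for v in values]
    return out


if __name__ == "__main__":
    x = 1.0 + 2**-10                                # exactly representable in fp32
    state = {"layers.0.mixer.A_log": [x], "layers.0.mlp.up_proj.weight": [x]}
    cast = cast_to_bf16(state)
    assert cast["layers.0.mixer.A_log"] == [x]             # protected: unchanged
    assert cast["layers.0.mlp.up_proj.weight"] == [1.0]    # rounded away in bf16
```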

The real 2-GPU FSDP smoke test exposed one more holder-specific issue: a callable holder that returns its parameter directly can hand the parent module a sharded DTensor. The fix keeps the storage sharded and in fp32, but returns the full fp32 tensor value to the caller.
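The holder fix separates what is stored from what is returned. A minimal sketch, assuming a holder that wraps a single parameter: if the stored value exposes `full_tensor()` (as `torch.distributed.tensor.DTensor` does), the holder gathers and returns the full value; otherwise it returns the parameter as-is. `Fp32Holder` and `FakeShardedParam` are hypothetical names, and the fake class only mimics a sharded DTensor so the sketch runs without torch or a process group.

```python
class FakeShardedParam:
    """Stand-in for a DTensor sharded across ranks (illustration only)."""

    def __init__(self, shards):
        self.shards = shards                    # one list of fp32 values per rank

    def full_tensor(self):
        """Concatenate all shards, like an all-gather along dim 0."""
        return [v for shard in self.shards for v in shard]


class Fp32Holder:
    """Hypothetical callable holder around one fp32 parameter.

    Storage stays exactly as FSDP left it (possibly sharded); the call returns
    the full value so the parent module never sees a sharded object.
    """

    def __init__(self, param):
        self.param = param

    def __call__(self):
        gather = getattr(self.param, "full_tensor", None)
        if callable(gather):
            return gather()                     # sharded: materialize the full value
        return self.param                       # unsharded: return as-is


if __name__ == "__main__":
    holder = Fp32Holder(FakeShardedParam([[0.25, 0.5], [0.75, 1.0]]))
    assert holder() == [0.25, 0.5, 0.75, 1.0]
    assert isinstance(holder.param, FakeShardedParam)   # storage is still sharded
    assert Fp32Holder([2.0])() == [2.0]
```

Where the gather happens, and whether the holder subtree stays unresharded across the parent forward as the summary describes, is a separate FSDP scheduling choice; the sketch only covers the return-value contract.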

Notes

This PR is stacked on yuhez/fix/qwen-gdn-fp32-precision. After that branch lands, this should be rebased onto main.

Validation

uv run --no-sync pytest tests/unit_tests/distributed/test_parallelizer_utils.py tests/unit_tests/distributed/test_fp32_compute_contract.py -q (25 passed)
focused GPT-OSS/Nemotron/DeepSeek holder tests (30 passed)
uv run --no-sync pytest tests/unit_tests/models/nemotron_v3/test_nemotron_v3_state_dict_adapter.py tests/unit_tests/distributed/test_fp32_compute_contract.py -q (30 passed)
focused Nemotron V3 Mamba/CP dtype tests (9 passed)
focused GPT-OSS holder/state-dict tests (5 passed, 1 skipped)
focused fp32 protected dtype regression suite (11 passed)
2-GPU H100 Slurm smoke: fp32-holder-fsdp-smoke-c938fb8-r3, job 12863187, exit 0:0
ruff check on touched files
git diff --check

copy-pr-bot · 2026-06-16T16:29:06Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

akoumpa · 2026-06-16T16:39:39Z

/nvskills-ci

yuhezhang-ai · 2026-06-16T16:59:48Z

/ok to test 704a39c

Signed-off-by: Yuhe Zhang <yuhez@nvidia.com>

yuhezhang-ai · 2026-06-16T21:45:49Z

/ok to test 9e1db7b

yuhezhang-ai · 2026-06-16T21:47:55Z

/nvskills-ci

yuhezhang-ai · 2026-06-16T21:48:02Z

/claude review

claude · 2026-06-16T21:52:07Z

        assert original_fn is not None, "apply_fsdp2_sharding_recursively not found in module globals"

-        def _fsdp_by_dtype(module, mesh, mp_policy, offload_policy=None, *args, **kwargs):
+        def _fsdp_by_dtype(


LGTM — clean PR. The fp32 holder pattern, reshard threading, and state-dict adapter routing all look correct. Good test coverage across models and distributed paths.
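The `_fsdp_by_dtype` helper in the review excerpt sits on the dtype-aware sharding path that the reshard setting is threaded through. Its planning step can be sketched in pure Python, assuming (a) each shard unit must hold a single dtype, (b) ordinary parameters are grouped by their top-level block (`layers.N`), and (c) holder subtrees named `_fp32_params` become their own units with `reshard_after_forward=False`. `plan_fsdp_units` and its return format are hypothetical; in real FSDP2 code each unit would correspond to a `fully_shard(...)` call, which does accept a `reshard_after_forward` keyword.

```python
HOLDER = "_fp32_params"   # assumed holder attribute name


def plan_fsdp_units(params: dict, reshard_after_forward: bool = True, block_depth: int = 2) -> dict:
    """Assign each parameter (name -> dtype string) to a single-dtype shard unit.

    Parameters under a holder subtree form their own unit that stays unresharded
    during the parent forward; everything else is grouped by its block prefix.
    """
    units = {}
    for name, dtype in params.items():
        parts = name.split(".")
        if HOLDER in parts:
            unit = ".".join(parts[: parts.index(HOLDER) + 1])
            reshard = False                                  # strict holder subtree
        else:
            depth = min(block_depth, len(parts) - 1)         # never include the leaf name
            unit = ".".join(parts[:depth]) or "<root>"
            reshard = reshard_after_forward                  # caller's setting threaded through
        entry = units.setdefault(
            unit, {"dtype": dtype, "reshard_after_forward": reshard, "params": []}
        )
        if entry["dtype"] != dtype:
            raise ValueError(f"unit {unit!r} mixes {entry['dtype']} and {dtype}")
        entry["params"].append(name)
    return units


if __name__ == "__main__":
    ok = {
        "layers.0.mixer.in_proj.weight": "bf16",
        "layers.0.mixer._fp32_params.A_log": "fp32",
    }
    units = plan_fsdp_units(ok)
    assert units["layers.0"]["dtype"] == "bf16"
    assert units["layers.0.mixer._fp32_params"]["reshard_after_forward"] is False

    # An fp32 param living directly on a bf16 block cannot be isolated.
    mixed = {"layers.0.mixer.in_proj.weight": "bf16", "layers.0.mixer.A_log": "fp32"}
    try:
        plan_fsdp_units(mixed)
    except ValueError as err:
        print("cannot isolate:", err)
```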

akoumpa added the r0.5.0 label (Auto-cherrypick to release branch. Apply before merge; cherrypick happens after merge.) Jun 16, 2026


yuhezhang-ai linked an issue Jun 16, 2026 that may be closed by this pull request: Audit model-specific fp32 protected tensors during dtype casting #2570 (Open)

Base automatically changed from yuhez/fix/qwen-gdn-fp32-precision to main June 16, 2026 20:52

yuhezhang-ai added 7 commits June 16, 2026 13:57

fix(models): audit fp32 protected dtype casts

68f98aa

Signed-off-by: Yuhe Zhang <yuhez@nvidia.com>

fix(gpt-oss): isolate attention sinks in fp32 holder

0abf403

Signed-off-by: Yuhe Zhang <yuhez@nvidia.com>

fix(nemotron-v3): isolate mamba fp32 params

0611261

Signed-off-by: Yuhe Zhang <yuhez@nvidia.com>

fix(nemotron-v3): keep mamba D fp32

a0e0317

Signed-off-by: Yuhe Zhang <yuhez@nvidia.com>

fix(distributed): keep fp32 holders unresharded in forward

4a73380

Signed-off-by: Yuhe Zhang <yuhez@nvidia.com>

fix(distributed): thread reshard through dtype-aware sharding

00f3e9b

Signed-off-by: Yuhe Zhang <yuhez@nvidia.com>

fix(distributed): warn on pp reshard override

b035704

Signed-off-by: Yuhe Zhang <yuhez@nvidia.com>

yuhezhang-ai force-pushed the yuhez/fix/fp32-protected-tensor-audit branch from b2ec859 to b035704 June 16, 2026 21:00

fix(gpt-oss): keep attention sinks resident dtype

9e1db7b

Signed-off-by: Yuhe Zhang <yuhez@nvidia.com>


claude Bot reviewed Jun 16, 2026



yuhezhang-ai marked this pull request as ready for review June 16, 2026 22:49

yuhezhang-ai requested a review from a team as a code owner June 16, 2026 22:49


yuhezhang-ai wants to merge 8 commits into main from yuhez/fix/fp32-protected-tensor-audit
