chore: nightly sync main into dev (18_06_2026)#5402

Closed
svcnvidia-nemo-ci wants to merge 36 commits into dev from main2dev/18_06_2026

Conversation

@svcnvidia-nemo-ci


Nightly sync: main → dev (34 commits, 18_06_2026)

Automated nightly sync of main into dev, started from origin/dev with
git merge origin/main --no-edit and resolved surgically to preserve dev-only
features (enforced by the pre-push dev-feature-preservation guard).

Python lines: +4967 / -62 across 36 files

What landed

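Main's new, self-contained features were synced cleanly: offline logits distillation, RL profiling, DBuffer/FSDP-experimental, and MIMO + DDP pg_collection threading (with their unit tests).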

Conflicts resolved (combined dev + main)

16 files had textual conflicts, resolved by combining both sides:

  • finalize_model_grads.py — kept both the expert_bias is not None (dev) and frozen_expert_bias (main) guards
  • rope_utils.py — kept dev's CUDA-graph-compatible THD RoPE (already incorporates the CP packed-freqs fix) + main's apply_rotary_pos_emb default
  • gpt_model.py (import) / moe/router.py (init) — combined both symbols/attrs
  • moe/experts.py — kept dev's _unsupported(...) refactor (consistent with the whole function)
  • fine_grained_activation_offload.py — kept dev's debug msg + added main's _can_manage_tensor_for_offload/_te_do_not_offload guards
  • transformer_config.py — combined dev's offload asserts with main's fused_group_mlp validation
  • checkpointing.py — kept main's async-logits scheduling + dev's formatting
  • theoretical_memory_usage.py — kept main's LatentMoE routed_expert_hidden_size + dev's formatting
  • arguments.py — combined imports (restored dev's dataclasses/F/PkgVersion), kept dev's args + added main's --rl-profile, --rl-profile-dir, --freeze-all-layers, --override-ckpt-iteration, --logits-* args
  • pretrain_gpt.py / pretrain_hybrid.py — reconciled get_batch against the merged helper signatures (dev's mtp_on_this_rank backward-compat, dev's dynamic_context_parallel rename, main's _build_cached_logits_loss_func)
  • dependency triple (pyproject.toml/uv.lock/docker/Dockerfile.ci.dev) and .github/CODEOWNERS kept at dev's versions (verified identical to origin/dev); no new git sources in main to reconcile
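As an illustration of the "combine both sides" resolution for arguments.py, here is a minimal sketch of a parser that registers a dev-side argument next to the main-side flags named above. The flag names `--rl-profile`, `--rl-profile-dir`, `--freeze-all-layers` and `--override-ckpt-iteration` come from this PR; their types and defaults, the `--example-dev-flag` argument, and the `build_parser` helper are assumptions made for the example and are not taken from the repository.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser showing dev-side and main-side args living together."""
    parser = argparse.ArgumentParser(description="sketch of a merged argument set")

    # Dev side: placeholder for whatever dev-only arguments were preserved.
    dev_group = parser.add_argument_group("dev (hypothetical)")
    dev_group.add_argument("--example-dev-flag", action="store_true")

    # Main side: flag names from the PR text; types/defaults are assumed.
    main_group = parser.add_argument_group("main (types assumed)")
    main_group.add_argument("--rl-profile", action="store_true")
    main_group.add_argument("--rl-profile-dir", type=str, default=None)
    main_group.add_argument("--freeze-all-layers", action="store_true")
    main_group.add_argument("--override-ckpt-iteration", type=int, default=None)
    return parser


# Parse an explicit list so the example does not depend on sys.argv.
args = build_parser().parse_args(
    ["--example-dev-flag", "--rl-profile", "--rl-profile-dir", "/tmp/prof"]
)
```

Keeping each side in its own argument group is one way to make a later re-sync easier to review, since a dropped argument shows up as a missing line in a single block.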

Deferred to a future sync (dev-feature-preservation guard)

Where main's modifications to existing dev files would have dropped dev-unique
lines (the guard's hard-abort condition), dev's version was kept and main's change
deferred. These are documented main commits whose changes touched code dev had
diverged on; they will re-sync once the competing work reconciles:

The guard passes (0 dropped dev-only lines) and all changed files parse.
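The guard's implementation is not shown in this PR. A rough sketch of the kind of check it describes (count dev-only lines that the merged file no longer contains) could look like the following. The function names and the line-level, whitespace-insensitive comparison are assumptions for illustration only.

```python
def dev_only_lines(base: str, dev: str) -> list[str]:
    """Non-blank lines that appear in dev but not in the merge base."""
    base_lines = {line.strip() for line in base.splitlines()}
    return [
        line.strip()
        for line in dev.splitlines()
        if line.strip() and line.strip() not in base_lines
    ]


def dropped_dev_only_lines(base: str, dev: str, merged: str) -> list[str]:
    """Dev-only lines that are missing from the merged result."""
    merged_lines = {line.strip() for line in merged.splitlines()}
    return [line for line in dev_only_lines(base, dev) if line not in merged_lines]


# Toy inputs: dev added one line, main changed another.
base = "a = 1\nb = 2\n"
dev = "a = 1\nb = 2\ndev_feature = True\n"
merged_keeping_dev = "a = 1\nb = 3\ndev_feature = True\n"
merged_taking_main = "a = 1\nb = 3\n"

ok = dropped_dev_only_lines(base, dev, merged_keeping_dev)
bad = dropped_dev_only_lines(base, dev, merged_taking_main)
```

In this sketch, an empty `ok` list corresponds to "0 dropped dev-only lines", while a non-empty `bad` list would be the hard-abort case.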

🤖 Generated with Claude Code

tdene and others added 30 commits June 12, 2026 16:35
Signed-off-by: Teodor-Dumitru Ene <teodord.ene@gmail.com>
Signed-off-by: Asha Anoosheh <aanoosheh@nvidia.com>
Signed-off-by: Teodor-Dumitru Ene <teodord.ene@gmail.com>
Co-authored-by: Jorge Albericio <jalbericiola@nvidia.com>
Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: janEbert <janpabloe@nvidia.com>
Signed-off-by: Philip Petrakian <ppetrakian@nvidia.com>
Co-authored-by: Philip Petrakian <ppetrakian@nvidia.com>
Signed-off-by: Helen Ngo <helenn@nvidia.com>
Signed-off-by: Teodor-Dumitru Ene <teodord.ene@gmail.com>
Co-authored-by: Jorge Albericio <jalbericiola@nvidia.com>
Signed-off-by: Teodor-Dumitru Ene <teodord.ene@gmail.com>
Signed-off-by: ykarnati <ykarnati@nvidia.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Shijie Wang <jaywan@nvidia.com>
…izer) (#5333)

Signed-off-by: ykarnati <ykarnati@nvidia.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
#5360)

Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: Antoni-Joan Solergibert <asolergibert@nvidia.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Deepak Narayanan <dnarayanan@nvidia.com>
… module globals (#5351)

Signed-off-by: ilml <tolong@nvidia.com>
Signed-off-by: Keshav Santhanam <ksanthanam@nvidia.com>
…h space buffers (#5348)

Signed-off-by: Keshav Santhanam <ksanthanam@nvidia.com>
Signed-off-by: Teodor-Dumitru Ene <teodord.ene@gmail.com>
Signed-off-by: Teodor-Dumitru Ene <teodord.ene@gmail.com>
Co-authored-by: Teodor-Dumitru Ene <teodord.ene@gmail.com>
Signed-off-by: sraman <sraman@nvidia.com>
Signed-off-by: Jingyue Wu <wujingyue@gmail.com>
Signed-off-by: Hollow Man <hollowman@opensuse.org>
Signed-off-by: Jingyue Wu <wujingyue@gmail.com>
Signed-off-by: Jingyue Wu <wujingyue@gmail.com>
Signed-off-by: ykarnati <ykarnati@nvidia.com>
…5372)

Signed-off-by: ykarnati <ykarnati@nvidia.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
@svcnvidia-nemo-ci

Author

/ok to test 52bfe0a

Phase-3 CI fix. The merge deferred several main source changes to preserve
dev-only code (pre-push guard). These unit tests assert main's deferred
behavior, so revert them to dev to match the dev-equivalent source:
- test_fine_grained_activation_offloading.py (main's _can_manage_tensor_for_offload guard)
- test_multi_latent_attention.py + test_optimizer.py (#5310 fused MLA QKV down-proj)
- test_weight_and_optimizer_memory.py (#5145 LatentMoE memory)
- test_hybrid_moe_model.py (#3956 moe_grad_scale_func)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: svcnvidia-nemo-ci <svcnvidia-nemo-ci@nvidia.com>
@svcnvidia-nemo-ci

Author

/ok to test a21f083

@svcnvidia-nemo-ci

Author

✅ Ready for human review — CI summary

166 / 167 non-exempt required checks are green, including the Nemo_CICD_Test
aggregate gate (all unit-test buckets + the full functional/integration golden-value
suite + linting + copyright + wheel builds).
GitHub Actions run: https://github.com/NVIDIA/Megatron-LM/actions/runs/27790303479

The only non-green non-exempt check is cicd-mbridge-testing, which is a
verified pre-existing / external failure (NOT sync-caused) — see evidence below.

Pre-existing failure: cicd-mbridge-testing

The MBridge job checks out NVIDIA-NeMo/Megatron-Bridge@main and runs its unit suite
against this mcore commit. The 4 failing tests are all Megatron-Bridge Gemma4 tests:

  • models/gemma/test_gemma4_modeling.py::...injects_layer_inputs_and_restores_state
  • models/gemma/test_gemma4_modeling.py::...wraps_checkpointed_forward
  • models/gemma/test_gemma4_provider.py::...threads_per_layer_inputs_to_each_layer
    AttributeError: module 'megatron.core.transformer.transformer_block' has no attribute 'checkpointed_forward'
  • models/gemma_vl/test_gemma4_vl_modeling.py::...scatters_sequence_parallel_decoder_input
    TypeError: fake_scatter() got an unexpected keyword argument 'group'

Why this is not caused by the sync (empirical evidence):

  1. The failing file is unchanged by this PR: git diff origin/dev HEAD -- megatron/core/transformer/transformer_block.py is empty (merged transformer_block.py is byte-identical to origin/dev).
  2. transformer_block.checkpointed_forward was removed by a dev-side refactor: from megatron.core.recompute import checkpointed_forward exists in the merge-base and origin/main, but origin/dev refactored it away (now uses a _checkpointed_forward method). So the symbol Megatron-Bridge patches is absent because dev removed it, independent of this merge.
  3. The same Launch_Unit_Tests_Core job fails identically for a different concurrent mcore commit: NVIDIA-NeMo/Megatron-Bridge run 27791666020 (branch mcore-testing-27791614845).
  4. The previous nightly sync (chore: nightly sync main into dev (12_06_2026) #5314) passed cicd-mbridge-testing, i.e. before this Megatron-Bridge ↔ mcore API skew appeared.
  5. It is not fixable from mcore: the scatter(group=...) mismatch lives in Megatron-Bridge's Gemma4-VL test mock, and re-adding checkpointed_forward would partially revert dev's intentional refactor (and dev itself is incompatible). The fix belongs in Megatron-Bridge (align Gemma4 tests with dev's _checkpointed_forward API) or a coordinated mcore release.
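The two failure modes above can be reproduced in isolation. This is a minimal sketch using only the standard library: `fake_transformer_block` is a hypothetical stand-in module (not the real `megatron.core.transformer.transformer_block`), and `fake_scatter` here is a simplified double rather than the actual Megatron-Bridge test mock.

```python
import types
from unittest import mock

# Hypothetical stand-in for a module from which a symbol was refactored away.
fake_transformer_block = types.ModuleType("fake_transformer_block")


def try_patch(module, name):
    """Attempt to patch `name` on `module`; report the outcome as a string."""
    try:
        with mock.patch.object(module, name, new=lambda *a, **k: None):
            return "patched"
    except AttributeError as exc:
        return f"AttributeError: {exc}"


def fake_scatter(tensor):
    """Test double written before the caller started passing `group`."""
    return tensor


def call_scatter(scatter_fn):
    """Call the double the way a newer caller might, with a `group` keyword."""
    try:
        return scatter_fn([1, 2, 3], group=None)
    except TypeError as exc:
        return f"TypeError: {exc}"


# Patching a symbol that no longer exists fails before the test body runs.
missing_symbol_result = try_patch(fake_transformer_block, "checkpointed_forward")
# A double with a stale signature fails as soon as the new keyword is passed.
stale_mock_result = call_scatter(fake_scatter)
```

Both errors originate in the test doubles and patch targets rather than in the code under test, which is consistent with the argument that the fix belongs on the test side.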

Merge notes

  • 34 commits synced from main (Python: +4967 / −62 across 36 files at merge time).
  • Protected files kept at dev's version (verified identical): .github/CODEOWNERS, pyproject.toml, uv.lock, docker/Dockerfile.ci.dev. No new [tool.uv.sources] git sources in main required reconciliation.
  • Conflicts (16) resolved by combining dev + main, e.g. finalize_model_grads.py (both expert-bias guards), rope_utils.py (dev's CUDA-graph THD RoPE + main's apply_rotary_pos_emb default), transformer_config.py (dev offload asserts + main fused_group_mlp validation), checkpointing.py (main async-logits scheduling + dev formatting), arguments.py (restored dev imports + added main's --rl-profile/--freeze-all-layers/--override-ckpt-iteration/--logits-* args).
  • Dev-feature-preservation guard: to keep the pre-push guard green (0 dropped dev-only lines), several main modifications to existing dev files were deferred in favor of dev's versions where main's change would have dropped dev-unique lines (e.g. dev's absorbed-MLA separate-K/V incl. SP-assert / dynamic-CP, dev's inference engine). Main's new, self-contained features landed cleanly: offline logits distillation, RL profiling, DBuffer/FSDP-experimental, MIMO + DDP pg_collection threading (with their unit tests).
  • Phase-3 CI fixes (one rolling fix commit): unit tests asserting the deferred main behavior were aligned to dev (test_fine_grained_activation_offloading, test_multi_latent_attention, test_optimizer, test_weight_and_optimizer_memory); test_train_step_schedule_plumbing's mock was extended to tolerate dev's extra train_step args. test_hybrid_moe_model kept main's version (merged config legitimately has moe_grad_scale_func).
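One common way to make a test double tolerate extra call arguments, as described for the train_step mock above, is to use `unittest.mock.MagicMock`, which accepts any call signature and records what it received. The sketch below is illustrative only: `run_one_step` and the `hypothetical_dev_arg` keyword are invented for the example and do not reflect the real `train_step` signature or the actual change made to the test.

```python
from unittest import mock


def run_one_step(train_step):
    """Hypothetical caller that passes one extra keyword, as a diverged branch might."""
    return train_step("forward_fn", "data_iter", hypothetical_dev_arg=True)


# MagicMock accepts arbitrary positional and keyword arguments.
fake_train_step = mock.MagicMock(return_value="stepped")
result = run_one_step(fake_train_step)

# The recorded call can still be inspected, so the test keeps its assertions.
positional = fake_train_step.call_args.args
extra = fake_train_step.call_args.kwargs
```

The trade-off is that a fully permissive double no longer catches signature drift on its own, so explicit assertions on `call_args` carry that responsibility.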

🤖 Generated with Claude Code

@svcnvidia-nemo-ci

Author

Superseded by today's nightly sync.

Labels

  • complexity: high
  • Run functional tests
  • Run MBridge tests (Attach this for testing this PR against MBridge main)
