chore: nightly sync main into dev (18_06_2026) #5402
Signed-off-by: Teodor-Dumitru Ene <teodord.ene@gmail.com>
Signed-off-by: Asha Anoosheh <aanoosheh@nvidia.com>
Signed-off-by: Teodor-Dumitru Ene <teodord.ene@gmail.com> Co-authored-by: Jorge Albericio <jalbericiola@nvidia.com>
Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com> Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: janEbert <janpabloe@nvidia.com> Signed-off-by: Philip Petrakian <ppetrakian@nvidia.com> Co-authored-by: Philip Petrakian <ppetrakian@nvidia.com>
Signed-off-by: Helen Ngo <helenn@nvidia.com>
Signed-off-by: Teodor-Dumitru Ene <teodord.ene@gmail.com> Co-authored-by: Jorge Albericio <jalbericiola@nvidia.com>
Signed-off-by: Helen Ngo <helenn@nvidia.com>
Signed-off-by: Teodor-Dumitru Ene <teodord.ene@gmail.com>
Signed-off-by: ykarnati <ykarnati@nvidia.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Shijie Wang <jaywan@nvidia.com>
#5347) Signed-off-by: Ajay Balasa <abalasa@nvidia.com>
…izer) (#5333) Signed-off-by: ykarnati <ykarnati@nvidia.com> Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
#5360) Signed-off-by: oliver könig <okoenig@nvidia.com>
… module globals (#5351) Signed-off-by: ilml <tolong@nvidia.com>
Signed-off-by: Keshav Santhanam <ksanthanam@nvidia.com>
…h space buffers (#5348) Signed-off-by: Keshav Santhanam <ksanthanam@nvidia.com>
Signed-off-by: Teodor-Dumitru Ene <teodord.ene@gmail.com>
Signed-off-by: Teodor-Dumitru Ene <teodord.ene@gmail.com> Co-authored-by: Teodor-Dumitru Ene <teodord.ene@gmail.com>
Signed-off-by: sraman <sraman@nvidia.com>
Signed-off-by: Jingyue Wu <wujingyue@gmail.com>
Signed-off-by: Hollow Man <hollowman@opensuse.org>
Signed-off-by: Jingyue Wu <wujingyue@gmail.com>
Signed-off-by: Jingyue Wu <wujingyue@gmail.com>
…5082) Signed-off-by: hongbinl <hongbinl@nvidia.com>
Signed-off-by: ykarnati <ykarnati@nvidia.com>
…5372) Signed-off-by: ykarnati <ykarnati@nvidia.com> Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
/ok to test 52bfe0a
0362e81 to 52bfe0a
Phase-3 CI fix.

The merge deferred several main source changes to preserve dev-only code (pre-push guard). These unit tests assert main's deferred behavior, so revert them to dev to match the dev-equivalent source:
- test_fine_grained_activation_offloading.py (main's _can_manage_tensor_for_offload guard)
- test_multi_latent_attention.py + test_optimizer.py (#5310 fused MLA QKV down-proj)
- test_weight_and_optimizer_memory.py (#5145 LatentMoE memory)
- test_hybrid_moe_model.py (#3956 moe_grad_scale_func)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: svcnvidia-nemo-ci <svcnvidia-nemo-ci@nvidia.com>
/ok to test a21f083
52bfe0a to a21f083
✅ Ready for human review - CI summary
166 / 167 non-exempt required checks are green, including the … The only non-green non-exempt check is … Pre-existing failure: …
Superseded by today's nightly sync.
Nightly sync: main → dev (34 commits, 18_06_2026)
Automated nightly sync of `main` into `dev`, started from `origin/dev` with `git merge origin/main --no-edit` and resolved surgically to preserve dev-only features (enforced by the pre-push dev-feature-preservation guard).
Python lines: +4967 / -62 across 36 files
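For context, a tally like the one above could be reproduced from the output of `git diff --numstat`. The sketch below is an assumption about how such a count might be computed, not the sync tooling's actual implementation; the function name `tally_python_lines` and the sample input are hypothetical.

```python
def tally_python_lines(numstat_output: str) -> tuple[int, int, int]:
    """Sum added/deleted lines for .py files in `git diff --numstat` output.

    Each numstat line holds three tab-separated fields: added, deleted, path.
    Binary files report "-" for both counts and are skipped here.
    """
    added = deleted = files = 0
    for line in numstat_output.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # ignore anything that is not a numstat record
        a, d, path = parts
        if not path.endswith(".py") or a == "-" or d == "-":
            continue  # only count text diffs of Python files
        added += int(a)
        deleted += int(d)
        files += 1
    return added, deleted, files


# Hypothetical sample input, not taken from this PR.
sample = "10\t2\tmegatron/a.py\n-\t-\tlogo.png\n5\t0\tREADME.md\n3\t1\ttests/test_a.py\n"
added, deleted, files = tally_python_lines(sample)
print(f"Python lines: +{added} / -{deleted} across {files} files")
```

In practice the input would come from something like `git diff --numstat origin/dev...HEAD`; the exact revision range used by the nightly sync is not stated here.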
What landed
Main's new, self-contained features were synced cleanly:
- `megatron/training/distillation/*` (Offline Logits-Based Knowledge Distillation #5019)
- `megatron/rl/rl_profiling.py` (Profiling #3110)
- `.../megatron_fsdp/experimental/*` (Add minimal DBuffer implementation #4835)
- `MimoModel.zero_grad_buffer` (Thread MIMO support through the stock training loop (schedule + optimizer) #5333, Add MimoModel.zero_grad_buffer delegating to active DDP submodules #5372) and DDP `pg_collection` threading (Thread pg_collection through wrap_model_chunks_with_ddp #5328), with their new unit tests

Conflicts resolved (combined dev + main)
16 files had textual conflicts, resolved by combining both sides:
- `finalize_model_grads.py`: kept both the `expert_bias is not None` (dev) and `frozen_expert_bias` (main) guards
- `rope_utils.py`: kept dev's CUDA-graph-compatible THD RoPE (already incorporates the CP packed-freqs fix) + main's `apply_rotary_pos_emb` default
- `gpt_model.py` (import) / `moe/router.py` (init): combined both symbols/attrs
- `moe/experts.py`: kept dev's `_unsupported(...)` refactor (consistent with the whole function)
- `fine_grained_activation_offload.py`: kept dev's debug msg + added main's `_can_manage_tensor_for_offload` / `_te_do_not_offload` guards
- `transformer_config.py`: combined dev's offload asserts with main's `fused_group_mlp` validation
- `checkpointing.py`: kept main's async-logits scheduling + dev's formatting
- `theoretical_memory_usage.py`: kept main's LatentMoE `routed_expert_hidden_size` + dev's formatting
- `arguments.py`: combined imports (restored dev's `dataclasses` / `F` / `PkgVersion`), kept dev's args + added main's `--rl-profile`, `--rl-profile-dir`, `--freeze-all-layers`, `--override-ckpt-iteration`, `--logits-*` args
- `pretrain_gpt.py` / `pretrain_hybrid.py`: reconciled `get_batch` against the merged helper signatures (dev's `mtp_on_this_rank` backward-compat, dev's `dynamic_context_parallel` rename, main's `_build_cached_logits_loss_func`)
- (`pyproject.toml` / `uv.lock` / `docker/Dockerfile.ci.dev`) and `.github/CODEOWNERS` kept at dev's versions (verified identical to `origin/dev`); no new git sources in main to reconcile

Deferred to a future sync (dev-feature-preservation guard)
Where main's modifications to existing dev files would have dropped dev-unique
lines (the guard's hard-abort condition), dev's version was kept and main's change
deferred. These are documented main commits whose changes touched code dev had
diverged on; they will re-sync once the competing work reconciles:
- `dynamic_engine.py` kept at dev; main-only `test_cg_admission_gating.py` removed accordingly
- `absorbed_mla.py` + test kept at dev (external combined-spec interface is unchanged, so callers are unaffected)

The guard passes (0 dropped dev-only lines) and all changed files parse.
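To make the guard's abort condition concrete, here is a minimal sketch of what a "dropped dev-only lines" check and a parse check could look like. This is an illustration under stated assumptions, not the actual pre-push guard: the helper names `dropped_dev_only_lines` and `parses` are hypothetical, and the line-set comparison is a deliberate simplification of whatever diffing the real guard performs.

```python
import ast


def dropped_dev_only_lines(dev: str, main: str, merged: str) -> list[str]:
    """Return dev lines that are absent from main and also missing from the merge.

    Assumption: a "dev-only" line is a non-blank line present in dev but not in
    main, compared after stripping surrounding whitespace.
    """
    main_lines = {ln.strip() for ln in main.splitlines()}
    merged_lines = {ln.strip() for ln in merged.splitlines()}
    dropped = []
    for ln in dev.splitlines():
        key = ln.strip()
        if key and key not in main_lines and key not in merged_lines:
            dropped.append(ln)
    return dropped


def parses(source: str) -> bool:
    """Report whether the source is syntactically valid Python."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


# Hypothetical file contents, for illustration only.
dev = "import a\nx = dev_only()\n"
main = "import a\ny = main_only()\n"
good_merge = "import a\nx = dev_only()\ny = main_only()\n"
bad_merge = "import a\ny = main_only()\n"

assert dropped_dev_only_lines(dev, main, good_merge) == []
assert dropped_dev_only_lines(dev, main, bad_merge) == ["x = dev_only()"]
assert parses(good_merge)
assert not parses("def broken(:\n")
```

Under this model, a merge result like `bad_merge` would trigger the hard abort, which is the situation in which dev's version was kept and main's change deferred.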
🤖 Generated with Claude Code