Releases: NVIDIA-NeMo/Megatron-Bridge
NVIDIA Megatron-Bridge 0.4.2
Highlights
- Expanded performance configs for DeepSeek V3, Qwen, GPT-OSS, and WAN
- Added support for the `fp4_param_gather` mixed-precision config
- Hardened dataset and checkpoint deserialization and URL loading, with safer `trust_remote_code` handling
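The deserialization hardening in this release replaces unsafe `pickle` calls with a restricted unpickler. The following is a minimal sketch of that general technique using only the Python standard library. It is not the Megatron-Bridge implementation: the names `ALLOWED_GLOBALS`, `RestrictedUnpickler`, and `restricted_loads` and the contents of the allowlist are assumptions made for illustration.

```python
import io
import pickle

# Hypothetical allowlist of (module, name) pairs that may be resolved while
# unpickling. A real project would list only the types its files contain.
ALLOWED_GLOBALS = {
    ("collections", "OrderedDict"),
}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses to resolve any global outside the allowlist."""

    def find_class(self, module, name):
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Drop-in replacement for pickle.loads that enforces the allowlist."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

Plain containers such as `dict`, `list`, `str`, and `int` are encoded with dedicated pickle opcodes and load without consulting the allowlist, while any payload that references a callable such as `eval` or `os.system` is rejected with `pickle.UnpicklingError`.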
Performance
- NVFP4 with 4-bit parameter AllGather in DP communications (PR#3364, PR#4005)
- DSV3 B300 recipe tuning (PR#3549)
- DSV3 B200 recipe tuning (PR#3368)
- Qwen3 235B A22B B300 recipe tuning (PR#3490)
- NT3 super B300 recipe tuning (PR#3579)
- GPT-OSS B200 regression fix (PR#3614)
Software Component
Known issues
- There is a known issue with Evaluator when installing nvidia-vlmeval inside /opt/NeMo-FW. Please use the /opt/Megatron-Bridge directory to install the package:
cd /opt/Megatron-Bridge
uv pip install nvidia-vlmeval
Changelog Details
- beep boop 🤖: Bumping megatron.bridge to v0.4.1 by @nemo-automation-bot[bot] :: PR: #3363
- cp: `[perf] fix: guard cuda_graph_scope validation against None (3249)` into `r0.4.0` by @svcnvidia-nemo-ci :: PR: #3262
- cp: `fix(perf): set NCCL env vars when nccl_ub enabled via recipe config (3283)` into `r0.4.0` by @yaoyu-33 :: PR: #3305
- cp: `Enable nemo-ci tests (short runs - perf and non-perf) for Wan + Updating recipes names (3179)` into `r0.4.0` by @svcnvidia-nemo-ci :: PR: #3324
- cp: `Perf script utility to lock gpu frequency. (2977)` into `r0.4.0` by @svcnvidia-nemo-ci :: PR: #3326
- cp: `fix(gemma3-vl): force right-padding in VLM collate to prevent token loss (3331)` into `r0.4.0` by @svcnvidia-nemo-ci :: PR: #3332
- cp: `fix(perf): read baseline values from golden values when using new format (3334)` into `r0.4.0` by @svcnvidia-nemo-ci :: PR: #3338
- [docs] chore: bump versions1.json to 0.4.0 (latest) by @ko3n1g :: PR: #3376
- b200 DSv3 better cfg (#3368), mxfp8 to fp8_cs for h100 gpt-oss #3378 by @malay-nagda :: PR: #3420
- 2604 perf summary (#3377) by @malay-nagda :: PR: #3405
- cp: `docs(releases): add 26.04 software component versions (3421)` into `r0.4.0` by @svcnvidia-nemo-ci :: PR: #3430
- cp: `b200 DSv3 better cfg (3368)` into `r0.4.0` by @svcnvidia-nemo-ci :: PR: #3401
- cp: `[training] fix: report memory on 2nd iteration to better reflect actual peak (3169)` into `r0.4.0` by @dingqingy-nv :: PR: #3367
- cp: `Update Qwen3-VL pretrain perf configs for 30B and 235B (3327)` into `r0.4.0` by @svcnvidia-nemo-ci :: PR: #3342
- cp: `docs: Add container version to docs version picker (3434)` into `r0.4.0` by @svcnvidia-nemo-ci :: PR: #3435
- cp: [docs] Add Megatron Bridge 0.4.0 release notes (#3419) by @chtruong814 :: PR: #3439
- cp: fix(test): clone mmap-backed tensors before overwriting safetensors file (#3335) by @yaoyu-33 :: PR: #3441
- cp: `[test] refactor: move diffusion tests to test_groups directory (3275)` into `r0.4.0` by @svcnvidia-nemo-ci :: PR: #3442
- remove archival data from main page by @malay-nagda :: PR: #3448
- cp: `fix: set 644 permissions on COPY'd files to match cloned repos (3431)` into `r0.4.0` by @svcnvidia-nemo-ci :: PR: #3450
- cp: `[perf] fix: use direct assignment for NCCL env vars when nccl_ub enabled (3350)` into `r0.4.0` by @svcnvidia-nemo-ci :: PR: #3453
- cp: `[training] feat: enable fp4_param_gather in MixedPrecisionConfig (3364)` into `r0.4.0` by @svcnvidia-nemo-ci :: PR: #3454
- cp: `fix(docker): replace rdma-core source build with system package install (3429)` into `r0.4.0` by @svcnvidia-nemo-ci :: PR: #3457
- cp: `[training] fix: record CUDA memory history before snapshot so dumps are non-empty (#3487)` into `r0.4.0` by @dingqingy-nv :: PR: #3508
- cp: `[vulnops][misc] fix: Add allowlist validation for _target_ instantiation (3142)` into `r0.4.0` by @svcnvidia-nemo-ci :: PR: #3540
- cp: `[vulnops][data] fix: Replace unsafe pickle.loads with restricted unpickler in Qwen VL pipeline (3139)` into `r0.4.0` by @svcnvidia-nemo-ci :: PR: #3541
- cp: `[vulnops][ckpt] fix: Use weights_only=True in ModelOpt checkpoint loading (3138)` into `r0.4.0` by @svcnvidia-nemo-ci :: PR: #3542
- cp: `[vulnops][ckpt] fix: Use weights_only=True in TrainState checkpoint loading (3506)` into `r0.4.0` by @svcnvidia-nemo-ci :: PR: #3557
- cp: `[vulnops][data] fix: Replace unsafe pickle.load with restricted unpickler for index metadata (3140)` into `r0.4.0` by @svcnvidia-nemo-ci :: PR: #3558
- cp: `[vulnops] fix: _contains_code_references allowlist bypass leads to RCE (3379)` into `r0.4.0` by @svcnvidia-nemo-ci :: PR: #3559
- fix: Add security warning for trust_remote_code and remove hardcoded True by @chtruong814 :: PR: #3539
- cp: `Cleanup TE cuda graphs with the right api (3459)` into `r0.4.0` by @svcnvidia-nemo-ci :: PR: #3476
- cp: `Update DeepSeek-V3 configs for B300 (3549)` into `r0.4.0` by @svcnvidia-nemo-ci :: PR: #3565
- cp: `log repo status manual (3570)` into `r0.4.0` by @svcnvidia-nemo-ci :: PR: #3572
- cp: `ci: post merge comment with SHA after successful CI run (3567)` into `r0.4.0` by @svcnvidia-nemo-ci :: PR: #3573
- cp: `[perf] update: switch GPT-OSS GB200 V2 dispatcher default to alltoall (3561)` into `r0.4.0` by @svcnvidia-nemo-ci :: PR: #3577
- cp: `no fp4 param gather (3578)` into `r0.4.0` by @svcnvidia-nemo-ci :: PR: #3580
- cp: `fix(evaluate): skip non-dict golden value entries such as job_id (3581)` into `r0.4.0` by @svcnvidia-nemo-ci :: PR: #3582
- cp: `[vulnops][data] fix: Validate URLs in VLM video loader to prevent SSRF (3482)` into `r0.4.0` by @svcnvidia-nemo-ci :: PR: #3588
- fix(docker): suppress lightning from uv resolution in fw_pyproject by @ko3n1g :: PR: #3602
- cp: `[vulnops][data] fix: Remove unnecessary allow_pickle=True and add security warnings (3141)` into `r0.4.0` by @svcnvidia-nemo-ci :: PR: #3615
- cp: `[vulnops][data] fix: Replace allow_pickle=True with restricted unpickler in packed dataset loading (3616)` into `r0.4.0` by @svcnvidia-nemo-ci :: PR: #3629
- cp: `add VP for LoRA Lm3 70B (3547)` into `r0.4.0` by @svcnvidia-nemo-ci :: PR: #3596
- cp: `num_layers_fix- qwen vl 235b_a22b on B200 (3589)` into `r0.4.0` by @svcnvidia-nemo-ci :: PR: #3603
- cp: `fix(docker): resolve lightning not found on PyPI by providing local stub (3604)` into `r0.4.0` by @svcnvidia-nemo-ci :: PR: #3606
- cp: `70b_lora_gb200_bf16_fix (3623)` into `r0.4.0` by @svcnvidia-nemo-ci :: PR: #3627
- cp: `[vulnops] fix: Add SSRF protection to image-loading utilities (3630)` into `r0.4.0` by @svcnvidia-nemo-ci :: PR: #3632
- chore(beep boop 🤖): Bump `uv.lock` (r0.4.0, mcore-core_r0.17.0) (2026-04-30) by @svcnvidia-nemo-ci :: PR: #3591
- cp: `[vulnops] fix: Add SSRF protection to audio URL loading (3633)` into `r0.4.0` by @svcnvidia-nemo-ci :: PR: #3636
- cp: `fix(perf): keep PCT binding for deepseek_v3 large_scale on b300 (3656)` into `r0.4.0` by @svcnvidia-nemo-ci :: PR: #3657
- fix: apply vllm PR 36192 patch and bump pillow to 12.20 by @ko3n1g :: PR: #3671
- cp: `Add previously removed NemotronHBridge SequentialMLP mappings (3628)` into `r0.4.0` by @svcnvidia-nemo-ci :: PR: #3701
- Use HybridEP flex dispatcher for Qwen3 235B B300 perf configs (#3490) by @rhmukundan :: PR: #3675
- [build] chore: bump package version to 0.4.2 by @ko3n1g :: PR: #3721
- [model, ckpt, docs] fix: support HF→Megatron conversion under decentralized PGs (r0.4.0) by @cuichenx :: PR: #3674
- cp: `Fix Gemma3 example folder (3724)` into `r0.4.0` by @svcnvidia-nemo-ci :: PR: #3728
- cp: `Reorganize ModelOpt docs (3715)` into `r0.4.0` by @svcnvidia-nemo-ci :: PR: #3751
- [model, ckpt] fix: align GPT-OSS BF16 down_proj orientation on import (r0.4.0) by @cuichenx :: PR: #3753
- perf(qwen3-next): set expandable_segments on GB300 BF16/FP8_MX to fix OOM by @ko3n1g :: PR: #3767
- cp: `llama31 405b gb200 nvfp4 no pg overlap (3713)` into `r0.4.0` by @svcnvidia-nemo-ci :: PR: #3773
- cp: `[perf] update: switch GPT-OSS B200 V2 dispatcher default to alltoall (3614)` into `r0.4.0` by @svcnvidia-nemo-ci :: PR: #3682
- nt3 super nvfp4; lm3.1 405B nvfp4; lm3 70B mxfp8- expandable_segments by @malay-nagda :: PR: #3780
- cp: `[config] Update micro_batch_size to 2 for gemma3 recipe (3815)` into `r0.4.0` by @svcnvidia-nemo-ci :: PR: #3828
- chore: Bump TE to latest 2.14 and MCore to latest 0.17.0 by @chtruong814 :: PR: #3806
- qwen3 next env var fix by @malay-nagda :: PR: #3845
- chore: Bump and remove packages to address CVEs (#3841) by @chtruong814 :: PR: #3855
- Bump MCore to 2edffa by @chtruong814 :: PR: #3857
- cp: `chore: Bump deps to address CVEs (3919)` into `r0.4.0` by @svcnvidia-nemo-ci :: PR: #3925
- cp: `2604_patch_perf_summary (3818)` into `r0.4.0` by @svcnvidia-nemo-ci :: PR: #3861
- cp: `26.04.01_perf_summary (3997)` into `r0.4.0` by @svcnvidia-nemo-ci :: PR: #3998
- cp: `docs: note 26.04 drops PyAV by default and document runtime install (4020)` into `r0.4.0` by @svcnvidia-nemo-ci :: PR: #4021
- cp: `[perf] fix: guard cuda_graph_scope validation against None (3249)` into `r0.4.0` (#3262) by @svcnvidia-nemo-ci
- cp: `fix(perf): set NCCL env vars when nccl_ub enabled via recipe config (3283)` into `r0.4.0` (#3305) by @yaoyu-33
- cp: `Enable nemo-ci tests (short runs - perf and non-perf) for Wan + Updating recipes names (3179)` into `r0.4.0` (#3324) by @svcnvidia-nemo-ci
- cp: `Perf script utility to lock gpu frequency. (2977)` into `r0.4.0` (#3326) by @svcnvidia-nemo-ci
- cp: `fix(gemma3-vl): force right-padding in VLM collate t...
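Several entries in this changelog add SSRF protection to image, audio, and video URL loading. The following is a minimal sketch of the usual check, assuming a policy of "http/https only, public addresses only". It is not the Megatron-Bridge implementation: the function names `is_public_address` and `is_url_allowed` and the exact policy are illustrative assumptions.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

# Assumed policy for this sketch: only plain web schemes are fetchable.
ALLOWED_SCHEMES = {"http", "https"}


def is_public_address(ip_text: str) -> bool:
    """Return True only for addresses that are not internal or special-purpose."""
    try:
        ip = ipaddress.ip_address(ip_text)
    except ValueError:
        return False
    return not (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_multicast
        or ip.is_reserved
        or ip.is_unspecified
    )


def is_url_allowed(url: str) -> bool:
    """Reject URLs with a non-web scheme or a host that maps to a non-public address."""
    parts = urlsplit(url)
    if parts.scheme not in ALLOWED_SCHEMES or not parts.hostname:
        return False
    host = parts.hostname
    try:
        # IP literals are checked directly, without a DNS lookup.
        addresses = [str(ipaddress.ip_address(host))]
    except ValueError:
        try:
            infos = socket.getaddrinfo(host, None)
        except socket.gaierror:
            return False
        addresses = [info[4][0] for info in infos]
    return bool(addresses) and all(is_public_address(a) for a in addresses)
```

A check like this is only one layer. A loader that validates the URL and then fetches it in a separate step can still be redirected, or can see a different DNS answer at fetch time, so redirects and the resolved address generally need the same validation.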
26.04-alpha.rc2
[MXFP8 param gather]Update param buffer before copy to model weights …
NVIDIA Megatron-Bridge 0.4.1
- This release addresses known security issues. For the latest NVIDIA Vulnerability Disclosure Information, visit https://www.nvidia.com/en-us/security/. For acknowledgement, please reach out to the NVIDIA PSIRT team at PSIRT@nvidia.com.
26.04-alpha.rc1
Merge branch 'PR2411' into 26.04-alpha
NVIDIA Megatron-Bridge 0.4.0
Highlights
Model Collection Support
- MiniMax M2 / M2.5 support (PR#2602)
- Kimi 2.5 support, including GB300 MXFP8 recipe and HF config updates (PR#2743)
- Nemotron 3 Super model support (PR#2912)
- Sarvam support (PR#1814)
- Qwen 3.5 VL Bridge with recipes and LoRA bridge / merge support (PR#2530, PR#2654, PR#2736)
- Qwen 2.5 Omni support (PR#2634)
- Qwen2-Audio support (PR#2324)
- Xiaomi MiMo dense MTP model bridge support (PR#2387, by HollowMan6)
Diffusion Collection
- Diffusion model support for DFM-to-Bridge migration (PR#2534, PR#2645)
- FLUX and WAN diffusion submodule improvements (PR#2822, PR#2849)
Training & Functionality
- Parquet support for sequence-packing preprocessing, improving handling of larger datasets (PR#2395)
- Energon integration for sequence packing with WebDataset workflows (PR#2440)
- Default packed sequences across finetune recipes (PR#2284)
- More modern finetuning datasets, including OpenMathInstruct V2 and GSM8K (PR#2264)
- Unified dataset configuration in `run_recipe.py` (PR#2826)
- NCCL flight recorder configuration support (PR#2891)
- Comet ML experiment tracking integration (PR#2910)
- Refactored SFT and PEFT recipes for VLM workflows (PR#2614)
- Added the `on_checkpoint_save` callback event for training workflows (PR#2905)
- Added MoE LoRA rank normalization for expert layers (PR#3006)
- Direct export of block-wise FP8 weights and scaling factors (PR#1994)
- Accelerated first-fit packing with a segment tree for much faster packing on large datasets (PR#2953)
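For readers unfamiliar with the last item, here is a minimal sketch of first-fit packing driven by a max segment tree. It illustrates the general technique, not the code merged in PR#2953: the function name `first_fit_pack` and its signature are assumptions. The tree stores the largest remaining capacity in each range of bins, so the leftmost bin that fits a sequence is found in O(log n) instead of by a linear scan.

```python
def first_fit_pack(lengths, capacity):
    """Pack sequence lengths into bins of `capacity` using first-fit.

    Returns a list of bins, each a list of indices into `lengths`.
    Assumes every length is a positive number no larger than `capacity`.
    """
    n = len(lengths)
    if n == 0:
        return []

    # Round the leaf count up to a power of two; tree[1] is the root and
    # bin i lives at tree[size + i]. Padding leaves keep capacity 0.
    size = 1
    while size < n:
        size *= 2
    tree = [0] * (2 * size)
    for i in range(n):
        tree[size + i] = capacity
    for node in range(size - 1, 0, -1):
        tree[node] = max(tree[2 * node], tree[2 * node + 1])

    bins = [[] for _ in range(n)]
    for idx, length in enumerate(lengths):
        if length > capacity:
            raise ValueError(f"item {idx} (length {length}) exceeds capacity {capacity}")
        # Descend to the leftmost leaf whose remaining capacity fits the item.
        node = 1
        while node < size:
            left = 2 * node
            node = left if tree[left] >= length else left + 1
        bins[node - size].append(idx)
        # Consume capacity, then refresh the maxima on the path to the root.
        tree[node] -= length
        node //= 2
        while node >= 1:
            tree[node] = max(tree[2 * node], tree[2 * node + 1])
            node //= 2

    return [b for b in bins if b]
```

With at most n bins for n items, an untouched bin always exists, so the descent never fails. The total cost is O(n log n), compared with O(n²) in the worst case for a scan over open bins.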
Model Optimization
- Pruning support and documentation (PR#2244)
- Post-training quantization support for Nano, Super, and Ultra model families (PR#2303)
- Distillation quantization support in NeMo 2 (PR#2591)
Performance
- Nemotron 3 Super perf config, including GB200 improvements and BF16 / NVFP4 functional support via module recompute (PR#3208)
Developer Experience & Compatibility
- ModelConfig and ModelBuilder refactor integrated into the training loop (PR#2798, PR#2671)
- Dev branch support and documentation updates (PR#2497)
- Python 3.12 migration announcement (PR#2773)
- Transformers 5.0 through 5.3 compatibility (PR#2068, PR#2781)
- PEFT Bridge offline mode support (PR#2574)
- LoRA merge on CPU (PR#2194)
- Self-contained Megatron-to-HF export with auto-config synthesis (PR#2778)
- Scripts and documentation for Megatron-LM and Megatron Bridge correlation
Examples & Tutorials
- Resiliency examples (PR#2115)
- Qwen3 VL sequence packing examples (PR#2380)
- Distillation example cleanup (PR#2865, PR#2860)
Community Contributions
- @HollowMan6 (Aalto University): Xiaomi MiMo dense MTP bridge support, Qwen 3.5 VL LoRA bridge and merge, and additional export / PEFT fixes (PR#2387, PR#2736, PR#2384, PR#2799)
- @shaltielshmid: packed-sequence improvements for large datasets and safer model loading defaults (PR#2395, PR#2766)
- @jaeminh: accelerated first-fit packing with a segment tree (PR#2953)
- @pavelgein: added the `on_checkpoint_save` callback event (PR#2905)
- @ShiftyBlock (UC Berkeley): added auto-config for self-contained Megatron-to-HF export (PR#2778)
- @erictang000 (Anyscale): added LoRA rank normalization for MoE expert layers (PR#3006)
- @eternally-z: added direct export support for block-wise FP8 weights and scaling factors (PR#1994)
- @Hayak3: fixed the unsupported normalization argument for Qwen3-VL (PR#1970)
- @mohit-sarvam (Sarvam AI): added Sarvam MoE support (PR#1814)
A big thank you to our community contributors for their valuable support!
Changelog Details
- docs: Update callback code snippets to include all imports needed for example by @ananthsub :: PR: #2283
- M4 leftover for QWen3-VL with MCore vision encoder by @shifangx :: PR: #2370
- Update Qwen3 235B B300 Configs to match Qwen3 B200 Configs by @rhmukundan :: PR: #2669
- [bridge] Fix off-by-one in sliding window size for Gemma2, Gemma3, Mistral, and GPT-OSS by @cuichenx :: PR: #2656
- fix: Write intermediate results to tmp by @ko3n1g :: PR: #2726
- Perf recipe dataloader num_workers interface fix by @dingqingy-nv :: PR: #2710
- Suppress noisy _extra_state warnings during checkpoint loading by @cuichenx :: PR: #2689
- [model, recipe] Add Qwen 3.5 recipes by @cuichenx :: PR: #2654
- [ci] chore: add nightly dev commit bump workflow by @ko3n1g :: PR: #2729
- ci(fix): Unique naming for dev branch by @ko3n1g :: PR: #2747
- [ci] Refactor Gemma3-VL launch script to run finetune and packed tests separately by @cuichenx :: PR: #2730
- add qwen2_5_omni by @yuekaizhang :: PR: #2634
- build: Bump TE 2.13 by @ko3n1g :: PR: #2753
- [docs, ci] chore: add governance issue forms and triage guide by @yaoyu-33 :: PR: #2716
- [test] fix: temporarily disable qwen2.5 omni unit tests by @yaoyu-33 :: PR: #2759
- add nemotron3 super docs by @liding-nv :: PR: #2757
- ci: Fix stopiteration for Mbridge by @ko3n1g :: PR: #2760
- GPT-OSS Blackwell MXFP8 recipes by @weijiac0619 :: PR: #2633
- feat(mimo): phase 2 - model provider, DDP wrapping, process groups by @aroshanghias-nvd :: PR: #2004
- [build] feat: add OSS NeMo FW dockerfiles by @thomasdhc :: PR: #2722
- Lm3 70B GB200 FP8_CS SFT cfg update by @malay-nagda :: PR: #2748
- [docs] chore: use uv run in test file docstring run instructions by @cuichenx :: PR: #2728
- build: Bump NVRX by @ko3n1g :: PR: #2775
- NVFP4 memory spike fix compared to M-LM by @sanandaraj5597 :: PR: #2764
- [doc] feat: Document adapter merge verification in stream_adapter_weights example by @yaoyu-33 :: PR: #2042
- [doc] chore: Add needs-review to PR state labels guidance by @yaoyu-33 :: PR: #2758
- [ckpt] fix: broaden exception handling in save_artifacts dynamic module loading by @yaoyu-33 :: PR: #2765
- [test] fix: use toy configs in qwen2.5 omni unit tests by @yaoyu-33 :: PR: #2761
- [model] Refactor Qwen3-VL and Ministral3 fine-tuning scripts by @kamran-nvidia :: PR: #2735
- docs - Update user manual with new MoE features and Megatron FSDP by @onel :: PR: #2529
- remove encoder_and_decoder usage by @dimapihtar :: PR: #2512
- Fix attention_mask mismatch in compare.py by @mohsinm-dev :: PR: #2476
- [model, test] fix: guard hybrid layer count across MCore branches by @yaoyu-33 :: PR: #2776
- [data] fix: guard eval_interval division to prevent ZeroDivisionError by @yaoyu-33 :: PR: #2732
- [sync][training] fix: log loss values of exactly 0.0 in training_log() by @mehraakash :: PR: #2740
- [model] feat: support Qwen 3.5 MTP c...
NVIDIA Megatron-Bridge 0.3.1
Changelog Details
Performance & Model Configs
- CP SFT performance improvements (#2527)
- Nemotron 3 Nano perf config updates (#2560, #2681)
- Onboard LLaMA3 70B LoRA to B300 and B200 chips (#2588)
- Update Qwen3 235B B300 configs to match B200 configs (#2706, #2720)
- Update DeepSeek-V3 B300 config (#2723)
- DeepSeek-V3: set `no_non_det_algo` for deterministic training (#2673)
- Add MoE Sequential MLP mappings in HF Bridges (#2589)
Bug Fixes
- [training] Cap `lr_warmup_steps` to be strictly less than `lr_decay_steps` (#2858)
- [training] Fix `DistillationProvider.to_cfg_dict` to save missing keys in run_config (#2594)
- [training] Fix `StopIteration` error in MBridge (#2762)
- [checkpoint] Fix local checkpoint integration (#2709)
- [checkpoint] Log warning when HuggingFace Hub download fails silently (#2493)
- [checkpoint] Low-memory save: use `AutoBridge` directly in `distill_llama32_3b-1b` to load HF weights (#2860)
- [inference] Use `config.hidden_size` directly for Qwen3VL inference wrapper (#2855)
- [misc] Improve `compare.py` robustness for multi-GPU and vocab-padded models (#2647)
- [misc] Fix BOS token mismatch in `compare_text_generation` (#2889)
- [misc] Guard `eod_id` access in `compare_text_generation` for HF tokenizers (#2853)
- [misc] Guard missing kubernetes deps (#2871)
- [example] Fix example scripts and recipe names in release branch (#2862, #2863)
Documentation
- Add ModelOpt pruning docs (#2629)
NVIDIA Megatron-Bridge 0.3.0
Highlights
- Model Collection Support
- Performance
  - NVFP4 support for Llama3 models
  - HybridEP support for NVL8 systems (PR#494)
  - MLA performance improvement with cuDNN layernorm and cuDNN 9.18
  - LN+MXFP8 quantization fusion with TE.sequence and cuDNN backend
  - FSDP support for MoE models with MXFP8 (PR#2135, PR#2239)
  - Muon optimizer support (PR#683)
  - NVFP4 Llama Playbook (PR#1409)
- Training & Functionality
  - LoRA Bridge (initial): RL LoRA support for VeRL / nemo-rl (PR#1766)
  - Multi-token prediction (MTP): Qwen3 dense examples (PR#2138)
  - Decentralized parallel group (M4) end-to-end support and examples (PR#2011, examples)
  - Context Parallelism (CP) with sequence packing in LLMs (PR#1867)
  - Context Parallelism (CP) with sequence packing in VLMs (PR#1997)
  - Callbacks integration (PR#2063)
  - Low-memory save for model importing from HF (fixes DeepSeek V3 and Kimi-K2 import) (PR#1949)
- Community Contributions
  - @HollowMan6: MoE router weight adapter wrapper (PR#1834), temporarily disable adapter support (PR#1811), flexible LoRA target_modules (PR#1799), separate layernorm mappings (PR#1808), shared_experts MoE fix (PR#1800), LoRA split QKV with GQA fix (PR#1818), Moonlight/Kimi rotary_emb export fix (PR#1838), configurable use_arbitrary_attention_mask (PR#1807)
  - @Hayak3: Fix Qwen3-VL unsupported normalization arg (PR#1970)
  - @shaltielshmid: Disable FP8 during CPU initialization for export (PR#1815)
  - @therealnaveenkamal: MLFlow integration (PR#2112)
  - @kannankumar: Fill-in-the-Middle (FIM) dataset support (PR#2066)
- A big thank you to our community contributors for their valuable support!
Changelog Details
- concise naming | weak scaling | save cfg to file by @malay-nagda :: PR: #1246
- cg_scope valid list and default none by @malay-nagda :: PR: #1264
- chore: Merge fp8 args by @ko3n1g :: PR: #1279
- cg and nan grad norm fix by @malay-nagda :: PR: #1309
- feat: Support PEFT weight mapping and merge LoRA adapters when export to hf by @HollowMan6 :: PR: #1310
- Add Nemotron nano v2 vl by @cuichenx :: PR: #1136
- Replay "Ko3n1g/ci/cleanup recipe evaluator (#1349)" by @ko3n1g :: PR: #1377
- Gemma3 VL LoRA Recipe + Documentations by @suiyoubi :: PR: #1388
- Add GLM4.5 FT Recipe by @suiyoubi :: PR: #1382
- Adding FLA as dependency for Qwen3-Next by @adityavavreNVDA :: PR: #1359
- fix: default to `nccl` comm overlap bootstrap backend by @ananthsub :: PR: #1395
- Add Qwen2/2.5 FT recipes by @ananthsub :: PR: #1385
- [PEFT/LoRA] fix: using ETP instead of TP for expert layers by @HollowMan6 :: PR: #1380
- Llama3 PEFT- 8B, 70B by @malay-nagda :: PR: #1381
- Add option for LoRA with Transformer Engine op fuser by @michal2409 :: PR: #1324
- [OMNIML-2937] Support Megatron Bridge quantized checkpoint export to HF unified checkpoint by @yueshen2016 :: PR: #1302
- HybridEP support by @erhoo82 :: PR: #1367
- expose option to dump config to file during end to end tests by @ananthsub :: PR: #1400
- [OMNIML-2935] PTQ support of MOE model (Qwen-3) on Megatron-Bridge by @yueshen2016 :: PR: #1405
- Revert "feat: Dependabot automerge if successful (#1051)" by @pablo-garay :: PR: #1428
- Update perf docs by @gautham-kollu :: PR: #1426
- Add Qwen3VL support (dense and moe) by @yashaswikarnati :: PR: #1174
- Fix llama3-8b NVFP4 recipe by @adityavavreNVDA :: PR: #1347
- fix GPT-OSS perf scripts by @erhoo82 :: PR: #1438
- Add functional test for finetuning with sequence packing by @ananthsub :: PR: #861
- feat: Pass custom srun args into Run by @ko3n1g :: PR: #1440
- Fix typo in dataclass from `callable` => `typing.Callable` in `nemotron_h_provider.py` by @shaltielshmid :: PR: #1442
- pass the support of deepep for B200 and B300 GPUs by @erhoo82 :: PR: #1436
- cuda graph fine grained scope | hybridEP | a2a overlap by @malay-nagda :: PR: #1348
- nvfp4 for dense models by @sanandaraj5597 :: PR: #1453
- Added Qwen 3 next perf scripts by @sanandaraj5597 :: PR: #1451
- reset gradient_accumulation_fusion with megatron fsdp by @ananthsub :: PR: #1386
- guard trust_remote_code by @dimapihtar :: PR: #1291
- fix lint checks on main by @ananthsub :: PR: #1463
- DSv3- gb200 base cfg fix | b200 no a2a overlap by @malay-nagda :: PR: #1476
- sequence_length -> seq_length by @dimapihtar :: PR: #1023
- feat: Add whitelist support for mismatched params in load_hf_weights by @yaoyu-33 :: PR: #1447
- [docs] Update readme with supported models/recipes by @ananthsub :: PR: #1455
- Add Gemma2 recipes by @ananthsub :: PR: #1383
- [docs] Add release section for changelog and software component versions by @ananthsub :: PR: #1490
- [docs] Add 0.2.0 version picker by @ananthsub :: PR: #1488
- Reduced precision (BF16, FP8, MXFP8, NVFP4) training tutorial using Megatron-Bridge by @sergiopperez :: PR: #1409
- Update conversion compare script and add accelerate dependency by @yaoyu-33 :: PR: #1344
- [main] Fix functional conftest to handle optional `nvdlfw-inspect` dependency by @ananthsub :: PR: #1496
- [docs] Update supported model docs by @ananthsub :: PR: #1503
- fix: Escape user inputs in data tutorials by @ananthsub :: PR: #1465
- Bridge instantiate_utils: drop unexpected config keys with warning by @yaoyu-33 :: PR: #1203
- Make container image point to last known release container by @gautham-kollu :: PR: #1443
- Revamp recipe tutorials by @ananthsub :: PR: #1308
- [docs] 25.11 release notes by @ananthsub :: PR: #1504
- Add generic scripts for training by @ananthsub :: PR: #1390
- Nemotron nano v2 finetune by @cuichenx :: PR: #1391
- Replay: M4 Remove parallel state usage in train loops, train steps and utils #1175 + Bug fix by @yaoyu-33 :: PR: #1445
- track dtype in scatter to tp ranks by @ananthsub :: PR: #1509
- Update performance scripts to align with llmb requirements by @scsudhakaran :: PR: #1416
- fix qwen3_vl by changing sequence_length to seq_length by @shifangx :: PR: #1511
- Update GPT-OSS pretrain config parameters by @cuichenx :: PR: #1375
- feat: mcore trigger mbridge by @pablo-garay :: PR: #1441
- fix: cleanup by @pablo-garay :: PR: #1540
- Revert strong-scaling support for DeepSeek-V3 by @scsudhakaran :: PR: #1548
- Add fallback for shared embedding flag by @yaoyu-33 :: PR: #1521
- Wan Bridge (checkpoints conversion) by @huvunvidia :: PR: #1550
- feat: defer flop calculation to model_provider "get_num_floating_point_operations" if provided by @yaoyu-33 :: PR: #1446
- refactor: Unify launchers by @ko3n1g :: PR: #1519
- bug fixes- unify launchers by @malay-nagda :: PR: #1573
- ci: Bump MCore and ModelOpt by @chtruong814 :: PR: #1551
- docs: Update documentation.md to include install submodules command by @chenopis :: PR: #1576
- fix: Fix load failure when `load_megatron_model` from a model trained with uneven pp by @yaoyu-33 :: PR: #1579
- Added 25.11 starter pack by @sanandaraj5597 :: PR: #1596
- fix: Wandb mocking by @ko3n1g :: PR: #1587
- fix: Use model seq length as default if no CLI is provided by @ko3n1g :: PR: #1600
- scripts: Update help string of args.detach by @ko3n1g :: PR: #1589
- ci: Add DGXC executor by @ko3n1g :: PR: #1584
- fix: Fix model parallel initialization ordering by @yaoyu-33 :: PR: #1574
- fix: Missing return of parse_additional_slurm_params by @ko3n1g :: PR: #1619
- Add fix for users who want to provide a path on disk to a custom HF tokenizer by @jstjohn :: PR: #1594
- fix: wandb exp name in recipe path by @ko3n1g :: PR: #1623
- Rename TensorRT Model Optimizer to Model Optimizer by @AAnoosheh :: PR: #1484
- Cleanup partial CG objects by @gautham-kollu :: PR: #1615
- [Canonical LoRA] fix: use correct q_out_features for `linear_q` by @HollowMan6 :: PR: #1627
- [Canonical LoRA] fix: forward under expert layers by @HollowMan6 :: PR: #1628
- qwen3 235b config update by @malay-nagda :: PR: #1613
- chore: Update codeowners of performance scripts by @ko3n1g :: PR: #1641
- Re-use higher-level config override util in tutorials by @ananthsub :: PR: #1524
- docs: add wayfinder readme.md files for each docs directory by @chenopis :: PR: #1617
- ci: Fix DGXC env vars by @ko3n1g :: PR: #1629
- Support strong scaling ...
NVIDIA Megatron-Bridge 0.2.2
- This release addresses known security issues. For the latest NVIDIA Vulnerability Disclosure Information, visit https://www.nvidia.com/en-us/security/. For acknowledgement, please reach out to the NVIDIA PSIRT team at PSIRT@nvidia.com.
NVIDIA Megatron-Bridge 0.2.1
- Performance
  - Activation offloading to host memory support with pipelining
    - Supports the high activation memory needs of MoE models training with dynamic shapes
  - Fixed Nemotron FLOPS calculation model
- Model Collection Support
  - Ministral 3
- Enhanced LoRA support
  - LoRA support for Mamba layers (for Nemotron Nano V2 and NemotronH finetuning)
NVIDIA Megatron-Bridge 0.2.0
- LLM
  - HuggingFace Conversion + training recipes:
    - GPT-OSS
    - Qwen3 Next
    - Nemotron-H
    - Nemotron Nano v2
    - Moonlight
    - OLMoE
    - GLM 4.5
    - Gemma 3
  - HuggingFace conversion support:
    - Llama Nemotron
    - Mistral
    - Gemma
    - Gemma 2
- VLM
  - Nemotron Nano v2 VL
  - Qwen 3 VL
  - Qwen2.5 VL
  - Gemma3 VL
- Megatron-Bridge support for new benchmarks
  - Benchmarks (same workloads as GB200 system) for GB300 system
  - GPT-OSS 120B
  - Qwen3-Next 80B_A3B
  - Support for linear attention on Blackwell - Gated Delta Networks
  - Pre-training with NVFP4 precision: Llama3 8B, Llama3 70B, Llama3.1 405B
- Megatron-Bridge support for benchmarks previously existing only for NeMo 2.0
  - Nemotron-H 56B
  - Fine-tuning (SFT and LoRA): Llama3 8B and Llama3 70B
- HybridEP: DeepSeek V3 benchmarks on GB200 and GB300 systems now use HybridEP
- CUDA Graphs
  - Full-model iteration CUDA graph used for dense models: Llama3 8B, Llama3 70B, Llama3.1 405B
  - Fine-grained, Transformer-component-specific CUDA Graphs used for MoE models
- NVIDIA Model Optimization Integration
  - Knowledge Distillation
  - Post training quantization export
  - Quantization aware training
- Support for expert layers
- Supported merging adapters for export to HuggingFace @HollowMan6
- Finetuning dataset improvements: OpenAI messages format conversion, chat template support
- Integration with NVIDIA-DLFW-Inspect for tensor statistic collection & monitoring
- Broader Community Adoption: Megatron-Bridge is integrated into the training pipelines of VeRL (PR), Slime (PR), and Sky-RL (PR).
- Special thanks to the community contributors for this release: @HollowMan6, @fzyzcjy, @erictang000, @hawkoli1987.