Releases · NVIDIA-NeMo/Megatron-Bridge

28 May 21:18

nemo-automation-bot

v0.4.2

c810129

NVIDIA Megatron-Bridge 0.4.2 Latest

Highlights

Expanded performance configs for DeepSeek V3, Qwen, GPT-OSS, and WAN
Added support for the fp4_param_gather mixed-precision config
Hardened dataset and checkpoint deserialization and URL loading, with safer trust_remote_code handling
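
Several of the security fixes in this release replace plain pickle loading with a restricted unpickler. The sketch below shows the general technique using only the Python standard library. It is not the Megatron-Bridge implementation: the names ALLOWED_GLOBALS, RestrictedUnpickler, and restricted_loads are hypothetical, and the allowlist contents are an assumption chosen for illustration.

```python
import io
import pickle

# Hypothetical allowlist for illustration. A real pipeline would list exactly
# the (module, name) pairs that its serialized data legitimately needs.
ALLOWED_GLOBALS = {
    ("builtins", "dict"),
    ("builtins", "list"),
    ("builtins", "set"),
    ("builtins", "tuple"),
    ("collections", "OrderedDict"),
}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that only resolves globals found on an explicit allowlist."""

    def find_class(self, module, name):
        # find_class is called whenever the pickle stream references a global
        # (a class or function). Anything off the allowlist is rejected.
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is not allowed")


def restricted_loads(data: bytes):
    """Stand-in for pickle.loads that refuses unexpected globals."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

Plain containers of numbers and strings load as usual, while a payload that references something like os.system raises pickle.UnpicklingError instead of executing it.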

Performance

NVFP4 with 4-bit parameter AllGather in DP communications (PR#3364, PR#4005)
DSV3 B300 recipe tuning (PR#3549)
DSV3 B200 recipe tuning (PR#3368)
Qwen3 235B A22B B300 recipe tuning (PR#3490)
NT3 super B300 recipe tuning (PR#3579)
GPT-OSS B200 regression fix (PR#3614)
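
A rough payload calculation shows why a 4-bit parameter AllGather can matter for data-parallel communication. The sketch below is illustrative arithmetic only, not Megatron-Bridge code. The function names are hypothetical, and the calculation ignores block scaling factors and other metadata that low-precision formats carry alongside the parameter values, so real savings will be somewhat smaller.

```python
def param_gather_bytes(num_params: int, bits_per_param: int) -> int:
    """Bytes of raw parameter payload for one full parameter all-gather.

    Counts parameter values only; scaling factors and other metadata used by
    block-scaled formats are deliberately left out (assumption).
    """
    return num_params * bits_per_param // 8


def payload_by_precision(num_params: int) -> dict:
    """Raw payload in bytes for a few common parameter widths."""
    widths = {"bf16": 16, "fp8": 8, "fp4": 4}
    return {name: param_gather_bytes(num_params, bits) for name, bits in widths.items()}


if __name__ == "__main__":
    # Example: a hypothetical model with 1 billion parameters.
    for name, nbytes in payload_by_precision(1_000_000_000).items():
        print(f"{name}: {nbytes / 1e9:.2f} GB")
```

Under these assumptions, gathering parameters in 4 bits moves a quarter of the raw payload of BF16 and half that of FP8.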

Software Component

Upgraded NVIDIA Resiliency Extension (NVRX) to v0.6.0

Known issues

There is a known issue with Evaluator when installing nvidia-vlmeval inside /opt/NeMo-FW. Please use the /opt/Megatron-Bridge directory to install the package:

cd /opt/Megatron-Bridge
uv pip install nvidia-vlmeval

Changelog Details

beep boop 🤖: Bumping megatron.bridge to v0.4.1 by @nemo-automation-bot[bot] :: PR: #3363
cp: [perf] fix: guard cuda_graph_scope validation against None (3249) into r0.4.0 by @svcnvidia-nemo-ci :: PR: #3262
cp: fix(perf): set NCCL env vars when nccl_ub enabled via recipe config (3283) into r0.4.0 by @yaoyu-33 :: PR: #3305
cp: Enable nemo-ci tests (short runs - perf and non-perf) for Wan + Updating recipes names (3179) into r0.4.0 by @svcnvidia-nemo-ci :: PR: #3324
cp: Perf script utility to lock gpu frequency. (2977) into r0.4.0 by @svcnvidia-nemo-ci :: PR: #3326
cp: fix(gemma3-vl): force right-padding in VLM collate to prevent token loss (3331) into r0.4.0 by @svcnvidia-nemo-ci :: PR: #3332
cp: fix(perf): read baseline values from golden values when using new format (3334) into r0.4.0 by @svcnvidia-nemo-ci :: PR: #3338
[docs] chore: bump versions1.json to 0.4.0 (latest) by @ko3n1g :: PR: #3376
b200 DSv3 better cfg (#3368), mxfp8 to fp8_cs for h100 gpt-oss #3378 by @malay-nagda :: PR: #3420
2604 perf summary (#3377) by @malay-nagda :: PR: #3405
cp: docs(releases): add 26.04 software component versions (3421) into r0.4.0 by @svcnvidia-nemo-ci :: PR: #3430
cp: b200 DSv3 better cfg (3368) into r0.4.0 by @svcnvidia-nemo-ci :: PR: #3401
cp: [training] fix: report memory on 2nd iteration to better reflect actual peak (3169) into r0.4.0 by @dingqingy-nv :: PR: #3367
cp: Update Qwen3-VL pretrain perf configs for 30B and 235B (3327) into r0.4.0 by @svcnvidia-nemo-ci :: PR: #3342
cp: docs: Add container version to docs version picker (3434) into r0.4.0 by @svcnvidia-nemo-ci :: PR: #3435
cp: [docs] Add Megatron Bridge 0.4.0 release notes (#3419) by @chtruong814 :: PR: #3439
cp: fix(test): clone mmap-backed tensors before overwriting safetensors file (#3335) by @yaoyu-33 :: PR: #3441
cp: [test] refactor: move diffusion tests to test_groups directory (3275) into r0.4.0 by @svcnvidia-nemo-ci :: PR: #3442
remove archival data from main page by @malay-nagda :: PR: #3448
cp: fix: set 644 permissions on COPY'd files to match cloned repos (3431) into r0.4.0 by @svcnvidia-nemo-ci :: PR: #3450
cp: [perf] fix: use direct assignment for NCCL env vars when nccl_ub enabled (3350) into r0.4.0 by @svcnvidia-nemo-ci :: PR: #3453
cp: [training] feat: enable fp4_param_gather in MixedPrecisionConfig (3364) into r0.4.0 by @svcnvidia-nemo-ci :: PR: #3454
cp: fix(docker): replace rdma-core source build with system package install (3429) into r0.4.0 by @svcnvidia-nemo-ci :: PR: #3457
cp: [training] fix: record CUDA memory history before snapshot so dumps are non-empty (#3487) into r0.4.0 by @dingqingy-nv :: PR: #3508
cp: [vulnops][misc] fix: Add allowlist validation for _target_ instantiation (3142) into r0.4.0 by @svcnvidia-nemo-ci :: PR: #3540
cp: [vulnops][data] fix: Replace unsafe pickle.loads with restricted unpickler in Qwen VL pipeline (3139) into r0.4.0 by @svcnvidia-nemo-ci :: PR: #3541
cp: [vulnops][ckpt] fix: Use weights_only=True in ModelOpt checkpoint loading (3138) into r0.4.0 by @svcnvidia-nemo-ci :: PR: #3542
cp: [vulnops][ckpt] fix: Use weights_only=True in TrainState checkpoint loading (3506) into r0.4.0 by @svcnvidia-nemo-ci :: PR: #3557
cp: [vulnops][data] fix: Replace unsafe pickle.load with restricted unpickler for index metadata (3140) into r0.4.0 by @svcnvidia-nemo-ci :: PR: #3558
cp: [vulnops] fix: _contains_code_references allowlist bypass leads to RCE (3379) into r0.4.0 by @svcnvidia-nemo-ci :: PR: #3559
fix: Add security warning for trust_remote_code and remove hardcoded True by @chtruong814 :: PR: #3539
cp: Cleanup TE cuda graphs with the right api (3459) into r0.4.0 by @svcnvidia-nemo-ci :: PR: #3476
cp: Update DeepSeek-V3 configs for B300 (3549) into r0.4.0 by @svcnvidia-nemo-ci :: PR: #3565
cp: log repo status manual (3570) into r0.4.0 by @svcnvidia-nemo-ci :: PR: #3572
cp: ci: post merge comment with SHA after successful CI run (3567) into r0.4.0 by @svcnvidia-nemo-ci :: PR: #3573
cp: [perf] update: switch GPT-OSS GB200 V2 dispatcher default to alltoall (3561) into r0.4.0 by @svcnvidia-nemo-ci :: PR: #3577
cp: no fp4 param gather (3578) into r0.4.0 by @svcnvidia-nemo-ci :: PR: #3580
cp: fix(evaluate): skip non-dict golden value entries such as job_id (3581) into r0.4.0 by @svcnvidia-nemo-ci :: PR: #3582
cp: [vulnops][data] fix: Validate URLs in VLM video loader to prevent SSRF (3482) into r0.4.0 by @svcnvidia-nemo-ci :: PR: #3588
fix(docker): suppress lightning from uv resolution in fw_pyproject by @ko3n1g :: PR: #3602
cp: [vulnops][data] fix: Remove unnecessary allow_pickle=True and add security warnings (3141) into r0.4.0 by @svcnvidia-nemo-ci :: PR: #3615
cp: [vulnops][data] fix: Replace allow_pickle=True with restricted unpickler in packed dataset loading (3616) into r0.4.0 by @svcnvidia-nemo-ci :: PR: #3629
cp: add VP for LoRA Lm3 70B (3547) into r0.4.0 by @svcnvidia-nemo-ci :: PR: #3596
cp: num_layers_fix- qwen vl 235b_a22b on B200 (3589) into r0.4.0 by @svcnvidia-nemo-ci :: PR: #3603
cp: fix(docker): resolve lightning not found on PyPI by providing local stub (3604) into r0.4.0 by @svcnvidia-nemo-ci :: PR: #3606
cp: 70b_lora_gb200_bf16_fix (3623) into r0.4.0 by @svcnvidia-nemo-ci :: PR: #3627
cp: [vulnops] fix: Add SSRF protection to image-loading utilities (3630) into r0.4.0 by @svcnvidia-nemo-ci :: PR: #3632
chore(beep boop 🤖): Bump uv.lock (r0.4.0, mcore-core_r0.17.0) (2026-04-30) by @svcnvidia-nemo-ci :: PR: #3591
cp: [vulnops] fix: Add SSRF protection to audio URL loading (3633) into r0.4.0 by @svcnvidia-nemo-ci :: PR: #3636
cp: fix(perf): keep PCT binding for deepseek_v3 large_scale on b300 (3656) into r0.4.0 by @svcnvidia-nemo-ci :: PR: #3657
fix: apply vllm PR 36192 patch and bump pillow to 12.20 by @ko3n1g :: PR: #3671
cp: Add previously removed NemotronHBridge SequentialMLP mappings (3628) into r0.4.0 by @svcnvidia-nemo-ci :: PR: #3701
Use HybridEP flex dispatcher for Qwen3 235B B300 perf configs (#3490) by @rhmukundan :: PR: #3675
[build] chore: bump package version to 0.4.2 by @ko3n1g :: PR: #3721
[model, ckpt, docs] fix: support HF→Megatron conversion under decentralized PGs (r0.4.0) by @cuichenx :: PR: #3674
cp: Fix Gemma3 example folder (3724) into r0.4.0 by @svcnvidia-nemo-ci :: PR: #3728
cp: Reorganize ModelOpt docs (3715) into r0.4.0 by @svcnvidia-nemo-ci :: PR: #3751
[model, ckpt] fix: align GPT-OSS BF16 down_proj orientation on import (r0.4.0) by @cuichenx :: PR: #3753
perf(qwen3-next): set expandable_segments on GB300 BF16/FP8_MX to fix OOM by @ko3n1g :: PR: #3767
cp: llama31 405b gb200 nvfp4 no pg overlap (3713) into r0.4.0 by @svcnvidia-nemo-ci :: PR: #3773
cp: [perf] update: switch GPT-OSS B200 V2 dispatcher default to alltoall (3614) into r0.4.0 by @svcnvidia-nemo-ci :: PR: #3682
nt3 super nvfp4; lm3.1 405B nvfp4; lm3 70B mxfp8- expandable_segments by @malay-nagda :: PR: #3780
cp: [config] Update micro_batch_size to 2 for gemma3 recipe (3815) into r0.4.0 by @svcnvidia-nemo-ci :: PR: #3828
chore: Bump TE to latest 2.14 and MCore to latest 0.17.0 by @chtruong814 :: PR: #3806
qwen3 next env var fix by @malay-nagda :: PR: #3845
chore: Bump and remove packages to address CVEs (#3841) by @chtruong814 :: PR: #3855
Bump MCore to 2edffa by @chtruong814 :: PR: #3857
cp: chore: Bump deps to address CVEs (3919) into r0.4.0 by @svcnvidia-nemo-ci :: PR: #3925
cp: 2604_patch_perf_summary (3818) into r0.4.0 by @svcnvidia-nemo-ci :: PR: #3861
cp: 26.04.01_perf_summary (3997) into r0.4.0 by @svcnvidia-nemo-ci :: PR: #3998
cp: docs: note 26.04 drops PyAV by default and document runtime install (4020) into r0.4.0 by @svcnvidia-nemo-ci :: PR: #4021

Contributors

ko3n1g, cuichenx, and 6 other contributors

07 May 07:08

mmarcinkiewicz

26.04-alpha.rc2

fd5c473

26.04-alpha.rc2

[MXFP8 param gather]Update param buffer before copy to model weights …

06 May 21:49

nemo-automation-bot

v0.4.1

f9b6319

NVIDIA Megatron-Bridge 0.4.1

This release addresses known security issues. For the latest NVIDIA Vulnerability Disclosure Information, visit https://www.nvidia.com/en-us/security/. For acknowledgement, please reach out to the NVIDIA PSIRT team at PSIRT@nvidia.com.

23 Apr 09:32

mmarcinkiewicz

26.04-alpha.rc1

68d8bcc

26.04-alpha.rc1

Merge branch 'PR2411' into 26.04-alpha

16 Apr 22:46

svcnvidia-nemo-ci

v0.4.0

0fbfe7d

NVIDIA Megatron-Bridge 0.4.0

Highlights

Model Collection Support

MiniMax M2 / M2.5 support (PR#2602)
Kimi 2.5 support, including GB300 MXFP8 recipe and HF config updates (PR#2743)
Nemotron 3 Super model support (PR#2912)
Sarvam support (PR#1814)
Qwen 3.5 VL Bridge with recipes and LoRA bridge / merge support (PR#2530, PR#2654, PR#2736)
Qwen 2.5 Omni support (PR#2634)
Qwen2-Audio support (PR#2324)
Xiaomi MiMo dense MTP model bridge support (PR#2387, by HollowMan6)

Diffusion Collection

Diffusion model support for DFM-to-Bridge migration (PR#2534, PR#2645)
FLUX and WAN diffusion submodule improvements (PR#2822, PR#2849)

Training & Functionality

Parquet support for sequence-packing preprocessing, improving handling of larger datasets (PR#2395)
Energon integration for sequence packing with WebDataset workflows (PR#2440)
Default packed sequences across finetune recipes (PR#2284)
More modern finetuning datasets, including OpenMathInstruct V2 and GSM8K (PR#2264)
Unified dataset configuration in run_recipe.py (PR#2826)
NCCL flight recorder configuration support (PR#2891)
Comet ML experiment tracking integration (PR#2910)
Refactored SFT and PEFT recipes for VLM workflows (PR#2614)
Added the on_checkpoint_save callback event for training workflows (PR#2905)
Added MoE LoRA rank normalization for expert layers (PR#3006)
Direct export of block-wise FP8 weights and scaling factors (PR#1994)
Accelerated first-fit packing with a segment tree for much faster packing on large datasets (PR#2953)
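
The segment-tree idea behind faster first-fit packing can be sketched as follows. This is a minimal illustration of the general technique, not the code from PR#2953: the function name first_fit_pack and its interface are hypothetical. A max segment tree over each bin's remaining capacity locates the leftmost bin that can hold a sequence in O(log n), replacing the O(n) scan of a naive first-fit loop.

```python
from typing import List


def first_fit_pack(lengths: List[int], capacity: int) -> List[List[int]]:
    """Pack sequence indices into bins of `capacity` using first-fit.

    Returns a list of bins, each holding the indices of the sequences
    assigned to it, in the order the bins were opened.
    """
    n = len(lengths)
    if n == 0:
        return []
    if max(lengths) > capacity:
        raise ValueError("a sequence is longer than the bin capacity")

    # Round the number of leaves up to a power of two.
    size = 1
    while size < n:
        size *= 2
    # tree[1] is the root; leaf i lives at tree[size + i]. At most n bins are
    # ever needed, so n leaves start at full capacity. Padding leaves stay at
    # -1 so they are never selected.
    tree = [-1] * (2 * size)
    for i in range(n):
        tree[size + i] = capacity
    for node in range(size - 1, 0, -1):
        tree[node] = max(tree[2 * node], tree[2 * node + 1])

    bins: List[List[int]] = [[] for _ in range(n)]
    for idx, length in enumerate(lengths):
        # Descend from the root, preferring the left child, to reach the
        # leftmost bin whose remaining capacity is at least `length`.
        node = 1
        while node < size:
            node = 2 * node if tree[2 * node] >= length else 2 * node + 1
        bins[node - size].append(idx)
        # Shrink the chosen leaf and refresh maxima on the path to the root.
        tree[node] -= length
        node //= 2
        while node >= 1:
            tree[node] = max(tree[2 * node], tree[2 * node + 1])
            node //= 2

    return [b for b in bins if b]
```

For example, first_fit_pack([4, 8, 1, 4, 2, 1], 10) returns [[0, 2, 3, 5], [1, 4]]: the first bin fills to exactly 10 and the second holds the remaining two sequences. Because untouched bins sit at full capacity in the tree, opening a new bin needs no special case.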

Model Optimization

Pruning support and documentation (PR#2244)
Post-training quantization support for Nano, Super, and Ultra model families (PR#2303)
Distillation quantization support in NeMo 2 (PR#2591)

Performance

Nemotron 3 Super perf config, including GB200 improvements and BF16 / NVFP4 functional support via module recompute (PR#3208)

Developer Experience & Compatibility

ModelConfig and ModelBuilder refactor integrated into the training loop (PR#2798, PR#2671)
Dev branch support and documentation updates (PR#2497)
Python 3.12 migration announcement (PR#2773)
Transformers 5.0 through 5.3 compatibility (PR#2068, PR#2781)
PEFT Bridge offline mode support (PR#2574)
LoRA merge on CPU (PR#2194)
Self-contained Megatron-to-HF export with auto-config synthesis (PR#2778)
Scripts and documentation for Megatron-LM and Megatron Bridge correlation

Examples & Tutorials

Resiliency examples (PR#2115)
Qwen3 VL sequence packing examples (PR#2380)
Distillation example cleanup (PR#2865, PR#2860)

Community Contributions

@HollowMan6 (Aalto University): Xiaomi MiMo dense MTP bridge support, Qwen 3.5 VL LoRA bridge and merge, and additional export / PEFT fixes (PR#2387, PR#2736, PR#2384, PR#2799)
@shaltielshmid: packed-sequence improvements for large datasets and safer model loading defaults (PR#2395, PR#2766)
@jaeminh: accelerated first-fit packing with a segment tree (PR#2953)
@pavelgein: added the on_checkpoint_save callback event (PR#2905)
@ShiftyBlock (UC Berkeley): added auto-config for self-contained Megatron-to-HF export (PR#2778)
@erictang000 (Anyscale): added LoRA rank normalization for MoE expert layers (PR#3006)
@eternally-z: added direct export support for block-wise FP8 weights and scaling factors (PR#1994)
@Hayak3: fixed the unsupported normalization argument for Qwen3-VL (PR#1970)
@mohit-sarvam (Sarvam AI): added Sarvam MoE support (PR#1814)

A big thank you to our community contributors for their valuable support!

Changelog Details

docs: Update callback code snippets to include all imports needed for example by @ananthsub :: PR: #2283
M4 leftover for QWen3-VL with MCore vision encoder by @shifangx :: PR: #2370
Update Qwen3 235B B300 Configs to match Qwen3 B200 Configs by @rhmukundan :: PR: #2669
[bridge] Fix off-by-one in sliding window size for Gemma2, Gemma3, Mistral, and GPT-OSS by @cuichenx :: PR: #2656
fix: Write intermediate results to tmp by @ko3n1g :: PR: #2726
Perf recipe dataloader num_workers interface fix by @dingqingy-nv :: PR: #2710
Suppress noisy _extra_state warnings during checkpoint loading by @cuichenx :: PR: #2689
[model, recipe] Add Qwen 3.5 recipes by @cuichenx :: PR: #2654
[ci] chore: add nightly dev commit bump workflow by @ko3n1g :: PR: #2729
ci(fix): Unique naming for dev branch by @ko3n1g :: PR: #2747
[ci] Refactor Gemma3-VL launch script to run finetune and packed tests separately by @cuichenx :: PR: #2730
add qwen2_5_omni by @yuekaizhang :: PR: #2634
build: Bump TE 2.13 by @ko3n1g :: PR: #2753
[docs, ci] chore: add governance issue forms and triage guide by @yaoyu-33 :: PR: #2716
[test] fix: temporarily disable qwen2.5 omni unit tests by @yaoyu-33 :: PR: #2759
add nemotron3 super docs by @liding-nv :: PR: #2757
ci: Fix stopiteration for Mbridge by @ko3n1g :: PR: #2760
GPT-OSS Blackwell MXFP8 recipes by @weijiac0619 :: PR: #2633
feat(mimo): phase 2 - model provider, DDP wrapping, process groups by @aroshanghias-nvd :: PR: #2004
[build] feat: add OSS NeMo FW dockerfiles by @thomasdhc :: PR: #2722
Lm3 70B GB200 FP8_CS SFT cfg update by @malay-nagda :: PR: #2748
[docs] chore: use uv run in test file docstring run instructions by @cuichenx :: PR: #2728
build: Bump NVRX by @ko3n1g :: PR: #2775
NVFP4 memory spike fix compared to M-LM by @sanandaraj5597 :: PR: #2764
[doc] feat: Document adapter merge verification in stream_adapter_weights example by @yaoyu-33 :: PR: #2042
[doc] chore: Add needs-review to PR state labels guidance by @yaoyu-33 :: PR: #2758
[ckpt] fix: broaden exception handling in save_artifacts dynamic module loading by @yaoyu-33 :: PR: #2765
[test] fix: use toy configs in qwen2.5 omni unit tests by @yaoyu-33 :: PR: #2761
[model] Refactor Qwen3-VL and Ministral3 fine-tuning scripts by @kamran-nvidia :: PR: #2735
docs - Update user manual with new MoE features and Megatron FSDP by @onel :: PR: #2529
remove encoder_and_decoder usage by @dimapihtar :: PR: #2512
Fix attention_mask mismatch in compare.py by @mohsinm-dev :: PR: #2476
[model, test] fix: guard hybrid layer count across MCore branches by @yaoyu-33 :: PR: #2776
[data] fix: guard eval_interval division to prevent ZeroDivisionError by @yaoyu-33 :: PR: #2732
[sync][training] fix: log loss values of exactly 0.0 in training_log() by @mehraakash :: PR: #2740
[model] feat: support Qwen 3.5 MTP c...

Contributors

onel, sudostock, and 42 other contributors

20 Mar 22:35

svcnvidia-nemo-ci

v0.3.1

9c9dd84

NVIDIA Megatron-Bridge 0.3.1

Changelog Details

Performance & Model Configs

CP SFT performance improvements (#2527)
Nemotron 3 Nano perf config updates (#2560, #2681)
Onboard LLaMA3 70B LoRA to B300 and B200 chips (#2588)
Update Qwen3 235B B300 configs to match B200 configs (#2706, #2720)
Update DeepSeek-V3 B300 config (#2723)
DeepSeek-V3: set no_non_det_algo for deterministic training (#2673)
Add MoE Sequential MLP mappings in HF Bridges (#2589)

Bug Fixes

[training] Cap lr_warmup_steps to be strictly less than lr_decay_steps (#2858)
[training] Fix DistillationProvider.to_cfg_dict to save missing keys in run_config (#2594)
[training] Fix StopIteration error in MBridge (#2762)
[checkpoint] Fix local checkpoint integration (#2709)
[checkpoint] Log warning when HuggingFace Hub download fails silently (#2493)
[checkpoint] Low-memory save: use AutoBridge directly in distill_llama32_3b-1b to load HF weights (#2860)
[inference] Use config.hidden_size directly for Qwen3VL inference wrapper (#2855)
[misc] Improve compare.py robustness for multi-GPU and vocab-padded models (#2647)
[misc] Fix BOS token mismatch in compare_text_generation (#2889)
[misc] Guard eod_id access in compare_text_generation for HF tokenizers (#2853)
[misc] Guard missing kubernetes deps (#2871)
[example] Fix example scripts and recipe names in release branch (#2862, #2863)
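
As an illustration of the lr_warmup_steps fix listed above, the constraint can be sketched as a simple clamp. This is an assumption about the general shape of such a guard, not the code from #2858, and the function name cap_warmup_steps is hypothetical. Learning-rate schedulers typically compute decay progress over the span between the end of warmup and lr_decay_steps, so a warmup that is equal to or longer than the decay horizon leaves no valid decay range.

```python
def cap_warmup_steps(lr_warmup_steps: int, lr_decay_steps: int) -> int:
    """Return a warmup length that is strictly less than lr_decay_steps."""
    if lr_decay_steps <= 0:
        raise ValueError("lr_decay_steps must be positive")
    # Clamp so that at least one step remains for the decay phase.
    return min(lr_warmup_steps, lr_decay_steps - 1)
```

A warmup that already fits is returned unchanged; only values at or above lr_decay_steps are reduced.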

Documentation

Add ModelOpt pruning docs (#2629)

26 Feb 03:51

svcnvidia-nemo-ci

v0.3.0

21b02e0

NVIDIA Megatron-Bridge 0.3.0

Highlights

Model Collection Support
- Nano v3 (PR#1858)
- GLM 4.5v (PR#1798)
- Ministral 3 (PR#1580)
Performance
- NVFP4 support for Llama3 models
- HybridEP support for NVL8 systems (PR#494)
- MLA performance improvement with cuDNN LayerNorm and cuDNN 9.18
- LN+MXFP8 quantization fusion with TE.sequence and the cuDNN backend
- FSDP support for MoE models with MXFP8 (PR#2135, PR#2239)
- Muon optimizer support (PR#683)
- NVFP4 Llama Playbook (PR#1409)
Training & Functionality
- LoRA Bridge (initial): RL LoRA support for VeRL / nemo-rl (PR#1766)
- Multi-token prediction (MTP): Qwen3 dense examples (PR#2138)
- Decentralized parallel group (M4) end to end support and examples (PR#2011, examples)
- Context Parallelism (CP) with sequence packing in LLMs (PR#1867)
- Context Parallelism (CP) with sequence packing in VLMs (PR#1997)
- Callbacks integration (PR#2063)
- Low-memory save when importing models from HF (fixes DeepSeek V3 and Kimi-K2 import) (PR#1949)
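
When context parallelism is combined with sequence packing, each packed sequence has to split evenly across CP ranks. The sketch below shows how cumulative sequence lengths for a packed sample could be computed under that constraint. It is an illustration only, not Megatron-Bridge code: the function names are hypothetical, and padding each sequence to a multiple of 2 × cp_size is an assumption based on a common load-balancing convention for causal attention.

```python
from itertools import accumulate
from typing import List


def pad_to_multiple(length: int, multiple: int) -> int:
    """Round `length` up to the nearest multiple of `multiple`."""
    return -(-length // multiple) * multiple


def packed_cu_seqlens(seq_lens: List[int], cp_size: int) -> List[int]:
    """Cumulative sequence boundaries for one packed sample.

    Each sequence is padded so it can be cut into 2 * cp_size equal chunks
    (assumed convention), then boundaries are accumulated starting at 0.
    """
    multiple = 2 * cp_size
    padded = [pad_to_multiple(n, multiple) for n in seq_lens]
    return [0] + list(accumulate(padded))
```

With sequence lengths [5, 9, 3] and cp_size 2, each length is rounded up to a multiple of 4, giving boundaries [0, 8, 20, 24].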
Community Contributions
- @HollowMan6: MoE router weight adapter wrapper (PR#1834), temporary disable adapter support (PR#1811), flexible LoRA target_modules (PR#1799), separate layernorm mappings (PR#1808), shared_experts MoE fix (PR#1800), LoRA split QKV with GQA fix (PR#1818), Moonlight/Kimi rotary_emb export fix (PR#1838), configurable use_arbitrary_attention_mask (PR#1807)
- @Hayak3: Fix Qwen3-VL unsupported normalization arg (PR#1970)
- @shaltielshmid: Disable FP8 during CPU initialization for export (PR#1815)
- @therealnaveenkamal: MLFlow integration (PR#2112)
- @kannankumar: Fill-in-the-Middle (FIM) dataset support (PR#2066)
- A big thank you to our community contributors for their valuable support!

Changelog Details

concise naming | weak scaling | save cfg to file by @malay-nagda :: PR: #1246
cg_scope valid list and default none by @malay-nagda :: PR: #1264
chore: Merge fp8 args by @ko3n1g :: PR: #1279
cg and nan grad norm fix by @malay-nagda :: PR: #1309
feat: Support PEFT weight mapping and merge LoRA adapters when export to hf by @HollowMan6 :: PR: #1310
Add Nemotron nano v2 vl by @cuichenx :: PR: #1136
Replay "Ko3n1g/ci/cleanup recipe evaluator (#1349)" by @ko3n1g :: PR: #1377
Gemma3 VL LoRA Recipe + Documentations by @suiyoubi :: PR: #1388
Add GLM4.5 FT Recipe by @suiyoubi :: PR: #1382
Adding FLA as dependency for Qwen3-Next by @adityavavreNVDA :: PR: #1359
fix: default to nccl comm overlap bootstrap backend by @ananthsub :: PR: #1395
Add Qwen2/2.5 FT recipes by @ananthsub :: PR: #1385
[PEFT/LoRA] fix: using ETP instead of TP for expert layers by @HollowMan6 :: PR: #1380
Llama3 PEFT- 8B, 70B by @malay-nagda :: PR: #1381
Add option for LoRA with Transformer Engine op fuser by @michal2409 :: PR: #1324
[OMNIML-2937] Support Megatron Bridge quantized checkpoint export to HF unified checkpoint by @yueshen2016 :: PR: #1302
HybridEP support by @erhoo82 :: PR: #1367
expose option to dump config to file during end to end tests by @ananthsub :: PR: #1400
[OMNIML-2935] PTQ support of MOE model (Qwen-3) on Megatron-Bridge by @yueshen2016 :: PR: #1405
Revert "feat: Dependabot automerge if successful (#1051)" by @pablo-garay :: PR: #1428
Update perf docs by @gautham-kollu :: PR: #1426
Add Qwen3VL support (dense and moe) by @yashaswikarnati :: PR: #1174
Fix llama3-8b NVFP4 recipe by @adityavavreNVDA :: PR: #1347
fix GPT-OSS perf scripts by @erhoo82 :: PR: #1438
Add functional test for finetuning with sequence packing by @ananthsub :: PR: #861
feat: Pass custom srun args into Run by @ko3n1g :: PR: #1440
Fix typo in dataclass from callable => typing.Callable in nemotron_h_provider.py by @shaltielshmid :: PR: #1442
pass the support of deepep for B200 and B300 GPUs by @erhoo82 :: PR: #1436
cuda graph fine grained scope | hybridEP | a2a overlap by @malay-nagda :: PR: #1348
nvfp4 for dense models by @sanandaraj5597 :: PR: #1453
Added Qwen 3 next perf scripts by @sanandaraj5597 :: PR: #1451
reset gradient_accumulation_fusion with megatron fsdp by @ananthsub :: PR: #1386
guard trust_remote_code by @dimapihtar :: PR: #1291
fix lint checks on main by @ananthsub :: PR: #1463
DSv3- gb200 base cfg fix | b200 no a2a overlap by @malay-nagda :: PR: #1476
sequence_length -> seq_length by @dimapihtar :: PR: #1023
feat: Add whitelist support for mismatched params in load_hf_weights by @yaoyu-33 :: PR: #1447
[docs] Update readme with supported models/recipes by @ananthsub :: PR: #1455
Add Gemma2 recipes by @ananthsub :: PR: #1383
[docs] Add release section for changelog and software component versions by @ananthsub :: PR: #1490
[docs] Add 0.2.0 version picker by @ananthsub :: PR: #1488
Reduced precision (BF16, FP8, MXFP8, NVFP4) training tutorial using Megatron-Bridge by @sergiopperez :: PR: #1409
Update conversion compare script and add accelerate dependency by @yaoyu-33 :: PR: #1344
[main] Fix functional conftest to handle optional nvdlfw-inspect dependency by @ananthsub :: PR: #1496
[docs] Update supported model docs by @ananthsub :: PR: #1503
fix: Escape user inputs in data tutorials by @ananthsub :: PR: #1465
Bridge instantiate_utils: drop unexpected config keys with warning by @yaoyu-33 :: PR: #1203
Make container image point to last known release container by @gautham-kollu :: PR: #1443
Revamp recipe tutorials by @ananthsub :: PR: #1308
[docs] 25.11 release notes by @ananthsub :: PR: #1504
Add generic scripts for training by @ananthsub :: PR: #1390
Nemotron nano v2 finetune by @cuichenx :: PR: #1391
Replay: M4 Remove parallel state usage in train loops, train steps and utils #1175 + Bug fix by @yaoyu-33 :: PR: #1445
track dtype in scatter to tp ranks by @ananthsub :: PR: #1509
Update performance scripts to align with llmb requirements by @scsudhakaran :: PR: #1416
fix qwen3_vl by changing sequence_length to seq_length by @shifangx :: PR: #1511
Update GPT-OSS pretrain config parameters by @cuichenx :: PR: #1375
feat: mcore trigger mbridge by @pablo-garay :: PR: #1441
fix: cleanup by @pablo-garay :: PR: #1540
Revert strong-scaling support for DeepSeek-V3 by @scsudhakaran :: PR: #1548
Add fallback for shared embedding flag by @yaoyu-33 :: PR: #1521
Wan Bridge (checkpoints conversion) by @huvunvidia :: PR: #1550
feat: defer flop calculation to model_provider "get_num_floating_point_operations" if provided by @yaoyu-33 :: PR: #1446
refactor: Unify launchers by @ko3n1g :: PR: #1519
bug fixes- unify launchers by @malay-nagda :: PR: #1573
ci: Bump MCore and ModelOpt by @chtruong814 :: PR: #1551
docs: Update documentation.md to include install submodules command by @chenopis :: PR: #1576
fix: Fix load failure when load_megatron_model from a model trained with uneven pp by @yaoyu-33 :: PR: #1579
Added 25.11 starter pack by @sanandaraj5597 :: PR: #1596
fix: Wandb mocking by @ko3n1g :: PR: #1587
fix: Use model seq length as default if no CLI is provided by @ko3n1g :: PR: #1600
scripts: Update help string of args.detach by @ko3n1g :: PR: #1589
ci: Add DGXC executor by @ko3n1g :: PR: #1584
fix: Fix model parallel initialization ordering by @yaoyu-33 :: PR: #1574
fix: Missing return of parse_additional_slurm_params by @ko3n1g :: PR: #1619
Add fix for users who want to provide a path on disk to a custom HF tokenizer by @jstjohn :: PR: #1594
fix: wandb exp name in recipe path by @ko3n1g :: PR: #1623
Rename TensorRT Model Optimizer to Model Optimizer by @AAnoosheh :: PR: #1484
Cleanup partial CG objects by @gautham-kollu :: PR: #1615
[Canonical LoRA] fix: use correct q_out_features for linear_q by @HollowMan6 :: PR: #1627
[Canonical LoRA] fix: forward under expert layers by @HollowMan6 :: PR: #1628
qwen3 235b config update by @malay-nagda :: PR: #1613
chore: Update codeowners of performance scripts by @ko3n1g :: PR: #1641
Re-use higher-level config override util in tutorials by @ananthsub :: PR: #1524
docs: add wayfinder readme.md files for each docs directory by @chenopis :: PR: #1617
ci: Fix DGXC env vars by @ko3n1g :: PR: #1629
Support strong scaling ...

Contributors

jstjohn, yfw, and 46 other contributors

09 Jan 18:14

chtruong814

v0.2.2

0465189

NVIDIA Megatron-Bridge 0.2.2

This release addresses known security issues. For the latest NVIDIA Vulnerability Disclosure Information, visit https://www.nvidia.com/en-us/security/. For acknowledgement, please reach out to the NVIDIA PSIRT team at PSIRT@nvidia.com.

18 Dec 00:04

ko3n1g

v0.2.1

1c43b39

NVIDIA Megatron-Bridge 0.2.1

Performance
- Activation offloading to host memory support with pipelining
  - Supports the high activation memory needs of MoE models training with dynamic shapes
  - Fixed Nemotron FLOPS calculation model
Model Collection Support
- Ministral 3
Enhanced LoRA support
- LoRA support for Mamba layers (for Nemotron Nano V2 and NemotronH finetuning)

04 Dec 23:56

ko3n1g

v0.2.0

7af9601

NVIDIA Megatron-Bridge 0.2.0

Model Collection Support
- LLM
  - HuggingFace Conversion + training recipes:
    - GPT-OSS
    - Qwen3 Next
    - Nemotron-H
    - Nemotron Nano v2
    - Moonlight
    - OlMoE
    - GLM 4.5
    - Gemma 3
  - HuggingFace conversion support:
    - Llama Nemotron
    - Mistral
    - Gemma
    - Gemma 2
- VLM
  - Nemotron Nano v2 VL
  - Qwen 3 VL
  - Qwen2.5 VL
  - Gemma3 VL
Performance
- Megatron-Bridge support for new benchmarks
  - Benchmarks (same workloads as GB200 system) for GB300 system
  - GPT-OSS 120B
  - Qwen3-Next 80B_A3B
  - Support for linear attention on Blackwell - Gated Delta Networks
  - Pre-training with NVFP4 precision: Llama3 8B, Llama3 70B, Llama3.1 405B
- Megatron-Bridge support for benchmarks previously existing only for NeMo 2.0
  - Nemotron-H 56B
  - Fine-tuning (SFT and LoRA): Llama3 8B and Llama3 70B
- HybridEP: DeepSeek V3 benchmarks on GB200 and GB300 systems now use HybridEP
- CUDA Graphs
  - Full-model iteration CUDA graph used for dense models: Llama3 8B, Llama3 70B, Llama3.1 405B
  - Fine-grained Transformer component specific CUDA Graphs used for MoE models
NVIDIA Model Optimization Integration
- Knowledge Distillation
- Post training quantization export
- Quantization aware training
Enhanced LoRA support
- Support for expert layers
- Support for merging adapters for export to HuggingFace (@HollowMan6)
Finetuning dataset improvements: OpenAI messages format conversion, chat template support
Integration with NVIDIA-DLFW-Inspect for tensor statistics collection and monitoring
Support for sample-based training
Broader community adoption: Megatron-Bridge has been integrated into the training pipelines of VeRL (PR), Slime (PR), and Sky-RL (PR).
Special thanks to the community contributors for this release: @HollowMan6, @fzyzcjy, @erictang000, @hawkoli1987.

Contributors

fzyzcjy, HollowMan6, and 2 other contributors
