fix(deepseek-v4): restore batch axis for packed-sequence (THD) forward by akoumpa · Pull Request #2651 · NVIDIA-NeMo/Automodel

akoumpa · 2026-06-20T02:04:58Z

NVBug: 6329577

What

Add the missing leading batch dimension to inputs_embeds in DeepseekV4Model.forward
when the input arrives in packed-sequence (THD) layout, before the hc_mult expansion.

Why

Packed-sequence finetuning of DeepSeek-V4-Flash crashes in the model forward pass on the
first optimizer step (NVBugs 6329577):

RuntimeError: The expanded size of the tensor (4) must match the existing size (512)
              at non-singleton dimension 2.  Target sizes: [-1, -1, 4, -1].
              Tensor sizes: [512, 512, 1]

at nemo_automodel/components/models/deepseek_v4/model.py:450:

h = inputs_embeds.unsqueeze(2).expand(-1, -1, self.config.hc_mult, -1).contiguous()

Root cause: the THD packed path (make_cp_batch_and_ctx(use_te=True) ->
process_input_for_thd) collapses the batch dimension, handing the model a rank-1
input_ids of shape [T]. embed_tokens([T]) then yields a rank-2 [T, H] inputs_embeds,
so unsqueeze(2) produces [T, H, 1]. expand(-1, -1, hc_mult, -1) aligns its four target sizes
with the tensor's trailing dimensions, so hc_mult lands on the non-singleton hidden dim
(size 512 in the error above) and the call fails. The model already restores the batch dim
on the OUTPUT side (compute_lm_head_logits(is_thd=True) does unsqueeze(0) -> [1, T, V]); the
input side just lacked the symmetric up-rank. (The original NVBug "suggested fix" was a no-op,
identical to the existing code, and did not address the actual rank mismatch.)
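
The rank mismatch can be reproduced outside the model with a few lines of PyTorch. This is a minimal sketch, not the model code: the sizes T, H, HC_MULT, VOCAB and the helper expand_hc are illustrative assumptions, and only the unsqueeze/expand expression is taken from the failing line.

```python
import torch

# Illustrative sizes (assumptions); the report used T=512, H=512, hc_mult=4.
T, H, HC_MULT, VOCAB = 8, 16, 4, 32

embed_tokens = torch.nn.Embedding(VOCAB, H)


def expand_hc(inputs_embeds: torch.Tensor, hc_mult: int) -> torch.Tensor:
    # Same expression as the failing line; it assumes a rank-3 [B, S, H] input.
    return inputs_embeds.unsqueeze(2).expand(-1, -1, hc_mult, -1).contiguous()


# THD layout: the batch axis is collapsed, so input_ids is rank-1 [T].
input_ids = torch.randint(0, VOCAB, (T,))
inputs_embeds = embed_tokens(input_ids)  # rank-2 [T, H]

try:
    expand_hc(inputs_embeds, HC_MULT)
    error = None
except RuntimeError as exc:
    # hc_mult is matched against the hidden dim (size H), which is not 1.
    error = exc

print(tuple(inputs_embeds.shape), type(error).__name__)
```

Feeding a rank-2 input_ids of shape [1, T] to the same helper succeeds, which is why the regular BSHD path never hit this error.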

How

In DeepseekV4Model.forward, after computing inputs_embeds and before the hc_mult expand:

if inputs_embeds.dim() == 2:
    inputs_embeds = inputs_embeds.unsqueeze(0)

This is a no-op for the normal BSHD [B, S, H] path and mirrors the existing output-side THD
restoration; downstream position_ids (1-D) and seq_lens (1-D) up-ranks were already present.
7 lines added, 0 removed, 1 file.
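
The effect of the guard on both layouts can be checked in isolation. The sketch below wraps the guard and the existing expand in a hypothetical helper, expand_with_batch_guard, which does not exist in the repository; tensor sizes are arbitrary.

```python
import torch


def expand_with_batch_guard(inputs_embeds: torch.Tensor, hc_mult: int) -> torch.Tensor:
    # THD path: embeddings arrive as [T, H]; restore the batch axis -> [1, T, H].
    if inputs_embeds.dim() == 2:
        inputs_embeds = inputs_embeds.unsqueeze(0)
    # Insert the hc axis and broadcast it: [B, S, H] -> [B, S, hc_mult, H].
    return inputs_embeds.unsqueeze(2).expand(-1, -1, hc_mult, -1).contiguous()


thd = torch.randn(8, 16)      # packed layout [T, H]
bshd = torch.randn(2, 8, 16)  # regular layout [B, S, H]

thd_out = expand_with_batch_guard(thd, hc_mult=4)
bshd_out = expand_with_batch_guard(bshd, hc_mult=4)

print(tuple(thd_out.shape), tuple(bshd_out.shape))
```

The rank-3 input skips the guard entirely, so the BSHD result is the same as it was before the change.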

How tested

Weightless single-GPU repro (tiny random-init DSV4, hc_mult=4, hidden_size=512) that feeds
input_ids through the real process_input_for_thd to produce the exact rank-1 [T] layout:
- Before: reproduces the reported error at model.py:450 (Tensor sizes: [512, 512, 1]).
- After: forward completes, logits [1, 512, 256] = [1, T, vocab], no NaN.
pytest tests/unit_tests/models/deepseek_v4/test_dsv4_model_smoke.py -> 17 passed
(incl. THD / forward / backward smoke tests). No regressions.
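
For readers without the repository at hand, the shape of such a smoke check can be approximated with a stand-in module. ToyHCModel below is hypothetical and is not the DSV4 model or the actual test: it mirrors only the embed -> guard -> hc expand -> head shape flow and replaces the decoder stack with a mean over the hc axis.

```python
import torch
from torch import nn


class ToyHCModel(nn.Module):
    """Hypothetical stand-in; only the shape flow of the real forward is mirrored."""

    def __init__(self, vocab: int = 256, hidden: int = 32, hc_mult: int = 4):
        super().__init__()
        self.hc_mult = hc_mult
        self.embed_tokens = nn.Embedding(vocab, hidden)
        self.lm_head = nn.Linear(hidden, vocab, bias=False)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        inputs_embeds = self.embed_tokens(input_ids)
        if inputs_embeds.dim() == 2:  # THD: [T, H] -> [1, T, H]
            inputs_embeds = inputs_embeds.unsqueeze(0)
        h = inputs_embeds.unsqueeze(2).expand(-1, -1, self.hc_mult, -1).contiguous()
        # Placeholder for the decoder stack: collapse the hc axis by averaging.
        return self.lm_head(h.mean(dim=2))


torch.manual_seed(0)
model = ToyHCModel()
input_ids = torch.randint(0, 256, (64,))  # rank-1 [T], as in the THD layout

logits = model(input_ids)  # expected layout: [1, T, vocab]
logits.sum().backward()    # backward smoke check

print(tuple(logits.shape), bool(torch.isfinite(logits).all()))
```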

NVBugs: 6329577
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

copy-pr-bot · 2026-06-20T02:05:02Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

akoumpa · 2026-06-20T02:13:16Z

/ok to test 0862c83

fix(deepseek-v4): restore batch axis for packed-sequence (THD) forward

0862c83

akoumpa requested a review from a team as a code owner June 20, 2026 02:04

akoumpa added the r0.5.0 label (Auto-cherrypick to release branch. Apply before merge; cherrypick happens after merge.) Jun 20, 2026

HuiyingLi approved these changes Jun 20, 2026

akoumpa enabled auto-merge (squash) June 20, 2026 03:31

akoumpa wants to merge 1 commit into main from akoumparouli/nvbug6329577-deepseek-v4-thd-batch
