Commit 9822d2e
test(moe): fix stale unit tests broken by lazy DeepEP buffer + packed-param requires_grad
Two L0 unit tests were stale relative to earlier branch code changes:
- test_grouped_experts_deepep_token_dispatcher_init asserted that init_token_dispatcher
eagerly calls _init_deepep_buffer, but buffer allocation is now lazy (deferred to
FusedDispatch.forward) as part of the revert that fixed the single-node load-time OOM.
The test now asserts that it is NOT called.
- ExpertParallel._partition_fn now constructs nn.Parameter(..., requires_grad=...) so
non-floating packed mxfp4 params (int8 / e8m0) don't trip the default requires_grad=True.
The test's stub Parameter didn't accept/store requires_grad; add it (also unblocks the
requires_grad-preservation test).
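
A minimal sketch of the stub fix in the second item, written without torch so that it runs on its own. StubParameter and rewrap are hypothetical stand-ins for the test's stub Parameter and for the re-wrapping step in ExpertParallel._partition_fn. Forwarding the source parameter's flag is an assumption, since the commit message elides the actual arguments.

```python
class StubParameter:
    """Hypothetical test double for nn.Parameter that stores requires_grad."""

    def __init__(self, data, requires_grad=True):
        # Default mirrors nn.Parameter, where requires_grad defaults to True.
        self.data = data
        self.requires_grad = requires_grad


def rewrap(param, new_data, parameter_cls=StubParameter):
    """Wrap new_data in a parameter, forwarding the source flag (assumed behavior)."""
    return parameter_cls(new_data, requires_grad=param.requires_grad)


# A packed, non-floating parameter is created with requires_grad=False.
packed = StubParameter(data=[1, 2, 3], requires_grad=False)
packed_shard = rewrap(packed, packed.data[:2])

# A regular trainable parameter keeps the default flag through the same path.
trainable = StubParameter(data=[0.5, 0.25])
trainable_shard = rewrap(trainable, trainable.data[:1])
```

A stub whose constructor takes only data would raise TypeError on the requires_grad keyword, which is consistent with the failure mode the message describes.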
Both fixes verified: tests/unit_tests/moe now 450 passed, 0 failed.
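
The lazy-allocation assertion from the first item can be sketched with unittest.mock. The Dispatcher class, its _init_buffer method, and init_dispatcher are hypothetical stand-ins, not the repository's code. They only mirror the pattern of deferring buffer allocation to the first forward call.

```python
from unittest import mock


class Dispatcher:
    """Hypothetical dispatcher that allocates its buffer on first use."""

    def __init__(self):
        self.buffer = None  # nothing is allocated at construction time

    def _init_buffer(self):
        self.buffer = bytearray(16)  # placeholder for an expensive allocation

    def forward(self, tokens):
        if self.buffer is None:
            self._init_buffer()  # lazy: allocate only when first needed
        return list(tokens)


def init_dispatcher():
    """Hypothetical factory playing the role of init_token_dispatcher."""
    return Dispatcher()


with mock.patch.object(Dispatcher, "_init_buffer") as init_buffer:
    dispatcher = init_dispatcher()
    init_buffer.assert_not_called()  # construction must not allocate
    called_at_init = init_buffer.called

    dispatcher.forward([1, 2, 3])
    init_buffer.assert_called_once()  # the first forward triggers allocation
    calls_after_forward = init_buffer.call_count
```

Patching the method on the class keeps the test independent of what the real allocation does, so it checks only when the call happens.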
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Daniel Shen <dshen@crusoe.ai>
1 parent 3676cd4 commit 9822d2e
2 files changed
Lines changed: 6 additions & 2 deletions