Commit d036765
Refactor pipeline parallel helpers for graph PP reuse
Extract pipeline metadata, module splitting, and PP rank-to-stage mapping from pipeline_llm so graph PP can reuse the underlying setup logic without duplicating it. Add a backward_requires_autograd option to the schedule builder so graph PP, which runs explicit backward graphs instead of relying on autograd, can opt out. Existing eager PP behavior is unchanged.
Keep pipeline_llm as the only public entrypoint exported by torchtitan.distributed.pipeline_parallel. Make build_pipeline_schedule, generate_llm_fqn_per_model_part, and pipeline_module_split private because they are implementation details with narrower contracts: schedule construction depends on the current PP config shape, LLM FQN generation encodes TorchTitan-specific module naming heuristics, and module splitting assumes models tolerate deleted or empty layer containers.
Update internal and experiment callsites to use the private helper names directly where reuse is still needed. This keeps the reusable code centralized while avoiding accidentally blessing those helpers as stable public API.
stack-info: PR: #2724, branch: sanketpurandare/stack/31 parent 627126f commit d036765
3 files changed, 229 insertions(+), 166 deletions(-)