Commit d036765
Refactor pipeline parallel helpers for graph PP reuse
Extract pipeline metadata, module splitting, and PP rank-to-stage mapping from pipeline_llm so graph PP can reuse the underlying setup logic without duplicating it. Add a backward_requires_autograd option to the schedule builder so graph PP, which runs explicit backward graphs instead of relying on autograd, can opt out. Existing eager PP behavior is unchanged.
Keep pipeline_llm as the only public entrypoint exported by torchtitan.distributed.pipeline_parallel. Make build_pipeline_schedule, generate_llm_fqn_per_model_part, and pipeline_module_split private because they are implementation details with narrower contracts: schedule construction depends on the current PP config shape, LLM FQN generation encodes TorchTitan-specific module naming heuristics, and module splitting assumes models tolerate deleted or empty layer containers.
Update internal and experiment callsites to use the private helper names directly where reuse is still needed. This keeps the reusable code centralized while avoiding accidentally blessing those helpers as stable public API.
stack-info: PR: #2724, branch: sanketpurandare/stack/31 parent 627126f commit d036765
3 files changed, 229 insertions(+), 166 deletions(-)