
[graph_trainer] AutoParallel AOT FX Trace Backend Integration#2725

Open
sanketpurandare wants to merge 1 commit into sanketpurandare/stack/3 from
sanketpurandare/stack/4

Conversation

@sanketpurandare
Contributor

@sanketpurandare sanketpurandare commented Mar 27, 2026

Stacked PRs:


[graph_trainer] AutoParallel AOT FX Trace Backend Integration

The goal is to make AutoParallel a first-class `aot_fx_trace` integration with
two backend modes:

1. **Native GraphTrainer backend mode**
   - `--compile.mode aot_fx_trace --compile.autoparallel`
   - AutoParallel places the model.
   - GraphTrainer traces forward, loss, and backward with `make_fx`.
   - GraphTrainer uses its own aot-fx-trace graph passes and compile path.

2. **AutoParallel backend mode**
   - `--compile.mode aot_fx_trace --compile.autoparallel`
   - `--compile.inductor_compilation autoparallel_backend`
   - AutoParallel places the model.
   - GraphTrainer still traces forward, loss, and backward with `make_fx`.
   - GraphTrainer switches from its native pass stack to AutoParallel's backend
     policy helpers and full-Inductor compilation path.
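Tracing forward, loss, and backward into a single FX graph with `make_fx`, as both modes above do, can be sketched roughly as follows. The toy model and `train_step` function here are purely illustrative, not torchtitan's actual code:

```python
# Minimal sketch: capture forward + loss + backward as ONE FX graph
# using make_fx. The model (a single matmul) is a stand-in; GraphTrainer's
# real train step is much larger but follows the same shape.
import torch
from torch.fx.experimental.proxy_tensor import make_fx


def train_step(weight, x, target):
    pred = x @ weight                       # forward
    loss = ((pred - target) ** 2).mean()    # loss
    # Calling autograd.grad inside the traced function makes the backward
    # ops part of the same traced graph (a "joint" graph).
    (grad_weight,) = torch.autograd.grad(loss, weight)
    return loss, grad_weight


weight = torch.randn(4, 4, requires_grad=True)
x = torch.randn(2, 4)
target = torch.randn(2, 4)

# gm is a torch.fx.GraphModule containing forward, loss, and backward ops.
gm = make_fx(train_step)(weight, x, target)
loss, grad = gm(weight, x, target)
```

Once the train step is a plain `GraphModule`, either pass stack (native or AutoParallel's) can be applied to it before compilation.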

The key design point: both modes share the same AutoParallel model placement and
the same GraphTrainer training-step tracing. They differ only in which pass stack
and backend policy are applied after GraphTrainer has the traced train-step graph.
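The shared-frontend, swappable-backend shape described above can be sketched as below. All names here (`CompileConfig`, `compile_train_step`, the pass functions) are hypothetical stand-ins for illustration, not the actual torchtitan API:

```python
# Hypothetical sketch of the design: placement + tracing are shared upstream;
# only the pass stack / backend policy chosen here differs between modes.
from dataclasses import dataclass


@dataclass
class CompileConfig:
    # Mirrors the CLI flags from the description (names are illustrative).
    mode: str = "aot_fx_trace"
    autoparallel: bool = True
    inductor_compilation: str = "native"  # or "autoparallel_backend"


def native_passes(graph):
    # Stand-in for GraphTrainer's own aot-fx-trace pass stack.
    return f"native_passes({graph})"


def autoparallel_backend_passes(graph):
    # Stand-in for AutoParallel's backend policy helpers + full Inductor path.
    return f"autoparallel_backend_passes({graph})"


def compile_train_step(traced_graph, cfg: CompileConfig):
    # The traced train-step graph arrives here in both modes; the config
    # flag alone selects which backend policy is applied.
    if cfg.inductor_compilation == "autoparallel_backend":
        return autoparallel_backend_passes(traced_graph)
    return native_passes(traced_graph)
```

The design choice of branching only at this last stage is what keeps the two modes interchangeable from the user's point of view.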

@sanketpurandare sanketpurandare force-pushed the sanketpurandare/stack/3 branch from 2d2fb54 to bd097f8 Compare March 27, 2026 00:49
@sanketpurandare sanketpurandare force-pushed the sanketpurandare/stack/4 branch from 9f9cc4c to b0f0cc3 Compare March 27, 2026 00:49
@meta-cla meta-cla Bot added the CLA Signed label Mar 27, 2026
@sanketpurandare sanketpurandare changed the base branch from sanketpurandare/stack/3 to main March 27, 2026 01:13
@sanketpurandare sanketpurandare force-pushed the sanketpurandare/stack/4 branch from b0f0cc3 to cd1af1a Compare March 27, 2026 01:14
@sanketpurandare sanketpurandare changed the base branch from main to sanketpurandare/stack/3 March 27, 2026 01:14
Comment thread torchtitan/experiments/autoparallel/local_map_deepseek_v3/model.py Outdated
@sanketpurandare sanketpurandare marked this pull request as draft March 27, 2026 02:22
@sanketpurandare sanketpurandare changed the base branch from sanketpurandare/stack/3 to main March 27, 2026 18:57
@sanketpurandare sanketpurandare force-pushed the sanketpurandare/stack/4 branch from cd1af1a to 0d30cb5 Compare March 27, 2026 18:57
@sanketpurandare sanketpurandare changed the base branch from main to sanketpurandare/stack/3 March 27, 2026 18:57
@sanketpurandare sanketpurandare changed the base branch from sanketpurandare/stack/3 to main April 1, 2026 16:28
@sanketpurandare sanketpurandare changed the base branch from main to sanketpurandare/stack/3 April 1, 2026 16:28
@sanketpurandare sanketpurandare changed the base branch from sanketpurandare/stack/3 to main April 24, 2026 19:53
@sanketpurandare sanketpurandare force-pushed the sanketpurandare/stack/4 branch from 0d30cb5 to e00f2a7 Compare April 24, 2026 19:53
@sanketpurandare sanketpurandare changed the title Fix DeepSeekV3Model for Configurable build pattern Add DeepSeek V3 debugmodel_sdpa and 16B_sdpa config variants Apr 24, 2026
@sanketpurandare sanketpurandare changed the base branch from main to sanketpurandare/stack/3 April 24, 2026 19:53
@sanketpurandare sanketpurandare marked this pull request as ready for review April 24, 2026 19:55
Comment thread torchtitan/models/deepseek_v3/config_registry.py Outdated
@sanketpurandare sanketpurandare marked this pull request as draft April 30, 2026 18:15
@sanketpurandare sanketpurandare changed the base branch from sanketpurandare/stack/3 to main April 30, 2026 18:15
@sanketpurandare sanketpurandare force-pushed the sanketpurandare/stack/4 branch from e00f2a7 to 89d4fa1 Compare April 30, 2026 18:15
@sanketpurandare sanketpurandare changed the base branch from main to sanketpurandare/stack/3 April 30, 2026 18:16
@sanketpurandare sanketpurandare marked this pull request as ready for review April 30, 2026 18:16
@sanketpurandare sanketpurandare marked this pull request as draft April 30, 2026 18:41
@sanketpurandare sanketpurandare changed the base branch from sanketpurandare/stack/3 to main April 30, 2026 18:41
@sanketpurandare sanketpurandare force-pushed the sanketpurandare/stack/4 branch from 89d4fa1 to 2c037b6 Compare April 30, 2026 18:41
@sanketpurandare sanketpurandare changed the base branch from main to sanketpurandare/stack/3 April 30, 2026 18:41
@sanketpurandare sanketpurandare marked this pull request as ready for review April 30, 2026 18:41
Comment thread torchtitan/models/deepseek_v3/__init__.py Outdated
@sanketpurandare sanketpurandare marked this pull request as draft May 4, 2026 03:05
@sanketpurandare sanketpurandare changed the base branch from sanketpurandare/stack/3 to main May 4, 2026 03:05
@sanketpurandare sanketpurandare force-pushed the sanketpurandare/stack/4 branch from 2c037b6 to 5adc1ea Compare May 4, 2026 03:05
@sanketpurandare sanketpurandare changed the title Add DeepSeek V3 debugmodel_sdpa and 16B_sdpa config variants [graph_trainer] AutoParallel AOT FX Trace Backend Integration May 4, 2026
@sanketpurandare sanketpurandare changed the base branch from main to sanketpurandare/stack/3 May 4, 2026 03:05
@sanketpurandare sanketpurandare marked this pull request as ready for review May 4, 2026 03:06
stack-info: PR: #2725, branch: sanketpurandare/stack/4
@sanketpurandare sanketpurandare marked this pull request as draft May 4, 2026 04:38
@sanketpurandare sanketpurandare changed the base branch from sanketpurandare/stack/3 to main May 4, 2026 04:38
@sanketpurandare sanketpurandare force-pushed the sanketpurandare/stack/4 branch from 5adc1ea to 28d5181 Compare May 4, 2026 04:38
@sanketpurandare sanketpurandare changed the base branch from main to sanketpurandare/stack/3 May 4, 2026 04:38
@sanketpurandare sanketpurandare marked this pull request as ready for review May 4, 2026 04:39

Labels

ciflow/8gpu, CLA Signed

4 participants