[Refactor] Modular Integration Test Framework with DeepSeek-v3 Support#1431
Conversation
tianyu-l
left a comment
Great initiative! Left some initial comments, let's discuss.
(force-pushed from f94d708 to 5cf7850)
tianyu-l
left a comment
I suggest we make model tests flat, and reuse functions such as main, run_tests, etc. across all tests -- basically decouple control logic and data.
tianyu-l
left a comment
Looks much cleaner. Had some more comments.
| """ | ||
| integration_tests_flavors = defaultdict(list) | ||
| integration_tests_flavors["debug_model.toml"] = [ | ||
| integration_tests_flavors = [] |
    parser.add_argument(
        "--test_suite",
        default="",
        choices=["features", "models", "h100"],
Is it because ft.py is special and hard to reconcile? I think that's fine for now.
Yes, ft.py uses a different command to run a single test, so this is hard to reconcile.
-   def run_test(test_flavor: OverrideDefinitions, full_path: str, output_dir: str):
+   def run_single_test(test_flavor: OverrideDefinitions, full_path: str, output_dir: str):
Why do we need to define such functions again? We can just put things like clip_encoder_version_arg into OverrideDefinitions.
If the concern is repetition, you can define the common part as a list `l` (with a better name) and splice it into every OverrideDefinitions using `*l`.
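The reviewer's `*l` suggestion could be sketched roughly as below. This is a minimal illustration, not the PR's actual code: the `OverrideDefinitions` fields, the sample flags, and the `common` list are all assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class OverrideDefinitions:
    # Each inner list is one set of CLI overrides for a single test run.
    # (Field names here are illustrative, not the PR's exact schema.)
    override_args: list = field(default_factory=list)
    test_descr: str = "default"

# Common flags defined once as a list...
common = ["--training.steps=10", "--model.flavor=debugmodel"]

# ...and spliced into each test case with *common, instead of
# wrapping OverrideDefinitions in per-model helper functions.
flux_test = OverrideDefinitions(
    override_args=[[*common, "--model.name=flux"]],
    test_descr="flux smoke test",
)
```

The unpacking keeps each test case a plain data declaration, so shared flags live in one place.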
For the Flux model, the main concern is not repeatedly adding configurations; it is that the command differs from the one in the main run_single_test() function. Flux runs with a separate ./run_train.py under the flux folder, not the main ./run_train.py in torchtitan. The same applies to ft.py, which also uses a different command to run the tests.
Most of the time the commands to run tests are the same: we always use the main ./run_train.py with slightly modified configurations, so I think there's no need to over-generalize run_single_test(). When the command is different, I re-define run_single_test(), and since run_tests() calls run_single_test() internally, run_tests() is re-defined as well.
Would love to hear your opinion on this design.
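One way to avoid re-defining the whole runner is to make only the command construction pluggable while the control loop stays shared. This is a sketch of that alternative, not the PR's implementation; the function names and the Flux entry-point path are assumptions.

```python
from typing import Callable, List

def build_command(config_path: str) -> List[str]:
    # Default case: the main ./run_train.py with a modified config.
    return ["./run_train.py", f"--job.config_file={config_path}"]

def build_flux_command(config_path: str) -> List[str]:
    # Flux has its own entry point (this path is illustrative), so
    # only the command builder changes; the loop below is reused.
    return ["./flux/run_train.py", f"--job.config_file={config_path}"]

def run_tests(
    test_list: List[str],
    config_path: str,
    build: Callable[[str], List[str]] = build_command,
) -> List[List[str]]:
    # Shared control loop: build one command per test case.
    return [build(config_path) for _ in test_list]
```

Whether this is better than re-defining run_single_test() per model is a judgment call; it trades a little indirection for a single shared loop.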
        )
    else:
        run_test(test_flavor, full_path, args.output_dir)

def run_tests(args, test_list: List[OverrideDefinitions]):
can we reuse what's already in run_tests.py?
Same rationale as above.
tianyu-l
left a comment
Looks great, thank you for the refactor!
### Integration Tests Restructuring (#1431)

* Split the current integration tests into two sets:
  1. Depth test - `features.py`: uses the llama3 model to test that all the *main components* of torchtitan function as expected.
  2. Breadth test - `models.py`: as torchtitan core supports more models, sets up parallelism-related tests for each model to cover model architecture / args changes, and keeps the integration test implementation easy to extend to new models.
* Moved integration test files from the root directory to a dedicated `tests/integration_tests/` directory.
* Added a base configuration file `base_config.toml` for integration tests, since most of the train_configs share 90% of the same settings.
* Separated control logic from test case definitions: `run_tests.py` holds the control logic; the other files define test cases.
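The "separate control logic and test case definition" point can be sketched as two tiny modules. This is illustrative only: the `OverrideDefinitions` fields, the sample override flags, and the builder name are assumptions, not the repository's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class OverrideDefinitions:
    # Illustrative schema for one integration test case.
    override_args: list = field(default_factory=list)
    test_descr: str = "default"

# Data: a features.py-style module only declares test cases...
def build_features_test_list():
    return [
        OverrideDefinitions([["--parallelism.tensor_parallel_degree=2"]], "TP"),
        OverrideDefinitions([["--checkpoint.enable_checkpoint"]], "checkpoint"),
    ]

# ...while run_tests.py-style control logic only iterates over them,
# so adding a new model means adding a new data module, not new loops.
def run_tests(test_list):
    return [t.test_descr for t in test_list]
```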