[Refactor] Modular Integration Test Framework with DeepSeek-v3 Support#1431
Conversation
tianyu-l
left a comment
Great initiative! Left some initial comments, let's discuss.
(force-pushed from f94d708 to 5cf7850)
tianyu-l
left a comment
I suggest we make model tests flat, and reuse functions such as main, run_tests, etc. across all tests -- basically decouple control logic and data.
tianyu-l
left a comment
Looks much cleaner. Had some more comments.
| """ | ||
| integration_tests_flavors = defaultdict(list) | ||
| integration_tests_flavors["debug_model.toml"] = [ | ||
| integration_tests_flavors = [] |
    parser.add_argument(
        "--test_suite",
        default="",
        choices=["features", "models", "h100"],
Is it because ft.py is special and hard to reconcile? I think that's fine for now.
Yes, ft.py uses a different command to run a single test, so this is hard to reconcile.
-   def run_test(test_flavor: OverrideDefinitions, full_path: str, output_dir: str):
+   def run_single_test(test_flavor: OverrideDefinitions, full_path: str, output_dir: str):
Why do we need to define such functions again? We can just put things like clip_encoder_version_arg into OverrideDefinitions.
If the concern is repetition, you can define the common part as a list `l` (with a better name) and splice it into every OverrideDefinitions using `*l`.
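The reviewer's `*l` suggestion could be sketched roughly as below. This is a minimal illustration, not the PR's actual code: the `OverrideDefinitions` fields, the sample flags, and the `common` list are all assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class OverrideDefinitions:
    # Each inner list is one set of CLI overrides for a single test run.
    # (Field names here are illustrative, not the PR's exact schema.)
    override_args: list = field(default_factory=list)
    test_descr: str = "default"

# Common flags defined once as a list...
common = ["--training.steps=10", "--model.flavor=debugmodel"]

# ...and spliced into each test case with *common, instead of
# wrapping OverrideDefinitions in per-model helper functions.
flux_test = OverrideDefinitions(
    override_args=[[*common, "--model.name=flux"]],
    test_descr="flux smoke test",
)
```

The unpacking keeps each test case a plain data declaration, so shared flags live in one place.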
For the Flux model, the main concern is not repeatedly adding configurations; it is that the command differs from the one in the main run_single_test() function. Flux runs with a separate ./run_train.py under the flux folder, not the main ./run_train.py in torchtitan. The same applies to ft.py, which also uses a different command to run the tests.
Most of the time the commands to run tests are the same: we always use the main ./run_train.py with slightly modified configurations, so I think there's no need to over-generalize run_single_test(). When the command is different, I re-define run_single_test(), and since run_tests() calls run_single_test() internally, run_tests() is re-defined as well.
Would love to hear your opinion on this design.
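One way to avoid re-defining the whole runner is to make only the command construction pluggable while the control loop stays shared. This is a sketch of that alternative, not the PR's implementation; the function names and the Flux entry-point path are assumptions.

```python
from typing import Callable, List

def build_command(config_path: str) -> List[str]:
    # Default case: the main ./run_train.py with a modified config.
    return ["./run_train.py", f"--job.config_file={config_path}"]

def build_flux_command(config_path: str) -> List[str]:
    # Flux has its own entry point (this path is illustrative), so
    # only the command builder changes; the loop below is reused.
    return ["./flux/run_train.py", f"--job.config_file={config_path}"]

def run_tests(
    test_list: List[str],
    config_path: str,
    build: Callable[[str], List[str]] = build_command,
) -> List[List[str]]:
    # Shared control loop: build one command per test case.
    return [build(config_path) for _ in test_list]
```

Whether this is better than re-defining run_single_test() per model is a judgment call; it trades a little indirection for a single shared loop.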
        )
    else:
        run_test(test_flavor, full_path, args.output_dir)

def run_tests(args, test_list: List[OverrideDefinitions]):
can we reuse what's already in run_tests.py?
Same rationale as above.
tianyu-l
left a comment
Looks great, thank you for the refactor!
### Integration Tests Restructuring (#1431)

* Split the current integration tests into two sets:
  1. Depth test - `features.py`: uses the llama3 model to test that all the *main components* of torchtitan function as expected.
  2. Breadth test - `models.py`: as torchtitan core supports more models, sets up parallelism-related tests for each model to cover model architecture / args changes, and keeps the integration test implementation easy to extend to new models.
* Moved integration test files from the root directory to a dedicated `tests/integration_tests/` directory.
* Added a base configuration file `base_config.toml` for integration tests, since most of the train_configs share 90% of the same settings.
* Separated control logic from test case definitions: `run_tests.py` holds the control logic; the other files define test cases.
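The "separate control logic and test case definition" point can be sketched as two tiny modules. This is illustrative only: the `OverrideDefinitions` fields, the sample override flags, and the builder name are assumptions, not the repository's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class OverrideDefinitions:
    # Illustrative schema for one integration test case.
    override_args: list = field(default_factory=list)
    test_descr: str = "default"

# Data: a features.py-style module only declares test cases...
def build_features_test_list():
    return [
        OverrideDefinitions([["--parallelism.tensor_parallel_degree=2"]], "TP"),
        OverrideDefinitions([["--checkpoint.enable_checkpoint"]], "checkpoint"),
    ]

# ...while run_tests.py-style control logic only iterates over them,
# so adding a new model means adding a new data module, not new loops.
def run_tests(test_list):
    return [t.test_descr for t in test_list]
```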