
[BE][5/n] simplify pp vs. non-pp set up #510

Merged
merged 3 commits on Aug 8, 2024

Commits on Aug 7, 2024

  1. simplify pp vs. non-pp set up

    [ghstack-poisoned]
    tianyu-l committed Aug 7, 2024
    338f183
  2. Update on "[BE][5/n] simplify pp vs. non-pp set up"

    This PR restructures the PP vs. non-PP setup in `train.py`:
    - Now there are only two main if-else branches for PP vs. non-PP: one in the setup phase, the other in the training phase.
    - I think the result is already clear to read or copy-paste, so it isn't necessary to create separate sub-functions to hold the code.

    This PR also removes unnecessary module returns from `parallelize_llama`, since we are modifying the module in place. Note that torch.compile and AC (activation checkpointing) require returning and reassigning the module. But since we are doing per-block compile and AC, we achieve that in place for the whole model by
    ```
    transformer_block = compile/AC(transformer_block)
    model.layers.register_module(layer_id, transformer_block)
    ``` 
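
    For concreteness, here is a minimal, hypothetical sketch of that per-block in-place replacement. The helper name `apply_compile_and_ac`, its flags, and the use of `checkpoint_wrapper` are illustrative assumptions, not the repo's actual code; only the re-register pattern above is taken from this PR.
    ```
    import torch
    from torch.distributed.algorithms._checkpoint.checkpoint_wrapper import (
        checkpoint_wrapper,
    )

    def apply_compile_and_ac(model: torch.nn.Module, enable_compile: bool, enable_ac: bool) -> None:
        # Hypothetical helper: wrap each transformer block individually, then
        # re-register it under the same name. This mutates `model` in place,
        # so the caller does not need a return value.
        for layer_id, transformer_block in list(model.layers.named_children()):
            if enable_ac:
                # per-block activation checkpointing
                transformer_block = checkpoint_wrapper(transformer_block)
            if enable_compile:
                # per-block torch.compile
                transformer_block = torch.compile(transformer_block)
            model.layers.register_module(layer_id, transformer_block)
    ```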
    
    [ghstack-poisoned]
    tianyu-l committed Aug 7, 2024
    f58ca70

Commits on Aug 8, 2024

  1. Update on "[BE][5/n] simplify pp vs. non-pp set up"

    This PR refactors the PP vs. non-PP setup in `train.py`:
    - moves `build_pipeline_schedule` into `pipeline_llama`, which reduces the PP interface exposed in `train.py`
    - refactors the setup flow, so that there are only two main if-else branches for PP vs. non-PP: one in the setup phase, the other in the training phase (see the control-flow sketch after the snippet below)
    - I think the result is already clear to read or copy-paste, so it isn't necessary to create separate sub-functions to hold the code.

    This PR also removes unnecessary module returns from `parallelize_llama`, since we are modifying the module in place. Note that torch.compile and AC (activation checkpointing) require returning and reassigning the module. But since we are doing per-block compile and AC, we achieve that in place for the whole model by
    ```
    transformer_block = compile/AC(transformer_block)
    model.layers.register_module(layer_id, transformer_block)
    ``` 
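
    A hypothetical sketch of the resulting control flow in `train.py`: `pipeline_llama` and `parallelize_llama` are from this PR, while names such as `parallel_dims.pp_enabled`, `pp_mesh`, `model_parts`, `loss_fn`, and the exact argument lists and return values are assumptions for illustration only.
    ```
    # --- setup phase: single if-else for PP vs. non-PP ---
    if parallel_dims.pp_enabled:
        # pipeline_llama now builds the pipeline schedule internally
        # (build_pipeline_schedule moved inside it)
        pp_schedule, model_parts = pipeline_llama(
            model, pp_mesh, parallel_dims, job_config, device, model_config
        )
        for m in model_parts:
            parallelize_llama(m, world_mesh, parallel_dims, job_config)
    else:
        parallelize_llama(model, world_mesh, parallel_dims, job_config)

    # --- training phase: single if-else for PP vs. non-PP ---
    if parallel_dims.pp_enabled:
        # the schedule drives forward/backward across pipeline stages
        # (simplified; real code also handles first/last-stage inputs/outputs)
        pp_schedule.step(input_ids, target=labels)
    else:
        pred = model(input_ids)
        loss = loss_fn(pred, labels)
        loss.backward()
    ```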
    
    [ghstack-poisoned]
    tianyu-l committed Aug 8, 2024
    ff53569