[SPMD] Enable GPU CI for Distributed Tensor #333

Merged: 72 commits into main, Aug 31, 2022
@fduwjj fduwjj commented Aug 9, 2022

The distributed tensor tests are not yet enabled on GPUs. This PR enables them in GPU CI.

It also adds an end-to-end (E2E) test for Megatron-LM style tensor parallelism (TP).
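
For readers unfamiliar with the pattern, here is a minimal sketch of the property a Megatron-LM style TP test typically checks: sharding the first linear layer by columns and the second by rows, then summing the per-rank partial outputs, reproduces the unsharded MLP output. This is not the test added in this PR. It is pure Python with no distributed tensor API, the function names (`mlp`, `tp_mlp`) are hypothetical, the ranks are simulated in a loop, and ReLU stands in for the activation.

```python
# Illustrative sketch only: simulates Megatron-LM style tensor parallelism
# for a two-layer MLP on plain Python lists. All names are hypothetical.

def matmul(a, b):
    # Plain matrix product of two list-of-lists matrices.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def relu(m):
    # Elementwise activation; any elementwise function works for this pattern.
    return [[max(0.0, v) for v in row] for row in m]

def mlp(x, w1, w2):
    # Unsharded reference: relu(x @ w1) @ w2.
    return matmul(relu(matmul(x, w1)), w2)

def tp_mlp(x, w1, w2, world_size):
    # Each simulated rank holds a column shard of w1 and the matching
    # row shard of w2; the final sum plays the role of an all-reduce.
    hidden = len(w2)
    assert hidden % world_size == 0, "hidden dim must divide evenly across ranks"
    chunk = hidden // world_size
    total = None
    for rank in range(world_size):
        lo, hi = rank * chunk, (rank + 1) * chunk
        w1_shard = [row[lo:hi] for row in w1]  # column-wise shard
        w2_shard = w2[lo:hi]                   # row-wise shard
        partial = mlp(x, w1_shard, w2_shard)
        if total is None:
            total = partial
        else:
            total = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(total, partial)]
    return total
```

The activation is applied before any cross-rank communication, which is why the first layer is sharded by columns: each rank can compute its slice of the hidden activations independently.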

@fduwjj fduwjj changed the title Enable GPU test for Distributed Tensor [SPMD] Enable GPU test for Distributed Tensor Aug 9, 2022
@fduwjj fduwjj requested a review from wanchaol August 9, 2022 22:56
fduwjj commented Aug 31, 2022

It turns out the root cause of the CI failure is this issue: NVIDIA/nccl#290

fduwjj commented Aug 31, 2022

For reference, this is the guide used for installing Python from source: https://hackersandslackers.com/multiple-versions-python-ubuntu/

fduwjj commented Aug 31, 2022

All CI checks passed and the linter errors are fixed. Let's merge it.

@fduwjj fduwjj merged commit c0d1d7e into main Aug 31, 2022