Does TP/SP overlap support nccl backend? #1288

Closed
wplf opened this issue Oct 25, 2024 · 2 comments

wplf (Contributor) commented Oct 25, 2024

Hi, thank you for the great work.

I've looked through the source code and found that TE's linear TP/SP overlap supports reduce_scatter and all_gather with the NCCL backend.
In the Megatron repo, however, TP overlap is initialized over MPI, and torch doesn't support the MPI backend by default.

So, does TP/SP overlap support the NCCL backend?
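For reference, a quick way to see the backend situation (a minimal sketch; stock PyTorch wheels are built without MPI, so `is_mpi_available()` usually returns False unless PyTorch was compiled from source against an MPI installation):

```python
import torch.distributed as dist

# Stock PyTorch builds ship without MPI support, so this
# typically prints False.
print(dist.is_mpi_available())

# NCCL, by contrast, is bundled with the CUDA builds of PyTorch.
print(dist.is_nccl_available())
```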

yaox12 (Collaborator) commented Oct 27, 2024

Yes, we now support bootstrapping from NCCL.
And in the latest Megatron-LM, you can find the argument `--tp-comm-bootstrap-backend` in https://github.com/NVIDIA/Megatron-LM/blob/d357c188323b6928cbcbd6f7e06af04c1694382f/megatron/training/arguments.py#L1163-L1165
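For later readers, here is a minimal sketch of wiring this up on the TransformerEngine side. It assumes a recent TE release where `transformer_engine.pytorch.module.base.initialize_ub` accepts a `bootstrap_backend` argument; the shapes and parameter values below are illustrative, so check them against your installed version:

```python
import os
import torch
import torch.distributed as dist
import transformer_engine.pytorch as te

# Assumes launch via torchrun, which sets LOCAL_RANK and the
# rendezvous environment variables. No MPI is needed.
dist.init_process_group(backend="nccl")
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

tp_size = 8                                   # tensor-parallel group size (illustrative)
seq_len, micro_batch, hidden = 4096, 1, 8192  # illustrative shapes

# Set up the userbuffers that back TP/SP communication overlap.
# bootstrap_backend="nccl" bootstraps the overlap communicators
# over NCCL instead of MPI (assumed parameter name; present in
# recent TE versions).
te.module.base.initialize_ub(
    shape=[seq_len * micro_batch, hidden],
    tp_size=tp_size,
    use_fp8=False,
    bootstrap_backend="nccl",
)
```

On the Megatron-LM side, per the linked `arguments.py`, the equivalent is passing the flag on the training command line, e.g. `--tp-comm-overlap --tp-comm-bootstrap-backend nccl`.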

wplf (Contributor, Author) commented Oct 27, 2024

Thank you very much.
I'll give it a try.

wplf closed this as completed on Oct 27, 2024