Dgrad ReduceScatter overlap fix #1088

Merged: 5 commits, Aug 13, 2024

Conversation

vasunvidia
Collaborator

@vasunvidia vasunvidia commented Aug 7, 2024

Description

This PR fixes 2 bugs in enabling DGRAD-RS overlap.

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactor

Changes

  • Fix a bug that assumed DGRAD-RS overlap always uses the pipeline method. The fix uses the TP config to add the layer to the correct method list.
  • ring_exchange RS ran the last GEMM chunk on main_stream, but the send/recv streams wait on stream_compute during the last chunk. The fix runs the last chunk on stream_compute and uses the main stream for the reduction kernel.
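
The first fix above can be illustrated with a minimal sketch. This is not the Transformer Engine code: the function name `place_dgrad_rs_layer`, the layer name `proj_dgrad`, the shape of the `methods` dict, and the `"pipeline"` fallback are all assumptions made for illustration. The point is only that the target list comes from the layer's configured method rather than being hard-coded.

```python
from typing import Dict, List


def place_dgrad_rs_layer(
    methods: Dict[str, List[str]],
    name: str,
    layer_cfg: Dict[str, str],
) -> Dict[str, List[str]]:
    """Move `name` into the method list named by its config (hypothetical helper).

    The buggy behaviour would be equivalent to always appending to
    methods["pipeline"], regardless of layer_cfg["method"].
    """
    # Assumption: fall back to "pipeline" when no method is configured.
    method = layer_cfg.get("method", "pipeline")
    if method not in methods:
        raise ValueError(f"unknown overlap method: {method!r}")

    # Drop the layer from whichever list currently holds it.
    for layers in methods.values():
        if name in layers:
            layers.remove(name)

    # Add it to the list that matches the configured method.
    methods[method].append(name)
    return methods


methods = {"ring_exchange": [], "pipeline": [], "bulk": ["proj_dgrad"]}
place_dgrad_rs_layer(methods, "proj_dgrad", {"method": "ring_exchange"})
```

With the configured method set to `ring_exchange`, the layer ends up only in that list and no longer in `bulk` or `pipeline`.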

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

This PR fixes a bug in enabling DGRAD-RS overlap by adding the
layer to the correct method list. Previously, the RS-DGRAD overlap
layer was incorrectly added to pipeline method list even if
ring_exchange method is specified in config.

Signed-off-by: Vasudevan Rengasamy <[email protected]>
ring_exchange RS uses main_stream for last GEMM chunk. But the
send/recv streams wait for stream_compute during last chunk.

Signed-off-by: Vasudevan Rengasamy <[email protected]>
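
The stream mismatch described in this commit message can be reasoned about with a toy model of in-order streams and events. The sketch below is pure Python, not CUDA and not the Transformer Engine implementation. The `Stream` class, the op names, and the assumption that the main stream waits on the communication stream before the reduction are all hypothetical. It only shows why a wait on stream_compute does not order the communication after a GEMM that was launched on a different stream.

```python
class Stream:
    """Toy in-order stream that tracks which ops precede its next op."""

    def __init__(self):
        self.history = set()

    def launch(self, op):
        # The new op is ordered after everything already in the history.
        deps = frozenset(self.history)
        self.history.add(op)
        return deps

    def record_event(self):
        # An event captures the work enqueued on this stream so far.
        return frozenset(self.history)

    def wait_event(self, event):
        # Later ops on this stream are ordered after the event's work.
        self.history |= event


def simulate(last_gemm_stream):
    """Return (deps of the last communication op, deps of the reduction)."""
    streams = {"main": Stream(), "compute": Stream(), "comm": Stream()}

    # Last GEMM chunk goes to the chosen stream ("main" models the bug).
    streams[last_gemm_stream].launch("gemm_last_chunk")

    # The send/recv stream waits on stream_compute, as the source describes.
    streams["comm"].wait_event(streams["compute"].record_event())
    comm_deps = streams["comm"].launch("comm_last_chunk")

    # Assumption: the main stream waits for communication before reducing.
    streams["main"].wait_event(streams["comm"].record_event())
    reduce_deps = streams["main"].launch("reduce")
    return comm_deps, reduce_deps


buggy_comm, _ = simulate("main")
fixed_comm, fixed_reduce = simulate("compute")
```

In this model, `buggy_comm` does not contain `gemm_last_chunk`, so the communication could start before the GEMM finishes. `fixed_comm` does contain it, and the reduction on the main stream is ordered after both the GEMM and the communication.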
@vasunvidia vasunvidia marked this pull request as ready for review August 7, 2024 22:21
Collaborator

@timmoon10 timmoon10 left a comment

UB is being refactored at #1067. Can you make comments in that PR so we don't revert this bug?

@timmoon10
Collaborator

/te-ci pytorch

Member

@ksivaman ksivaman left a comment

Looks good

@ksivaman
Member

Could you resolve the conflicts? @vasunvidia

@ksivaman
Member

/te-ci pytorch

@ksivaman ksivaman merged commit ec49a52 into NVIDIA:main Aug 13, 2024
25 of 26 checks passed
mgoldfarb-nvidia pushed a commit to mgoldfarb-nvidia/TransformerEngine that referenced this pull request Aug 14, 2024
* DGRAD-RS overlap bug fix

This PR fixes a bug in enabling DGRAD-RS overlap by adding the
layer to the correct method list. Previously, the RS-DGRAD overlap
layer was incorrectly added to pipeline method list even if
ring_exchange method is specified in config.

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Bug fix for ring_exchange ReduceScatter

ring_exchange RS uses main_stream for last GEMM chunk. But the
send/recv streams wait for stream_compute during last chunk.

Signed-off-by: Vasudevan Rengasamy <[email protected]>

---------

Signed-off-by: Vasudevan Rengasamy <[email protected]>
Co-authored-by: Tim Moon <[email protected]>
Co-authored-by: Kirthi Shankar Sivamani <[email protected]>