Port "sub-group transpose reduction" to default path #2266

Open
victor-eds opened this issue Sep 17, 2024 · 2 comments · May be fixed by #2491
@victor-eds
Contributor

#2109 explores layout conversion in the advanced path to improve reduction performance (see #1637 for the investigation). Porting this to the default path would involve a similar transformation, applied after heuristics have checked that it is profitable:

  1. Reshape the input tensor so that no data movement is needed and elements can be reduced within the work-item (`tt.reshape`)
  2. Perform the reduction within the work-item (`tt.reduce`)
  3. Convert the layout so that a transposition within the sub-group, as explained in the investigation, is performed (`triton_gpu.convert_layout`)
  4. Finalize the reduction, within the work-item and possibly within the work-group (`tt.reduce`)
  5. Convert back to the initial layout (`triton_gpu.convert_layout`)

Note that step 5 can be dropped if the new layout is beneficial for performance.
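The steps above operate on Triton IR, but the data movement they describe can be sketched in NumPy. This is a minimal sketch, not the actual pass: the sub-group size of 16 is a hypothetical SIMD width, and a row-wise sum over a 2-D tensor stands in for the general reduction.

```python
import numpy as np

SUB_GROUP_SIZE = 16  # hypothetical SIMD width; the real value is hardware-dependent


def subgroup_transpose_reduce(x: np.ndarray) -> np.ndarray:
    """Sketch of steps 1-4: row-wise sum of a (rows, cols) tensor."""
    rows, cols = x.shape
    assert cols % SUB_GROUP_SIZE == 0
    # 1. Reshape so each work-item holds a contiguous chunk of its row
    #    (tt.reshape; a no-op in memory, only the view changes).
    chunks = x.reshape(rows, SUB_GROUP_SIZE, cols // SUB_GROUP_SIZE)
    # 2. Reduce within each work-item (tt.reduce): one partial per work-item.
    partial = chunks.sum(axis=2)  # shape: (rows, SUB_GROUP_SIZE)
    # 3. Transpose within the sub-group (triton_gpu.convert_layout): after the
    #    transpose, all partials of one output element sit in one work-item.
    transposed = partial.T  # shape: (SUB_GROUP_SIZE, rows)
    # 4. Finalize the reduction within the work-item (tt.reduce).
    return transposed.sum(axis=0)  # shape: (rows,)


x = np.arange(4 * 32, dtype=np.float64).reshape(4, 32)
result = subgroup_transpose_reduce(x)
assert np.allclose(result, x.sum(axis=1))
```

The point of step 3 is that the final reduction in step 4 needs no cross-lane shuffles: each work-item already owns every partial it has to combine.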

@victor-eds victor-eds self-assigned this Sep 17, 2024
@vlad-penkin vlad-penkin added this to the 4.0 [Performance] Core milestone Sep 17, 2024
@victor-eds victor-eds changed the title Port #2109 to default path Port "sub-group transpose reduction" to default path Sep 18, 2024
@victor-eds victor-eds removed their assignment Sep 18, 2024
@victor-eds victor-eds self-assigned this Sep 30, 2024
@victor-eds
Contributor Author

Working on always generating NOP `tt.reshape` ops.

@victor-eds
Contributor Author

Adding lit tests locally. Pass 1.0 is in good shape.
