#2109 explores layout conversion in the advanced path to improve reduction performance (see #1637 for the investigation). Porting this to the default path would involve, once heuristics have confirmed profitability, a transformation along these lines:
1. Reshape the input tensor so that no data movement is needed and elements can be reduced within the work-item (tt.reshape)
2. Perform the reduction within the work-item (tt.reduce)
3. Convert the layout so that the transposition within the sub-group described in the investigation is performed (triton_gpu.convert_layout)
4. Finalize the reduction, within the work-item and possibly within the work-group (tt.reduce)
5. Convert back to the initial layout (triton_gpu.convert_layout)
Note: step 5 can be dropped if the new layout is beneficial for performance.
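
To make the data flow of the steps above concrete, here is a minimal pure-Python sketch. It models only the arithmetic of a staged sum reduction, not the actual Triton passes or layouts: the function name `staged_reduce`, the parameter `elems_per_item`, and the use of a plain transpose as a stand-in for the layout conversion are all assumptions made for illustration, and the input is assumed to be a non-empty rectangular list of lists.

```python
def staged_reduce(matrix, elems_per_item):
    """Sum each row of `matrix` in two stages (hypothetical model of steps 1-5)."""
    width = len(matrix[0])
    if any(len(row) != width for row in matrix):
        raise ValueError("matrix must be rectangular")
    if width % elems_per_item != 0:
        raise ValueError("row length must be a multiple of elems_per_item")

    # Step 1 (models tt.reshape): regroup each row so that every modeled
    # work-item owns one contiguous chunk of elems_per_item elements.
    reshaped = [
        [row[i:i + elems_per_item] for i in range(0, width, elems_per_item)]
        for row in matrix
    ]

    # Step 2 (models the first tt.reduce): reduce inside each work-item.
    partials = [[sum(chunk) for chunk in row] for row in reshaped]

    # Step 3 (stand-in for the layout conversion): transpose the partial
    # results so they are indexed by work-item first, then by row.
    transposed = [list(col) for col in zip(*partials)]

    # Step 4 (models the final tt.reduce): combine the partials of each row.
    totals = [sum(per_row) for per_row in zip(*transposed)]

    # Step 5 (convert back) has no counterpart in this model, because plain
    # Python lists carry no layout information.
    return totals


matrix = [[r * 16 + c for c in range(16)] for r in range(4)]
# The staged result matches a direct row-wise reduction.
assert staged_reduce(matrix, 4) == [sum(row) for row in matrix]
```

The sketch only shows that splitting the reduction around a transposition preserves the result; whether the split is profitable depends on the real layouts and on the heuristics mentioned above.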