Port "sub-group transpose reduction" to default path #2266

victor-eds · 2024-09-17T12:28:12Z

#2109 explores layout conversion in the advanced path to improve reduction performance (see #1637 for the investigation). Porting this to the default path would involve a transformation similar to the following, applied once heuristics have checked that it is profitable:

1. Reshape the input tensor so that no data movement is needed and elements can be reduced within the work-item (tt.reshape).
2. Perform the reduction within the work-item (tt.reduce).
3. Convert the layout so that a transposition within the sub-group is performed, as explained in the investigation (triton_gpu.convert_layout).
4. Finalize the reduction, within the work-item and possibly within the work-group (tt.reduce).
5. Convert back to the initial layout (triton_gpu.convert_layout).

Note: step 5 can be dropped if the new layout is beneficial for performance.
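To make the data flow concrete, here is a minimal sketch in plain Python of a row-sum reduction done this way. It is a toy model, not the pass itself: the sub-group is modeled as a list of lanes, each owning a private list of values, and the names `SUB_GROUP_SIZE`, `distribute`, `reduce_within_lane`, `transpose_in_sub_group`, `finalize`, and `row_sums` are all hypothetical. It assumes the number of rows equals the sub-group size and that columns are initially spread across lanes. The final conversion back to the initial layout is omitted.

```python
# Toy model: a sub-group is a list of lanes (work-items); each lane owns a
# private list of values. The size below is an illustrative assumption.
SUB_GROUP_SIZE = 4


def distribute(tensor):
    """Assumed initial layout: lane l owns columns l, l + SUB_GROUP_SIZE, ...
    of every row, so reducing a row needs data from every lane."""
    return [[row[lane::SUB_GROUP_SIZE] for row in tensor]
            for lane in range(SUB_GROUP_SIZE)]


def reduce_within_lane(lanes):
    """Reshape + first reduce: each lane sums the elements it already owns,
    row by row, without moving any data between lanes."""
    return [[sum(chunk) for chunk in lane] for lane in lanes]


def transpose_in_sub_group(partials):
    """Layout conversion: lane r receives the partial sums of row r from
    every lane (a transposition of the lanes x rows matrix)."""
    return [list(column) for column in zip(*partials)]


def finalize(transposed):
    """Second reduce: each lane finishes the row it now fully owns."""
    return [sum(lane) for lane in transposed]


def row_sums(tensor):
    lanes = distribute(tensor)
    partials = reduce_within_lane(lanes)
    transposed = transpose_in_sub_group(partials)
    return finalize(transposed)


if __name__ == "__main__":
    # 4 rows (one per lane after the transposition) by 8 columns.
    tensor = [[r * 8 + c for c in range(8)] for r in range(SUB_GROUP_SIZE)]
    assert row_sums(tensor) == [sum(row) for row in tensor]
```

In this model the only cross-lane communication is the single transposition; both reductions operate on values a lane already holds, which is the property the steps above aim for.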


victor-eds · 2024-10-07T11:47:59Z

Working on always generating NOP reshape ops.

victor-eds · 2024-10-15T11:53:16Z

Adding lit tests locally. Pass 1.0 is in good shape.

victor-eds added the performance label Sep 17, 2024

victor-eds self-assigned this Sep 17, 2024

vlad-penkin added this to the 4.0 [Performance] Core milestone Sep 17, 2024

vlad-penkin added the codegen: attention and enhancement (New feature or request) labels Sep 17, 2024

victor-eds changed the title ~~Port #2109 to default path~~ Port "sub-group transpose reduction" to default path Sep 18, 2024

victor-eds removed their assignment Sep 18, 2024

victor-eds self-assigned this Sep 30, 2024

This was referenced Oct 15, 2024

[OptRed] Define -tritonintelgpu-optimize-reduction-locality pass #2491

Open

[Triton] Use UnitAttr in tt.reshape definition #2497

Closed

This was linked to pull requests Oct 16, 2024

[Triton] Use UnitAttr in tt.reshape definition #2497

Closed

[OptRed] Define -tritonintelgpu-optimize-reduction-locality pass #2491

Open
