[Benchmark] Run xetla streamk gemm in benchmark #2438

Merged: @whitneywhtsang merged 3 commits into main from yudong/xetla_streamk on Oct 15, 2024

Conversation

ESI-SYD
Contributor

@ESI-SYD ESI-SYD commented Oct 8, 2024

No description provided.

@ESI-SYD ESI-SYD linked an issue Oct 8, 2024 that may be closed by this pull request
@whitneywhtsang
Contributor

@ESI-SYD Performance from https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/11230502839 is 240 TFlops, but the reported performance was 262 TFlops. Can you investigate why?
There are also a lot of printouts like the ones below. Can they be cleaned up?

 problem size: (3072,3072), tiled_shape: (12,12), tiles: 144, dp_tiles: 96, sk_tiles: 48, iters_per_tile: 128, num_workgroups: 128, dp_workgroups: 96, dp_waves: 3, sk_groups_per_region: 32, sk_regions: 1, sk_waves: 1, sk_iters_per_normal_group: 192, sk_big_groups_per_region: 0, avail_xecores: 32

Local range: {1, 8, 4} 
SK Score: 51
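
The fields in this printout are internally consistent, which can be checked with a short script. The following is only a sketch: the relationships it asserts (one data-parallel workgroup per data-parallel tile, scheduled in waves of `avail_xecores`, and stream-k iterations split evenly across stream-k groups) are assumptions inferred from the numbers themselves, not taken from the XeTLA source, and `check_streamk_log` is a hypothetical helper name.

```python
# Values copied from the printout above.
log = {
    "problem_size": (3072, 3072),
    "tiled_shape": (12, 12),
    "tiles": 144,
    "dp_tiles": 96,
    "sk_tiles": 48,
    "iters_per_tile": 128,
    "num_workgroups": 128,
    "dp_workgroups": 96,
    "dp_waves": 3,
    "sk_groups_per_region": 32,
    "sk_regions": 1,
    "sk_iters_per_normal_group": 192,
    "avail_xecores": 32,
}


def check_streamk_log(p):
    """Assert the assumed relationships and return a few derived quantities."""
    tiles_m, tiles_n = p["tiled_shape"]
    # The output is covered by a grid of tiles, split into DP and SK tiles.
    assert tiles_m * tiles_n == p["tiles"]
    assert p["dp_tiles"] + p["sk_tiles"] == p["tiles"]

    # Workgroups are either data-parallel or stream-k groups.
    sk_groups = p["sk_groups_per_region"] * p["sk_regions"]
    assert p["dp_workgroups"] + sk_groups == p["num_workgroups"]

    # Assumption: one DP workgroup per DP tile, run in waves of avail_xecores.
    assert p["dp_workgroups"] == p["dp_tiles"]
    assert p["dp_waves"] * p["avail_xecores"] == p["dp_workgroups"]

    # Assumption: SK iterations are divided evenly among the SK groups.
    sk_iters = p["sk_tiles"] * p["iters_per_tile"]
    assert sk_iters == p["sk_iters_per_normal_group"] * sk_groups

    m, n = p["problem_size"]
    return {
        "tile_m": m // tiles_m,
        "tile_n": n // tiles_n,
        "sk_iters": sk_iters,
        "sk_iters_remainder": sk_iters % sk_groups,
    }


derived = check_streamk_log(log)
print(derived)
```

Under these assumptions each output tile is 256 x 256, and the 48 stream-k tiles contribute 6144 iterations that divide evenly over 32 groups, which matches `sk_big_groups_per_region: 0` in the printout.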

@ESI-SYD
Contributor Author

ESI-SYD commented Oct 9, 2024

Let me check. Previously I got ~252 TFlops locally.

@ESI-SYD ESI-SYD marked this pull request as draft October 9, 2024 03:10
@ESI-SYD
Contributor Author

ESI-SYD commented Oct 10, 2024

Trying to disable the prints in intel/xetla#54.

@ESI-SYD ESI-SYD marked this pull request as ready for review October 15, 2024 02:54
@whitneywhtsang
Contributor

What has changed? Is performance good now? Print disabled?

@ESI-SYD
Contributor Author

ESI-SYD commented Oct 15, 2024

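The number of available xecores increased, and performance is now 251.6 TFlops, which is 96% of the reported 262 TFlops. The change that disables the prints has not landed (it looks like it is not active in their public repo).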

https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/11338429336/job/31531618981

@ESI-SYD
Contributor Author

ESI-SYD commented Oct 15, 2024

Note: The pre-commit check failure is not related to this PR.

@whitneywhtsang
Contributor

In https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/11338429336, I see stream-k performance is only 103TFlops.

@ESI-SYD
Contributor Author

ESI-SYD commented Oct 15, 2024

        M       K       N   Triton-GB/s    XeTLA-GB/s  Triton-GB/s-min  XeTLA-GB/s-min  Triton-GB/s-max  XeTLA-GB/s-max  Triton-TFlops  XeTLA-TFlops  Triton-TFlops-min  XeTLA-TFlops-min  Triton-TFlops-max  XeTLA-TFlops-max  Triton-CV  XeTLA-CV
0  3072.0  4096.0  3072.0  5.033170e+07  5.033177e+07     5.033169e+07    5.033173e+07     5.033170e+07    5.033179e+07     103.746892    251.641847          93.622131        168.239498         110.720402        280.920817   0.054009  0.174175
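
To put the two TFlops columns in context, the sketch below converts them back into implied kernel times and compares the XeTLA figure with the 262 TFlops mentioned earlier in the thread. It assumes the usual GEMM convention of 2 * M * N * K floating-point operations per run; the benchmark's actual formula is not shown in this conversation, and `gemm_flops` and `implied_time_ms` are hypothetical helper names.

```python
def gemm_flops(m, n, k):
    """Floating-point operations for an (m x k) @ (k x n) GEMM.

    Assumption: each multiply-add counts as two operations.
    """
    return 2 * m * n * k


def implied_time_ms(m, n, k, tflops):
    """Kernel time in milliseconds implied by a TFlops figure."""
    return gemm_flops(m, n, k) / (tflops * 1e12) * 1e3


# Shape and throughput values copied from the table row above.
m, k, n = 3072, 4096, 3072
triton_tflops = 103.746892
xetla_tflops = 251.641847
reported_tflops = 262.0  # figure quoted earlier in the thread

flops = gemm_flops(m, n, k)
ratio_to_reported = xetla_tflops / reported_tflops

print(f"FLOPs per GEMM:         {flops}")
print(f"Triton implied time:    {implied_time_ms(m, n, k, triton_tflops):.3f} ms")
print(f"XeTLA implied time:     {implied_time_ms(m, n, k, xetla_tflops):.3f} ms")
print(f"XeTLA vs reported:      {ratio_to_reported:.1%}")
```

The 103 TFlops value is in the Triton-TFlops column, while the stream-k result from this PR is in the XeTLA-TFlops column.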

@whitneywhtsang
Contributor

Oops, I was looking at the wrong column.

Contributor

@whitneywhtsang whitneywhtsang left a comment

Please create an issue to track the removal of the prints.

@ESI-SYD
Contributor Author

ESI-SYD commented Oct 15, 2024

#2489

@etiotto
Contributor

etiotto commented Oct 15, 2024

@ESI-SYD reminder, the pre-commit is failing.

@whitneywhtsang whitneywhtsang merged commit 6018c7b into main Oct 15, 2024
5 checks passed
@whitneywhtsang whitneywhtsang deleted the yudong/xetla_streamk branch October 15, 2024 19:52
Development

Successfully merging this pull request may close these issues.

[XeTLA] Add StreamK and SplitK implementation
4 participants