Add microbenchmark for A@B^t #2408
Conversation
The benchmark implementation looks fine. One note: this example is almost identical to our tutorial 10 experimental block-pointer code, with some small changes to handle the transpose case and a user-provided bias vector. We might be able to unify this microbenchmark with tutorial 10 in the future.
Based on this feedback #2408 (review), changed the GEMM benchmark to include the transposed-matrices case.

Closes #2424
Relates to #1795

The A@B^t case is important because the weight matrix is often stored in [M, K] format, as in https://pytorch.org/docs/stable/generated/torch.nn.Linear.html. Right now we are about 1.5 times slower on XPU than raw torch for that case.

The A^t@B case is important because it is part of the matmul backward pass. Right now we are about 4 times slower on XPU than raw torch for that case.
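To make the two cases concrete, here is a minimal pure-Python sketch (no torch dependency, naive list-of-lists matmul) of where each pattern comes from: `torch.nn.Linear` stores its weight as `[out_features, in_features]` and computes `x @ w^T` in the forward pass (the A@B^t case), while its weight gradient is `dy^T @ x` (the A^t@B case). The helper names below are illustrative, not part of the benchmark.

```python
def matmul(a, b):
    """Naive [m, k] @ [k, n] product on lists of lists."""
    m, k, n = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]

def transpose(x):
    return [list(row) for row in zip(*x)]

def linear_forward(x, w):
    """Mirror of a Linear layer's forward: y = x @ w^T (the A@B^t pattern),
    because w is stored as [out_features, in_features]."""
    return matmul(x, transpose(w))

def linear_backward_weight(dy, x):
    """Weight gradient of y = x @ w^T: dL/dw = dy^T @ x (the A^t@B pattern)."""
    return matmul(transpose(dy), x)

x = [[1.0, 2.0], [3.0, 4.0]]                # [batch=2, in_features=2]
w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]    # [out_features=3, in_features=2]
y = linear_forward(x, w)                     # shape [2, 3], computed as x @ w^T
dy = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]     # upstream gradient, shape [2, 3]
dw = linear_backward_weight(dy, x)           # shape [3, 2], computed as dy^T @ x
```

This is why a GEMM kernel that only handles the plain A@B layout leaves both the Linear forward and the matmul backward on a slower transpose path.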
This PR adds a microbenchmark for GEMM with A@B^t, which closes #2414.