Call to mkldnn_matmul from aten::addmm on AArch64 #91763
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/91763. Note: links to docs will display an error until the docs builds have been completed. ✅ No failures as of commit c75af8a. This comment was automatically generated by Dr. CI and updates every 15 minutes.
cc @malfet for changes to Dependencies.cmake
Merge failed. Reason: 1 job has failed; the first few of them are: Meta Internal-Only Changes Check. Details for Dev Infra team: raised by workflow job.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: This PR is too stale; the last push date was more than 3 days ago. Please rebase and try again. You can rebase and merge by leaving the following comment on this PR:
Details for Dev Infra team: raised by workflow job.
@pytorchbot rebase
@pytorchbot successfully started a rebase job. Check the current status here.
We have noticed that on BERT_pytorch in torchbenchmark the majority of the time is spent running GEMM in aten::addmm. At the moment this calls into a BLAS routine, but on AArch64 it is faster if it calls into mkldnn_matmul. Performance-wise, compared to a build with OpenBLAS, it runs 1.2x faster on 16 cores with a batch size of 8 on Graviton3, and if fast math mode (mkldnn_matmul exposes, through oneDNN and the Arm Compute Library, an option to run GEMM with FP32 inputs using BF16 operations) is enabled, it is 2.3x faster.
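For context on where the speedup shows up, below is a minimal sketch of timing aten::addmm from Python. It is not part of this PR: the GEMM shapes (a BERT-like 128x768 by 768x3072 matmul), the thread count, and the iteration counts are illustrative assumptions, and the 1.2x/2.3x figures above come from BERT_pytorch in torchbenchmark on Graviton3, not from this snippet.

```python
# Minimal sketch of timing aten::addmm on CPU. Shapes and thread count are
# illustrative assumptions, not taken from the torchbenchmark BERT_pytorch run.
import time
import torch

torch.set_num_threads(16)  # the quoted numbers were measured on 16 cores

# BERT-like GEMM shape (hypothetical): (batch * seq, hidden) x (hidden, ffn)
bias = torch.randn(3072)
mat1 = torch.randn(128, 768)
mat2 = torch.randn(768, 3072)

# Warm-up, then time repeated calls to addmm, which is where the choice
# between mkldnn_matmul and a BLAS gemm call is made.
for _ in range(10):
    torch.addmm(bias, mat1, mat2)

start = time.perf_counter()
for _ in range(100):
    torch.addmm(bias, mat1, mat2)
elapsed = time.perf_counter() - start
print(f"100 addmm calls took {elapsed:.3f} s")
```

On an AArch64 build with this change, the same script should exercise the mkldnn_matmul path instead of the BLAS routine, so it can serve as a rough before/after comparison.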
Successfully rebased. Force-pushed from f33249a to c75af8a.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 1 job has failed; the first few of them are: Meta Internal-Only Changes Check. Details for Dev Infra team: raised by workflow job.
@malfet has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Lots and lots of jobs fail internally with:
For posterity: adding new flags to Config.h.in requires mentioning them in the following Buck build files:
@pytorchbot merge -f "landed internally"
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Fixes #ISSUE_NUMBER

Pull Request resolved: pytorch#91763
Approved by: https://github.com/jgong5, https://github.com/ngimel, https://github.com/malfet
cc @gujinghui @PenghuiCheng @XiaobingSuper @jianyuh @jgong5 @mingfeima @sanchitintel @ashokei @jingxu10 @min-jean-cho @yanbing-j @Guobing-Chen @Xia-Weiwen @malfet