
Call to mkldnn_matmul from aten::addmm on AArch64 #91763

Closed · wants to merge 6 commits

Conversation

@milpuz01 (Contributor) commented on Jan 5, 2023:

We have noticed that for BERT_pytorch in torchbenchmark the majority of the time is spent running GEMMs via aten::addmm. At the moment this calls into a BLAS routine, but on AArch64 it is faster to call into mkldnn_matmul. Performance-wise, compared to a build with OpenBLAS, it runs 1.2x faster on 16 cores with a batch size of 8 on Graviton3, and 2.3x faster if fast math mode is enabled (mkldnn_matmul exposes, through oneDNN and the Arm Compute Library, an option to run GEMMs with FP32 inputs using BF16 operations).

Fixes #ISSUE_NUMBER

cc @gujinghui @PenghuiCheng @XiaobingSuper @jianyuh @jgong5 @mingfeima @sanchitintel @ashokei @jingxu10 @min-jean-cho @yanbing-j @Guobing-Chen @Xia-Weiwen @malfet
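
For context, a minimal libtorch micro-benchmark along these lines can show how fast aten::addmm runs under a given build. This is a sketch, not the torchbenchmark harness: the matrix shapes, the hidden size of 1024, the thread count, and the iteration counts are illustrative assumptions.

```cpp
// Hypothetical micro-benchmark for aten::addmm (not the torchbenchmark setup).
// Assumes a libtorch build for AArch64 with oneDNN + Arm Compute Library;
// fast math mode (FP32 via BF16) is configured separately and not shown here.
#include <torch/torch.h>
#include <ATen/Parallel.h>
#include <chrono>
#include <iostream>

int main() {
  at::set_num_threads(16);               // matches the 16-core measurement above
  auto bias = torch::randn({8, 1024});   // batch size 8, as in the PR description
  auto mat1 = torch::randn({8, 1024});
  auto mat2 = torch::randn({1024, 1024});

  for (int i = 0; i < 10; ++i) {         // warm-up so oneDNN can prepare its kernels
    torch::addmm(bias, mat1, mat2);
  }

  constexpr int iters = 100;
  auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < iters; ++i) {
    torch::addmm(bias, mat1, mat2);
  }
  auto end = std::chrono::steady_clock::now();
  std::cout << std::chrono::duration<double, std::milli>(end - start).count() / iters
            << " ms per addmm\n";
}
```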

@linux-foundation-easycla bot commented on Jan 5, 2023:

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: milpuz01 / name: Milos Puzovic (f794036915dd8dce26bfd01a602d6c979fb25353, e26d718a6140adc960e5a4cd8d0abe245434dfb4, 9285eacf13686d0926e66a52b8a5f15e992fb6f6, 323af68e24fc61520d4441f4f29de42b1255d2ff)

@pytorch-bot bot commented on Jan 5, 2023:

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/91763

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit c75af8a:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@IvanYashchuk added the module: mkldnn (Related to Intel IDEEP or oneDNN (a.k.a. mkldnn) integration) and matrix multiplication labels on Jan 5, 2023
@IvanYashchuk requested a review from ngimel and removed the request for IvanYashchuk on Jan 5, 2023 at 16:08
@IvanYashchuk added the module: arm (Related to ARM architecture builds of PyTorch; includes Apple M1) label and removed the release notes: linalg_frontend (release notes category) label on Jan 5, 2023
@pytorch-bot bot added the release notes: linalg_frontend (release notes category) label on Jan 5, 2023
@ngimel (Collaborator) left a comment:


cc @malfet for changes to Dependencies.cmake

Review thread on aten/src/ATen/native/LinearAlgebra.cpp (outdated, resolved)
@jgong5 requested a review from zhuhaozhe on Jan 6, 2023 at 03:07
@mikaylagawarecki added the triaged label (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) on Jan 6, 2023
@milpuz01 requested reviews from jgong5 and ngimel and removed the requests for lezcano, nikitaved, zhuhaozhe, jgong5 and ngimel on Jan 6, 2023 at 15:43
@pytorchmergebot (Collaborator) commented:

Merge failed

Reason: 1 job has failed; the first few of them are: Meta Internal-Only Changes Check

Details for Dev Infra team: raised by workflow job.

@agunapal (Contributor) commented:

@pytorchbot merge

@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot (Collaborator) commented:

Merge failed

Reason: This PR is too stale; the last push date was more than 3 days ago. Please rebase and try again. You can rebase and merge by leaving the following comment on this PR:
@pytorchbot merge -r
Or just rebase by leaving a @pytorchbot rebase comment.

Details for Dev Infra team: raised by workflow job.

@snadampal (Collaborator) commented:

@pytorchbot rebase

@pytorchmergebot (Collaborator) commented:

@pytorchbot successfully started a rebase job. Check the current status here

@pytorchmergebot (Collaborator) commented:

Successfully rebased addmm_call_to_mkldnn_matmul onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout addmm_call_to_mkldnn_matmul && git pull --rebase)

@agunapal (Contributor) commented:

@pytorchbot merge

@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot (Collaborator) commented:

Merge failed

Reason: 1 job has failed; the first few of them are: Meta Internal-Only Changes Check

Details for Dev Infra team: raised by workflow job.

@facebook-github-bot (Contributor) commented:

@malfet has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@malfet (Contributor) commented on Mar 31, 2023:

Lots and lots of jobs fail internally with:

aten/src/ATen/native/LinearAlgebra.cpp:1420:29: error: invalid token at start of a preprocessor expression
 #if defined(__aarch64__) && AT_MKLDNN_ACL_ENABLED()
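
A minimal sketch of how this kind of error can arise (assuming, as the follow-up comment suggests, that the internally generated Config.h did not carry a proper 0/1 definition for the new flag): AT_MKLDNN_ACL_ENABLED() is meant to be a function-like macro expanding to 0 or 1, and if the generated header leaves it empty or undefined, the #if expression is left with a dangling operand the preprocessor cannot parse.

```cpp
// Illustrative sketch only -- not the actual PyTorch sources.
// Intended behavior: the build system generates a function-like macro
// that expands to 0 or 1, so it can appear inside #if expressions.
#define AT_MKLDNN_ACL_ENABLED() 1

#if defined(__aarch64__) && AT_MKLDNN_ACL_ENABLED()
// the mkldnn_matmul fast path would be compiled here
#endif

// Failure mode (assumed): if the new flag never reaches the generated
// header, the macro expands to nothing or stays undefined, the condition
// degenerates to something like `defined(__aarch64__) && `, and the
// preprocessor rejects it with errors like the one quoted above.
```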

@malfet (Contributor) commented on Apr 1, 2023:

For posterity: adding new flags to Config.h.in requires mentioning them in the following Buck build files: TARGETS and ovrsource_aten_gen_defs.bzl
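
For illustration, flags in aten/src/ATen/Config.h.in follow a pattern along these lines, with CMake substituting the @…@ placeholder at configure time; the exact line for this flag is an assumption based on the existing entries in that file.

```cpp
// aten/src/ATen/Config.h.in (illustrative excerpt, not verbatim):
// CMake replaces @AT_MKLDNN_ACL_ENABLED@ with 0 or 1 at configure time.
// Internal Buck builds generate this header separately, which is why the
// flag also has to be registered in TARGETS and ovrsource_aten_gen_defs.bzl.
#define AT_MKLDNN_ACL_ENABLED() @AT_MKLDNN_ACL_ENABLED@
```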

@malfet (Contributor) commented on Apr 1, 2023:

@pytorchbot merge -f "landed internally"

@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

snadampal pushed a commit to snadampal/pytorch that referenced this pull request Apr 4, 2023

Pull Request resolved: pytorch#91763
Approved by: https://github.com/jgong5, https://github.com/ngimel, https://github.com/malfet
@atalman atalman removed this from the 2.0.1 milestone May 3, 2023
Labels
ciflow/trunk (Trigger trunk jobs on your pull request), matrix multiplication, Merged, module: arm (Related to ARM architecture builds of PyTorch; includes Apple M1), module: cpu (CPU-specific problem, e.g., perf, algorithm), module: mkldnn (Related to Intel IDEEP or oneDNN (a.k.a. mkldnn) integration), open source, release notes: linalg_frontend (release notes category), triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)