squash xblock for persistent inner reduction #102444
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/102444. Note: links to docs will display an error until the docs builds have been completed.
✅ No failures as of commit 235b88e. This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot merge
Merge failed. Reason: This PR needs a […] label. If not, please add the […] label. To add a label, you can comment to pytorchbot, for example […]. For more information, see […]. Details for Dev Infra team: raised by workflow job.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 1 job has failed; the first few are: trunk / linux-focal-rocm5.4.2-py3.8 / test (default, 1, 3, linux.rocm.gpu). Details for Dev Infra team: raised by workflow job.
cc @jataylo, I had to exclude ROCm from this optimization, because ROCm is on an old Triton version that doesn't have `tl.reduce`.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Thanks for the heads up @ngimel. We hit a blocker that stopped us updating Triton for a while to bring in `tl.reduce`. I'll remove the conditionalisation of this commit in the PR that bumps our Triton commit, and add you as a reviewer. cc: @dllehr-amd
Revert the aten.prod explicit fallback on ROCm, enabling the use of tl.reduce in triton codegen. This PR also re-enables an optimisation that was previously conditionalised out for ROCm in #102444. Pull Request resolved: #104099. Approved by: https://github.com/peterbell10, https://github.com/malfet
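For context on what `tl.reduce` provides: it is Triton's generalized reduction primitive, which folds an axis of a tensor with a user-supplied combine function (this is what lets Inductor generate a native product reduction instead of falling back for `aten.prod`). As an illustrative sketch only (not Triton code, and `reduce_like_tl_reduce` is a hypothetical name), the semantics can be mimicked in NumPy:

```python
import numpy as np

def reduce_like_tl_reduce(x, axis, combine):
    # Analogue of Triton's tl.reduce: fold the given axis with a
    # user-supplied associative combine function, applied pairwise
    # slice by slice along that axis.
    out = np.take(x, 0, axis=axis)
    for i in range(1, x.shape[axis]):
        out = combine(out, np.take(x, i, axis=axis))
    return out

x = np.arange(12.0).reshape(3, 4)
# Product reduction along the rows, like a prod codegen'd via tl.reduce.
prod = reduce_like_tl_reduce(x, axis=1, combine=lambda a, b: a * b)
assert np.allclose(prod, x.prod(axis=1))
```

In real generated Triton code the combine function is a JIT-compiled scalar function and the fold happens in parallel within a block, but the axis-plus-combine contract is the same.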
Currently layer norm kernel performance is pretty bad due to a Triton perf bug (https://gist.github.com/ngimel/c1e7f70f8268f038e710e835b0065f63), but since XBLOCK for a persistent reduction is always 1, we can simply drop this dimension and operate on 1D tensors, which improves layer norm kernel performance considerably.

Perf results: http://hud.pytorch.org/benchmark/compilers?startTime=Mon%2C%2022%20May%202023%2001%3A27%3A25%20GMT&stopTime=Mon%2C%2029%20May%202023%2001%3A27%3A25%20GMT&suite=torchbench&mode=training&dtype=amp&lBranch=ngimel/persistent_1d&lCommit=1d5175f5e682f37aae15fd217bc3767e1788bacf&rBranch=main&rCommit=c9f4f01981fd73fcc7c27676cc50230cd1b5bc22, approx 4% on hf.
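To illustrate the shape transformation (this is a NumPy sketch of the idea, not the actual Triton code Inductor emits; the function names are hypothetical), a persistent layer-norm reduction over one row can be indexed either as a [XBLOCK=1, RBLOCK] tile or, with the size-1 dimension squashed, as a plain 1D array — the results are identical, but the 1D form avoids 2D indexing in the generated kernel:

```python
import numpy as np

def layernorm_row_2d(x_row, eps=1e-5):
    # Persistent reduction over a [XBLOCK=1, RBLOCK] tile: the whole
    # reduction dimension is resident at once, indexed as 2D.
    tile = x_row[None, :]                        # shape [1, R]
    mean = tile.mean(axis=1, keepdims=True)      # shape [1, 1]
    var = ((tile - mean) ** 2).mean(axis=1, keepdims=True)
    return ((tile - mean) / np.sqrt(var + eps))[0]

def layernorm_row_1d(x_row, eps=1e-5):
    # Same computation with the size-1 XBLOCK dimension dropped:
    # everything is 1D, which is what this PR makes the codegen emit.
    mean = x_row.mean()
    var = ((x_row - mean) ** 2).mean()
    return (x_row - mean) / np.sqrt(var + eps)

row = np.random.default_rng(0).standard_normal(512).astype(np.float32)
assert np.allclose(layernorm_row_2d(row), layernorm_row_1d(row), atol=1e-5)
```

The squash is only valid because a persistent reduction holds the entire reduction axis in one block, so XBLOCK is guaranteed to be 1 and the leading dimension carries no information.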
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10