Adding the 2-parameter xplat shuffle helpers and accelerating them #68559

Merged (19 commits) on May 2, 2022

Member

@tannergooding tannergooding commented Apr 26, 2022

This adds the 2-parameter shuffle APIs of the form Vector128<T> Shuffle(Vector128<T> value, Vector128<TInteger> indices) and provides basic acceleration for them on x86, x64, and Arm64.
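
For readers unfamiliar with the operation, here is a minimal scalar sketch of what a 2-parameter shuffle computes. It is a C++ model, not code from this PR. It assumes that an out-of-range index produces a zero element, which is my understanding of the managed fallback and should be checked against the API documentation. The names Shuffle and ShuffleExample are illustrative only.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar model of a 2-parameter shuffle over N lanes of type T.
// Assumption: an index outside [0, N) selects zero. This models the
// semantics only; it is not the JIT's accelerated implementation.
template <typename T, typename TIndex, std::size_t N>
std::array<T, N> Shuffle(const std::array<T, N>& value, const std::array<TIndex, N>& indices)
{
    std::array<T, N> result{}; // value-initialized, so every lane starts at zero
    for (std::size_t i = 0; i < N; ++i)
    {
        // A negative signed index wraps to a huge unsigned value and is
        // therefore treated as out of range.
        const auto index = static_cast<std::size_t>(indices[i]);
        if (index < N)
        {
            result[i] = value[index];
        }
    }
    return result;
}

// Usage: three lanes are picked in reverse order, the last index is out of range.
inline void ShuffleExample()
{
    const std::array<std::int32_t, 4> value{10, 20, 30, 40};
    const std::array<std::int32_t, 4> indices{3, 2, 1, 7};
    const auto result = Shuffle(value, indices);
    assert((result == std::array<std::int32_t, 4>{40, 30, 20, 0}));
}
```

The accelerated paths are expected to produce the same per-lane result as a loop like this, only with a single hardware shuffle instruction where one is available.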

This doesn't yet accelerate non-constant indices, or cross-lane operations on Vector256<T> for x86/x64, since both require additional complexity that would delay the baseline support.
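
To make "cross-lane" concrete: a 256-bit vector on x86/x64 is made of two 128-bit lanes, and a shuffle is cross-lane when some result element is taken from the other lane. The sketch below checks a set of constant indices for that property. CrossesLane is a hypothetical helper written for illustration; how the JIT actually makes this decision is not shown in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true when any in-range index reads from a different lane than
// the destination element it writes. `elementsPerLane` is the number of
// elements in one 128-bit lane (for example 4 for 32-bit elements).
// Hypothetical helper: a model of the property, not JIT code.
bool CrossesLane(const std::vector<std::size_t>& indices, std::size_t elementsPerLane)
{
    assert(elementsPerLane > 0 && indices.size() % elementsPerLane == 0);

    for (std::size_t i = 0; i < indices.size(); ++i)
    {
        if (indices[i] >= indices.size())
        {
            continue; // out of range: no source element, so no source lane
        }
        if (indices[i] / elementsPerLane != i / elementsPerLane)
        {
            return true; // source and destination sit in different lanes
        }
    }
    return false;
}
```

For eight 32-bit elements, the indices {1, 0, 3, 2, 5, 4, 7, 6} stay within their lanes, while {4, 5, 6, 7, 0, 1, 2, 3} swap the two lanes and so fall into the case this PR leaves unaccelerated.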


Longer term, we'll want to introduce a GenTreeVecCon (for parity with GenTreeDblCon and GenTreeIntCon) and accelerate these additional scenarios.

Additionally, we'll want to expose the 3-parameter shuffle APIs of the form Vector128<T> Shuffle(Vector128<T> lower, Vector128<T> upper, Vector128<TInteger> indices), which will come in a follow-up PR.

@dotnet-issue-labeler

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, to please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) and new-api-needs-documentation labels Apr 26, 2022
@ghost ghost assigned tannergooding Apr 26, 2022
@ghost

ghost commented Apr 26, 2022

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.


@tannergooding
Member Author

This is part, but not all, of the remaining work on #63331, which is important to get in for the .NET 7 Arm64 work.

@tannergooding tannergooding added this to the 7.0.0 milestone Apr 26, 2022
@tannergooding
Member Author

/azp run runtime-coreclr jitstress-isas-x86, runtime-coreclr jitstress-isas-arm, runtime-coreclr outerloop

@azure-pipelines

Azure Pipelines successfully started running 3 pipeline(s).

@tannergooding
Member Author

/azp run runtime-coreclr jitstress-isas-x86, runtime-coreclr jitstress-isas-arm, runtime-coreclr outerloop

@azure-pipelines

Azure Pipelines successfully started running 3 pipeline(s).

@tannergooding
Member Author

/azp run runtime-coreclr jitstress-isas-x86, runtime-coreclr jitstress-isas-arm, runtime-coreclr outerloop

@azure-pipelines

Azure Pipelines successfully started running 3 pipeline(s).

@tannergooding tannergooding marked this pull request as ready for review April 27, 2022 13:58
@tannergooding
Member Author

This is ready for review. Failures are unrelated to this PR, but I'll log issues for the ones that are obvious actual problems.

@tannergooding
Member Author

CC. @dotnet/jit-contrib


simdBaseJitType = varTypeIsUnsigned(simdBaseType) ? CORINFO_TYPE_UBYTE : CORINFO_TYPE_BYTE;

GenTree* op1Dup = fgMakeMultiUse(&op1);
Member

Have you tested the case where op1 is something complex so that fgMakeMultiUse needs to create a temp?
I think you need to change fgMakeMultiUse to accept a CORINFO_CLASS_HANDLE that it can pass along to fgInsertCommaFormTemp.

Member Author

I see. I'll update to allow passing that down and will add another test that explicitly covers this.
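
The concern in this exchange can be illustrated with a toy model. When an operand is used twice, a plain local can simply be cloned, but a complex operand must be evaluated once into a temp, and a struct-typed temp needs to know its class. The sketch below is not the JIT's code: Node, MakeNode, and MakeMultiUse are simplified stand-ins invented for illustration, and the string classHandle stands in for a CORINFO_CLASS_HANDLE.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Toy expression node. This is NOT the JIT's GenTree; all names are illustrative.
struct Node
{
    enum class Kind { Local, Call, Assign, Comma };

    Kind kind;
    std::string name;        // local name or callee name
    std::string classHandle; // stand-in for a CORINFO_CLASS_HANDLE; empty means unknown
    std::shared_ptr<Node> op1;
    std::shared_ptr<Node> op2;
};

using NodePtr = std::shared_ptr<Node>;

NodePtr MakeNode(Node::Kind kind, std::string name, std::string classHandle,
                 NodePtr op1 = nullptr, NodePtr op2 = nullptr)
{
    return std::make_shared<Node>(
        Node{kind, std::move(name), std::move(classHandle), std::move(op1), std::move(op2)});
}

// Returns a second use of `use`. A plain local is cloned. Anything else is
// stored to a fresh temp exactly once: `use` is rewritten to the pair
// (temp = original, temp) and another read of the temp is returned.
// The class handle is passed in so the temp carries the right struct type.
NodePtr MakeMultiUse(NodePtr& use, const std::string& classHandle, int& nextTempId)
{
    assert(use != nullptr);

    if (use->kind == Node::Kind::Local)
    {
        return std::make_shared<Node>(*use); // cheap to duplicate, no temp needed
    }

    const std::string tempName = "tmp" + std::to_string(nextTempId++);
    NodePtr store = MakeNode(Node::Kind::Assign, "", classHandle,
                             MakeNode(Node::Kind::Local, tempName, classHandle), use);
    use = MakeNode(Node::Kind::Comma, "", classHandle, store,
                   MakeNode(Node::Kind::Local, tempName, classHandle));
    return MakeNode(Node::Kind::Local, tempName, classHandle);
}
```

In this model, dropping the classHandle argument would leave every temp with an empty handle, which mirrors the gap the reviewer points out for operands that cannot simply be cloned.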

Member

@jakobbotsch jakobbotsch left a comment

I'm not very familiar with these intrinsics, but given the extensive tests this LGTM.
