-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding the 2-parameter xplat shuffle helpers and accelerating them #68559
Conversation
Note regarding the This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, to please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change. |
Tagging subscribers to this area: @JulieLeeMSFT Issue DetailsThis adds the This doesn't provide acceleration for non-constants or for cross-lane operations on
|
This is part, but not all, of the remaining work on #63331 which is important to get in for the .NET 7 Arm64 work. |
/azp run runtime-coreclr jitstress-isas-x86, runtime-coreclr jitstress-isas-arm, runtime-coreclr outerloop |
Azure Pipelines successfully started running 3 pipeline(s). |
/azp run runtime-coreclr jitstress-isas-x86, runtime-coreclr jitstress-isas-arm, runtime-coreclr outerloop |
Azure Pipelines successfully started running 3 pipeline(s). |
… but not handle it as an intrinsic
60b40d2
to
3e635bd
Compare
/azp run runtime-coreclr jitstress-isas-x86, runtime-coreclr jitstress-isas-arm, runtime-coreclr outerloop |
Azure Pipelines successfully started running 3 pipeline(s). |
This is ready for review. Failures are unrelated to this PR, but I'll log issues for the ones that are obvious actual problems. |
CC. @dotnet/jit-contrib |
src/coreclr/jit/gentree.cpp
Outdated
|
||
simdBaseJitType = varTypeIsUnsigned(simdBaseType) ? CORINFO_TYPE_UBYTE : CORINFO_TYPE_BYTE; | ||
|
||
GenTree* op1Dup = fgMakeMultiUse(&op1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you tested the case where op1
is something complex so that fgMakeMultiUse
needs to create a temp?
I think you need to change fgMakeMultiUse
to accept a CORINFO_CLASS_HANDLE
that it can pass along to fgInsertCommaFormTemp
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I'll update to allow passing that down and will add another test that explicitly covers this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not very familiar with these intrinsics, but given the extensive tests this LGTM
This adds the
2 parameter
shuffle APIs of the formVector128<T> Shuffle(Vector128<T> value, Vector128<TInteger> indices)
and provides basic acceleration for these on x86, x64, and Arm64.This doesn't provide acceleration for non-constants or for cross-lane operations on
Vector256<T>
x86/x64 since that requires additional complexity that would delay the baseline support.Longer term, we'll want to introduce a
GenTreeVecCon
(to parityGenTreeDblCon
andGenTreeIntCon
) and accelerate these additional scenarios.Additionally we'll want to expose the
3 parameter
shuffle APIs of the formVector128<T> Shuffle(Vector128<T> lower, Vector128<T> upper, Vector128<TInteger> indices)
, which will come in a follow up PR.