-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable EVEX feature: Embedded Rounding for Avx512F.Add() #94684
Conversation
Note regarding the This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change. |
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsDescriptionEnabled EVEX feature: embedded rounding in JIT backend, and exposed only 1 related API:
to establish the complete compiling path starting from importation. DetailsTo tag the related AST node/instruction is embedded rounding enabled, we introduced some new flags:
Follow-up in this PRThis PR, at current stage, is intended to show and discuss the design with maintainers, we are open to adjust and improve the designs. And we will need to add some unit tests to better cover this feature and related APIs. Follow-up after this PRAs this PR only exposed 1 embedded rounding related API, we will make another follow-up PR to expose other related APIs altogether. Since embedded rounding is mostly compatible with arithmetic and casting intrinsics, some more emit path will be impacted, we expect to extend the same design to different emit path if needed.
|
Failures are known, will apply the format patch later. Hi @tannergooding, I think we can start to have some discussion on the design of embedded rounding. Would you please take a look at this draft? Thanks! The impact on the throughput is less than I expected considering the extra bits I introduced in the |
src/tests/JIT/HardwareIntrinsics/X86_Avx512/Avx512F/EmbeddedRounding.Double.cs
Outdated
Show resolved
Hide resolved
1. fix typo in commnets 2. Add SetEvexBroadcastIfNeeded 3. Added mask in insOpts
2. removed round-to-even, the default option from InsOpts as it will be handled on the default path.
732cf72
to
07b7d21
Compare
Rebased the branch, running the pipeline tests again. I'll push the generated format patch shortly. For the tests, we are working internally on templates to cover all the embedded rounding APIs, I can push a commit with the template and cover all the Will the plan sound good to you?
|
…ol byte is not constant. 2. Create a template to generate the unit tests for embedded rounding APIs. 3. nit: fix naming.
Updates:
The expected test results were calculated on i9-9980XE with a C++ program. Edit: Failures look irrelevant to the changes. Hi @tannergooding, format patch and unit tests have been applied. There should be no other blockers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CC. @dotnet/jit-contrib for secondary review
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some small nits/questions/requests
src/coreclr/jit/emit.h
Outdated
@@ -774,13 +774,13 @@ class emitter | |||
unsigned _idCallAddr : 1; // IL indirect calls: can make a direct call to iiaAddr | |||
unsigned _idNoGC : 1; // Some helpers don't get recorded in GC tables | |||
#if defined(TARGET_XARCH) | |||
unsigned _idEvexbContext : 1; // does EVEX.b need to be set. | |||
#endif // TARGET_XARCH | |||
unsigned _idEvexbContext : 2 // Does Evex.b need to be set for embedded broadcast/embedded rounding. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing a semicolon (probably got lucky that there is a "stray" semicolon below on CLANG_FORMAT_COMMENT_ANCHOR
unsigned _idEvexbContext : 2 // Does Evex.b need to be set for embedded broadcast/embedded rounding. | |
unsigned _idEvexbContext : 2; // Does Evex.b need to be set for embedded broadcast/embedded rounding. |
Also, "Does Evex.b need to be set" makes it sound like this is a 0/1, false/true boolean. Then why does it need 2 bits? I think you should expand the comment to state what the two bits are used for. (You don't need to put the comment all on this one line; make it a multi-line comment above this line if necessary)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the comments!
_idEvexbContext
is used to differentiate several features: normal intrinsics, embedded broadcast, embedded rounding. In the normal or embedded broadcast cases, the semantic for Evex.L'L
remains the same, intrinsic vector length, while in embedded rounding case, the semantic changes to indicate the rounding mode. _idEvexbContext
will be used to inform emitter to specially handle the Evex.L'L
in the embedded rounding scenario. I will properly document this part in the comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A comment like you just wrote would be great.
src/coreclr/jit/emit.h
Outdated
// x86: 53/49 bits | ||
// amd64: 54/49 bits | ||
// x86: 54/50 bits | ||
// amd64: 55/51 bits |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "51" should be "50" AFAICT
// amd64: 55/51 bits | |
// amd64: 55/50 bits |
@@ -1542,11 +1542,31 @@ class emitter | |||
{ | |||
return _idEvexbContext != 0; | |||
} | |||
void idSetEvexbContext() | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function above, idIsEvexbContext()
sounds odd. Should it be renamed idIsEvexbContextSet()
?
src/coreclr/jit/emitxarch.cpp
Outdated
// GetEmbRoudingMode: Get the rounding mode for embedded rounding | ||
// | ||
// Arguments: | ||
// mode -- the flag from the correspoding gentree node indicating the mode. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit (typo)
// mode -- the flag from the correspoding gentree node indicating the mode. | |
// mode -- the flag from the corresponding GenTree node indicating the mode. |
src/coreclr/jit/emitxarch.cpp
Outdated
@@ -1139,6 +1139,29 @@ static bool isLowSimdReg(regNumber reg) | |||
#endif | |||
} | |||
|
|||
//------------------------------------------------------------------------ | |||
// GetEmbRoudingMode: Get the rounding mode for embedded rounding |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// GetEmbRoudingMode: Get the rounding mode for embedded rounding | |
// GetEmbRoundingMode: Get the rounding mode for embedded rounding |
src/coreclr/jit/gentree.h
Outdated
GTF_HW_ER_MASK = 0x30000000, // Bits used by handle types below | ||
GTF_HW_ER_TOEVEN = 0x00000000, // GT_HWINTRINSIC -- embedded rounding mode: ToEven (Default). | ||
GTF_HW_ER_TONEGATIVEINFINITY = 0x10000000, // GT_HWINTRINSIC -- embedded rounding mode: ToNegativeInfinity. | ||
GTF_HW_ER_TOPOSITIVEINFINITY = 0x20000000, // GT_HWINTRINSIC -- embedded rounding mode: ToPositiveInfinity. | ||
GTF_HW_ER_TOZERO = 0x30000000, // GT_HWINTRINSIC -- embedded rounding mode: ToZero. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few notes:
- please align the
=
- Add underscores between words (otherwise it's to easy to read "TOE...", "TONE...", "TOP..."
GTF_HW_ER_MASK = 0x30000000, // Bits used by handle types below | |
GTF_HW_ER_TOEVEN = 0x00000000, // GT_HWINTRINSIC -- embedded rounding mode: ToEven (Default). | |
GTF_HW_ER_TONEGATIVEINFINITY = 0x10000000, // GT_HWINTRINSIC -- embedded rounding mode: ToNegativeInfinity. | |
GTF_HW_ER_TOPOSITIVEINFINITY = 0x20000000, // GT_HWINTRINSIC -- embedded rounding mode: ToPositiveInfinity. | |
GTF_HW_ER_TOZERO = 0x30000000, // GT_HWINTRINSIC -- embedded rounding mode: ToZero. | |
GTF_HW_ER_MASK = 0x30000000, // Bits used by handle types below | |
GTF_HW_ER_TO_EVEN = 0x00000000, // GT_HWINTRINSIC -- embedded rounding mode: FloatRoundingMode = ToEven (Default) "{rn-sae}" | |
GTF_HW_ER_TO_NEGATIVE_INFINITY = 0x10000000, // GT_HWINTRINSIC -- embedded rounding mode: FloatRoundingMode = ToNegativeInfinity "{rd-sae}" | |
GTF_HW_ER_TO_POSITIVE_INFINITY = 0x20000000, // GT_HWINTRINSIC -- embedded rounding mode: FloatRoundingMode = ToPositiveInfinity "{ru-sae}" | |
GTF_HW_ER_TO_ZERO = 0x30000000, // GT_HWINTRINSIC -- embedded rounding mode: FloatRoundingMode = ToZero "{rz-sae}" |
src/coreclr/jit/lowerxarch.cpp
Outdated
@@ -1055,6 +1055,70 @@ GenTree* Lowering::LowerHWIntrinsic(GenTreeHWIntrinsic* node) | |||
|
|||
NamedIntrinsic intrinsicId = node->GetHWIntrinsicId(); | |||
|
|||
if (HWIntrinsicInfo::IsEmbRoundingCompatible(intrinsicId)) | |||
{ | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: unnecessary extra line
src/coreclr/jit/lowerxarch.cpp
Outdated
// this | ||
// point. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// this | |
// point. | |
// this point. |
assert(gtOper == GT_HWINTRINSIC); | ||
gtFlags &= ~GTF_HW_ER_MASK; | ||
} | ||
void SetEmbRoundingMode(uint8_t mode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add function comments? In particular, it would be nice to know that mode
is one of the values from System.Runtime.Intrinsics.X86.FloatRoundingMode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
default: | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be:
default: | |
break; | |
case 0: | |
break; | |
default: | |
unreached(); |
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new test failures come from this change:
When we call the embedded rounding APIs in the reflection call, JIT will generate a jump table during the emit stage, this jump table is essentially a switch case containing all the possible results under different entry values, say in this case 0~11 (by the existing design, the jump table will iterate from 0 to the max value we set, say here 11), so even this function is supposed to accept 8~11 only, there are case when it might take 0~7.
Based on the given information, do we consider revert the changes, or do we want to modify the jump table generation logic a bit?
p.s. I made a mistake setting the upper bound to be 12, that will take a small fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks okay to revert the changes, and let this function accept those unexpected values.
For reference, by the current design, Avx2.GatherMaskVector128
is also generating a jump table containing all the results under entry values from 0~8, while this intrinsic only expect 1,2,4,8. Seems, those unexpected value shall be blocked out on the language level, say we don't define values other than those in the System.Runtime.Intrinsics.X86.FloatRoundingMode Enum.
And for the generated results with the unexpected values would be not "harmful" as they will be the results under the default rounding mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, so the generic jump table code generates options for all values [0..maxValue]
for some API-specific maximum value, but some APIs (like this one) have a much more specific, smaller set of expected values. Seems ok to leave your code as it was -- maybe document (comment) why the switch allows "unexpected" values.
@Ruihan-Yin Looks like there are some test failures:
https://dev.azure.com/dnceng-public/public/_build/results?buildId=532443&view=ms.vss-test-web.build-test-results-tab |
Yes, please see my comment here. |
let SetEmbRoundingMode accept unexpected values to accomadate the jump table generatation logics.
Thanks for all the reviews and help! |
* some workaround with embedded rounding in compiler backend. * extend _idEvexbContext to 2bit to distinguish embedded broadcast and embedded rounding * Expose APIs with rounding mode. * Apply format patch * Do not include the third parameter in Avx512.Add(left, right) * split _idEvexbContext bits and made a explicit convert function from uint8_t to insOpts for embedded rounding mode. * Remove unexpected comment-out * Fix unexpected deletion * resolve comments: removed redundent bits in instDesc for EVEX.b context. Introduced `emitDispEmbRounding` to display the embedded rounding feature in the disassembly. * bug fix: fix un-needed assertion check. * Apply format patch. * Resolve comments: merge INS_OPTS_EVEX_b and INS_OPTS_EVEX_er_rd Do a pre-check for embedded rounding before lowering. * Add a helper function to generalize the logic when lowering the embedded rounding intrinsics. * Resolve comments: 1. fix typo in commnets 2. Add SetEvexBroadcastIfNeeded 3. Added mask in insOpts * 1. Add unit case for non-default rounding mode 2. removed round-to-even, the default option from InsOpts as it will be handled on the default path. * formatting * 1. Create a fallback jump table for embedded rounding APIs when control byte is not constant. 2. Create a template to generate the unit tests for embedded rounding APIs. 3. nit: fix naming. * remove hand-written unit tests for embedded rounding. * formatting * Resolve comments. * formatting * revert changes: let SetEmbRoundingMode accept unexpected values to accomadate the jump table generatation logics.
Description
Enabled EVEX feature: embedded rounding in JIT backend, and exposed only 1 related API:
public static Vector512<double> Add(Vector512<double> left, Vector512<double> right, FloatRoundingMode mode);
to establish the complete compiling path starting from importation.
Details
To tag the related AST node/instruction is embedded rounding enabled, we introduced some new flags:
GTF_HW_ER_*
: As the APIs are designed in a way thatFloatRoundingMode
information is passed as an extra operand, we choose to let the AST node carry this operand untillowering
, and during lowering, the node will be converted back to the normal version, and the extra operand will be converted to a flag on the node. In this way, we can simply reuse the existing emit paths in the embedded rounding enabled cases.insOpts
to carry this extra information to inform emitter embedded rounding is on, so in theinstrDesc
, we made the following changes:_idEvexbContext
and extend it to 2 bit as_idIsEmbBroadcast
and_idIsEmbRounding
_idEmbRoundingMode
Follow-up in this PR
This PR, at current stage, is intended to show and discuss the design with maintainers, we are open to adjust and improve the designs.
And we will need to add some unit tests to better cover this feature and related APIs.
Follow-up after this PR
As this PR only exposed 1 embedded rounding related API, we will make another follow-up PR to expose other related APIs altogether. Since embedded rounding is mostly compatible with arithmetic and casting intrinsics, some more emit path will be impacted, we expect to extend the same design to different emit path if needed.