I added a new function -
The code in
@dotnet/jit-contrib PTAL
I am wondering if CreateTemporary is enough for #5867 or whether we need a more fundamental solution.
// stored in RAX := Quotient, RDX := Remainder.
// Move the result to the desired register, if necessary
if (oper == GT_DIV || oper == GT_UDIV)
// Signed divide RDX:RAX by r/m64, with result
You should update this comment, since this now applies to both signed and unsigned divide.
I can update the comment, but this isn't code that I changed. The GitHub diff viewer shows changes here because it lacks an "ignore whitespace changes" option.
"ignore whitespace changes"
If we add w=1 to the URL query, whitespace changes are ignored:
- diff view: https://github.com/dotnet/coreclr/pull/5871/files?w=1
- commit view: 651c419?w=1
See more neat tricks, like ts for tab size, here: http://git.io/sheet
if (isDiv)
{
    if ((type == TYP_INT && divisorValue == INT_MIN) ||
        (type == TYP_LONG && divisorValue == INT64_MIN))
I see from below that you have tests that exercise this case, but I wonder if this will require special support for 32-bit targets, since the MSIL relational operators always return an int, and I suspect the JIT assumes the same for the GT_EQ and similar operators.
Since this currently runs after decomposition, I don't think it will cause issues: for a 32-bit target there should not be any long relationals at this point. However, I guess that also means these transformations will be less effective for longs in general. (The woes of phase ordering.)
and I suspect the JIT assumes the same for the GT_EQ and similar operators
codegenxarch considers both TYP_INT and TYP_LONG to be valid: https://github.com/dotnet/coreclr/blob/master/src/jit/codegenxarch.cpp#L7161. It generates the same code for both, but that's normal given the way x64 zero-fills the upper bits of 64-bit registers.
I'm not familiar with ARM, but codegenarm64 seems to produce a 64-bit result anyway: https://github.com/dotnet/coreclr/blob/master/src/jit/instr.cpp#L307
for a 32-bit target. However, I guess that also means that these transformations will be less effective for longs, in general. (The woes of phase ordering.)
Yes, that's a bit of a problem. However, the way things are set up now, I don't think it actually matters, as morph will replace all long div/mod operations with helper calls. JIT32 doesn't do any optimization for long division by a power of 2 either; it always uses the helper.
The changes look good to me.
@mikedn - I am going to run these changes against the internal tests, but at the moment I am running into some technical difficulties. I hope to get them resolved soon. I'd really like to see this merged.
Regarding I can't say I like the solution but I don't think there are many alternatives:
I completed the internal tests, and they look good. I did a very limited analysis of the diffs: this clearly catches more cases. It mostly results in larger code, as expected, and uses more temps (because new ones are introduced in rationalize), but the code is actually smaller in some cases because the operands are no longer forced into fixed registers.
Optimizing GT_DIV/GT_UDIV/GT_MOD/GT_UMOD by a power of 2 in codegen is problematic because the xarch DIV instruction has special register requirements. By the time codegen decides to perform the optimization, the rax and rdx registers have already been allocated by LSRA, even though they're not always needed (as happens with unsigned division, where CDQ isn't used). Since the JIT can't represent a CDQ instruction in its IR, an arithmetic shift (GT_RSH) is used instead to extract the sign of the dividend. xarch's SAR is larger than CDQ, but it has the advantage that it doesn't require specific registers. Also, arithmetic shifts are available on architectures other than xarch.

Example: the method "static int foo(int x) => x / 8;" is now compiled to

    mov eax, ecx
    mov edx, eax
    sar edx, 31
    and edx, 7
    add edx, eax
    mov eax, edx
    sar eax, 3

instead of

    mov eax, ecx
    cdq
    and edx, 7
    add eax, edx
    sar eax, 3

As a side effect of this change, the optimization now also works when the divisor is too large to be contained. Previously this wasn't possible because the divisor constant needed to be modified during codegen, but by then the constant had already been loaded into a register.

Example: the method "static ulong foo(ulong x) => x / 4294967296;" is now compiled to

    mov rax, rcx
    shr rax, 32

whereas before a DIV instruction was used.

This change also fixes an issue in fgShouldUseMagicNumberDivide. The optimization done in lower can handle negative power-of-2 divisors, but fgShouldUseMagicNumberDivide still claimed those cases for magic-number division because it didn't check the absolute value of the divisor.

Example: the method "static int foo(int x) => x / -2;" is now compiled to

    mov eax, ecx
    mov edx, eax
    shr edx, 31
    add edx, eax
    sar edx, 1
    mov eax, edx
    neg eax

instead of

    mov eax, 0x7FFFFFFF
    imul edx:eax, ecx
    mov eax, edx
    sub eax, ecx
    mov edx, eax
    shr edx, 31
    add eax, edx
Sure, I squashed it down to 2 commits: one for the tests and one for the implementation.
BTW, is there a way to ask LSRA to allocate RAX/RDX, but only if they happen to be available? If we could ask for the source of the sign-extending shift to be in RAX and the destination in RDX, then codegen could change
Thanks for the review!
Currently, register preferences (vs. requirements) are only set on lclVar intervals. The
// Perform the 'targetType' (64-bit or 32-bit) divide instruction
instruction ins;
if (oper == GT_UMOD || oper == GT_UDIV)
    ins = INS_div;
This needs braces, i.e. the following. I would also add a comment saying that INS_div is an unsigned divide, as I have always found this mnemonic confusing on x86:
{
ins = INS_div; // unsigned divide
}
else
{
ins = INS_idiv; // signed divide
}
Looks good.
// const divisor into equivalent but faster sequences.
//
// Arguments:
//    pTree: pointer to the parent node's link to the node we care about
This function header needs to be updated, as this method now has an additional argument, 'data'.
Looks good.
I ran asmdiffs for arm64 and took a brief look at them. They are similar to those for amd64. I'm going to merge this in the interest of forward progress. @mikedn - could you do a follow-up PR to address the feedback? @briansull - we may want to assess the perf implications for arm64; I don't know if the relative costs are similar.
Sure, though I'm not sure what the hurry is with this one :)
@mikedn - only that it's been so long, the outstanding feedback is relatively minor, and (IMO) it's easier to focus on just the new changes when they're in a new PR.
Improve div/mod by const power of 2
Commit migrated from dotnet/coreclr@3dfe85f
Same as #1241 with a bug fix
Fixes #1207