Add more optimizations for (https://github.com/dotnet/runtime/issues/61412) #74806
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
public class Issue61412
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static bool Equal0(int x) => (x & 1) == 0;
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static bool Equal1(int x) => (x & 1) == 1;
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static bool NotEqual1(int x) => (x & 1) != 1;
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static bool NotEqual0(int x) => (x & 1) != 0;
}
; Assembly listing for method Issue61412:Equal0(int):bool
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; Final local variable assignments
;
; V00 arg0 [V00,T00] ( 3, 3 ) int -> rcx single-def
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [rsp+00H] "OutgoingArgSpace"
;
; Lcl frame size = 0
G_M14579_IG01: ;; offset=0000H
;; size=0 bbWeight=1 PerfScore 0.00
G_M14579_IG02: ;; offset=0000H
8BC1 mov eax, ecx
F7D0 not eax
83E001 and eax, 1
;; size=7 bbWeight=1 PerfScore 0.75
G_M14579_IG03: ;; offset=0007H
C3 ret
;; size=1 bbWeight=1 PerfScore 1.00
; Total bytes of code 8, prolog size 0, PerfScore 2.55, instruction count 4, allocated bytes for code 8 (MethodHash=70dec70c) for method Issue61412:Equal0(int):bool
; ============================================================
; Assembly listing for method Issue61412:Equal1(int):bool
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; Final local variable assignments
;
; V00 arg0 [V00,T00] ( 3, 3 ) int -> rcx single-def
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [rsp+00H] "OutgoingArgSpace"
;
; Lcl frame size = 0
G_M54258_IG01: ;; offset=0000H
;; size=0 bbWeight=1 PerfScore 0.00
G_M54258_IG02: ;; offset=0000H
8BC1 mov eax, ecx
83E001 and eax, 1
;; size=5 bbWeight=1 PerfScore 0.50
G_M54258_IG03: ;; offset=0005H
C3 ret
;; size=1 bbWeight=1 PerfScore 1.00
; Total bytes of code 6, prolog size 0, PerfScore 2.10, instruction count 3, allocated bytes for code 6 (MethodHash=b52d2c0d) for method Issue61412:Equal1(int):bool
; ============================================================
; Assembly listing for method Issue61412:NotEqual1(int):bool
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; Final local variable assignments
;
; V00 arg0 [V00,T00] ( 3, 3 ) int -> rcx single-def
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [rsp+00H] "OutgoingArgSpace"
;
; Lcl frame size = 0
G_M3143_IG01: ;; offset=0000H
;; size=0 bbWeight=1 PerfScore 0.00
G_M3143_IG02: ;; offset=0000H
8BC1 mov eax, ecx
F7D0 not eax
83E001 and eax, 1
;; size=7 bbWeight=1 PerfScore 0.75
G_M3143_IG03: ;; offset=0007H
C3 ret
;; size=1 bbWeight=1 PerfScore 1.00
; Total bytes of code 8, prolog size 0, PerfScore 2.55, instruction count 4, allocated bytes for code 8 (MethodHash=b798f3b8) for method Issue61412:NotEqual1(int):bool
; ============================================================
; Assembly listing for method Issue61412:NotEqual0(int):bool
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; Final local variable assignments
;
; V00 arg0 [V00,T00] ( 3, 3 ) int -> rcx single-def
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [rsp+00H] "OutgoingArgSpace"
;
; Lcl frame size = 0
G_M35142_IG01: ;; offset=0000H
;; size=0 bbWeight=1 PerfScore 0.00
G_M35142_IG02: ;; offset=0000H
8BC1 mov eax, ecx
83E001 and eax, 1
;; size=5 bbWeight=1 PerfScore 0.50
G_M35142_IG03: ;; offset=0005H
C3 ret
;; size=1 bbWeight=1 PerfScore 1.00
; Total bytes of code 6, prolog size 0, PerfScore 2.10, instruction count 3, allocated bytes for code 6 (MethodHash=94c276b9) for method Issue61412:NotEqual0(int):bool
; ============================================================
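As a sanity check on the listings above, the two emitted instruction sequences can be mirrored in plain C++ and compared with the source expressions. This is only an illustrative sketch: the function names (`notAndOne`, `andOne`, `assertListingsMatchSource`) are made up for this example and are not part of the JIT or of the PR.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Source-level expressions from Issue61412.
bool equal0Src(int32_t x)    { return (x & 1) == 0; }
bool equal1Src(int32_t x)    { return (x & 1) == 1; }
bool notEqual1Src(int32_t x) { return (x & 1) != 1; }
bool notEqual0Src(int32_t x) { return (x & 1) != 0; }

// Mirrors "mov eax, ecx; not eax; and eax, 1" (Equal0 and NotEqual1 listings).
uint32_t notAndOne(int32_t x)
{
    uint32_t eax = static_cast<uint32_t>(x); // mov eax, ecx
    eax = ~eax;                              // not eax
    eax &= 1u;                               // and eax, 1
    return eax;
}

// Mirrors "mov eax, ecx; and eax, 1" (Equal1 and NotEqual0 listings).
uint32_t andOne(int32_t x)
{
    uint32_t eax = static_cast<uint32_t>(x); // mov eax, ecx
    eax &= 1u;                               // and eax, 1
    return eax;
}

// Asserts that each mirrored sequence agrees with its source expression
// on a handful of sample inputs, including the int32 extremes.
void assertListingsMatchSource()
{
    const int32_t samples[] = {0, 1, 2, 3, -1, -2, 42,
                               std::numeric_limits<int32_t>::min(),
                               std::numeric_limits<int32_t>::max()};
    for (int32_t x : samples)
    {
        assert(notAndOne(x) == (equal0Src(x) ? 1u : 0u));
        assert(notAndOne(x) == (notEqual1Src(x) ? 1u : 0u));
        assert(andOne(x) == (equal1Src(x) ? 1u : 0u));
        assert(andOne(x) == (notEqual0Src(x) ? 1u : 0u));
    }
}
```

Calling `assertListingsMatchSource()` from a test harness exercises all four methods; the sample set is arbitrary and could be widened to a full loop over `int32_t` if desired.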
Can someone review this please?
LGTM. The optimization seems quite conservative about the surrounding code, but it was like that before your changes.
Any idea why we don't see arm64 diffs? Is this already handled there?
Thanks! I guess because of this check? Can't say for sure.
GenTree* Lowering::OptimizeConstCompare(GenTree* cmp)
{
    assert(cmp->gtGetOp2()->IsIntegralConst());
#if defined(TARGET_XARCH) || defined(TARGET_ARM64)
    GenTree*       op1      = cmp->gtGetOp1();
    GenTreeIntCon* op2      = cmp->gtGetOp2()->AsIntCon();
    ssize_t        op2Value = op2->IconValue();
#ifdef TARGET_ARM64 // <---
    // Do not optimise further if op1 has a contained chain.
    if (op1->OperIs(GT_AND) &&
        (op1->gtGetOp1()->isContainedAndNotIntOrIImmed() || op1->gtGetOp2()->isContainedAndNotIntOrIImmed()))
    {
        return cmp;
    }
#endif
    ///...
}
@En3Tho oh, interesting. If you want, you can remove that ifdef so we can see SPMI diffs on CI as part of this PR.
@EgorBo Sure. Let's see what will break :D
One of the failures is #76041. I'm not sure what those "Push work item to Helix" failures mean. Is that purely a CI problem? Also, should SPMI for arm be triggered manually? Am I just missing the arm results, or are there none? UPD: arm has regressed, so I'm reverting that check back.
@En3Tho thanks!
Closes #61412
Enhances #73120 by transforming (X & 1) == 0 to ((NOT X) & 1), in addition to the (X & 1) != 0 to (X & 1) transformation.
The == 1 and != 1 cases are supported too, because #73120 transforms them into comparisons against 0.
Please correct me if anything is wrong; I'm a newbie.
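The rewrite described above can be modeled as two small steps: fold a comparison against 1 into the opposite comparison against 0, then decide whether the operand needs a NOT before the `& 1`. The following is a minimal C++ sketch of that decision under those assumptions. It does not use the JIT's `GenTree` API; `Cmp`, `BitTest`, `rewriteBitCompare`, `evaluate`, and `reference` are hypothetical names invented for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the comparison operator of (X & 1) <cmp> <constant>.
enum class Cmp { Eq, Ne };

// Hypothetical result of the rewrite: the compare is gone, and the only
// remaining choice is whether X is inverted before "& 1".
struct BitTest
{
    bool invertOperand;
};

// Models the rewrite for (X & 1) <cmp> <constant>, with constant in {0, 1}.
BitTest rewriteBitCompare(Cmp cmp, int constant)
{
    assert(constant == 0 || constant == 1);

    // Step 1: "== 1" becomes "!= 0" and "!= 1" becomes "== 0".
    if (constant == 1)
    {
        cmp = (cmp == Cmp::Eq) ? Cmp::Ne : Cmp::Eq;
    }

    // Step 2: "!= 0" keeps (X & 1); "== 0" becomes ((NOT X) & 1).
    return BitTest{cmp == Cmp::Eq};
}

// Evaluates the rewritten form on a concrete value of X.
uint32_t evaluate(BitTest test, int32_t x)
{
    uint32_t value = static_cast<uint32_t>(x);
    if (test.invertOperand)
    {
        value = ~value;
    }
    return value & 1u;
}

// Evaluates the original comparison directly, for cross-checking.
bool reference(Cmp cmp, int constant, int32_t x)
{
    const int bit = x & 1;
    return (cmp == Cmp::Eq) ? (bit == constant) : (bit != constant);
}
```

For example, `rewriteBitCompare(Cmp::Eq, 0)` asks for the inverted operand, which corresponds to the `not eax` / `and eax, 1` sequence in the Equal0 listing, while `rewriteBitCompare(Cmp::Eq, 1)` does not.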