Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: Optimize movzx after setcc #66245

Merged
merged 6 commits into from
Mar 7, 2022
Merged

Conversation

EgorBo
Copy link
Member

@EgorBo EgorBo commented Mar 5, 2022

Clear target reg via xor reg, reg instead of relying on zero-extend after SETCC which is heavier and slower

bool Test(int x) => x == 42;
; Method Test(int):bool:this
G_M55728_IG01:              ;; offset=0000H
G_M55728_IG02:              ;; offset=0000H
+      33C0                 xor      eax, eax
       83FA2A               cmp      edx, 42
       0F94C0               sete     al
-      0FB6C0               movzx    rax, al
G_M55728_IG03:              ;; offset=0009H
       C3                   ret      
-; Total bytes of code: 10
+; Total bytes of code: 9

Nice diffs:

benchmarks.run.Linux.x64.checked.mch:
Total bytes of delta: -5230 (-0.03 % of base)

coreclr_tests.pmi.Linux.x64.checked.mch:
Total bytes of delta: -210860 (-0.16 % of base)

libraries.crossgen2.Linux.x64.checked.mch:
Total bytes of delta: -4420 (-0.06 % of base)

libraries.pmi.Linux.x64.checked.mch:
Total bytes of delta: -25312 (-0.05 % of base)

libraries_tests.pmi.Linux.x64.checked.mch:
Total bytes of delta: -49314 (-0.04 % of base)

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 5, 2022
@ghost ghost assigned EgorBo Mar 5, 2022
@ghost
Copy link

ghost commented Mar 5, 2022

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

Clear target reg via xor reg, reg instead of relying on zero-extend after SETCC which is heavier and slower

bool Test(int x) => x == 42;
; Method Test(int):bool:this
G_M55728_IG01:              ;; offset=0000H
G_M55728_IG02:              ;; offset=0000H
+      33C0                 xor      eax, eax
       83FA2A               cmp      edx, 42
       0F94C0               sete     al
-      0FB6C0               movzx    rax, al
G_M55728_IG03:              ;; offset=0009H
       C3                   ret      
; Total bytes of code: 10

Local diffs were promising...

Author: EgorBo
Assignees: EgorBo
Labels:

area-CodeGen-coreclr

Milestone: -

@EgorBo
Copy link
Member Author

EgorBo commented Mar 5, 2022

it's not safe apparently, xor clears flags

@EgorBo EgorBo closed this Mar 5, 2022
@jakobbotsch
Copy link
Member

it's not safe apparently, xor clears flags

In which cases did it matter? Doesn't all of these cases emit a cmp after?

@EgorBo
Copy link
Member Author

EgorBo commented Mar 5, 2022

it's not safe apparently, xor clears flags

In which cases did it matter? Doesn't all of these cases emit a cmp after?

I wasn't able to detect - tests were keeping failing, I suspect some complex logic in emitInsBinary?

@EgorBo
Copy link
Member Author

EgorBo commented Mar 5, 2022

Ah, I think I've found out why

@EgorBo EgorBo reopened this Mar 5, 2022
@EgorBo
Copy link
Member Author

EgorBo commented Mar 5, 2022

My understanding that it used to crash on things like

xor      eax, eax
cmp      dword ptr [rax+8], 0

the fix is quite conservative

@EgorBo
Copy link
Member Author

EgorBo commented Mar 6, 2022

cc @dotnet/jit-contrib @jakobbotsch

Copy link
Member

@jakobbotsch jakobbotsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. What do the diffs look like if you do something less conservative? e.g.

regMaskTP GenTree::gtGetContainedRegMask()
{
    regMaskTP mask = gtGetRegMask();
    for (GenTree* operand : Operands())
    {
        if (operand->isContained())
        {
            mask |= operand->gtGetContainedRegMask();
        }
    }

    return mask;
}

and do the optimization unless (op1->gtGetContainedRegMask() | op2->gtGetContainedRegMask()) & genRegMask(targetReg)? (not sure if that's completely right)

@kasperk81
Copy link
Contributor

Nice diffs:

nice. and regressions are just cse diffs or something more interesting?

@EgorBo
Copy link
Member Author

EgorBo commented Mar 7, 2022

Nice diffs:

nice. and regressions are just cse diffs or something more interesting?

No, it's too late for CSE, from my understanding it happens because we used to end up with:

cmp      edx, 42
sete     al
movzx    rax, al
movzx    rax, al

where the 2nd mov was optimized out via RedundantMov

 -- suppressing mov because last instruction already moved from src to dst register.

with my PR we end up with:

xor      eax, eax
cmp      edx, 42
sete     al
movzx    rax, al

and we don't benefit from RedundantMov here.

it can be fixed I guess, but not as part of this PR

Repro:

public enum ResultCode
{
    Success = 0,
    SaslBindInProgress = 14,
    NoSuchAttribute = 16,
    InvalidAttributeSyntax = 21,
    NoSuchObject = 32,
    InvalidDNSyntax = 34,
    AliasDereferencingProblem = 36,
    InappropriateAuthentication = 48,
}

internal static bool IsResultCode(ResultCode code)
{
    if (code >= ResultCode.Success && code <= ResultCode.SaslBindInProgress)
    {
        return true;
    }

    if (code >= ResultCode.NoSuchAttribute && code <= ResultCode.InvalidAttributeSyntax)
    {
        return true;
    }

    if (code >= ResultCode.NoSuchObject && code <= ResultCode.InvalidDNSyntax)
    {
        return true;
    }

    return (code == ResultCode.AliasDereferencingProblem || 
            code == ResultCode.InappropriateAuthentication);
}

Comment on lines 875 to 883
regMaskTP GenTree::gtGetContainedRegMask()
{
regMaskTP mask = GetRegNum() != REG_NA ? gtGetRegMask() : 0;
for (GenTree* operand : Operands())
{
mask |= operand->gtGetContainedRegMask();
}
return mask;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still seems more conservative than necessary, doesn't it only need to or in the operand regs if it itself is contained?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I.e. I guess this is a more correct version than my original:

regMaskTP GenTree::gtGetContainedRegMask()
{
    if (!isContained())
        return gtGetRegMask();

    regMaskTP mask = 0;
    for (GenTree* operand : Operands())
    {
        mask |= operand->gtGetContainedRegMask();
    }
    return mask;
}

@kasperk81
Copy link
Contributor

thanks. overall it is nice win (regression in bignumber format method is a red-herring).
c++ compiler generated interesting asm with that repro (almost half the no. of lines)

IsResultCode(ResultCode):
        cmp     edi, 34
        ja      .L2
        movabs  rdx, -30068932608
        mov     eax, 1
        bt      rdx, rdi
        jc      .L2
        ret
.L2:
        cmp     edi, 36
        sete    al
        cmp     edi, 48
        sete    dl
        or      eax, edx
        ret

@EgorBo
Copy link
Member Author

EgorBo commented Mar 7, 2022

c++ compiler generated interesting asm with that repr

Yeah, the bitmask hack is a nice thing to have, e.g. here it's used in C# https://github.com/dotnet/runtime/blob/main/src%2Flibraries%2FSystem.Private.CoreLib%2Fsrc%2FSystem%2FRuntime%2FCompilerServices%2FRuntimeHelpers.cs#L87-L89 (because jit is not yet able to optimize it)

@EgorBo
Copy link
Member Author

EgorBo commented Mar 7, 2022

@jakobbotsch thank you, diffs are even bigger now

@ghost ghost locked as resolved and limited conversation to collaborators Apr 7, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants