emit branchless form of (i >= 0 && j >= 0)/(i!=0&& j!= 0) for signed integers #62689

Merged
merged 6 commits into from
Oct 31, 2022

Conversation

pedrobsaila
Contributor

Fixes #61940
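As a quick sanity check of the identity behind the title: for signed integers the sign bit of `x | y` is set exactly when the sign bit of `x` or of `y` is set, so `(x | y) >= 0` holds exactly when `x >= 0 && y >= 0`. The C++ sketch below compares a two-branch form with the branchless form exhaustively over all 8-bit pairs and on a few 32-bit edge values. The function names are made up for illustration; this is not JIT code.

```cpp
#include <cassert>
#include <cstdint>

// Two-branch form: roughly the shape of `i >= 0 && j >= 0` before the fold.
static bool bothNonNegativeBranchy(int32_t x, int32_t y)
{
    if (x < 0)
        return false;
    if (y < 0)
        return false;
    return true;
}

// Branchless form: the sign bit of (x | y) is the OR of the two sign bits.
static bool bothNonNegativeBranchless(int32_t x, int32_t y)
{
    return (x | y) >= 0;
}

// Compares both forms on every pair of 8-bit values (widened to int32_t)
// and returns how many pairs disagree.
static int countMismatches()
{
    int mismatches = 0;
    for (int a = INT8_MIN; a <= INT8_MAX; a++)
        for (int b = INT8_MIN; b <= INT8_MAX; b++)
            if (bothNonNegativeBranchy(a, b) != bothNonNegativeBranchless(a, b))
                mismatches++;
    return mismatches;
}

// Spot-checks the 32-bit extremes, where the sign bit is bit 31.
static void checkEdgeCases()
{
    const int32_t values[] = {INT32_MIN, -1, 0, 1, INT32_MAX};
    for (int32_t x : values)
        for (int32_t y : values)
            assert(bothNonNegativeBranchy(x, y) == bothNonNegativeBranchless(x, y));
}
```

Called from a `main`, `countMismatches()` returns 0 and `checkEdgeCases()` completes without an assertion failure.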

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Dec 12, 2021
@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Dec 12, 2021
@ghost

ghost commented Dec 12, 2021

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

Fixes #61940

Author: pedrobsaila
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

Comment on lines 8162 to 8163
assert(m_testInfo1.compTree->gtOper == GT_EQ || m_testInfo1.compTree->gtOper == GT_NE ||
m_testInfo1.compTree->gtOper == GT_LT || m_testInfo1.compTree->gtOper == GT_GE);
Contributor


Suggested change
assert(m_testInfo1.compTree->gtOper == GT_EQ || m_testInfo1.compTree->gtOper == GT_NE ||
m_testInfo1.compTree->gtOper == GT_LT || m_testInfo1.compTree->gtOper == GT_GE);
assert(m_testInfo1.compTree->OperIs(GT_EQ, GT_NE, GT_LT, GT_GE));


if ((cond->gtOper != GT_EQ) && (cond->gtOper != GT_NE))
if ((cond->gtOper != GT_EQ) && (cond->gtOper != GT_NE) && (cond->gtOper != GT_LT) && (cond->gtOper != GT_GE))
Contributor


Suggested change
if ((cond->gtOper != GT_EQ) && (cond->gtOper != GT_NE) && (cond->gtOper != GT_LT) && (cond->gtOper != GT_GE))
if (!cond->OperIs(GT_EQ, GT_NE, GT_LT, GT_GE))
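The suggested `OperIs` calls replace chains of `gtOper` comparisons with a single membership test; by De Morgan's law, `!cond->OperIs(GT_EQ, GT_NE, GT_LT, GT_GE)` is the same condition as the four `!=` checks joined by `&&`. Below is a minimal sketch of how a variadic helper like this can be written. `GenTree` and `genTreeOps` here are simplified stand-ins, not the JIT's actual definitions.

```cpp
#include <cassert>

// Simplified stand-ins for the JIT's operator enum and tree node.
enum genTreeOps
{
    GT_EQ,
    GT_NE,
    GT_LT,
    GT_LE,
    GT_GE,
    GT_GT
};

struct GenTree
{
    genTreeOps gtOper;

    // Single-operator case, which also ends the recursion below.
    bool OperIs(genTreeOps oper) const
    {
        return gtOper == oper;
    }

    // Variadic case: true if gtOper equals any of the listed operators.
    template <typename... T>
    bool OperIs(genTreeOps oper, T... rest) const
    {
        return OperIs(oper) || OperIs(rest...);
    }
};

// The rewritten guard from the suggestion, applied to the stand-in type.
static bool isUnsupportedCond(const GenTree* cond)
{
    assert(cond != nullptr);
    return !cond->OperIs(GT_EQ, GT_NE, GT_LT, GT_GE);
}
```

With a single argument, overload resolution prefers the non-template `OperIs`, so the recursion terminates there.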

Comment on lines 6 to 14
$(CLRTestBatchPreCommands)
set DOTNET_JitNoStructPromotion=1
set DOTNET_JitNoCSE=1
]]></CLRTestBatchPreCommands>
<BashCLRTestPreCommands><![CDATA[
$(BashCLRTestPreCommands)
export DOTNET_JitNoStructPromotion=1
export DOTNET_JitNoCSE=1
]]></BashCLRTestPreCommands>
Contributor


Does this really need JitNoStructPromotion and JitNoCSE - seems like a copy & paste error?

Also, I don't think this should be Jit/Regression folder, that is meant for tests for bug fixes.

JIT\opt\OptimizeBools would be a better place.

@AndyAyersMS
Member

Looking at https://dev.azure.com/dnceng/public/_build/results?buildId=1527034&view=ms.vss-build-web.run-extensions-tab there are more regressions than I'd expect to see from something like this.

Can you take a look at some of these?

@pedrobsaila
Contributor Author

pedrobsaila commented Jan 11, 2022

Thanks @AndyAyersMS for pointing that out, I will work on it. This is my first PR on coreclr so I didn't know about superpmi (which is kind of cool). Just to be sure: to make sure my PR doesn't introduce regressions, do I need to ensure that 'Total bytes of delta' is always negative on the different OS/Arch combinations, since my code is supposed to produce less assembly code?

@AndyAyersMS
Member

I need to ensure that 'Total bytes of delta' is always negative on the different OS/Arch?

Ideally, yes, a change like this would only produce smaller code. But sometimes this isn't possible.

What we typically do is look at the worst cases and see why the code size has increased. Often this uncovers things that can be improved in the optimization or reveals weaknesses in a related optimization that we should try and address.

You should be able to run SPMI on just the specific method instances that have regressed and from there use jit dumps to analyze what the jit is doing. Let me know if you need help getting this running.

Sometimes the root cause of the regressions is complicated, and the regressions are hard to address. We then need to weigh the pluses and minuses to decide if we want to move forward with a change.

@AndyAyersMS AndyAyersMS added the needs-author-action An issue or pull request that requires more info or actions from the author. label Jan 18, 2022
@ghost ghost removed the needs-author-action An issue or pull request that requires more info or actions from the author. label Jan 23, 2022
@JulieLeeMSFT JulieLeeMSFT added this to the 7.0.0 milestone Jan 25, 2022
@pedrobsaila
Contributor Author

pedrobsaila commented Jan 29, 2022

@AndyAyersMS I found out my changes introduce calls to hackishModuleName:hackishMethodName(). They don't appear in jit dumps, but they are present in assembly code. Do you have any idea about their utility and what might cause them to appear?

@AndyAyersMS
Member

In the SPMI data collection we often don't collect the actual names (to save some space). During SPMI replay if the jit then asks for a name, we'll return one of these hackishModuleName:hackishMethodName strings.

const char* MethodContext::repGetMethodName(CORINFO_METHOD_HANDLE ftn, const char** moduleName)
{
    // Placeholder returned when the name was not recorded during collection.
    const char* result = "hackishMethodName";
    DD value;
    DLD key;
    key.A = CastHandle(ftn);
    key.B = (moduleName != nullptr);
    int itemIndex = -1;
    if (GetMethodName != nullptr)
        itemIndex = GetMethodName->GetIndex(key);
    if (itemIndex < 0)
    {
        // Not in the collection: hand back placeholder names.
        if (moduleName != nullptr)
            *moduleName = "hackishModuleName";
    }
    else
    {
        // Recorded during collection: return the collected names.
        value = GetMethodName->Get(key);
        DEBUG_REP(dmpGetMethodName(key, value));
        if (moduleName != nullptr)
            *moduleName = (const char*)GetMethodName->GetBuffer(value.B);
        result = (const char*)GetMethodName->GetBuffer(value.A);
    }
    return result;
}

It would surprise me if you started seeing calls in the emitted assembly after your change that weren't there before.

Probably best to look at a specific example, can you share one?

@pedrobsaila
Contributor Author

pedrobsaila commented Jan 29, 2022

Probably best to look at a specific example, can you share one?

Here are some examples:

  • Microsoft.AspNetCore.Http.HostString:GetParts(Microsoft.Extensions.Primitives.StringSegment,byref,byref) (libraries_tests.pmi.windows.x64.checked.mch diffs, file=163821.dasm, line=85)
  • Microsoft.Extensions.Primitives.StringSegment:AsSpan(int,int) (libraries.pmi.windows.x64.checked.mch diffs, file=240711.dasm, line=513)

Member

@AndyAyersMS AndyAyersMS left a comment


I'll take a closer look at your regressions and let you know what I see.


if ((cond->gtOper != GT_EQ) && (cond->gtOper != GT_NE))
// we don't optimize these statements because we might delete them (e.g. array range checks)
if (cond->OperIs(GT_LT, GT_GE) && (m_b1->bbFlags != BBF_IMPORTED || m_b2->bbFlags != BBF_IMPORTED))
Member


This check looks a little odd to me. We usually don't care about BBF_IMPORTED later on in the jit. I wonder if we should be checking something else here instead to convey the meaning better.

Do you recall what led you to add this?

Contributor Author

@pedrobsaila pedrobsaila Feb 6, 2022


Array range check instructions (index >= 0, index < array.Length) are added to the user code, but later they get deleted during the range check elimination phase. When those instructions pass through the boolean optimization phase, they get combined with user code, so they're no longer deleted. That's why I added the BBF_IMPORTED filter, so that I only optimize user code. I know the filter is not restrictive enough, but I didn't find a flag/property that could select only array range check instructions.

Member


This is to fix a regression and not to fix some correctness issue? If so then it's likely a similar effect to what I mention below with jump threading. An upstream opt inhibits a downstream one.

If my experiment to move OptOptimizeBools to the end of the optimization pass pans out, then this won't be necessary.

Should have some data on it soon.

Contributor Author


Yes, it's to fix a regression (not to fix a correctness issue).

@AndyAyersMS
Member

In System.Net.HttpWebRequest:AddRange(System.String,long,long):this the new transformation kicks in:

Folded boolean conditions of BB02 and BB03 to :
STMT00001 ( 0x00E[E-] ... 0x011 )
               [000008] -----+------              *  JTRUE     void  
               [000007] J----+-N----              \--*  LT        int   
               [000387] ------------                 +--*  OR        long  
               [000004] -----+------                 |  +--*  LCL_VAR   long   V02 arg2         
     (  1,  1) [000033] ------------                 |  \--*  LCL_VAR   long   V03 arg3         
               [000006] -----+------                 \--*  CNS_INT   long   0

Unfortunately, this blocks jump threading later on, as it can't reason about compound conditionals like OR yet.
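The inference jump threading would need here can be stated concretely. On the path where `(V02 | V03) < 0` is false, `(V02 | V03) >= 0` holds, so both `V02 >= 0` and `V03 >= 0` hold and a dominated `V02 < 0` or `V03 < 0` can fold to false. On the true path, only "at least one is negative" is known. The C++ sketch below models just that rule; the `Var`, `DominatingOrFact` and `foldLessThanZero` names are hypothetical, and this is not how the JIT's redundant branch optimizer is structured.

```cpp
#include <optional>

// Hypothetical locals standing in for V02, V03 and an unrelated V04.
enum class Var
{
    V02,
    V03,
    V04
};

// A dominating signed compare of the form `(lhs | rhs) < 0` and the value
// it is known to have on the current path.
struct DominatingOrFact
{
    Var lhs;
    Var rhs;
    bool outcome;
};

// Tries to fold a dominated signed compare `v < 0` using the fact.
// Returns std::nullopt when the fact says nothing definite about v.
static std::optional<bool> foldLessThanZero(const DominatingOrFact& fact, Var v)
{
    if (v != fact.lhs && v != fact.rhs)
        return std::nullopt; // unrelated local
    if (!fact.outcome)
        return false; // (lhs | rhs) >= 0, so each operand is >= 0
    return std::nullopt; // (lhs | rhs) < 0 only says one of them is negative
}
```

The asymmetry is the same one that applies to the unfolded conditions: the false side of an OR (here, of the sign bits) gives a fact about both operands, while the true side does not.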

In the baseline jit we see both the BB02 and BB03 conditions allowing us to fold other branches:

Dominator BB03 of BB08 has relop with same liberal VN
N003 (  3,  3) [000036] J------N----              *  LT        int    $142
N001 (  1,  1) [000033] ------------              +--*  LCL_VAR   long   V03 arg3         u:1 $c1
N002 (  1,  1) [000035] ------------              \--*  CNS_INT   long   0 $280
 Redundant compare; current relop:
N003 (  3,  3) [000224] J------N----              *  LT        int    $142
N001 (  1,  1) [000221] ------------              +--*  LCL_VAR   long   V19 tmp15        u:1 $c1
N002 (  1,  1) [000223] ------------              \--*  CNS_INT   long   0 $280
Fall through successor BB04 of BB03 reaches, relop must be false

Redundant branch opt in BB08...

Dominator BB02 of BB06 has relop with same liberal VN
N003 (  3,  3) [000007] J------N----              *  LT        int    $141
N001 (  1,  1) [000004] ------------              +--*  LCL_VAR   long   V02 arg2         u:1 $c0
N002 (  1,  1) [000006] ------------              \--*  CNS_INT   long   0 $280
 Redundant compare; current relop:
N003 (  3,  3) [000148] J------N----              *  LT        int    $141
N001 (  1,  1) [000145] ------------              +--*  LCL_VAR   long   V14 tmp10        u:1 $c0
N002 (  1,  1) [000147] ------------              \--*  CNS_INT   long   0 $280
Fall through successor BB03 of BB02 reaches, relop must be false

Redundant branch opt in BB06...

It's probably not possible to anticipate when this effect might happen in OptOptimizeBools. One idea is to perhaps defer this boolean optimization until later, at the end of the main optimized pass, instead of running it before. While I'm not 100% sure, I can't think of any other optimizations OptOptimizeBools might enable. I'll give this a try.

Another idea is to teach branch threading to look through boolean operators on compares. If we're on the true side of a branch and we see an AND, we know both conditions must be true; if we're on the false side of an OR, we know both must be false, etc. This might prove to be a bit challenging but should find more cases (even without this change). Will add notes to #48115.

@AndyAyersMS
Member

It's probably not possible to anticipate when this effect might happen in OptOptimizeBools. One idea is to perhaps defer this boolean optimization until later, at the end of the main optimized pass, instead of running it before. While I'm not 100% sure, I can't think of any other optimizations OptOptimizeBools might enable. I'll give this a try.

This indeed fixes AddRange but causes regressions elsewhere, so we will have to look more broadly at what it does.

@AndyAyersMS
Member

It's probably not possible to anticipate when this effect might happen in OptOptimizeBools. One idea is to perhaps defer this boolean optimization until later, at the end of the main optimized pass, instead of running it before. While I'm not 100% sure, I can't think of any other optimizations OptOptimizeBools might enable. I'll give this a try.

This indeed fixes AddRange but causes regressions elsewhere, so we will have to look more broadly at what it does.

Best guess is that by keeping more branches around during optimization we end up creating more assertions, which causes us to lose other assertions and hence lose some optimizations. One of the annoyances of assertion prop is that at some point in large enough methods it just silently stops creating assertions, so spotting this is harder than it should be. It would be good if at the end of assertion gen we reported how many assertions we wanted to create but couldn't, or something.

Overall, moving the phase is a net improvement; there are just a few annoying bad cases, and I think (but haven't verified) it fixes most of the regressions you were seeing.

If you want to play around with this yourself then cherry-pick the head commit from my fork: main...AndyAyersMS:MoveOptOptimizeBoolsLater

Let me think about this for a bit and get back to you with a plan...
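The reporting idea for assertion prop mentioned above could look something like the following. This is a hypothetical, stripped-down table (the `AssertionTable` name, the `int` payload and the capacity are all made up, and it is not the JIT's assertion prop code); the point is only that a full table counts what it drops instead of failing silently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical fixed-capacity assertion table that counts dropped entries.
class AssertionTable
{
public:
    explicit AssertionTable(std::size_t maxCount) : m_maxCount(maxCount)
    {
        assert(maxCount > 0);
    }

    // Records an assertion; returns false (and counts the drop) once full.
    bool Add(int assertion)
    {
        if (m_entries.size() >= m_maxCount)
        {
            m_dropped++;
            return false;
        }
        m_entries.push_back(assertion);
        return true;
    }

    std::size_t Kept() const { return m_entries.size(); }
    std::size_t Dropped() const { return m_dropped; }

    // End-of-phase summary, so hitting the limit is visible in a dump.
    void Report() const
    {
        std::printf("assertions kept: %zu, dropped (table full): %zu\n", Kept(), Dropped());
    }

private:
    std::size_t m_maxCount;
    std::size_t m_dropped = 0;
    std::vector<int> m_entries;
};
```

A nonzero `Dropped()` at the end of the phase would flag exactly the methods where extra branches may be crowding out other assertions.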

@pedrobsaila
Contributor Author

pedrobsaila commented Feb 10, 2022

In System.Net.HttpWebRequest:AddRange(System.String,long,long):this the new transformation kicks in:

@AndyAyersMS I tried reproducing this transformation locally but could not figure it out. Can you share the commands you run to get it?

Here is what I'm doing to reproduce the regression:

  • Create a console app that contains the call System.Net.HttpWebRequest:AddRange(System.String,long,long):this
  • set this environment variable COMPlus_JitDump="AddRange"
  • launch the console app with corerun.exe
  • I get the result
*************** Starting PHASE Optimize bools
*************** In optOptimizeBools()
*************** In fgDebugCheckBBlist

*************** Finishing PHASE Optimize bools

@AndyAyersMS
Member

I tried reproducing this transformation locally but could not figure it out. Can you share the commands you run to get it?

I usually extract the instance from SPMI, by using a script like the following. To make this work you need to fill in the actual paths to the SPMI collection and the baseline jit, as well as your locally built jit.

Then you just invoke with the method context number (reported in the SPMI diff report).

This leaves two jit dump files; nnn.d and nnn.d1 for the baseline and your locally built jit, respectively. I then usually diff those to see how/when the two jits behave differently.

If you uncomment the goto :DEBUG you can instead run with your locally built jit in the debugger.

@echo off 
setlocal

set DEVENV="C:\Program Files\Microsoft Visual Studio\2022\Preview\Common7\IDE\devenv.exe"
set CDIR=%CD%

set COLLECTION=C:\repos\runtime2\artifacts\spmi\mch\63009f0c-662a-485b-bac1-ff67be6c7f9d.windows.x64\aspnet.run.windows.x64.checked.mch

set JIT=C:\repos\runtime2\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\clrjit.dll
set BASEJIT=C:\repos\runtime2\artifacts\spmi\basejit\2210b7490fbd28b7ce52211339d4ee8af82f1dac.windows.x64.Checked\clrjit.dll
set SPMI=c:\repos\runtime2\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\superpmi.exe

set complus_jitdump=*
set complus_ngendump=*

@rem goto :DEBUG

del %CDIR%\%1.d*

set complus_jitstdoutfile=%CDIR%\%1.d

%SPMI%  -c %1 %BASEJIT% %COLLECTION%
set complus_jitstdoutfile=%CDIR%\%1.d1

%SPMI%  -c %1 %JIT% %COLLECTION%

goto :EOF

:DEBUG

%DEVENV% -debugexe %SPMI%  -c %1 %JIT% %COLLECTION%

But, what you are doing should work too. Just make sure you are running with optimization enabled, typically we do this by disabling tiered compilation by setting an environment var before launching corerun.exe

set COMPlus_TieredCompilation=0
corerun.exe ...

@AndyAyersMS
Member

Let me think about this for a bit and get back to you with a plan...

Sorry for the delay on this, I have been preoccupied with other work and unfortunately that work is going to take another week or so. I still think the right idea is to move this optimization later but as noted above this causes some complications.

@AndyAyersMS
Member

Let me think about this for a bit and get back to you with a plan...

Sorry for the delay on this, I have been preoccupied with other work and unfortunately that work is going to take another week or so. I still think the right idea is to move this optimization later but as noted above this causes some complications.

Sorry to say I am still a bit tied up -- hopefully for just another week or two.

@@ -9176,6 +9226,30 @@ GenTree* OptBoolsDsc::optIsBoolComp(OptTestInfo* pOptTest)
// | \--* LCL_VAR int V03 arg3
// \--* CNS_INT int 0
//
// Case 5: (x != 0 && y != 0) => (x | y) != 0
Member


Should be (x != 0 || y != 0) I'm assuming?

Contributor Author


yes it should be (x != 0 || y != 0)
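A small brute-force check of the corrected Case 5 comment: `(x | y) != 0` is false only when every bit of both operands is zero, so it matches `x != 0 || y != 0`, while `x != 0 && y != 0` is a different predicate (x = 1, y = 0 separates them). The helper names below are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Case 5 as corrected: (x != 0 || y != 0) => (x | y) != 0.
static bool anyNonZeroBranchy(int32_t x, int32_t y)
{
    return x != 0 || y != 0;
}

static bool anyNonZeroBranchless(int32_t x, int32_t y)
{
    return (x | y) != 0;
}

// The comment's original wording, which the fold does not implement.
static bool bothNonZero(int32_t x, int32_t y)
{
    return x != 0 && y != 0;
}

// Checks the corrected identity on all 8-bit pairs, plus one counterexample
// showing the && wording is not equivalent.
static void checkCase5()
{
    for (int a = INT8_MIN; a <= INT8_MAX; a++)
        for (int b = INT8_MIN; b <= INT8_MAX; b++)
            assert(anyNonZeroBranchy(a, b) == anyNonZeroBranchless(a, b));

    assert(anyNonZeroBranchless(1, 0));
    assert(!bothNonZero(1, 0));
}
```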

@AndyAyersMS
Member

Sorry radical, lewing, pavelsavara, thaystg, lambdageek, BrzVlad, vargaz, SamMonoRT, marek-safar and MichalStrehovsky for bothering you. I made a mistake in the merge process; that's why you've been added as reviewers.

No worries. I've un-added everyone above.

Also I might have confused things by pushing a merge to your branch. I should have explained what I was doing in more detail.

@pedrobsaila
Contributor Author

pedrobsaila commented Oct 22, 2022

No worries. I've un-added everyone above.

thanks @AndyAyersMS :)

I have two questions though :

  • How do you merge/rebase commits from main (without adding people to the PR)?
  • What version of VS are you using? It seems I'm the only one getting 'Detected use of a corrupted OBJECTREF. Possible GC hole.' errors while debugging and evaluating variables; I tried three different versions of VS and I still get them.

@AndyAyersMS
Member

GitHub has gotten pretty good about seeing through merges, so we generally don't need to rebase/force-push to update PRs these days, unless there are conflicts or something.

I'm not sure how much to explain as it depends on how well you know git. I will keep it high-level for now, let me know if you need more details.

You should have a copy of the runtime repo on your local machine somewhere (say c:\repos\runtime) and that repo should have at least two remotes with urls and simple names, one for your fork https://github.com/pedrobsaila/runtime (origin) and one for the main repo https://github.com/dotnet/runtime (upstream).

The changes in your local repo are on branch 61940 which is tracking a branch with the same name on your fork.

;; (0) get to current version of code in the local repo

cd c:\repos\runtime
git checkout 61940

;; (1) bring local version up to date if someone else pushed to the fork

git fetch origin
git merge origin/61940 --ff-only

;; (2) merge in latest bits from main repo's main

git fetch upstream main:main
git merge main

;; (3) update the PR

git push origin

@AndyAyersMS
Member

  • What version of VS are you using? It seems I'm the only one getting 'Detected use of a corrupted OBJECTREF. Possible GC hole.' errors while debugging and evaluating variables; I tried three different versions of VS and I still get them.

I am using VS 2022 (64-bit) Version 17.4.0 Preview 4.0 and have seen similar problems with the debugger evaluating variables properly. Not sure if something has changed in our main branch that makes debugging new builds difficult.

One solution is to switch over to using windbg which doesn't try to be quite so helpful. But if you've never used windbg it can be a bit of a steep learning curve.

@pedrobsaila
Contributor Author

pedrobsaila commented Oct 22, 2022

;; (3) update the PR

git push origin

Doesn't this add the merged commits at the head of my branch? The PR history will be crowded with other people's commits. I noticed that your merge was quite clean; did you squash the merged commits before pushing them to origin?

One solution is to switch over to using windbg which doesn't try to be quite so helpful. But if you've never used windbg it can be a bit of a steep learning curve.

Thanks for the hint I'll try it

@AndyAyersMS
Member

Doesn't this add the merged commits at the head of my branch? The PR history will be crowded with other people's commits. I noticed that your merge was quite clean; did you squash the merged commits before pushing them to origin?

Yes. But GitHub uses the "three dot" git diff, so if you merge main into your PR branch, it updates the merge base, and those other commits don't appear as diffs in your PR. See e.g. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/about-comparing-branches-in-pull-requests

Likewise, if you run git difftool -d main in your local repo, since your local main is updated to that same commit in main, you will only see your diffs.
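A toy model of why the merged-in commits don't crowd the PR: after main is merged into the branch, everything reachable from main's tip is also reachable from the branch, so the set of commits reachable from the branch but not from main (what `git log main..branch` lists) shrinks to your own commits plus the merge commit. The commit names and graph are made up, and this models only reachability, not GitHub's implementation.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy commit graph: commit name -> parent names.
using Graph = std::map<std::string, std::vector<std::string>>;

// All commits reachable from tip, including tip itself.
static std::set<std::string> ancestors(const Graph& g, const std::string& tip)
{
    std::set<std::string> seen;
    std::vector<std::string> work{tip};
    while (!work.empty())
    {
        std::string c = work.back();
        work.pop_back();
        assert(g.count(c) != 0 && "commit missing from graph");
        if (!seen.insert(c).second)
            continue; // already visited
        for (const std::string& parent : g.at(c))
            work.push_back(parent);
    }
    return seen;
}

// Commits reachable from headTip but not from baseTip (like `git log base..head`).
static std::set<std::string> onlyOnHead(const Graph& g, const std::string& baseTip, const std::string& headTip)
{
    std::set<std::string> fromBase = ancestors(g, baseTip);
    std::set<std::string> result;
    for (const std::string& c : ancestors(g, headTip))
        if (fromBase.count(c) == 0)
            result.insert(c);
    return result;
}
```

For example, with main `m1 <- m2 <- m3`, a feature commit `f1` on top of `m1`, and a merge commit `M` with parents `f1` and `m3`, `onlyOnHead(g, "m3", "M")` contains only `f1` and `M`; `m2` and `m3` are not listed even though they are now ancestors of the branch.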

@AndyAyersMS
Member

AndyAyersMS commented Oct 23, 2022

Based on CI testing, there is a problem introduced into System.Numerics.BigNumber:HexNumberToBigInteger(byref,byref):int which you can see with this repro case (run with COMPlus_TieredCompilation=0):

using System;
using System.Globalization;
using System.Numerics;
using System.Runtime.CompilerServices;

class Assert
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void Equal(uint x, BigInteger y) { if (x != y) { Console.WriteLine($"{x} != {y}"); } }
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void True(bool b) { if (!b) { Console.WriteLine($"fail"); } }
}

class X
{
    public static void Main()
    {
        BigInteger result;
        Assert.True(BigInteger.TryParse("080000001", NumberStyles.HexNumber, null, out result));
        Assert.Equal(0x80000001u, result);
    }
}

The disasm for the method (vs unmodified jit) shows this new transformation kicking in:

;; base

G_M53841_IG51:              ;; offset=0468H
       83FB01               cmp      ebx, 1
       7550                 jne      SHORT G_M53841_IG56
       458B6500             mov      r12d, dword ptr [r13]
       33F6                 xor      rsi, rsi
       4585FF               test     r15d, r15d
       7505                 jne      SHORT G_M53841_IG52
       4585E4               test     r12d, r12d
       7C0D                 jl       SHORT G_M53841_IG53
						;; size=21 bbWeight=0.50 PerfScore 3.00
G_M53841_IG52:              ;; offset=047DH
       4181FC00000080       cmp      r12d, 0xFFFFFFFF80000000
       0F85F7FEFFFF         jne      G_M53841_IG39

;; diff

G_M53841_IG51:              ;; offset=0468H
       83FB01               cmp      ebx, 1
       754E                 jne      SHORT G_M53841_IG55
       458B6500             mov      r12d, dword ptr [r13]
       33F6                 xor      rsi, rsi
       418BCF               mov      ecx, r15d
       410BCC               or       ecx, r12d
       740D                 je       SHORT G_M53841_IG52         ;; <<<-- I think this should be jne
       4181FC00000080       cmp      r12d, 0xFFFFFFFF80000000
       0F85F9FEFFFF         jne      G_M53841_IG39

@AndyAyersMS
Member

SPMI failure was timeout on x86.

I think this is finally ready to merge.

@AndyAyersMS
Member

/azp run fuzzlyn

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@AndyAyersMS AndyAyersMS merged commit a7b5c22 into dotnet:main Oct 31, 2022
@AndyAyersMS
Member

@pedrobsaila finally merged.

Thank you for all your hard work here, and for hanging in there over what turned out to be an unexpectedly long process.

@pedrobsaila pedrobsaila deleted the 61940 branch October 31, 2022 20:39
@pedrobsaila
Contributor Author

Thank you @AndyAyersMS too for your guidance, kindness and patience with me through my first PR on JIT.

@ghost ghost locked as resolved and limited conversation to collaborators Dec 1, 2022
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member
Development

Successfully merging this pull request may close these issues.

Could emit branchless form of i >= 0 && j >= 0 for signed integers
5 participants