Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache allregs to avoid checking the type repeatedly #76850

Merged
merged 8 commits into from
Jan 10, 2023

Conversation

kunalspathak
Copy link
Member

@kunalspathak kunalspathak commented Oct 11, 2022

Cache the availableRegs in a table that we can look-up and return instead of comparing the type every single time in allRegs() method. Gives a nice TP:

image

@ghost ghost assigned kunalspathak Oct 11, 2022
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Oct 11, 2022
@ghost
Copy link

ghost commented Oct 11, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

null

Author: kunalspathak
Assignees: kunalspathak
Labels:

area-CodeGen-coreclr

Milestone: -

@kunalspathak kunalspathak marked this pull request as ready for review October 11, 2022 23:29
@kunalspathak
Copy link
Member Author

@dotnet/jit-contrib

@SingleAccretion
Copy link
Contributor

What's up with the regressions on x86?

@kunalspathak
Copy link
Member Author

What's up with the regressions on x86?

I will have to double check, but the regression is on Linux/arm MinOpts only.

image

@jakobbotsch
Copy link
Member

I guess the interesting ones are benchmarks.run.windows.x86.checked.mch and benchmarks.run.windows.x64.checked.mch:

image

image

(The libraries.crossgen2 collections have respectively 16 and 14 MinOpts contexts, so I don't think you can really read much from those)

@SingleAccretion
Copy link
Contributor

Linux/arm MinOpts only.

Note it is present in Windows x86/MinOpts benchmarks / CG collections as well.

@jakobbotsch
Copy link
Member

I wonder if it's because it makes allRegs(<constant type>) more expensive. I see a bunch of those throughout LSRA that could be replaced with the right availableXRegs field.

@kunalspathak
Copy link
Member Author

I guess the interesting ones are benchmarks.run.windows.x86.checked.mch and benchmarks.run.windows.x64.checked.mch:

Ok, yeah I am still learning that I have to expand those details :)

Tried inlining the availableIntRegs directly at places where allRegs(TYP_INT) were present but it doesn't help much. Will disassemble and see what is different.

@kunalspathak
Copy link
Member Author

Still need to do the TP regression analysis for MinOpts.

@kunalspathak
Copy link
Member Author

Ping to myself.

@kunalspathak
Copy link
Member Author

Hoping to review the TP analysis soon.

@kunalspathak
Copy link
Member Author

kunalspathak commented Jan 9, 2023

The biggest diffs in MinOpts for benchmarks on windows/x64 are these. I will investigate more.

-22555               : ??$varTypeIsSIMD@W4var_types@@@@YA_NW4var_types@@@Z
+193049              : ??$varTypeIsSIMD@W4var_types@@@@YA_NW4var_types@@@Z
-460532              : ??0LinearScan@@QEAA@PEAVCompiler@@@Z
+942354              : ??0LinearScan@@QEAA@PEAVCompiler@@@Z
Base: 438977289, Diff: 437964751, -0.2307%

??0LinearScan@@QEAA@PEAVCompiler@@@Z                                                  : 481822  : +104.62% : 20.73% : +0.1098%
??$varTypeIsSIMD@W4var_types@@@@YA_NW4var_types@@@Z                                   : 170494  : +755.90% : 7.34%  : +0.0388%
?defineNewInternalTemp@LinearScan@@AEAAPEAVRefPosition@@PEAUGenTree@@W4var_types@@I@Z : 2874    : +5.56%   : 0.12%  : +0.0007%
?LowerStoreLocCommon@Lowering@@AEAAXPEAUGenTreeLclVarCommon@@@Z                       : -2499   : -0.26%   : 0.11%  : -0.0006%
?identifyCandidates@LinearScan@@AEAAXXZ                                               : -6888   : -3.02%   : 0.30%  : -0.0016%
?BuildCall@LinearScan@@AEAAHPEAUGenTreeCall@@@Z                                       : -36677  : -0.23%   : 1.58%  : -0.0084%
?BuildUse@LinearScan@@AEAAPEAVRefPosition@@PEAUGenTree@@IH@Z                          : -85337  : -0.94%   : 3.67%  : -0.0194%
?BuildDef@LinearScan@@AEAAPEAVRefPosition@@PEAUGenTree@@IH@Z                          : -204706 : -2.84%   : 8.81%  : -0.0466%
?allRegs@LinearScan@@AEAAIW4var_types@@@Z                                             : -403823 : -100.00% : 17.38% : -0.0920%
?BuildNode@LinearScan@@AEAAHPEAUGenTree@@@Z                                           : -924883 : -6.09%   : 39.80% : -0.2107%

@kunalspathak
Copy link
Member Author

I tried running locally and don't see any regression (at least on windows/x64):

image

@kunalspathak
Copy link
Member Author

/azp run runtime-coreclr superpmi-diffs

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@kunalspathak
Copy link
Member Author

The latest run doesn't have any benchmarks regression. It just has TP regression on crossgen2 and I tried to eliminate some of that by eliminating the call to varTypeIsSIMD().

Here is the latest and I don't think there is anything much that can be done:

Base: 1801740, Diff: 1802793, +0.0584%

??0LinearScan@@QEAA@PEAVCompiler@@@Z                                             : 5430  : +104.62% : 54.91% : +0.3014%
??$varTypeIsSIMD@W4var_types@@@@YA_NW4var_types@@@Z                              : 21    : +7.95%   : 0.21%  : +0.0012%
memset                                                                           : 16    : +0.17%   : 0.16%  : +0.0009%
?LowerStoreLocCommon@Lowering@@AEAAXPEAUGenTreeLclVarCommon@@@Z                  : -30   : -0.90%   : 0.30%  : -0.0017%
?identifyCandidates@LinearScan@@AEAAXXZ                                          : -33   : -1.66%   : 0.33%  : -0.0018%
?BuildCall@LinearScan@@AEAAHPEAUGenTreeCall@@@Z                                  : -103  : -0.34%   : 1.04%  : -0.0057%
?genFnPrologCalleeRegArgs@CodeGen@@IEAAXW4_regNumber_enum@@PEA_NPEAURegState@@@Z : -189  : -2.42%   : 1.91%  : -0.0105%
?BuildUse@LinearScan@@AEAAPEAVRefPosition@@PEAUGenTree@@IH@Z                     : -526  : -2.45%   : 5.32%  : -0.0292%
?BuildDef@LinearScan@@AEAAPEAVRefPosition@@PEAUGenTree@@IH@Z                     : -634  : -3.62%   : 6.41%  : -0.0352%
?allRegs@LinearScan@@AEAAIW4var_types@@@Z                                        : -1246 : -100.00% : 12.60% : -0.0692%
?BuildNode@LinearScan@@AEAAHPEAUGenTree@@@Z                                      : -1657 : -4.58%   : 16.76% : -0.0920%

@jakobbotsch
Copy link
Member

jakobbotsch commented Jan 10, 2023

@kunalspathak What if you change the initialization of the table to:

for (unsigned int i = 0; i < TYP_COUNT; i++)
{
  availableRegs[i] = &availableIntRegs;
}

availableRegs[TYP_FLOAT] = &availableFloatRegs;
availableRegs[TYP_DOUBLE] = &availableDoubleRegs;
#ifdef FEATURE_SIMD
availableRegs[TYP_SIMD8] = &availableDoubleRegs;
availableRegs[TYP_SIMD12] = &availableDoubleRegs;
availableRegs[TYP_SIMD16] = &availableDoubleRegs;
availableRegs[TYP_SIMD32] = &availableDoubleRegs;
#endif

In any case it still looks minor to me.

@kunalspathak
Copy link
Member Author

@kunalspathak What if you change the initialization of the table to:

That's a reasonable approach, but as I mentioned earlier, this is LinearScan ctor which just gets invoked once per compilation. With the latest changes, the TP regression dropped, so I will go ahead and merge it.

image

This was a good exercise for me to learn how to analyze TP regression locally.

@kunalspathak kunalspathak merged commit 7a6c31c into dotnet:main Jan 10, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Feb 9, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants