Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor LSRA's heuristics selection #52832

Merged
merged 7 commits into from
May 19, 2021

Conversation

kunalspathak
Copy link
Member

@kunalspathak kunalspathak commented May 17, 2021

Refactor register selection allocateReg() method so that the selection heuristics can be applied in different and dynamic order instead of hardcoded and sequential. Since allocateReg() operates on local method variables, I had to write RegisterSelection class that contains all the register selection related variables. select() method would do the register selection. Most of the changes are moving around the existing code in allocateReg() to its own separate methods of RegisterSelection class.

For now, the ordering is not changed, but in DEBUG, I have included a COMPlu_ variable that LsraOrdering that will let the user set the ordering. It takes characters A thru Q and each character corresponds to a heuristics. In DEBUG the heuristics will be applied based on the provided sequence to LsraOrdering.

There are no asmdiffs with this change.

I did throughput analysis of this change using pin and do not see any impact. All the diffs are within the the error range.

Pin numbers

Windows x64

benchmarks win-x64 base test % diff
AVG 469981382.6 470077304 0.0204%
MEDIAN 469664270 470334938 0.1428%
Min 467782713
Max 473685235
Error 1.26%
More data
benchmarks win-x64 base test % diff
1 469724283 470354416 0.1341%
2 469457310 469208025 -0.0531%
3 467782713 469527435 0.3730%
4 469511878 469823674 0.0664%
5 473685235 470315460 -0.7114%
6 469953271 470457822 0.1074%
7 470750807 470365850 -0.0818%
8 469692331 469480893 -0.0450%
9 469636209 470716253 0.2300%
10 469619789 470523212 0.1924%
libraries win-x64 base test %diff
AVG 471576554.6 471270446.1 -0.0649%
MEDIAN 471653252.5 471219824.5 -0.0919%
Min 470510112
Max 472843503
Error 0.496%
More data
libraries win-x64 base test %diff
1 471272742 472026101 0.1599%
2 471228504 470381286 -0.1798%
3 471857149 470790783 -0.2260%
4 470510112 471619617 0.2358%
5 471028303 470915818 -0.0239%
6 472843503 472112342 -0.1546%
7 472020017 471867586 -0.0323%
8 471650706 470694466 -0.2027%
9 471698711 471523831 -0.0371%
10 471655799 470772631 -0.1872%

Windows arm64

benchmarks win-arm64 base test % diff
AVG 470289317 470456210 0.04%
Median 470091627 470472792 0.08%
Max 471226168
Min 469446085
Error 0.3792%
More data
benchmarks win-arm64 base test % diff
1 471226168 470085832 -0.24%
2 470704395 470830745 0.03%
3 469978311 470319473 0.07%
4 469446085 470555625 0.24%
5 470091627 470489374 0.08%
libraries win-arm64 base test % diff
Avg 471407802 472086055 0.144%
Median 471215772 472285001 0.227%
Min 470397436
Max 472300281
Error 0.405%
More data
libraries win-arm64 base test % diff
1 472300281 472658696 0.076%
2 471215772 472285001 0.227%
3 470397436 471574292 0.250%
4 472089200 472385618 0.063%
5 471036323 471526668 0.104%

Windows x86

benchmarks win-x86 base test % diff
Avg 469405091 470278693 0.186%
Median 469284472 470327175 0.222%
Min 467917214
Max 470888941
Error % 0.635%
More data
benchmarks win-x86 base test % diff
1 469284472 471413937 0.454%
2 470143349 470020586 -0.026%
3 468791479 470496670 0.364%
4 470888941 469135096 -0.372%
5 467917214 470327175 0.515%
Avg 469405091 470278693 0.186%
Median 469284472 470327175 0.222%
libraries win-x86 base test % diff
Avg 471484490 471600583 0.025%
Median 471240526 471534393 0.062%
Min 470891466
Max 472454555
Error % 0.332%
More data
libraries win-x86 base test % diff
1 470891466 471521340 0.134%
2 471044935 472017169 0.206%
3 471240526 472383079 0.242%
4 472454555 470546933 -0.404%
5 471790966 471534393 -0.054%

Linux arm

libraries unix-arm base test % diff
Avg 469671369 469949932 0.059%
Median 469815618 469932018 0.025%
Min 468845304
Max 470561987
Error % 0.366%
More data
libraries unix-arm base test % diff
1 470561987 469932018 -0.134%
2 468992444 469692563 0.149%
3 469815618 469016187 -0.170%
4 468845304 470677323 0.391%
5 470141490 470431570 0.062%
Avg 469671369 469949932 0.059%
Median 469815618 469932018 0.025%

Contributes to #43318

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 17, 2021
@kunalspathak
Copy link
Member Author

runtime failures related to #52710

@kunalspathak kunalspathak changed the title Refactor LSRA's selection Refactor LSRA's heuristics selection May 18, 2021
- summary docs
- Removed DEFAULT_ORDER
- Added order seq ID
@kunalspathak kunalspathak marked this pull request as ready for review May 18, 2021 18:57
@kunalspathak
Copy link
Member Author

@dotnet/jit-contrib

@BruceForstall BruceForstall self-requested a review May 18, 2021 20:42
@kunalspathak
Copy link
Member Author

jitstressregs failure is related to #52945

Copy link
Member

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to be some confusion between regMaskTP/RBM_NONE and regNumber/REG_NA. You don't want to return REG_NA from a function defined to return regMaskTP.

I presume most of the code is just refactored/moved.

src/coreclr/jit/lsra.h Outdated Show resolved Hide resolved
// Apply a simple mask-based selection heuristic, and return 'true' if we now have a single candidate.
bool applySelection(int selectionScore, regMaskTP selectionCandidates DEBUG_ARG(RegisterScore* registerScore))
// Perform register selection and update currentInterval or refPosition TODO:
FORCEINLINE regMaskTP select(Interval* currentInterval,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are all the FORCEINLINE really necessary?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

allocateReg is a hot method and I didn't want to add function call cost to it so marked everything FORCEINLINE. Is there a downside that you see with it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose it's ok if it's really necessary because it's known to be hot.

src/coreclr/jit/lsra.cpp Outdated Show resolved Hide resolved
}
}
}
else if (r
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix/remove TODO

src/coreclr/jit/lsra.cpp Show resolved Hide resolved
}
}
}
else if (r
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: fix or remove?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the reference line of your comment. Can you copy/paste the text where you see the leftover TODO?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think all the ones I flagged had your alias, so just search for that.

@kunalspathak kunalspathak merged commit 8868e97 into dotnet:main May 19, 2021
@karelz karelz added this to the 6.0.0 milestone May 20, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Jun 19, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants