
Introduce OptLevel() in jit #77465

Closed
wants to merge 17 commits into from

Conversation

EgorBo
Member

@EgorBo EgorBo commented Oct 26, 2022

This PR introduces an opts.OptLevel() property that takes the various related jit flags into account, such as "prefer size", "prefer speed", "minopts", etc. As you can see from this piece, the jit used to ignore these flags; now it doesn't. The main idea is to use OPT_SizeAndThrougput for Tier0 (except for explicit minopts and debug-friendly codegen). The other flags can be passed via the corresponding arguments in ILC/Crossgen.
Most of the existing OPT_SizeAndThrougput uses in this PR aren't expected to affect Tier0, since most of those optimizations are disabled for Tier0 anyway.

Since none of these if SMALL_CODE paths were tested previously, I decided to remove some of them, e.g. everything related to alignment in the data section, because floating-point constants are expected to be 16-byte aligned anyway, etc. Feel free to restore some of those pieces if doing so shows nice size savings (I bet it won't).

The existing opts.OptimizationsEnabled() is left to mean "either the Blended or the Speed level".

Unblocks #77357
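
For readers skimming the description, here is a minimal, purely hypothetical sketch (not the code from this PR) of the idea: a single opt level computed once from the incoming flags, with Tier0 mapped to a size-and-throughput level unless explicit minopts or debug-friendly codegen is requested. Every name below (the enum values, JitRequest, ComputeOptLevel) is invented for illustration.

```cpp
// Hypothetical sketch only -- not dotnet/runtime code; all names are invented.
#include <cstdio>

enum OptLevel
{
    OPT_MinOpts,           // explicit minopts / debug-friendly codegen
    OPT_SizeAndThrougput,  // Tier0: small code, fast compile time
    OPT_Blended,           // default balance of size and speed
    OPT_Size,              // "prefer size" (e.g. requested via ILC/Crossgen arguments)
    OPT_Speed              // "prefer speed"
};

struct JitRequest
{
    bool minOpts;
    bool debugCode;
    bool tier0;
    bool preferSize;
    bool preferSpeed;
};

// Decide the level once from the flags instead of ignoring them.
OptLevel ComputeOptLevel(const JitRequest& req)
{
    if (req.minOpts || req.debugCode)
        return OPT_MinOpts;
    if (req.tier0)
        return OPT_SizeAndThrougput;
    if (req.preferSize)
        return OPT_Size;
    if (req.preferSpeed)
        return OPT_Speed;
    return OPT_Blended;
}

int main()
{
    JitRequest tier0{false, false, true, false, false};
    std::printf("tier0 level = %d\n", (int)ComputeOptLevel(tier0)); // 1 == OPT_SizeAndThrougput
}
```

With a single level computed up front, a query like OptimizationsEnabled() ("either the Blended or the Speed level") reduces to one comparison against the cached value.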

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Oct 26, 2022
@ghost ghost assigned EgorBo Oct 26, 2022
@ghost

ghost commented Oct 26, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.


@jakobbotsch
Member

You will need to update the summary.md printed by superpmi-diffs for asmdiffs/tpdiff; I guess instead of MinOpts we should use some terminology like "FastOpts" or "FastCode". You may also need to fix superpmi itself, since its MinOpts detection might not make much sense w.r.t. these changes.

@EgorBo
Member Author

EgorBo commented Nov 11, 2022

@AndyAyersMS (cc @dotnet/jit-contrib) could you please review this if you have time?

It's almost ready, I just want to fix SuperPMI to print optimization levels.
The PR is zero-diffs but with nice throughput improvements, up to -1.51% for tier0.

For now, it doesn't change anything for MinOpts vs Tier0 (SizeOrThroughput); better names for the opt levels are welcome.
We talked about removing the CLFLG_* optimization flags, but I'd prefer to do that separately if you don't mind.

@EgorBo EgorBo marked this pull request as ready for review November 11, 2022 14:59
@@ -3271,7 +3271,7 @@ void CodeGen::genCall(GenTreeCall* call)

     // If there is nothing next, that means the result is thrown away, so this value is not live.
     // However, for minopts or debuggable code, we keep it live to support managed return value debugging.
-    if ((call->gtNext == nullptr) && !compiler->opts.MinOpts() && !compiler->opts.compDbgCode)
+    if ((call->gtNext == nullptr) && !compiler->opts.OptimizationDisabled() && !compiler->opts.compDbgCode)
@SingleAccretion
Contributor

@SingleAccretion SingleAccretion commented Nov 11, 2022
!compiler->opts.OptimizationDisabled()

Nit: the double negation is hard to read. Can these (there are a number of instances) be switched to OptimizationEnabled?

@EgorBo
Member Author

Right, but I didn't want to distract code reviewers with that. I think we'd better do it separately; for now it's easier to review because I simply replaced MinOpts with the same OptimizationDisabled, so there's no need to validate it 🙂
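
For anyone following the nit, here is a tiny hypothetical sketch (not the real Compiler/opts code; OptsLike and its members are invented) of the positive wrapper being suggested, so call sites avoid the double negation:

```cpp
// Hypothetical sketch of the suggested positive accessor; names are invented.
struct OptsLike
{
    bool minOpts   = false;
    bool debugCode = false;

    bool OptimizationDisabled() const { return minOpts || debugCode; }
    // Positive wrapper so call sites can read "if (opts.OptimizationEnabled())"
    // instead of "if (!opts.OptimizationDisabled())".
    bool OptimizationEnabled() const { return !OptimizationDisabled(); }
};
```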

@jakobbotsch
Member

jakobbotsch commented Nov 11, 2022

The PR is zero-diffs but with nice throughput improvements, up to -1.51% for tier0.

MinOpts (-1.07%)
Collection PDIFF
benchmarks.run.Linux.arm.checked.mch -0.05%
coreclr_tests.run.Linux.arm.checked.mch -1.11%
libraries.crossgen2.Linux.arm.checked.mch -0.05%
libraries.pmi.Linux.arm.checked.mch -0.04%
libraries_tests.pmi.Linux.arm.checked.mch -0.03%

This looks really odd; I'm curious how this is showing up so differently in coreclr_tests (unfortunately there's no easy way to check without hacking superpmi and/or using a more detailed pin tool).

@EgorBo
Member Author

EgorBo commented Nov 11, 2022

The reason throughput improves slightly (I see that for the most important collections the improvement is smaller) is that OptimizationsEnabled()/Disabled() did not just return a cached value previously.
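
A rough sketch of that throughput point, with entirely hypothetical names (CompilerLike, the FLAG_* constants, and the accessors are not the real jit code): the "before" shape re-derives the answer from the flag bits on every call, while the "after" shape computes an opt level once and makes the accessor a cheap comparison against the cached value.

```cpp
// Hypothetical sketch illustrating caching the decision; not the actual jit code.
#include <cstdint>

constexpr uint64_t FLAG_MIN_OPT    = 1u << 0;
constexpr uint64_t FLAG_DEBUG_CODE = 1u << 1;
constexpr uint64_t FLAG_TIER0      = 1u << 2;

enum OptLevel { OPT_MinOpts, OPT_SizeAndThrougput, OPT_Blended };

struct CompilerLike
{
    uint64_t jitFlags = 0;
    OptLevel optLevel = OPT_Blended;

    // "Before": every call re-checks the flag bits.
    bool OptimizationDisabledSlow() const
    {
        return (jitFlags & (FLAG_MIN_OPT | FLAG_DEBUG_CODE | FLAG_TIER0)) != 0;
    }

    // "After": compute the level once during compiler initialization...
    void CacheOptLevel()
    {
        if ((jitFlags & (FLAG_MIN_OPT | FLAG_DEBUG_CODE)) != 0)
            optLevel = OPT_MinOpts;
        else if ((jitFlags & FLAG_TIER0) != 0)
            optLevel = OPT_SizeAndThrougput;
        else
            optLevel = OPT_Blended;
    }

    // ...so the frequently-called accessor is just a comparison.
    bool OptimizationDisabled() const { return optLevel != OPT_Blended; }
};
```

In a method with an unusually large number of basic blocks or EH clauses, accessors like these are hit very many times, which is one way a change like this could show up disproportionately in a single collection.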

@jakobbotsch
Member

Yeah, I'm just surprised that the differences are so large. I will try to look into what exactly causes this to impact coreclr_tests so much more than the other collections; this is quite unexpected to me.

@jakobbotsch
Member

@EgorBo It looks like this PR has diffs again in the latest superpmi-diffs run

@jakobbotsch
Member

jakobbotsch commented Nov 12, 2022

The majority of the TP improvement is coming from two particular contexts, 426399 and 426400 (in the current latest collection on the head of this PR). These methods are both named test:Main(), so they are not very easy to locate (I couldn't find any really large Main methods in classes named test in the repo; maybe it's some dynamically created thing?).
426400 is by far the largest: it has 261 KB of IL and 16731 EH clauses and takes us 14 seconds to JIT at tier 0 on my 5950X. This PR reduces the number of instructions executed while jitting that context by around 14.5%. It seems likely we have some quadratic behavior somewhere related to EH clauses.

@EgorBo
Member Author

EgorBo commented Nov 12, 2022

Interesting, thanks for looking into that!

@jakobbotsch
Member

@EgorBo Another thing I noticed is that the disassembly often has "; unknown optimization flags" at the top, e.g.

; Assembly listing for method JitTest.HFA.TestCase:Main():int
; Emitting quick and small code for X64 CPU with AVX - Windows
; Tier-0 compilation
; unknown optimization flags
; rbp based frame
; partially interruptible
; Final local variable assignments

(maybe it's expected due to SPMI?)
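
One plausible (purely illustrative) explanation for that fallback string: if the listing header picks its banner from the opt level, a flag combination the printer doesn't recognize would fall through to a default case. The sketch below is hypothetical, not the jit's actual header-printing code; only the "; unknown optimization flags" string is taken from the output above.

```cpp
// Hypothetical sketch of a banner fallback; not the actual jit listing code.
#include <cstdio>

enum OptLevel { OPT_MinOpts, OPT_SizeAndThrougput, OPT_Blended, OPT_Size, OPT_Speed };

const char* OptLevelBanner(int level)
{
    switch (level)
    {
        case OPT_MinOpts:          return "; MinOpts code";
        case OPT_SizeAndThrougput: return "; optimized for size and throughput";
        case OPT_Blended:          return "; optimized code";
        case OPT_Size:             return "; optimized for size";
        case OPT_Speed:            return "; optimized for speed";
        default:                   return "; unknown optimization flags";
    }
}

int main()
{
    std::printf("%s\n", OptLevelBanner(42)); // an unrecognized value hits the fallback
}
```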

@jakobbotsch
Member

jakobbotsch commented Nov 12, 2022

These methods are both named test:Main(), so they are not very easy to locate (I couldn't find any really large Main methods in classes named test in the repo; maybe it's some dynamically created thing?).

It's probably this one:
https://github.com/dotnet/runtime/blob/5565135d21024ac2820e08d0a7aca3f2cb9d2b55/src/tests/JIT/Regression/VS-ia64-JIT/V1.2-M02/b28158/test.il

@BruceForstall
Member

@EgorBo I presume this is still in progress. Maybe move to "Draft" status temporarily?

@BruceForstall
Member

@EgorBo ping

@JulieLeeMSFT JulieLeeMSFT marked this pull request as draft December 27, 2022 17:48
@ghost

ghost commented Feb 5, 2023

Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it.

@EgorBo
Member Author

EgorBo commented Mar 6, 2023

Resolving conflicts now..

@ghost

ghost commented Apr 5, 2023

Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it.

@ghost ghost closed this Apr 5, 2023
@ghost ghost locked as resolved and limited conversation to collaborators May 5, 2023
This pull request was closed.