
Introduce OptLevel() in jit #77465

Closed
wants to merge 17 commits into from

Conversation

EgorBo
Member

@EgorBo EgorBo commented Oct 26, 2022

This PR introduces an opts.OptLevel() property that takes the various related jit flags into account, such as "prefer size", "prefer speed", "minopts", etc. As you can see from this piece, the jit used to ignore these flags; now it doesn't. The main idea is to use OPT_SizeAndThrougput for Tier0 (except for explicit minopts and debug-friendly codegen). The other flags can be passed via the corresponding arguments in ILC/Crossgen.
Most of the existing OPT_SizeAndThrougput uses in this PR aren't expected to affect Tier0, since most of those optimizations are disabled for Tier0 anyway.

Since none of these if SMALL_CODE paths were tested previously, I decided to remove some of them, e.g. everything related to alignment in the data section, because floating-point constants are expected to be 16-byte aligned anyway, etc. Feel free to restore some of those pieces if doing so shows nice size savings (I bet it won't).

The existing opts.OptimizationsEnabled() is left to mean "either the Blended or the Speed level".

Unblocks #77357
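
For readers skimming the description, here is a minimal, purely hypothetical sketch (not the code from this PR) of the idea: a single opt level computed once from the incoming flags, with Tier0 mapped to a size-and-throughput level unless explicit minopts or debug-friendly codegen is requested. Every name below (the enum values, JitRequest, ComputeOptLevel) is invented for illustration.

```cpp
// Hypothetical sketch only -- not dotnet/runtime code; all names are invented.
#include <cstdio>

enum OptLevel
{
    OPT_MinOpts,           // explicit minopts / debug-friendly codegen
    OPT_SizeAndThrougput,  // Tier0: small code, fast compile time
    OPT_Blended,           // default balance of size and speed
    OPT_Size,              // "prefer size" (e.g. requested via ILC/Crossgen arguments)
    OPT_Speed              // "prefer speed"
};

struct JitRequest
{
    bool minOpts;
    bool debugCode;
    bool tier0;
    bool preferSize;
    bool preferSpeed;
};

// Decide the level once from the flags instead of ignoring them.
OptLevel ComputeOptLevel(const JitRequest& req)
{
    if (req.minOpts || req.debugCode)
        return OPT_MinOpts;
    if (req.tier0)
        return OPT_SizeAndThrougput;
    if (req.preferSize)
        return OPT_Size;
    if (req.preferSpeed)
        return OPT_Speed;
    return OPT_Blended;
}

int main()
{
    JitRequest tier0{false, false, true, false, false};
    std::printf("tier0 level = %d\n", (int)ComputeOptLevel(tier0)); // 1 == OPT_SizeAndThrougput
}
```

With a single level computed up front, a query like OptimizationsEnabled() ("either the Blended or the Speed level") reduces to one comparison against the cached value.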

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Oct 26, 2022
@ghost ghost assigned EgorBo Oct 26, 2022
@ghost

ghost commented Oct 26, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.


@jakobbotsch
Member

You will need to update the summary.md printed by superpmi-diffs for asmdiffs/tpdiff; I guess instead of MinOpts we should use some terminology like "FastOpts" or "FastCode". You may also need to fix superpmi itself, since its MinOpts detection might not make much sense w.r.t. these changes.

@EgorBo
Member Author

EgorBo commented Nov 11, 2022

@AndyAyersMS (cc @dotnet/jit-contrib) could you please review this if you have time?

It's almost ready, I just want to fix SuperPMI to print optimization levels.
The PR is zero-diffs but with nice throughput improvements, up to -1.51% for tier0.

For now, it doesn't change anything for MinOpts vs Tier0 (SizeOrThroughput); better names for the opt levels are welcome.
We talked about removing the CLFLG_* optimization flags, but I'd prefer to do that separately if you don't mind.

@EgorBo EgorBo marked this pull request as ready for review November 11, 2022 14:59
@@ -3271,7 +3271,7 @@ void CodeGen::genCall(GenTreeCall* call)

     // If there is nothing next, that means the result is thrown away, so this value is not live.
     // However, for minopts or debuggable code, we keep it live to support managed return value debugging.
-    if ((call->gtNext == nullptr) && !compiler->opts.MinOpts() && !compiler->opts.compDbgCode)
+    if ((call->gtNext == nullptr) && !compiler->opts.OptimizationDisabled() && !compiler->opts.compDbgCode)
@SingleAccretion
Contributor

@SingleAccretion SingleAccretion commented Nov 11, 2022
!compiler->opts.OptimizationDisabled()

Nit: the double negation is hard to read. Can these (there are a number of instances) be switched to OptimizationEnabled?

@EgorBo
Member Author

Right, but I didn't want to distract code reviewers with that. I think we'd better do it separately; for now it's easier to review because I simply replaced MinOpts with the same OptimizationDisabled, so there's no need to validate it 🙂
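
For anyone following the nit, here is a tiny hypothetical sketch (not the real Compiler/opts code; OptsLike and its members are invented) of the positive wrapper being suggested, so call sites avoid the double negation:

```cpp
// Hypothetical sketch of the suggested positive accessor; names are invented.
struct OptsLike
{
    bool minOpts   = false;
    bool debugCode = false;

    bool OptimizationDisabled() const { return minOpts || debugCode; }
    // Positive wrapper so call sites can read "if (opts.OptimizationEnabled())"
    // instead of "if (!opts.OptimizationDisabled())".
    bool OptimizationEnabled() const { return !OptimizationDisabled(); }
};
```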

@jakobbotsch
Member

jakobbotsch commented Nov 11, 2022

The PR is zero-diffs but with nice throughput improvements, up to -1.51% for tier0.

MinOpts (-1.07%)
Collection PDIFF
benchmarks.run.Linux.arm.checked.mch -0.05%
coreclr_tests.run.Linux.arm.checked.mch -1.11%
libraries.crossgen2.Linux.arm.checked.mch -0.05%
libraries.pmi.Linux.arm.checked.mch -0.04%
libraries_tests.pmi.Linux.arm.checked.mch -0.03%

This looks really odd; I'm curious how this is showing up so differently in coreclr_tests (unfortunately there's no easy way to check without hacking superpmi and/or using a more detailed pin tool).

@EgorBo
Member Author

EgorBo commented Nov 11, 2022

The reason throughput improves slightly (I see that for the most important collections the improvement is smaller) is that OptimizationsEnabled()/Disabled() did not just return a cached value previously.
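
A rough sketch of that throughput point, with entirely hypothetical names (CompilerLike, the FLAG_* constants, and the accessors are not the real jit code): the "before" shape re-derives the answer from the flag bits on every call, while the "after" shape computes an opt level once and makes the accessor a cheap comparison against the cached value.

```cpp
// Hypothetical sketch illustrating caching the decision; not the actual jit code.
#include <cstdint>

constexpr uint64_t FLAG_MIN_OPT    = 1u << 0;
constexpr uint64_t FLAG_DEBUG_CODE = 1u << 1;
constexpr uint64_t FLAG_TIER0      = 1u << 2;

enum OptLevel { OPT_MinOpts, OPT_SizeAndThrougput, OPT_Blended };

struct CompilerLike
{
    uint64_t jitFlags = 0;
    OptLevel optLevel = OPT_Blended;

    // "Before": every call re-checks the flag bits.
    bool OptimizationDisabledSlow() const
    {
        return (jitFlags & (FLAG_MIN_OPT | FLAG_DEBUG_CODE | FLAG_TIER0)) != 0;
    }

    // "After": compute the level once during compiler initialization...
    void CacheOptLevel()
    {
        if ((jitFlags & (FLAG_MIN_OPT | FLAG_DEBUG_CODE)) != 0)
            optLevel = OPT_MinOpts;
        else if ((jitFlags & FLAG_TIER0) != 0)
            optLevel = OPT_SizeAndThrougput;
        else
            optLevel = OPT_Blended;
    }

    // ...so the frequently-called accessor is just a comparison.
    bool OptimizationDisabled() const { return optLevel != OPT_Blended; }
};
```

In a method with an unusually large number of basic blocks or EH clauses, accessors like these are hit very many times, which is one way a change like this could show up disproportionately in a single collection.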

@jakobbotsch
Member

Yeah, I'm just surprised that the differences are so large. I will try to look into what exactly causes this to impact coreclr_tests so much more than the other collections; this is quite unexpected to me.

@jakobbotsch
Member

@EgorBo It looks like this PR has diffs again in the latest superpmi-diffs run

@jakobbotsch
Member

jakobbotsch commented Nov 12, 2022

The majority of the TP improvement is coming from two particular contexts, 426399 and 426400 (in the current latest collection on the head of this PR). These methods are both named test:Main(), so they are not very easy to locate (I couldn't find any really large Main methods in classes named test in the repo; maybe it's some dynamically created thing?).
426400 is by far the largest: it has 261 KB of IL and 16731 EH clauses and takes us 14 seconds to JIT at tier 0 on my 5950X. This PR reduces the number of instructions executed while jitting that context by around 14.5%. It seems likely we have some quadratic behavior somewhere related to EH clauses.

@EgorBo
Member Author

EgorBo commented Nov 12, 2022

Interesting, thanks for looking into that!

@jakobbotsch
Member

@EgorBo Another thing I noticed is that the disassembly often has "; unknown optimization flags" at the top, e.g.

; Assembly listing for method JitTest.HFA.TestCase:Main():int
; Emitting quick and small code for X64 CPU with AVX - Windows
; Tier-0 compilation
; unknown optimization flags
; rbp based frame
; partially interruptible
; Final local variable assignments

(maybe it's expected due to SPMI?)
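
One plausible (purely illustrative) explanation for that fallback string: if the listing header picks its banner from the opt level, a flag combination the printer doesn't recognize would fall through to a default case. The sketch below is hypothetical, not the jit's actual header-printing code; only the "; unknown optimization flags" string is taken from the output above.

```cpp
// Hypothetical sketch of a banner fallback; not the actual jit listing code.
#include <cstdio>

enum OptLevel { OPT_MinOpts, OPT_SizeAndThrougput, OPT_Blended, OPT_Size, OPT_Speed };

const char* OptLevelBanner(int level)
{
    switch (level)
    {
        case OPT_MinOpts:          return "; MinOpts code";
        case OPT_SizeAndThrougput: return "; optimized for size and throughput";
        case OPT_Blended:          return "; optimized code";
        case OPT_Size:             return "; optimized for size";
        case OPT_Speed:            return "; optimized for speed";
        default:                   return "; unknown optimization flags";
    }
}

int main()
{
    std::printf("%s\n", OptLevelBanner(42)); // an unrecognized value hits the fallback
}
```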

@jakobbotsch
Member

jakobbotsch commented Nov 12, 2022

These methods are both named test:Main(), so they are not very easy to locate (I couldn't find any really large Main methods in classes named test in the repo; maybe it's some dynamically created thing?).

It's probably this one:
https://github.com/dotnet/runtime/blob/5565135d21024ac2820e08d0a7aca3f2cb9d2b55/src/tests/JIT/Regression/VS-ia64-JIT/V1.2-M02/b28158/test.il

@BruceForstall
Member

@EgorBo I presume this is still in progress. Maybe move to "Draft" status temporarily?

@BruceForstall
Member

@EgorBo ping

@JulieLeeMSFT JulieLeeMSFT marked this pull request as draft December 27, 2022 17:48
@ghost

ghost commented Feb 5, 2023

Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it.

@EgorBo
Member Author

EgorBo commented Mar 6, 2023

Resolving conflicts now..

@ghost

ghost commented Apr 5, 2023

Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it.

@ghost ghost closed this Apr 5, 2023
@ghost ghost locked as resolved and limited conversation to collaborators May 5, 2023
This pull request was closed.