JIT: initial version of a profile checker #42481

AndyAyersMS · 2020-09-19T00:55:27Z

Adds a new flowgraph checker that tries to verify that the profile data is
self-consistent.

Use COMPlus_JitProfileChecks=1 to enable in dumps, =2 to raise asserts
if issues are found.

I plan on using this to start fixing the issues it finds and then perhaps
gradually enable more stringent checking.

Adds a new flowgraph checker that tries to verify that the profile data is self-consistent. Use `COMPlus_JitProfileChecks=1` to enable in dumps, `=2` to raise asserts if issues are found. I plan on using this to start fixing the issues it finds and then perhaps gradually enable more stringent checking.

AndyAyersMS · 2020-09-19T00:55:38Z

cc @dotnet/jit-contrib

AndyAyersMS · 2020-09-22T15:33:49Z

@dotnet/jit-contrib anyone able to take a look?

CarolEidt

This seems like a reasonable start; I'm guessing that we'll want to make changes over time.
I think I'm the only one that uses the flowgraph dumping facility, but it would be very nice to annotate that graph with the profile data.

CarolEidt · 2020-09-22T23:22:10Z

src/coreclr/src/jit/flowgraph.cpp

+
+            if (!foundPreds)
+            {
+                // Might need to tone this down as we could see unreachable blocks?


It would seem that the seriousness of this as a "problem" would depend on whether this is estimated or measured profile data - if the latter, it would be interesting to report if there was non-zero weight (though it could happen if the data were stale).

We may see nonzero weight blocks become unreachable/dead even with accurate profile data, because at times we have to make guesses as to how profile data ends up distributed when we duplicate code. You can often see this post-inlining when there is strongly caller-correlated behavior.

I have a bunch of notes written up about "profile maintenance" that I haven't shared widely yet, but the upshot is that at times the profile will become inconsistent and fixing it up is potentially a global (intraprocedural) problem. So we may let things get out of balance for stretches and then reconstitute before key dependent phases. More on this soon.

AndyAyersMS · 2020-09-22T23:58:45Z

I'm the only one that uses the flowgraph dumping

Are you asking about the visualizable flow graph dumps? I have some changes for dot output lying around that I can polish up and PR. I have not touched the xml format dumps. I think somebody also suggested perhaps using dgml.

CarolEidt · 2020-09-23T00:13:00Z

Are you asking about the visualizable flow graph dumps? I have some changes for dot output lying around that I can polish up and PR. I have not touched the xml format dumps. I think somebody also suggested perhaps using dgml.

I use the dot format (COMPLUS_JitDumpFgDot). At the time I added that, there was pushback about eliminating the xml format, so I implemented the dot graphs as an option.

BruceForstall · 2020-09-24T00:26:30Z

This looks good. I'm curious about cases where you can't find a successor edge (namely, predecessor edge of successor block).

I plan on using this to start fixing the issues it finds and then perhaps gradually enable more stringent checking.

What did you find so far?

AndyAyersMS · 2020-09-24T00:38:23Z

Finding successor edges is probably fairly robust, but if/when there are multiples we don't know for sure which one to match with; we'll always take the first one we run across. Multiples may be fairly rare (I suppose switches are the most likely case. On a somewhat related note I've noticed recently that Roslyn seems quite reluctant to actually emit MSIL switches, preferring to build fairly large if-then-else trees).

I haven't looked extensively at the checker failures yet, but a few things show up already.

As you can see from the graph in #42657 we don't get the right counts propagated on some internal blocks. We may not even know what are reasonable likelhoods for some of the flow expansions. And there can be some slight count tearing from tiered PGO as the Tier0 method is often still running when we're jitting the Tier1 version (so the counts are in flux).

Morph doesn't fix up the counts properly (at times) when optimizing a conditional jump to an unconditional one. That is not too surprising as the fixes needed can be quite involved.

BruceForstall · 2020-09-24T00:57:55Z

I thought we couldn't get multiples because our preds (but not simple preds) have exactly one edge for any block, plus a count (flDupCount).

Dotnet-GitSync-Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Sep 19, 2020

CarolEidt approved these changes Sep 22, 2020

View reviewed changes

AndyAyersMS merged commit 3492460 into dotnet:master Sep 23, 2020

AndyAyersMS deleted the ProfileDataChecker branch September 23, 2020 16:06

AndyAyersMS mentioned this pull request Oct 24, 2020

Dynamic PGO #43618

Closed

54 tasks

ghost locked as resolved and limited conversation to collaborators Dec 7, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

JIT: initial version of a profile checker #42481

JIT: initial version of a profile checker #42481

AndyAyersMS commented Sep 19, 2020

AndyAyersMS commented Sep 19, 2020

AndyAyersMS commented Sep 22, 2020

CarolEidt left a comment

CarolEidt Sep 22, 2020

AndyAyersMS Sep 22, 2020

AndyAyersMS commented Sep 22, 2020

CarolEidt commented Sep 23, 2020

BruceForstall commented Sep 24, 2020

AndyAyersMS commented Sep 24, 2020

BruceForstall commented Sep 24, 2020

JIT: initial version of a profile checker #42481

JIT: initial version of a profile checker #42481

Conversation

AndyAyersMS commented Sep 19, 2020

AndyAyersMS commented Sep 19, 2020

AndyAyersMS commented Sep 22, 2020

CarolEidt left a comment

Choose a reason for hiding this comment

CarolEidt Sep 22, 2020

Choose a reason for hiding this comment

AndyAyersMS Sep 22, 2020

Choose a reason for hiding this comment

AndyAyersMS commented Sep 22, 2020

CarolEidt commented Sep 23, 2020

BruceForstall commented Sep 24, 2020

AndyAyersMS commented Sep 24, 2020

BruceForstall commented Sep 24, 2020