JIT: make global morph its own phase, morph blocks in RPO #86822
Some refactoring in anticipation of enabling cross-block local assertion prop during global morph. Process blocks in RPO. Try to verify that no newly added blocks can alter the set of assertions that flow into the pre-existing ones.
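For reference, reverse postorder (RPO) orders blocks so that for every edge u -> v that is not a back edge, u comes before v. When a block is processed, all of its forward-edge predecessors have already been processed, which is what a cross-block forward analysis needs. The following is a minimal sketch over a toy flow graph. It assumes blocks are numbered 0..n-1 and that there is a single entry. `Graph` and `ReversePostorder` are hypothetical names and do not reflect how `fgDfsReversePostorder` is implemented.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical toy flow graph: succs[b] lists the successors of block b.
struct Graph {
    std::vector<std::vector<int>> succs;
};

// Returns the blocks reachable from `entry` (assumed to be a valid id) in
// reverse postorder. Uses an explicit stack of (block, next-successor-index)
// pairs so deep graphs do not overflow the native call stack.
std::vector<int> ReversePostorder(const Graph& g, int entry) {
    std::vector<bool> visited(g.succs.size(), false);
    std::vector<int> postorder;
    std::vector<std::pair<int, std::size_t>> stack;

    visited[entry] = true;
    stack.push_back({entry, 0});
    while (!stack.empty()) {
        auto& [block, next] = stack.back();
        if (next < g.succs[block].size()) {
            int succ = g.succs[block][next++];
            if (!visited[succ]) {
                visited[succ] = true;
                stack.push_back({succ, 0}); // references above are not used after this
            }
        } else {
            // All successors finished: emit this block in postorder.
            postorder.push_back(block);
            stack.pop_back();
        }
    }
    std::reverse(postorder.begin(), postorder.end());
    return postorder;
}
```

Unreachable blocks never appear in the result, which is one reason an RPO walk alone does not visit every block in the list.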
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
A bunch of mostly neutral diffs; it looks mostly like frame offset churn because temps are created in a different order (which reminds me, we should try and sort locals based on weight or something). No new functionality is actually enabled yet.
@jakobbotsch PTAL
Not surprisingly, x86 has some special behavior here that I'll have to chase down. The assert I added to verify that morph only makes "harmless" edits to flow is firing. I suspect it suffers from both false positives and false negatives. The main purpose of the assert is to validate that any cross-block AP we might do on the blocks that exist at the start of morph (not yet happening here, but in some future PR) remains valid even if new blocks get introduced and flow is altered during morph. So far the edits I have seen are the following:
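One conservative way to state the invariant this assert is after is as follows. In a forward all-paths analysis like assertion prop, the assertions flowing into a block are the intersection of what its predecessors produce. Removing a predecessor can only keep or grow that intersection. Adding a predecessor that was not there before may shrink it. The sketch below is a hypothetical check over snapshotted predecessor sets, not the assert in this PR. As noted above, a check like this is prone to false positives. For example, it rejects the case where morph splits an edge by inserting a new block between two existing ones.

```cpp
#include <cassert>
#include <map>
#include <set>

// Hypothetical snapshot: block id -> set of predecessor block ids.
using PredMap = std::map<int, std::set<int>>;

// Returns true if no block present in `before` gained a predecessor it did
// not already have. Blocks removed by morph are skipped. Blocks created by
// morph (keys only in `after`) are not checked themselves, but if one becomes
// a new predecessor of an old block, that counts as a violation.
bool OnlyHarmlessFlowEdits(const PredMap& before, const PredMap& after) {
    for (const auto& [block, oldPreds] : before) {
        auto it = after.find(block);
        if (it == after.end()) {
            continue; // Block was removed; nothing flows into it anymore.
        }
        for (int pred : it->second) {
            if (oldPreds.count(pred) == 0) {
                return false; // New incoming edge could weaken incoming assertions.
            }
        }
    }
    return true;
}
```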
Another wrinkle we might consider here is to aggressively prune (or at least mark) unreachable blocks rather than morph them; doing so will also start to trim down the pred sets for blocks and have a similar impact as the above. In fact, we could do a dynamic evolution of the RPO by turning this into a worklist driven traversal (similar to how the importer runs).
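A minimal sketch of the worklist idea under toy assumptions. Block 0 is the only entry, and the per-block `visit` callback may shrink a block's successor list (for example, when folding a branch on a constant) but never adds blocks. Because successors are read only after a block is visited, a folded branch target is never reached. Anything left unreached can then be pruned or marked instead of morphed. `VisitReachable` is hypothetical. Note that this visits blocks in discovery order, not RPO.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Visits every block reachable from block 0 via a worklist and returns the
// ids of blocks that were never reached. Hypothetical contract: `visit` may
// rewrite the successor list it is given, but must not add new blocks.
std::vector<int> VisitReachable(
    std::vector<std::vector<int>>& succs,
    const std::function<void(int, std::vector<int>&)>& visit) {
    if (succs.empty()) {
        return {};
    }
    std::vector<bool> reached(succs.size(), false);
    std::vector<int> worklist{0};
    reached[0] = true;
    while (!worklist.empty()) {
        int block = worklist.back();
        worklist.pop_back();
        visit(block, succs[block]); // may fold away some successors
        for (int succ : succs[block]) {
            if (!reached[succ]) {
                reached[succ] = true;
                worklist.push_back(succ);
            }
        }
    }
    std::vector<int> unreachable;
    for (int b = 0; b < static_cast<int>(succs.size()); b++) {
        if (!reached[b]) {
            unreachable.push_back(b);
        }
    }
    return unreachable;
}
```

In a diamond where block 0's branch folds to one side, the other arm shows up as unreachable and its edge into the join block would no longer need to contribute to the join's incoming assertions.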
TP (throughput) impact is currently a bit high, but the hope is we can recoup that by reducing the volume of IR leaving morph, at least when optimizations are enabled, or else justify the TP hit with better CQ (code quality). But we likely need to iterate in normal block order if we're not optimizing, since RPO will offer no benefit.
For the time being we can relax or remove the assert, avoid using RPO unless optimized, and look for better ways to track the set of introduced blocks (so we don't have to search the bblist for them later). |
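A sketch of two of these adjustments under toy assumptions. First, walk blocks in plain list order when not optimizing and in RPO only when optimizing. Second, record blocks created during the walk in a side list at creation time, rather than searching the block list for them afterwards. `MorphAll`, `MorphState`, and the way the callback creates blocks are hypothetical stand-ins, loosely modeled on the `fgNewBBs` loop in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for the state morph threads through the walk.
struct MorphState {
    std::vector<int> newBlocks; // blocks created during morph, in creation order
    int nextBlockId;            // id to hand out to the next created block
};

// Morphs each block once and returns the visit order. RPO is used only when
// optimizing; otherwise the plain block-list order is used. `morph` may
// create blocks by appending their ids to state.newBlocks.
std::vector<int> MorphAll(bool optimizing,
                          const std::vector<int>& listOrder,
                          const std::vector<int>& rpo,
                          MorphState& state,
                          const std::function<void(int, MorphState&)>& morph) {
    std::vector<int> morphed;
    const std::vector<int>& order = optimizing ? rpo : listOrder;
    for (int block : order) {
        morph(block, state);
        morphed.push_back(block);
    }
    // Blocks created above were recorded as they were made, so there is no
    // rescan of the block list. Index-based loop: morphing a new block may
    // itself append to newBlocks.
    for (std::size_t i = 0; i < state.newBlocks.size(); i++) {
        int block = state.newBlocks[i];
        morph(block, state);
        morphed.push_back(block);
    }
    return morphed;
}
```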
The diffs are likely just not representative due to too many missing contexts (
Locally I see fewer missed contexts (or perhaps I'm just not paying attention). At any rate, not doing RPO unless we're optimizing knocked down the diffs considerably and should remove the MinOpts TP impact. I also optimized the new-block visits for the optimized case and gave up (for now) on trying to detect whether the flow alterations and possible code motion could pose problems later on. I am now seeing diffs where there is an order dependence for block morphing. Block morph is sensitive to a local var's DNER (do-not-enregister) state, and this gets set during global morph as needed, so depending on the relative ordering of the block morph and the DNER setting we can get different expansions. I don't think it is a correctness issue, since we already have this order dependence now.
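A toy illustration of this kind of order dependence, using hypothetical rules rather than the JIT's actual logic. One block's morph marks a local DNER. Another block's struct copy of that local is expanded field by field only while the local is not yet DNER. Visiting the same two blocks in different orders then yields different expansions, and each of them is valid on its own.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical per-block actions on a single local.
enum class Action { AddressTaken, StructCopy };

// Morphs blocks in the given order and returns how each StructCopy was
// expanded. An AddressTaken block marks the local DNER; a StructCopy is
// expanded field by field only while the local is not yet DNER.
std::vector<std::string> MorphInOrder(const std::vector<Action>& blocks,
                                      const std::vector<int>& order) {
    bool localIsDner = false;
    std::vector<std::string> expansions;
    for (int b : order) {
        if (blocks[b] == Action::AddressTaken) {
            localIsDner = true;
        } else {
            expansions.push_back(localIsDner ? "block copy" : "field-by-field");
        }
    }
    return expansions;
}
```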
// Visit the blocks recorded in fgNewBBs (blocks introduced during morph).
for (BasicBlock* const block : *fgNewBBs)
{
    fgMergeBlockReturn(block);
    fgMorphBlock(block);
}
Is this a significant chunk of throughput cost? What is the improvement of this tracking mechanism alone?
fgRenumberBlocks();
EnsureBasicBlockEpoch();
fgComputeEnterBlocksSet();
fgDfsReversePostorder();
I'm curious why we need to do so much work to set up for the RPO traversal. SSA's TopologicalSort does not do any of this work; it simply starts at fgFirstBB (well, the BB root, but then the original fgFirstBB very soon after that). Would it be incorrect to do the same here?
Another idea: can we avoid the two-step iteration process, and instead have a version of fgDfsReversePostorder that just invokes a callback instead of recording the order into ambient state? Would it make any difference? I'm not totally sure where the throughput cost comes from in this change.
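A sketch of the callback idea under toy assumptions. A visitor can be invoked as each block finishes (postorder) without keeping anything in ambient state. Reverse postorder, however, needs the complete postorder before the first block can be emitted, so some buffer is still required. Here it is local to the caller rather than a field on the compiler object. `VisitPostorder` and `VisitReversePostorder` are hypothetical and do not reflect the JIT's DFS interface. This sketch also assumes the callback does not change the flow graph during the walk.

```cpp
#include <cassert>
#include <vector>

// Calls onPost(b) in postorder for each not-yet-visited block b reachable
// from `block`.
template <typename Callback>
void VisitPostorder(const std::vector<std::vector<int>>& succs, int block,
                    std::vector<bool>& visited, Callback&& onPost) {
    visited[block] = true;
    for (int succ : succs[block]) {
        if (!visited[succ]) {
            VisitPostorder(succs, succ, visited, onPost);
        }
    }
    onPost(block);
}

// Buffers the postorder in a caller-owned vector, then walks it backwards to
// invoke onBlock in reverse postorder.
template <typename Callback>
void VisitReversePostorder(const std::vector<std::vector<int>>& succs, int entry,
                           Callback&& onBlock) {
    std::vector<bool> visited(succs.size(), false);
    std::vector<int> post;
    VisitPostorder(succs, entry, visited, [&](int b) { post.push_back(b); });
    for (auto it = post.rbegin(); it != post.rend(); ++it) {
        onBlock(*it);
    }
}
```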
"I'm curious why we need to do so much work to set up for the RPO traversal"
We can try and do all this on the fly; that's more or less what I was suggesting above:
"we could do a dynamic evolution of the RPO by turning this into a worklist driven traversal (similar to how the importer runs)."
"I'm not totally sure where the throughput cost comes from in this change."
I don't understand it either; it seems like this latest version should (aside from building the RPO) be fairly efficient. Will have to take a look.
"Another idea: can we avoid the two step iteration process, and instead have a version of fgDfsReversePostorder that just invokes a callback instead of recording the order into ambient state? Would it make any difference? I'm not totally sure where the throughput cost comes from in this change."
I suspect this is tricky for phases like morph that also can alter the flow graph.
Note value numbering (via
Not going to happen in .NET 8, so will close for now.