Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: Try to maintain fallthrough in Compiler::fgSplitEdge #107419

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

amanasifkhalid
Copy link
Member

Given some branch curr -> succ, fgSplitEdge introduces an intermediary block to create the form curr -> newBlock -> succ. The lexical placement of newBlock wouldn't matter if it weren't for the fact that we call fgSplitEdge during LSRA, after we've reordered blocks. Ideally, we wouldn't introduce any new blocks after establishing layout, but for the time being, always placing newBlock after curr to create fallthrough -- effectively moving the curr -> succ branch up a block, and not introducing new branches -- seems to be a decent compromise.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Sep 5, 2024
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@amanasifkhalid
Copy link
Member Author

cc @dotnet/jit-contrib, @AndyAyersMS PTAL. Diffs are large, though they're inflated by collections with tiering (libraries_tests in particular). This change in placement is particularly impactful when we aren't optimizing, so we can't rely on block layout to fix things later -- though it looks like there are plenty of instances where LSRA introduces a new block after layout when we are optimizing, and this new placement is better at maintaining fallthrough.

@AndyAyersMS
Copy link
Member

Seems like perhaps fgSplitEdge should behave differently before/after layout? Before the placement is not important, after, it is...?

That being said, I have seen those LSRA blocks end up very poorly placed, and we might question whether after source or before target is the right location for the new block, when source and target blocks are not adjacent, or whether we just (as we've discussed) redo layout if LSRA makes changes.

@amanasifkhalid
Copy link
Member Author

I suppose we could differentiate between before/after layout, though since we're seeing this block placement affect FullOpts with some frequency, it might be more worthwhile to get layout working after LSRA.

That being said, I have seen those LSRA blocks end up very poorly placed, and we might question whether after source or before target is the right location for the new block

Between the two, I think placing after source is better most of the time? If the target has multiple preds, placing before target could potentially break up hotter fallthrough. If the source has only one successor, then placing after the source doesn't change anything, but if the source is a BBJ_COND, then the worst-case scenario is we had some layout like source, falseTarget, target, and now we have source, newBlock, falseTarget, target; we no longer have fallthrough into either successor of source.

If you'd like, I can put this PR on-hold and work on getting layout working after LSRA. For that, I'm thinking we can continue to run layout before LSRA, and detect if we need to run it again after, just to minimize regressions. Once we've refactored switch recognition to not depend on lexical layout (#107076), we should be able to only run layout once, after LSRA.

@AndyAyersMS
Copy link
Member

I think it makes sense to look into fixing layout post LSRA first. If that turns out to be complicated, then perhaps we can reconsider and do this first.

I believe LSRA will only split "critical edges"— edges where the source has multiple flow successors and the target has multiple flow predecessors. If we trust the initial layout then if source and target are adjacent then putting the block in between is the right thing; if not, presumably both source and target have other preferred partners and if those partners are adjacent then the new block should be placed out of the way somewhere where it won't break up any other important flow (though not necessarily at the end of the method / region, which is what I commonly see). If the preferred source or target partner is not adjacent (or maybe if no successor/predecessor is adjacent), then we should be placing after source or before target depending.

This calculus would be easier if we hand the flow scoring that we are envisioning for k-opt, as we could evaluate the different options and pick the one that has the best score.

@amanasifkhalid
Copy link
Member Author

With #107483 merged, the diffs are still big, though FullOpts diffs are concentrated in coreclr_tests and libraries_tests. After looking at the JIT dumps for a few examples, I see that fgSplitEdge's block placement during LSRA, as expected, is no longer meaningful in FullOpts, as block layout handles it. But the call site during profile incorporation is now the source of diffs, thanks to various early phases having dependencies on lexical block ordering. fgUpdateFlowGraph is one obvious culprit -- I can look into refactoring that next.

@AndyAyersMS are you ok taking this change as-is, or would you prefer we whittle down the churn as much as possible?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants