Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: morph blocks in RPO #94247

Merged
merged 1 commit into from
Nov 2, 2023
Merged

JIT: morph blocks in RPO #94247

merged 1 commit into from
Nov 2, 2023

Conversation

AndyAyersMS
Copy link
Member

When optimizing, process blocks in RPO. Disallow creation of new blocks and new flow edges (the latter with certain preapproved exceptions).

Morph does not yet take advantage of the RPO to enable more optimization.

Contributes to #93246.

When optimizing, process blocks in RPO. Disallow creation of new blocks
and new flow edges (the latter with certain preapproved exceptions).

Morph does not yet take advantage of the RPO to enable more optimization.

Contributes to dotnet#93246.
@ghost ghost assigned AndyAyersMS Oct 31, 2023
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Oct 31, 2023
@ghost
Copy link

ghost commented Oct 31, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

When optimizing, process blocks in RPO. Disallow creation of new blocks and new flow edges (the latter with certain preapproved exceptions).

Morph does not yet take advantage of the RPO to enable more optimization.

Contributes to #93246.

Author: AndyAyersMS
Assignees: AndyAyersMS
Labels:

area-CodeGen-coreclr

Milestone: -

@AndyAyersMS
Copy link
Member Author

AndyAyersMS commented Oct 31, 2023

@jakobbotsch PTAL
cc @dotnet/jit-contrib

Not yet a general purpose utility -- will have to wait for a second "customer" to figure out how to best refactor the RPO traversal.

A bit higher TP impact than I'd like, around 0.25%. Some of this was paid for in advance by #93704, which saved about 0.15%. I will keep looking for ways to pare this down.

Some diffs from places where morph is order sensitive. Mainly seems to be bbNum changes since we now do an up-front renumber and/or changes in when a local is marked as address exposed. For example fgMorphPotentialTailCall checks all lcl vars looking for exposed locals, but the set of exposed locals can change as we morph, so depending on ordering we may do more or fewer tail calls.

@AndyAyersMS
Copy link
Member Author

Re the order sensitivity (at least for address exposure): one could perhaps argue that any locals introduced during morph shouldn't count... that might reduce or remove this source of diffs all together.

//
fgRenumberBlocks();
EnsureBasicBlockEpoch();
fgComputeEnterBlocksSet();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How much of the cost comes from having to do the above setup calls before even computing the order?
When I've tried to compute the order before it wasn't clear to me why I would need to renumber blocks to do that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can grab a new profile, but the old one is probably pretty accurate: see #86822 (comment)

Renumbering likely isn't needed and not renumbering should save a bit of TP. So let me try that.

Also looking back at that early PR reminded me that I should take a look at if SSA's topological sort, it might be more efficient.

Copy link
Member

@jakobbotsch jakobbotsch Nov 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's also possible we can make DfsBlockEntry cheaper in the same way as #86839 (even though GetSucc looks cheap, it would probably still help a bit).

It also confuses me a bit that we have multiple notions of start nodes with fgEnterBlks and fgDomFindStartNodes. I assume the nodes found exclusively by fgDomFindStartNodes are unreachable (?) so ideally we would get rid of them earlier, but even if not, isn't it optimizable by using bbRefs? If not, then I think using VisitRegularSuccs would be more efficient.

Anyway, don't feel the need to address it in this PR -- but might be a couple of avenues of future cleanup/improvements.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like for now at least renumbering is needed. Let me add some notes to #93246 about making all this more efficient. I'll come back to that after I get the assertion prop part working.

@jakobbotsch
Copy link
Member

Re the order sensitivity (at least for address exposure): one could perhaps argue that any locals introduced during morph shouldn't count... that might reduce or remove this source of diffs all together.

Another source of these is that block morphing depends on whether structs are marked as DNER, and morph can mark structs as DNER in some cases.

I think it's fine (and preferable) to take this churn – it at least makes it more well defined and depending on a logical order when this may happen, so probably reduces spurious diffs in the long run.

@AndyAyersMS
Copy link
Member Author

/azp run runtime-coreclr jitstress

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Comment on lines +13924 to +13936
// Allow edge creation to genReturnBB (target of return merging)
// and the scratch block successor (target for tail call to loop).
// This will also disallow dataflow into these blocks.
//
if (genReturnBB != nullptr)
{
genReturnBB->bbFlags |= BBF_CAN_ADD_PRED;
}
if (fgFirstBBisScratch())
{
fgFirstBB->Next()->bbFlags |= BBF_CAN_ADD_PRED;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unfortunate we need these special cases (and the new block flag), but I understand you tried to avoid it and this is a compromise.

fgFirstBB->Next()->bbFlags |= BBF_CAN_ADD_PRED;
}

unsigned const bbNumMax = fgBBNumMax;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you capture fgBBNumMax because the number of blocks can change during morph? But you've disabled block creation. Do you want an assert after the for loop to assert no new blocks were created?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this is a vestige of my earlier versions where the number of blocks could change, before I removed all the bits that create new blocks.

You think this should have assert(fgBBNumMax == bbNumMax)?

If so, sure. I can add that in my next PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You think this should have assert(fgBBNumMax == bbNumMax)?

Right. That would clarify the assumption that the set of blocks doesn't change across this loop.

@AndyAyersMS
Copy link
Member Author

Jit stress failure looks like #93321, probably unrelated.

@AndyAyersMS AndyAyersMS merged commit 655b177 into dotnet:main Nov 2, 2023
150 of 155 checks passed
@ghost ghost locked as resolved and limited conversation to collaborators Dec 2, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants