Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expand Virtual Call targets earlier in Morph and allow CSE of the indirections #47808

Merged
merged 9 commits into from
Feb 16, 2021

Conversation

briansull
Copy link
Contributor

Use COMPlus_JitExpandCallsEarly=0 to disable and use old behavior

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 3, 2021
@briansull
Copy link
Contributor Author

briansull commented Feb 3, 2021

SuperPMI AsmDiffs indicate thatwe generally see improves in code size:

337 total files with Code Size differences (211 improved, 126 regressed), 84 unchanged.
337 total methods with Code Size differences (211 improved, 126 regressed), 84 unchanged.
7712 total files with Code Size differences (5710 improved, 2002 regressed), 896 unchanged.
7712 total methods with Code Size differences (5710 improved, 2002 regressed), 896 unchanged.
369 total files with Code Size differences (260 improved, 109 regressed), 49 unchanged.
369 total methods with Code Size differences (260 improved, 109 regressed), 49 unchanged.

Updated SuperPmi numbers:

Summary of Code Size diffs:
310 total files with Code Size differences (212 improved, 98 regressed), 85 unchanged.
310 total methods with Code Size differences (212 improved, 98 regressed), 85 unchanged.
Summary of Code Size diffs:
7237 total files with Code Size differences (5510 improved, 1727 regressed), 726 unchanged.
7237 total methods with Code Size differences (5510 improved, 1727 regressed), 726 unchanged.
Summary of Code Size diffs:
364 total files with Code Size differences (251 improved, 113 regressed), 49 unchanged.
364 total methods with Code Size differences (251 improved, 113 regressed), 49 unchanged.

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any way you can think of to share the expansion logic with lower?

Note in a pure AOT compile (not sure how we know this) all the method pointer fetches can be marked invariant as there's no runtime backpatching.

src/coreclr/jit/morph.cpp Show resolved Hide resolved
src/coreclr/jit/morph.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/morph.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/morph.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/morph.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/morph.cpp Show resolved Hide resolved
// result = [tmp + vtabOffsOfIndirection + vtabOffsAfterIndirection + [tmp + vtabOffsOfIndirection]]
//
//
// If relative pointers are also in second level indirection, additional temporary is used:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize this just mirrors what is over in LowerVirtualVtableCall, but I don't undertstand the If here -- there is no conditional code that follows.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the comment to

// When isRelative is true we need to setup two temporary variables

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The are two cases being covered here the 'isRealtive' case and the normal case.
Currently I have never seen 'isRelative' being true. I will add an assert and look into when it is used by the VM.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if we ever test these paths.

cc @jkotas

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These paths were added by Samsung. There were only ever used by Tizen armel target.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is set here:

*isRelative = MethodTable::VTableIndir_t::isRelative ? 1 : 0;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alpencolt Are you still using the relative pointers?

We use it in FNV (CoreCLR 3.1), but this mode doesn't work in .NET 6 now.
If we'll improve R2R performance relative pointers will not be used. Otherwise we'll have to fix FNV.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jkotas Yes, But I wasn't able to see how MethodTable::VTableIndir_t::isRelative was ever set to non-zero

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is set to non-zero when FEATURE_NGEN_RELOCS_OPTIMIZATIONS is defined.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jkotas Maybe it gets set in some tricky way that I couldn't figure out.
Anyway I will leave support in for this until we get a better understanding of what Samsung's plans are for this.

// var2 = var1 + vtabOffsOfIndirection + vtabOffsAfterIndirection + [var1 + vtabOffsOfIndirection]
// result = [var2] + var2
//
unsigned varNum1 = lvaGrabTemp(true DEBUGARG("var1 - vtab"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to have a bit more descriptive name for these temps, eg "first-level vtable", "second-level vtable"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that what they mean? it wasn't clear to me that was their meanings.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the first one could be called the "indirection slot address", and the second one the "slot address"?

See eg:

//-------------------------------------------------------------------
// The VTABLE
//
// Rather than the traditional array of code pointers (or "slots") we use a two-level vtable in
// which slots for virtual methods live in chunks. Doing so allows the chunks to be shared among
// method tables (the most common example being between parent and child classes where the child
// does not override any method in the chunk). This yields substantial space savings at the fixed
// cost of one additional indirection for a virtual call.
//
// Note that none of this should be visible outside the implementation of MethodTable; all other
// code continues to refer to a virtual method via the traditional slot number. This is similar to
// how we refer to non-virtual methods as having a slot number despite having long ago moved their
// code pointers out of the vtable.
//
// Consider a class where GetNumVirtuals is 5 and (for the sake of the example) assume we break
// the vtable into chunks of size 3. The layout would be as follows:
//
// pMT chunk 1 chunk 2
// ------------------ ------------------ ------------------
// | | | M1() | | M4() |
// | fixed-size | ------------------ ------------------
// | portion of | | M2() | | M5() |
// | MethodTable | ------------------ ------------------
// | | | M3() |
// ------------------ ------------------
// | ptr to chunk 1 |
// ------------------
// | ptr to chunk 2 |
// ------------------
//
// We refer to "ptr to chunk 1" and "ptr to chunk 2" as "indirection slots."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to get rid of the unused isRelative code path,
I'm really not OK with changing these names here when I don't really understand the implementation
and I didn't see the implementation (or documentation of it) in the VM directory.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AndyAyersMS
The comment in methodtable.h that you linked to is what we use when isRelative is false.

It results in three indirections

@briansull briansull linked an issue Feb 8, 2021 that may be closed by this pull request
@JulieLeeMSFT JulieLeeMSFT added this to the 6.0.0 milestone Feb 8, 2021
@AndyAyersMS AndyAyersMS mentioned this pull request Feb 9, 2021
54 tasks
…expanded early during fgMorph

Added COMPlus_JitExpandCallsEarly variable to enable virtual calls to be expanded early on a per method basis
Set opts.compExpandCallsEarly to true when we are optimizing and have COMPlus_JitExpandCallsEarly enabled
Update gtSetEvalOrder to also include the call->gtControlExpr
Update morph to call fgExpandVirtualVtableCallTarget when we are expanding early
Update lower to not call LowerVirtualVtableCall when we have already expanded it early
Modify CheckTreeId to print the duplicated gtTreeID before it asserts.

All tests are passing when using COMPLUS_JitExpandCallsEarly=*

Expand the Virtual Call target after we morph the args
Fix an inadvertent change in the GT_CALL weights
Use COMPlus_JitExpandCallsEarly=0 to disable and use old behavior
Added comment stating the the isRelative code path is never executed
@briansull
Copy link
Contributor Author

I am investigating a few issues with 'Extra flags on tree' that I uncovered using superpmi.

@briansull
Copy link
Contributor Author

briansull commented Feb 12, 2021

I have addressed all of the issues that I found while running Super PMI with this change.
There were a few issues with keeping the gtFlags up to date when we have a gtControlExpr
And for tailcalls when we have a virtual call with an instance pointer that had a side-effect, the code gen was sub-optimal,
to what we would generate for a late expansion. This was probably more of a tailcall codegen issue, but I now check for the (rare) case and do a late expansion instead.

@AndyAyersMS PTAL

@briansull
Copy link
Contributor Author

/azp run runtime-coreclr jitstress

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@briansull
Copy link
Contributor Author

/azp run runtime-coreclr jitstress

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pointed out one small style issue...


// Should we expand virtual call targets early for this method?
//
if (opts.compExpandCallsEarly == true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (opts.compExpandCallsEarly == true)
if (opts.compExpandCallsEarly)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

@briansull
Copy link
Contributor Author

/azp run runtime-coreclr jitstress

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@briansull
Copy link
Contributor Author

/azp run runtime-coreclr jitstress

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@briansull briansull merged commit 02b7358 into dotnet:master Feb 16, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Mar 18, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

JIT: allow CSE a hoisting of vtable lookups
5 participants