Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove helper method frame from Profiler hooks #107152

Merged

Conversation

davidwrighton
Copy link
Member

@davidwrighton davidwrighton commented Aug 29, 2024

Remove helper method frame and update the CONTRACTs to suit the new behavior. This WILL change the profiler behavior in that a set of apis in the Profiler api surface will not fail immediately on any attempt to use them. However, the set of apis which change are I believe not safe to call within a ProfilerEnter/Leave hook as an actual triggered GC will cause potential memory corruption if the arguments/return value of the function have any GC pointers.

This be done without the tweaks to remove the CodeVersionManager lock, but in general, I'd like to get rid of that lock from most of the entire runtime. I welcome opinions as to whether I should try to hold the CodeVersionManager lock into another change focussed on that.

@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Aug 29, 2024
@teo-tsirpanis teo-tsirpanis added area-VM-coreclr and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Sep 1, 2024
- Now, you can iterate the lists, and read the various rejit id's safely

This enables a few of the apis on the profiler to be safe to call while in a funciton enter/leave hook. From my analysis these functions were previously safe to call and did not assert, but only because of the presence of a HELPER_METHOD_FRAME. Now, they should simply be safe to call and the erroneous failure to assert has been addressed
…elper method frame

- Rework AppDomain::GetFriendlyName logic so that it is always safe for use here.
@davidwrighton davidwrighton marked this pull request as ready for review September 9, 2024 17:19
@davidwrighton
Copy link
Member Author

This be done without the tweaks to remove the CodeVersionManager lock, but in general, I'd like to get rid of that lock from most of the entire runtime. I welcome opinions as to whether I should try to hold the CodeVersionManager lock into another change focussed on that.

Copy link
Contributor

Tagging subscribers to this area: @tommcdon
See info in area-owners.md if you want to be subscribed.

Copy link
Contributor

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

{
CONTRACT (LPCWSTR)
{
THROWS;
if (GetThreadNULLOk()) {GC_TRIGGERS;} else {DISABLED(GC_NOTRIGGER);}
NOTHROW;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't correct. If m_friendlyName isn't in UTF16 format, then GetUnicode() will throw.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SString shouldn't be used as a field type. I would change that to avoid this issue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

- Get rid of GetFriendlyNameForLogging, as its difference from GetFriendlyName is immaterial at the only time its ever called
@davmason
Copy link
Member

davmason commented Sep 9, 2024

This WILL change the profiler behavior in that a set of apis in the Profiler api surface will not fail immediately on any attempt to use them. However, the set of apis which change are I believe not safe to call within a ProfilerEnter/Leave hook as an actual triggered GC will cause potential memory corruption if the arguments/return value of the function have any GC pointers.

Is "not fail immediately" a typo for "now fail immediately"? As written this worries me, the whole profiler ELT hook concept is not well understood by the community at large, and very poorly documented by us. We shouldn't be introducing new ways for profilers to silenty fail in weird ways.

Copy link
Member

@noahfalk noahfalk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few small suggestions inline, looks like a nice improvement overall. I'm happy to see that lock diminished 👍

}
else
{
_ASSERTE(!"Unexpected rejit state, should be active as there exists a native code version for this IL code version");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect there are corner cases where this isn't true. For example a profiler could request to rejit a method, the JIT starts compiling it, and then before the JIT finishes the profiler requests to rejit it again. By the time the JIT invokes this method the IL version is no longer active.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ILCodeVersion goes from state kStateRequested to kStateGettingReJITParameters to kStateActive. We cannot actually run the JIT without the function being active, as we don't actually have any IL. There is no pathway for an ILCodeVersion to become inactive.

It however can be not THE active IL Code version, which is a different concept (which confusingly doesn't require the ILCodeVersion to have reached kStateActive. Honestly, there are 4 concepts of "active" in the code versioning codebase, all of which mean fairly different things, which makes everything here really confusing to read.

The DebuggerJitInfo can only be created after jitting completes, as the constructor for a DebuggerJitInfo requires a pointer to the start of the jitted code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, I forgot that RejitState() == kStateActive was distinct from CodeVersionManager::GetActiveILCodeVersion(). And agreed, the naming collision doesn't make that convenient to discuss. Probably not in this PR, but what you do think about renaming kStateActive to something like kILInitialized or kReJITParametersInitialized? I think that would better capture its meaning and reduce overload on the 'active' term.

CAN_TAKE_LOCK;
GC_TRIGGERS;
CANNOT_TAKE_LOCK;
GC_NOTRIGGER;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code comment below at 1133-1135 is out of sync with the code now that the lock is removed.


// Caller must ensure this lock is taken!
_ASSERTE(CodeVersionManager::IsLockOwnedByCurrentThread());

NativeCodeVersion nativeCodeVersion = pMD->GetCodeVersionManager()->GetNativeCodeVersion(pMD, pCodeStart);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
NativeCodeVersion nativeCodeVersion = pMD->GetCodeVersionManager()->GetNativeCodeVersion(pMD, pCodeStart);
NativeCodeVersion nativeCodeVersion = pCodeVersionManager->GetNativeCodeVersion(pMD, pCodeStart);

Now that the methods are merged we've already got the manager available.

//};

ReJITID m_parentId;
DAC_IGNORE(const) ReJITID m_parentId;
PTR_NativeCodeVersionNode m_pNextMethodDescSibling;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a comment below saying m_pNextILVersionNode is never modified after being added to the list. I'm guessing the same thing holds here and we could add a similar comment?

@noahfalk
Copy link
Member

I welcome opinions as to whether I should try to hold the CodeVersionManager lock into another change focussed on that.

This change seemed like a good scope as-is. I wouldn't add more lock refactoring to this PR if that is what you were asking about.

@davidwrighton
Copy link
Member Author

@davmason, we have logic which attempts to replicate the behavior of our contracts system, but with runtime failures. This is the logic which calls PROFILER_TO_CLR_ENTRYPOINT_SYNC_EX with kP2EETriggers as one of the flags. The behavior will be to return CORPROF_E_UNSUPPORTED_CALL_SEQUENCE. This change makes calls to the following functions throw if used from an profile enter/leave event.

GetAppDomainsContainingModule,
GetILFunctionBodyAllocator,
SetILFunctionBody,
SetILInstrumentedCodeMap,
ForceGC
GetClassFromToken,
GetClassFromTokenAndTypeArgs,
GetFunctionFromTokenAndTypeArgs,
GetAppDomainInfo
RequestReJITWithInliners,
GetReJITIDs,
RequestReJIT,
RequestRevert,
EnumJITedFunctions2,
EnumModules,
EnumNgenModuleMethodsInliningThisMethod,
ApplyMetaData,

This is unfortunate, so I went through the work to make some of the more likely used apis reliably succeed.

However, I can't fix all of them (although I could fix some of them). We also have the problem that we already had an unpredictable failure case, where if we actually hit a GC suspend point during an Enter/Leave hook, we will corrupt memory. This change makes more apis fail, but the ones that work should reliably work unlike the previous model where some stuff worked but only if the customer didn't get happen to get unlucky.

Copy link
Member

@tommcdon tommcdon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

Copy link
Member

@davmason davmason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I am a little worried about the implications for the profiler but doing this at the beginning of 10 should help with that

@davidwrighton davidwrighton merged commit 5e560d9 into dotnet:main Sep 11, 2024
90 checks passed
jtschuster pushed a commit to jtschuster/runtime that referenced this pull request Sep 17, 2024
Remove helper method frame and update the CONTRACTs to suit the new behavior. This WILL change the profiler behavior in that a set of apis in the Profiler api surface will not fail immediately on any attempt to use them. However, the set of apis which change are I believe not safe to call within a ProfilerEnter/Leave hook as an actual triggered GC will cause potential memory corruption if the arguments/return value of the function have any GC pointers.
sirntar pushed a commit to sirntar/runtime that referenced this pull request Sep 30, 2024
Remove helper method frame and update the CONTRACTs to suit the new behavior. This WILL change the profiler behavior in that a set of apis in the Profiler api surface will not fail immediately on any attempt to use them. However, the set of apis which change are I believe not safe to call within a ProfilerEnter/Leave hook as an actual triggered GC will cause potential memory corruption if the arguments/return value of the function have any GC pointers.
@github-actions github-actions bot locked and limited conversation to collaborators Oct 12, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants