[6.0] Port "Change profilers to use thread local evacuation counters (#59741)" to 6.0 #60116

davmason · 2021-10-07T08:22:37Z

Description

When running on 6.0 RC1, a customer detected a regression in P95 latency. Investigation traced it to the notification profiler feature, for three reasons:

* Using one global "callback in progress" counter led to heavy contention across threads.
* C++ lambdas on hot paths are not optimized as well as traditional functions.
* One of the changes in the notification profiler feature triggered a latent bug that caused profilers to unintentionally disable inlining an order of magnitude more often.

This fix addresses all three issues.
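
To illustrate the first point, here is a minimal sketch of moving the "callback in progress" count from one shared global to per-thread storage. It is not the runtime's code: `kMaxNotificationProfilers` (and its value of 32), `ThreadCounters`, `CallbackGuard`, and `InFlightOnThisThread` are hypothetical names chosen for the example.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical limit on notification profilers (an assumption for this sketch).
constexpr std::size_t kMaxNotificationProfilers = 32;
// One slot per notification profiler plus one for the main profiler.
constexpr std::size_t kSlotCount = kMaxNotificationProfilers + 1;

// With a single global counter, every callback on every thread performs an
// atomic read-modify-write on the same memory, so threads contend for it.
// Here each thread owns its own counters and only ever writes to those.
struct ThreadCounters {
    std::atomic<unsigned> inCallback[kSlotCount] = {};
};

thread_local ThreadCounters t_counters;

// RAII guard marking "this thread is inside a callback for profiler `slot`".
class CallbackGuard {
public:
    explicit CallbackGuard(std::size_t slot) : m_slot(slot) {
        assert(slot < kSlotCount);
        ++t_counters.inCallback[m_slot];  // prefix increment
    }
    ~CallbackGuard() { --t_counters.inCallback[m_slot]; }
    CallbackGuard(const CallbackGuard&) = delete;
    CallbackGuard& operator=(const CallbackGuard&) = delete;

private:
    std::size_t m_slot;
};

// Number of callbacks the calling thread currently has in flight for `slot`.
unsigned InFlightOnThisThread(std::size_t slot) {
    return t_counters.inCallback[slot].load();
}
```

The counters stay atomic in this sketch because another thread (for example, one performing a detach) would still need to read them.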

Testing

CI tests, diagnostics tests, local profiler stress tests

Confirmed by the customer that it solves the P95 regression

Risk

The risk of this change is that it could affect performance or introduce a subtle profiler bug.

For performance, we validated that the customer app performs as expected.

For profiler correctness, we ran the suite of profiler tests, as well as one-off stress testing that attached and detached multiple profilers in a loop for hours.

* Change profilers to use thread local evacuation counters
* Change to prefix increment
* get rid of lambdas
* Fix jit inlining, fix R2R too
* Remove VolatilePtr<> from helpers
* Get rid of additionalData argument

ghost · 2021-10-07T08:22:44Z

Tagging subscribers to this area: @tommcdon
See info in area-owners.md if you want to be subscribed.

Issue Details

Description

When running on 6.0 RC1, the Bing team detected a regression in P95 latency. Investigation traced it to the notification profiler feature, for three reasons:

* Using one global "callback in progress" counter led to heavy contention across threads.
* C++ lambdas on hot paths are not optimized as well as traditional functions.
* One of the changes in the notification profiler feature triggered a latent bug that caused profilers to unintentionally disable inlining an order of magnitude more often.

This fix addresses all three issues.

Testing

CI tests, diagnostics tests, local profiler stress tests

Confirmed by Bing that it solves the P95 regression

Risk

The risk of this change is that it could affect performance or introduce a subtle profiler bug.

For performance, we validated that Bing performs as expected.

For profiler correctness, we ran the suite of profiler tests, as well as one-off stress testing that attached and detached multiple profilers in a loop for hours.

Author:	davmason
Assignees:	davmason
Labels:	`area-Diagnostics-coreclr`
Milestone:	6.0.0

jeffschwMSFT

Approved. Please make sure to get several code reviews; we should take this for consideration in .NET 6.

noahfalk

Hey David this looks good! I spent a while scrutinizing and didn't find anything that is likely to be significant, but a couple things inline to check out.

noahfalk · 2021-10-08T23:01:58Z

src/coreclr/inc/profilepriv.inl


+FORCEINLINE BOOL ProfControlBlock::RequiresGenericsContextForEnterLeave()
+{
+    return AnyProfilerPassesCondition(&RequiresGenericsContextForEnterLeaveHelper); 


Not that you should change it, but do notification profilers support ELT? I didn't recall that they did so checking all of them seemed unnecessary.
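
For readers unfamiliar with the pattern in the diff above, here is a rough sketch of passing a named helper function, rather than a lambda, to a "does any profiler satisfy this?" routine. The types are simplified stand-ins and are assumptions, not the runtime's definitions: `ProfilerInfo`'s fields, the `notificationProfilers` array, the limit of 32, and the `ProfilerCondition` alias are all hypothetical.

```cpp
#include <cstddef>

// Hypothetical, simplified stand-ins for the runtime's types.
struct ProfilerInfo {
    bool inUse = false;
    bool requiresGenericsContext = false;
};

constexpr std::size_t kMaxNotificationProfilers = 32;  // assumed limit

struct ProfControlBlock {
    ProfilerInfo mainProfilerInfo;
    ProfilerInfo notificationProfilers[kMaxNotificationProfilers];
};

// A plain function pointer type in place of a lambda parameter.
using ProfilerCondition = bool (*)(const ProfilerInfo&);

// Returns true if the main profiler or any notification profiler that is in
// use satisfies `condition`.
inline bool AnyProfilerPassesCondition(const ProfControlBlock& block,
                                       ProfilerCondition condition) {
    if (block.mainProfilerInfo.inUse && condition(block.mainProfilerInfo)) {
        return true;
    }
    for (const ProfilerInfo& info : block.notificationProfilers) {
        if (info.inUse && condition(info)) {
            return true;
        }
    }
    return false;
}

// Named helper that replaces an inline lambda.
inline bool RequiresGenericsContextForEnterLeaveHelper(const ProfilerInfo& info) {
    return info.requiresGenericsContext;
}

inline bool RequiresGenericsContextForEnterLeave(const ProfControlBlock& block) {
    return AnyProfilerPassesCondition(block, &RequiresGenericsContextForEnterLeaveHelper);
}
```

The sketch also shows the cost raised in the comment: the check walks every profiler slot even when only the main profiler could ever answer yes.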

noahfalk · 2021-10-08T23:26:36Z

src/coreclr/inc/profilepriv.inl


-    return AnyProfilerPassesCondition([](ProfilerInfo *pProfilerInfo) { return pProfilerInfo->curProfStatus.Get() >= kProfStatusActive; });
+    return (&g_profControlBlock)->mainProfilerInfo.pProfInterface.Load() != NULL 


I didn't spot a bug here, but the change in semantics makes the CHECK_PROFILER_STATUS() macro very misleading. That macro relies on this call to avoid delivering notifications to profilers that are not in the active state. Presumably that filtering no longer matters because the filtering occurred earlier in DoProfilerCallback, but it is confusing to see the macro not doing correctly what it claims to do.

I reviewed the other sites that called CORProfilerPresent() to see if any of them would be negatively affected and didn't spot anything else that was clearly suspicious, but it would be good for you to scan for it too if you hadn't already. I'm fine not changing the CHECK_PROFILER_STATUS() macro for .NET 6, but it would be good to improve the situation in main.
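
The semantic change discussed here can be sketched as the difference between "a profiler is loaded" and "a profiler has reached the active state". This is only an illustration under stated assumptions: the `ProfilerStatus` values, the two fields on `ProfilerInfo`, and the function names `ProfilerPresent` and `ProfilerActive` are hypothetical simplifications, not the runtime's definitions.

```cpp
#include <atomic>

// Hypothetical status values; only their relative ordering matters here.
enum ProfilerStatus {
    kStatusNone = 0,
    kStatusInitializing = 1,
    kStatusActive = 2
};

// Simplified stand-in for per-profiler state.
struct ProfilerInfo {
    std::atomic<void*> pProfInterface{nullptr};
    std::atomic<int> status{kStatusNone};
};

// New-style check from the diff: is a profiler interface loaded at all?
inline bool ProfilerPresent(const ProfilerInfo& info) {
    return info.pProfInterface.load() != nullptr;
}

// Old-style check from the diff: has the profiler reached the active state?
inline bool ProfilerActive(const ProfilerInfo& info) {
    return info.status.load() >= kStatusActive;
}
```

A profiler that is loaded but still initializing passes the first check and fails the second, which is why a macro built on the first check no longer filters out inactive profilers by itself.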

noahfalk · 2021-10-08T23:33:47Z

src/coreclr/inc/profilepriv.inl

 }

-inline BOOL CORProfilerTrackConditionalWeakTableElements()
+FORCEINLINE BOOL CORProfilerTrackConditionalWeakTableElements()


The test below for IsCallback5Supported iterates all profilers so this test is probably much more expensive than it used to be. If it is used in a very hot path there may be a perf issue lurking.

noahfalk · 2021-10-10T00:38:33Z

src/coreclr/vm/threads.h

+    //---------------------------------------------------------------
+    // Why volatile?
+    // See code:ProfilingAPIUtility::InitializeProfiling#LoadUnloadCallbackSynchronization.
+    Volatile<DWORD> m_dwProfilerEvacuationCounters[MAX_NOTIFICATION_PROFILERS + 1];


I think you made the right choice to translate the scheme directly from global to per-thread for now. If future performance analysis shows that we need improvements in notification callback perf or per-thread memory usage, there are some techniques we could apply to optimize this.
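
As a rough sketch of what the per-thread layout implies for a reader of those counters: deciding that a profiler has been evacuated presumably means inspecting that profiler's slot on every thread. The code below is an assumption-laden illustration, not the runtime's implementation: `ThreadState`, `IsProfilerEvacuated`, the slot count of 33, and the use of a `std::vector` of thread pointers are all hypothetical.

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Hypothetical: stands in for MAX_NOTIFICATION_PROFILERS + 1.
constexpr std::size_t kSlotCount = 33;

// Per-thread state, mirroring the shape of m_dwProfilerEvacuationCounters.
struct ThreadState {
    std::atomic<unsigned> evacuationCounters[kSlotCount] = {};
};

// True when no thread in `threads` is inside a callback for profiler `slot`.
inline bool IsProfilerEvacuated(const std::vector<ThreadState*>& threads,
                                std::size_t slot) {
    for (const ThreadState* thread : threads) {
        if (thread->evacuationCounters[slot].load() != 0) {
            return false;
        }
    }
    return true;
}
```

The trade-off is visible in the types: each thread carries a full array of counters (the per-thread memory cost mentioned above), and the evacuation check becomes a walk over all threads instead of a single read.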

Change profilers to use thread local evacuation counters (dotnet#59741)

123da60


davmason added the area-Diagnostics-coreclr label Oct 7, 2021

davmason added this to the 6.0.0 milestone Oct 7, 2021

davmason requested review from noahfalk, jkotas, jeffschwMSFT, mangod9 and a team October 7, 2021 08:22

davmason self-assigned this Oct 7, 2021

jeffschwMSFT added the Servicing-consider (Issue for next servicing release review) label Oct 7, 2021

jeffschwMSFT approved these changes Oct 7, 2021


jkotas approved these changes Oct 7, 2021


mangod9 approved these changes Oct 7, 2021


tommcdon approved these changes Oct 7, 2021


leecow added the Servicing-approved (Approved for servicing release) label and removed the Servicing-consider (Issue for next servicing release review) label Oct 7, 2021

Anipik merged commit 386e5c1 into dotnet:release/6.0 Oct 8, 2021

noahfalk reviewed Oct 10, 2021


ghost locked as resolved and limited conversation to collaborators Nov 9, 2021
