Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[6.0] Port "Change profilers to use thread local evacuation counters (#59741)" to 6.0 #60116

Merged
merged 1 commit into from
Oct 8, 2021

Conversation

davmason
Copy link
Member

@davmason davmason commented Oct 7, 2021

Description

When running on 6.0 RC1 a customer detected a regression in P95 latency. After investigation it was tracked down to being caused by the notification profiler feature for three reasons

  • Using one global "callback in progress" counter profiler led to lots of contention across threads
  • C++ lambdas on hot paths are not optimized as well as traditional functions
  • One of the changes in the notification profiler feature caused a latent bug that could cause profilers to unintentionally disable inlining to start happening an order of magnitude more often
    This fix addresses all three of these issues

Testing

CI tests, diagnostics tests, local profiler stress tests

Confirmed by the customer that it solves the P95 regression

Risk

The risk of this change is that it could affect performance or introduce a subtle profiler bug.

For performance we validated that customer app performance is as we expect

For profiler tests we ran the suite of profiler tests, as well as one off stress testing of attaching and detaching multiple profilers in a loop for hours

* Change profilers to use thread local evacuation counters

Change to prefix increment

* get rid of lambdas

* Fix jit inlining, fix R2R too

* Remove VolatilePtr<> from helpers

* Get rid of additionalData argument
@davmason davmason added this to the 6.0.0 milestone Oct 7, 2021
@davmason davmason self-assigned this Oct 7, 2021
@ghost
Copy link

ghost commented Oct 7, 2021

Tagging subscribers to this area: @tommcdon
See info in area-owners.md if you want to be subscribed.

Issue Details

Description

When running on 6.0 RC1 the Bing team detected a regression in P95 latency. After investigation it was tracked down to being caused by the notification profiler feature for three reasons

  • Using one global "callback in progress" counter profiler led to lots of contention across threads
  • C++ lambdas on hot paths are not optimized as well as traditional functions
  • One of the changes in the notification profiler feature caused a latent bug that could cause profilers to unintentionally disable inlining to start happening an order of magnitude more often
    This fix addresses all three of these issues

Testing

CI tests, diagnostics tests, local profiler stress tests

Confirmed by Bing that it solves the P95 regression

Risk

The risk of this change is that it could affect performance or introduce a subtle profiler bug.

For performance we validated that Bing performance is as we expect

For profiler tests we ran the suite of profiler tests, as well as one off stress testing of attaching and detaching multiple profilers in a loop for hours

Author: davmason
Assignees: davmason
Labels:

area-Diagnostics-coreclr

Milestone: 6.0.0

@jeffschwMSFT jeffschwMSFT added the Servicing-consider Issue for next servicing release review label Oct 7, 2021
Copy link
Member

@jeffschwMSFT jeffschwMSFT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved. Please make sure to get several code reviews, we should take for consideration in .NET 6.

@leecow leecow added Servicing-approved Approved for servicing release and removed Servicing-consider Issue for next servicing release review labels Oct 7, 2021
@Anipik Anipik merged commit 386e5c1 into dotnet:release/6.0 Oct 8, 2021
Copy link
Member

@noahfalk noahfalk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey David this looks good! I spent a while scrutinizing and didn't find anything that is likely to be significant, but a couple things inline to check out.


FORCEINLINE BOOL ProfControlBlock::RequiresGenericsContextForEnterLeave()
{
return AnyProfilerPassesCondition(&RequiresGenericsContextForEnterLeaveHelper);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that you should change it, but do notification profilers support ELT? I didn't recall that they did so checking all of them seemed unnecessary.


return AnyProfilerPassesCondition([](ProfilerInfo *pProfilerInfo) { return pProfilerInfo->curProfStatus.Get() >= kProfStatusActive; });
return (&g_profControlBlock)->mainProfilerInfo.pProfInterface.Load() != NULL
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't spot a bug here, but the change in semantics makes the CHECK_PROFILER_STATUS() macro very misleading. That macro relies on this call to avoid delivering notifications to profilers that are not in the active state. Presumably that filtering no longer matters because the filtering occured earlier in DoProfilerCallback, but it is confusing to see the macro not doing correctly what it claims to do.

I reviewed the other sites that called CORProfilerPresent() to see if any of them would be negatively affected and didn't spot anything else that was clearly suspicious, but it would be good for you to scan for it too if you hadn't already. I'm fine not changing the CHECK_PROFILER_STATUS() macro for .NET 6, but it would be good to improve the situation in main.

}

inline BOOL CORProfilerTrackConditionalWeakTableElements()
FORCEINLINE BOOL CORProfilerTrackConditionalWeakTableElements()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test below for IsCallback5Supported iterates all profilers so this test is probably much more expensive than it used to be. If it is used in a very hot path there may be a perf issue lurking.

//---------------------------------------------------------------
// Why volatile?
// See code:ProfilingAPIUtility::InitializeProfiling#LoadUnloadCallbackSynchronization.
Volatile<DWORD> m_dwProfilerEvacuationCounters[MAX_NOTIFICATION_PROFILERS + 1];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you made the right choice to translate the scheme directly from global to per-thread for now. If future performance analysis shows that we need improvements in notification callback perf or per-thread memory usage there are some techniques we could apply to optimize this.

@ghost ghost locked as resolved and limited conversation to collaborators Nov 9, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants