This repository has been archived by the owner on Jun 10, 2020. It is now read-only.

CPU usage slowly increases over time #690

Closed
SteveDesmond-ca opened this issue May 19, 2018 · 82 comments

@SteveDesmond-ca

I had previously used Application Insights and no longer do, but the App Insights code remained in my app. When I recently migrated hosts and upgraded to .NET Core 2.1.0-rc1, I noticed that CPU usage gradually increased over the course of a few days.

[screenshot: CPU usage over time]

The drops back to 0 are Docker container restarts.

Removing Application Insights from the site results in a steady, almost-zero CPU usage.

See also this Twitter thread, though there's not much more there.

Repro Steps

  1. Add Application Insights (2.3.0-beta2) to an ASP.NET Core MVC app via NuGet and Startup.cs
    app.AddApplicationInsightsTelemetry("{no-longer-valid instrumentation key}")
  2. Setup some sort of synthetic monitoring to hit the site every minute
  3. Wait a couple of days ¯\_(ツ)_/¯

Actual Behavior

  • CPU slowly creeps up over the course of a few days

Expected Behavior

  • CPU remains stable

Version Info

SDK Version : 2.1.0-rc1
.NET Version : 2.1.0-rc1
How Application was onboarded with SDK (VisualStudio/StatusMonitor/Azure Extension):
OS : Ubuntu 16.04
Hosting Info (IIS/Azure WebApps/ etc) : Azure VM

@cijothomas
Contributor

Thanks for reporting. We'll try to set up a repro and investigate more.
Which version of the Application Insights SDK were you using?

@SteveDesmond-ca
Author

SteveDesmond-ca commented May 23, 2018 via email

@cijothomas
Contributor

[screenshot]
Left an app on .NET Core 2.0 running for the last 3 days. Don't see any increase in memory/CPU.

@cijothomas
Contributor

[screenshot]
Similar for an app on .NET Core 2.1 over the last 3 days.

@cijothomas
Contributor

I'll set up something for Linux now.

@thohng

thohng commented May 30, 2018

[screenshot]

We got the same issue on Ubuntu 16.04:

  • After upgrading to AspNetCore 2.1.0-rc1
  • We never had a CPU issue with AspNetCore 2.0.x before
  • Not sure whether this is caused by the new AspNetCore 2.1.0-rc1 or the new ApplicationInsights 2.3.0-beta2
  • Internally, we see a few HttpClient failures a day, which might be related to https://github.com/dotnet/corefx/issues/29285

Before Upgrade:

<PackageReference Include="Microsoft.ApplicationInsights.AspNetCore" Version="2.2.1" />
<PackageReference Include="Microsoft.AspNetCore.ApplicationInsights.HostingStartup" Version="2.0.2" />
<PackageReference Include="Microsoft.AspNetCore.Mvc" Version="2.0.3" />

After:

<PackageReference Include="Microsoft.ApplicationInsights.AspNetCore" Version="2.3.0-beta2" />
<PackageReference Include="Microsoft.AspNetCore.ApplicationInsights.HostingStartup" Version="2.1.0-rc1-final" />
<PackageReference Include="Microsoft.AspNetCore.Mvc" Version="2.1.0-rc1-final" />

@cijothomas
Contributor

Thanks for reporting this. We will set up a Linux test to do a deeper investigation soon.

@cijothomas
Contributor

I have an ASP.NET Core 2.1 app with AI SDK 2.3, running on Ubuntu 16.04. A script is pinging the app once every 10 seconds.
Will let this run for a few hours/a day and check again.

@cijothomas
Contributor

6 hours, no visible change in CPU load. CPU is constant at around 0.39%.
Load is 6 requests/sec.
Will leave this running overnight.

@thohng

thohng commented Jun 1, 2018

When we stop pinging, the CPU usage no longer keeps increasing.

[screenshot]

@mehmetilker

The same thing is happening on Windows as well (Azure App Service).
Not only CPU but also the garbage collector is affected.
I upgraded my .NET Core 2.0 site to .NET Core 2.1 on the afternoon of Jun 3, and yesterday I removed the Application Insights extension from my App Service (the NuGet package reference is still in my project).
You can see the changes in the attached images.

[screenshot: cpu]
[screenshot: gen]

@cijothomas
Contributor

Repro-ed on a Linux VM. Need to investigate more.

@cijothomas cijothomas added this to the 2.4.0 milestone Jun 8, 2018
@flagbug

flagbug commented Jun 12, 2018

@SteveDesmond-ca @mehmetilker @thohng I've seen a similar thing since upgrading to .NET Core 2.1: not only does our CPU usage increase over time, but our open handle count has also increased dramatically since the upgrade:

[screenshot: handle count]

Could you take a look at your handle count and confirm if you have the same issue or not?

@cwe1ss

cwe1ss commented Jun 12, 2018

I've recently been having issues with increased memory usage. I checked it with dotMemory and saw many System.Net.Http-related objects, e.g. open handlers that haven't been closed.

I'm not yet sure if it's Application Insights, though (I will do some more tests next week), but if it is, maybe this clue helps in finding the issue.

@ByronAP

ByronAP commented Jun 18, 2018

I also have increasing CPU usage, which seems to come from HTTP. It was not noticeable until I went to production, which looked great after deployment, but a few hours later everything was redlined due to the ever-increasing CPU demand, as shown below. Unfortunately, this is a showstopper and is forcing me to roll back.

Runtime: core 2.1 Linux x64
Containers: YES Docker
OS: UBUNTU & CENTOS

[screenshot: CPU usage]

@flagbug

flagbug commented Jun 21, 2018

We've downgraded Application Insights back to v2.2.1 and no longer see the increasing CPU and memory usage, so I can confirm that this is an issue with Application Insights.

@cijothomas Any chance this can be looked into? This basically makes version 2.3.0 unusable for production scenarios.

@cijothomas
Contributor

@flagbug Thanks for helping narrow down to 2.3.0. We will prioritize this investigation.

@cijothomas
Contributor

Related, maybe the cause as well: microsoft/ApplicationInsights-dotnet#594

@nycdotnet

Might want to see if HttpClientFactory solves the issue. Static HttpClient is not without its issues. https://blogs.msdn.microsoft.com/webdev/2018/02/28/asp-net-core-2-1-preview1-introducing-httpclient-factory/
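A minimal sketch of that suggestion, assuming an ASP.NET Core 2.1 app; the TelemetrySender class and the endpoint call below are illustrative only, not how the Application Insights SDK actually sends its data:

using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;

public class Startup
{
    public void ConfigureServices(IServiceCollection services)
    {
        // Registers IHttpClientFactory (Microsoft.Extensions.Http) so handler
        // lifetimes are pooled and recycled instead of accumulating.
        services.AddHttpClient();
    }
}

public class TelemetrySender
{
    private readonly IHttpClientFactory _factory;

    public TelemetrySender(IHttpClientFactory factory) => _factory = factory;

    public async Task SendAsync(HttpContent payload)
    {
        // A short-lived HttpClient over a pooled handler; cheap to create per call.
        HttpClient client = _factory.CreateClient();
        await client.PostAsync("https://dc.services.visualstudio.com/v2/track", payload);
    }
}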

@cwe1ss

cwe1ss commented Jun 29, 2018

I have now also run our application without App Insights for a few days and I don't see any stale Http-related objects in memory, so I'm also "blaming" Application Insights for now.

Could you please prioritize this? This has been open for more than a month, and a profiler that leaks memory is a "really bad thing" ™️...

/cc @SergeyKanzhelev @lmolkova (fyi, in case you didn't know about this yet)

@cijothomas
Contributor

@cwe1ss Were you facing issues on Linux, Windows, or both?

@cwe1ss

cwe1ss commented Jul 2, 2018

My setup: Windows Server 2016 on Azure (Standard_D11_v2, 2 cores 16GB RAM); Single-node Service Fabric cluster (development system); I'm running ~ 60 .NET Core apps (~ 30 ASP.NET Core apps, 30 console apps); services are communicating via the Service Fabric HTTP reverse proxy. As it's a dev system, there's very little load on the system.

Here's some more details:

I see a similar pattern in overall CPU usage like the other reporters:
[screenshot: cpu-increase]

I took a process dump of one ASP.NET core Web API project. The process has been running for "2 days 20:49:36". dotMemory gives me the following overview. As you can see, the string value "dc.services.visualstudio.com" exists 23218 times.
[screenshot: dotMemory overview]

Looking at the type list shows that "System.Net.Http"-related objects make up the top of the list.
[screenshot: dotMemory type list]

I do not see a similar pattern in an app where I've disabled Application Insights.

@benaadams

@cwe1ss what are the back traces / call trees like on some of these objects?

The objects with a count of 11,613 are interesting, especially since it looks like there are 11,613 HttpConnectionPoolManagers and 11,613 List<HttpConnectionPool+CachedConnection> instances. That's not the number of cached connections, but the number of caches for connections.

HttpClientHandler would probably have the most informative backtraces (as I assume all the others are allocated from it).

@cwe1ss

cwe1ss commented Jul 2, 2018

@benaadams can I get this from the process dump via dotMemory / WinDbg, or do I need to trace this with PerfView / dotTrace?

@cwe1ss

cwe1ss commented Jul 2, 2018

oh... I've seen the tabs in dotMemory... but they are empty and just say "No allocation info was collected during profiling". So I guess I need PerfView / dotTrace, right?!

@cwe1ss

cwe1ss commented Jul 2, 2018

I clicked around a bit more in dotMemory (I don't know much about it, sorry) and saw that you can get more details about the object instances. Here are some screenshots for a random instance of System.Collections.Concurrent.ConcurrentDictionary+Tables<System.Net.Http.HttpConnectionPoolManager+HttpConnectionKey, System.Net.Http.HttpConnectionPool>.

[screenshot: dotMemory instance]

[screenshot: dotMemory instance shortest path]

[screenshot: dotMemory instance path]

Hope this helps?!

@benaadams

benaadams commented Jul 2, 2018

I'm guessing the CPU increase is from the constantly increasing number of Timers, which your memory dump seems to indicate: scanning the list of active timers to work out which ones to trigger gets more expensive as the list grows.

Timers also act as GC roots, which is why everything is probably hanging around, if their callbacks are capturing state.

Not sure why the timers are constantly increasing, though...
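A minimal illustration of why that matters (a hypothetical type, not SDK code): every live System.Threading.Timer is rooted by the runtime's timer queue, so anything its callback captures stays reachable until the timer is disposed, and every active timer has to be considered when due times are scanned.

using System;
using System.Threading;

// Hypothetical example: one of these created per request would both grow the
// timer queue and keep the captured request state alive indefinitely.
class PerRequestHeartbeat : IDisposable
{
    private readonly Timer _timer;
    private readonly object _capturedRequestState;

    public PerRequestHeartbeat(object requestState)
    {
        _capturedRequestState = requestState;

        // Instance-method callback: the delegate references 'this', so the
        // Timer roots this object and everything it holds.
        _timer = new Timer(Tick, null, dueTime: 0, period: 30_000);
    }

    private void Tick(object state)
    {
        // Touch the captured state so it's clear it stays reachable.
        Console.WriteLine(_capturedRequestState);
    }

    // If Dispose is never called, the Timer, this object, and the captured
    // request state all remain reachable for the life of the process.
    public void Dispose() => _timer.Dispose();
}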

@cwe1ss

cwe1ss commented Jul 2, 2018

Could this be an issue with how HeartbeatProvider uses Timer? See e.g. https://www.codeproject.com/Answers/185740/Threading-Timer-prevents-GC-collection#answer1, where the author of "Solution 1" says:

The method must be static...
Conversely, if you use a non-static method and you reference it using this.NameMethod, the System.Threading.TimerCallback will keep a reference to each MyTimerData object. So, despite your mtd = null instruction, it won't be freed until the application closes.

Your example is a circular reference:
mtd => m_timer => new System.Threading.TimerCallback(...) => mtd.TimerCall(o){}

@benaadams thank you very much for looking into this!!!

@cijothomas
Contributor

@Svisstack

I upgraded to 2.4.1 and it still has the leak, which increases over time.

The first screenshot shows 2 threads constantly doing nothing but the CPU leak. I found that the number of such threads depends on the uptime of the process, so the leak grows over time and was not fixed in 2.4.1.

The second screenshot shows the rate of increase, where 600% is the hardware limit; after some time the AspNetCore app is doing nothing but the leak.

@cijothomas Please advise here.

[screenshot 1]

[screenshot 2]

@benaadams

@Svisstack what version of .NET are you running on? High CPU usage in the HttpHeaders Dictionary`2.TryGetValue would suggest concurrent reads + writes to the header dictionary.

.NET Core 2.1 will throw an exception where an infinite loop would previously have occurred due to this: https://github.com/dotnet/corefx/issues/28123

It could happen if FrameworkSqlEventListner (or its chain in your callstack) is fired after/concurrently with the request end (when the Dictionary is cleared).

@Svisstack

Svisstack commented Sep 9, 2018

@benaadams I'm running this on .NET Framework 4.7.2 and compiling against 4.7.1.

If this is true, it looks like a major bug in App Insights; why would someone put a SpinLock-like reader around a dictionary item that might not exist?

I had an idea for a workaround that still keeps App Insights: disable dependency tracking to avoid this issue, since it's related to the database dependency. However, it looks like there is no option to disable dependency tracking.

@benaadams

benaadams commented Sep 9, 2018

why would someone put a SpinLock-like reader around a dictionary item that might not exist?

The dictionary for the header collection is reused for each request on the same connection, so at the end of the request it is cleared.

If that clearing happens while another thread is reading the header collection (e.g. it reads it outside of the normal request flow), then the read can go into an infinite loop as the arrays inside the dictionary are changed (Dictionaries are not thread-safe for a concurrent write with any other operation on them).

In .NET Core 2.1 the infinite loop being formed will be detected and an exception thrown, though that doesn't fix the concurrent use.
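A standalone sketch of that failure mode (not SDK or Kestrel code): one thread plays the request pipeline, repeatedly filling and clearing a reused header dictionary, while another plays a late telemetry callback reading it. Dictionary<TKey, TValue> allows no reader concurrent with a writer, so before the .NET Core 2.1 corruption check this could spin a core at 100% inside TryGetValue.

using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class HeaderRaceDemo
{
    static void Main()
    {
        var headers = new Dictionary<string, string>();

        // "Request" thread: the reused header dictionary is filled for each
        // request and cleared at the end of it.
        Task.Run(() =>
        {
            while (true)
            {
                headers["Request-Id"] = "abc";
                headers.Clear();
            }
        });

        // "Late telemetry" thread: reads the headers outside the request flow.
        // A read racing with Clear() can observe corrupted buckets and, prior
        // to .NET Core 2.1, loop forever instead of throwing.
        Task.Run(() =>
        {
            while (true)
            {
                headers.TryGetValue("Request-Id", out _);
            }
        });

        Console.ReadLine();
    }
}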

@Svisstack

Svisstack commented Sep 9, 2018

@benaadams Thanks for the detailed explanation. I now understand what's happening, but I don't see a clear solution/fix for this, as it looks like the bug is actually in AspNetCore, which allows concurrent access to a dictionary instance that is, of course, not thread-safe.

@benaadams

looks like the bug is actually in AspNetCore, which allows concurrent access to the dictionary instance

Well... even if concurrent access were prevented (at a performance cost), it still wouldn't help, as the data read would still be wrong.

If it's conflicting with .Clear(), this means there is a race condition and three outcomes could happen (if concurrent access were prevented):

  1. It could read the right data (pre clear)
  2. It could read no data (post clear): wrong
  3. It could read headers from the next request: very wrong

At a guess, looking at the stack from dotTrace, it could be a Sql query that's started as part of the Request but not awaited as part of the request, so the response is sent before the Sql is completed. It could also be a Timer that periodically polls and is lazily started as part of a Request.

When it completes, it would then be trying to log the shallow-copied captured Request data (i.e. a reference to it) as part of the Application Insights telemetry event, using mutable Request information that has since changed and is no longer valid for the event.

I'm not entirely sure how FrameworkSqlProcessing.OnEndExecuteCallback accesses the Headers collection, though it looks to be part of the DependencyTelemetry object.

public void OnEndExecuteCallback(long id, bool success, bool synchronous, int sqlExceptionNumber)
{
    DependencyCollectorEventSource.Log.EndCallbackCalled(id.ToString(CultureInfo.InvariantCulture));

    var telemetryTuple = this.TelemetryTable.Get(id);

    if (telemetryTuple == null)
    {
        DependencyCollectorEventSource.Log.EndCallbackWithNoBegin(id.ToString(CultureInfo.InvariantCulture));
        return;
    }

    if (!telemetryTuple.Item2)
    {
        this.TelemetryTable.Remove(id);
        var telemetry = telemetryTuple.Item1 as DependencyTelemetry;
        telemetry.Success = success;
        telemetry.ResultCode = sqlExceptionNumber != 0 ? sqlExceptionNumber.ToString(CultureInfo.InvariantCulture) : string.Empty;
        DependencyCollectorEventSource.Log.AutoTrackingDependencyItem(telemetry.Name);
        ClientServerDependencyTracker.EndTracking(this.telemetryClient, telemetry);
    }
}

@benaadams

benaadams commented Sep 9, 2018

If it's a Timer being kicked off that then runs Sql queries, you will likely want to suspend ExecutionContext flow when creating that Timer, so that the HttpContext doesn't get captured and become forever accessible via IHttpContextAccessor when it shouldn't be (e.g. it's only valid during that request).

Using a pattern similar to dotnet/corefx#26075

@benaadams

Or aspnet's NonCapturingTimer helper dotnet/extensions#395
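A sketch of that pattern (modelled on the helpers linked above, not copied from them): suppress ExecutionContext flow around the Timer's creation so AsyncLocal state such as the ambient HttpContext is not captured into the callback.

using System;
using System.Threading;

static class NonCapturingTimerSketch
{
    public static Timer Create(TimerCallback callback, object state, TimeSpan dueTime, TimeSpan period)
    {
        bool restoreFlow = false;
        try
        {
            // Stop the current ExecutionContext (and its AsyncLocals, e.g. the
            // ambient HttpContext) from flowing into the timer callback.
            if (!ExecutionContext.IsFlowSuppressed())
            {
                ExecutionContext.SuppressFlow();
                restoreFlow = true;
            }

            return new Timer(callback, state, dueTime, period);
        }
        finally
        {
            // Restore flow so the calling code behaves normally afterwards.
            if (restoreFlow)
            {
                ExecutionContext.RestoreFlow();
            }
        }
    }
}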

@Svisstack

Svisstack commented Sep 10, 2018

@benaadams Thanks, I found the Tasks that can, and probably do, end after the request is finished, and they contain the SQL code. I will await them and report on the state of the issue after deploying the new version today.

Based on your very useful input, I think it's still an issue in AspNetCore, as it should not provide functions that can deliver essentially random data. If the Headers dictionary is shared between requests, then the function for accessing those headers should require some kind of request ID (a static long incremented for every new request), based on which the internal function could decide that those headers no longer exist and throw an exception instead of returning random data.
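For reference, a hedged sketch of the fix described above (awaiting the SQL work inside the request); the controller and IAuditService names are hypothetical stand-ins, not the actual code:

using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;

public interface IAuditService
{
    Task AuditAsync(); // runs the SQL that was previously finishing late
}

public class OrdersController : Controller
{
    private readonly IAuditService _audit;

    public OrdersController(IAuditService audit) => _audit = audit;

    // Problematic shape: the SQL inside AuditAsync can complete after the
    // response is sent, so its dependency telemetry reads a recycled request.
    public IActionResult CreateFireAndForget()
    {
        _ = _audit.AuditAsync(); // not awaited
        return Ok();
    }

    // Fix applied here: await the work so it ends within the request.
    public async Task<IActionResult> Create()
    {
        await _audit.AuditAsync();
        return Ok();
    }
}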

@benaadams

benaadams commented Sep 10, 2018

I think it's still an issue in AspNetCore, as it should not provide functions that can deliver essentially random data, if the Headers dictionary is shared between requests

It's more that the Headers belong to the connection, as there can only be one active request on a connection at any time. If they are accessed via IHttpContextAccessor, there is a change in ASP.NET Core 2.2 to invalidate the HttpContext and return null in the manner you described: aspnet/HttpAbstractions#1021

However, while that may help with the CPU issue, it would also mean the telemetry from OnStart would be disconnected from OnEnd, if it's using anything in the headers or HttpContext to connect them (as OnEndXxx wouldn't have this information).

What should probably happen is that Application Insights, during OnStartXxx (which happens during the request), should deep copy the information it needs from the context/headers (e.g. keys and values), as the strings will remain valid, being immutable. Then OnEndXxx should use that same copied object to get the headers that were captured at the start of the event.

I'm not entirely sure how Application Insights is capturing/reading the headers, so I can't provide code suggestions for the changes, alas...
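Not the SDK's actual mechanism (as noted above, that isn't known here); just an illustration of the "copy at start, use at end" idea, with a simplified string-to-string header shape:

using System.Collections.Generic;

public sealed class HeaderSnapshot
{
    private readonly Dictionary<string, string> _headers;

    private HeaderSnapshot(Dictionary<string, string> headers) => _headers = headers;

    // Called during OnStartXxx, while the request is still in flight: copy the
    // keys and values out of the live (mutable, reused) header collection.
    public static HeaderSnapshot Capture(IEnumerable<KeyValuePair<string, string>> liveHeaders)
    {
        var copy = new Dictionary<string, string>();
        foreach (var header in liveHeaders)
        {
            copy[header.Key] = header.Value; // strings are immutable, so they stay valid
        }
        return new HeaderSnapshot(copy);
    }

    // Called during OnEndXxx, possibly after the request has completed; reads
    // only the private copy, never the live collection.
    public bool TryGet(string name, out string value) => _headers.TryGetValue(name, out value);
}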

@Svisstack

@benaadams Removing the Tasks with SQL code that could end after the request solved the issue. Many thanks, I really appreciate your input here.

So is no action needed here, since there is a new interface that doesn't have this issue? I want to make sure we fix this for the community, or at least that the known Insights bugs mention that no dependency should end after the request has already finished.

@benaadams

Well, I do think AI should collect the information it needs at event start (i.e. key/value pairs for the headers), then use that collected information at event end rather than re-reading it from the HttpHeader collection; that should solve this issue more holistically.

@Svisstack

@benaadams Can you then reopen this issue?

@benaadams

benaadams commented Sep 11, 2018

Can you then reopen this issue?

Alas, no, I don't work for MS or have rights to this repo. Sorry

@Svisstack

Svisstack commented Sep 11, 2018

@cijothomas Can you reopen this? We still have a bug in Insights related to this problem.

@cijothomas
Contributor

@Svisstack Can you open this issue in https://github.com/Microsoft/ApplicationInsights-dotnet-server/issues?
From an initial look, the issue you are facing is from the DependencyCollection module, which is not in this repo. (This repo only has the ASP.NET Core extension methods and request collection.)

@Svisstack

@cijothomas It's done.

@cijothomas
Contributor

@Svisstack thanks. Will discuss on the other thread.

@turgayozgur

@benaadams Thanks, I found the Tasks that can, and probably do, end after the request is finished, and they contain the SQL code. I will await them and report on the state of the issue after deploying the new version today.

Hi @Svisstack,

How did you solve it that way? Could you share any code details?

@Svisstack

Svisstack commented Dec 14, 2018 via email

@turgay

turgay commented Dec 14, 2018

@Svisstack I guess you meant @turgayozgur, not me :)
