Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Plugin: "A task was cancelled" - problem with ADO authentication due to this. #8528

Closed
dtivel opened this issue Aug 29, 2019 · 12 comments · Fixed by NuGet/NuGet.Client#3141
Closed
Assignees
Labels
Area:Plugin V2 plugin w/ cross platform support Tenet:Reliability Crashes, hangs etc. Type:Bug
Milestone

Comments

@dtivel
Copy link
Contributor

dtivel commented Aug 29, 2019

There's a report of "A task was canceled." error on NuGet 5.2.x. The root cause is unknown.

(We are investigating potential fixes in NuGet client code and the interaction with https://github.com/Microsoft/artifacts-credprovider. This is impacting several customers badly.)

@dtivel dtivel added Type:Bug Area:Plugin V2 plugin w/ cross platform support Sprint 158 labels Aug 29, 2019
@dtivel dtivel added this to the 5.3 milestone Aug 29, 2019
@dtivel dtivel self-assigned this Aug 29, 2019
@zarenner
Copy link

Thanks for your work here - @jmyersmsft and I are happy to test any possible fixes on @AArnott's consistent repro of this issue whenever you think you have something.

@jmyersmsft
Copy link

I've got CancellationToken tracking code in a branch in my fork to help further investigation if necessary too

@dtivel dtivel assigned nkolev92 and unassigned dtivel Sep 4, 2019
@nkolev92 nkolev92 modified the milestones: 5.3, 5.4 Sep 18, 2019
@nkolev92
Copy link
Member

Moving to 5.4, once we have a fix, we can consider servicing if necessary.

@AArnott
Copy link
Contributor

AArnott commented Sep 18, 2019

Since raising the timeouts seems to have solved the problem for me, can we not simply deduce that the problem is that there are timeouts that cancel tokens and cause non-deterministic failures? Perhaps the default timeouts are far too aggressive.

@nkolev92
Copy link
Member

We believe there are multiple cancellation issues at hand.

  • Some are related to unwanted cancellations that are happening due to the way NuGet manages the lifetime of the plugins.

  • Thread starvation due to NuGet spamming the threadpool.

In your case logs and repros suggest the 2nd scenario.
Ideally we'd like to get to the bottom of why the threadpool becomes exhausted in some situations.
Increasing the timeout feels like a band aid that delays rather than fixes the root cause.
There's also been some reports that simply increasing the timeouts without clearing the http cache does not get rid of the problem for some customers.

@jmyersmsft
Copy link

Whether or not we fix the root causes, I think the default timeouts should be increased to at least 30s. There's almost no downside (they only matter if the plugin is badly misbehaving), and 5s isn't really a very long time to wait for a .net process to start up when the system is under heavy load (e.g. when restoring a bunch of packages from a bunch of feeds)

@nkolev92
Copy link
Member

Whether or not we fix the root causes, I think the default timeouts should be increased to at least 30s.

I agree with this. We have a crazy high timeout for http request, the plugins should have a similar timeout.

I just didn't want us to do a reactionary fix in 3.0.

@nkolev92
Copy link
Member

nkolev92 commented Nov 20, 2019

Updating the status here:

We have 5 fixes on the NuGet side so far, 5.3.x – 3.0.100, 5.4.x – 3.1.100, 5.5.x -3.1.200

We have 2 fixes on the plugin side in 0.19 and 0.20 versions

NuGet internally cancelling prematurely. (5.3.1) #8648
Logging failures triggering plugin shut down. (5.3.1) #8688
Propagated exception at dispose time fails the whole auth with a task cancellation error. (5.4) #8732
Plugin logging accuracy improvement (5.4) #8747
Threadpool starvation issues (5.5) #8528
Plugin side:

Unhandled exception crashes the restore operation (0.19.0) - microsoft/artifacts-credprovider#108
Change exit timeout to start ticking on a shutdown request, not at startup(0.20.0) microsoft/artifacts-credprovider#143

@nkolev92
Copy link
Member

nkolev92 commented Nov 28, 2019

Final fix in:
NuGet/NuGet.Client#3141

@AArnott
Copy link
Contributor

AArnott commented Nov 28, 2019

What version of .NET Core SDK or nuget.exe do we need to get this?

@rrelyea
Copy link
Contributor

rrelyea commented Nov 28, 2019

Checked into dev in plenty of time for 5.5. Which will ship with 16.5.
3.1.200 will be the matching sdk version, I think.

NuGet.exe might get patched in a 5.4.1 version if there is big need.
Dotnet.exe may have a servicing possibility too...but we don’t control timing.

Would love to understand where you (and others) most need fixes and how urgent a patch is.

5.4.0 already has 4 fixes helping this and related issues. Plus the cred provide added two fixes.
This fifth fix should polish off the problem.

Thanks @nkolev92 for continuing to attack until we killed this problem!

@AArnott
Copy link
Contributor

AArnott commented Nov 28, 2019

Thanks for the version numbers, @rrelyea.

For my own purposes there isn't an urgent need because I've already applied the workaround pretty much everywhere. But I'll be happy to remove it when I can.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area:Plugin V2 plugin w/ cross platform support Tenet:Reliability Crashes, hangs etc. Type:Bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants