
[DCR]: NuGet should handle restore with many feeds differently #12173

Open

mmitche opened this issue Oct 19, 2022 · 8 comments

mmitche commented Oct 19, 2022

NuGet Product(s) Affected

NuGet.exe, MSBuild.exe, dotnet.exe

Current Behavior

.NET uses NuGet heavily to construct the product. We're probably the biggest stress test of the NuGet client that there is. One way we stress the client is that our NuGet.config files tend to have quite a few feeds in them. This is because .NET cannot use public upstreams and unify the set of input feeds into one, and because there are generally quite a few cross-product references, especially in testing. So we can easily have NuGet.config files, say for the SDK, that have more than a dozen feeds: https://github.com/dotnet/sdk/blob/2ab4f6bb8cc01610a97357e1ee36dc08f182f28c/NuGet.config#L19-L33

Whether or not this is "good behavior" on .NET's part, NuGet appears to handle such situations poorly. It generally starts spamming requests to all the feed endpoints for every package required.

Desired Behavior

I think NuGet could handle these situations much better. It should be throttling requests and prioritizing feeds based on the results of previous requests or other metadata, like package source mapping: essentially "ranking" the input feeds by the likelihood that the desired data will be present on each feed. The idea is to reduce the total request count, which should reduce load on the server and potentially decrease restore time in the event that 429s are returned.

Let's say that initially there is a pool of C simultaneous allowed connections, N feeds to check, and M packages that NuGet wants information on from those N feeds. Initially, up to C connections are used to check the N feeds in parallel. Now say a desired package is found on feed P. Instead of mass-requesting its dependencies (or essentially any related packages) across every feed, the feed P that contains the parent package is checked first, before the pool of C connections is used on the full set of feeds. The theory here is that a package's dependencies are more likely to be on the same feed as the package itself. Not guaranteed, of course, but more likely.
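A rough sketch of that "parent feed first" probing, in hypothetical Python (the feed URLs, `query_feed`, and the fake feed contents are all stand-ins for illustration, not NuGet client APIs):

```python
import asyncio

# Fake feed contents for the sketch: feed URL -> set of package ids.
FEEDS = {
    "https://feed-a.example/index.json": {"Microsoft.Extensions.Logging"},
    "https://feed-b.example/index.json": {"System.Text.Json"},
    "https://feed-c.example/index.json": set(),
}

async def query_feed(feed, package_id):
    await asyncio.sleep(0.01)  # stand-in for an HTTP round trip
    return package_id in FEEDS[feed]

async def find_package(package_id, feeds, parent_feed=None, max_parallel=4):
    """Return a feed that has package_id, trying the parent's feed first."""
    if parent_feed is not None:
        # Dependencies are more likely (not guaranteed) to live on the same
        # feed as their parent, so spend one request there before fanning out.
        if await query_feed(parent_feed, package_id):
            return parent_feed
        feeds = [f for f in feeds if f != parent_feed]

    sem = asyncio.Semaphore(max_parallel)  # the pool of C connections

    async def probe(feed):
        async with sem:
            return feed if await query_feed(feed, package_id) else None

    results = await asyncio.gather(*(probe(f) for f in feeds))
    return next((r for r in results if r is not None), None)

print(asyncio.run(find_package("System.Text.Json", list(FEEDS))))
# https://feed-b.example/index.json
```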

What this generally amounts to is ranking feeds with an algorithm. NuGet would determine the parallelism of a request for a given package by ranking the input sources. That composes naturally with package source mapping, and also with ranking likelihood based on the results returned from previous requests.

When the input sources all have the same rank, you use max parallelism. Then, for dependencies, you'd rank the parent package's feed higher and request from it first, using a smaller pool of parallel connections. The size of the pool depends on the ranks in the list. For example, if you knew with 100% certainty that a package was on feed Foo (e.g. because of package source mapping), you'd have a pool of one. If one feed was 50% likely and the other five feeds were 10% likely each, then the 50% feed gets 50% of the pool and the rest get 10% each. With a pool of, say, 5 max connections, that's 2.5 connections for the first feed and 2.5 for the rest. Of course, you can't use half a connection, and a single request can't use more than one connection, so the first feed gets 1 connection and the other feeds get 2 (2.5 truncated), for a total parallelism of 3 vs. 5.
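That arithmetic as a small sketch (hypothetical Python; the grouping and truncation rules are one reading of the proposal, not an actual NuGet design):

```python
from collections import defaultdict
from math import floor

def allocate_connections(feed_ranks, pool_size):
    """First probing wave: return {feed: 0 or 1} connections.

    feed_ranks maps feed -> likelihood weight (weights sum to 1).
    Feeds with the same weight form a group; each group gets
    floor(group_total_weight * pool_size) connections, spread one per
    feed, since a single request can't use more than one connection.
    """
    groups = defaultdict(list)
    for feed, weight in feed_ranks.items():
        groups[weight].append(feed)

    allocation = {feed: 0 for feed in feed_ranks}
    for weight, members in groups.items():
        if weight == 0:
            continue  # e.g. excluded by package source mapping
        conns = floor(weight * len(members) * pool_size)
        for feed in members[:min(conns, len(members))]:
            allocation[feed] = 1
    return allocation

# The worked example above: one feed at 50%, five feeds at 10% each, pool
# of 5 -> the 50% feed gets 1 connection, the 10% group gets floor(2.5) = 2,
# for a total parallelism of 3 instead of 5.
ranks = {"hot": 0.5, "f1": 0.1, "f2": 0.1, "f3": 0.1, "f4": 0.1, "f5": 0.1}
print(allocate_connections(ranks, pool_size=5))
# {'hot': 1, 'f1': 1, 'f2': 1, 'f3': 0, 'f4': 0, 'f5': 0}
```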

My guess is that the same applies to other types of relationships: major version numbers, package ID prefixes, etc.

Additional Context

The current NuGet behavior tends to cause a ton of traffic on AzDO (since the feeds we use are mostly hosted on a single org) and has overloaded AzDO numerous times. It also causes a lot of trouble when testing the SDK (since 'real' user scenarios are run in parallel).

Note that .NET doesn't get much value out of package source mapping for reducing these issues. Most of our packages appear on most of the input feeds, so you always seem to end up with "Microsoft.*, System.*" on every input feed. There might be a few exceptions, but generally the set of input feeds that will contain the specific "runtime.*" package we want for dotnet/sdk is about the same with package source mapping as without.

@mmitche added the Type:DCR (Design Change Request) and Triage:Untriaged labels Oct 19, 2022
@marcpopMSFT

Without this, we end up getting throttled and hitting 429s, which then leads to test retries, which just makes the problem worse. #10558

nkolev92 commented Oct 20, 2022

> I think NuGet could handle these situations much better. It should be throttling requests and prioritizing feeds based on the results of previous requests or other metadata, like package source mapping: essentially "ranking" the input feeds by the likelihood that the desired data will be present on each feed. The idea is to reduce the total request count, which should reduce load on the server and potentially decrease restore time in the event that 429s are returned.

If you have source mapping configuration, NuGet will not make remote calls to sources that do not have the mapping configured. Something to consider :)
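For illustration, a minimal NuGet.config fragment with packageSourceMapping (the feed names, URLs, and patterns here are made up): with this in place, a Contoso.* package is only ever requested from feed-b, and feed-a never sees the request.

```xml
<configuration>
  <packageSources>
    <clear />
    <add key="feed-a" value="https://feed-a.example/v3/index.json" />
    <add key="feed-b" value="https://feed-b.example/v3/index.json" />
  </packageSources>
  <packageSourceMapping>
    <!-- The most specific matching pattern wins: Contoso.* maps to feed-b,
         everything else falls through to feed-a. -->
    <packageSource key="feed-a">
      <package pattern="*" />
    </packageSource>
    <packageSource key="feed-b">
      <package pattern="Contoso.*" />
    </packageSource>
  </packageSourceMapping>
</configuration>
```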

As far as the rest goes, I think the algorithm could help reduce the 429s. It's hard to tell whether overall perf changes one way or the other in scenarios where sources weren't responding with 429s.

mmitche commented Oct 21, 2022

> > I think NuGet could handle these situations much better. It should be throttling requests and prioritizing feeds based on the results of previous requests or other metadata, like package source mapping: essentially "ranking" the input feeds by the likelihood that the desired data will be present on each feed. The idea is to reduce the total request count, which should reduce load on the server and potentially decrease restore time in the event that 429s are returned.
>
> If you have source mapping configuration, NuGet will not make remote calls to sources that do not have the mapping configured. Something to consider :)
>
> As far as the rest goes, I think the algorithm could help reduce the 429s. It's hard to tell whether overall perf changes one way or the other in scenarios where sources weren't responding with 429s.

@nkolev92 I noted above that we don't gain a lot of benefit from package source mapping in reducing the number of feeds. Aside from a couple of feeds, basically all our packages appear on all the feeds, because they span all our releases. This includes dotnet-public. You just end up with Microsoft.* for every feed except dotnet-eng.

> Note that .NET doesn't get much value out of package source mapping for reducing these issues. Most of our packages appear on most of the input feeds, so you always seem to end up with "Microsoft.*, System.*" on every input feed. There might be a few exceptions, but generally the set of input feeds that will contain the specific "runtime.*" package we want for dotnet/sdk is about the same with package source mapping as without.

mmitche commented Oct 21, 2022

I think the algorithm would actually integrate well with the existing package source mapping behavior. Feeds with a likelihood score of 0 (because the package source mapping says the package shouldn't be restored from there) would simply get 0 connections to work with. Essentially, you'd never try them.
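A sketch of how mapping-derived ranks could feed the allocation sketch above (hypothetical Python; the mapping shape and the even split are assumptions, and this ignores NuGet's real most-specific-pattern-wins rule):

```python
def ranks_from_mapping(package_id, feeds, mapping):
    # mapping: {feed: [allowed package-id prefixes]} -- a simplified stand-in
    # for package source mapping. Unmapped feeds score 0 and therefore get
    # zero connections; mapped feeds split the weight evenly. A real ranker
    # would fold in history, ID prefixes, and so on.
    allowed = [f for f in feeds
               if any(package_id.startswith(p.rstrip("*"))
                      for p in mapping.get(f, []))]
    if not allowed:
        return {f: 0.0 for f in feeds}
    return {f: 1.0 / len(allowed) if f in allowed else 0.0 for f in feeds}

mapping = {"dotnet-eng": ["Microsoft.DotNet.*"], "dotnet-public": ["*"]}
print(ranks_from_mapping("Newtonsoft.Json",
                         ["dotnet-eng", "dotnet-public"], mapping))
# {'dotnet-eng': 0.0, 'dotnet-public': 1.0} -> dotnet-eng is never contacted
```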

@nkolev92 changed the title from "[DCR]: NuGet should handle config files with many feeds differently" to "[DCR]: NuGet should handle restore with many feeds differently" Oct 24, 2022
@ilyas1974

@nkolev92 is there any update on this? This has been hitting us pretty hard the last few days (see https://github.com/dotnet/arcade/issues/10885 for details on the impact to our team in the last 24 hours).

@jeffkl added the Priority:2 (Issues for the current backlog) label Oct 27, 2022

nkolev92 commented Oct 27, 2022

@ilyas1974

This isn't a straightforward change at all, and we haven't investigated one at this point.

@aortiz-msft assigned aortiz-msft and zivkan and unassigned aortiz-msft Oct 28, 2022
@aortiz-msft

@zivkan - Would you please help us understand the customer scenario and explore options to mitigate the throttling issues?

@marcpopMSFT

@zivkan can we sync up next week to discuss ideas on how to handle this better in the dotnet repos?
