Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement MsQuicConfiguration cache #99371

Merged
merged 12 commits into from
Mar 21, 2024

Conversation

rzikm
Copy link
Member

@rzikm rzikm commented Mar 6, 2024

Closes #98354.

This PR introduces a cache of MsQuicConfiguration objects which configure few aspect of MsQuic connections:

  • ALPN protocols
  • Certificate to be used
  • Client vs Server connection
  • Requiring client auth, configuring cipher suite, ...

The most critical is the certificate. On Linux, MsQuic uses statically linked OpenSSL fork, so the certificate is specified by serializing it to a buffer in PKCS12 format and deserializing it on the MsQuic side. Caching the configuration objects allows us to do this serialization once per app lifetime instead of doing it again for each connection and it will gives us a basis for features like TLS Resumption.

@ghost
Copy link

ghost commented Mar 6, 2024

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

Closes #98354.

Author: rzikm
Assignees: -
Labels:

area-System.Net.Quic

Milestone: -

@rzikm rzikm added the NO-REVIEW Experimental/testing PR, do NOT review it label Mar 6, 2024
@rzikm rzikm marked this pull request as ready for review March 6, 2024 18:11
@rzikm rzikm changed the title Implement MsQuicConfiguration cache [WIP] Implement MsQuicConfiguration cache Mar 6, 2024
@rzikm rzikm removed the NO-REVIEW Experimental/testing PR, do NOT review it label Mar 7, 2024
@rzikm rzikm changed the title [WIP] Implement MsQuicConfiguration cache Implement MsQuicConfiguration cache Mar 7, 2024
@rzikm rzikm requested a review from a team March 7, 2024 14:51
Copy link
Member

@ManickaP ManickaP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for going into such details. I got too deep when trying to make this nicer, started coding the stuff and had to stop myself 😄

Copy link
Member

@ManickaP ManickaP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo comments (which are mostly non-functional).

One thing I'll point out for the last here as it's super nitpicky, but still bothers me a little. The comments are always inconsistent mess between full sentences without capitalization, non full sentences etc. Also this format is unique to your PRs:
c#

//
// some paragraph
//

Do whatever you want with this information. I can live with it.

}

// we added a new handle, check if we need to cleanup
if (s_configurationCache.Count % CheckExpiredModulo == 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may lead to frequently throwing out and recreating entries once you have > 32 certs in use.
E.g. you're talking to 40 different hosts, and every time you reach 32 most of these objects will be idle so they're all thrown out, and you start from the beginning.

It may be worth adding slightly more logic here (e.g. skip throwing away entries that were used in the last second / use a timer instead of checking Count / allow only one CleanupCache call per X amount of time ...)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The strategy has been copied from the one we do in SslStream for Windows, I am not aware of any reported issues of too frequent cleanup (but maybe it just was not used for this sort of thing, cc @wfurt) and the code over there is ancient. How common is such a scenario?

My thoughts on this is to keep it simple, we are not trying to prevent all unnecessary allocations, only the frequent ones, if, say, app rotates 40 different configurations but does outbound connection once every second (i.e. 40s to rotate through all of them), then I don't think creating a fresh configuration makes a measurable dent in CPU usage.

If the app makes very frequent connections to multiple hosts and we should care about not doing extra work, then most of the configurations would still be in use and once there is a burst of connections and we stay above 32 connections, then there is no attempt for cleanup until the cache goes to 64 items, and so on.

I am not opposed to adding additional conditions for the cleanup, but adding additional conditions like "only 1 cleanup per X seconds" feels arbitrary without a more concrete evidence that it will help.

Another option would be making the cache cleanup size configurable via envvar so that we have a way out if some customer hits the issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO, this seems sufficient atm and if this proves problematic, we can always revisit the logic here.

Copy link
Member

@ManickaP ManickaP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

private const string DisableCacheEnvironmentVariable = "DOTNET_SYSTEM_NET_QUIC_DISABLE_CONFIGURATION_CACHE";
private const string DisableCacheCtxSwitch = "System.Net.Quic.DisableConfigurationCache";

internal static bool ConfigurationCacheEnabled { get; } = GetConfigurationCacheEnabled();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious, why do we need an escape hatch for this one?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I have tried to stress the cache to hammer out the possible weird concurrency bugs (and did not find any in the last version), I prefer to have an option to disable it in case a customer hits an issue.

@rzikm
Copy link
Member Author

rzikm commented Mar 21, 2024

CI failure is unrelated, all related pipelines have finished green.

@rzikm rzikm merged commit e12e2fa into dotnet:main Mar 21, 2024
85 of 87 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Apr 21, 2024
@karelz karelz added this to the 9.0.0 milestone May 14, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[QUIC] Implement MsQuicConfiguration cache
4 participants