Locking behavior too strict in `OpenIdConnectCachingSecurityTokenProvider`, struggles with enough concurrent requests
#3078
Microsoft.Identity.Web Library
Microsoft.Identity.Web.OWIN
Microsoft.Identity.Web version
3.2.2
Web app
Not Applicable
Web API
Not Applicable
Token cache serialization
Not Applicable
Description
TL;DR: under enough concurrency, the locking behaviour of `OpenIdConnectCachingSecurityTokenProvider` creates a deadlock-like situation.

Note
I've seen this class in enough Microsoft repositories to suspect that you don't really own it per se, but this repo seems to be the go-to when it comes to OIDC auth across all stacks, and it also has the most recent commit for this file, so you're the "lucky" ones to get this report.
We've experienced cases where our app grinds to a halt when a certain concurrency threshold is reached.
Snapshot analysis pointed to `OpenIdConnectCachingSecurityTokenProvider`, more specifically the `Issuer` and `SecurityKeys` properties.

Looking at the implementation, and reading the `ReaderWriterLockSlim` documentation, I think I can see how this happens.

The a-ha moment for me was reading the following section from the documentation linked above (I left out the upgradeable mode, as the class doesn't use it):
Looking at the implementation, we can see that accessing `Issuer` or `SecurityKeys` will:

1. Call `RetrieveMetadata`, which enters "write" mode to get the configuration from the underlying `ConfigurationManager` instances, then
2. Enter "read" mode to return the value.

Here's a mental model of what happens for a single request:

1. The `Issuer` property is accessed:
   1. It calls `RetrieveMetadata`, which enters "write" mode and fetches the configuration.
   2. It enters "read" mode to return the value.
2. The `SecurityKeys` property is accessed:
   1. It calls `RetrieveMetadata`, which enters "write" mode and fetches the configuration.
   2. It enters "read" mode to return the value.

Extrapolating this to many concurrent requests, here's what I think happens:
This means that each concurrent request depends on all other concurrent requests having entered and then exited write mode before it can enter read mode and continue processing.
The bigger the "write" mode queue gets, the higher the chance of new requests arriving and making the problem worse.
The only way out of this situation is for no new requests to arrive, so the existing concurrent requests can enter and exit "write" mode one after the other, after which they will all be able to enter "read" mode.
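To make the mental model concrete, here is a minimal sketch of the access pattern as I understand it. It is not the library's actual source: `MetadataProviderSketch` and the `fetchIssuer` delegate are hypothetical names, with the delegate standing in for the call to the underlying `ConfigurationManager`. The point it illustrates is that every property read takes the exclusive write lock first, even when the fetched value has not changed.

```csharp
using System;
using System.Threading;

// Hypothetical, simplified model of the pattern described in this issue.
// It is NOT the Microsoft.Identity.Web source.
public sealed class MetadataProviderSketch : IDisposable
{
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();
    private readonly Func<string> _fetchIssuer; // stand-in for the ConfigurationManager call
    private string _issuer = string.Empty;

    public MetadataProviderSketch(Func<string> fetchIssuer)
    {
        _fetchIssuer = fetchIssuer ?? throw new ArgumentNullException(nameof(fetchIssuer));
    }

    public string Issuer
    {
        get
        {
            // Step 1: every read first takes the exclusive write lock.
            RetrieveMetadata();

            // Step 2: only then is the shared read lock taken to return the value.
            _lock.EnterReadLock();
            try { return _issuer; }
            finally { _lock.ExitReadLock(); }
        }
    }

    private void RetrieveMetadata()
    {
        // Waits indefinitely until no other thread holds the lock in any mode.
        _lock.EnterWriteLock();
        try { _issuer = _fetchIssuer(); }
        finally { _lock.ExitWriteLock(); }
    }

    public void Dispose() => _lock.Dispose();
}
```

With this shape, two property reads on one request mean two exclusive write-lock acquisitions, so N concurrent requests queue up roughly 2N writers before any of them can settle into read mode.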
However, the fact that we call `RetrieveMetadata` twice in a single request (`.Issuer` and `.SecurityKeys`, both accessed by `JwtFormat`) doesn't make it easy to get out of this situation.

Even more so given that `ConfigurationManager` caches the OIDC configuration for 12 hours by default, so the vast majority of these calls will be no-ops when it comes to the actual config used to validate tokens.

Another aggravating factor is that we use `EnterReadLock` and `EnterWriteLock`, which will wait for as long as needed until they can enter the relevant mode. Note that `TryEnterReadLock` and `TryEnterWriteLock` exist, with timeout parameters, and their return value indicates whether entering the mode was successful.

Here are assumptions about Entra ID / B2C:
Using these assumptions, I think we can loosen the locking behaviour:
Any thoughts?
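For reference, here is a rough sketch of how the timeout-based methods on `ReaderWriterLockSlim` (`TryEnterReadLock` and `TryEnterWriteLock`) could bound the wait. This only illustrates the API shape and is not a proposed patch: `TimedRefreshSketch`, `TryRefresh`, `TryGetIssuer`, and the `fetchIssuer` delegate are all hypothetical names.

```csharp
using System;
using System.Threading;

// Hypothetical illustration of bounded lock waits; not the library's code.
public sealed class TimedRefreshSketch : IDisposable
{
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();
    private readonly Func<string> _fetchIssuer; // stand-in for the ConfigurationManager call
    private readonly TimeSpan _timeout;
    private string _issuer = string.Empty;

    public TimedRefreshSketch(Func<string> fetchIssuer, TimeSpan timeout)
    {
        _fetchIssuer = fetchIssuer ?? throw new ArgumentNullException(nameof(fetchIssuer));
        _timeout = timeout;
    }

    // Returns false if the write lock could not be acquired within the timeout.
    public bool TryRefresh()
    {
        if (!_lock.TryEnterWriteLock(_timeout))
        {
            return false;
        }

        try
        {
            _issuer = _fetchIssuer();
            return true;
        }
        finally
        {
            _lock.ExitWriteLock();
        }
    }

    // Returns false if the read lock could not be acquired within the timeout.
    public bool TryGetIssuer(out string issuer)
    {
        issuer = string.Empty;
        if (!_lock.TryEnterReadLock(_timeout))
        {
            return false;
        }

        try
        {
            issuer = _issuer;
            return true;
        }
        finally
        {
            _lock.ExitReadLock();
        }
    }

    public void Dispose() => _lock.Dispose();
}
```

A timeout alone does not remove the contention. It only turns an unbounded wait into a failure the caller has to handle, for example by falling back to the last known value.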
Reproduction steps
Error message
No response
Id Web logs
No response
Relevant code snippets
N/A
Regression
No response
Expected behavior
No locking that brings the app to a halt.