HttpHeaders thread safety and behavior of invalid values #61798
I have an issue when calling DynamoDB as below. It looks like the same issue with HTTP headers. I'm using AWSSDK.DynamoDBv2
@amin-lu-xero the root cause is definitely the same, but looking at the AWS SDK code, it doesn't look like the problem is on their side.
Yes, we are using NewRelic for APM
They have fixed the issue in their instrumentation newrelic/newrelic-dotnet-agent#803 - please try updating their agent to the 9.2.0 version that was released a few hours ago.
Thanks a lot!
I'm seeing this exact issue in a .NET 6 console app. Source code (slightly modified), inspired by this official example: https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blob-scalable-app-upload-files?tabs=dotnet

```csharp
var blobContainerClient = new BlobContainerClient(new Uri(assetUploadToken.UploadDestination));

BlobUploadOptions uploadOptions = new BlobUploadOptions()
{
    TransferOptions = new StorageTransferOptions()
    {
        MaximumConcurrency = 8
    }
};

var tasks = new Queue<Task<Response<BlobContentInfo>>>();
foreach (string sourceFilePath in Directory.GetFiles(fileDirectoryPath, "*", searchOption)
             .Where(x => regexFileExtensions.Match(x).Success))
{
    var fileToUpload = new FileInfo(sourceFilePath);
    var blobTargetPath = "my-files";
    var blobClient = blobContainerClient.GetBlobClient(blobTargetPath);
    var uploadTask = blobClient.UploadAsync(fileToUpload.FullName, uploadOptions);
    tasks.Enqueue(uploadTask);
}

// Run all the tasks asynchronously. The `uploadOptions` will handle concurrency.
await Task.WhenAll(tasks);
Console.WriteLine($"Uploaded {tasks.Count} files.");
```

Stack Trace:
Full stack trace
@Strepto are you by chance using a NewRelic agent on the machine running this code?
No, we do not use NewRelic on the machine(s). We do have TeamCity's Performance Monitor enabled though, in case performance analysis might have an impact: https://www.jetbrains.com/help/teamcity/performance-monitor.html Edit: I have now disabled TeamCity's Performance Monitor and will monitor the jobs.
I cannot disable the proxy on the machines where the error occurs, but I can try running the scenario on my dev machine, which has no proxy.
Triage: We will keep it open for a few more months to catch other use cases like the one in #65379, then close it.
I believe I am hitting this issue using RestSharp 107.3.0: restsharp/RestSharp#1772
I'm hitting this issue with concurrent requests using the AWS SDK for SQS. Is there any workaround? We only see the error behind a proxy; on dev machines with no proxy it does not occur.
@a-jackson are you saying you are hitting #65379 specifically?
@MihaZupan I would guess that's the problem I'm having. I can't say specifically that that's it exactly; I only have error logs to go on. This is only reproducible on our servers, where I'm unable to debug. I've seen all the exceptions listed in the top post here, as well as one from AWS: "The request signature we calculated does not match the signature you provided." It appears the headers from one request were sent with the body of another, or something to that effect, although I can't confirm that.
Are you able to share stack traces when this happens?
This is the most recent; it's not exactly one of the ones above, but I think it's the same thing. I have seen the other errors as well, but we don't keep logs that long on our test servers. Stack trace
There are two main known issues users are hitting here: concurrent access to the HttpHeaders collections by user code or instrumentation (this issue), and the case tracked in #65379.
In your case, it looks like you are hitting the latter - #65379. There are possible mitigations here if you have access to how the HttpClient and its handler chain are constructed.
I agree this looks like the latter case.
Hey @a-jackson, sorry I didn't see your reply sooner. One possible mitigation is to force header parsing before the request reaches the SocketsHttpHandler, e.g. with a wrapping DelegatingHandler:

```csharp
var socketsHttpHandler = new SocketsHttpHandler();
var customHandler = new CustomHandler(socketsHttpHandler);
var httpClient = new HttpClient(customHandler);

public sealed class CustomHandler : DelegatingHandler
{
    public CustomHandler(HttpMessageHandler innerHandler) : base(innerHandler) { }

    protected override HttpResponseMessage Send(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        _ = request.Headers.TryGetValues("User-Agent", out _); // Force parsing
        return base.Send(request, cancellationToken);
    }

    protected override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        _ = request.Headers.TryGetValues("User-Agent", out _); // Force parsing
        return base.SendAsync(request, cancellationToken);
    }
}
```
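A more general variant of the same idea is sketched below. This is my own extrapolation, not something from the .NET team: it forces validation of every request and content header rather than just User-Agent, so nothing is left to be lazily parsed later. Note that forcing validation will also drop any invalid header values, per the behavior described in this issue.

```csharp
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical generalization of the workaround above: enumerate (and thereby
// parse/validate) all request and content headers before the request is handed
// to the inner handler.
public sealed class ForceHeaderParsingHandler : DelegatingHandler
{
    public ForceHeaderParsingHandler(HttpMessageHandler innerHandler) : base(innerHandler) { }

    protected override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        // Enumerating the validated view forces each value to be parsed up front.
        foreach (var header in request.Headers) { _ = header.Value; }
        if (request.Content is not null)
        {
            foreach (var header in request.Content.Headers) { _ = header.Value; }
        }
        return base.SendAsync(request, cancellationToken);
    }
}
```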
@MihaZupan Thanks. I've just done some testing and that seems to work. I'm going to start rolling it out to our projects where needed. Is there any idea when this will be fixed? I see the milestone of this issue is set for 7.0.0 but we were hoping we could stick to LTS releases. Is there any chance this will be fixed in 6.x? We have several projects still working on upgrades to 6 which we've put on hold due to this. |
@a-jackson Is the above fix sufficient to unblock your upgrades to 7.0? The issue will have to be addressed in
Given we've only heard of two users hitting this, and that a workaround exists, I think it's unlikely we would service an eventual fix for this at this time. cc: @karelz regarding any potential servicing changes here (fix for #65379).
I can understand that if it's only two users, although we've had issues on every system running .NET 6. I don't know yet if we'll continue with upgrades. We are in the process of phasing out the proxy entirely, so we'll probably hold further upgrades until then. I have another exception though that may or may not be related, just to give you some more info. We were getting a TimeoutException on around 5% of requests to DynamoDB when using the proxy. The exception was thrown 100 seconds after the request. We tested without the proxy and didn't see a single error. These requests were not logged at the proxy at all, and there's nothing else we could explain it with. Other requests, even from the same process, were being logged through the proxy at the same time, which is why I thought it is probably a different manifestation of the same issue. Stack trace
This stack indicates that the request was indeed never sent -- it timed out before it could obtain a working connection. The presence of this stack trace indicates that either:
a) DynamoDB's client code appears to set a Connect Timeout of 3100ms. If it were that, the exception would have been thrown after that long, wouldn't it? Not the 100 seconds we saw. Also, the proxy logs indicated continuous activity throughout this period, and we upgraded the instances it runs on to test that - it made no difference to the frequency of errors. We have just moved to using VPC Endpoints in AWS, so DynamoDB traffic no longer goes through the proxy and we are not seeing this error now, so it's not a problem for us any more.
It's hard to say what exactly is happening without detailed info about what the connection pool was doing. If you are still interested in investigating this further, would you be able to provide traces from an app hitting such failures (they would potentially include PII)?
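One way to gather that kind of information is an EventListener subscribed to the built-in System.Net event sources. The sketch below is illustrative only (the chosen event level and console output are assumptions, not an official recipe), but it surfaces connection-pool and socket activity around a failure:

```csharp
using System;
using System.Diagnostics.Tracing;

// Minimal sketch: subscribe to the built-in System.Net event sources so that
// connection-pool and socket activity around a failure can be inspected.
internal sealed class NetTelemetryListener : EventListener
{
    protected override void OnEventSourceCreated(EventSource eventSource)
    {
        if (eventSource.Name is "System.Net.Http" or "System.Net.Sockets"
            or "System.Net.NameResolution" or "System.Net.Security")
        {
            EnableEvents(eventSource, EventLevel.Informational);
        }
    }

    protected override void OnEventWritten(EventWrittenEventArgs eventData)
    {
        // In a real app, write this somewhere durable instead of the console.
        Console.WriteLine($"{DateTime.UtcNow:O} {eventData.EventSource.Name}/{eventData.EventName}");
    }
}
```

Instantiating the listener near startup (e.g. `using var listener = new NetTelemetryListener();`) is enough to start receiving events.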
The HttpHeaders collections were never thread-safe. Accessing a header may force lazy parsing of its value, resulting in modifications to the underlying data structures. Before .NET 6.0, reading from the collection concurrently happened to be thread-safe in most cases.
If the collection contained an invalid header value, that value would be removed during enumeration. In rare cases, this could cause problems from modifying the underlying Dictionary concurrently.
Starting with 6.0, less locking is performed around header parsing, as it was no longer needed internally (#54130).
Due to this change, multiple examples of users accessing the headers concurrently have surfaced. The ones currently known to the team are gRPC's .NET client (#55898, #55896) and NewRelic's HttpClient instrumentation (newrelic/newrelic-dotnet-agent#803).
Violating thread safety in 6.0 may result in header values being duplicated/malformed, or in various exceptions being thrown during enumeration/header accesses.
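To make the failure mode concrete, here is a minimal sketch (the URL is just a placeholder) of the kind of access pattern that violates this: two tasks touching the same header collection at once, where either access can trigger lazy parsing and therefore a write to the underlying store.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

using var client = new HttpClient();
HttpResponseMessage response = await client.GetAsync("https://example.com/");

// Both tasks read response.Headers concurrently. Reading can lazily parse a
// value (a mutation of the underlying store), so this is a data race and can
// surface as the exceptions listed below.
Task enumerate = Task.Run(() =>
{
    foreach (var header in response.Headers) { _ = header.Value; }
});
Task lookup = Task.Run(() =>
{
    _ = response.Headers.TryGetValues("Server", out _);
});

await Task.WhenAll(enumerate, lookup);
```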
I am posting a few known call stacks below to help other users hitting this issue discover the root cause:
- IndexOutOfRangeException
- NullReferenceException
- InvalidCastException
There is precedent in concurrent reads being thread-safe on some collections (namely Dictionary), and users may assume HttpHeaders behaves the same.
A related issue here is that forcing the validation of an invalid value will result in its removal from the collection. These Schrödinger's headers are difficult to reason about, as their existence depends on whether/how/when they are observed.
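A small sketch of that behavior (the header name and value are arbitrary; the exact results shown in the comments are what this issue describes for .NET 6 and may differ in other versions):

```csharp
using System;
using System.Net.Http;

var request = new HttpRequestMessage(HttpMethod.Get, "https://example.com/");

// Add a value that cannot be parsed as a valid Date header.
request.Headers.TryAddWithoutValidation("Date", "not a valid date");

// The raw value is visible through the non-validating view...
Console.WriteLine(request.Headers.NonValidated.TryGetValues("Date", out _)); // True

// ...but forcing validation fails, removing the invalid value...
Console.WriteLine(request.Headers.TryGetValues("Date", out _));              // False

// ...so the header has now vanished from the collection entirely.
Console.WriteLine(request.Headers.NonValidated.TryGetValues("Date", out _)); // False on .NET 6
```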
We can look to make the behavior more intuitive. We should investigate whether we can (efficiently) implement the collection in a way that satisfies the following:
- validating a header does not make it disappear from the NonValidated view of the collection (reading through that view is sketched below)
- Should enumerating the collection (with or without validation) while a concurrent write operation is in progress have defined behavior (e.g. along the lines of Dictionary's thread-safety model)?
's thread-safety modelThe text was updated successfully, but these errors were encountered: