[linux-arm64] System.ArgumentOutOfRangeException in AwaitableSocketAsyncEventArgs.InvokeContinuation #84407
Comments
This is a wild guess as I'm unfamiliar with this code, but I feel like there is a race condition in AwaitableSocketAsyncEventArgs. In
The comment implies that we can get there while a continuation is already enqueued. If we fail to swap the previous continuation and find the sentinel instead, we assign null to However, in
Action<object?>? c = _continuation;
if (c != null || (c = Interlocked.CompareExchange(ref _continuation, s_completedSentinel, null)) != null)
{
    Debug.Assert(c != s_completedSentinel, "The delegate should not have been the completed sentinel.");
    object? continuationState = UserToken;
    UserToken = null;
    _continuation = s_completedSentinel; // in case someone's polling IsCompleted
    ExecutionContext? ec = _executionContext;
    if (ec == null)
    {
        InvokeContinuation(c, continuationState, forceAsync: false, requiresExecutionContextFlow: false);
    }
First In the memory dump, |
Thanks! How frequently do you see this in CI? This whole implementation was replaced in .NET 8 to instead use ManualResetValueTaskSourceCore. I'm wondering if you might be able to try with a recent preview or daily build and see if it still happens.
As for the theory, I don't think that's what's happening. Those two blocks are guarded on different conditions. The block in OnCompleted(SocketAsyncEventArgs) only runs if a continuation was already hooked up, and the other one runs only if the operation completed before hooking up the continuation; they should never both execute for the same operation.
As this is only happening on ARM, it's more likely we're missing a necessary volatile somewhere... I expect the issue is that _continuation should be volatile, or that the fast-path read of it should be a Volatile.Read, as otherwise the read of UserToken could actually be reordered to before we read the continuation. |
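To make the reordering concern concrete, here is a minimal sketch of the handoff pattern discussed above. It is a simplified stand-in, not the actual AwaitableSocketAsyncEventArgs code: the class name HandoffSketch and the methods TryRegister and Complete are hypothetical, and execution-context handling is omitted. It only illustrates where an acquire read (Volatile.Read) would sit relative to the read of the state field.

```csharp
#nullable enable
using System;
using System.Threading;

// Hypothetical, simplified stand-in for the awaitable discussed in this issue.
public sealed class HandoffSketch
{
    // Marker stored in _continuation once the operation has completed.
    private static readonly Action<object?> s_completedSentinel = _ => { };

    private Action<object?>? _continuation; // deliberately NOT declared volatile
    private object? _userToken;

    // Awaiter side: store the state, then try to install the continuation.
    // Returns false if the operation had already completed, in which case
    // the caller is expected to run the continuation itself.
    public bool TryRegister(Action<object?> continuation, object? state)
    {
        _userToken = state;
        // CompareExchange is a full fence, so the write above is visible
        // to any thread that later observes the installed continuation.
        Action<object?>? prev = Interlocked.CompareExchange(ref _continuation, continuation, null);
        if (prev is null)
        {
            return true;
        }
        _userToken = null;
        return false;
    }

    // Completion side: invoke the continuation if one is already installed,
    // otherwise leave the sentinel behind for a later TryRegister call.
    public void Complete()
    {
        // Acquire read: the read of _userToken below cannot be moved before it.
        // With a plain read, a weakly ordered CPU may perform the loads out of order.
        Action<object?>? c = Volatile.Read(ref _continuation);
        if (c != null || (c = Interlocked.CompareExchange(ref _continuation, s_completedSentinel, null)) != null)
        {
            object? state = _userToken;
            _userToken = null;
            _continuation = s_completedSentinel;
            c(state);
        }
    }
}
```

In this sketch, exactly one side ends up responsible for running the continuation for a given operation: Complete when the continuation was installed first, or the caller of TryRegister when completion won the race.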
I checked for the past month and found only two other occurrences of the problem (out of ~34400 runs in total). Unfortunately I don't think we can deploy .NET 8 just yet (we first need to update our instrumentation code to support it). |
Tagging subscribers to this area: @dotnet/area-system-threading-tasks |
I have seen something very similar on a Raspberry Pi. The failure is hard to reproduce: across 10 Pis running for 6 months, we have seen it happen 4 times, on 2 of the systems (2 each). All run the same code and send/receive data over the network many times per second. That said, our stack no longer matches the one in this issue after the 5th level. In all 4 cases it looks like this for us:
However, since 4 levels are the same and the first is a method without arguments (OnCompletedInternal), it is reasonable to think these are related. |
Hello. I've also hit this exception, but in the System.Threading.Channels AsyncOperation class when trying to write an item to a channel. It's rare, but when it occurs, it stalls the consumer side of the channel, since the Reader.WaitToReadAsync() method never returns. Could the issue found in AwaitableSocketAsyncEventArgs also be present here? Unfortunately, I can only send the part of the stack trace that originated the exception.
My app also runs on linux-arm64 |
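For readers unfamiliar with the usage pattern described in this comment, the following is a rough sketch of a producer calling TryWrite while a consumer waits in Reader.WaitToReadAsync(). The class and method names (ChannelSketch, RunAsync) are hypothetical and the commenter's actual code is not shown in this thread. It does not reproduce the bug; it only shows why a failed wake-up on the write path would leave the consumer loop waiting.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical illustration of a channel producer/consumer pair.
public static class ChannelSketch
{
    public static async Task<List<int>> RunAsync(int count)
    {
        Channel<int> channel = Channel.CreateUnbounded<int>();
        var received = new List<int>();

        // Consumer: waits in WaitToReadAsync until a write (or completion) wakes it.
        Task consumer = Task.Run(async () =>
        {
            while (await channel.Reader.WaitToReadAsync())
            {
                while (channel.Reader.TryRead(out int item))
                {
                    received.Add(item);
                }
            }
        });

        // Producer: each TryWrite may need to signal a waiting reader.
        for (int i = 0; i < count; i++)
        {
            channel.Writer.TryWrite(i);
        }

        // Completing the writer makes WaitToReadAsync return false once drained.
        channel.Writer.Complete();
        await consumer;
        return received;
    }
}
```

If the signal issued from the write path throws instead of resuming the reader, the outer loop in the consumer never gets past its await, which matches the stall described above.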
On what .NET version? This was recently fixed (we think).
On what .NET version? |
@stephentoub 6. Do you mean the fix is in a patch for 6 or are you referring to fixed in 7? |
6.0.which? |
It should be fixed in a patch for 6 (and 7). |
7.0.5 |
And you're getting an ArgumentOutOfRangeException from the call to SignalCompletion in TryWrite, with no further frames on the stack in SignalCompletion? |
The app was built with the 6.0.406 SDK via this GitHub Actions runner image: https://github.com/actions/runner-images/releases/tag/ubuntu22%2F20230219.1. It is a single-file, self-contained app, so as I understand it the runtime should be 6.0.14: https://dotnet.microsoft.com/en-us/download/dotnet/6.0 |
Hi @stephentoub which version of 6 has this patch? thanks |
Hey @stephentoub, I was reading PR #84432, and the fix was to use the volatile keyword. I'm new to dotnet/C#, and I was curious to understand the impact of the keyword; I found a lot of people discussing it. Is this fix a solution for multi-processor use cases, or also for when both threads run on the same processor? |
Compilers and hardware are free to reorder reads/writes in ways that are unobservable for the current thread. But such reorderings can be visible to other threads running concurrently. Marking a variable as volatile tells the system to use read-acquire / store-release semantics, preventing those reorderings. The .NET memory model is described at
The volatile keyword is described at https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/keywords/volatile. |
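As a small illustration of the acquire/release behavior described above, here is a hedged sketch of the classic "publish a value, then set a flag" pattern. The Publisher class and its members are hypothetical names chosen for this example and are not taken from the runtime sources.

```csharp
#nullable enable
using System;

// Hypothetical example: one thread calls Publish, another polls TryConsume.
public sealed class Publisher
{
    private int _payload;          // ordinary field
    private volatile bool _ready;  // volatile: writes are releases, reads are acquires

    public void Publish(int value)
    {
        _payload = value; // (1) ordinary write
        _ready = true;    // (2) volatile write: (1) cannot be moved after it
    }

    public bool TryConsume(out int value)
    {
        if (_ready)           // (3) volatile read
        {
            value = _payload; // (4) cannot be moved before (3)
            return true;
        }

        value = 0;
        return false;
    }
}
```

Without volatile on _ready, a reader on another thread could observe the flag as set while still reading a stale _payload, because either the compiler or a weakly ordered processor is allowed to reorder (1)/(2) or (3)/(4).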
Description
We've seen a crash in our CI caused by an ArgumentOutOfRangeException thrown by System.Threading.ThreadPool+<>c.<.cctor>b__86_0(System.Object), itself invoked by System.Net.Sockets.Socket+AwaitableSocketAsyncEventArgs.InvokeContinuation.
The full callstack:
I believe this is the same issue as #70486 and #72365.
It's entirely possible that the crash is caused by our instrumentation code (we use the profiler API to rewrite IL at runtime, though we don't rewrite any of the methods in this callstack). However, since the exact same issue has been reported by somebody else, and because in both cases it happened only on ARM64, it makes me think there is actually an issue in the runtime or the BCL.
We managed to capture a memory dump. I can share it if you have a place to upload it (47MB zipped).
Reproduction Steps
The issue occurs very rarely in our CI so I don't have a repro. This is a test application that sends HTTP requests to itself (HttpClient + HttpListener).
Expected behavior
No crashes.
Actual behavior
An ArgumentOutOfRangeException is thrown and crashes the process.
Regression?
We observed the issue on .NET 6. I don't know if previous versions are impacted.
Known Workarounds
No response
Configuration
.NET 6.0.15
Linux Debian/ARM64. The crash was never observed on x86/x64.
Other information
No response