Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix cancelability of DNS queries #104420

Merged
merged 5 commits into from
Jul 12, 2024

Conversation

rokonec
Copy link
Contributor

@rokonec rokonec commented Jul 4, 2024

Fixes: #92054

Context

ConnectAsync(EndPoint, CancellationToken) had not configured downstream code to propagate cancelation to DNS queries.

Changes made

Instruct related AwaitableSocketAsyncEventArgs to be cancelable - authored by @antonfirsov in #92862

  • ConnectAsync(this, userSocket: true, saeaCancelable: false) was called with saeaCancelable: false which makes it uncancelable. Configuring it by abilities of used CT maked it work as intended.
  • These changes are related to only ConnectAsync, and based on manual code analysis shall not have faulty side effects. However, it could break someone, if they relay of faulty behavior of ignoring passed CT. For example if they set previously timeout to unreasonablyt short time, f.e 100ms, if will now break them as DNS query will now timeout for them.

Propagate cancelation during connect by name (5e25367)

  • this is relatively safe change, this could, currently, only happen when connect by name (DNS) is canceled, and since it was not working previously, this code can only be hit with changes from this PR.

@antonfirsov
Copy link
Member

antonfirsov commented Jul 4, 2024

@rokonec just to make 100% sure this actually works, I wonder what is the output of this program (or equivalent test case) before/after the change on Windows?

using Socket socket = new(SocketType.Stream, ProtocolType.Tcp);
Stopwatch sw = Stopwatch.StartNew();
try
{
    await socket.ConnectAsync(new DnsEndPoint("does-not-exist", 80), new CancellationTokenSource(500).Token);
}
catch (OperationCanceledException)
{
    Console.WriteLine(sw.ElapsedMilliseconds);
}

I wish we could turn this into a test-case, but unfortunately it would flake in CI.

@MihaZupan MihaZupan added this to the 9.0.0 milestone Jul 4, 2024
@rokonec
Copy link
Contributor Author

rokonec commented Jul 9, 2024

just to make 100% sure this actually works

@antonfirsov Good hunch .. it was indeed NOT working properly. It was throwing wrong exception. After the fix in 5e25367 it all seems to be working fine. Please check those changes.

Before

Exception after 2816 ms ex: System.Net.Sockets.SocketException (11001): No such host is known.

After

Exception after 577 ms ex: System.OperationCanceledException: The operation was canceled.

@rokonec
Copy link
Contributor Author

rokonec commented Jul 12, 2024

/azp run runtime

@rokonec
Copy link
Contributor Author

rokonec commented Jul 12, 2024

/azp run runtime-libraries-coreclr outerloop

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

1 similar comment
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Member

@antonfirsov antonfirsov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for 5e25367. Would be still good to have a third pair of eyes taking a look on the whole thing.

@antonfirsov
Copy link
Member

@rokonec since the change is not immediately trivial from the first look, can you update the PR description with an explanation (1) why do these two changes fix the thing (2) why don't they impact other code paths negatively?

Copy link
Member

@liveans liveans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes LGTM, is it possible to add a test case for this?
Do we have any way to slow down the DNS resolving?

/cc @antonfirsov

@antonfirsov
Copy link
Member

Do we have any way to slow down the DNS resolving?

This but I doubt this approach is reliable on CI. To make things robust, we would need to dockerize a custom environment config.

@rokonec
Copy link
Contributor Author

rokonec commented Jul 12, 2024

/ba-g Known issues or unrelated tests

@rokonec rokonec merged commit 45f3250 into dotnet:main Jul 12, 2024
73 of 91 checks passed
@rokonec rokonec deleted the dev/rokonec/socket-ct-propagation branch July 12, 2024 18:53
@github-actions github-actions bot locked and limited conversation to collaborators Aug 12, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.