Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Triggering a cancellation token may not cancel a running query based on timing #1065

Closed
roji opened this issue May 10, 2021 · 19 comments · Fixed by #1781
Closed

Triggering a cancellation token may not cancel a running query based on timing #1065

roji opened this issue May 10, 2021 · 19 comments · Fixed by #1781
Assignees

Comments

@roji
Copy link
Member

roji commented May 10, 2021

When an async cancellation token is triggered, a running query is expected to be cancelled at the server; this apparently works fine when the token is triggered during ExecuteReaderAsync, or during ReadAsync.

However, if the token happens to be triggered while no SqlClient code is in progress (e.g. ReadAsync), the next time ReadAsync is called, it would simply return a cancelled Task immediately, without cancelling the query at the server. This means that there's a race condition - depending on when the token triggering happens, the query may be cancelled (if ReadAsync happens to be in progress) or it may not.

In Npgsql, we handled this race conditions by cancelling the (running) server query from ReadAsync, even when it's called with an already-cancelled token. This is being discussed in DapperLib/Dapper#1590 (comment), where Dapper may be working around this issue by registering a callback to call SqlCommand.Cancel when the token is triggered - this should IMHO not be necessary.

An alternative strategy would be to continue monitoring the cancellation token originally passed to ExecuteReaderAsync - even after ExecuteReaderAsync itself has already completed. This would have the advantage of allowing immediate cancellation at the server at any point, and not only when ReadAsync is called.

For a bit more context, I wrote a small blog post on ADO.NET and cancellation.

@cheenamalhotra
Copy link
Member

cheenamalhotra commented May 10, 2021

One note I'd like to make is that in SqlClient, ReadAsync() can be extremely slow, not due to server side processing delay, but due to client side delay. And it's not just a few seconds but it can go to as much as feeling like a hung-up application (#1058), we've reproduced up to 90 seconds (case taken internally from #44 (comment)).

i.e. when a normal Read() takes 2-3 seconds, ReadAsync() takes 90+ seconds.

Now at that point, a CancellationToken is "expected" to cancel the processing, because we accept it. But we do not cancel reading the packets already received from the server, since TDS is a Stream based protocol, and not reading stream packets is just not an option without breaking the reader/results. Only when DataReader is closed using CloseAsync(), Cancellation Token is accepted to cancel reading further, but it's too late if someone already triggered a ReadAsync() call and is awaiting result.

The only time cancellation occurs in ReadAsync() is in the beginning when the token is already expired. After driver processing begins, token is not registered for any further use.

@roji
Copy link
Member Author

roji commented May 10, 2021

@cheenamalhotra thanks for your comments...

But we do not cancel reading the packets already received from the server, since TDS is a Stream based protocol, and not reading stream packets is just not an option without breaking the reader/results.

Agreed; neither cancellation nor closing of the reader should ever cause the connection to break, lose protocol sync or similar. However, the user may still wish to cancel a query once it has starting running, and it's not currently clear to me how that can be done reliably.

Only when DataReader is closed using CloseAsync(), Cancellation Token is accepted to cancel reading further, but it's too late if someone already triggered a ReadAsync() call and is awaiting result.

Can you please provide more details on this? DbDataReader.CloseAsync does not accept a cancellation token (Close/Dispose generally do not in .NET).

The only time cancellation occurs in ReadAsync() is in the beginning when the token is already expired. After driver processing begins, token is not registered for any further use.

If I'm reading the code right, if a cancelled token is passed to ReadAsync, it just returns a cancelled Task but does not do any actual cancellation of the query at the server. Further down in ReadAsync, a cancellation callback does seem to be registered.

@cheenamalhotra
Copy link
Member

cheenamalhotra commented May 10, 2021

If I'm reading the code right, if a cancelled token is passed to ReadAsync, it just returns a cancelled Task but does not do any actual cancellation of the query at the server. Further down in ReadAsync, a cancellation callback does seem to be registered.

Yes that's right. Query processing completes when "ExecuteReaderAsync" completes, when we "Read" or "ReadAsync" there's no server-side processing of query. The results are read 1 row at a time, and if we run out of packets more packets continue to arrive till socket is open. We can only Cancel query that causes Attention signal to be passed to the server which discards the entire query results. So unless and until query needs to be discarded/canceled, Cancellation Token on "ReadAsync" will not perform server-side cancellation.

Can you please provide more details on this? DbDataReader.CloseAsync does not accept a cancellation token (Close/Dispose generally do not in .NET).

Sorry, my bad. It's the "Close()" method, not "CloseAsync()" that cancels everything. But you can find _cancelAsyncOnCloseToken in the code, which is the only "cancellationToken" that can cancel async executions: ExecuteAsyncCall. The async design is very heavy on OS and leads to numerous back-to-back replays. The only way to cancel is to "Close" the reader.

@Wraith2
Copy link
Contributor

Wraith2 commented May 10, 2021

Is this affected by #956 ?

@cheenamalhotra
Copy link
Member

No they're different issues.
I've uploaded Repro here: ReadAsyncTest.cs

@roji
Copy link
Member Author

roji commented May 10, 2021

So unless and until query needs to be discarded/canceled, Cancellation Token on "ReadAsync" will not perform server-side cancellation.

So to be sure I understand :) It does seem to be possible to cancel a running query by triggering the cancellation token passed to ReadAsync, right? At least, this is what I think I'm seeing in this code in ReadAsync, which shows a registration to call `SqlCommand.s_cancelIgnoreFailure?

If so, then again, the reason I opened this issue is the inherent race condition that the user faces: the code triggering the cancellation token has no way of knowing whether the code calling ReadAsync is right before the IsCancellationRequested check - in which case no server cancellation occurs - or after - in which case it does seem to occur. I hope that's clear.

Stepping back... Let's say I execute a heavy query returning lots of data. ExecuteReader (or ExecuteReaderAsync) has already completed, and I'm now at the point where I'm consuming rows by calling Read (or ReadAsync). What are my options at this point for cancelling the query - stopping its processing at the server and stopping more rows from being generated? I'm assuming SqlCommand.Cancel should work reliably, but should the async cancellation token passed to SqlDataReader.ReadAsync also work?

@roji
Copy link
Member Author

roji commented May 10, 2021

And one additional question. You wrote above:

It's the "Close()" method, not "CloseAsync()" that cancels everything.

Does this mean that calling Close on SqlDataReader performs server cancellation of the running query (once again, cutting it short and stopping more rows from being sent)? If that's so, then there would be no reason to manually call SqlCommand.Cancel (or trigger a cancellation token) before calling SqlDataReader.Close, since the latter already performs cancellation, right? If so, does CloseAsync not perform cancellation in a similar way as Close?

@Wraith2
Copy link
Contributor

Wraith2 commented May 10, 2021

As I understand it, you can request cancellation but the protocol requires that results be processed and discarded until a header is encountered with a done bit set, as @cheenamalhotra said the only other way to approach it would be to instantly kill the connection.

Since when you're using a SqlDataReader it assumes control of the parsing responsibility this means that some thread somewhere needs to be responsible for keeping parsing and you're going to need to wait for at least a single new packet. You can't have the connection that the parse loop corresponds to back until that's done. So if you want to cancel things go ahead but if you want to quickly do something else then you'll need to discard the connection and get another one from the pool to avoid a wait.

@roji
Copy link
Member Author

roji commented May 10, 2021

As I understand it, you can request cancellation but the protocol requires that results be processed and discarded until a header is encountered with a done bit set

Sure, but it cancellation is still a way of stopping a long query earlier, and making sure it generates less results, right? In other words, I'm not talking about any sort of instantaneous cancellation, and definitely not about anything that breaks the connection; just a means of cutting a query short, so that it doesn't get processed until the very end (and generates the full resultset, which would have to be sent over the network).

The problem I'm trying to outline in this issue, is that the async mechanism for doing the above (triggering the async cancellation token) seems to be unreliable because of a race condition.

@cheenamalhotra
Copy link
Member

cheenamalhotra commented May 11, 2021

Does this mean that calling Close on SqlDataReader performs server cancellation of the running query (once again, cutting it short and stopping more rows from being sent)?

No, it's only SqlCommand execute reader APIs that when completed mark completion of query execution on server and we have a reader instantiated that parses network packets to rows of data. *No public API in reader communicates to server, not even "Close()" as closing a reader is not considered as a hard stop. The driver safely closes all ongoing operations and prepares command for next execution.

"SqlCommand.Cancel()" is the only API that will send attention signal for ongoing reader query.

*Edit: Token passed to ReadAsync() does trigger call to SqlCommand.Cancel() internally if it's found expired before row processing begins.

just a means of cutting a query short, so that it doesn't get processed until the very end (and generates the full resultset, which would have to be sent over the network).

Driver does not read to the end when a SqlDataReader instance is created from ExecuteReader APIs. It only requests packets and prepares buffer for parsing. Row pointer is at -1 till here but packets could have been all received by now. SqlDataReader starts generating rows of data by parsing packets one by one when Read()/ReadAsync() or NextResult() are called as that moves index pointer to next row/resultset. You can still close reader if you did not move pointer to the row with large data, but if you did move your pointer (by calling Read/ReadAsync) driver will need to parse entire row then, and that cannot be cancelled.

@roji
Copy link
Member Author

roji commented May 11, 2021

No public API in reader communicates to server, not even "Close()" as closing a reader is not considered as a hard stop. [...]
"SqlCommand.Cancel()" is the only API that will send attention signal for ongoing reader query.

@cheenamalhotra I'm sorry if I'm being dense - help me understand... SqlDataReader.ReadAsync registers SqlCommand.s_cancelIgnoreFailure on the user-provided cancellation token (see code). SqlCommand.s_cancelIgnoreFailure points to SqlCommand.CancelIgnoreFailureCallback, which calls SqlCommand.CancelIgnoreFailure, which itself calls SqlCommand.Cancel.

So unless I'm misreading the code, triggering the cancellation token passed to SqlDataReader.ReadAsync internally calls SqlCommand.Cancel, no?

You can still close reader if you did not move pointer to the row with large data, but if you did move your pointer (by calling Read/ReadAsync) driver will need to parse entire row then, and that cannot be cancelled.

The question is what happens if the reader is closed when the server is still processing the query and generating rows. If I'm reading you correctly, SqlClient will wait for the query to complete at the server (without attempting to cancel it), and silently consume any additional rows that get returned from the server. For example, if I execute a very heavy query that generates a million rows, and then immediately dispose the reader, I still have to wait for the entire query to complete and for all million rows to be sent back the the client over the network, right?

Unless I'm mistaken, an alternative would be to internally call SqlCommand.Cancel in this case, making the query complete much earlier and preventing so much data from uselessly being sent over the network.

@cheenamalhotra
Copy link
Member

cheenamalhotra commented May 11, 2021

@cheenamalhotra I'm sorry if I'm being dense - help me understand... SqlDataReader.ReadAsync registers SqlCommand.s_cancelIgnoreFailure on the user-provided cancellation token (see code). SqlCommand.s_cancelIgnoreFailure points to SqlCommand.CancelIgnoreFailureCallback, which calls SqlCommand.CancelIgnoreFailure, which itself calls SqlCommand.Cancel.

Yes it does (sorry I missed that - corrected my last statement). I guess I've been emphasizing mostly on client side delay, as there's no promise that when cancellation occurs on this token, further processing will stop. You could try to run my above repro, where data is already retrieved so sending attention to server doesn't help.

The question is what happens if the reader is closed when the server is still processing the query and generating rows.

If server is still processing query, there's no reader. Until processing completes, ExecuteReader and ExecuteReaderAsync APIs will continue to wait for response. They will be cancelled (with server side termination) on CommandTimeout or CancellationToken or when SqlCommand.Cancel() is called.

@roji
Copy link
Member Author

roji commented May 11, 2021

Yes it does (sorry I missed that - corrected my last statement). I guess I've been emphasizing mostly on client side delay, as there's no promise that when cancellation occurs on this token, further processing will stop.

I agree that there shouldn't be any promise or expectation of immediate cancellation or similar; since cancellation communicates with the server, there will of course be a delay before it takes effect.

However, to go back to the original reason for this issue; depending on the precise timing when ReadAsync's cancellation token is triggered, cancellation may not happen at all, since ReadAsync returns immediately if passed an already-cancelled cancellation token. In other words, I'm saying it may make sense to modify that check, and to immediately call SqlCommand.s_cancelIgnoreFailure if ReadAsync is invoked with a cancelled token.

@cheenamalhotra
Copy link
Member

cheenamalhotra commented May 12, 2021

In other words, I'm saying it may make sense to modify that check, and to immediately call SqlCommand.s_cancelIgnoreFailure if ReadAsync is invoked with a cancelled token.

I will look into that.

I agree that there shouldn't be any promise or expectation of immediate cancellation or similar; since cancellation communicates with the server, there will of course be a delay before it takes effect.

Regarding this, what happens in driver is that even though cancellation occurs > attention is sent > response is received > but driver continues to process row with large data till the end of row for 90+ seconds. There's definitely no server side delay, our test proves that. That's why there's no guarantee in the driver, that cancellation token will terminate "ReadAsync" from processing and return back control.

It may make sense to generalize statement for all ADO.NET drivers, but for SqlClient this is an issue. The only solution we know of is to fix this huge performance lag.

@roji
Copy link
Member Author

roji commented May 12, 2021

That's why there's no guarantee in the driver, that cancellation token will terminate "ReadAsync" from processing and return back control.

@cheenamalhotra sure. Once again, it makes perfect sense for cancellation not to be immediate. But what should be guaranteed, is that the attention (cancellation?) is actually sent by SqlClient to SQL Server - and that's not currently the case (because of the race condition described above). What happens after the attention is sent - i.e. how long before the query actually is cancelled at the server, and how many additional rows are delivered to SqlClient before the query terminates - those are all unrelated questions where there are no expected guarantees.

To summarize, this issue is about making sure that SqlClient always sends a cancellation to SQL Server when a cancellation token is triggered.

@binki
Copy link

binki commented Sep 1, 2022

I was hoping I could rely on SqlDataReader.ReadAsync() sending the attention to the server, but it looks like I have to manually keep track of SqlCommand and call SqlCommand.Cancel() prior to disposing any reader until this bug is fixed. Thanks for the guidance!

@jnm2
Copy link

jnm2 commented Sep 29, 2022

I'm seeing this cause a wait for the entire result set inside SqlDataReader.Dispose. Instead of cancelling after the first row, this code waits for the full query (a billion rows in this case):

using Microsoft.Data.SqlClient; // 5.0.0. Same with System.Data.SqlClient on .NET Framework 4.8 too.
using System.Threading;
using System.Threading.Tasks;

public static class Program
{
    public static async Task Main()
    {
        const string longRunningQuery = @"
with TenRows as (select Value from (values (1), (2), (3), (4), (5), (6), (7), (8), (9), (10)) as TenRows (Value)),
    ThousandRows as (select A.Value as A, B.Value as B, C.Value as C from TenRows as A, TenRows as B, TenRows as C)
select *
from ThousandRows as A, ThousandRows as B, ThousandRows as C;";

        var connectionString = new SqlConnectionStringBuilder
        {
            DataSource = "(local)",
            IntegratedSecurity = true,
            TrustServerCertificate = true,
        }.ToString();

        using (var source = new CancellationTokenSource())
        using (var connection = new SqlConnection(connectionString))
        {
            await connection.OpenAsync(source.Token);

            using (var command = new SqlCommand(longRunningQuery, connection))
            using (var reader = await command.ExecuteReaderAsync(source.Token))
            {
                while (await reader.ReadAsync(source.Token))
                {
                    source.Cancel();
                }
            } // If you pause the debugger, it highlights this line
        }
    }
}

image

In practice, I find real apps blocking on SNIReadSyncOverAsync:

image

@jnm2
Copy link

jnm2 commented Sep 30, 2022

Can the fix be backported to .NET Framework's System.Data.SqlClient?

@JRahnama
Copy link
Contributor

@jnm2 S.D.SqlClient is in servicing mode and is not accepting any fixes but security ones.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants