-
Notifications
You must be signed in to change notification settings - Fork 286
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Canceling SQL Server query with while loop hangs forever #44
Comments
Looks like a dead lock - EndExecuteNonQuery acquired a lock for EndExecuteNonQueryInternal thus https://github.com/dotnet/corefx/blob/2b7016cf61f752200b68e1d0c9c6a8b0a93a7a19/src/System.Data.SqlClient/src/System/Data/SqlClient/TdsParserStateObject.cs#L590-L591 can't acquire it to cancel the query |
@EgorBo interesting. Any thoughts as to why this particular query triggers the condition every time whereas most queries work fine? |
In this case the query is executed on the server, but it will never complete as it is an infinite loop. This is essentially a race condition between Reader.Close and Reader.Cancel Looking at the code, this seems like a by design bug because of how the TDS protocol is behaving. @madelson Is this infinitely running query intentional? What is the real use case for this ? |
@saurabh500 thanks for responding and adding more details. A few thoughts:
I don't think this is correct because the same query can be canceled using the synchronous cancel pattern and forgoing async IO. I don't see why it would be "designed" for cancellation to work using one cancellation approach but not the other. Here's a modified version of the above code which cancels exactly as expected:
This is a toy example. The real use-case where I discovered this involved a query that does work in a loop. The loop will eventually terminate but it can take a long time to do under edge-case conditions. Therefore, the ability to cancel when the work isn't needed is important. Another very plausible use-case for this could be building an SSMS-like application or even a service like StackExchange's Data Explorer where part of the functionality is issuing user-authored queries against a database. For such applications reliable abort behavior is very important.
This doesn't feel intuitive to me. Queries that are taking forever (and might never complete) are exactly the kind of queries people want to cancel. |
@EgorBo @saurabh500 can we label this as a bug? |
@madelson Thanks for the explanation. We will be looking into this issue. The code samples and comparisons are appreciated. |
@saurabh500 thanks. I thought I'd seen some issues with the label "bug" so I wanted to make sure that this wasn't being considered "by design" per some of the earlier discussion. |
Any update on this? It looks like many issues are indeed labeled as bugs (https://github.com/dotnet/corefx/labels/bug). Many in that list feel similar to this in nature. |
As recently announced in the .NET Blog, focus on new SqlClient features an improvements is moving to the new Microsoft.Data.SqlClient package. For this reason, we are moving this issue to the new repo at https://github.com/dotnet/SqlClient. We will still use https://github.com/dotnet/corefx to track issues on other providers like System.Data.Odbc and System.Data.OleDB, and general ADO.NET and .NET data access issues. |
Thanks for the update @divega . Now that there is renewed focus on SqlClient, can this be labeled as a bug? It's clearly buggy behavior and as more and more people shift to using async queries passing through the RequestAborted cancellation token from ASP.NET Core I anticipate that this will come up more frequently. |
The explanation at #44 (comment) makes me wonder if async (i.e. CancellationToken-based) cancellation should have a similar implementation to SqlCommand.Cancel(), e.g. using Attention messages. However I think it in #109 we were seeing async cancellation work correctly, so I am not sure I understand this issue. cc @Wraith2, @roji, @saurabh500 |
This isn't the same issue as #109 and @EgorBo is spot on at pinpointing the problem. At the moment a lock is taken in EndExecuteNonQuery and then in TdsParserStateObject.Cancel the lock is attempted and cancellation isn't allowed if it can't be acquired. If I take out that acquired check meaning that it'll take the lock if it can but will still cancel even if it can't get it then the test works. The rest of the manual test suite also passes. |
It's not clear to me what the upshot of this is. It sounds like both CancelAsync and CancellationToken based cancellation are deadlocks waiting to happen? What's the recommended way to cancel an ongoing SQL query, preferably asynchronously? |
The fix was blocked by #248 which is still waiting. |
@Wraith2 the PR you linked is now merged. Does that fix this issue or just open the door for fixing it? |
|
With waits yes it's allowed because they're serialized. Without waits no in both synchronous and asynchronous modes because you're only allowed to have one owner per parser and the reader takes ownership as it executes. Moreover there's plenty of checking at call time to make sure there's only one asynchronous operation of any type executing at any time. Multi threading in both scenarios is supported through cheap creation of multiple connections and linked objects. The locking is to ensure that only one thread is mutating the parser and state object at any given time because the state transitions cause multiple variables to be changes you can't have partial interleaving. You can do this in a course way with locking which just prevents entry to the critical section but incurs waits or you can do it with interlock gates which prevents the wait but means you have to be really careful about ordering of gate checks. The locking is far easier to understand and work with. |
It's not allowed with waits too. You have to read and close first DataReader and then execute again. And let's say you locked the command, and allow 1 execute at a time. Without reading results whenever lock is released and query is re-executed, it's not going to work anyways. |
It depends on exactly what RunSqlAsync does but I agree with your description. From my reading of the code in the past executing two sync commands from two threads concurrently will throw an exception when the state object is found to be owned by another reader but as you say it's a Bad Idea so lets not worry about that. |
I'm lacking most of the context here, but just wanted to mention that as long as no contention is anticipated, locks aren't expensive - taking a free lock is generally a very cheap operation. I wouldn't go into lock-free CompareExchange or similar for perf reasons unless contention is anticipated (which doesn't seem to be the case - this seems to be just a guard against invalid user behavior) and the perf benefits are demonstrated. |
It isn't really a problem of how expensive the locks are because as you say they're pretty cheap most of the time. The problem is that the EndExecute process inside the library is taking a lock very early and holding it for a long time and because that same lock is required for cancellation it prevents cancel, it effectively serializes it to occur after the wait so you'll get a cancellation but it won't happen until the query is finished which isn't the intention. the compareexchange isn't for perf it's just for making the state transition atomic and preventing both EndExecuteInternal (which is Close from the data reader confusingly) and Cancel can't both be executing in a way that they think they've succeeded. Instead of locking to serialize we use the successful change of state to identify the "winner" and let them execute. |
We are having a similar issue but with Reader.ExecuteReadAsync which hangs without respecting cancellationToken we pass to it when reading large rows. Any suggestions on the workaround before fixing is in place? |
There isn't a workaround because the cancellation isn't respected. The PR's that might fix the problem are in review. |
@katwan the only workaround I know of is to use the synchronous ExecuteReader API with |
@cheenamalhotra @Wraith2 @smartguest what version of Microsoft.Data.SqlClient is this fixed in? I'm a bit confused because the related PRs are in a different repo. |
Reopening issue as the change was reverted in 4.0.0-preview3 (#1352) |
But the change log for the 4.0 release says the issue was fixed. How come? |
@abatishchev this seems like an oversight. Thanks for finding it. We will fix the documentation. |
Any chance of having it re-triaged (or maybe even fixed ;) ), please? |
Current alternative: // Register cancellation token to cancel command after command timeout, e.g. 5 seconds like under:
CancellationTokenSource source = new();
source.CancelAfter(5000);
source.Token.Register(() =>
{
Console.WriteLine("Cancelling operation.");
sqlCommand.Cancel();
});
// Instead of calling:
await sqlcommand.ExecuteReaderAsync(source.Token); // which gets stuck due to this issue
// Call sync API wrapped in a task that will cancel with token as well:
await Task.Run(() => sqlCommand.ExecuteReader(), source.Token))
// or simply call:
sqlCommand.ExecuteReader(); // as command will be canceled with token canceling this execution. |
Thank you for your response @cheenamalhotra! But, with what you're proposing, doesn't it mean that we'd have a thread blocked on each sync |
I just stumbled upon this issue as well. I have a command with an infinite timeout doing Thread 35048:
Thread 34880:
|
See StackOverflow post (https://stackoverflow.com/questions/48461567/canceling-query-with-while-loop-hangs-forever?noredirect=1#comment83955305_48461567) here for a full description of the issue.
Essentially, the issue is that for a certain query I am finding that calling
CancellationTokenSource.Cancel()
hangs indefinitely instead of canceling the query. The same query cancels instantly in SQL Server Management Studio. Here is code the reproduces the issue:The text was updated successfully, but these errors were encountered: