Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cancelling an async SqlClient operation throws SqlException, not TaskCanceledException #26

Open
roji opened this issue Jun 8, 2016 · 31 comments
Labels
🔨 Breaking Change Issues/PRs that are related with breaking API changes in the driver.

Comments

@roji
Copy link
Member

roji commented Jun 8, 2016

When using a cancellation token to cancel an SqlClient async operation, an SqlException is thrown rather than TaskCanceledException. This seems like problematic behavior, here's some analysis.

On the problematic side, standard async APIs generally throw TaskCanceledException, and SqlClient is doing something different and therefore unexpected. Also, as a result, canceled SqlClient tasks don't have status Canceled, they're Faulted instead. This makes differentiating cancellations from real exceptions more difficult, and the problem is compounded if the SqlClient call is just part of a larger async operation (which again, will be faulted instead of canceled unless some manual specific handling is involved).

The argument for the current behavior seems to be that all of SqlClient's server and network errors are raised as SqlException, making it easy to catch all database errors under a single exception type. On the other hand, it seems that cancellation isn't really an error condition as it's been requested by the user and is expected to occur. The TaskCanceledException is simply a mechanism for conveying the cancellation up the stack.

For the record, this was originally reported by @mikkeljohnsen for Npgsql in npgsql/npgsql#1146. Whatever is decided here for SqlClient will also be adopted for Npgsql (which currently behaves like SqlClient in this respect).

@GSPP
Copy link

GSPP commented Jun 9, 2016

If this is changed the exception should allow to tell the difference between:

  1. Timeout
  2. SqlCommand.Cancel
  3. Some other external cancellation arrived through a CancellationToken

My fear is that it becomes hard to analyze error logs. Also, sometimes programmatic action is required. Maybe you don't want to send an error mail to the ops team for explicit cancellations but you do want to send one for timeouts.

TaskCanceledException is not exactly the nicest exception possible and it's quite nasty already with HttpClient.

@roji
Copy link
Member Author

roji commented Jun 9, 2016

@GSPP, I think distinguishing between timeouts and cancellations would be easy in any case - TaskCanceledException would only be thrown for cancellations, not timeouts. SqlException already provides the means to programmatically understand exactly what happened. The question here is only whether cancellations should be be raised as SqlException or as TaskCanceledException.

@GSPP
Copy link

GSPP commented Jun 9, 2016

Good point.

@bgrainger
Copy link

When developing MySqlConnector, I didn't check the SqlClient behaviour first, but just assumed that throwing OperationCanceledException (and transitioning the Task returned from ExecuteXAsync to the Canceled state) was the right thing to do. (At the moment, I plan to keep this behaviour instead of changing it to match SqlClient and Npgsql.)

Throwing:
https://github.com/mysql-net/MySqlConnector/blob/a06bf08763fa97f3e4c7c0157a0e54d9d5b8bccf/src/MySqlConnector/MySqlClient/CommandExecutors/TextCommandExecutor.cs#L64-L67

Example test:
https://github.com/mysql-net/MySqlConnector/blob/a06bf08763fa97f3e4c7c0157a0e54d9d5b8bccf/tests/SideBySide/CancelTests.cs#L186

@mikkeljohnsen
Copy link

@bgrainger Yes it is the logical thing to do. The SqlClient is doing it wrong and should change it ASAP.

I think Npgsql should follow your lead and do the right thing.

@roji
Copy link
Member Author

roji commented Nov 6, 2017

As this hasn't even received a comment in such a long time, I'll go ahead and change Npgsql's behavior for 3.3.

I really do hope Microsoft improves their attitude towards database access and ADO.NET, it's quite frustrating for things to receive zero attention at all - Entity Framework is important but so is raw SQL access (@divega maybe you can help push again)

@saurabh500 saurabh500 removed their assignment Mar 30, 2018
@JackLiang5
Copy link

JackLiang5 commented Mar 20, 2019

Ref to Cancellation in Managed Threads ( https://docs.microsoft.com/en-us/dotnet/standard/threading/cancellation-in-managed-threads?view=netframework-4.7.2 )

Cancellations should be raised as OperationCanceledException or its derived class (e.g. TaskCanceledException). Usually, exception handler (including try/catch) will process OperationCanceledException specially. There're some examples in the referred page. Raising other exception like SqlException will break these examples.

@divega
Copy link

divega commented May 15, 2019

As recently announced in the .NET Blog, focus on new SqlClient features an improvements is moving to the new Microsoft.Data.SqlClient package. For this reason, we are moving this issue to the new repo at https://github.com/dotnet/SqlClient. We will still use https://github.com/dotnet/corefx to track issues on other providers like System.Data.Odbc and System.Data.OleDB, and general ADO.NET and .NET data access issues.

@divega divega transferred this issue from dotnet/corefx May 15, 2019
@David-Engel David-Engel added this to the Future milestone May 21, 2019
@cheenamalhotra cheenamalhotra added the 🔨 Breaking Change Issues/PRs that are related with breaking API changes in the driver. label Oct 7, 2019
@cheenamalhotra
Copy link
Member

Acknowledged.
As it seems to be a breaking API change, we'll consider fixing it with other API changes, when that happens.

@virzak
Copy link

virzak commented Dec 7, 2019

I ran into this as well.

Having to use a hack to suppress. Is there a better way to handle this in the meantime?

catch (OperationCanceledException)
{
}
// Fixme:
// Remove this catch when issue is resolved: https://github.com/dotnet/SqlClient/issues/26
catch (Microsoft.Data.SqlClient.SqlException sqlex)
{
    var errors = sqlex.Errors;
    var isCancelled = false;
    foreach (var error in errors)
    {
        if (!(error is Microsoft.Data.SqlClient.SqlError sqlError))
            continue;
        if (sqlError.Message == "Operation cancelled by user.")
        {
            isCancelled = true;
        }
    }

    if (!isCancelled)
        throw;
}

@cheenamalhotra cheenamalhotra modified the milestones: Future, 2.0.0 Dec 11, 2019
@cheenamalhotra cheenamalhotra removed this from the 2.0.0 milestone Jun 16, 2020
@proff
Copy link

proff commented Nov 6, 2020

@virzak, you used cancellationToken for cancel operation, so yo can use it for check exception:

catch (SqlException ex) when (cancellationToken.IsCancellationRequested && ex.Number == 0 && ex.State == 0 && ex.Class == 11)
{
    throw new OperationCanceledException(cancellationToken);
}

The error message can be localized.

@proff
Copy link

proff commented Nov 7, 2020

Or for entity framework core

services.AddDbContext<DbContext>(
                options => options
                    .UseSqlServer(...)
                    .AddInterceptors(new WrongExceptionOnCancelFixInterceptor())
    public class WrongExceptionOnCancelFixInterceptor : DbCommandInterceptor
    {
        public override Task CommandFailedAsync(
            DbCommand command,
            CommandErrorEventData eventData,
            CancellationToken cancellationToken = new CancellationToken())
        {
            if (eventData.IsAsync && cancellationToken.IsCancellationRequested
                                  && eventData.Exception is SqlException { Number: 0, State: 0, Class: 11 })
            {
                throw new OperationCanceledException(cancellationToken);
            }

            return base.CommandFailedAsync(command, eventData, cancellationToken);
        }
    }

@Wraith2
Copy link
Contributor

Wraith2 commented Mar 3, 2021

I've just been looking at cancellation for a bug and this seems like a reasonable time to possibly address this issue. It looks like there's only one place that we create cancel exceptions. Why not introduce an app switch that can optionally wrap the sql exception in a TaskCanceledException?

@Wraith2
Copy link
Contributor

Wraith2 commented Mar 4, 2021

If @johnnypham is already working on it I'll leave it alone again. I just spotted the code that i thought would need changing on the way past fixing something else and it came to mind.

In theory we can change the code to wrap the sql exception in the type we want and put it on a compatibility switch so it's an opt-in behavioural change. If that is done how do consumers like EF detect that the capability is present in the version of the library that they're using and set the flag?

@roji
Copy link
Member Author

roji commented Mar 4, 2021

In theory we can change the code to wrap the sql exception in the type we want and put it on a compatibility switch so it's an opt-in behavioural change.

It seems that the consensus is to make SqlClient throw OperationCanceledException by default, rather than when a special opt-in is used (hence the proposal to do it for 3.0 as a breaking change) - though if I'm understanding it wrong we could discuss this further. Though it would certainly be reasonable to have an opt-out that restores the current behavior of throwing an unwrapped SqlException).

If that is done how do consumers like EF detect that the capability is present in the version of the library that they're using and set the flag?

Anyone who's interested can check the same compatibility switch that SqlClient would check. But I can't really imagine EF doing this and modifying its behavior (EF also doesn't care that much about the cancellation/failure distinction)... Until SqlClient is changed to throw OperationCanceledException, all cancellations will be treated as failures by consuming code.

@danpaul88
Copy link

danpaul88 commented Jun 10, 2021

Given that the 3.0 release was just tagged and this issue is still open, I assume this didn't make the cut in the end? Does this mean a fix for this will have to wait for a 4.0 release now, since it was originally waiting to be included in 3.0 due to being a breaking change to the type of exception thrown?

@johnnypham
Copy link
Contributor

I made progress on this but was blocked by an existing issue where commands were not canceled reliably in some scenarios. We have two PRs to address it (#956 and #973) which we're looking to review soon.

@roji
Copy link
Member Author

roji commented Dec 12, 2021

@JRahnama @David-Engel as part of dotnet/efcore#19526 I'm trying to identify SqlExceptions which in fact represent cancellations. However, there doesn't seem to be any error code identifying cancellation... The SqlException contains two SqlErrors, the first message of which says "A severe error occurred on the current command. The results, if any, should be discarded." and the second says "Operation cancelled by user.". The class of both errors is 11, but the Number, State, and LineNumber are all 0. I could look at the message, but as that's localized that wouldn't always work and be brittle.

Am I missing something? What would be the proper way to identify an exception representing cancellation?

@Wraith2
Copy link
Contributor

Wraith2 commented Dec 12, 2021

Cancelling is supported in TDS by sending an out of band cancel request to the server, an attention message. The client is then required to continue processing incoming data from the server until it sees a packet attention acknowledge. The spec says that:

When the ignore and EOM bits are set, the server does not send an attention acknowledgment, but instead returns a table response with a single DONE token that has a status of DONE_ERROR to indicate that the incoming request was ignored.

I'm not clear about why we might be getting that behaviour in the case of cancellation when a request has been completely sent to the server and we're waiting for a response. The way I read the code I can't see us setting the ignore flag in out of band attention requests. I suspect that SQL server might not entirely be adhering to the spec or the behaviour is more subtle than I'm understanding. This bears further investigation imo.

Anyway. Ultimately we end up in this bit of code in TdsParser.TryProcessDone

            // Surface exception for DONE_ERROR in the case we did not receive an error token
            // in the stream, but an error occurred.  In these cases, we throw a general server error.  The
            // situations where this can occur are: an invalid buffer received from client, login error
            // and the server refused our connection, and the case where we are trying to log in but
            // the server has reached its max connection limit.  Bottom line, we need to throw general
            // error in the cases where we did not receive an error token along with the DONE_ERROR.
            if ((TdsEnums.DONE_ERROR == (TdsEnums.DONE_ERROR & status)) && stateObj.ErrorCount == 0 &&
                  stateObj.HasReceivedError == false && (RunBehavior.Clean != (RunBehavior.Clean & run)))
            {
                stateObj.AddError(new SqlError(0, 0, TdsEnums.MIN_ERROR_CLASS, _server, SQLMessage.SevereError(), "", 0));

                if (null != reader)
                {
                    if (!reader.IsInitialized)
                    {
                        run = RunBehavior.UntilDone;
                    }
                }
            }

which is synthesizing a generic error because it sees a DONE_ERROR status but has no error tokens processed. This explains the lack of state and line fields, we don't have them because we're making up the error based on a heuristic. That also means that we wouldn't be overwriting any information if we chose to use a state value which was outside the range that SQL server was able to send, a negative number for example. However, i would want to be convinced that cancellation was the only way for this code to be reached which as usual will require some research from the sql team side.

Tracing through the code I find that we're generating SqlError instances for every info message that is sent. Changing database, changing language, etc. That feels wasteful and I wonder if their materialization could be deferred until they are needed so we could avoid materializing them and the strings that the contain.

@MisinformedDNA
Copy link

Any updates?

@roji
Copy link
Member Author

roji commented Dec 15, 2022

Note that in some scenarios, ExecuteDbDataReaderAsync apparently throws an InvalidOperationException with the message "Operation cancelled by user" (see dotnet/efcore#29861). This makes it even more difficult for calling code to identify cancellation.

@MxNbrt
Copy link

MxNbrt commented Apr 3, 2023

@JRahnama and @Wraith2 what is the current status of this issue. Right now, i have to use a workaround like the one proposed here dotnet/efcore#19526 (comment) but it feels so wrong.

Supporting the correct exception type would be a breaking change but it would conform to the .net library coding guidelines and working with the SqlClient API (through ef core) would be much easier. Thx

@janpalas
Copy link

Or for entity framework core

services.AddDbContext<DbContext>(
                options => options
                    .UseSqlServer(...)
                    .AddInterceptors(new WrongExceptionOnCancelFixInterceptor())
    public class WrongExceptionOnCancelFixInterceptor : DbCommandInterceptor
    {
        public override Task CommandFailedAsync(
            DbCommand command,
            CommandErrorEventData eventData,
            CancellationToken cancellationToken = new CancellationToken())
        {
            if (eventData.IsAsync && cancellationToken.IsCancellationRequested
                                  && eventData.Exception is SqlException { Number: 0, State: 0, Class: 11 })
            {
                throw new OperationCanceledException(cancellationToken);
            }

            return base.CommandFailedAsync(command, eventData, cancellationToken);
        }
    }

This solution was working perfectly until EF Core 7. After upgrading to EF Core 7 the method CommandFailedAsync in an interceptor gets never called. Instead interceptor's method CommandCanceledAsync gets called and afterwards SqlException is thrown sometimes. However, within CommandCanceledAsync method you have no option to investigate the exception to apply the fix and throw OperationCancelledException instead of SqlException...

@roji
Copy link
Member Author

roji commented Jun 20, 2023

@janpalas when EF calls CommandCanceleAsync instead of CommandFailedAsync, that means that it really figured out that it's a cancellation (by checking whether the cancellation token as triggered); so I'm not sure there's a reason for you to "investigate" anything in the interceptor?

@janpalas
Copy link

janpalas commented Jun 20, 2023

it really figured out that it's a cancellation (by checking whether the cancellation token as triggered)

@roji sadly just partly. Now CommandCanceledAsync is correctly called, however sometimes cancellation gets "unwrapped" into OperationCancelledException (which is correct), but sometimes into SqlException. I adjusted interceptor's code to prevent SqlExceptions emerging from cancellation:

public override Task CommandCanceledAsync(DbCommand command, CommandEndEventData eventData, CancellationToken cancellationToken = default)
{
    // 20105 = Microsoft.EntityFrameworkCore.Database.Command.CommandCancelled
    if (cancellationToken.IsCancellationRequested && eventData.EventId == 20105)
        throw new OperationCanceledException(cancellationToken);

    return Task.CompletedTask;
}

(please note that CommandEndEventData has no information about the exception - that is why I use EventId)

@roji
Copy link
Member Author

roji commented Jun 20, 2023

@janpalas yep, EF makes no attempt to correct SqlClient's behavior here by throwing a different type of exception; it just distinguishes between failure and cancellation. In that sense, as far as I can tell, the situation isn't any worse in EF Core 7.0 compared to what it was in 6.0 - it's just that there's a different event for cancellation.

@janpalas
Copy link

@roji after thorough investigation it gets more interesting: In case of command cancellation, CommandCanceledAsync is called only before SqlException gets thrown. CommandCanceledAsync gets never called when OperationCancelledException is thrown, ie. CommandCanceledAsync gets hit only in minory cases.

And yet, it is not 100 % reliable: Sometimes SqlException is thrown in case of command cancellation, but CommandCanceledAsync is not hit...

@roji
Copy link
Member Author

roji commented Jun 22, 2023

@janpalas as these are EF questions, can you please open an issue on the EF repo with a code sample that shows the problematic behavior?

@BieleckiLtd
Copy link

@virzak, you used cancellationToken for cancel operation, so yo can use it for check exception:

catch (SqlException ex) when (cancellationToken.IsCancellationRequested && ex.Number == 0 && ex.State == 0 && ex.Class == 11)
{
    throw new OperationCanceledException(cancellationToken);
}

The error message can be localized.

In this specific scenario - is it necessary to check the properties of SqlException when the operation is cancelled by a CancellationToken, or is the following simplified approach sufficient?

catch (SqlException) when (cancellationToken.IsCancellationRequested)
{
    throw new OperationCanceledException(cancellationToken);
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🔨 Breaking Change Issues/PRs that are related with breaking API changes in the driver.
Projects
None yet
Development

No branches or pull requests