Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SqlDataReader returns stale data #980

Closed
dawust opened this issue Mar 10, 2021 · 11 comments · Fixed by #1023
Closed

SqlDataReader returns stale data #980

dawust opened this issue Mar 10, 2021 · 11 comments · Fixed by #1023
Labels
🐛 Bug! Issues that are bugs in the drivers we maintain.

Comments

@dawust
Copy link

dawust commented Mar 10, 2021

Describe the bug

On rare occasions SqlDataReader returns stale data from a previous query.
We read the columns of a SELECT statement sequentially using ExecuteReader and the corresponding Get operations, eg.

using (var sqlConnection = new SqlConnection(connectionString))
{
    using (var sqlCommand = new SqlCommand(sqlText, sqlConnection))
    {
        sqlConnection.Open();

        using (var reader = sqlCommand.ExecuteReader(CommandBehavior.SequentialAccess))
        {
            while (reader.Read())
            {
                var column1 = reader.GetInt32(0);
                var column2 = reader.GetDateTime(1);
                var column3 = reader.GetString(2);
                ...
            }
        }
    }
}

Eventhough the Get operations match the data types of the given SELECT statement in the SqlCommand, sometimes System.InvalidCastException is thrown.

System.InvalidCastException:
--
at Microsoft.Data.SqlClient.SqlBuffer.get_DateTime (Microsoft.Data.SqlClient, Version=2.0.20168.4, Culture=neutral, PublicKeyToken=23ec7fc2d6eaa4a5)
at Microsoft.Data.SqlClient.SqlDataReader.GetDateTime (Microsoft.Data.SqlClient, Version=2.0.20168.4, Culture=neutral, PublicKeyToken=23ec7fc2d6eaa4a5)

When investating the DataTable of reader.GetSchemaTable() it was obvious that the result belonged to a preceding unrelated SELECT statement. We experience this behaviour a few dozen times a day over millions of queries. It seems that the SqlConnection is corrupted, then is put back in the ConnectionPool and subsequent queries using the same SqlConnection are affected as well until the connection is doomed eventually.

This behaviour is often accompanied by a subsequent request failing with the following InvalidOperationException exception "Internal .NET Framework Data Provider error 60."

System.InvalidOperationException:
   at Microsoft.Data.SqlClient.SqlDelegatedTransaction.Initialize (Microsoft.Data.SqlClient, Version=2.0.20168.4, Culture=neutral, PublicKeyToken=23ec7fc2d6eaa4a5)
   at System.Transactions.TransactionStatePSPEOperation.PSPEInitialize (System.Transactions.Local, Version=5.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51)
   at System.Transactions.TransactionStateActive.EnlistPromotableSinglePhase (System.Transactions.Local, Version=5.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51)
   at System.Transactions.Transaction.EnlistPromotableSinglePhase (System.Transactions.Local, Version=5.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51)
   at System.Transactions.Transaction.EnlistPromotableSinglePhase (System.Transactions.Local, Version=5.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51)
   at Microsoft.Data.SqlClient.SqlInternalConnection.EnlistNonNull (Microsoft.Data.SqlClient, Version=2.0.20168.4, Culture=neutral, PublicKeyToken=23ec7fc2d6eaa4a5)
   at Microsoft.Data.ProviderBase.DbConnectionPool.PrepareConnection (Microsoft.Data.SqlClient, Version=2.0.20168.4, Culture=neutral, PublicKeyToken=23ec7fc2d6eaa4a5)
   at Microsoft.Data.ProviderBase.DbConnectionPool.TryGetConnection (Microsoft.Data.SqlClient, Version=2.0.20168.4, Culture=neutral, PublicKeyToken=23ec7fc2d6eaa4a5)
   at Microsoft.Data.ProviderBase.DbConnectionPool.TryGetConnection (Microsoft.Data.SqlClient, Version=2.0.20168.4, Culture=neutral, PublicKeyToken=23ec7fc2d6eaa4a5)
   at Microsoft.Data.ProviderBase.DbConnectionFactory.TryGetConnection (Microsoft.Data.SqlClient, Version=2.0.20168.4, Culture=neutral, PublicKeyToken=23ec7fc2d6eaa4a5)
   at Microsoft.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal (Microsoft.Data.SqlClient, Version=2.0.20168.4, Culture=neutral, PublicKeyToken=23ec7fc2d6eaa4a5)
   at Microsoft.Data.SqlClient.SqlConnection.TryOpen (Microsoft.Data.SqlClient, Version=2.0.20168.4, Culture=neutral, PublicKeyToken=23ec7fc2d6eaa4a5)
   at Microsoft.Data.SqlClient.SqlConnection.Open (Microsoft.Data.SqlClient, Version=2.0.20168.4, Culture=neutral, PublicKeyToken=23ec7fc2d6eaa4a5)

To reproduce

Unfortunately I don't have a repro. It seems to happen sporadically and is not related to heavy load.

Expected behavior

SqlDataReader only returns data corresponding to the given SqlCommand.

Further technical details

Microsoft.Data.SqlClient version: 2.1.2
.NET target: .NET 5.0
SQL Server version: SQL Server 2019
Operating system: Both Windows and Linux (Docker container mcr.microsoft.com/dotnet/aspnet:5.0)

Additional context
Both async and sync operations are affected. Some of our queries are wrapped in a TransactionScope.
To mitigate the issue, I exposed the Abort method of the SqlConnection using reflection, wrapped everything in a try/catch block and doom the connection manually after InvalidOperation or InvalidCast exceptions. By doing so subsequent queries are less prone to be affected.

Thank you in advance for any insights!

@cheenamalhotra
Copy link
Member

Hi @dawust

Could you share more details with code snippets about how do you open and end transactions?
Please try to include all possible cases that your application implements.

The error "Internal .NET Framework Data Provider error 60." refers to "UnknownTransactionFailure" which is triggered when driver cannot initialize delegated transaction here:

// Handle case where ExecuteTran didn't produce a new transaction, but also didn't throw.
if (null == connection.CurrentTransaction)
{
connection.DoomThisConnection();
throw ADP.InternalError(ADP.InternalErrorCode.UnknownTransactionFailure);
}

But as you said cast issues happen before that error, so something has gone bad already on your connection/transaction before this is raised.

To figure that out, is it possible to collect below information for same failure?

  1. SQL Server Profiler traces with custom information enabled, e.g. Transaction events and Transaction IDs (also on Statement events):

image

  1. Wireshark traces from any Windows client for "TDS" protocol only.

@dawust
Copy link
Author

dawust commented Mar 10, 2021

Thanks for your reply.

We've got our own in-house developed ORM, thus luckily all of our transactions follow the same pattern.
We wrapped the TransactionScope in a static utils class and we properly dispose of the TransactionScope utilizing the using statement.

    public static class TransactionScopeUtils
    {
        public static TransactionScope BeginScope(bool requireSeparateTransaction = false)
        {
            return new TransactionScope(
                requireSeparateTransaction ? TransactionScopeOption.RequiresNew : TransactionScopeOption.Required,
                new TransactionOptions { IsolationLevel = IsolationLevel.ReadCommitted },
                TransactionScopeAsyncFlowOption.Enabled);
        }
    }

/// Sample read usage
  using (TransactionScopeUtils.BeginScope(requireSeparateTransaction: true))
  {
	// Read data following this pattern
	using (var sqlConnection = new SqlConnection(connectionString))
	{
		using (var sqlCommand = new SqlCommand(sqlText, sqlConnection))
		{
			sqlConnection.Open();

			using (var reader = sqlCommand.ExecuteReader(CommandBehavior.SequentialAccess))
			{
				while (reader.Read())
				{
					var column1 = reader.GetInt32(0);
					var column2 = reader.GetDateTime(1);
					var column3 = reader.GetString(2);
					...
				}
			}
		}
	}	
  }

All of our write operations make use of a TransactionScope according to the following pattern:

using (var transactionScope = new TransactionScope(
	scopeOption: TransactionScopeOption.Required,
	transactionOptions: new TransactionOptions { Timeout = transactionTimeout, IsolationLevel = System.Transactions.IsolationLevel.ReadCommitted, }, 
	asyncFlowOption: TransactionScopeAsyncFlowOption.Enabled))
{
        var retryCount = 3;
	while (retryCount > 0)
	{
		try
		{
			using (var sqlConnection = new SqlConnection(connectionString))
			{
				// Do all our writing which involves reading multiple tables, writing multiple tables (sometimes using SqlBulkCopy)
			}
		}
		catch (SqlException exception) when (sqlException.Number == 1205 && retryCount > 1)
		{
			// a sql exception that is a deadlock
			Thread.Sleep(new Random().Next(100));
			retryCount--;
		}
	}
	
	transactionScope.Complete();
}

I hope the code fragments aren't simplified too much.
I'll get in touch with one of our DB admins to get you same traces as soon as possible.

I'm really grateful for your support.

@dawust
Copy link
Author

dawust commented Mar 12, 2021

Hi @cheenamalhotra

I managed to create a repro for both Exceptions:
https://gist.github.com/dawust/5cc15be6534675f6f4f0f8ab821a237f

Hope this helps!

@cheenamalhotra
Copy link
Member

Thanks for the repro @dawust !
I'll investigate into it and get back to you.

@cheenamalhotra cheenamalhotra added the 🐛 Bug! Issues that are bugs in the drivers we maintain. label Mar 12, 2021
@cheenamalhotra
Copy link
Member

Adding some notes for readers:

  • This issue does not affect .NET Framework drivers, both System.Data.SqlClient and Microsoft.Data.SqlClient are unaffected.
  • Issue exists and is reproducible for all .NET Core/Standard drivers starting from System.Data.SqlClient v4.5.0-preview1.
  • Managed SNI or MARS are not related here. Issue is reproducible with Native SNI as well.

More investigations to find root cause are in progress.

@gazdzinskak
Copy link

Hi @cheenamalhotra,
unfortunately we'd run into the issue described in #659 in production while using Microsoft.Data.SqlClient v2.0.1. We've upgraded to v.2.1.2, but as I understand from this issue, this may not eliminate the problem? We're working on being able to reproduce the issue but have not yet succeeded.
We also have quite a few applications in .NET Core with System.Data.SqlClient v.4.5+. Do you have any recommendation for such cases?
I'd be much obliged for your response.

@cheenamalhotra
Copy link
Member

@gazdzinskak

Yes this issue is on a different path and not related to #659.
We're investigating root cause and possible mitigations, will let you know soon.

cc @DavoudEshtehari

@DavoudEshtehari DavoudEshtehari linked a pull request Apr 6, 2021 that will close this issue
@DavoudEshtehari
Copy link
Contributor

@dawust Could you verify if the prepared fix #1023 works for you, please?

@dawust
Copy link
Author

dawust commented Apr 7, 2021

@DavoudEshtehari Thanks for the fix! I have rolled out the fix in one of our secondary systems. On this particular system we observe the issue once every two or three days. I'll give you an update on the situation by next week.

@dawust
Copy link
Author

dawust commented Apr 13, 2021

It seems that the fix did the trick, no more exceptions :)
@cheenamalhotra @DavoudEshtehari Thank you for your support!
Unfortunately I had to override some of our policies to get a non-stable nuget in our production environment, so I wonder if there is any chance to get a stable 2.x nuget with this fix?

@DavoudEshtehari
Copy link
Contributor

DavoudEshtehari commented Apr 13, 2021

Hi @dawust, thank you for the quick check. This change will apply as a hot fix to the stable versions as well as 2.x.
cc @cheenamalhotra, @David-Engel

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🐛 Bug! Issues that are bugs in the drivers we maintain.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants