Errors after upgrading to 5.2.0 from 5.1.5 on Linux #2378

Closed
alex-jitbit opened this issue Mar 1, 2024 · 79 comments · Fixed by #2777

@alex-jitbit

alex-jitbit commented Mar 1, 2024

After upgrading 5.1.5 to 5.2.0 on Linux (Ubuntu) .NET 8 I'm getting thousands of errors:

Exception message:

Microsoft.Data.SqlClient.SqlException (0x80131904): A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: TCP Provider, error: 35 - An internal exception was caught)
 ---> System.TimeoutException: The socket couldn't connect during the expected 14965 remaining time.

Stack trace:

Microsoft.Data.SqlClient.SqlException (0x80131904): A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: TCP Provider, error: 35 - An internal exception was caught)
 ---> System.TimeoutException: The socket couldn't connect during the expected 14965 remaining time.
   at Microsoft.Data.SqlClient.SNI.SNITCPHandle.Connect(String serverName, Int32 port, TimeoutTimer timeout, SqlConnectionIPAddressPreference ipPreference, String cachedFQDN, SQLDNSInfo& pendingDNSInfo)
   at Microsoft.Data.SqlClient.SNI.SNITCPHandle..ctor(String serverName, Int32 port, TimeoutTimer timeout, Boolean parallel, SqlConnectionIPAddressPreference ipPreference, String cachedFQDN, SQLDNSInfo& pendingDNSInfo, Boolean tlsFirst, String hostNameInCertificate, String serverCertificateFilename)
   at Microsoft.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, UInt32 waitForMultipleObjectsTimeout, Boolean allowCreate, Boolean onlyOneCheckConnection, DbConnectionOptions userOptions, DbConnectionInternal& connection)
   at Microsoft.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal& connection)
   at Microsoft.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, DbConnectionInternal& connection)
   at Microsoft.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions)
   at Microsoft.Data.ProviderBase.DbConnectionClosed.TryOpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions)
   at Microsoft.Data.SqlClient.SqlConnection.TryOpen(TaskCompletionSource`1 retry, SqlConnectionOverrides overrides)
   at Microsoft.Data.SqlClient.SqlConnection.Open(SqlConnectionOverrides overrides)
   at Microsoft.Data.SqlClient.SqlConnection.Open()

Some requests work fine, but about 50% throw this error. Reverting back to 5.1.5 solves the problem.

Further technical details

Microsoft.Data.SqlClient version: 5.2
.NET target: NET 8
SQL Server version: SQL 2017 on Linux
Operating system: Ubuntu 22

@JRahnama
Member

JRahnama commented Mar 1, 2024

@alex-jitbit is this happening on a regular connection, i.e. with no AAD involved? Can you provide a sample repro, please?

@JRahnama
Member

JRahnama commented Mar 1, 2024

Linux uses managed SNI, and I think the improvements were done mostly on the native side, which is Windows-only. Which change did you mean?

@alex-jitbit
Author

alex-jitbit commented Mar 1, 2024

No, no AAD, my connection string uses explicit username/password combo

Data Source=172.0.0.123,1433;Initial Catalog=database;user id=user;pwd=PaSsWoRd;Max Pool Size=250;Encrypt=false

A simple repro would be:

var cn = new SqlConnection(connectionString);
cn.Open();

Compile on .NET 8, run on Ubuntu 22.04 (AWS) connecting to external SQL Server (also Ubuntu 22 on AWS).

Reverting to 5.1.5 fixed the problem immediately.

P.S. I can't repro on WSL Ubuntu connecting to a Windows-hosted MS SQL Server. I assume the issue is with connecting to a Linux-hosted SQL Server, OR it happens under heavy load only.

@JRahnama
Member

JRahnama commented Mar 1, 2024

@alex-jitbit I will test it today and will update you after.

@guiestimoneon

I had the same problem; I downgraded to 5.1.5 and it worked again.

SQL 2019 - Windows Server - .NET 8

@JRahnama
Member

JRahnama commented Mar 1, 2024

I was not able to repro the issue on Ubuntu 22.04 as a local server, but I will test it with a remote server. If there is any issue it should be related to #1029

Update: I tested with an Azure SQL server in East US (adding more latency), but was not able to repro the issue.

@JRahnama
Member

JRahnama commented Mar 1, 2024

here is my test setup:

csproj:

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net8.0</TargetFramework>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
  </PropertyGroup>
  <ItemGroup>
    <PackageReference Include="Microsoft.Data.SqlClient" Version="5.2.0" />
  </ItemGroup>
</Project>

Program.cs

using Microsoft.Data.SqlClient;

SqlConnectionStringBuilder builder = new(){
    DataSource ="*******.database.windows.net",
    UserID = "******",
    Password = "*****",
    InitialCatalog = "Northwind",
    MaxPoolSize = 250
};
using SqlConnection conn = new(builder.ConnectionString);
conn.Open();
Console.WriteLine(conn.State);

I will test with a remote on premises server later today.

@ctrlaltdan

Also seeing the same. Downgrading to 5.2.0-preview5.24024.3 resolved the issue for me.

Additional debug info if it's helpful.

Framework: .NET 8.0.2 
Runtime: linux-musl-x64
Image: Alpine Linux v3.19
Using: Microsoft.EntityFrameworkCore.SqlServer:8.0.2
Connected using an Azure SQL Failover group, via Entity Framework. 

JRahnama removed the untriaged label Mar 5, 2024
@David-Engel
Contributor

If you are on Linux/macOS and specify both port and instance name in the connection string (like server,12345\instance), that might be the source issue. There appears to have been a regression in 5.2.0 on non-Windows where it isn't ignoring the instance name when both it and the port are specified.
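
To check whether a connection string falls into the case described above, a small helper along the following lines could normalize a Data Source that carries both a port and an instance name. This is only a sketch: `DropInstanceWhenPortGiven` is a hypothetical helper, not part of Microsoft.Data.SqlClient, and its naive parsing assumes just the two layouts `server,port\instance` and `server\instance,port`. Dropping the instance name is an assumption based on the statement that the instance name is meant to be ignored when a port is also given.

```csharp
using System;

// Hypothetical helper, not part of Microsoft.Data.SqlClient.
// Returns "server,port" when the data source carries both a port and an
// instance name; otherwise returns the input unchanged.
static string DropInstanceWhenPortGiven(string dataSource)
{
    int comma = dataSource.IndexOf(',');
    int slash = dataSource.IndexOf('\\');
    if (comma < 0 || slash < 0)
    {
        return dataSource; // only a port, only an instance, or neither
    }

    string server;
    string port;
    if (slash < comma)
    {
        // Layout: server\instance,port
        server = dataSource.Substring(0, slash);
        port = dataSource.Substring(comma + 1);
    }
    else
    {
        // Layout: server,port\instance
        server = dataSource.Substring(0, comma);
        port = dataSource.Substring(comma + 1, slash - comma - 1);
    }

    return $"{server.Trim()},{port.Trim()}";
}

Console.WriteLine(DropInstanceWhenPortGiven(@"server,12345\instance")); // server,12345
Console.WriteLine(DropInstanceWhenPortGiven(@"server\instance,12345")); // server,12345
Console.WriteLine(DropInstanceWhenPortGiven("server"));                 // server
```

If the helper changes the value, the connection string is in the affected shape, and using the returned `server,port` form on 5.2.0 may be worth trying on a non-production system.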

@sturledahl

sturledahl commented Mar 8, 2024

Some observations from me, hoping they help the investigation:

We started getting this problem on Alpine in a test that starts multiple threads connecting to the same database in parallel. Other tests work fine. Our connection strings do not specify an instance or a port. We tried adding some delays inside the different tasks to affect timing, and then we got a different error instead of the one mentioned in this ticket:

System.InvalidOperationException: Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached.
12:25:02  at Microsoft.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, DbConnectionInternal& connection)

Here's how threads are created in the test:
var tasks = schedulers.Select(s => new TaskFactory().StartNew(s.Start)).ToList();
foreach (var t in tasks)
    t.Wait();

Reverting back to v5.1.5 fixed this, so we are not updating to v5.2.0 until we know more.

@alex-jitbit
Author

I can confirm that we too experience this error under heavy load with multiple threads (not sure if this is the culprit).
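
The condition described in the last two comments (many threads opening connections at once) can be approximated outside production with a small concurrency harness. The sketch below is an illustration only: `RunConcurrently` and `fakeOpen` are hypothetical names, and the stand-in delegate simulates failures so that the block runs without a database. Whether a real run reproduces the error depends on load, latency, and thread pool size, none of which this sketch controls.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical harness: calls `openOnce` from many thread-pool tasks at once
// and counts how many calls succeeded and how many threw.
static (int Succeeded, int Failed) RunConcurrently(Action openOnce, int workers, int callsPerWorker)
{
    int succeeded = 0;
    int failed = 0;

    Task[] tasks = Enumerable.Range(0, workers)
        .Select(_ => Task.Run(() =>
        {
            for (int i = 0; i < callsPerWorker; i++)
            {
                try
                {
                    openOnce();
                    Interlocked.Increment(ref succeeded);
                }
                catch (Exception)
                {
                    Interlocked.Increment(ref failed);
                }
            }
        }))
        .ToArray();

    Task.WaitAll(tasks);
    return (succeeded, failed);
}

// Stand-in for "open a connection": throws on every fourth call.
int calls = 0;
Action fakeOpen = () =>
{
    if (Interlocked.Increment(ref calls) % 4 == 0)
    {
        throw new TimeoutException("simulated connect timeout");
    }
};

var (ok, bad) = RunConcurrently(fakeOpen, workers: 8, callsPerWorker: 25);
Console.WriteLine($"succeeded={ok}, failed={bad}"); // succeeded=150, failed=50
```

For a real run against a test server, `fakeOpen` would be replaced by a delegate that creates a `SqlConnection` from the connection string, calls `Open()`, and disposes it, as in the two-line repro earlier in the thread.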

@JRahnama
Member

JRahnama commented Mar 8, 2024

Can you guys test with this package and see if the issue is resolved? Just change the extension to nupkg and it should be good for testing.
Microsoft.Data.SqlClient.6.0.0-pull.106802.zip

@sturledahl

@JRahnama Any chance you could add it to nuget.org so our build/test system can find it?

@JRahnama
Member

JRahnama commented Mar 8, 2024

@sturledahl this package is not officially signed and is not suitable for production use. I just wanted to confirm that the fix has resolved the issue for users before proceeding with a hotfix release.

@PaulVrugt

any update on this?

@JRahnama
Member

> any update on this?

Were you able to test with the sample package?

> Can you guys test with this package and see if the issue is resolved? Just change the extension to nupkg and it should be good for testing. Microsoft.Data.SqlClient.6.0.0-pull.106802.zip

@PaulVrugt

@JRahnama well no. We haven't updated to 5.2 yet because of this issue. We are currently using version 5.1.5 and running into #449, but it is only happening in our production environment (with a lot of traffic) and even there only once every few weeks. We have no controlled environment to test this. Maybe @alex-jitbit has a way to reproduce it and see if the 6.0.0 version resolves it

@mosesnnewman

Same issue on Windows!

@alex-jitbit
Author

> Maybe @alex-jitbit has a way to reproduce it and see if the 6.0.0 version resolves it

Unfortunately this bug is reproducible in production only (under high load), and frankly I'm too afraid to try beta fixes on my prod.

@JRahnama
Member

JRahnama commented Apr 9, 2024

@alex-jitbit is it possible to test with the 5.2.0-preview2 and 5.2.0-preview5 versions to identify which change caused the issue?

@ABAG603

ABAG603 commented Apr 23, 2024

We have the same issue as the people above after upgrading to 5.2.0. The issue is reproducible from either Alpine containers or a VM with Amazon Linux 2023. SQL Server is running on Windows Server, and the connection string contains a named instance and a port.

@JRahnama I've tested with different versions and here's the outcome:

  • 5.2.0-preview2.23159.1 - connection can be successfully opened
  • 5.2.0-preview5.24024.3 - connection can be successfully opened
  • 5.2.0 - opening connection fails with error below
Microsoft.Data.SqlClient.SqlException: A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct
and that SQL Server is configured to allow remote connections. (provider: TCP Provider, error: 26 - Error Locating Server/Instance Specified)
     System.Net.Sockets.SocketException: Success
       at int Microsoft.Data.SqlClient.SNI.SSRP.GetPortByInstanceName(string browserHostName, string instanceName, TimeoutTimer timeout, bool allIPsInParallel, SqlConnectionIPAddressPreference ipPreference)
       at SNITCPHandle Microsoft.Data.SqlClient.SNI.SNIProxy.CreateTcpHandle(DataSource details, TimeoutTimer timeout, bool parallel, SqlConnectionIPAddressPreference ipPreference, string cachedFQDN, ref SQLDNSInfo pendingDNSInfo,
          bool tlsFirst, string hostNameInCertificate, string serverCertificateFilename)
  at void Microsoft.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, bool breakConnection, Action<Action> wrapCloseInAction)
  at void Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, SqlCommand command, bool callerHasConnectionLock, bool asyncClose)
  at void Microsoft.Data.SqlClient.TdsParser.Connect(ServerInfo serverInfo, SqlInternalConnectionTds connHandler, TimeoutTimer timeout, SqlConnectionString connectionOptions, bool withFailover)
  at void Microsoft.Data.SqlClient.SqlInternalConnectionTds.AttemptOneLogin(ServerInfo serverInfo, string newPassword, SecureString newSecurePassword, TimeoutTimer timeout, bool withFailover)
  at void Microsoft.Data.SqlClient.SqlInternalConnectionTds.LoginNoFailover(ServerInfo serverInfo, string newPassword, SecureString newSecurePassword, bool redirectedUserInstance, SqlConnectionString connectionOptions,
     SqlCredential credential, TimeoutTimer timeout)
  at void Microsoft.Data.SqlClient.SqlInternalConnectionTds.OpenLoginEnlist(TimeoutTimer timeout, SqlConnectionString connectionOptions, SqlCredential credential, string newPassword, SecureString newSecurePassword, bool
     redirectedUserInstance)
  at Microsoft.Data.SqlClient.SqlInternalConnectionTds..ctor(DbConnectionPoolIdentity identity, SqlConnectionString connectionOptions, SqlCredential credential, object providerInfo, string newPassword, SecureString
     newSecurePassword, bool redirectedUserInstance, SqlConnectionString userConnectionOptions, SessionData reconnectSessionData, bool applyTransientFaultHandling, string accessToken, DbConnectionPool pool,
     Func<SqlAuthenticationParameters, CancellationToken, Task<SqlAuthenticationToken>> accessTokenCallback)
  at DbConnectionInternal Microsoft.Data.SqlClient.SqlConnectionFactory.CreateConnection(DbConnectionOptions options, DbConnectionPoolKey poolKey, object poolGroupProviderInfo, DbConnectionPool pool, DbConnection owningConnection,
     DbConnectionOptions userOptions)
  at DbConnectionInternal Microsoft.Data.ProviderBase.DbConnectionFactory.CreatePooledConnection(DbConnectionPool pool, DbConnection owningObject, DbConnectionOptions options, DbConnectionPoolKey poolKey, DbConnectionOptions
     userOptions)
  at DbConnectionInternal Microsoft.Data.ProviderBase.DbConnectionPool.CreateObject(DbConnection owningObject, DbConnectionOptions userOptions, DbConnectionInternal oldConnection)
  at DbConnectionInternal Microsoft.Data.ProviderBase.DbConnectionPool.UserCreateRequest(DbConnection owningObject, DbConnectionOptions userOptions, DbConnectionInternal oldConnection)
  at bool Microsoft.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, uint waitForMultipleObjectsTimeout, bool allowCreate, bool onlyOneCheckConnection, DbConnectionOptions userOptions, out
     DbConnectionInternal connection)
  at bool Microsoft.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, TaskCompletionSource<DbConnectionInternal> retry, DbConnectionOptions userOptions, out DbConnectionInternal connection)
  at bool Microsoft.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource<DbConnectionInternal> retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, out
     DbConnectionInternal connection)
  at bool Microsoft.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource<DbConnectionInternal> retry, DbConnectionOptions
     userOptions)
  at bool Microsoft.Data.ProviderBase.DbConnectionClosed.TryOpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource<DbConnectionInternal> retry, DbConnectionOptions userOptions)
  at bool Microsoft.Data.SqlClient.SqlConnection.TryOpen(TaskCompletionSource<DbConnectionInternal> retry, SqlConnectionOverrides overrides)
  at void Microsoft.Data.SqlClient.SqlConnection.Open(SqlConnectionOverrides overrides)
  at void Microsoft.Data.SqlClient.SqlConnection.Open()
  at void Microsoft.EntityFrameworkCore.SqlServer.Storage.Internal.SqlServerConnection.OpenDbConnection(bool errorsExpected)
  at void Microsoft.EntityFrameworkCore.Storage.RelationalConnection.OpenInternal(bool errorsExpected)
  at bool Microsoft.EntityFrameworkCore.Storage.RelationalConnection.Open(bool errorsExpected)
  at bool Microsoft.EntityFrameworkCore.RelationalDatabaseFacadeExtensions.<>c.<OpenConnection>b__22_0(DatabaseFacade database)
  at TResult Microsoft.EntityFrameworkCore.ExecutionStrategyExtensions.<>c__DisplayClass12_0`2.<Execute>b__0(DbContext _, TState s)
  at TResult Microsoft.EntityFrameworkCore.SqlServer.Storage.Internal.SqlServerExecutionStrategy.Execute<TState,TResult>(TState state, Func<DbContext, TState, TResult> operation, Func<DbContext, TState, ExecutionResult<TResult>>
     verifySucceeded)
  at TResult Microsoft.EntityFrameworkCore.ExecutionStrategyExtensions.Execute<TState,TResult>(IExecutionStrategy strategy, TState state, Func<TState, TResult> operation, Func<TState, ExecutionResult<TResult>> verifySucceeded)
  at void Microsoft.EntityFrameworkCore.RelationalDatabaseFacadeExtensions.OpenConnection(DatabaseFacade databaseFacade)
  at int TestConnectionCommand.Execute(CommandContext context)

@JRahnama
Member

@ABAG603 The fix is merged in the main branch by PR #2395. Hotfix release v5.2.1 is planned, but the date is still TBD.

Closing the issue as the fix is merged and will be available in the next hotfix release.

@JRahnama
Member

@PaulVrugt Unfortunately, my testing didn't yield positive results. I've explored various solutions within the current design, but none were effective and each had its own drawback.

In the meantime, here are some workarounds I've found:

  1. Increase ConnectTimeout: This can help by providing more time for socket connection.
  2. Set Minimum Available Threads: Adjusting the minimum available threads in your application might help, although it could impact performance.
  3. Disable Pooling: Turning off pooling creates a new connection each time, which is not ideal as it’s quite resource-intensive.

I understand that the issue is frustrating, but I want to assure you that we're doing our best to resolve it as quickly as possible. Our team is actively working on redesigning the pool and concurrent connections, and it's progressing well.
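
The three workarounds listed above could be applied roughly as in the sketch below. It is illustrative only: the server name, the 60-second timeout, and the value 64 are assumptions picked for the example, not recommendations from this thread. The generic `DbConnectionStringBuilder` is used so that the block runs without the SqlClient package; `Connect Timeout` and `Pooling` are the connection string keywords that SqlClient reads.

```csharp
using System;
using System.Data.Common;
using System.Threading;

// Workaround 1: allow more time for the socket connect (60 is illustrative).
// Workaround 3 would be ["Pooling"] = false; it is left commented out here
// because it makes every Open() create a new physical connection.
var builder = new DbConnectionStringBuilder
{
    ["Data Source"] = "sql.example.internal,1433", // hypothetical server
    ["Initial Catalog"] = "database",
    ["Connect Timeout"] = 60,
    // ["Pooling"] = false,
};

// Workaround 2: raise the thread pool's minimum worker threads at startup.
ThreadPool.GetMinThreads(out int minWorker, out int minIo);
int desiredWorker = Math.Max(minWorker, 64); // 64 is an arbitrary example value
bool applied = ThreadPool.SetMinThreads(desiredWorker, minIo);

Console.WriteLine(builder.ConnectionString);
Console.WriteLine($"min worker threads set to {desiredWorker}: {applied}");
```

`ThreadPool.SetMinThreads` returns `false` when the requested values are rejected (for example, when they exceed the configured maximum), so the return value is worth logging.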

@BlythMeister

BlythMeister commented Jul 25, 2024

> @PaulVrugt Unfortunately, my testing didn't yield positive results. I've explored various solutions within the current design, but none were effective and each had its own drawback.
>
> In the meantime, here are some workarounds I've found:
>
>   1. Increase ConnectTimeout: This can help by providing more time for socket connection.
>   2. Set Minimum Available Threads: Adjusting the minimum available threads in your application might help, although it could impact performance.
>   3. Disable Pooling: Turning off pooling creates a new connection each time, which is not ideal as it’s quite resource-intensive.
>
> I understand that the issue is frustrating, but I want to assure you that we're doing our best to resolve it as quickly as possible. Our team is actively working on redesigning the pool and concurrent connections, and it's progressing well.

@JRahnama Whilst I fully appreciate that you're working on a proper fix for this issue, is there really no scope to release a hotfix to 5.1.5 with the dependencies updated as per my comment - #2378 (comment)?

@ErikEJ
Contributor

ErikEJ commented Jul 25, 2024

@BlythMeister Yes, 5.1.6 is being worked on

@cbi-at-varian

The provided workarounds aren't mitigating the issue completely in our case. The only "solution" is actually to downgrade to 5.1.5

We're load testing our service with k6 using the ramping-vus executor. We ramp up to 1000 VUs and let the test run for 3 minutes. For one of our scenarios we tested different connection timeouts with version 5.2.1, as suggested in workaround 1:

| Setup details | Average request time | Minimum request time | Median request time | Max request time | Failed calls out of max |
| --- | --- | --- | --- | --- | --- |
| SqlClient.5.2.1; default Connection Timeout (15) | 356.52ms | 1.18ms | 346.64ms | 15.51s | 1.41% (1605/112097) |
| SqlClient.5.2.1; Connection Timeout=30 | 361.04ms | 1.21ms | 344.56ms | 30.66s | 1.17% (1418/119465) |
| SqlClient.5.2.1; Connection Timeout=60 | 358.13ms | 1.19ms | 347.49ms | 1m0s | 0.27% (316/116617) |
| SqlClient.5.1.5; default Connection Timeout (15) | 384.03ms | 1.08ms | 367.17ms | 1.45s | 0% (0/119816) |

The higher we set the timeout, the better it went. At 180s we got zero errors, but we started to get client timeouts, which is not acceptable either.

Workaround 2 (increasing the thread pool minimum size) did not give any better results. Honestly, we didn't try workaround 3, as it sounds just too scary.

So a proper fix would be really appreciated. Is there an ETA for the fix in the meantime?

@ahouben

ahouben commented Aug 14, 2024

> @BlythMeister Yes, 5.1.6 is being worked on

@ErikEJ @JRahnama would you know more about an ETA for when the connection pooling/handling is fixed and released in a 5.1.6 (hotfix) or 5.2.2/3 (on top of HEAD) version?

From https://www.nuget.org/packages/Microsoft.Data.SqlClient...
[image not preserved]

@robs

robs commented Aug 14, 2024

So, have I read this wrong or does the latest stable release (5.2.1) in nuget have this issue?

@ahouben

ahouben commented Aug 14, 2024

@robs The latest version 5.2.1 on nuget still has this issue, it has not been fixed yet.

@robs

robs commented Aug 14, 2024

Oh wow, surely the "stable" flag should be removed then?

We've not updated to 5.2.x yet but we're running Linux clients to Windows MS SQL.

Is everyone just waiting in the hope there's a 5.1.x release that has the updated dependencies?

@JonasJes

JonasJes commented Aug 14, 2024

@cbi-at-varian Great work showing what we are experiencing

@JRahnama
Member

> Is everyone just waiting in the hope there's a 5.1.x release that has the updated dependencies?

5.1.x will be released soon (in a week or so) with updated dependencies.

@Markeli

Markeli commented Aug 14, 2024

And what about 5.2.x? Will it also be released soon?

JRahnama added a commit to JRahnama/SqlClient that referenced this issue Aug 14, 2024
@JRahnama
Member

JRahnama commented Aug 15, 2024

Could someone please test the package below to see if the issue is resolved? Simply change the extension to nupkg. Note that this is an unsigned package and is intended for testing purposes only.
Microsoft.Data.SqlClient.6.0.0-pull.122701.zip

Update: Our test pipelines passed with the changes, and the provided repro is working well with this set of modifications. The fix appears promising. I’ll wait for the impacted users' test results before proceeding with the PR. Also I will look into adding some tests if possible to prevent this from happening in future.
Thank you all for your patience.
Also thanks to @David-Engel for pointing out the root cause and solution.

@JRahnama
Member

JRahnama commented Aug 15, 2024

> And what about 5.2.x? Will it also be released soon?

If the fix works, it will be backported to 5.2 and the hotfix will be released in the second half of August.

@JRahnama
Member

The issue was in the SNITCPHandle.Connect function. Having those async tasks in the middle of the primary sync flow was causing thread pool starvation under load. The proposed solution was to remove them, since Socket.Select() is used later with the timeout, so the timeout is still honored. This was also mentioned here by @roji.
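
As an illustration of the general technique named above (a non-blocking connect followed by `Socket.Select` with a timeout, so that no async work is queued from a synchronous path), here is a minimal sketch. It is not the SqlClient implementation: `CanConnectWithin` is a hypothetical helper, and the loopback listener exists only so that the example runs without a SQL Server.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Sockets;

// Hypothetical helper (not SqlClient code): starts a non-blocking connect and
// waits for the outcome with Socket.Select instead of blocking on a Task.
static bool CanConnectWithin(IPEndPoint endpoint, TimeSpan timeout)
{
    using var socket = new Socket(endpoint.AddressFamily, SocketType.Stream, ProtocolType.Tcp)
    {
        Blocking = false,
    };

    try
    {
        socket.Connect(endpoint);
    }
    catch (SocketException ex) when (
        ex.SocketErrorCode == SocketError.WouldBlock ||
        ex.SocketErrorCode == SocketError.InProgress)
    {
        // Expected for a non-blocking socket: the connect is still in progress.
    }

    var writable = new List<Socket> { socket };
    var errored = new List<Socket> { socket };

    // Socket.Select takes its timeout in microseconds and removes from each
    // list the sockets that are not ready.
    int microseconds = (int)Math.Min(int.MaxValue, timeout.TotalMilliseconds * 1000);
    Socket.Select(null, writable, errored, microseconds);

    // Writable with no error means the connect completed; empty lists mean timeout.
    return writable.Count > 0 && errored.Count == 0 && socket.Connected;
}

// Demo against a local listener; port 0 lets the OS pick a free port.
var listener = new TcpListener(IPAddress.Loopback, 0);
listener.Start();
var endpoint = (IPEndPoint)listener.LocalEndpoint;
bool reachable = CanConnectWithin(endpoint, TimeSpan.FromSeconds(2));
listener.Stop();
Console.WriteLine($"reachable={reachable}");
```

The point of the pattern is that the calling thread waits inside `Socket.Select` for at most the given timeout and does not depend on another thread-pool thread becoming available to complete a task, which is the dependency that leads to starvation under load.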

@cbi-at-varian

> Could someone please test the package below to see if the issue is resolved? Simply change the extension to nupkg. Note that this is an unsigned package and is intended for testing purposes only. Microsoft.Data.SqlClient.6.0.0-pull.122701.zip
>
> Update: Our test pipelines passed with the changes, and the provided repro is working well with this set of modifications. The fix appears promising. I’ll wait for the impacted users' test results before proceeding with the PR. Also I will look into adding some tests if possible to prevent this from happening in future. Thank you all for your patience. Also thanks to @David-Engel for pointing out the root cause and solution.

I integrated the package provided by @JRahnama and ran the load tests. It seems to fix the issue, thanks for that!

Here are the numbers to compare against:

| Setup details | Average request time | Minimum request time | Median request time | Max request time | Failed calls out of max |
| --- | --- | --- | --- | --- | --- |
| SqlClient.5.1.5; default Connection Timeout (15) | 398.72ms | 1.19ms | 376.23ms | 1.49s | 0% (0/226103) |
| SqlClient.6.0.0.pull; default Connection Timeout (15) | 434.98ms | 1.3ms | 401.33ms | 1.51s | 0% (0/207359) |

Please note the numbers do not imply a certain quality of the fix, the load test runs aren't deterministic. Important for me is that we keep the expected thresholds and that there are no failures. It feels the fix is slightly slower, but only a benchmark would reveal that.

@saurabh500
Contributor

@cbi-at-varian what does the average request time column measure? The average time taken to open a connection?

@JRahnama
Member

JRahnama commented Aug 15, 2024

Here are benchmarks after the fix:

BenchmarkDotNet v0.14.0, Windows 11 (10.0.22631.4037/23H2/2023Update/SunValley3)
13th Gen Intel Core i7-1365U, 1 CPU, 12 logical and 10 physical cores
.NET SDK 8.0.400
[Host] : .NET 8.0.8 (8.0.824.36612), X64 RyuJIT AVX2
.NET 8.0 : .NET 8.0.8 (8.0.824.36612), X64 RyuJIT AVX2

Job=.NET 8.0 Runtime=.NET 8.0

| Method | Mean | Error | StdDev | Ratio | RatioSD |
| --- | --- | --- | --- | --- | --- |
| OpenSqlConnectionAsync | 172.3 ms | 3.05 ms | 2.55 ms | 1.00 | 0.02 |

Below are the results from 5.1.5:

BenchmarkDotNet v0.14.0, Windows 11 (10.0.22631.4037/23H2/2023Update/SunValley3)
13th Gen Intel Core i7-1365U, 1 CPU, 12 logical and 10 physical cores
.NET SDK 8.0.400
[Host] : .NET 8.0.8 (8.0.824.36612), X64 RyuJIT AVX2
.NET 8.0 : .NET 8.0.8 (8.0.824.36612), X64 RyuJIT AVX2

Job=.NET 8.0 Runtime=.NET 8.0

| Method | Mean | Error | StdDev | Ratio | RatioSD |
| --- | --- | --- | --- | --- | --- |
| OpenSqlConnectionAsync | 177.0 ms | 3.52 ms | 4.58 ms | 1.00 | 0.04 |

My sample app was the repro provided from this issue with ConnectTimeout set to 10 seconds.

@JRahnama
Member

There are a couple of PRs to be backported to 5.2, and then we will proceed with the release of hotfix 5.2.2.

@cbi-at-varian

> @cbi-at-varian what does the average request time column measure? The average time taken to open a connection?

@saurabh500 the measurements are on the client side of the load tests. Those numbers therefore include the web stack, some small business logic including accessing the database.

@patrickg-hchb

> > Could someone please test the package below to see if the issue is resolved? Simply change the extension to nupkg. Note that this is an unsigned package and is intended for testing purposes only. Microsoft.Data.SqlClient.6.0.0-pull.122701.zip
> > Update: Our test pipelines passed with the changes, and the provided repro is working well with this set of modifications. The fix appears promising. I’ll wait for the impacted users' test results before proceeding with the PR. Also I will look into adding some tests if possible to prevent this from happening in future. Thank you all for your patience. Also thanks to @David-Engel for pointing out the root cause and solution.
>
> I integrated the package provided by @JRahnama and ran the load tests. It seems to fix the issue, thanks for that!
>
> Here are the numbers to compare against:
>
> | Setup details | Average request time | Minimum request time | Median request time | Max request time | Failed calls out of max |
> | --- | --- | --- | --- | --- | --- |
> | SqlClient.5.1.5; default Connection Timeout (15) | 398.72ms | 1.19ms | 376.23ms | 1.49s | 0% (0/226103) |
> | SqlClient.6.0.0.pull; default Connection Timeout (15) | 434.98ms | 1.3ms | 401.33ms | 1.51s | 0% (0/207359) |
>
> Please note the numbers do not imply a certain quality of the fix, the load test runs aren't deterministic. Important for me is that we keep the expected thresholds and that there are no failures. It feels the fix is slightly slower, but only a benchmark would reveal that.

We are still seeing the same issue on our side with this version. We have an open MS ticket, 2406250040008194. Is there someone here from MS who can help with this issue?

@bdschaap

> Could someone please test the package below to see if the issue is resolved? Simply change the extension to nupkg. Note that this is an unsigned package and is intended for testing purposes only. Microsoft.Data.SqlClient.6.0.0-pull.122701.zip
>
> Update: Our test pipelines passed with the changes, and the provided repro is working well with this set of modifications. The fix appears promising. I’ll wait for the impacted users' test results before proceeding with the PR. Also I will look into adding some tests if possible to prevent this from happening in future. Thank you all for your patience. Also thanks to @David-Engel for pointing out the root cause and solution.

Can a preview nuget be pushed?

@JRahnama
Member

JRahnama commented Aug 26, 2024

> Can a preview nuget be pushed?

M.D.SqlClient v6.0.0-preview1 and the v5.2.2 hotfix release are set to be released within the next day or two.

@JRahnama
Member

Hotfix v5.2.2 is released.
