Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Poorer performance using BulkCopy.WriteToServerAsync vs WriteToServer #716

Open
ghost opened this issue Aug 31, 2020 · 10 comments
Open

Poorer performance using BulkCopy.WriteToServerAsync vs WriteToServer #716

ghost opened this issue Aug 31, 2020 · 10 comments
Labels
📈 Performance Issues that are targeted to performance improvements.

Comments

@ghost
Copy link

ghost commented Aug 31, 2020

Describe the bug

I am using bulk copy to copy 100 million rows from one database to another on the same server. I am seeing terribly poor runtimes with the asynchronous version of WriteToServerAsync compared to the synchronous WriteToServer methods

To reproduce

-- SQL used to generate sample data

DECLARE @RowAmount AS INT = 100000000;
WITH InfiniteRows (RowNumber) AS (
   -- Anchor member definition
   SELECT 1 AS RowNumber
   UNION ALL
   -- Recursive member definition
   SELECT a.RowNumber + 1    AS RowNumber
   FROM   InfiniteRows a
   WHERE  a.RowNumber < @RowAmount
)
-- Statement that executes the CTE
SELECT CONVERT(VARCHAR(255), NEWID()) AS VAL
INTO temp
FROM   InfiniteRows
OPTION (MAXRECURSION 0);
GO
async Task Main()
{
	using (var sqlConnection = new Microsoft.Data.SqlClient.SqlConnection(@"Data Source=.\SQLEXPRESS;Initial Catalog=Scratch;Integrated Security=True"))
	{
		await sqlConnection.OpenAsync();

		using (var command = new Microsoft.Data.SqlClient.SqlCommand("SELECT VAL FROM dbo.temp", sqlConnection))
		{
			using (var bulk = new Microsoft.Data.SqlClient.SqlBulkCopy(@"Data Source=.\SQLEXPRESS;Initial Catalog=Scratch;Trusted_Connection=True;Integrated Security=True;MultipleActiveResultSets=True", Microsoft.Data.SqlClient.SqlBulkCopyOptions.KeepNulls | Microsoft.Data.SqlClient.SqlBulkCopyOptions.TableLock))
			{
				bulk.SqlRowsCopied += (o, e) =>
					{
						Console.WriteLine(e.RowsCopied);
					};

				bulk.ColumnMappings.Add(new Microsoft.Data.SqlClient.SqlBulkCopyColumnMapping("VAL", "VAL"));
				bulk.BulkCopyTimeout = 0;
				bulk.BatchSize = 10000;
				bulk.EnableStreaming = true;
				bulk.DestinationTableName = "temp1";
				bulk.NotifyAfter = 500000;

				// In an empty table, finishes in 2 Mins 17 Seconds
				bulk.WriteToServer(command.ExecuteReader());
				
				// an empty table, finishes in ~9 minutes
				await bulk.WriteToServerAsync(await command.ExecuteReaderAsync());
			}
		}
	}
}

Expected behavior

I would have expected the Async calls to be atleast comparable to the synchronous calls.

Further technical details

Microsoft.Data.SqlClient version: 2.0.1
.NET target: ,NET Framework 4.7.2
SQL Server version: SQL Server Version 2016 Express
Operating system: Windows 10

Additional context
Add any other context about the problem here.

@JRahnama
Copy link
Contributor

@RMalhotra85 Thank you for bringing this up to our attention. I will look into the issue today and get back to you soon.

@JRahnama
Copy link
Contributor

JRahnama commented Sep 1, 2020

I have checked the issue and was able to repro what you mentioned. We will look into a solution.

@JRahnama JRahnama added the 🐛 Bug! Issues that are bugs in the drivers we maintain. label Sep 1, 2020
@ghost
Copy link
Author

ghost commented Sep 1, 2020

Thanks - fwiw, I see the same behavior in System.Data.SqlClient as well (in case that's helpful)

@Wraith2
Copy link
Contributor

Wraith2 commented Sep 1, 2020

Is this something that will be improved by the changes in #358 ? If not I can take a look at it (I like playing with profilers.) if you don't already have plans @JRahnama

@roji
Copy link
Member

roji commented Sep 1, 2020

Since this seems to be async-related, I'd suspect maybe something related to #593 rather than boxing, but who knows...

@Wraith2
Copy link
Contributor

Wraith2 commented Sep 1, 2020

It could be. Bulk copy is quite separate from the other parts of the api but perhaps they cross over or the same pattern exists separately. I haven't really investigated bulk copy since I have little need for it personally.

@JRahnama
Copy link
Contributor

JRahnama commented Sep 1, 2020

Well, I was looking into BulkCopy code for last couple of weeks regarding another issue with Sparse columns. Basically dotnet core is not good with Asynchronous Pattern, (the ones that function name starts with Begin and End) design and it may throw platform not supported error when trying to call delegates. The base code, for SqlBulkcopy, is calling the same methods with some if/else statement inside it and that needs to be changed to async/await (TAP) specially in dotnet core,

@Wraith2, by all means, give it a try and I really appreciate your help. It is fun working with packets and learning how driver talks to server. Most probably you will need Wireshark to take a look inside the packet you send to the server. Going through MS-TDS documents is the best help we can get. There are explanations about each type of header and identifiers especially with PLP(Partially Length-prefixed) columns.

@roji First I thought the issue comes from OpenAsync call. I did several testing with differrent approaches and it all went back to Bulkcopy. Our final goal should be, especially in dotnet core, to convert all Asynchronous Pattern which comes from earlier versions of dotnet framework, All EventBased asynchronous patterns to Task-Based Async pattern(TAP) and simplify everything to use async/await keywords and that would be a big change.

@roji
Copy link
Member

roji commented Sep 1, 2020

@JRahnama I definitely agree it should be all the old-style async code should be replaced with async/await. Aside from significantly increasing code quality, it's likely to improve perf in a significant way. It's no small undertaking, however...

@Wraith2
Copy link
Contributor

Wraith2 commented Sep 2, 2020

I put it through some profiler runs and no single item sticks out as a problem. There is some wasted memory that can be improved but the top 5 most allocated items are strings chars and task machinery that can't be removed. In cpu terms the cost looks to be async overhead from reading strings which as @roji said is a bit inefficient but since those strings are short it isn't too bad.

The three things to so that spring to mind are:

  • get SqlBulkCopy - generics to avoid boxing #358 by @cmeyertons merged because it changes internals of bulk copy, no direct improvement for this scenario because strings don't box but I'd prefer not to cross refactor PRs
  • address the easy memory and cpu issues from profiles
  • investigate whether GetChars() overloads can be used instead of getting strings

@JRahnama JRahnama added 📈 Performance Issues that are targeted to performance improvements. and removed 🐛 Bug! Issues that are bugs in the drivers we maintain. 📈 Performance Issues that are targeted to performance improvements. labels Sep 3, 2020
@JRahnama
Copy link
Contributor

JRahnama commented Sep 3, 2020

@RMalhotra85 something that can make the code run a bit quicker in async mode is taking the MARS out of your Bulkcopy Connection string that will make it better, but still slower than sync one. I will keep looking in the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
📈 Performance Issues that are targeted to performance improvements.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants