
SqlBulkCopy - generics to avoid boxing #358

Closed
cmeyertons wants to merge 33 commits

Conversation

cmeyertons
Contributor

@cmeyertons cmeyertons commented Dec 18, 2019

This PR is in reference to #353

This is an attempt to reduce garbage created during a SqlBulkCopy operation for applications that are using SqlBulkCopy to stream batches of data out during a larger overall process.

The overall approach was to be more respectful of the IDataReader interface and use its typed methods where possible to eliminate unnecessary boxing. The unboxed value is passed around generically and converted back to its underlying type using a ValueTypeConverter.

I did observe some performance degradation when passing values around by value, which kicked off the idea of passing everything by ref -- this is more in line with how SqlBulkCopy previously behaved, minus the boxing.
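
As a rough illustration of the approach (a minimal sketch with hypothetical helper names, not the actual SqlBulkCopy internals): the read path dispatches to the typed IDataReader getters where the column type is known, and the value then flows downward as a generic T instead of a boxed object.

    using System;
    using System.Data;

    internal static class TypedReadSketch
    {
        // Hypothetical helper: copy one row using typed getters where possible
        // so value types never pass through object on the common paths.
        public static void CopyRow(IDataReader reader)
        {
            for (int i = 0; i < reader.FieldCount; i++)
            {
                Type fieldType = reader.GetFieldType(i);

                if (fieldType == typeof(int))
                {
                    Write(reader.GetInt32(i));      // stays an int end to end, no box
                }
                else if (fieldType == typeof(decimal))
                {
                    Write(reader.GetDecimal(i));    // stays a decimal end to end, no box
                }
                else
                {
                    Write(reader.GetValue(i));      // fallback: boxed object, as before
                }
            }
        }

        // The generic value is handed down to the writer; a converter like the
        // ValueTypeConverter discussed below recovers the concrete type at the end.
        private static void Write<T>(T value)
        {
            // ... serialize value to the TDS stream ...
        }
    }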

While I do see less garbage created, I am seeing some disappointing performance results (abbreviated Benchmark.NET results):

| Method                | Mean     | Gen 0    | Gen 1 | Gen 2 | Allocated  |
|-----------------------|---------:|---------:|------:|------:|-----------:|
| NewBulkCopy_NewReader | 387.4 ms |        - |     - |     - | 12017576 B |
| OldBulkCopy_NewReader | 161.9 ms | 250.0000 |     - |     - | 26412872 B |

There is still a considerable amount of heap allocation occurring, primarily from SqlDecimal -- I have logged the following issue to address it: dotnet/runtime#1034

I think there may be something different about the way my dlls are compiled -- I'm seeing much slower times in places where I did not make any changes, such as TdsParser.WriteTokenLength:

My WriteTokenLength: 144ms
Old WriteTokenLength: 30ms

Any assistance in understanding the perf degradation is welcomed and appreciated!

@dnfclas

dnfclas commented Dec 18, 2019

CLA assistant check
All CLA requirements met.

@cheenamalhotra
Member

There seem to be build errors for the PR. Please take a look at the logs and try to build and run the tests locally before the pipelines trigger builds. You can use the steps from BUILDGUIDE to do the same.

@Wraith2
Contributor

Wraith2 commented Dec 19, 2019

Can you post your test program/query and I'll see if I can help with the WriteTokenLength speed issue?

@cmeyertons
Contributor Author

cmeyertons commented Dec 19, 2019

@Wraith2 I pushed the benchmark branch to my fork; it has a Benchmark project added to it. Currently it's pulling version 1.1 from NuGet to run the benchmark.

@Wraith2
Contributor

Wraith2 commented Dec 25, 2019

I've done some experimenting and have some suggestions. I've only checked all of this on netcore, not netfx, so you'll need to check whether the same behaviour applies on netfx.

ValueTypeConverter can be reduced to:

    public static class ValueTypeConverter
    {
        /// <summary>
        /// Converts the value to the TOut type
        /// </summary>
        /// <typeparam name="TIn"></typeparam>
        /// <typeparam name="TOut"></typeparam>
        public static TOut Convert<TIn,TOut>(TIn input) => (TOut)(object)input;
    }

And then made internal (so it doesn't change the public API surface). You can then add a linked reference to the file in the test harness so you can access it if you need to. Though really, I don't think there need to be specific tests for it.

  1. I suggest changing the type to be non-generic and making the method generic so that the number of instantiations doesn't eat space in the type list for the app. Consider that if you have a list of all the types in the application and then add 40 generic instantiations of ValueTypeConverter, you have to search longer than if you have a single ValueTypeConverter type with 40 method overloads on it.
  2. When you're not using lambda compilation you don't need the 'ref' everywhere, so all the signatures can have it removed, which makes things look a lot more natural. Because you're no longer having to pass references, you don't need to add many intermediate variables -- you can just instantiate directly at the call site.
  3. AOT compilation will "see" these calls and be able to produce the required instantiations because they're all statically declared in the code. This removes any problem with lambda compilation and code emission.

All of this relies on the netcore JIT being able to elide the pattern (T)(object)t down to t. If netfx doesn't do this then your approach of using lambda compilation will probably be the way to implement it on netfx, and that should be OK because we know netfx is desktop only. If this is the case, can you see if it's possible to do it using the same public API surface on ValueTypeConverter, using a local variable to ref inside the call somewhere? That way, if we ever get to the position where we can merge a lot of the netfx and netcore codebases, we only have to conditionally include the correct implementation and it'll all still fit together.
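
For illustration, here is a minimal write-side sketch of the shape being described (hypothetical method and type names, not the PR's actual code): a single non-generic converter type with a generic method, used at a call site where a JIT that elides (TOut)(object)input avoids any box for matching value types.

    internal static class ValueTypeConverter
    {
        // Single non-generic type with a generic method, as suggested above.
        public static TOut Convert<TIn, TOut>(TIn input) => (TOut)(object)input;
    }

    internal static class WriteSketch
    {
        // Hypothetical call site: T arrives from the typed read path and is
        // converted back to the concrete column type; on a JIT that elides
        // the box/unbox pair for matching value types, nothing is allocated.
        public static decimal WriteDecimalColumn<T>(T value)
        {
            // ... in the real code this value would be written to the output buffer ...
            return ValueTypeConverter.Convert<T, decimal>(value);
        }
    }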

@cmeyertons
Contributor Author

I've done some experimenting and have some suggestions. I've only checked all of this on netcore, not netfx, so you'll need to check whether the same behaviour applies on netfx.

ValueTypeConverter can be reduced to:

    public static class ValueTypeConverter
    {
        /// <summary>
        /// Converts the value to the TOut type
        /// </summary>
        /// <typeparam name="TIn"></typeparam>
        /// <typeparam name="TOut"></typeparam>
        public static TOut Convert<TIn,TOut>(TIn input) => (TOut)(object)input;
    }

And then made internal (so it doesn't change the public API surface). You can then add a linked reference to the file in the test harness so you can access it if you need to. Though really, I don't think there need to be specific tests for it.

I will implement your ValueTypeConverter suggestions -- I like that approach more. I will also make it internal.

I was using the unit tests to iterate more quickly on the ValueTypeConverter implementation -- it took some trial and error to get right, especially for the ref implementation. I would say they are still valuable to keep, but I definitely agree that adding the linked reference is much cleaner than making it public.

Side note, I was kind of surprised to see the Test project as a whole didn't have internals of the SqlClient project exposed to it.

  1. I suggest changing the type to be non-generic and making the method generic so that the number of instantiations doesn't eat space in the type list for the app. Consider that if you have a list of all the types in the application and then add 40 generic instantiations of ValueTypeConverter, you have to search longer than if you have a single ValueTypeConverter type with 40 method overloads on it.

I agree -- that is definitely an improvement and I will implement it.

  2. When you're not using lambda compilation you don't need the 'ref' everywhere, so all the signatures can have it removed, which makes things look a lot more natural. Because you're no longer having to pass references, you don't need to add many intermediate variables -- you can just instantiate directly at the call site.

I actually didn't need the ref implementation initially -- it's not a requirement of the lambda compilation. My first iteration omitted ref, so T value would pass by value; ValueTypeConverter just needed some slight tweaks to go to ref. Then I started trying to diagnose the performance issues. I realized the current implementation passes the boxed value around by reference, so I converted to the ref implementation to align with the previous behavior. I was hoping to see better perf using ref, but did not.

Side note: for a decimal type, is the conclusion correct that passing by value is slower than passing by stack reference (16 bytes vs. 8 bytes on x64)?
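
For concreteness, a small sketch of the two calling conventions in question (hypothetical method names; as noted in the reply below, which one wins is JIT- and ABI-dependent):

    internal static class DecimalPassingSketch
    {
        // Copies the full 16-byte decimal on every call.
        public static decimal AddOneByValue(decimal value) => value + 1m;

        // Passes an 8-byte read-only reference on x64 instead of copying the struct.
        public static decimal AddOneByRef(in decimal value) => value + 1m;
    }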

  3. AOT compilation will "see" these calls and be able to produce the required instantiations because they're all statically declared in the code. This removes any problem with lambda compilation and code emission.

All of this relies on the netcore JIT being able to elide the pattern (T)(object)t down to t. If netfx doesn't do this then your approach of using lambda compilation will probably be the way to implement it on netfx, and that should be OK because we know netfx is desktop only. If this is the case, can you see if it's possible to do it using the same public API surface on ValueTypeConverter, using a local variable to ref inside the call somewhere? That way, if we ever get to the position where we can merge a lot of the netfx and netcore codebases, we only have to conditionally include the correct implementation and it'll all still fit together.

I will definitely keep the public API surface aligned between netfx and netcore (I guess that means I have to copy over all of the T value changes to the netfx side?).

@Wraith2
Contributor

Wraith2 commented Dec 30, 2019

Side note: for a decimal type, is the conclusion correct that passing by value is slower than passing by stack reference (16 bytes vs. 8 bytes on x64)?

That's a behaviour which can be changed by the JIT and by the platform. Unix and Windows may have different behaviours due to the ABI of the generated native code. It is also a behaviour that may change in future iterations of the JIT, so it's a level I don't tend to push at too hard. So I don't know 😁.

@cmeyertons
Contributor Author

cmeyertons commented Dec 31, 2019

Expanded ValueMethod to include GetInt32, etc. -- considering switching it to a bit-wise enum to possibly improve perf.
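
A rough sketch of what a bit-wise ValueMethod could look like (member names here are illustrative, not the driver's actual enum):

    using System;

    [Flags]
    internal enum ValueMethod
    {
        None       = 0,
        GetValue   = 1 << 0,
        GetInt32   = 1 << 1,
        GetInt64   = 1 << 2,
        GetDecimal = 1 << 3,
        GetString  = 1 << 4,
        GetBoolean = 1 << 5,
        // With flags, a capability check becomes a single bitwise test:
        // (methods & ValueMethod.GetInt32) != 0
    }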

Still seeing some perf diffs; however, they're much smaller (about 20%) when doing a like-for-like comparison using the benchmark_current branch as the baseline and the benchmark branch containing the generic changes.

| Method    | Mean     | Error    | StdDev   | Rank | Gen 0     | Gen 1 | Gen 2 | Allocated |
|-----------|---------:|---------:|---------:|-----:|----------:|------:|------:|----------:|
| current_1 | 264.8 ms | 15.17 ms | 44.49 ms |    1 | 4000.0000 |     - |     - |  25.19 MB |
| current_2 | 273.2 ms | 21.70 ms | 63.97 ms |    1 | 4000.0000 |     - |     - |  25.19 MB |
| new_1     | 327.1 ms | 20.00 ms | 58.65 ms |    1 | 1000.0000 |     - |     - |  11.46 MB |
| new_2     | 319.9 ms | 20.81 ms | 60.36 ms |    1 | 1000.0000 |     - |     - |  11.46 MB |
| new_3     | 306.6 ms | 20.38 ms | 60.09 ms |    1 | 1000.0000 |     - |     - |  11.46 MB |

@cmeyertons
Contributor Author

@Wraith2 I completed your PR feedback on the netcore side.

Some notes:

I changed the name of ValueTypeConverter to GenericConverter to more accurately describe the behavior. I think the netfx side is still going to require an underlying GenericConverter<TIn, TOut> that leverages the static constructor to cache the compiled lambda expression -- I didn't see a good way of implementing that without some typed dictionary lookup (which would not perform adequately compared to the generic).
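
For reference, a minimal sketch of the kind of netfx fallback being described, assuming a GenericConverter<TIn, TOut> whose static initialization compiles and caches the conversion lambda once per type pair (names and shape are illustrative, not the PR's actual code):

    using System;
    using System.Linq.Expressions;

    internal static class GenericConverter<TIn, TOut>
    {
        // Compiled once per (TIn, TOut) pair during static initialization,
        // so the expression-tree cost is paid only on first use.
        private static readonly Func<TIn, TOut> s_convert = Build();

        private static Func<TIn, TOut> Build()
        {
            ParameterExpression input = Expression.Parameter(typeof(TIn), "input");
            Expression body = typeof(TIn) == typeof(TOut)
                ? (Expression)input                          // identity: no conversion node needed
                : Expression.Convert(input, typeof(TOut));   // otherwise emit a conversion
            return Expression.Lambda<Func<TIn, TOut>>(body, input).Compile();
        }

        public static TOut Convert(TIn input) => s_convert(input);
    }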

I'd like to resolve the perf issues above before porting to netfx, to avoid having to keep the code in sync in multiple places.

Some ideas I have:

  1. passing the decimal / SqlDecimal around by value is hurting perf
  2. less inlining w/ the generics?
  3. ValueMethod could be converted to a bit-wise enum.

I've been profiling with JetBrains dotTrace and haven't had success in determining why it's so much slower -- it's hard to pinpoint when comparing old vs. new because the control flow has changed to let the generic value flow downward from read to write.

@Wraith2
Contributor

Wraith2 commented Jan 2, 2020

Can you export the profiler traces and upload them? I have dotTrace, so I should be able to open them and take a look. You also need to take a look at the test failures and see if there's anything that can be fixed.

@cmeyertons
Contributor Author

Can you export the profiler traces and upload them? I have dotTrace, so I should be able to open them and take a look. You also need to take a look at the test failures and see if there's anything that can be fixed.

@Wraith2 I can't upload to GitHub, so I sent them to the email listed on your profile. I will take a look at the tests to get that cleaned up.

@cmeyertons
Contributor Author

I split the benchmark out into four different tables (decimal, string, int, bool) being copied into (instead of one table with all four columns), and now I'm seeing zero performance degradation:

| Method             | Mean     | Error    | StdDev    | Median   | Rank | Gen 0     | Gen 1 | Gen 2 | Allocated   |
|--------------------|---------:|---------:|----------:|---------:|-----:|----------:|------:|------:|------------:|
| Decimal_100000_old | 86.23 ms | 4.994 ms | 14.004 ms | 83.92 ms |    4 | 3000.0000 |     - |     - | 18760.16 KB |
| Decimal_100000_new | 73.38 ms | 2.352 ms |  6.749 ms | 70.83 ms |    3 | 1875.0000 |     - |     - | 11728.91 KB |
| String_100000_old  | 69.12 ms | 3.608 ms |  9.996 ms | 67.73 ms |    3 |         - |     - |     - |     8.72 KB |
| String_100000_new  | 63.69 ms | 4.466 ms | 12.742 ms | 59.59 ms |    2 |         - |     - |     - |     8.78 KB |
| Int_100000_old     | 54.92 ms | 1.976 ms |  5.702 ms | 53.51 ms |    1 |  333.3333 |     - |     - |  2352.45 KB |
| Int_100000_new     | 50.90 ms | 1.791 ms |  5.081 ms | 48.76 ms |    1 |         - |     - |     - |     8.05 KB |
| Bool_100000_old    | 58.14 ms | 2.311 ms |  6.518 ms | 55.70 ms |    2 |  333.3333 |     - |     - |   2352.5 KB |
| Bool_100000_new    | 59.42 ms | 3.138 ms |  8.747 ms | 56.05 ms |    2 |         - |     - |     - |     8.11 KB |

@Wraith2
Contributor

Wraith2 commented Jan 2, 2020

Weird. Incidentally, a lot of those SqlDecimal allocations will be removed in .NET 5 if dotnet/runtime#1155 gets merged.

Contributor Author

@cmeyertons cmeyertons left a comment

@Wraith2 @cheenamalhotra please see my comments in this review.

@cheenamalhotra does this logic need to be additionally ported into netfx?

@cmeyertons
Contributor Author

There was a problem with my benchmark (some randomness), and after de-randomizing the copied data, the changes in this PR appear to improve BulkCopy's overall performance (as I was hoping).

Member

@cheenamalhotra cheenamalhotra left a comment

General comments:

  1. The GenericConverter classes you added in the NetFx and NetCore projects are not the same.
  2. You did not make changes to the NetFx project -- any specific reason?
  3. Also, I did not see extensive test cases reflecting what we achieved with the PR.

I have yet to go deep down the logic path; I just wanted to highlight some top observations.

@cmeyertons
Contributor Author

cmeyertons commented Sep 14, 2020

Are the failing checks due to something more widespread in the SqlClient ecosystem?

It looks like a lot of CreateDatabase failures in the Enclave jobs, plus two functional test failures related to Always Encrypted.

@ErikEJ
Contributor

ErikEJ commented Sep 14, 2020

@cmeyertons Yes, I think these failures are not related to your PR.

@cmeyertons
Contributor Author

@cheenamalhotra @DavoudEshtehari @Wraith2 I believe this PR has all feedback addressed -- the only remaining item is the decision on whether or not we want to port these changes to netfx. I'm happy to do so; I just need to hear from the team whether we're good with the changes thus far.

@Wraith2
Contributor

Wraith2 commented Sep 17, 2020

Have you investigated how difficult/different the port to netfx would be? I think if it's low friction you'll almost certainly be asked to do so; there's a lot of line-of-business software out there using bulk copy which would be improved by better performance, and it would be a good selling point for driving library adoption over the system version.

@cheenamalhotra
Member

cheenamalhotra commented Sep 18, 2020

@cmeyertons

Please go ahead and update the NetFx source. I think the changes look good to take forward to completion 👍

Please don't mind the Enclave pipeline failures; we're dealing with some environment issues at the moment. We've disabled these pipelines for PR validation for now and will resume them when the test environments are ready again!

@cheenamalhotra cheenamalhotra added this to the 2.1.0-preview2 milestone Sep 18, 2020
@cmeyertons
Contributor Author

@cheenamalhotra @Wraith2 I performed the following:

  • ported the code into netfx -- it was pretty trivial. The only thing I had to watch out for was not copying over the optimizations made to Guid to use stackalloc (I left that alone to use .ToByteArray(); see the sketch after this list).
  • got the benchmark fully working (even on net461) as a unit test that measures the no-boxing effect.
  • after getting the benchmark fully working, I was able to confirm that (TOut)(object)value gets optimized out by the JIT even on net461, so I was able to align the netfx and netcore GenericConverter implementations.
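
For context on the first bullet, a hedged sketch of the Guid difference (illustrative names only; the real write path differs): netcore can serialize a Guid into a caller-provided buffer, while netfx keeps the allocating ToByteArray() call.

    using System;

    internal static class GuidWriteSketch
    {
    #if NETCOREAPP
        // netcore path: fill a caller-provided (possibly stackalloc'd) buffer,
        // avoiding a 16-byte array allocation per Guid.
        public static bool Write(Guid value, Span<byte> destination)
            => value.TryWriteBytes(destination);
    #else
        // netfx path: allocates a small array per call.
        public static byte[] Write(Guid value)
            => value.ToByteArray();
    #endif
    }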

@Wraith2
Contributor

Wraith2 commented Sep 24, 2020

Excellent work, thanks 👍

@cheenamalhotra
Member

That's great progress! Could you fix the failing tests in NetCore, as these failures are related to the changes here?
Also, could you provide us with benchmarks for the perf improvements you observed, so we can document them here for the reviewing team?

@cheenamalhotra cheenamalhotra removed this from the 2.1.0-preview2 milestone Oct 14, 2020
Base automatically changed from master to main March 15, 2021 17:54
@Wraith2
Contributor

Wraith2 commented Apr 13, 2021

/me pokes @cmeyertons

It'd be nice to get this into the next release

@cmeyertons
Contributor Author

@Wraith2 I can carve out some time today.

I was kind of stuck on fixing the test -- I remember it being related to the stream bulk copy functionality, and I couldn't easily step through it because the test had a large amount of data unrelated to the breakage.

I will dig in some more today and post my findings.

@DavoudEshtehari
Contributor

@cmeyertons Could you resolve conflicts, please?

@Wraith2
Contributor

Wraith2 commented Apr 22, 2021

It is quite irksome that PRs are left open for so long, and yet when you do get a chance to review them, quite often the only thing asked is to rebase them on the current version. If you gave us some warning we could do it ahead of time, but trying to keep all open PRs constantly rebased on master is a big time sink for contributors.
Is there a way we could be told ahead of time that a PR is going to be reviewed, so we can make sure it's current? /cc @David-Engel

@cmeyertons
Contributor Author

I’m currently working on patching my changes on top of main and will create a new PR - there were too many hairy conflicts I couldn’t easily resolve.

@cheenamalhotra
Member

Closing, as the new PR #1048 replaces it.

Labels
💡 Enhancement Issues that are feature requests for the drivers we maintain.
Development

Successfully merging this pull request may close these issues.

SqlBulkCopy - ReadWriteColumnValueAsync creates lot of garbage
6 participants