Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add and use TryReadChars method #1544

Merged
merged 5 commits into from
Feb 10, 2023
Merged

Conversation

Wraith2
Copy link
Contributor

@Wraith2 Wraith2 commented Mar 13, 2022

In #593 (comment) @panoskj pointed out that there was a path in the example reproduction which was copying characters from a buffer one at a time in a loop and that this was inefficient. I've done a little digging and this is an attempt to speed it up. Note that not all strings are copied this way.

Unfortunately the fix isn't as simple as copying into from the byte[]. We need to make sure that if we reach a packet boundary that we call a function capable of fetching the next packet. We also need to be aware that we could end up with a partial character in the input buffer and that needs special handling. The strategy I've settled on for this PR is to identify how much can be copied in bulk from the current packet and then pick up any remainder or simply force a new packet by calling TryReadChar() as the code currently does. This should result in most data being read in large blocks with single chars being picked up at existing boundaries which should be faster if not the fastest possible.

/cc @panoskj could you take a look and see if you agree with this and possibly run it through your benches/profiler to see if it improves the numbers you saw?

@Wraith2 Wraith2 closed this Mar 14, 2022
@Wraith2 Wraith2 reopened this Mar 14, 2022
@panoskj
Copy link
Contributor

panoskj commented Mar 14, 2022

I have not run any benchmarks with it yet, but it seems like a reasonable change. It should speed up both sync and async paths.

It also looks like you could replace usages of TryReadPlpUnicodeChars with TryReadStringWithEncoding (passing Encoding.Unicode as encoding parameter and doubled string size), if the result will be converted to a string anyway. This is the case inside TryReadSqlStringValue for example. This way you can avoid the case of "partially read characters" altogether. It looks like TryReadPlpUnicodeChars is only needed for SqlDataReader.GetChars actually.

Though, this change cannot improve the async path much, because as I mentioned in #593 (comment), its time complexity is quadratic. No matter how fast you make the copying itself, all packets in the replay buffer will be copied into the characters buffer for each packet received again and again.

The optimal solution would copy each packet once instead. See https://github.com/panoskj/SqlClient/tree/wip/issue/593 for a proof of concept. Async runs about as fast as the sync does, even for very large columns, e.g. 50MB. Note that I haven't reviewed the last commit much, it wasn't meant to be uploaded yet.

In conclusion, perhaps we should merge the netcore and netfx files before applying a dozen of new patches to them. But except for the pending merge, I agree with this change.

@Wraith2
Copy link
Contributor Author

Wraith2 commented Mar 14, 2022

It also looks like you could replace usages of TryReadPlpUnicodeChars with TryReadStringWithEncoding (passing Encoding.Unicode as encoding parameter and doubled string size), if the result will be converted to a string anyway. This is the case inside TryReadSqlStringValue for example. This way you can avoid the case of "partially read characters" altogether. It looks like TryReadPlpUnicodeChars is only needed for SqlDataReader.GetChars actually.

Possibly. Identifying whether the behaviour is the same can be time consuming.

Though, this change cannot improve the async path much, because as I mentioned in #593 (comment), its time complexity is quadratic. No matter how fast you make the copying itself, all packets in the replay buffer will be copied into the characters buffer for each packet received again and again.

We both have branches which can linearize that, just waiting on the tree merges.

In conclusion, perhaps we should merge the netcore and netfx files before applying a dozen of new patches to them. But except for the pending merge, I agree with this change.

I've love to. Tree merging is going slowly though. If I can get a small PR through to improve performance I'll try it.

@panoskj
Copy link
Contributor

panoskj commented Mar 15, 2022

Identifying whether the behaviour is the same can be time consuming.

Nevertheless, this insight may be worth it, given the context of optimizing the async path later. But it is completely optional right now.

@Wraith2
Copy link
Contributor Author

Wraith2 commented Apr 8, 2022

It looks like this doesn't work with some of the bulkcopy tests.

@Wraith2
Copy link
Contributor Author

Wraith2 commented Apr 13, 2022

I updated without making any material change and the CI is mostly clean. The only failures are in AE and while they do look look related I have no way to replicate that environment to debug them.

@JRahnama JRahnama added the ➕ Code Health Issues/PRs that are targeted to source code quality improvements. label Aug 16, 2022
@JRahnama JRahnama added this to the 5.1.0-preview1 milestone Aug 16, 2022
@codecov
Copy link

codecov bot commented Aug 18, 2022

Codecov Report

Base: 70.86% // Head: 69.56% // Decreases project coverage by -1.31% ⚠️

Coverage data is based on head (2ad4391) compared to base (5ff68eb).
Patch coverage: 84.44% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1544      +/-   ##
==========================================
- Coverage   70.86%   69.56%   -1.31%     
==========================================
  Files         292      292              
  Lines       61750    61768      +18     
==========================================
- Hits        43759    42967     -792     
- Misses      17991    18801     +810     
Flag Coverage Δ
addons 92.38% <ø> (ø)
netcore 73.74% <83.33%> (-0.76%) ⬇️
netfx 67.83% <93.93%> (-1.33%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...c/Microsoft/Data/SqlClient/TdsParserStateObject.cs 23.30% <ø> (-21.08%) ⬇️
.../netcore/src/Microsoft/Data/SqlClient/TdsParser.cs 73.38% <66.66%> (-0.15%) ⬇️
...nt/netfx/src/Microsoft/Data/SqlClient/TdsParser.cs 70.84% <66.66%> (-0.14%) ⬇️
...c/Microsoft/Data/SqlClient/TdsParserStateObject.cs 87.46% <90.90%> (-2.27%) ⬇️
...c/Microsoft/Data/SqlClient/TdsParserStateObject.cs 26.18% <0.00%> (-39.85%) ⬇️
...ata/SqlClient/SqlConnectionTimeoutErrorInternal.cs 30.35% <0.00%> (-21.43%) ⬇️
.../Microsoft/Data/SqlClient/SqlTransaction.Common.cs 74.35% <0.00%> (-5.13%) ⬇️
...ta/SqlClient/SqlConnectionPoolGroupProviderInfo.cs 42.50% <0.00%> (-5.00%) ⬇️
... and 17 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

Copy link
Contributor

@DavoudEshtehari DavoudEshtehari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Wraith2 LGTM, do you have any plan to port it into netfx too?

@DavoudEshtehari
Copy link
Contributor

PR #1520 is the prerequisite of this change.

@Wraith2
Copy link
Contributor Author

Wraith2 commented Feb 1, 2023

Prerequisites are met...

@panoskj
Copy link
Contributor

panoskj commented Feb 1, 2023

I see you haven't modified any existing code inside TdsParserStateObject - you only added a new method. This means, you can now update this PR to add the new method inside the merged file (src\Microsoft\Data\SqlClient\TdsParserStateObject.cs) instead of inside the corefx version. Of course, to do this, you will have to merge main branch into this one (to get #1520's changes).

Moreover, according to this comment, there may be someone working on merging TdsParser already, but there is no PR yet (they are waiting for #1844, which will help them a lot). You could port the changes to TryReadPlpUnicodeCharsChunk to netfx in order to help them. Alternatively, you could merge TryReadPlpUnicodeCharsChunk into a new common file (just merge this one method and leave the rest). That being said, I haven't checked whether TryReadPlpUnicodeCharsChunk is the same in corefx and netfx - if there are significant changes, porting/merging may be too hard. But I suspect they are identical.

@Wraith2 Wraith2 force-pushed the perf-tryreadchar branch 19 times, most recently from d920c94 to a406d3d Compare February 9, 2023 10:39
@Wraith2
Copy link
Contributor Author

Wraith2 commented Feb 9, 2023

Count someone from the MS side check the CI machines? the enclave steps are either timing out or just plain not working.

@Wraith2
Copy link
Contributor Author

Wraith2 commented Feb 9, 2023

Fixed, quick, merge while it's green!

Copy link
Contributor

@David-Engel David-Engel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets get this in sooner than later so it gets some miles on it during preview.

@JRahnama JRahnama merged commit 6db45d4 into dotnet:main Feb 10, 2023
@Wraith2
Copy link
Contributor Author

Wraith2 commented Feb 10, 2023

Lets get this in sooner than later so it gets some miles on it during preview.

Good idea. @panoskj do you have any benches you want to try with it since you found the original issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
➕ Code Health Issues/PRs that are targeted to source code quality improvements.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants