Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: rework TdsParserStateObject and SqlDataReader snapshots #198

Merged
merged 4 commits into from
Nov 6, 2019

Conversation

Wraith2
Copy link
Contributor

@Wraith2 Wraith2 commented Sep 14, 2019

Profiling the DataAccessPerformance project which emulates the TechEmpower fortunes benchmark shows that a lot the work done by SqlClient is spent in managing state snapshots. The data returned to the user is all string instances which are placed in a Fortune object but these aren't the dominators in the memory profile.

This PR changes the implementation of the snapshot mechanisms used by SqlDataReader and TdsStateParserObject to:

  • Keep track of a cached SqlDataReader snapshot object once one is created so that it can be efficiently reused. This is possible because a only a single async operation is permitted at any time. Access to the cached instance uses interlocked to take the instance so that it cannot ever be used twice and lazily returns the cleared object using standard assignment because creating a new one every now again again isn't a problem as long as it is usually reused. Under load one snapshot was created per reader and reused cleanly.

  • Keep track of a cached TdsStateParserObject snapshot in a similar way to SqlDataReader but using interlocked for both rent and return to the cache variable.

  • Use slightly smaller data structures by compressing multiple boolean fields on TdsParserStateObject into a flags enumeration, this makes multiple flags copy and restore a single copy not 5. All access to the affected properties is now done through accessor functions.

  • Introduce a small class Snapshot.PLPData to store any partially length prefixed data state if it is used, if it is not used the allocation object object and continual tracking of 128 bits of data it contains are avoided.

  • Change Snapshot.PacketData to be a self assembling singly linked list. This removes List<PacketData> and linked PacketData[] allocations when taking snapshots and allows a cached PacketData link to be retained in the snapshot since one will always be required is a snapshot is used.

I also removed most of the setting of TdsParserStateObject variables to default values so it is now easier to tell when they are initialized to non-default values. The only exceptions are some variables which must be initialized to default because they are only touched through reflection in testing so the compiler will complain that it can't see them being set.

Profile results before, note that of the top 6 lines only the 4th is a result object we actually want:
sqlclient snapshot master

After:
sqlclient snapshot branch

I have another branch which removes lines 3 4 and 5 intervening async machinery allocations which will give some more gains

Performance results are reasonable showing 8-10% throughput increase.

name sync threads TPS stdev description
ado-sqlclient+async+32 async 32 62461 2411 mdsql snapshot master
ado-sqlclient+async+32 async 32 69154 2368 mdsql snapshot branch
ado-sqlclient+async+64 async 64 63023 1550 mdsql snapshot master
ado-sqlclient+async+64 async 64 68612 3114 mdsql snapshot branch
ado-sqlclient+async+128 async 128 59937 1640 mdsql snapshot master
ado-sqlclient+async+128 async 128 66253 1922 mdsql snapshot branch

The function and manual tests pass. This work had already been done and tested on the corefx branch so it's had some scrutiny there as well. Coverage I can't tell about but the fact that I'm directly seeing the types being used to take them out of the profile indicates they're definitely being touched. The DataAccessPerformance benchmark is a reasonable stress test, and I've found race conditions with it before so getting through it repeatedly successfully for minutes at a time is an indicator of stability to me.

Note: if this is ported to netfx the snapshot enum comparison in TdsParserStateObject will need to be changed from using HasFlag to manual masking. On coreclr HasFlag was optimized to a jit primitive so it produces very simple and fast assembly instructions. On netfx HasFlag performance is notoriously poor and causes boxing of the argument.

@ErikEJ
Copy link
Contributor

ErikEJ commented Sep 15, 2019

Is this already merged in corefx?

@Wraith2
Copy link
Contributor Author

Wraith2 commented Sep 15, 2019

Is this already merged in corefx?

No and it won't be going into corefx now that SqlClient there has gone into deprecated mode. See discussion at the end of dotnet/corefx#37241

@David-Engel
Copy link
Contributor

Just want to comment on review status. I've been reviewing these changes today as I have time. The changes for points 1-3 look good; have not found any issues. It might take me some time to digest the changes for points 4 and 5 since they aren't as simple as the previous points. Obviously we want to make sure there are no regressions introduced. Thanks for your patience.

@Wraith2
Copy link
Contributor Author

Wraith2 commented Oct 4, 2019

Understandable. I know the obvious surface area works because queries work but I don't know how to be sure that things like multiple levels of plp data are functional.

Copy link
Contributor

@David-Engel David-Engel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the changes look good. The changes from points 4 and 5, while appearing to be very different, make sense once untangled, are obviously an improvement, and even seem a bit more understandable than the original code to me (even if they make me a little nervous).

@cheenamalhotra cheenamalhotra added the 📈 Performance Use this label for performance improvement activities label Oct 7, 2019
@Wraith2
Copy link
Contributor Author

Wraith2 commented Oct 12, 2019

Any update on this? I'm pretty sure that to address #245 will require touching snapshots which is going to cause merge conflicts with this PR.

snapshot.Clear();
}
_snapshot = snapshot;
_snapshot.Snap(this);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any special reason to use extra variable snapshot here? Can you reduce this to:

if (_snapshot is null)
{
    _snapshot = Interlocked.Exchange(ref _cachedSnapshot, null) ?? new StateSnapshot();
}
else
{
    _snapshot.Clear();
}
_snapshot.Snap(this);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's there so there is no intermediate state where _snapshot is not null but is not a correctly setup snapshot. It sets it up locally and then assigns it back to the shared variable. It could be an internlocked assignment but that isn't seems necessary.

#if DEBUG
_rollingPend = 0;
_rollingPendCount = 0;
_stateObj._lastStack = null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you didn't add this intentionally, but this seems unnecessary step as stateObj will be null eventually. Also in above places.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_laststack is used in some DEBUG pn;y sanity checking so it's a good idea to clear it here to make sure that code stays working.

@Wraith2
Copy link
Contributor Author

Wraith2 commented Nov 4, 2019

It's coming up on 2 months now. Any feedback on what is blocking this?

@David-Engel
Copy link
Contributor

@Wraith2 These core changes always make us nervous about introducing regressions. But obviously I do understand the need for them to move performance improvements forward. I'd like to get this PR merged. Can you merge the latest changes in master to trigger a new CI build against the new public CI pipeline?

@David-Engel David-Engel merged commit 178fe51 into dotnet:master Nov 6, 2019
@David-Engel David-Engel added this to the 1.1.0 milestone Nov 6, 2019
@Wraith2 Wraith2 deleted the snapshots branch November 8, 2019 22:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
📈 Performance Use this label for performance improvement activities
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants