This repository has been archived by the owner on Jan 23, 2023. It is now read-only.
[Release/3.1] Fix race condition issues between SinglePhaseCommit and TransactionEnded events #43070
Ports dotnet/sqlclient#1042 to fix the issue below in System.Data.SqlClient:
Summary
A race condition exists between "Single Phase Commit" and "Transaction Ended", because both are triggered externally by the delegated SqlTransaction. After the change that dooms the connection in the "Transaction Ended" event (#42937), "Commit" started failing intermittently, which led to this issue. Dooming the connection is essential to prevent connection reuse that leads to security issues, so that PR is valid and important.
As a consequence, however, the inconsistent locking in "Commit" now surfaces as this problem. Locking is essential in this part of the code, but the "Single Phase Commit" implementation takes its lock late and in separate scopes, which lets Commit and Abort event handling interleave and leads to an intermittent "Transaction Aborted Exception".
Changing the lock scope fixes the issue. The problem was not easily reproducible in Microsoft.Data.SqlClient, but it occurs very often with System.Data.SqlClient 4.8.2 because of its slower performance. Making the test case more rigorous and forcing latency while debugging helped reproduce it.
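The effect of the lock scope can be illustrated with a small, self-contained toy model. This is only a sketch in Python, not the SqlClient code: the class `ToyDelegatedTransaction`, its methods, and the `simulate` helper are all hypothetical names, and the rule that "Transaction Ended" dooms the connection only when the commit has not completed is a simplifying assumption. The forced wait stands in for the latency that made the issue reproducible.

```python
import threading


class TransactionAborted(Exception):
    """Stands in for the intermittent 'Transaction Aborted Exception'."""


class ToyDelegatedTransaction:
    """Hypothetical model; none of these names come from SqlClient."""

    def __init__(self):
        self._lock = threading.Lock()
        self.doomed = False
        self.committed = False

    def transaction_ended(self):
        # Models the externally triggered "Transaction Ended" event.
        # Assumption: it dooms the connection unless the commit completed.
        with self._lock:
            if not self.committed:
                self.doomed = True

    def commit_split_lock(self, window):
        # Late, split locking: the check and the completion take the lock
        # separately, so the other event can run in between.
        with self._lock:
            if self.doomed:
                raise TransactionAborted()
        window()
        with self._lock:
            if self.doomed:
                raise TransactionAborted()
            self.committed = True

    def commit_single_lock(self, window):
        # Widened lock scope: the whole commit runs under one lock.
        with self._lock:
            if self.doomed:
                raise TransactionAborted()
            window()
            self.committed = True


def simulate(use_single_lock):
    """Force 'Transaction Ended' to fire in the middle of a commit."""
    tx = ToyDelegatedTransaction()
    ended = threading.Thread(target=tx.transaction_ended)

    def window():
        # Fire the competing event and give it time to run if it can.
        ended.start()
        ended.join(timeout=0.2)

    commit = tx.commit_single_lock if use_single_lock else tx.commit_split_lock
    try:
        commit(window)
        outcome = "committed"
    except TransactionAborted:
        outcome = "aborted"
    ended.join()  # let the competing event finish before returning
    return outcome
```

In this model, `simulate(False)` lets the competing event slip between the two lock scopes and the commit aborts, while `simulate(True)` holds the lock for the whole commit, so the competing event has to wait and the commit completes.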
Customer Impact
❗ Critical: A Premier customer's application is impacted by this issue (followed up via email).
Regression?
Yes: The issue started occurring in System.Data.SqlClient v4.8.2 (PR #42937).
Testing
It is not possible to add a test case for this behavior, since it is a race condition that is heavily influenced by MS DTC, which manages delegated transaction completion.
Risk
Low: This PR does not introduce any critical design changes; it only changes the lock scope.
The fix has been verified by customers as well as with the repro apps available in issue #729.
cc: @danmoseley @saurabh500 @David-Engel