Update sync-diff-inspector to perform row-level comparison when table checksums match #784
Although this PR can fix the false-positive issue, row-level comparison will cause a huge performance regression, because it abandons the distributed computing in TiKV. We prefer not to fix an issue that is very unlikely to occur at the cost of a performance regression in most cases. For now, we can still use #707 to solve the problem.
Ack, I've opened #787 as an alternative approach using the checksum improvements you reference.
Closing in favor of #787
What problem does this PR solve?
The current behavior is to perform row-level data comparison only when the table checksums don't match and the user requests DML generation. Given #634, checksum collisions are possible, and they cause sync-diff-inspector to return false positives: it skips the row comparison and mistakenly reports two different tables as equal.
A checksum can only identify true negatives (i.e., the two tables are certainly different); it cannot establish true positives (i.e., the two tables are certainly equal). A more desirable behavior is therefore to use the checksum only to skip the more expensive row-level data comparison when the simpler, faster checksum already tells us the tables are different.
However, in the common case that the user wants to use sync-diff-inspector to generate DML statements when the tables are different, we need to perform row-level comparison anyway.
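To illustrate why matching checksums cannot prove equality, here is a minimal sketch. It assumes a table checksum built by XOR-ing a CRC32 of each row; that construction, the function names `tableChecksum` and `sameRows`, and the sample rows are all assumptions for illustration and are not taken from sync-diff-inspector's code. With an XOR aggregate, any row that appears twice cancels itself out, so two different tables can produce the same value.

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// tableChecksum is a hypothetical table-level checksum: the XOR of the
// CRC32 of every row. It is an illustrative assumption, not the tool's code.
func tableChecksum(rows []string) uint32 {
	var sum uint32
	for _, r := range rows {
		sum ^= crc32.ChecksumIEEE([]byte(r))
	}
	return sum
}

// sameRows is a hypothetical row-level check that compares two tables
// as multisets of rows.
func sameRows(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	counts := make(map[string]int)
	for _, r := range a {
		counts[r]++
	}
	for _, r := range b {
		counts[r]--
		if counts[r] < 0 {
			return false
		}
	}
	return true
}

func main() {
	// Each table holds one row twice (possible when there is no unique key),
	// so the per-row checksums cancel and both tables hash to the same value.
	upstream := []string{"1,alice", "1,alice"}
	downstream := []string{"2,bob", "2,bob"}

	fmt.Println("checksums match:", tableChecksum(upstream) == tableChecksum(downstream))
	fmt.Println("rows equal:", sameRows(upstream, downstream))
}
```

In this sketch the checksums agree even though no row is shared, which is the situation where only a row-level comparison reveals the difference.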
Issue Number: close #634
What is changed and how it works?
Thus, we should perform row-level data comparison if the user requests DML generation or if the checksums match. The only case where we shouldn't compare row data is when the user does not want DML and the checksums don't match (which confirms that the tables are different without a data check).
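The rule above can be sketched as a single boolean decision. The function name `shouldCompareRows` and its parameters are hypothetical and do not correspond to identifiers in the actual change; this only restates the described logic.

```go
package main

import "fmt"

// shouldCompareRows is a hypothetical helper restating the proposed rule:
// compare rows when DML is requested or when the checksums match.
// Rows are skipped only if no DML is wanted and the checksums differ,
// because differing checksums already prove the tables are different.
func shouldCompareRows(checksumsMatch, wantDML bool) bool {
	return wantDML || checksumsMatch
}

func main() {
	// Print the full truth table for the two inputs.
	for _, match := range []bool{true, false} {
		for _, dml := range []bool{true, false} {
			fmt.Printf("checksumsMatch=%v wantDML=%v -> compareRows=%v\n",
				match, dml, shouldCompareRows(match, dml))
		}
	}
}
```

Only the combination of differing checksums and no DML request skips the row-level pass; the other three combinations run it.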
Check List
Tests
Generate checksum collision
Confirm that the current behavior considers the tables equal and that the changed behavior considers them different.
Code changes
Side effects
This change runs row-level data comparison in more cases than before, which can be more resource-intensive than simply checksumming. However, this is in exchange for correctness in cases where checksum collisions occur.
Related changes