Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sync-diff-inspector: change bit_xor to sum #671

Closed
wants to merge 5 commits into from

Conversation

Leavrth
Copy link
Contributor

@Leavrth Leavrth commented Aug 16, 2022

What problem does this PR solve?

Issue Number: close #634

What is changed and how it works?

use SUM instead of BIT_XOR.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

@ti-chi-bot
Copy link
Member

[REVIEW NOTIFICATION]

This pull request has not been approved.

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@GMHDBJD
Copy link
Contributor

GMHDBJD commented Aug 16, 2022

Don't think SUM is better then XOR. 🤔

@ti-chi-bot ti-chi-bot added size/M and removed size/XS labels Aug 16, 2022
@ti-chi-bot ti-chi-bot added size/L and removed size/M labels Aug 16, 2022
@Leavrth
Copy link
Contributor Author

Leavrth commented Aug 16, 2022

Don't think SUM is better then XOR. 🤔

Yes, I think so. But BIT_XOR is easiler to lead to hash collision when the table has no unique column.
I think the difference caused by special failure is easiler to appear than that by the failure such as some bits randomly changed in the data migration. Especially duplicate same rows(a xor a xor a=a, we use count to solve it), a column of some rows is changed in a regular way(#634), ...

So, how about using SUM function only for the table without unique column? @GMHDBJD

@lance6716
Copy link
Collaborator

This is not related to UNIQUE column?

@Leavrth
Copy link
Contributor Author

Leavrth commented Aug 16, 2022

This is not related to UNIQUE column?

It seems not as simple as I thought. 🥵

@Leavrth
Copy link
Contributor Author

Leavrth commented Aug 17, 2022

Change the content of the concat_ws, the results of bit_xor(crc32(...)) of 2 rows from 2 tables are always the same.

mysql> select BIT_XOR(cast(crc32(concat_ws('.', j, i, x, j,x, j)) as unsigned)) as CHECKSUM from db_test.t4;
+------------+
| CHECKSUM   |
+------------+
| 2478727481 |
+------------+
1 row in set (0.01 sec)

mysql> select BIT_XOR(cast(crc32(concat_ws('.', j, i, x, j,x, j)) as unsigned)) as CHECKSUM from db_test.t5;
+------------+
| CHECKSUM   |
+------------+
| 2478727481 |
+------------+
1 row in set (0.00 sec)

mysql> select * from db_test.t5;
+------+-------+------+
| i    | j     | x    |
+------+-------+------+
|    1 | 10001 |    3 |
|    1 | 10011 |    3 |
+------+-------+------+
2 rows in set (0.00 sec)

mysql> select * from db_test.t4;
+------+-------+------+
| i    | j     | x    |
+------+-------+------+
|    1 | 10001 |    2 |
|    1 | 10011 |    2 |
+------+-------+------+
2 rows in set (0.01 sec)

Seems that bit_xor + crc32 is not a good idea. 🤔

@CLAassistant
Copy link

CLAassistant commented Sep 2, 2022

CLA assistant check
All committers have signed the CLA.

@Leavrth Leavrth closed this Sep 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BIT_ XOR function has a collision example that is easy to occur
6 participants