*: fix bug that UnionScan can't keep order caused wrong result #33218

Merged
merged 10 commits into from
Mar 22, 2022

Conversation

tiancaiamao
Contributor

What problem does this PR solve?

Issue Number: close #33175

Problem Summary:

What is changed and how it works?

There are actually two bugs that cause UnionScan to fail to keep order.
Because the order is not kept, the following plan returns a wrong result:
Limit1->UnionScan->XXX does not return max(id).

mysql> explain select max(id) from t;
+------------------------------+---------+-----------+---------------+-------------------------------------+
| id                           | estRows | task      | access object | operator info                       |
+------------------------------+---------+-----------+---------------+-------------------------------------+
| StreamAgg_10                 | 1.00    | root      |               | funcs:max(test.t.id)->Column#3      |
| └─Limit_17                   | 1.00    | root      |               | offset:0, count:1                   |
|   └─UnionScan_21             | 1.00    | root      |               |                                     |
|     └─TableReader_23         | 1.00    | root      |               | data:TableFullScan_22               |
|       └─TableFullScan_22     | 1.00    | cop[tikv] | table:t       | keep order:true, desc, stats:pseudo |
+------------------------------+---------+-----------+---------------+-------------------------------------+
5 rows in set (0.00 sec)

The first bug is in the datum comparison: the data type is KindUint64, but the old code reads the value with GetInt64(),
so 10353107668348738101 becomes a negative number and the comparison yields 10353107668348738101 < 33.
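
The effect of the first bug can be reproduced in isolation. The following is a minimal sketch, not the TiDB code: `lessAsInt64` and `lessAsUint64` are hypothetical helpers that stand in for comparing a KindUint64 value through a signed accessor versus an unsigned one.

```go
package main

import "fmt"

// lessAsInt64 mimics the buggy path: both uint64 values are reinterpreted
// as int64 before comparison (hypothetical helper, not a TiDB function).
func lessAsInt64(a, b uint64) bool {
	return int64(a) < int64(b)
}

// lessAsUint64 compares the values using their real, unsigned type.
func lessAsUint64(a, b uint64) bool {
	return a < b
}

func main() {
	// big is above math.MaxInt64, so int64(big) wraps to a negative number.
	var big uint64 = 10353107668348738101
	var small uint64 = 33

	fmt.Println("as int64,  big < small:", lessAsInt64(big, small))
	fmt.Println("as uint64, big < small:", lessAsUint64(big, small))
}
```

With the signed reinterpretation the large handle sorts before 33, which is the ordering mistake described above.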

The second bug is in the handling of kv ranges. In TiKV, the range is encoded as a signed int.
Here the data type is uint, so the correct range handling is:

handling range: 7480000000000000415f720000000000000000 7480000000000000415f728000000000000000
handling range: 7480000000000000415f728000000000000000 7480000000000000415f72ffffffffffffffff00
add Row is nil, get snapshot row... [KindUint64 10353107668348738101]
add Row is nil, get snapshot row... [KindUint64 9734095886065816707]
add Row is nil, get snapshot row... [KindUint64 0]

The wrong range handling is:

handling range: 7480000000000000415f728000000000000000 7480000000000000415f72ffffffffffffffff00
in iterator ... add row == [KindUint64 0]
handling range: 7480000000000000415f720000000000000000 7480000000000000415f728000000000000000
in iterator ... add row == [KindUint64 9734095886065816707]
in iterator ... add row == [KindUint64 10353107668348738101]

That is, the kv ranges should be handled in the order [-max int64, 0], [0, max int64].
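
The range order follows from how a handle is turned into a byte-comparable key. Below is a minimal sketch, assuming the common sign-bit-flip encoding for signed integers; `encodeHandle` and `signMask` are hypothetical names chosen for illustration, and only the last 8 bytes of the key are modeled.

```go
package main

import "fmt"

// signMask is the top bit of a 64-bit word.
const signMask uint64 = 0x8000000000000000

// encodeHandle is a hypothetical helper: it treats the handle as a signed
// int64 and flips the sign bit, so that unsigned byte order of the result
// matches signed numeric order.
func encodeHandle(h uint64) uint64 {
	return uint64(int64(h)) ^ signMask
}

func main() {
	// The three uint64 handles that appear in the logs above.
	for _, h := range []uint64{0, 9734095886065816707, 10353107668348738101} {
		fmt.Printf("handle %20d -> key suffix %016x\n", h, encodeHandle(h))
	}
}
```

Under this assumption, handle 0 maps to the suffix 8000000000000000, while the two handles above max int64 map to suffixes below it. A descending scan over unsigned handles therefore has to visit the key range starting at ...0000000000000000 before the range starting at ...8000000000000000, which matches the correct log.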

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Fix the bug that a 'select max(PK) ...' query returns a wrong result; the bug is triggered when UnionScan is used and the type of `PK` is uint64

@ti-chi-bot
Member

ti-chi-bot commented Mar 17, 2022

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • lcwangchao
  • winoros

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. needs-cherry-pick-6.0 size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 17, 2022
@sre-bot
Contributor

sre-bot commented Mar 17, 2022

@jackysp
Member

jackysp commented Mar 21, 2022

When was this introduced? @tiancaiamao

@ti-chi-bot ti-chi-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 21, 2022
@tiancaiamao
Contributor Author

When was this introduced? @tiancaiamao

I believe it's been there for quite a long time...
Maybe v3.0 or even v2.1 ... we already have it @jackysp

// TODO: `IterReverse` is not used... to get the same effect, reverse the kv ranges first,
// Then reverse the whole added rows.
// [99, 100] [44, 45] [1, 3] => [1, 3] [44, 45] [99, 100] => [100, 99] [45, 44] [3 1]
if m.desc {
Collaborator

@lcwangchao lcwangchao Mar 21, 2022

It may still cause bugs when the int is signed, try the following:

create global temporary table `tmp2` (id bigint primary key) on commit delete rows;
begin;
insert into tmp2 values(-2),(-1),(0),(1),(2);
-- The following query will give a wrong result:
-- expected: 2, 1, -1, -2; actual: -1, -2, 2, 1
select * from tmp2 where id <= -1 or id > 0 order by id desc;

Contributor Author

Update, PTAL @lcwangchao

executor/mem_reader.go Outdated Show resolved Hide resolved
@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Mar 22, 2022
@tiancaiamao
Contributor Author

/run-unit-test

@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Mar 22, 2022
@tiancaiamao
Contributor Author

/merge

@ti-chi-bot
Member

This pull request has been accepted and is ready to merge.

Commit hash: 41b3f5b

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Mar 22, 2022
@ti-chi-bot ti-chi-bot merged commit eb00246 into pingcap:master Mar 22, 2022
ti-srebot pushed a commit to ti-srebot/tidb that referenced this pull request Mar 22, 2022
@ti-srebot
Contributor

cherry pick to release-6.0 in PR #33319

@tiancaiamao tiancaiamao deleted the issue-33175 branch March 22, 2022 09:48
@tiancaiamao tiancaiamao added the needs-cherry-pick-release-5.4 Should cherry pick this PR to release-5.4 branch. label Sep 14, 2022
ti-srebot pushed a commit to ti-srebot/tidb that referenced this pull request Sep 14, 2022
@ti-srebot
Contributor

cherry pick to release-5.4 in PR #37805

Labels
needs-cherry-pick-release-5.4 Should cherry pick this PR to release-5.4 branch. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

the results is wrong because of bug in bigint(45) handle column comparing
7 participants