Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

plan,executor: support IndexJoin over UnionScan #7877

Merged
merged 9 commits into from
Oct 12, 2018

Conversation

eurekaka
Copy link
Contributor

@eurekaka eurekaka commented Oct 11, 2018

What problem does this PR solve?

Fix pingcap/docs-cn#862

Before this PR, IndexJoin only applies when inner child is DataSource. In the above issue, the join query is in a write transaction, so each DataSource would be decorated with an LogicalUnionScan on top of it, thus making IndexJoin not applicable for the join implementation.

What is changed and how it works?

First enhance executor to support UnionScan as inner child of IndexJoin, then we enable IndexJoin for UnionScan in planner.

Below is the output after this PR:

mysql> begin;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from t1;
Empty set (0.00 sec)

mysql> select * from t2;
+----+------+------+------+
| id | a    | b    | c    |
+----+------+------+------+
|  1 |    1 |    1 |    1 |
+----+------+------+------+
1 row in set (0.00 sec)

mysql> insert into t1 values(2,2,2);
Query OK, 1 row affected (0.00 sec)

mysql> insert into t2 values(2,2,2);
Query OK, 1 row affected (0.00 sec)

mysql> select * from t1;
+----+------+------+
| id | a    | b    |
+----+------+------+
|  2 |    2 |    2 |
+----+------+------+
1 row in set (0.00 sec)

mysql> select * from t2;
+----+------+------+------+
| id | a    | b    | c    |
+----+------+------+------+
|  1 |    1 |    1 |    1 |
|  2 |    2 |    2 |    2 |
+----+------+------+------+
2 rows in set (0.00 sec)

mysql> select /*+ TIDB_INLJ(t1, t2) */ * from t1 join t2 on t1.a = t2.id;
+----+------+------+----+------+------+------+
| id | a    | b    | id | a    | b    | c    |
+----+------+------+----+------+------+------+
|  2 |    2 |    2 |  2 |    2 |    2 |    2 |
+----+------+------+----+------+------+------+
1 row in set (0.00 sec)

mysql> explain select /*+ TIDB_INLJ(t1, t2) */ * from t1 join t2 on t1.a = t2.id;
+--------------------------+----------+------+---------------------------------------------------------------------------+
| id                       | count    | task | operator info                                                             |
+--------------------------+----------+------+---------------------------------------------------------------------------+
| IndexJoin_11             | 1.25     | root | inner join, inner:UnionScan_10, outer key:test.t1.a, inner key:test.t2.id |
| ├─UnionScan_12           | 10000.00 | root |                                                                           |
| │ └─TableReader_14       | 10000.00 | root | data:TableScan_13                                                         |
| │   └─TableScan_13       | 10000.00 | cop  | table:t1, range:[-inf,+inf], keep order:false, stats:pseudo               |
| └─UnionScan_10           | 0.00     | root |                                                                           |
|   └─TableReader_9        | 0.00     | root | data:TableScan_8                                                          |
|     └─TableScan_8        | 0.00     | cop  | table:t2, range: decided by [test.t1.a], keep order:false, stats:pseudo   |
+--------------------------+----------+------+---------------------------------------------------------------------------+
7 rows in set (0.00 sec)

Check List

Tests

  • Unit test
  • Manual test: check example pasted above

Future works

  • For each outer tuple of IndexJoin, we would call buildAndSortAddedRows to sort dirty table, we should consider caching the sorted dirty table for this IndexJoin operator;
  • For TableScan or IndexScan below UnionScan, we would build a new range according to each outer tuple now, shall we apply this on UnionScan as well? if we do this, we cannot cache the sorted dirty table then, this should be a tradeoff;
  • Another issue related to above decision is the computation of IndexJoin cost, i.e, shall we update PhysicalIndexJoin::getCost to take UnionScan overhead into consideration?

@eurekaka eurekaka added type/enhancement The issue or PR belongs to an enhancement. status/WIP sig/planner SIG: Planner sig/execution SIG execution labels Oct 11, 2018
@eurekaka
Copy link
Contributor Author

/run-all-tests

@eurekaka eurekaka changed the title [WIP] plan,executor: support IndexJoin over UnionScan plan,executor: support IndexJoin over UnionScan Oct 11, 2018
@eurekaka
Copy link
Contributor Author

@zz-jason @winoros @XuHuaiyu @lamxTyler PTAL

@zz-jason
Copy link
Member

LGTM

@@ -199,10 +204,8 @@ func (ds *DataSource) tryToGetDualTask() (task, error) {
// findBestTask implements the PhysicalPlan interface.
// It will enumerate all the available indices and choose a plan with least cost.
func (ds *DataSource) findBestTask(prop *property.PhysicalProperty) (t task, err error) {
// If ds is an inner plan in an IndexJoin, the IndexJoin will generate an inner plan by itself.
// So here we do nothing.
// TODO: Add a special prop to handle IndexJoin's inner plan.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep this TODO?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have thought about adding this special prop in this patch, and then realized that current solution using nil for quick return is clear enough, so I removed this TODO.

physicalUnionScan.SetChildren(scan)
scan = physicalUnionScan
}
return scan
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

465-473 and 516-524 maybe can extract a method, except this LGTM

@eurekaka
Copy link
Contributor Author

@lysu comments addressed, PTAL.

@eurekaka eurekaka added the status/LGT1 Indicates that a PR has LGTM 1. label Oct 12, 2018
@eurekaka
Copy link
Contributor Author

/run-all-tests

@lysu
Copy link
Contributor

lysu commented Oct 12, 2018

LGTM

@lysu lysu added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Oct 12, 2018
@zz-jason zz-jason merged commit 5efcacb into pingcap:master Oct 12, 2018
@eurekaka eurekaka deleted the index_join_union_scan branch October 12, 2018 12:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sig/execution SIG execution sig/planner SIG: Planner status/LGT2 Indicates that a PR has LGTM 2. type/enhancement The issue or PR belongs to an enhancement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants