Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

broadcast all the filters on join keys #7098

Closed
zz-jason opened this issue Jul 18, 2018 · 4 comments
Closed

broadcast all the filters on join keys #7098

zz-jason opened this issue Jul 18, 2018 · 4 comments
Assignees
Labels
sig/planner SIG: Planner type/enhancement The issue or PR belongs to an enhancement.

Comments

@zz-jason
Copy link
Member

zz-jason commented Jul 18, 2018

We could broadcast all the filters on the join keys to reduce the input records:

  • Inner Join: all the filters on the join keys can be merged and pushed to both sides.
  • Outer Join: filters that pushed to the outer table and involves the outer join keys can also be pushed to the inner table on the corresponding inner join keys.

An example for Inner Join:

TiDB(localhost:4000) > desc select * from t1 join t2 on t1.a=t2.a and t1.a in (12, 13) and t2.a in (14, 15);
+-------------------------+----------+------+--------------------------------------------------------------------+
| id                      | count    | task | operator info                                                      |
+-------------------------+----------+------+--------------------------------------------------------------------+
| HashLeftJoin_6          | 25.00    | root | inner join, inner:TableReader_13, equal:[eq(test.t1.a, test.t2.a)] |
| ├─TableReader_10        | 20.00    | root | data:Selection_9                                                   |
| │ └─Selection_9         | 20.00    | cop  | in(test.t1.a, 12, 13)                                              |
| │   └─TableScan_8       | 10000.00 | cop  | table:t1, range:[-inf,+inf], keep order:false, stats:pseudo        |
| └─TableReader_13        | 20.00    | root | data:Selection_12                                                  |
|   └─Selection_12        | 20.00    | cop  | in(test.t2.a, 14, 15)                                              |
|     └─TableScan_11      | 10000.00 | cop  | table:t2, range:[-inf,+inf], keep order:false, stats:pseudo        |
+-------------------------+----------+------+--------------------------------------------------------------------+
7 rows in set (0.00 sec)

If the filter on both sides join keys are merged together, the result could be:

TiDB(localhost:4000) > desc select * from t1 join t2 on t1.a=t2.a and t1.a in (12, 13) and t1.a in (14, 15) and t2.a in (14, 15) and t2.a in (12, 13);
+-------------------------+----------+------+--------------------------------------------------------------------+
| id                      | count    | task | operator info                                                      |
+-------------------------+----------+------+--------------------------------------------------------------------+
| HashLeftJoin_6          | 0.00     | root | inner join, inner:TableReader_13, equal:[eq(test.t1.a, test.t2.a)] |
| ├─TableReader_10        | 0.00     | root | data:Selection_9                                                   |
| │ └─Selection_9         | 0.00     | cop  | in(test.t1.a, 12, 13), in(test.t1.a, 14, 15)                       |
| │   └─TableScan_8       | 10000.00 | cop  | table:t1, range:[-inf,+inf], keep order:false, stats:pseudo        |
| └─TableReader_13        | 0.00     | root | data:Selection_12                                                  |
|   └─Selection_12        | 0.00     | cop  | in(test.t2.a, 14, 15), in(test.t2.a, 12, 13)                       |
|     └─TableScan_11      | 10000.00 | cop  | table:t2, range:[-inf,+inf], keep order:false, stats:pseudo        |
+-------------------------+----------+------+--------------------------------------------------------------------+
7 rows in set (0.00 sec)

The latter one filters out all the records of left and right side of the join operator, which could be more faster than the former one, and reduces the resource consumption of the query execution.

An example for Outer Join:

TiDB(localhost:4000) > desc select * from t1 left join t2 on t1.a=t2.a where t1.a in (12, 13);
+-------------------------+----------+------+-------------------------------------------------------------------------+
| id                      | count    | task | operator info                                                           |
+-------------------------+----------+------+-------------------------------------------------------------------------+
| HashLeftJoin_7          | 25.00    | root | left outer join, inner:TableReader_12, equal:[eq(test.t1.a, test.t2.a)] |
| ├─TableReader_10        | 20.00    | root | data:Selection_9                                                        |
| │ └─Selection_9         | 20.00    | cop  | in(test.t1.a, 12, 13)                                                   |
| │   └─TableScan_8       | 10000.00 | cop  | table:t1, range:[-inf,+inf], keep order:false, stats:pseudo             |
| └─TableReader_12        | 10000.00 | root | data:TableScan_11                                                       |
|   └─TableScan_11        | 10000.00 | cop  | table:t2, range:[-inf,+inf], keep order:false, stats:pseudo             |
+-------------------------+----------+------+-------------------------------------------------------------------------+
6 rows in set (0.00 sec)

NOTE: in this case, t1.a in (12, 13) works on the result of the outer join and we have pushed it down to the outer table.

But we can further push this filter down to the inner table, since only the the records satisfy t2.a in (12, 13) could make join predicate t1.a = t2.a be positive in the join operator. So we can optimize this query to:

select * from t1 left join t2 on t1.a=t2.a and t2.a in (12, 13) where t1.a in (12, 13);

And the join predicate t2.a in (12, 13) can be pushed down:

TiDB(localhost:4000) > desc select * from t1 left join t2 on t1.a=t2.a and t2.a in (12, 13) where t1.a in (12, 13);
+-------------------------+-------+------+-------------------------------------------------------------------------+
| id                      | count | task | operator info                                                           |
+-------------------------+-------+------+-------------------------------------------------------------------------+
| HashLeftJoin_7          | 0.00  | root | left outer join, inner:TableReader_13, equal:[eq(test.t1.a, test.t2.a)] |
| ├─TableReader_10        | 0.00  | root | data:Selection_9                                                        |
| │ └─Selection_9         | 0.00  | cop  | in(test.t1.a, 12, 13)                                                   |
| │   └─TableScan_8       | 2.00  | cop  | table:t1, range:[-inf,+inf], keep order:false, stats:pseudo             |
| └─TableReader_13        | 0.00  | root | data:Selection_12                                                       |
|   └─Selection_12        | 0.00  | cop  | in(test.t2.a, 12, 13)                                                   |
|     └─TableScan_11      | 2.00  | cop  | table:t2, range:[-inf,+inf], keep order:false, stats:pseudo             |
+-------------------------+-------+------+-------------------------------------------------------------------------+
7 rows in set (0.00 sec)
@zz-jason zz-jason added type/enhancement The issue or PR belongs to an enhancement. sig/planner SIG: Planner labels Jul 18, 2018
@zz-jason zz-jason self-assigned this Jul 18, 2018
@shenli
Copy link
Member

shenli commented Jul 18, 2018

Is this kind of constant propagation?

@zz-jason
Copy link
Member Author

@shenli Probably 😂
I think maybe we could do this in the process of predicate pushdown.

@bb7133
Copy link
Member

bb7133 commented Jul 21, 2018

Hi @zz-jason I will work on this optimization

@zz-jason
Copy link
Member Author

@bb7133 Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sig/planner SIG: Planner type/enhancement The issue or PR belongs to an enhancement.
Projects
None yet
Development

No branches or pull requests

3 participants