executor: vectorize hash calculation in hashJoin (#12048) #12076

Merged
merged 2 commits into from
Sep 11, 2019

@sduzh sduzh commented Sep 8, 2019

What problem does this PR solve?

Fix issue #12048

What is changed and how it works?

benchstat result

name                                                                    old time/op    new time/op    delta
HashJoinExec/(rows:100000,_concurency:4,_joinKeyIdx:_[0_1])-4              810ms ± 1%     806ms ± 1%   -0.54%  (p=0.021 n=8+8)
HashJoinExec/(rows:100000,_concurency:4,_joinKeyIdx:_[0])-4                145ms ±16%     135ms ± 3%   -7.23%  (p=0.001 n=10+8)
BuildHashTableForList/(rows:100000,_concurency:4,_joinKeyIdx:_[0_1])-4     594ms ± 1%     587ms ± 0%   -1.18%  (p=0.000 n=10+10)
BuildHashTableForList/(rows:100000,_concurency:4,_joinKeyIdx:_[0])-4      13.3ms ± 1%    10.8ms ± 0%  -18.85%  (p=0.000 n=10+9)
BuildHashTableForList/(rows:10,_concurency:4,_joinKeyIdx:_[0])-4          4.17µs ± 4%    4.23µs ± 3%     ~     (p=0.287 n=9+8)

name                                                                    old alloc/op   new alloc/op   delta
HashJoinExec/(rows:100000,_concurency:4,_joinKeyIdx:_[0_1])-4              304MB ± 0%     304MB ± 0%   +0.01%  (p=0.000 n=9+10)
HashJoinExec/(rows:100000,_concurency:4,_joinKeyIdx:_[0])-4                304MB ± 0%     304MB ± 0%   +0.01%  (p=0.000 n=10+10)
BuildHashTableForList/(rows:100000,_concurency:4,_joinKeyIdx:_[0_1])-4    7.11MB ± 0%    7.13MB ± 0%   +0.35%  (p=0.000 n=10+10)
BuildHashTableForList/(rows:100000,_concurency:4,_joinKeyIdx:_[0])-4      7.11MB ± 0%    7.14MB ± 0%   +0.36%  (p=0.000 n=10+10)
BuildHashTableForList/(rows:10,_concurency:4,_joinKeyIdx:_[0])-4          2.00kB ± 0%    2.28kB ± 0%  +13.57%  (p=0.000 n=10+10)

name                                                                    old allocs/op  new allocs/op  delta
HashJoinExec/(rows:100000,_concurency:4,_joinKeyIdx:_[0_1])-4               306k ± 0%      307k ± 0%   +0.35%  (p=0.000 n=10+9)
HashJoinExec/(rows:100000,_concurency:4,_joinKeyIdx:_[0])-4                 306k ± 0%      307k ± 0%   +0.34%  (p=0.000 n=10+10)
BuildHashTableForList/(rows:100000,_concurency:4,_joinKeyIdx:_[0_1])-4     3.79k ± 1%     4.80k ± 2%  +26.63%  (p=0.000 n=8+10)
BuildHashTableForList/(rows:100000,_concurency:4,_joinKeyIdx:_[0])-4       3.79k ± 0%     4.82k ± 0%  +27.04%  (p=0.000 n=10+9)
BuildHashTableForList/(rows:10,_concurency:4,_joinKeyIdx:_[0])-4            16.0 ± 0%      27.0 ± 0%  +68.75%  (p=0.000 n=10+10)

name                                                                    old time/op    new time/op    delta
HashJoinExec/(rows:100000,_concurency:4,_joinKeyIdx:_[0_1])-4              817ms ± 2%     806ms ± 1%   -1.34%  (p=0.011 n=9+8)
HashJoinExec/(rows:100000,_concurency:4,_joinKeyIdx:_[0])-4                145ms ± 7%     135ms ± 3%   -6.92%  (p=0.000 n=10+8)
BuildHashTableForList/(rows:100000,_concurency:4,_joinKeyIdx:_[0_1])-4     592ms ± 0%     587ms ± 0%   -0.96%  (p=0.000 n=9+10)
BuildHashTableForList/(rows:100000,_concurency:4,_joinKeyIdx:_[0])-4      13.4ms ± 2%    10.8ms ± 0%  -19.19%  (p=0.000 n=9+9)
BuildHashTableForList/(rows:10,_concurency:4,_joinKeyIdx:_[0])-4          4.16µs ± 5%    4.23µs ± 3%     ~     (p=0.173 n=10+8)

name                                                                    old alloc/op   new alloc/op   delta
HashJoinExec/(rows:100000,_concurency:4,_joinKeyIdx:_[0_1])-4              304MB ± 0%     304MB ± 0%   +0.01%  (p=0.000 n=10+10)
HashJoinExec/(rows:100000,_concurency:4,_joinKeyIdx:_[0])-4                304MB ± 0%     304MB ± 0%   +0.01%  (p=0.000 n=10+10)
BuildHashTableForList/(rows:100000,_concurency:4,_joinKeyIdx:_[0_1])-4    7.11MB ± 0%    7.13MB ± 0%   +0.34%  (p=0.000 n=10+10)
BuildHashTableForList/(rows:100000,_concurency:4,_joinKeyIdx:_[0])-4      7.11MB ± 0%    7.14MB ± 0%   +0.37%  (p=0.000 n=10+10)
BuildHashTableForList/(rows:10,_concurency:4,_joinKeyIdx:_[0])-4          2.00kB ± 0%    2.28kB ± 0%  +13.57%  (p=0.000 n=10+10)

name                                                                    old allocs/op  new allocs/op  delta
HashJoinExec/(rows:100000,_concurency:4,_joinKeyIdx:_[0_1])-4               306k ± 0%      307k ± 0%   +0.35%  (p=0.000 n=10+9)
HashJoinExec/(rows:100000,_concurency:4,_joinKeyIdx:_[0])-4                 306k ± 0%      307k ± 0%   +0.34%  (p=0.000 n=10+10)
BuildHashTableForList/(rows:100000,_concurency:4,_joinKeyIdx:_[0_1])-4     3.78k ± 2%     4.80k ± 2%  +26.84%  (p=0.000 n=10+10)
BuildHashTableForList/(rows:100000,_concurency:4,_joinKeyIdx:_[0])-4       3.79k ± 0%     4.82k ± 0%  +27.10%  (p=0.000 n=10+9)
BuildHashTableForList/(rows:10,_concurency:4,_joinKeyIdx:_[0])-4            16.0 ± 0%      27.0 ± 0%  +68.75%  (p=0.000 n=10+10)

name                                                                    old time/op    new time/op    delta
HashJoinExec/(rows:100000,_concurency:4,_joinKeyIdx:_[0_1])-4              810ms ± 1%     811ms ± 1%     ~     (p=0.541 n=8+9)
HashJoinExec/(rows:100000,_concurency:4,_joinKeyIdx:_[0])-4                145ms ±16%     143ms ±14%     ~     (p=0.739 n=10+10)
BuildHashTableForList/(rows:100000,_concurency:4,_joinKeyIdx:_[0_1])-4     594ms ± 1%     588ms ± 0%   -1.01%  (p=0.000 n=10+9)
BuildHashTableForList/(rows:100000,_concurency:4,_joinKeyIdx:_[0])-4      13.3ms ± 1%    10.9ms ± 2%  -18.17%  (p=0.000 n=10+10)
BuildHashTableForList/(rows:10,_concurency:4,_joinKeyIdx:_[0])-4          4.17µs ± 4%    4.44µs ±10%   +6.52%  (p=0.004 n=9+10)

name                                                                    old alloc/op   new alloc/op   delta
HashJoinExec/(rows:100000,_concurency:4,_joinKeyIdx:_[0_1])-4              304MB ± 0%     304MB ± 0%   +0.01%  (p=0.000 n=9+10)
HashJoinExec/(rows:100000,_concurency:4,_joinKeyIdx:_[0])-4                304MB ± 0%     304MB ± 0%   +0.01%  (p=0.000 n=10+10)
BuildHashTableForList/(rows:100000,_concurency:4,_joinKeyIdx:_[0_1])-4    7.11MB ± 0%    7.14MB ± 0%   +0.41%  (p=0.000 n=10+10)
BuildHashTableForList/(rows:100000,_concurency:4,_joinKeyIdx:_[0])-4      7.11MB ± 0%    7.14MB ± 0%   +0.35%  (p=0.000 n=10+10)
BuildHashTableForList/(rows:10,_concurency:4,_joinKeyIdx:_[0])-4          2.00kB ± 0%    2.28kB ± 0%  +13.57%  (p=0.000 n=10+10)

name                                                                    old allocs/op  new allocs/op  delta
HashJoinExec/(rows:100000,_concurency:4,_joinKeyIdx:_[0_1])-4               306k ± 0%      307k ± 0%   +0.34%  (p=0.000 n=10+10)
HashJoinExec/(rows:100000,_concurency:4,_joinKeyIdx:_[0])-4                 306k ± 0%      307k ± 0%   +0.34%  (p=0.000 n=10+9)
BuildHashTableForList/(rows:100000,_concurency:4,_joinKeyIdx:_[0_1])-4     3.79k ± 1%     4.82k ± 2%  +27.07%  (p=0.000 n=8+10)
BuildHashTableForList/(rows:100000,_concurency:4,_joinKeyIdx:_[0])-4       3.79k ± 0%     4.81k ± 0%  +26.95%  (p=0.000 n=10+10)
BuildHashTableForList/(rows:10,_concurency:4,_joinKeyIdx:_[0])-4            16.0 ± 0%      27.0 ± 0%  +68.75%  (p=0.000 n=10+10)
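
For readers checking the delta column by hand, here is a minimal sketch of the arithmetic. It assumes the delta is the relative change between the two reported central values. The helper name `percentDelta` is hypothetical and is not part of benchstat. Rows whose inputs are rounded in the table (for example 2.00kB vs 2.28kB) will not reproduce the printed delta exactly, so the example uses the allocs/op row with 16.0 and 27.0.

```go
package main

import "fmt"

// percentDelta returns the relative change from oldVal to newVal in percent.
// Hypothetical helper for illustration; benchstat computes its delta internally.
func percentDelta(oldVal, newVal float64) float64 {
	return (newVal - oldVal) / oldVal * 100
}

func main() {
	// allocs/op for BuildHashTableForList with rows:10 went from 16.0 to 27.0.
	fmt.Printf("%+.2f%%\n", percentDelta(16.0, 27.0)) // prints "+68.75%"
}
```

The allocation count grows noticeably for the tiny 10-row input, while the time/op change for that case is within noise in two of the three runs above.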

Check List

Tests

  • Unit test
  • Integration test

Code changes

  • Has exported function/method change

Side effects

  • Possible performance regression

Related changes

Release note

@sre-bot sre-bot added the contribution This PR is from a community contributor. label Sep 8, 2019
codecov bot commented Sep 8, 2019

Codecov Report

Merging #12076 into master will decrease coverage by 0.0331%.
The diff coverage is 73.1707%.

@@               Coverage Diff                @@
##             master     #12076        +/-   ##
================================================
- Coverage   81.4863%   81.4532%   -0.0332%     
================================================
  Files           449        449                
  Lines         97058      96421       -637     
================================================
- Hits          79089      78538       -551     
+ Misses        12356      12266        -90     
- Partials       5613       5617         +4

@francis0407 francis0407 added sig/execution SIG execution type/enhancement The issue or PR belongs to an enhancement. labels Sep 8, 2019
shenli commented Sep 8, 2019

@sduzh Thanks! Could you show some benchmark results?

@SunRunAway SunRunAway left a comment

Hi @sduzh, your pull request looks awesome. I've left some comments, PTAL.

In addition, could you provide reportable benchmark differences?
Two benchmarks are relevant to your pull request: BenchmarkHashJoinExec and BenchmarkBuildHashTableForList.

You can run the benchmarks with:

go test -run=^$ -bench="BenchmarkHashJoinExec|BenchmarkBuildHashTableForList" -test.benchmem -count 10

Copy the output from before and after your change into two separate files, old.txt and new.txt,
and then run:

benchstat old.txt new.txt

If needed, you can find benchstat at https://godoc.org/golang.org/x/perf/cmd/benchstat. Then put the benchstat result into your pull request description.
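
As a rough illustration of where the time/op, alloc/op and allocs/op columns come from, here is a minimal sketch of a Go benchmark that reports allocations. The workload `hashKeys` and the function `benchmarkHashKeys` are hypothetical stand-ins, not the benchmarks in this repository. The sketch runs the benchmark through `testing.Benchmark` so it can be executed as a plain program. In the repository you would use `go test -bench` as shown above.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"testing"
)

// hashKeys is a hypothetical workload standing in for the code under test.
// It feeds each key into a single FNV-1a hash as 8 little-endian bytes.
func hashKeys(keys []int64) uint64 {
	h := fnv.New64a()
	var buf [8]byte
	for _, k := range keys {
		for i := 0; i < 8; i++ {
			buf[i] = byte(k >> uint(8*i))
		}
		h.Write(buf[:])
	}
	return h.Sum64()
}

// benchmarkHashKeys measures hashKeys over a fixed input of 1000 keys.
func benchmarkHashKeys(b *testing.B) {
	keys := make([]int64, 1000)
	for i := range keys {
		keys[i] = int64(i)
	}
	b.ReportAllocs() // include alloc/op and allocs/op in the result
	b.ResetTimer()   // exclude the setup above from the timing
	for i := 0; i < b.N; i++ {
		hashKeys(keys)
	}
}

func main() {
	res := testing.Benchmark(benchmarkHashKeys)
	fmt.Println(res.String(), res.MemString())
}
```

Running with `-count 10`, as suggested above, gives benchstat enough samples per benchmark to report a variance and a p-value.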

@SunRunAway SunRunAway left a comment

LGTM

@SunRunAway SunRunAway added the status/LGT1 Indicates that a PR has LGTM 1. label Sep 10, 2019
SunRunAway commented

@Reminiscent @qw4990 @XuHuaiyu PTAL, thanks.

qw4990 commented Sep 10, 2019

Could you please investigate why there is a performance regression in the second case, HashJoinExec/(rows:100000,_concurency:4,_joinKeyIdx:_[0])-4 189ms ±25% 215ms ±56% ~ (p=0.400 n=9+10)? @sduzh

sduzh commented Sep 10, 2019

> Could you please investigate why there is a performance regression in the second case, HashJoinExec/(rows:100000,_concurency:4,_joinKeyIdx:_[0])-4 189ms ±25% 215ms ±56% ~ (p=0.400 n=9+10)? @sduzh

I ran the benchmark again and could not reproduce that result.
You can check the latest benchmark results in the pull request description.

@sduzh sduzh closed this Sep 10, 2019
@sduzh sduzh reopened this Sep 10, 2019
XuHuaiyu commented

LGTM

rows := chk.NumRows()
switch tp.Tp {
case mysql.TypeTiny, mysql.TypeShort, mysql.TypeInt24, mysql.TypeLong, mysql.TypeLonglong, mysql.TypeYear:
i64s := column.Int64s()
Member

how about:

for i := 0; i < rows; i++ {
    if column.IsNull(i) {
        h[i].Write(NilFlag)
        continue
    }
    h[i].Write(column.GetRaw(i))
}

Contributor Author

Do you mean no need to write the flag byte?

Contributor Author

Ignoring the flag byte would generate a hash value different from the one generated by HashChunkRow.
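
To make the point about the flag byte concrete, here is a minimal sketch, not the code in this PR. The `column` type, the flag values, and the function names are all hypothetical simplifications; only the idea is taken from the discussion: the column-at-a-time path has to write exactly the same bytes per row as the row-at-a-time path, otherwise the two produce different hashes for the same row.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash"
	"hash/fnv"
)

// Hypothetical flag bytes; the real codec package defines its own values.
const (
	nilFlag byte = 0
	intFlag byte = 3
)

// column is a simplified stand-in for a chunk column of nullable int64 values.
type column struct {
	vals  []int64
	nulls []bool
}

// writeValue feeds one cell into h. A NULL always writes nilFlag; a non-NULL
// value writes its 8 raw bytes, prefixed by intFlag only when withFlag is true.
func writeValue(h hash.Hash64, c column, row int, withFlag bool) {
	if c.nulls[row] {
		h.Write([]byte{nilFlag})
		return
	}
	if withFlag {
		h.Write([]byte{intFlag})
	}
	var buf [8]byte
	binary.LittleEndian.PutUint64(buf[:], uint64(c.vals[row]))
	h.Write(buf[:])
}

// hashRowWise hashes one row across all columns and always writes the flag.
func hashRowWise(cols []column, row int) uint64 {
	h := fnv.New64a()
	for _, c := range cols {
		writeValue(h, c, row, true)
	}
	return h.Sum64()
}

// hashColumnWise keeps one hash state per row and walks the data one column
// at a time, which is the vectorized access pattern.
func hashColumnWise(cols []column, rows int, withFlag bool) []uint64 {
	hs := make([]hash.Hash64, rows)
	for i := range hs {
		hs[i] = fnv.New64a()
	}
	for _, c := range cols {
		for i := 0; i < rows; i++ {
			writeValue(hs[i], c, i, withFlag)
		}
	}
	out := make([]uint64, rows)
	for i, h := range hs {
		out[i] = h.Sum64()
	}
	return out
}

func main() {
	cols := []column{
		{vals: []int64{1, 2, 0}, nulls: []bool{false, false, true}},
		{vals: []int64{7, 0, 9}, nulls: []bool{false, true, false}},
	}
	withFlag := hashColumnWise(cols, 3, true)
	noFlag := hashColumnWise(cols, 3, false)
	for i := 0; i < 3; i++ {
		fmt.Println(withFlag[i] == hashRowWise(cols, i), noFlag[i] == hashRowWise(cols, i))
	}
}
```

In this sketch, dropping the flag is only safe if every code path that hashes a row drops it at the same time.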

Member

Okay. Maybe we can optimize this in another PR:

  1. optimize the functions we used to calculate the hash value.
  2. vectorize the way to calculate hash values for the outer table when performing a hash join.

@zz-jason zz-jason left a comment

LGTM

@zz-jason zz-jason added status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Sep 11, 2019
@zz-jason zz-jason removed the request for review from Reminiscent September 11, 2019 08:00
sre-bot commented Sep 11, 2019

/run-all-tests

@sre-bot sre-bot merged commit d29751c into pingcap:master Sep 11, 2019
Labels
contribution This PR is from a community contributor. sig/execution SIG execution status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2. type/enhancement The issue or PR belongs to an enhancement.
8 participants