executor: vectorize hash calculation in hashJoin (#12048) #12076

sduzh · 2019-09-08T10:07:54Z

What problem does this PR solve?

Fix issue #12048

What is changed and how it works?

benchstat results (three separate runs):

name                                                                    old time/op    new time/op    delta
HashJoinExec/(rows:100000,_concurency:4,_joinKeyIdx:_[0_1])-4              810ms ± 1%     806ms ± 1%   -0.54%  (p=0.021 n=8+8)
HashJoinExec/(rows:100000,_concurency:4,_joinKeyIdx:_[0])-4                145ms ±16%     135ms ± 3%   -7.23%  (p=0.001 n=10+8)
BuildHashTableForList/(rows:100000,_concurency:4,_joinKeyIdx:_[0_1])-4     594ms ± 1%     587ms ± 0%   -1.18%  (p=0.000 n=10+10)
BuildHashTableForList/(rows:100000,_concurency:4,_joinKeyIdx:_[0])-4      13.3ms ± 1%    10.8ms ± 0%  -18.85%  (p=0.000 n=10+9)
BuildHashTableForList/(rows:10,_concurency:4,_joinKeyIdx:_[0])-4          4.17µs ± 4%    4.23µs ± 3%     ~     (p=0.287 n=9+8)

name                                                                    old alloc/op   new alloc/op   delta
HashJoinExec/(rows:100000,_concurency:4,_joinKeyIdx:_[0_1])-4              304MB ± 0%     304MB ± 0%   +0.01%  (p=0.000 n=9+10)
HashJoinExec/(rows:100000,_concurency:4,_joinKeyIdx:_[0])-4                304MB ± 0%     304MB ± 0%   +0.01%  (p=0.000 n=10+10)
BuildHashTableForList/(rows:100000,_concurency:4,_joinKeyIdx:_[0_1])-4    7.11MB ± 0%    7.13MB ± 0%   +0.35%  (p=0.000 n=10+10)
BuildHashTableForList/(rows:100000,_concurency:4,_joinKeyIdx:_[0])-4      7.11MB ± 0%    7.14MB ± 0%   +0.36%  (p=0.000 n=10+10)
BuildHashTableForList/(rows:10,_concurency:4,_joinKeyIdx:_[0])-4          2.00kB ± 0%    2.28kB ± 0%  +13.57%  (p=0.000 n=10+10)

name                                                                    old allocs/op  new allocs/op  delta
HashJoinExec/(rows:100000,_concurency:4,_joinKeyIdx:_[0_1])-4               306k ± 0%      307k ± 0%   +0.35%  (p=0.000 n=10+9)
HashJoinExec/(rows:100000,_concurency:4,_joinKeyIdx:_[0])-4                 306k ± 0%      307k ± 0%   +0.34%  (p=0.000 n=10+10)
BuildHashTableForList/(rows:100000,_concurency:4,_joinKeyIdx:_[0_1])-4     3.79k ± 1%     4.80k ± 2%  +26.63%  (p=0.000 n=8+10)
BuildHashTableForList/(rows:100000,_concurency:4,_joinKeyIdx:_[0])-4       3.79k ± 0%     4.82k ± 0%  +27.04%  (p=0.000 n=10+9)
BuildHashTableForList/(rows:10,_concurency:4,_joinKeyIdx:_[0])-4            16.0 ± 0%      27.0 ± 0%  +68.75%  (p=0.000 n=10+10)

name                                                                    old time/op    new time/op    delta
HashJoinExec/(rows:100000,_concurency:4,_joinKeyIdx:_[0_1])-4              817ms ± 2%     806ms ± 1%   -1.34%  (p=0.011 n=9+8)
HashJoinExec/(rows:100000,_concurency:4,_joinKeyIdx:_[0])-4                145ms ± 7%     135ms ± 3%   -6.92%  (p=0.000 n=10+8)
BuildHashTableForList/(rows:100000,_concurency:4,_joinKeyIdx:_[0_1])-4     592ms ± 0%     587ms ± 0%   -0.96%  (p=0.000 n=9+10)
BuildHashTableForList/(rows:100000,_concurency:4,_joinKeyIdx:_[0])-4      13.4ms ± 2%    10.8ms ± 0%  -19.19%  (p=0.000 n=9+9)
BuildHashTableForList/(rows:10,_concurency:4,_joinKeyIdx:_[0])-4          4.16µs ± 5%    4.23µs ± 3%     ~     (p=0.173 n=10+8)

name                                                                    old alloc/op   new alloc/op   delta
HashJoinExec/(rows:100000,_concurency:4,_joinKeyIdx:_[0_1])-4              304MB ± 0%     304MB ± 0%   +0.01%  (p=0.000 n=10+10)
HashJoinExec/(rows:100000,_concurency:4,_joinKeyIdx:_[0])-4                304MB ± 0%     304MB ± 0%   +0.01%  (p=0.000 n=10+10)
BuildHashTableForList/(rows:100000,_concurency:4,_joinKeyIdx:_[0_1])-4    7.11MB ± 0%    7.13MB ± 0%   +0.34%  (p=0.000 n=10+10)
BuildHashTableForList/(rows:100000,_concurency:4,_joinKeyIdx:_[0])-4      7.11MB ± 0%    7.14MB ± 0%   +0.37%  (p=0.000 n=10+10)
BuildHashTableForList/(rows:10,_concurency:4,_joinKeyIdx:_[0])-4          2.00kB ± 0%    2.28kB ± 0%  +13.57%  (p=0.000 n=10+10)

name                                                                    old allocs/op  new allocs/op  delta
HashJoinExec/(rows:100000,_concurency:4,_joinKeyIdx:_[0_1])-4               306k ± 0%      307k ± 0%   +0.35%  (p=0.000 n=10+9)
HashJoinExec/(rows:100000,_concurency:4,_joinKeyIdx:_[0])-4                 306k ± 0%      307k ± 0%   +0.34%  (p=0.000 n=10+10)
BuildHashTableForList/(rows:100000,_concurency:4,_joinKeyIdx:_[0_1])-4     3.78k ± 2%     4.80k ± 2%  +26.84%  (p=0.000 n=10+10)
BuildHashTableForList/(rows:100000,_concurency:4,_joinKeyIdx:_[0])-4       3.79k ± 0%     4.82k ± 0%  +27.10%  (p=0.000 n=10+9)
BuildHashTableForList/(rows:10,_concurency:4,_joinKeyIdx:_[0])-4            16.0 ± 0%      27.0 ± 0%  +68.75%  (p=0.000 n=10+10)

name                                                                    old time/op    new time/op    delta
HashJoinExec/(rows:100000,_concurency:4,_joinKeyIdx:_[0_1])-4              810ms ± 1%     811ms ± 1%     ~     (p=0.541 n=8+9)
HashJoinExec/(rows:100000,_concurency:4,_joinKeyIdx:_[0])-4                145ms ±16%     143ms ±14%     ~     (p=0.739 n=10+10)
BuildHashTableForList/(rows:100000,_concurency:4,_joinKeyIdx:_[0_1])-4     594ms ± 1%     588ms ± 0%   -1.01%  (p=0.000 n=10+9)
BuildHashTableForList/(rows:100000,_concurency:4,_joinKeyIdx:_[0])-4      13.3ms ± 1%    10.9ms ± 2%  -18.17%  (p=0.000 n=10+10)
BuildHashTableForList/(rows:10,_concurency:4,_joinKeyIdx:_[0])-4          4.17µs ± 4%    4.44µs ±10%   +6.52%  (p=0.004 n=9+10)

name                                                                    old alloc/op   new alloc/op   delta
HashJoinExec/(rows:100000,_concurency:4,_joinKeyIdx:_[0_1])-4              304MB ± 0%     304MB ± 0%   +0.01%  (p=0.000 n=9+10)
HashJoinExec/(rows:100000,_concurency:4,_joinKeyIdx:_[0])-4                304MB ± 0%     304MB ± 0%   +0.01%  (p=0.000 n=10+10)
BuildHashTableForList/(rows:100000,_concurency:4,_joinKeyIdx:_[0_1])-4    7.11MB ± 0%    7.14MB ± 0%   +0.41%  (p=0.000 n=10+10)
BuildHashTableForList/(rows:100000,_concurency:4,_joinKeyIdx:_[0])-4      7.11MB ± 0%    7.14MB ± 0%   +0.35%  (p=0.000 n=10+10)
BuildHashTableForList/(rows:10,_concurency:4,_joinKeyIdx:_[0])-4          2.00kB ± 0%    2.28kB ± 0%  +13.57%  (p=0.000 n=10+10)

name                                                                    old allocs/op  new allocs/op  delta
HashJoinExec/(rows:100000,_concurency:4,_joinKeyIdx:_[0_1])-4               306k ± 0%      307k ± 0%   +0.34%  (p=0.000 n=10+10)
HashJoinExec/(rows:100000,_concurency:4,_joinKeyIdx:_[0])-4                 306k ± 0%      307k ± 0%   +0.34%  (p=0.000 n=10+9)
BuildHashTableForList/(rows:100000,_concurency:4,_joinKeyIdx:_[0_1])-4     3.79k ± 1%     4.82k ± 2%  +27.07%  (p=0.000 n=8+10)
BuildHashTableForList/(rows:100000,_concurency:4,_joinKeyIdx:_[0])-4       3.79k ± 0%     4.81k ± 0%  +26.95%  (p=0.000 n=10+10)
BuildHashTableForList/(rows:10,_concurency:4,_joinKeyIdx:_[0])-4            16.0 ± 0%      27.0 ± 0%  +68.75%  (p=0.000 n=10+10)

Check List

Tests

Unit test
Integration test

Code changes

Has exported function/method change

Side effects

Possible performance regression

Related changes

Release note

codecov · 2019-09-08T10:13:33Z

Codecov Report

Merging #12076 into master will decrease coverage by 0.0331%.
The diff coverage is 73.1707%.

@@               Coverage Diff                @@
##             master     #12076        +/-   ##
================================================
- Coverage   81.4863%   81.4532%   -0.0332%     
================================================
  Files           449        449                
  Lines         97058      96421       -637     
================================================
- Hits          79089      78538       -551     
+ Misses        12356      12266        -90     
- Partials       5613       5617         +4

shenli · 2019-09-08T13:27:08Z

@sduzh Thanks! Could you show some benchmark results?

SunRunAway

Hi @sduzh, your pull request looks awesome. I've left some comments; PTAL.

In addition, could you provide a report of the benchmark differences?
Two benchmarks are relevant to your pull request: BenchmarkHashJoinExec and BenchmarkBuildHashTableForList.

You can run them with:

go test -run=^$ -bench="BenchmarkHashJoinExec|BenchmarkBuildHashTableForList" -test.benchmem -count 10

Run this once on master and once on your branch, saving the two outputs to old.txt and new.txt respectively,
and then run:

benchstat old.txt new.txt

If needed, benchstat is available at https://godoc.org/golang.org/x/perf/cmd/benchstat. Then put the benchstat output in your pull request description.

SunRunAway

LGTM

SunRunAway · 2019-09-10T04:36:54Z

@Reminiscent @qw4990 @XuHuaiyu PTAL, thanks.

qw4990 · 2019-09-10T07:24:40Z

Could you please investigate why there is a performance regression in the second case, HashJoinExec/(rows:100000,_concurency:4,_joinKeyIdx:_[0])-4 189ms ±25% 215ms ±56% ~ (p=0.400 n=9+10)? @sduzh

sduzh · 2019-09-10T09:18:13Z

> Could you please investigate why there is a performance regression in the second case, HashJoinExec/(rows:100000,_concurency:4,_joinKeyIdx:_[0])-4 189ms ±25% 215ms ±56% ~ (p=0.400 n=9+10)? @sduzh

I ran the benchmarks again and could not reproduce that result.
The latest benchmark results are in the PR description.

XuHuaiyu · 2019-09-10T09:53:10Z

LGTM

zz-jason · 2019-09-11T07:27:29Z

util/codec/codec.go

+	rows := chk.NumRows()
+	switch tp.Tp {
+	case mysql.TypeTiny, mysql.TypeShort, mysql.TypeInt24, mysql.TypeLong, mysql.TypeLonglong, mysql.TypeYear:
+		i64s := column.Int64s()


How about:

	for i := 0; i < rows; i++ {
		if column.IsNull(i) {
			h[i].Write([]byte{NilFlag})
			continue
		}
		h[i].Write(column.GetRaw(i))
	}

Do you mean there is no need to write the flag byte?

Ignoring the flag byte would generate a different hash value from the one generated by HashChunkRow.
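The point about the flag byte can be illustrated with a small sketch. A hash join relies on equal keys producing equal hashes, so a column-at-a-time (vectorized) hash has to feed each row's hasher exactly the same bytes, in the same order, as a row-at-a-time hash such as HashChunkRow. The sketch assumes a simplified encoding (a one-byte type flag followed by an 8-byte big-endian int64, or a lone nil flag for NULL) and uses FNV-64 from the Go standard library; the flag values, the `column` type, and the function names are hypothetical and are not TiDB's codec. Because FNV hashes a byte stream incrementally, writing column by column into per-row hashers gives the same sums as hashing each row in one pass, while dropping the flag byte changes the byte stream and therefore the hash.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash"
	"hash/fnv"
)

// Hypothetical flag bytes for a simplified encoding (not TiDB's codec values).
const (
	nilFlag byte = 0
	intFlag byte = 3
)

// column is a hypothetical int64 column with a per-row NULL marker.
type column struct {
	vals  []int64
	nulls []bool
}

// writeInt feeds one value into h: a NULL is a lone nilFlag; otherwise an
// optional intFlag is followed by the 8-byte big-endian value.
func writeInt(h hash.Hash64, v int64, isNull, withFlag bool) {
	if isNull {
		h.Write([]byte{nilFlag})
		return
	}
	var buf [9]byte
	buf[0] = intFlag
	binary.BigEndian.PutUint64(buf[1:], uint64(v))
	if withFlag {
		h.Write(buf[:])
	} else {
		h.Write(buf[1:])
	}
}

// hashRowWise hashes each row in one pass over all key columns.
func hashRowWise(cols []column, rows int) []uint64 {
	sums := make([]uint64, rows)
	for i := 0; i < rows; i++ {
		h := fnv.New64()
		for _, c := range cols {
			writeInt(h, c.vals[i], c.nulls[i], true)
		}
		sums[i] = h.Sum64()
	}
	return sums
}

// hashColumnWise keeps one hasher per row and walks the key columns one at a
// time, feeding every row of the current column before moving on.
func hashColumnWise(cols []column, rows int, withFlag bool) []uint64 {
	hs := make([]hash.Hash64, rows)
	for i := range hs {
		hs[i] = fnv.New64()
	}
	for _, c := range cols {
		for i := 0; i < rows; i++ {
			writeInt(hs[i], c.vals[i], c.nulls[i], withFlag)
		}
	}
	sums := make([]uint64, rows)
	for i, h := range hs {
		sums[i] = h.Sum64()
	}
	return sums
}

func main() {
	cols := []column{
		{vals: []int64{1, 2, 3}, nulls: []bool{false, true, false}},
		{vals: []int64{10, 20, 30}, nulls: []bool{false, false, true}},
	}
	rowWise := hashRowWise(cols, 3)
	withFlag := hashColumnWise(cols, 3, true)
	noFlag := hashColumnWise(cols, 3, false)
	for i := range rowWise {
		// The first comparison is true for every row. The second is expected
		// to be false (barring a 64-bit hash collision) because every row has
		// at least one non-NULL value whose flag byte is dropped.
		fmt.Println(i, rowWise[i] == withFlag[i], rowWise[i] == noFlag[i])
	}
}
```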

Okay. Maybe we can optimize this in another PR:

optimize the functions we use to calculate hash values.

vectorize the hash-value calculation for the outer table when performing a hash join.

zz-jason

LGTM

sre-bot · 2019-09-11T08:02:18Z

/run-all-tests

sre-bot added the contribution label Sep 8, 2019

francis0407 added the sig/execution and type/enhancement labels Sep 8, 2019

SunRunAway self-requested a review September 8, 2019 15:23

XuHuaiyu self-requested a review September 9, 2019 01:46

qw4990 requested review from qw4990 and Reminiscent September 9, 2019 02:06

SunRunAway reviewed Sep 9, 2019

util/codec/codec.go (outdated, resolved)

executor/hash_table.go (outdated, resolved)

qw4990 reviewed Sep 9, 2019

executor/hash_table.go (outdated, resolved)

qw4990 reviewed Sep 9, 2019

util/codec/codec.go (outdated, resolved)

sduzh commented Sep 9, 2019

util/codec/codec.go (outdated, resolved)

SunRunAway reviewed Sep 9, 2019

SunRunAway reviewed Sep 10, 2019

SunRunAway added the status/LGT1 label Sep 10, 2019

SunRunAway requested a review from qw4990 September 10, 2019 04:36

qw4990 reviewed Sep 10, 2019

executor/hash_table.go (outdated, resolved)

qw4990 reviewed Sep 10, 2019

util/codec/codec.go (outdated, resolved)

executor: vectorize hash calculation in hashJoin (#12048)

0156b3e

XuHuaiyu reviewed Sep 10, 2019

executor/hash_table.go (resolved)

executor/hash_table.go (resolved)

sduzh closed this Sep 10, 2019

sduzh reopened this Sep 10, 2019

zz-jason reviewed Sep 10, 2019

util/codec/codec.go (resolved)

zz-jason reviewed Sep 11, 2019

zz-jason approved these changes Sep 11, 2019

zz-jason added the status/can-merge and status/LGT2 labels and removed the status/LGT1 label Sep 11, 2019

zz-jason removed the request for review from Reminiscent September 11, 2019 08:00

Merge branch 'master' into issue-12048

ab50e50

sre-bot merged commit d29751c into pingcap:master Sep 11, 2019

SunRunAway mentioned this pull request Sep 12, 2019

Vectorize hash calculation in hashJoin. #12048

Closed

2 tasks
