
executor, util: vectorize hash calculation during probing #12669

Merged
merged 4 commits into pingcap:master on Nov 5, 2019
Conversation

@sduzh
Contributor

sduzh commented Oct 13, 2019

What problem does this PR solve?

Fix #12048

What is changed and how it works?

name                                                                       old time/op  new time/op  delta
HashJoinExec/(rows:100000,_concurency:4,_joinKeyIdx:_[0_1],_disk:false)-8   716ms ± 1%   717ms ± 3%    ~     (p=1.000 n=16+20)
HashJoinExec/(rows:100000,_concurency:4,_joinKeyIdx:_[0],_disk:false)-8     102ms ± 2%   100ms ± 1%  -1.16%  (p=0.000 n=20+18)
HashJoinExec/(rows:100000,_concurency:4,_joinKeyIdx:_[0],_disk:true)-8      708ms ±13%   700ms ±11%    ~     (p=0.640 n=20+20)
HashJoinExec/(rows:1000,_concurency:4,_joinKeyIdx:_[0],_disk:true)-8       17.8ms ±20%  17.9ms ±15%    ~     (p=0.968 n=20+20)
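
To make the change concrete without digging through the diff, here is a minimal, self-contained sketch of the idea using a toy chunk type and hash/fnv. The type and helper names are hypothetical and this is not the TiDB codec API; it only mirrors the shape of the change: the old probe path hashed one row at a time, while the new path (via codec.HashChunkSelected) hashes an entire key column of the probe chunk up front, so the per-row probe loop only reads the finished sums.

package main

import (
	"encoding/binary"
	"fmt"
	"hash"
	"hash/fnv"
)

// chunk is a toy stand-in for a columnar batch with one int64 key column.
type chunk struct {
	keys []int64
}

// hashRow mimics the old per-row path: build and fill a hasher for a single probe row.
func hashRow(c *chunk, i int) uint64 {
	h := fnv.New64()
	var buf [8]byte
	binary.LittleEndian.PutUint64(buf[:], uint64(c.keys[i]))
	h.Write(buf[:])
	return h.Sum64()
}

// hashChunk mimics the vectorized path: walk the key column once and feed
// every selected row's key bytes into its pre-allocated hasher.
func hashChunk(c *chunk, hashVals []hash.Hash64, selected []bool) {
	var buf [8]byte
	for i, k := range c.keys {
		if !selected[i] {
			continue
		}
		binary.LittleEndian.PutUint64(buf[:], uint64(k))
		hashVals[i].Write(buf[:])
	}
}

func main() {
	c := &chunk{keys: []int64{1, 2, 3, 4}}
	selected := []bool{true, false, true, true}

	// Vectorized: hash the whole key column up front...
	hashVals := make([]hash.Hash64, len(c.keys))
	for i := range hashVals {
		hashVals[i] = fnv.New64()
	}
	hashChunk(c, hashVals, selected)

	// ...then the probe loop only reads the precomputed sums.
	for i := range c.keys {
		if !selected[i] {
			continue
		}
		fmt.Printf("row %d: row-wise=%d vectorized=%d\n", i, hashRow(c, i), hashVals[i].Sum64())
	}
}

With more than one join key column, the PR's probe loop calls codec.HashChunkSelected once per key column (as shown in the review snippet below), so each row's hasher accumulates all of its key bytes before Sum64() is taken.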

Check List

Tests

  • Unit test

Code changes

  • Has exported function/method change

Side effects

  • Possible performance regression
  • Increased code complexity

Related changes

Release note

  • Write release note for bug-fix or new feature.

@sre-bot added the contribution label (This PR is from a community contributor.) on Oct 13, 2019
@codecov

codecov bot commented Oct 13, 2019

Codecov Report

Merging #12669 into master will not change coverage.
The diff coverage is n/a.

@@             Coverage Diff             @@
##             master     #12669   +/-   ##
===========================================
  Coverage   80.4487%   80.4487%           
===========================================
  Files           468        468           
  Lines        112898     112898           
===========================================
  Hits          90825      90825           
  Misses        15167      15167           
  Partials       6906       6906

@qw4990
Contributor

qw4990 left a comment


Thanks for your contribution!
I wonder why there is no performance gain.
How many rows are in the outer table in your test?
PTAL @sduzh

@winoros changed the title from "expression: vectorize hash calculation during probing (#12048)" to "executor, util: vectorize hash calculation during probing (#12048)" on Oct 15, 2019
@SunRunAway changed the title from "executor, util: vectorize hash calculation during probing (#12048)" to "executor, util: vectorize hash calculation during probing" on Oct 17, 2019
executor/join.go Outdated

hCtx.initHash(outerChk.NumRows())
for _, i := range hCtx.keyColIdx {
err = codec.HashChunkSelected(e.rowContainer.sc, hCtx.hashVals, outerChk, hCtx.allTypes[i], i, hCtx.buf, hCtx.hasNull, selected)
Contributor

Suggested change
err = codec.HashChunkSelected(e.rowContainer.sc, hCtx.hashVals, outerChk, hCtx.allTypes[i], i, hCtx.buf, hCtx.hasNull, selected)
err = codec.HashChunkSelected(e.ctx, hCtx.hashVals, outerChk, hCtx.allTypes[i], i, hCtx.buf, hCtx.hasNull, selected)

Contributor Author

The implementation of HashChunkColumns relies on HashChunkSelected.
It would be inconvenient to pass a sessionctx.Context down from HashChunkColumns.

@SunRunAway
Contributor

Hi, @sduzh
Thank you for your PR, your code looks great.

Could you help do a little more profiling?

Use go test -run=XX -bench="BenchmarkHashJoinExec" -benchmem -count 5 -memprofile memprofile.out -cpuprofile profile.out to collect profile.out

Then use go tool pprof -http=":8080" profile.out to analyze it with a flame graph, or use go tool pprof profile.out and the list command in the CLI to analyze the cumulative cost of hash calculation during probing. You may want to rename getJoinKeyFromChkRow on the probe side so you can see its cumulative cost during probing in the old code.

Also, you may target just this test, HashJoinExec/(rows:100000,_concurency:4,_joinKeyIdx:_[0],_disk:false)-8, and temporarily disable the others.

util/codec/codec.go Outdated (resolved)
@sduzh
Contributor Author

sduzh commented Oct 17, 2019

Thanks for your advice, I will try it tonight.

@sduzh
Contributor Author

sduzh commented Oct 17, 2019

@SunRunAway
Results on this new branch:

➜  executor git:(vec-hash-probe) ✗ go tool pprof profile.out
Type: cpu
Time: Oct 18, 2019 at 1:27am (CST)
Duration: 23.05s, Total samples = 38.50s (167.05%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) list join2Chunk
Total: 38.50s
ROUTINE ======================== github.com/pingcap/tidb/executor.(*HashJoinExec).join2Chunk in /Users/zhuming/go/src/github.com/sduzh/tidb/executor/join.go
     130ms     10.07s (flat, cum) 26.16% of Total
         .          .    421:		return false, joinResult
         .          .    422:	}
         .          .    423:
         .          .    424:	hCtx.initHash(outerChk.NumRows())
         .          .    425:	for _, i := range hCtx.keyColIdx {
         .       20ms    426:		err = codec.HashChunkSelected(e.rowContainer.sc, hCtx.hashVals, outerChk, hCtx.allTypes[i], i, hCtx.buf, hCtx.hasNull, selected)
         .          .    427:		if err != nil {
         .          .    428:			joinResult.err = err
         .          .    429:			return false, joinResult
         .          .    430:		}
         .          .    431:	}
         .          .    432:
      20ms       20ms    433:	for i := range selected {
      50ms       50ms    434:		if !selected[i] || hCtx.hasNull[i] { // process unmatched outer rows
         .          .    435:			e.joiners[workerID].onMissMatch(false, outerChk.GetRow(i), joinResult.chk)
         .          .    436:		} else { // process matched outer rows
      30ms       50ms    437:			probeKey, probeRow := hCtx.hashVals[i].Sum64(), outerChk.GetRow(i)
      30ms      9.93s    438:			ok, joinResult = e.joinMatchedOuterRow2Chunk(workerID, probeKey, probeRow, hCtx, joinResult)
         .          .    439:			if !ok {
         .          .    440:				return false, joinResult
         .          .    441:			}
         .          .    442:		}
         .          .    443:		if joinResult.chk.IsFull() {

Results on the master branch:

➜  executor git:(master) ✗ go tool pprof profile.out.old
Type: cpu
Time: Oct 18, 2019 at 1:34am (CST)
Duration: 20.49s, Total samples = 33.69s (164.45%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) list join2Chunk
Total: 33.69s
ROUTINE ======================== github.com/pingcap/tidb/executor.(*HashJoinExec).join2Chunk in /Users/zhuming/go/src/github.com/sduzh/tidb/executor/join.go
      30ms      9.85s (flat, cum) 29.24% of Total
         .          .    412:}
         .          .    413:
         .          .    414:func (e *HashJoinExec) join2Chunk(workerID uint, outerChk *chunk.Chunk, hCtx *hashContext, joinResult *hashjoinWorkerResult,
         .          .    415:	selected []bool) (ok bool, _ *hashjoinWorkerResult) {
         .          .    416:	var err error
      10ms       20ms    417:	selected, err = expression.VectorizedFilter(e.ctx, e.outerFilter, chunk.NewIterator4Chunk(outerChk), selected)
         .          .    418:	if err != nil {
         .          .    419:		joinResult.err = err
         .          .    420:		return false, joinResult
         .          .    421:	}
         .          .    422:	for i := range selected {
      10ms       10ms    423:		if !selected[i] { // process unmatched outer rows
         .          .    424:			e.joiners[workerID].onMissMatch(false, outerChk.GetRow(i), joinResult.chk)
         .          .    425:		} else { // process matched outer rows
      10ms      9.82s    426:			ok, joinResult = e.joinMatchedOuterRow2Chunk(workerID, outerChk.GetRow(i), hCtx, joinResult)
         .          .    427:			if !ok {
         .          .    428:				return false, joinResult
         .          .    429:			}
         .          .    430:		}
         .          .    431:		if joinResult.chk.IsFull() {
(pprof) list getJoinKeyFromChkRow
Total: 33.69s
ROUTINE ======================== github.com/pingcap/tidb/executor.(*hashRowContainer).getJoinKeyFromChkRowX in /Users/zhuming/go/src/github.com/sduzh/tidb/executor/hash_table.go
      30ms      290ms (flat, cum)  0.86% of Total
         .          .    239:}
         .          .    240:
         .          .    241:// getJoinKeyFromChkRow fetches join keys from row and calculate the hash value.
         .          .    242:func (*hashRowContainer) getJoinKeyFromChkRow(sc *stmtctx.StatementContext, row chunk.Row, hCtx *hashContext) (hasNull bool, key uint64, err error) {
         .          .    243:	for _, i := range hCtx.keyColIdx {
         .       10ms    244:		if row.IsNull(i) {
         .          .    245:			return true, 0, nil
         .          .    246:		}
         .          .    247:	}
      10ms       50ms    248:	hCtx.initHash(1)
      10ms      220ms    249:	err = codec.HashChunkRow(sc, hCtx.hashVals[0], row, hCtx.allTypes, hCtx.keyColIdx, hCtx.buf)
      10ms       10ms    250:	return false, hCtx.hashVals[0].Sum64(), err
         .          .    251:}
         .          .    252:
         .          .    253:// Len returns the length of the records in hashRowContainer.
         .          .    254:func (c hashRowContainer) Len() int {
         .          .    255:	return c.hashTable.Len()
(pprof)

@sduzh
Contributor Author

sduzh commented Oct 21, 2019

@SunRunAway

It looks like the cost of hash calculation takes only a small fraction of the total cost of the hash probing phase, less than 1% in both branches.

@SunRunAway
Contributor

SunRunAway commented Oct 23, 2019

Sorry for the late reply, I was really busy last week.

And thanks for your work, the results and conclusion you posted are really helpful.
I think the problem is with the test case.
The test case has to join two tables that both have wide columns (see

{Index: 1, RetType: types.NewFieldType(mysql.TypeVarString)},

the column at index 1 is a string type whose size is 5K).
Could you try to design a new test case?
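
As an illustration of why narrow keys make the difference visible, here is a toy benchmark sketch. It is purely hypothetical and is not the TiDB HashJoinExec benchmark harness; it only isolates the two hashing strategies on 100k int64 keys. With 5 KB string payloads in the loop, any gap between them would be lost in the copying cost.

package probe

import (
	"encoding/binary"
	"hash"
	"hash/fnv"
	"testing"
)

const numRows = 100000

var keys = func() []int64 {
	k := make([]int64, numRows)
	for i := range k {
		k[i] = int64(i * 7)
	}
	return k
}()

// BenchmarkHashPerRow builds a fresh hasher for every probe row, like the
// old row-at-a-time path.
func BenchmarkHashPerRow(b *testing.B) {
	var buf [8]byte
	for n := 0; n < b.N; n++ {
		for _, k := range keys {
			h := fnv.New64()
			binary.LittleEndian.PutUint64(buf[:], uint64(k))
			h.Write(buf[:])
			_ = h.Sum64()
		}
	}
}

// BenchmarkHashPerChunk reuses one slice of hashers per chunk and walks the
// key column in a single pass, like the vectorized path.
func BenchmarkHashPerChunk(b *testing.B) {
	hashVals := make([]hash.Hash64, numRows)
	for i := range hashVals {
		hashVals[i] = fnv.New64()
	}
	var buf [8]byte
	b.ResetTimer()
	for n := 0; n < b.N; n++ {
		for i, k := range keys {
			hashVals[i].Reset()
			binary.LittleEndian.PutUint64(buf[:], uint64(k))
			hashVals[i].Write(buf[:])
			_ = hashVals[i].Sum64()
		}
	}
}

Running it with go test -bench=. compares the hashing strategies in isolation; the actual numbers will of course differ from the full hash-join benchmark.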

@sre-bot
Contributor

sre-bot commented Oct 23, 2019

Benchmark Report

Run Sysbench Performance Test on VMs

@@                               Benchmark Diff                               @@
================================================================================
--- tidb: 5d5497bfeb4172ac97016ea632f0c36b1a6d8457
+++ tidb: b5187fd172df965f11c33d672e6e513dbad0fe39
tikv: 57aa7ba9edeb4132ba8bcef8ec916952ad343b0f
pd: 92a4c01b346e5190f4f884c4fe01af913a343a6c
================================================================================
oltp_update_non_index:
    * QPS: 4863.92 ± 0.24% (std=7.83) delta: -0.11% (p=0.373)
    * Latency p50: 26.31 ± 0.24% (std=0.04) delta: 0.11%
    * Latency p99: 42.64 ± 5.32% (std=1.62) delta: 3.72%
            
oltp_insert:
    * QPS: 3804.40 ± 0.89% (std=27.93) delta: -0.87% (p=0.171)
    * Latency p50: 33.64 ± 0.88% (std=0.25) delta: 0.88%
    * Latency p99: 78.13 ± 1.20% (std=0.66) delta: 0.30%
            
oltp_read_write:
    * QPS: 16263.35 ± 0.32% (std=42.76) delta: 0.04% (p=0.836)
    * Latency p50: 157.78 ± 0.35% (std=0.37) delta: 0.03%
    * Latency p99: 313.07 ± 2.27% (std=4.69) delta: -2.23%
            
oltp_update_index:
    * QPS: 4369.31 ± 0.23% (std=7.19) delta: 0.09% (p=0.807)
    * Latency p50: 29.29 ± 0.22% (std=0.04) delta: -0.11%
    * Latency p99: 55.84 ± 3.64% (std=1.24) delta: 1.23%
            
oltp_point_select:
    * QPS: 37668.72 ± 0.45% (std=129.26) delta: 0.16% (p=0.612)
    * Latency p50: 3.40 ± 0.44% (std=0.01) delta: -0.15%
    * Latency p99: 10.65 ± 0.00% (std=0.00) delta: 0.00%
            

@sduzh
Contributor Author

sduzh commented Oct 23, 2019

Sorry for the late reply, I was really busy last week.

And thanks for your work, the results and conclusion you posted are really helpful.
I think the problem is with the test case.
The test case has to join two tables that both have wide columns (see

{Index: 1, RetType: types.NewFieldType(mysql.TypeVarString)},

the column at index 1 is a string type whose size is 5K).
Could you try to design a new test case?

No problem.

@sre-bot
Contributor

sre-bot commented Oct 23, 2019

Benchmark Report

Run Sysbench Performance Test on VMs

@@                               Benchmark Diff                               @@
================================================================================
--- tidb: c6d284e1de82635e2822563ee59290ce2ccc32fc
+++ tidb: b5187fd172df965f11c33d672e6e513dbad0fe39
tikv: 57aa7ba9edeb4132ba8bcef8ec916952ad343b0f
pd: 92a4c01b346e5190f4f884c4fe01af913a343a6c
================================================================================
oltp_update_non_index:
    * QPS: 4866.90 ± 0.23% (std=7.34) delta: -0.03% (p=0.729)
    * Latency p50: 26.30 ± 0.22% (std=0.04) delta: 0.04%
    * Latency p99: 41.68 ± 4.11% (std=1.12) delta: 1.41%
            
oltp_insert:
    * QPS: 4679.99 ± 0.31% (std=10.46) delta: 0.23% (p=0.649)
    * Latency p50: 27.34 ± 0.32% (std=0.06) delta: -0.25%
    * Latency p99: 51.29 ± 4.99% (std=2.11) delta: 2.68%
            
oltp_read_write:
    * QPS: 15129.68 ± 0.15% (std=15.67) delta: -0.27% (p=0.919)
    * Latency p50: 169.56 ± 0.12% (std=0.15) delta: 0.31%
    * Latency p99: 318.26 ± 1.20% (std=2.70) delta: 0.00%
            
oltp_update_index:
    * QPS: 4373.52 ± 0.20% (std=6.79) delta: -0.01% (p=0.410)
    * Latency p50: 29.26 ± 0.19% (std=0.05) delta: -0.03%
    * Latency p99: 54.84 ± 3.56% (std=1.20) delta: 0.02%
            
oltp_point_select:
    * QPS: 43412.50 ± 0.79% (std=207.43) delta: -0.16% (p=0.726)
    * Latency p50: 2.95 ± 0.76% (std=0.01) delta: 0.08%
    * Latency p99: 9.50 ± 1.19% (std=0.08) delta: -1.03%
            

@XuHuaiyu
Contributor

Friendly ping, is this PR still a work in progress? @sduzh

@XuHuaiyu removed their request for review on October 29, 2019 02:32
@sduzh
Contributor Author

sduzh commented Oct 29, 2019

Sorry for the late reply.
As @SunRunAway mentioned above, the original test case has to join the wide string columns, which results in a lot of string copies.
So I designed a new test case as suggested by @SunRunAway, replacing the string columns with double columns, and the result is much better:

➜  executor git:(vec-hash-probe) ✗ benchstat old.txt new.txt
name                                                                                                        old time/op  new time/op  delta
HashJoinExec/(rows:100000,_cols:[[bigint(20)_double],_concurency:4,_joinKeyIdx:_[0_1],_disk:false)-8         28.7ms ± 1%  23.5ms ± 1%  -18.10%  (p=0.008 n=5+5)
HashJoinExec/(rows:100000,_cols:[bigint(20)_double],_concurency:4,_joinKeyIdx:_[0],_disk:false)-8           26.2ms ± 3%  22.8ms ± 4%  -12.77%  (p=0.008 n=5+5)

@XuHuaiyu

@SunRunAway
Contributor

SunRunAway left a comment

LGTM

@SunRunAway
Contributor

/run-all-tests

@SunRunAway added the status/LGT1 label (Indicates that a PR has LGTM 1.) on Oct 30, 2019
@SunRunAway
Contributor

Sorry for the late reply.
As @SunRunAway mentioned above, the original test case has to join the wide string columns, which results in a lot of string copies.
So I designed a new test case as suggested by @SunRunAway, replacing the string columns with double columns, and the result is much better:

➜  executor git:(vec-hash-probe) ✗ benchstat old.txt new.txt
name                                                                                                        old time/op  new time/op  delta
HashJoinExec/(rows:100000,_cols:[[bigint(20)_double],_concurency:4,_joinKeyIdx:_[0_1],_disk:false)-8         28.7ms ± 1%  23.5ms ± 1%  -18.10%  (p=0.008 n=5+5)
HashJoinExec/(rows:100000,_cols:[bigint(20)_double],_concurency:4,_joinKeyIdx:_[0],_disk:false)-8           26.2ms ± 3%  22.8ms ± 4%  -12.77%  (p=0.008 n=5+5)

@XuHuaiyu

Great job! You can post the newest benchmark result in the PR description.

@SunRunAway
Contributor

/bench +tpch

@sre-bot
Contributor

sre-bot commented Oct 30, 2019

Benchmark Report

Run TPC-H 10G Performance Test on VMs

@@                               Benchmark Diff                               @@
================================================================================
--- tidb: 08d26a3be1b88ccfd277b134f9f1687acab16a11
+++ tidb: f6d0b4df3ed569fdbe3606b8f428a5cac6d7a99c
tikv: a9c108185f79e2fd7dfbfcaad762b9413c99ce8a
pd: 17383d9ffdfd47baec03c6166642e7b137d19840
================================================================================
01.sql duration: 19842.79 ms ± 1.37% (std=272.12) delta: -5.22% (p=0.553)
02.sql duration: 7831.95 ms ± 1.83% (std=143.13) delta: -1.67% (p=0.623)
03.sql duration: 18403.03 ms ± 0.24% (std=43.40) delta: 0.44% (p=0.867)
04.sql duration: 8190.27 ms ± 2.16% (std=177.18) delta: 4.82% (p=0.237)
06.sql duration: 11367.20 ms ± 0.65% (std=74.13) delta: -4.30% (p=0.086)
07.sql duration: 18114.63 ms ± 2.23% (std=403.27) delta: -2.37% (p=0.467)
08.sql duration: 12945.07 ms ± 1.17% (std=151.50) delta: -3.43% (p=0.199)
09.sql duration: 31994.01 ms ± 0.49% (std=155.56) delta: -3.84% (p=0.123)
10.sql duration: 10331.48 ms ± 1.18% (std=122.01) delta: -3.25% (p=0.199)
11.sql duration: 3765.94 ms ± 3.92% (std=147.75) delta: 2.52% (p=0.644)
12.sql duration: 13707.02 ms ± 3.08% (std=422.02) delta: 0.89% (p=0.836)
13.sql duration: 9089.68 ms ± 1.88% (std=170.87) delta: 0.15% (p=0.955)
14.sql duration: 14037.29 ms ± 1.60% (std=224.35) delta: 2.24% (p=0.389)
15.sql duration: 23276.33 ms ± 0.69% (std=161.64) delta: -2.99% (p=0.142)
16.sql duration: 3486.62 ms ± 3.04% (std=106.01) delta: -7.43% (p=0.628)
17.sql duration: 37964.56 ms ± 1.16% (std=439.20) delta: 3.78% (p=0.181)
18.sql duration: 55353.45 ms ± 0.39% (std=216.71) delta: -0.73% (p=0.800)
19.sql duration: 17942.35 ms ± 0.99% (std=177.83) delta: 0.41% (p=0.837)
20.sql duration: 13066.79 ms ± 0.28% (std=37.24) delta: -5.57% (p=0.405)
21.sql duration: 36382.42 ms ± 0.45% (std=163.61) delta: 1.10% (p=0.207)
22.sql duration: 5004.16 ms ± 0.13% (std=6.65) delta: 0.81% (p=0.774)

@XuHuaiyu
Contributor

04.sql duration: 8190.27 ms ± 2.16% (std=177.18) delta: 4.82% (p=0.237)

Is this expected?

@sduzh
Contributor Author

sduzh commented Oct 31, 2019

04.sql duration: 8190.27 ms ± 2.16% (std=177.18) delta: 4.82% (p=0.237)

Is this expected?

No

@XuHuaiyu
Contributor

XuHuaiyu left a comment

LGTM, please resolve the conflicts

@XuHuaiyu added the status/LGT2 label (Indicates that a PR has LGTM 2.) on Nov 1, 2019
@XuHuaiyu removed the request for review from qw4990 on November 1, 2019 07:35
@SunRunAway
Contributor

/merge

@sre-bot added the status/can-merge label (Indicates a PR has been approved by a committer.) on Nov 5, 2019
@SunRunAway removed the status/LGT1 label (Indicates that a PR has LGTM 1.) on Nov 5, 2019
@sre-bot
Contributor

sre-bot commented Nov 5, 2019

/run-all-tests

@sre-bot
Contributor

sre-bot commented Nov 5, 2019

@sduzh merge failed.

@SunRunAway
Contributor

/merge

@sre-bot
Contributor

sre-bot commented Nov 5, 2019

/run-all-tests

@sre-bot
Contributor

sre-bot commented Nov 5, 2019

@sduzh merge failed.

@SunRunAway
Contributor

/run-unit-test

@SunRunAway merged commit b697fac into pingcap:master on Nov 5, 2019
@SunRunAway
Contributor

Thank you, @sduzh.

Labels: contribution (This PR is from a community contributor.), sig/execution (SIG execution), status/can-merge (Indicates a PR has been approved by a committer.), status/LGT2 (Indicates that a PR has LGTM 2.)

Successfully merging this pull request may close these issues: Vectorize hash calculation in hashJoin.

6 participants