use a more cache-friendly way to make join rows #7420

zz-jason · 2018-08-16T17:26:00Z

For left outer join, right outer join, and inner joins, we use the function makeJoinRowToChunk to concat the left and right records, take the left outer join as an example:

in file executor/joiner.go:

341     for ; inners.Current() != inners.End() && numToAppend > 0; numToAppend-- {
342         j.makeJoinRowToChunk(chkForJoin, outer, inners.Current())
343         inners.Next()
344     }

This method is far away from efficiency: Chunk has an column-oriented memory layout, data inside the same column are stored continuously in the memory. The relationship between row and column can be expressed like the following picture:

The function makeJoinRowToChunk copies data from the left and right row to the chunk field by field, which can be expressed like the following graph:

As you can see, the cache utilization of this method is not efficient, which may introduce a lot of cache miss and data swap. Usually, the L1 data cache size is 32KB, the chunk size we use is 1K(1024 rows), if the joined table has too many columns, these data to be calculated can easily exceed the L1 data cache.

If we copy the data column by column, the L1 data cache miss ratio can be greatly reduced during the execution, and the performance can also be improved.

The text was updated successfully, but these errors were encountered:

shenli · 2018-08-17T03:49:18Z

Sounds reasonable. Is there any quantitative evaluation of this issue?

zz-jason · 2018-08-17T04:53:20Z

@shenli Here is a simple benchmark, the benchmark result is:

➜ go test -bench Benchmark -count 3 --benchmem
goos: darwin
goarch: amd64
BenchmarkCopyFieldByField-8                20000             61823 ns/op               0 B/op          0 allocs/op
BenchmarkCopyFieldByField-8                20000             64356 ns/op               0 B/op          0 allocs/op
BenchmarkCopyFieldByField-8                20000             65751 ns/op               0 B/op          0 allocs/op
BenchmarkCopyColumnByColumn-8             100000             19014 ns/op               0 B/op          0 allocs/op
BenchmarkCopyColumnByColumn-8             100000             19174 ns/op               0 B/op          0 allocs/op
BenchmarkCopyColumnByColumn-8             100000             18989 ns/op               0 B/op          0 allocs/op
PASS
ok      _/tmp   12.195s

the benchmark code is:

package main

import "testing"

var (
    numRows = 1024
    numCols = 16
)

func genData() [][]int64 {
    columns := make([][]int64, numCols)
    for i := 0; i < numCols; i++ {
        columns[i] = make([]int64, numRows)
    }
    return columns
}

func BenchmarkCopyFieldByField(b *testing.B) {
    src := genData()
    dst := genData()

    b.ResetTimer()
    for counter := 0; counter < b.N; counter++ {
        for i := 0; i < numRows; i++ {
            for j := 0; j < numCols; j++ {
                dst[j][i] = src[j][i]
            }
        }
    }
}

func BenchmarkCopyColumnByColumn(b *testing.B) {
    src := genData()
    dst := genData()

    b.ResetTimer()
    for counter := 0; counter < b.N; counter++ {
        for j := 0; j < numCols; j++ {
            for i := 0; i < numRows; i++ {
                dst[j][i] = src[j][i]
            }
        }
    }
}

shenli · 2018-08-17T04:54:47Z

Great! Much faster!

zz-jason · 2018-08-17T04:59:47Z

And with the "copy column by column" method, we can copy a continuously memory, which also speeds up the execution time:

➜ go test -bench Benchmark -count 3 --benchmem
goos: darwin
goarch: amd64
BenchmarkCopyFieldByField-8                        30000             59081 ns/op               0 B/op          0 allocs/op
BenchmarkCopyFieldByField-8                        20000             59146 ns/op               0 B/op          0 allocs/op
BenchmarkCopyFieldByField-8                        30000             58260 ns/op               0 B/op          0 allocs/op
BenchmarkCopyColumnByColumn-8                     100000             17915 ns/op               0 B/op          0 allocs/op
BenchmarkCopyColumnByColumn-8                     100000             18199 ns/op               0 B/op          0 allocs/op
BenchmarkCopyColumnByColumn-8                     100000             18134 ns/op               0 B/op          0 allocs/op
BenchmarkCopyColumnByColumnContinuously-8         300000              3712 ns/op               0 B/op          0 allocs/op
BenchmarkCopyColumnByColumnContinuously-8         300000              3635 ns/op               0 B/op          0 allocs/op
BenchmarkCopyColumnByColumnContinuously-8         300000              3628 ns/op               0 B/op          0 allocs/op
PASS
ok      _/tmp   15.956s

package main

import "testing"

var (
    numRows = 1024
    numCols = 16
)

func genData() [][]int64 {
    columns := make([][]int64, numCols)
    for i := 0; i < numCols; i++ {
        columns[i] = make([]int64, numRows)
    }
    return columns
}

func BenchmarkCopyFieldByField(b *testing.B) {
    src := genData()
    dst := genData()

    b.ResetTimer()
    for counter := 0; counter < b.N; counter++ {
        for i := 0; i < numRows; i++ {
            for j := 0; j < numCols; j++ {
                dst[j][i] = src[j][i]
            }
        }
    }
}

func BenchmarkCopyColumnByColumn(b *testing.B) {
    src := genData()
    dst := genData()

    b.ResetTimer()
    for counter := 0; counter < b.N; counter++ {
        for j := 0; j < numCols; j++ {
            for i := 0; i < numRows; i++ {
                dst[j][i] = src[j][i]
            }
        }
    }
}

func BenchmarkCopyColumnByColumnContinuously(b *testing.B) {
    src := genData()
    dst := genData()

    b.ResetTimer()
    for counter := 0; counter < b.N; counter++ {
        for j := 0; j < numCols; j++ {
            copy(dst[j], src[j])
        }
    }
}

zz-jason · 2018-08-18T14:16:34Z

Just fund that the cache utilization of function joiner.filter() is also not efficient:

145 func (j *baseJoiner) filter(input, output *chunk.Chunk) (matched bool, err error) {
146     j.selected, err = expression.VectorizedFilter(j.ctx, j.conditions, chunk.NewIterator4Chunk(input), j.selected)
147     if err != nil {
148         return false, errors.Trace(err)
149     }
150     for i := 0; i < len(j.selected); i++ {
151         if !j.selected[i] {
152             continue
153         }
154         matched = true
155         output.AppendRow(input.GetRow(i))
156     }
157     return matched, nil
158 }

We should copy the selected row column by column as well.

laidahe · 2018-08-19T14:10:40Z

In my perspective, the first argument of tryToMatch of joiner interface should be changed to pass chunk.column rather than chunk.Row. Am I right? If so, there are many places must to be changed.

zz-jason · 2018-08-19T14:46:38Z

@laidahe No, it should be row, not chunk.column, just need to add another function to replace makeJoinRow

supernan1994 · 2018-08-19T14:57:52Z

I'm working on it~

zz-jason · 2018-08-19T14:58:25Z

@supernan1994 Oh, Thanks!

zz-jason · 2018-09-06T16:10:48Z

fixed by #7493

zz-jason added type/enhancement The issue or PR belongs to an enhancement. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. sig/execution SIG execution labels Aug 16, 2018

zz-jason added this to the 2.1 milestone Aug 17, 2018

zz-jason added the good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. label Aug 18, 2018

zz-jason added the status/WIP label Aug 19, 2018

crazycs520 mentioned this issue Aug 19, 2018

executor/join : use shallow copy for join. #7433

Merged

zz-jason assigned crazycs520 Aug 28, 2018

zz-jason removed good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. labels Aug 28, 2018

zz-jason closed this as completed Sep 6, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

use a more cache-friendly way to make join rows #7420

use a more cache-friendly way to make join rows #7420

zz-jason commented Aug 16, 2018

shenli commented Aug 17, 2018 •

edited

Loading

zz-jason commented Aug 17, 2018 •

edited

Loading

shenli commented Aug 17, 2018

zz-jason commented Aug 17, 2018

zz-jason commented Aug 18, 2018

laidahe commented Aug 19, 2018

zz-jason commented Aug 19, 2018

supernan1994 commented Aug 19, 2018

zz-jason commented Aug 19, 2018

zz-jason commented Sep 6, 2018

use a more cache-friendly way to make join rows #7420

use a more cache-friendly way to make join rows #7420

Comments

zz-jason commented Aug 16, 2018

shenli commented Aug 17, 2018 • edited Loading

zz-jason commented Aug 17, 2018 • edited Loading

shenli commented Aug 17, 2018

zz-jason commented Aug 17, 2018

zz-jason commented Aug 18, 2018

laidahe commented Aug 19, 2018

zz-jason commented Aug 19, 2018

supernan1994 commented Aug 19, 2018

zz-jason commented Aug 19, 2018

zz-jason commented Sep 6, 2018

shenli commented Aug 17, 2018 •

edited

Loading

zz-jason commented Aug 17, 2018 •

edited

Loading