
use a more cache-friendly way to make join rows #7420

Closed
zz-jason opened this issue Aug 16, 2018 · 10 comments
Labels: sig/execution (SIG execution), type/enhancement (The issue or PR belongs to an enhancement.)

@zz-jason (Member) commented Aug 16, 2018

For left outer joins, right outer joins, and inner joins, we use the function makeJoinRowToChunk to concatenate the left and right records. Take the left outer join as an example:

in file executor/joiner.go:

341     for ; inners.Current() != inners.End() && numToAppend > 0; numToAppend-- {
342         j.makeJoinRowToChunk(chkForJoin, outer, inners.Current())
343         inners.Next()
344     }

This method is far from efficient: Chunk has a column-oriented memory layout, so data within the same column is stored contiguously in memory. The relationship between rows and columns can be expressed by the following picture:

[figure: column-oriented memory layout of a Chunk]

The function makeJoinRowToChunk copies data from the left and right rows into the chunk field by field, which can be expressed by the following graph:

[figure: field-by-field copy performed by makeJoinRowToChunk]

As you can see, this method's cache utilization is poor, which may cause many cache misses and a lot of data movement. The L1 data cache is typically 32KB, while the chunk size we use is 1K (1024 rows); if the joined tables have many columns, the working set easily exceeds the L1 data cache. For example, with just 16 int64 columns a single 1024-row chunk already occupies 16 × 1024 × 8 B = 128 KB, four times a 32 KB L1 data cache.

If we instead copy the data column by column, the L1 data cache miss ratio can be greatly reduced during execution, and performance improves accordingly.

@zz-jason zz-jason added type/enhancement The issue or PR belongs to an enhancement. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. sig/execution SIG execution labels Aug 16, 2018
@shenli (Member) commented Aug 17, 2018

Sounds reasonable. Is there any quantitative evaluation of this issue?

@zz-jason (Member, Author) commented Aug 17, 2018

@shenli Here is a simple benchmark. The result is:

➜ go test -bench Benchmark -count 3 --benchmem
goos: darwin
goarch: amd64
BenchmarkCopyFieldByField-8                20000             61823 ns/op               0 B/op          0 allocs/op
BenchmarkCopyFieldByField-8                20000             64356 ns/op               0 B/op          0 allocs/op
BenchmarkCopyFieldByField-8                20000             65751 ns/op               0 B/op          0 allocs/op
BenchmarkCopyColumnByColumn-8             100000             19014 ns/op               0 B/op          0 allocs/op
BenchmarkCopyColumnByColumn-8             100000             19174 ns/op               0 B/op          0 allocs/op
BenchmarkCopyColumnByColumn-8             100000             18989 ns/op               0 B/op          0 allocs/op
PASS
ok      _/tmp   12.195s

the benchmark code is:

package main

import "testing"

var (
    numRows = 1024
    numCols = 16
)

func genData() [][]int64 {
    columns := make([][]int64, numCols)
    for i := 0; i < numCols; i++ {
        columns[i] = make([]int64, numRows)
    }
    return columns
}

func BenchmarkCopyFieldByField(b *testing.B) {
    src := genData()
    dst := genData()

    b.ResetTimer()
    for counter := 0; counter < b.N; counter++ {
        for i := 0; i < numRows; i++ {
            for j := 0; j < numCols; j++ {
                dst[j][i] = src[j][i]
            }
        }
    }
}

func BenchmarkCopyColumnByColumn(b *testing.B) {
    src := genData()
    dst := genData()

    b.ResetTimer()
    for counter := 0; counter < b.N; counter++ {
        for j := 0; j < numCols; j++ {
            for i := 0; i < numRows; i++ {
                dst[j][i] = src[j][i]
            }
        }
    }
}

@shenli (Member) commented Aug 17, 2018

Great! Much faster!

@zz-jason (Member, Author):

And with the "copy column by column" method, we can copy each column as one contiguous block of memory, which speeds up execution even further:

➜ go test -bench Benchmark -count 3 --benchmem
goos: darwin
goarch: amd64
BenchmarkCopyFieldByField-8                        30000             59081 ns/op               0 B/op          0 allocs/op
BenchmarkCopyFieldByField-8                        20000             59146 ns/op               0 B/op          0 allocs/op
BenchmarkCopyFieldByField-8                        30000             58260 ns/op               0 B/op          0 allocs/op
BenchmarkCopyColumnByColumn-8                     100000             17915 ns/op               0 B/op          0 allocs/op
BenchmarkCopyColumnByColumn-8                     100000             18199 ns/op               0 B/op          0 allocs/op
BenchmarkCopyColumnByColumn-8                     100000             18134 ns/op               0 B/op          0 allocs/op
BenchmarkCopyColumnByColumnContinuously-8         300000              3712 ns/op               0 B/op          0 allocs/op
BenchmarkCopyColumnByColumnContinuously-8         300000              3635 ns/op               0 B/op          0 allocs/op
BenchmarkCopyColumnByColumnContinuously-8         300000              3628 ns/op               0 B/op          0 allocs/op
PASS
ok      _/tmp   15.956s
the benchmark code is:

package main

import "testing"

var (
    numRows = 1024
    numCols = 16
)

func genData() [][]int64 {
    columns := make([][]int64, numCols)
    for i := 0; i < numCols; i++ {
        columns[i] = make([]int64, numRows)
    }
    return columns
}

func BenchmarkCopyFieldByField(b *testing.B) {
    src := genData()
    dst := genData()

    b.ResetTimer()
    for counter := 0; counter < b.N; counter++ {
        for i := 0; i < numRows; i++ {
            for j := 0; j < numCols; j++ {
                dst[j][i] = src[j][i]
            }
        }
    }
}

func BenchmarkCopyColumnByColumn(b *testing.B) {
    src := genData()
    dst := genData()

    b.ResetTimer()
    for counter := 0; counter < b.N; counter++ {
        for j := 0; j < numCols; j++ {
            for i := 0; i < numRows; i++ {
                dst[j][i] = src[j][i]
            }
        }
    }
}

func BenchmarkCopyColumnByColumnContinuously(b *testing.B) {
    src := genData()
    dst := genData()

    b.ResetTimer()
    for counter := 0; counter < b.N; counter++ {
        for j := 0; j < numCols; j++ {
            copy(dst[j], src[j])
        }
    }
}

@zz-jason zz-jason added this to the 2.1 milestone Aug 17, 2018
@zz-jason zz-jason added the good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. label Aug 18, 2018
@zz-jason (Member, Author):

Just found that the cache utilization of the function joiner.filter() is also inefficient:

145 func (j *baseJoiner) filter(input, output *chunk.Chunk) (matched bool, err error) {
146     j.selected, err = expression.VectorizedFilter(j.ctx, j.conditions, chunk.NewIterator4Chunk(input), j.selected)
147     if err != nil {
148         return false, errors.Trace(err)
149     }
150     for i := 0; i < len(j.selected); i++ {
151         if !j.selected[i] {
152             continue
153         }
154         matched = true
155         output.AppendRow(input.GetRow(i))
156     }
157     return matched, nil
158 }

We should copy the selected row column by column as well.

@laidahe (Contributor) commented Aug 19, 2018

From my perspective, the first argument of tryToMatch in the joiner interface should be changed to pass a chunk.Column rather than a chunk.Row. Am I right? If so, many places would have to be changed.

@zz-jason (Member, Author):

@laidahe No, it should remain a row, not a chunk.Column; we just need to add another function to replace makeJoinRow.

@supernan1994 (Contributor):

I'm working on it~

@zz-jason (Member, Author):

@supernan1994 Oh, thanks!

@zz-jason zz-jason removed good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. labels Aug 28, 2018
@zz-jason (Member, Author) commented Sep 6, 2018

fixed by #7493

@zz-jason zz-jason closed this as completed Sep 6, 2018