cmd/compile: explore optimizations for fixed-length array copies #46529
There’s a mistake in that benchmark,

For large slices, the performance difference of

Other observations:
It looks like the conversion approach is more efficient for very small and for sufficiently large arrays:

```go
package arrays

import "testing"

const N = 32

var r [128][N]byte
var s = make([]byte, N)

func init() {
	println("============= N =", N)
}

func Benchmark_CopyClone(b *testing.B) {
	for i := 0; i < b.N; i++ {
		copy(r[i&127][:], s)
	}
}

func Benchmark_ConvertClone(b *testing.B) {
	for i := 0; i < b.N; i++ {
		r[i&127] = *(*[N]byte)(s)
	}
}
```

Benchmark results:
I actually see an 18% performance decrease when copying 2048 bytes. This seems super platform- and case-dependent. I don't see significant improvements on my machine other than for <=64 bytes. On further testing (I increased the number of arrays from 128 to 256), the results are also sensitive to the number of distinct arrays being copied.

The main factors in whether this provides a significant speedup are:

1. Your machine's block size (64 bytes for my machine). Once the arrays being copied exceed my machine's block size, I don't see a significant improvement.
2. Your machine's cache sizes. Eventually, once the total set of memory I'm copying from/to exceeds the L1 cache size, I start to see slowdowns (number of arrays * size of array > 64KB). The slowdowns become significant once the individual arrays being copied exceed the size of the L1 cache.
Since there hasn't been any movement here since August and we're in the freeze, I'm going to put it into the backlog. @mdempsky Feel free to move it out to 1.19 if you plan to do some more exploration of this for next release.
My experience when it comes to byte slices is illustrated by this function:

```go
import "encoding/binary"

// masks_for_len is assumed here (it was not shown in the original
// comment): masks_for_len[i] keeps the low i bytes of a word.
var masks_for_len = [9]uint64{
	0, 0xff, 0xffff, 0xffffff, 0xffffffff,
	0xffffffffff, 0xffffffffffff, 0xffffffffffffff, 0xffffffffffffffff,
}

func next8BytesForKeywordLookup(bytes []byte) (chs uint64) {
	if len(bytes) > 8 {
		return // not a keyword
	}
	if cap(bytes) >= 8 {
		// The full slice expression bytes[0:8:8] is valid whenever
		// cap(bytes) >= 8, even if len(bytes) < 8.
		chs = binary.LittleEndian.Uint64(bytes[0:8:8])
		chs = chs & masks_for_len[len(bytes)]
	} else {
		var b [8]byte
		copy(b[:], bytes)
		chs = binary.LittleEndian.Uint64(b[:])
	}
	return
}
```

The compiler could special-case byte slices and generate
#46505 (comment) reports these results for this benchmark:

We should see if cmd/compile can optimize the first two as efficiently as the last.

One observation, though, is that they have different semantics: `SumNamed` and `SumCopy` need to support partial copies if `len(b) < Size256`, but `SumPointer` panics in this case. We could probably handle `copy(ret[:], b[:Size256])` efficiently though, by rewriting it into `ret = *(*[Size256]byte)(b)` in the frontend.

Marking for Go 1.18 to at least explore possibilities here.
Related #20859.
/cc @randall77 @bradfitz @katiehockman