Optimize masking with math/bits #171

nhooyr · 2019-11-06T15:11:58Z

benchmark                      old MB/s     new MB/s     speedup 
BenchmarkXOR/2/fast-8          470.88       492.61       1.05x 
BenchmarkXOR/3/fast-8          602.24       719.25       1.19x 
BenchmarkXOR/4/fast-8          718.82       1186.64      1.65x 
BenchmarkXOR/8/fast-8          1027.60      1718.71      1.67x 
BenchmarkXOR/16/fast-8         1413.31      3430.46      2.43x 
BenchmarkXOR/32/fast-8         2701.81      5585.42      2.07x 
BenchmarkXOR/128/fast-8        7757.97      13432.37     1.73x 
BenchmarkXOR/512/fast-8        15155.03     18797.79     1.24x 
BenchmarkXOR/4096/fast-8       20689.95     20334.61     0.98x 
BenchmarkXOR/16384/fast-8      21687.87     21613.94     1.00x

Now it's faster than basic XOR at every byte size greater than 2 on
little-endian amd64 machines.

It's also faster than gobwas/ws and gorilla/websocket at every size.

See golang/go#31586 (comment). Thanks @renthraysk.

nhooyr · 2019-11-07T01:46:21Z

More optimization is possible; see golang/go#31586 (comment).

@renthraysk

Thanks again to @renthraysk. This provides another significant speedup, most notably for payloads of 128 bytes and up (1.15x to 1.21x); the 8- and 16-byte cases regress slightly (0.97x and 0.89x).

benchmark                      old MB/s     new MB/s     speedup
Benchmark_mask/2/fast-8        405.48       513.25       1.27x
Benchmark_mask/3/fast-8        518.93       661.92       1.28x
Benchmark_mask/4/fast-8        1207.10      1252.39      1.04x
Benchmark_mask/8/fast-8        1708.82      1655.63      0.97x
Benchmark_mask/16/fast-8       3418.58      3051.25      0.89x
Benchmark_mask/32/fast-8       5789.43      5813.31      1.00x
Benchmark_mask/128/fast-8      12819.53     14804.50     1.15x
Benchmark_mask/512/fast-8      18247.06     21659.50     1.19x
Benchmark_mask/4096/fast-8     19802.31     23885.68     1.21x
Benchmark_mask/16384/fast-8    20896.97     25081.11     1.20x

github-actions · 2019-11-07T02:08:26Z

Coverage decreased (-0.2%) to 92.308% when pulling 15d0a18 on fast-xor into 0fc34f9 on master.

nhooyr added 3 commits November 6, 2019 09:49

Add more sizes to BenchmarkXOR

2f8f69c

Rename xor to mask

3b6e614

nhooyr force-pushed the fast-xor branch from 4ab908d to 3b6e614 November 6, 2019 15:15

nhooyr mentioned this pull request Nov 7, 2019

cmd/compile: optimize XOR masking code golang/go#31586

Closed

nhooyr merged commit c781bdf into master Nov 7, 2019

nhooyr deleted the fast-xor branch November 7, 2019 02:10
