Optimize masking with math/bits #171

Merged: 4 commits merged into master from fast-xor on Nov 7, 2019

nhooyr (Contributor) commented Nov 6, 2019

See golang/go#31586 (comment)

Thanks @renthraysk

benchmark                      old MB/s     new MB/s     speedup 
BenchmarkXOR/2/fast-8          470.88       492.61       1.05x 
BenchmarkXOR/3/fast-8          602.24       719.25       1.19x 
BenchmarkXOR/4/fast-8          718.82       1186.64      1.65x 
BenchmarkXOR/8/fast-8          1027.60      1718.71      1.67x 
BenchmarkXOR/16/fast-8         1413.31      3430.46      2.43x 
BenchmarkXOR/32/fast-8         2701.81      5585.42      2.07x 
BenchmarkXOR/128/fast-8        7757.97      13432.37     1.73x 
BenchmarkXOR/512/fast-8        15155.03     18797.79     1.24x 
BenchmarkXOR/4096/fast-8       20689.95     20334.61     0.98x 
BenchmarkXOR/16384/fast-8      21687.87     21613.94     1.00x

Now it's faster than basic XOR at every byte size greater than 2 on
little-endian amd64 machines.

It is also faster at every size than gobwas/ws and gorilla/websocket.

nhooyr (Contributor, Author) commented Nov 7, 2019

More optimization is possible, see golang/go#31586 (comment)

Thanks again to @renthraysk

This provides another significant speedup.

benchmark                        old MB/s     new MB/s     speedup
Benchmark_mask/2/fast-8          405.48       513.25       1.27x
Benchmark_mask/3/fast-8          518.93       661.92       1.28x
Benchmark_mask/4/fast-8          1207.10      1252.39      1.04x
Benchmark_mask/8/fast-8          1708.82      1655.63      0.97x
Benchmark_mask/16/fast-8         3418.58      3051.25      0.89x
Benchmark_mask/32/fast-8         5789.43      5813.31      1.00x
Benchmark_mask/128/fast-8        12819.53     14804.50     1.15x
Benchmark_mask/512/fast-8        18247.06     21659.50     1.19x
Benchmark_mask/4096/fast-8       19802.31     23885.68     1.21x
Benchmark_mask/16384/fast-8      20896.97     25081.11     1.20x
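
The benchmark code behind these tables is not shown in this conversation. As a rough sketch of how throughput figures of this kind can be produced in Go: calling b.SetBytes makes the testing package report MB/s, and the trailing -8 in names such as Benchmark_mask/16/fast-8 is the GOMAXPROCS value that the tool appends. The helper names basicXOR and benchXOR are hypothetical, and only a byte-at-a-time baseline is measured here.

```go
package main

import (
	"fmt"
	"testing"
)

// basicXOR is the byte-at-a-time baseline: XOR each byte with the
// matching byte of the 4-byte key.
func basicXOR(key [4]byte, b []byte) {
	for i := range b {
		b[i] ^= key[i&3]
	}
}

// benchXOR (hypothetical helper) measures basicXOR on a payload of the
// given size using the standard testing.Benchmark driver.
func benchXOR(size int) testing.BenchmarkResult {
	key := [4]byte{0x12, 0x34, 0x56, 0x78}
	buf := make([]byte, size)
	return testing.Benchmark(func(b *testing.B) {
		b.SetBytes(int64(size)) // bytes per iteration; enables MB/s
		for i := 0; i < b.N; i++ {
			basicXOR(key, buf)
		}
	})
}

func main() {
	for _, size := range []int{2, 16, 128, 4096} {
		r := benchXOR(size)
		// Throughput: total bytes processed divided by elapsed time.
		mbps := float64(r.Bytes) * float64(r.N) / 1e6 / r.T.Seconds()
		fmt.Printf("XOR/%d/basic\t%10.2f MB/s\n", size, mbps)
	}
}
```

Absolute numbers from a sketch like this depend on the machine and Go version, so only the relative speedups are comparable with the tables above.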

github-actions bot commented Nov 7, 2019

Coverage decreased (-0.2%) to 92.308% when pulling 15d0a18 on fast-xor into 0fc34f9 on master.

@nhooyr nhooyr merged commit c781bdf into master Nov 7, 2019
@nhooyr nhooyr deleted the fast-xor branch November 7, 2019 02:10