Benchmark FastXOR on non amd64 platforms #96

Closed
nhooyr opened this issue Jun 7, 2019 · 3 comments

Comments

@nhooyr
Contributor

nhooyr commented Jun 7, 2019

golang/go#31586 (comment)

FastXOR might be slower than the basic byte-by-byte XOR on non-amd64 platforms, so it should be benchmarked there.

@nhooyr nhooyr added the perf label Jun 7, 2019
@nhooyr nhooyr modified the milestone: v1.2.0 Jun 7, 2019
@nhooyr
Contributor Author

nhooyr commented Aug 31, 2019

It's unlikely to be any slower. ARM also supports unaligned loads/stores and is little endian, so I don't expect any issues here, given that ARM and AMD64 are the major architectures in use.

If anyone finds it too slow, they can open a new issue.
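For readers who want to see why unaligned loads/stores and byte order matter here, the following is a minimal sketch of a word-at-a-time XOR next to a byte-at-a-time one. It is not the library's actual FastXOR: the names `xorBasic` and `xorWords`, their signatures, and the 8-byte word size are assumptions made for illustration. The word loop reads and writes 8 bytes at arbitrary offsets in the slice, which is the access pattern that benefits from cheap unaligned memory access on little-endian CPUs.

```go
package main

import "encoding/binary"

// xorBasic (hypothetical name) masks b in place one byte at a time,
// starting at key offset pos, and returns the key offset for the next call.
func xorBasic(key [4]byte, pos int, b []byte) int {
	for i := range b {
		b[i] ^= key[pos&3]
		pos++
	}
	return pos & 3
}

// xorWords (hypothetical name) masks b in place 8 bytes at a time and
// falls back to xorBasic for the tail. It produces the same bytes and the
// same returned offset as xorBasic.
func xorWords(key [4]byte, pos int, b []byte) int {
	if len(b) >= 8 {
		// Build an 8-byte key whose first byte corresponds to offset pos.
		var k [8]byte
		for i := range k {
			k[i] = key[(pos+i)&3]
		}
		k64 := binary.LittleEndian.Uint64(k[:])

		for len(b) >= 8 {
			// These accesses are not guaranteed to be 8-byte aligned.
			v := binary.LittleEndian.Uint64(b)
			binary.LittleEndian.PutUint64(b, v^k64)
			b = b[8:]
		}
		// 8 is a multiple of 4, so the key offset is unchanged here.
	}
	return xorBasic(key, pos, b)
}
```

Because the key and the data are both decoded and encoded with the same byte order, the result is correct on any platform; byte order and alignment support only affect how cheaply each 8-byte access can be performed.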

@nhooyr nhooyr closed this as completed Aug 31, 2019
@nhooyr
Contributor Author

nhooyr commented Aug 31, 2019

I will test to confirm on an AWS ARM instance.

@nhooyr nhooyr reopened this Aug 31, 2019
@nhooyr
Contributor Author

nhooyr commented Sep 1, 2019

Tested on an AWS a1.xlarge aarch64 ARM instance.

ubuntu@ip-172-31-31-187:~/websocket$ go test -run=^\$ -bench=BenchmarkXOR
goos: linux
goarch: arm64
pkg: nhooyr.io/websocket
BenchmarkXOR/2/basic-4         	100000000	        10.0 ns/op	 199.39 MB/s	       0 B/op	       0 allocs/op
BenchmarkXOR/2/fast-4          	100000000	        12.7 ns/op	 158.10 MB/s	       0 B/op	       0 allocs/op
BenchmarkXOR/16/basic-4        	50000000	        34.5 ns/op	 464.38 MB/s	       0 B/op	       0 allocs/op
BenchmarkXOR/16/fast-4         	50000000	        26.6 ns/op	 601.40 MB/s	       0 B/op	       0 allocs/op
BenchmarkXOR/32/basic-4        	20000000	        69.6 ns/op	 459.61 MB/s	       0 B/op	       0 allocs/op
BenchmarkXOR/32/fast-4         	50000000	        30.5 ns/op	1048.10 MB/s	       0 B/op	       0 allocs/op
BenchmarkXOR/512/basic-4       	 2000000	       906 ns/op	 565.00 MB/s	       0 B/op	       0 allocs/op
BenchmarkXOR/512/fast-4        	20000000	       102 ns/op	4973.82 MB/s	       0 B/op	       0 allocs/op
BenchmarkXOR/4096/basic-4      	  200000	      7157 ns/op	 572.23 MB/s	       0 B/op	       0 allocs/op
BenchmarkXOR/4096/fast-4       	 2000000	       660 ns/op	6197.44 MB/s	       0 B/op	       0 allocs/op
BenchmarkXOR/16384/basic-4     	   50000	     28607 ns/op	 572.71 MB/s	       0 B/op	       0 allocs/op
BenchmarkXOR/16384/fast-4      	  500000	      2631 ns/op	6225.87 MB/s	       0 B/op	       0 allocs/op
PASS
ok  	nhooyr.io/websocket	19.901s

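Still faster than basic at every size except 2 bytes, where fast is slightly slower (12.7 ns/op vs 10.0 ns/op). From 512 bytes up it is roughly 9x to 11x faster. Not as much as amd64 but good enough to keep as is.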
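For anyone wanting to reproduce numbers of this shape on another platform, here is a rough sketch of a benchmark that would produce names like `BenchmarkXOR/512/basic` along with MB/s and allocation columns. The sub-benchmark layout is inferred from the output above, not taken from the repository; `xorBasic`, `xorFunc`, `sizes`, and `impls` are hypothetical names, and only a byte-at-a-time stand-in is registered.

```go
package main

import (
	"strconv"
	"testing"
)

// xorFunc is an assumed signature for an in-place masking function.
type xorFunc func(key [4]byte, pos int, b []byte) int

// xorBasic is a byte-at-a-time stand-in used so the sketch compiles alone.
func xorBasic(key [4]byte, pos int, b []byte) int {
	for i := range b {
		b[i] ^= key[pos&3]
		pos++
	}
	return pos & 3
}

// sizes mirrors the payload sizes that appear in the output above.
var sizes = []int{2, 16, 32, 512, 4096, 16384}

// impls lists the variants to compare; an optimized variant would be
// appended here under its own name.
var impls = []struct {
	name string
	fn   xorFunc
}{
	{"basic", xorBasic},
}

// BenchmarkXOR runs one sub-benchmark per (size, implementation) pair.
func BenchmarkXOR(b *testing.B) {
	key := [4]byte{1, 2, 3, 4}
	for _, size := range sizes {
		data := make([]byte, size)
		b.Run(strconv.Itoa(size), func(b *testing.B) {
			for _, impl := range impls {
				b.Run(impl.name, func(b *testing.B) {
					b.ReportAllocs()         // adds B/op and allocs/op
					b.SetBytes(int64(size)) // adds MB/s
					for i := 0; i < b.N; i++ {
						impl.fn(key, 0, data)
					}
				})
			}
		})
	}
}
```

Placed in a `_test.go` file, this would run with the same `go test -run=^\$ -bench=BenchmarkXOR` invocation shown above.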

@nhooyr nhooyr closed this as completed Sep 1, 2019