
drastically reduce allocations in ring buffer implementation #64

Merged: 3 commits into libp2p:master on Nov 20, 2021

Conversation

@pymq (Contributor) commented Oct 5, 2021

Reuse the capacity of the s.b slice by shifting its elements to the left on writes. Before this change, Append allocated a new slice every time because len(s.b) == cap(s.b). In my tests len(s.b) stays small (<= 50, with occasional spikes to 200, and often near 0) even in high-write benchmarks such as BenchmarkSendRecvLarge, so the copy should not be expensive. It is also efficient when the write rate is roughly equal to the read rate.
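A minimal sketch of the idea with hypothetical names (the real buffer in util.go tracks chunks in s.b with read position s.bPos and differs in detail):

type buffer struct {
	b   [][]byte // buffered chunks; b[pos:] are still unread
	pos int      // index of the first unread chunk
}

func (s *buffer) append(chunk []byte) {
	if s.pos == len(s.b) {
		// everything has been read: reuse the backing array from the start
		s.pos = 0
		s.b = s.b[:0]
	} else if len(s.b) == cap(s.b) && s.pos > 0 {
		// append would allocate: slide unread chunks to the front instead
		n := copy(s.b, s.b[s.pos:])
		s.b = s.b[:n]
		s.pos = 0
	}
	s.b = append(s.b, chunk)
}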

Tested with

$ go version                                                                 
go version go1.16.4 linux/amd64

$ go test -bench=BenchmarkSendRecv -run ^$ -benchmem -benchtime=10s

Before

goos: linux
goarch: amd64
pkg: github.com/libp2p/go-yamux/v2
cpu: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
BenchmarkSendRecv-12         	 4662453	      3042 ns/op	      24 B/op	       1 allocs/op
BenchmarkSendRecvLarge-12    	     163	 121443667 ns/op	  255279 B/op	   10165 allocs/op
PASS
ok  	github.com/libp2p/go-yamux/v2	55.470s

After this commit

goos: linux
goarch: amd64
pkg: github.com/libp2p/go-yamux/v2
cpu: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
BenchmarkSendRecv-12         	 4864972	      2494 ns/op	       0 B/op	       0 allocs/op
BenchmarkSendRecvLarge-12    	     136	  84702011 ns/op	    7162 B/op	       5 allocs/op
PASS
ok  	github.com/libp2p/go-yamux/v2	35.208s

@Stebalien (Member) left a comment

Thanks! This is tricky code, but it looks correct. One comment below I'd like you to consider, though I don't feel strongly about it.

I'd also like a review from @marten-seemann.

util.go Outdated
// have no unread chunks, just move pos
s.bPos = 0
s.b = s.b[:0]
} else {
@Stebalien (Member):

I'd change this to check whether at least half of the slice is free. That'll slightly increase allocations until we hit a steady state, but it avoids the degenerate case where we slide by one element on every append.
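In code, the suggested guard would look roughly like this (a sketch; as discussed below, the merged version uses a different threshold):

// only shift when at least half of the capacity is dead space at the
// front, so a full buffer doesn't slide by one element on every append
if s.bPos >= cap(s.b)/2 {
	n := copy(s.b, s.b[s.bPos:])
	s.b = s.b[:n]
	s.bPos = 0
}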

@pymq (Contributor, Author) replied:

This is definitely a good improvement, but I am not sure about "at least half of the slice"; I think the threshold should be a bit lower, say 0.25 of the capacity. That would also make it equal to append()'s growth factor (when cap > 1024). Added with 0.25 for now.

That got me thinking: should we limit the maximum capacity of the buffer (recreate the slice with the default capacity once it reaches some maximum and the buffer is empty)? What is the average Stream lifespan?

I tried to test this with the package's benchmarks, but the buffer does not grow at all there because of the in-memory network.
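For illustration only, the cap pymq asks about might look like this (purely hypothetical; maxRetainedChunks is an invented name and no such limit was merged):

// when the buffer is empty and the backing array has grown past some
// maximum, drop it so a burst doesn't pin memory for the stream's lifetime
const maxRetainedChunks = 1024
if s.bPos == len(s.b) && cap(s.b) > maxRetainedChunks {
	s.b = nil
	s.bPos = 0
}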

@marten-seemann (Contributor) left a comment

I haven't had the time to do a thorough review yet, but more documentation (maybe even an example) here would be very useful to understand what the code is doing.

@aschmahmann added the need/author-input (Needs input from the original author) label on Oct 15, 2021
…me case when we shift slice by one every time
@pymq (Contributor, Author) commented Oct 18, 2021

Sorry for the delay.

I am not sure what kind of documentation this needs; basically, the code reuses the slice's capacity by shifting unread values to the start so that append() still adds values at the end, like a ring buffer but simpler. I can add more inline comments if you prefer.
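To make that concrete, the read side of the hypothetical buffer sketched in the PR description would look something like this (again, not the actual util.go code):

func (s *buffer) read(p []byte) int {
	if s.pos == len(s.b) {
		return 0 // nothing buffered
	}
	n := copy(p, s.b[s.pos])
	s.b[s.pos] = s.b[s.pos][n:]
	if len(s.b[s.pos]) == 0 {
		s.pos++ // chunk fully consumed, advance the read position
	}
	return n
}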

@pymq requested a review from @Stebalien on Oct 23, 2021
@BigLep requested a review from a team on Oct 29, 2021
@BigLep commented Oct 29, 2021

Assigned to @libp2p/go-libp2p-maintainers to see if anyone else can look.

@BigLep commented Oct 29, 2021

@marten-seemann: understood there are other things you're currently focused on. When you do reengage, does this comment make things clearer for you?

@marten-seemann (Contributor) commented:

> I am not sure what kind of documentation this needs; basically, the code reuses the slice's capacity by shifting unread values to the start so that append() still adds values at the end, like a ring buffer but simpler. I can add more inline comments if you prefer.

Yes, more inline comments would be highly appreciated. I haven't worked with this code for a few months, and I find it quite hard to understand without any comments (this partially applies to the code we had before as well).

@pymq (Contributor, Author) commented Oct 29, 2021

@marten-seemann: added more comments.

@@ -54,6 +54,7 @@ func BenchmarkSendRecv(b *testing.B) {
recvBuf := make([]byte, 512)

doneCh := make(chan struct{})
b.ResetTimer()
A reviewer commented on the diff:

Suggest you also add b.ReportAllocs() here and in BenchmarkSendRecvLarge
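In other words, something like this (surrounding benchmark code elided):

func BenchmarkSendRecv(b *testing.B) {
	// ... setup elided ...
	b.ReportAllocs() // record allocation stats even without -benchmem
	b.ResetTimer()
	// ... send/recv loop elided ...
}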

@pymq (Contributor, Author) replied:

I am not sure this is necessary; you can pass -benchmem to go test to get the same result.

The reviewer replied:

True, but I'm suggesting it because allocs seem to be of ongoing concern. Up to you.

@iand commented Nov 8, 2021

Comparison of benchmarks using benchstat (with ReportAllocs added manually):

name             old time/op    new time/op    delta
SendRecv-8         2.39µs ± 1%    2.30µs ± 1%    -3.56%  (p=0.000 n=9+10)
SendRecvLarge-8    77.1ms ± 0%    76.9ms ± 1%      ~     (p=0.456 n=7+7)

name             old alloc/op   new alloc/op   delta
SendRecv-8          24.0B ± 0%      0.0B       -100.00%  (p=0.000 n=10+10)
SendRecvLarge-8     337kB ± 4%     126kB ± 5%   -62.67%  (p=0.000 n=8+8)

name             old allocs/op  new allocs/op  delta
SendRecv-8           1.00 ± 0%      0.00       -100.00%  (p=0.000 n=10+10)
SendRecvLarge-8     8.60k ± 0%     0.14k ±11%   -98.38%  (p=0.000 n=7+10)
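For reference, a comparison like the one above is typically produced along these lines (the exact invocation is an assumption, not given in the thread):

$ go test -bench=SendRecv -run '^$' -count=10 > old.txt
  # apply the PR, then:
$ go test -bench=SendRecv -run '^$' -count=10 > new.txt
$ benchstat old.txt new.txt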

@aschmahmann added the need/maintainer-input (Needs input from the current maintainer(s)) label and removed the need/author-input label on Nov 19, 2021
@marten-seemann changed the title from "drastically reduce allocations" to "drastically reduce allocations in ring buffer implementation" on Nov 20, 2021
@marten-seemann merged commit d6101de into libp2p:master on Nov 20, 2021
@aschmahmann mentioned this pull request on Dec 1, 2021