Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: preallocate slices #1842

Merged
merged 5 commits into from
Nov 6, 2022
Merged

chore: preallocate slices #1842

merged 5 commits into from
Nov 6, 2022

Conversation

estensen
Copy link
Contributor

When we know the final size of a slice preallocate for it to save memory.
Assigning directly is slightly faster than using append.

@MarcoPolo
Copy link
Collaborator

Thanks! Did you use a tool to find these spots or something else?

@MarcoPolo MarcoPolo self-requested a review October 30, 2022 12:05
@estensen
Copy link
Contributor Author

yeah, repro by running:
golangci-lint run ./... -E prealloc --disable-all

Comment on lines 40 to 39
if len(pi.Addrs) > 0 {
addrs = make([][]byte, 0, len(pi.Addrs))
}
addrs := make([][]byte, len(pi.Addrs))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let’s keep the old behaviour of having this be nil. In this case I don’t think this matters, but generally we can’t be sure if callers check this.

Copy link
Contributor

@marten-seemann marten-seemann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why you're replacing appends with assignments to a specific index. append is the nicer way to do this, please don't change that.

core/peer/addrinfo.go Outdated Show resolved Hide resolved
@MarcoPolo
Copy link
Collaborator

I don't understand why you're replacing appends with assignments to a specific index. append is the nicer way to do this, please don't change that.

I don’t have a preference, but assignments are technically faster. That said, I don’t think this matters here.

@marten-seemann
Copy link
Contributor

I don't understand why you're replacing appends with assignments to a specific index. append is the nicer way to do this, please don't change that.

I don’t have a preference, but assignments are technically faster. That said, I don’t think this matters here.

I'd expect it to compile to exactly the same code. I ran a benchmark, and the two perform exactly the same. If at all, appending is slightly faster, but probably not statistically significant.

goos: darwin
goarch: arm64
pkg: github.com/libp2p/go-libp2p/p2p/protocol/holepunch
BenchmarkAppend-10      24834030                48.23 ns/op
BenchmarkAssign-10      24644346                48.32 ns/op
const l = 128

func BenchmarkAppend(b *testing.B) {
	a := make([]int, 0, l)
	for i := 0; i < b.N; i++ {
		a = a[:0]
		for j := 0; j < l; j++ {
			a = append(a, j)
		}
	}
}

func BenchmarkAssign(b *testing.B) {
	a := make([]int, l)
	for i := 0; i < b.N; i++ {
		for j := 0; j < l; j++ {
			a[j] = j
		}
	}
}

The reason people tend to prefer append is that it's safer: If you get your length calculation wrong, all that happens is that you do an additional alloc. If you get your length calculation wrong and do an assignment, your process will panic.

@estensen
Copy link
Contributor Author

estensen commented Nov 6, 2022

I've always gotten better perf from assignment than appending. Same code on my machine:

goos: darwin
goarch: amd64
pkg: github.com/estensen/benchmarks/tmp
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
BenchmarkAppend-12    	15084253	        68.56 ns/op
BenchmarkAssign-12    	17883320	        66.12 ns/op
PASS
ok  	github.com/estensen/benchmarks/tmp	2.536s

But I do agree that it is safer to use append. I'll revert it.

p2p/net/connmgr/connmgr_test.go Outdated Show resolved Hide resolved
p2p/net/connmgr/connmgr_test.go Outdated Show resolved Hide resolved
p2p/net/swarm/swarm_listen.go Outdated Show resolved Hide resolved
p2p/protocol/circuitv2/relay/relay.go Outdated Show resolved Hide resolved
p2p/protocol/internal/circuitv1-deprecated/util.go Outdated Show resolved Hide resolved
@marten-seemann marten-seemann changed the title Preallocate slices chore: preallocate slices Nov 6, 2022
@marten-seemann
Copy link
Contributor

Thank you!

@marten-seemann marten-seemann merged commit 21dc42b into libp2p:master Nov 6, 2022
@estensen estensen deleted the prealloc branch November 6, 2022 12:53
@MarcoPolo
Copy link
Collaborator

Let me preface this by saying this doesn't matter at all :)

I'd expect it to compile to exactly the same code.

I went into a rabbit hole here because this doesn't make sense to me. Append has to do more work than an assign (which should be a single instruction). So it should be slower. Sure enough if we load up the code in godbolt we see that the append has one more compare and mov than assign should have. I say should have because this benchmark actually is flawed since the compiler optimizes the assignment and makes it the same as the noop (you don't see a MOVQ in the asm output). To fix this you actually should do something like this. This optimization actually makes it a bit more surprising that append (an extra cmp and 2 movs) is as fast as a noop, but then again CPUs are pretty good at branch prediction and pipelining and parallelizing especially in this very simple case.

Also I'm impressed with how well the compiler optimized the append loop. It really keeps the hot path very small. If we run without compiler optimizations (gcflags=-N) we also see the expected speed diffs.

Results from my tests here: https://gist.github.com/MarcoPolo/18b39366c6f57270059d0ef03c29351b#file-results-md

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants