Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

all: use sync.Pool with new(bytes.Buffer) instead of always creating a new bytes.Buffer or sha256.New() and discarding them #371

Closed
odeke-em opened this issue Mar 2, 2021 · 5 comments · Fixed by #453

Comments

@odeke-em
Copy link
Contributor

odeke-em commented Mar 2, 2021

Noticed a day ago as I was dissecting performance issues in cosmos-sdk. The code inside for example encoding.go and import.go creates a bytes.Buffer per call, then discards it after exiting which shows up with buffer.Grow-> runtime.growslice in CPU and memory profiles.

Go's sync package provides sync.Pool https://golang.org/pkg/sync/#Pool which can be used to reuse and reset buffers and that will produce great performance at scale for example please do this

diff --git a/encoding.go b/encoding.go
index 6068fe1..a80087b 100644
--- a/encoding.go
+++ b/encoding.go
@@ -7,6 +7,7 @@ import (
        "fmt"
        "io"
        "math/bits"
+       "sync"
 )
 
 // decodeBytes decodes a varint length-prefixed byte slice, returning it along with the number
@@ -78,10 +79,19 @@ func encodeBytes(w io.Writer, bz []byte) error {
        return err
 }
 
+var bufPool = &sync.Pool{
+       New: func() interface{} {
+               return new(bytes.Buffer)
+       },
+}
+
 // encodeBytesSlice length-prefixes the byte slice and returns it.
 func encodeBytesSlice(bz []byte) ([]byte, error) {
-       var buf bytes.Buffer
-       err := encodeBytes(&buf, bz)
+       buf := bufPool.Get().(*bytes.Buffer)
+       buf.Reset()
+       defer bufPool.Put(buf)
+
+       err := encodeBytes(buf, bz)
        return buf.Bytes(), err
 }
 
diff --git a/import.go b/import.go
index 1933fdd..083e6eb 100644
--- a/import.go
+++ b/import.go
@@ -119,8 +119,11 @@ func (i *Importer) Add(exportNode *ExportNode) error {
                return err
        }
 
-       var buf bytes.Buffer
-       err = node.writeBytes(&buf)
+       buf := bufPool.Get().(*bytes.Buffer)
+       buf.Reset()
+       defer bufPool.Put(buf)
+
+       err = node.writeBytes(buf)

and

index c351faa..cef8fba 100644
--- a/proof.go
+++ b/proof.go
@@ -55,7 +55,10 @@ func (pin ProofInnerNode) stringIndented(indent string) string {
 
 func (pin ProofInnerNode) Hash(childHash []byte) []byte {
        hasher := sha256.New()
-       buf := new(bytes.Buffer)
+
+       buf := bufPool.Get().(*bytes.Buffer)
+       buf.Reset()
+       defer bufPool.Put(buf)
 
        err := encodeVarint(buf, int64(pin.Height))
        if err == nil {
@@ -145,7 +148,10 @@ func (pln ProofLeafNode) stringIndented(indent string) string {
 
 func (pln ProofLeafNode) Hash() []byte {
        hasher := sha256.New()
-       buf := new(bytes.Buffer)
+
+       buf := bufPool.Get().(*bytes.Buffer)
+       buf.Reset()
+       defer bufPool.Put(buf)
 
        err := encodeVarint(buf, 0)
        if err == nil {

Experimental result

and with some experiments this yields a 9+% speed up just for LeafOp

$ benchstat pre.txt post.txt 
name                       old time/op    new time/op    delta
pkg:github.com/cosmos/iavl goos:darwin goarch:amd64
ConvertLeafOp-8              1.02µs ± 2%    0.92µs ± 9%  -9.47%  (p=0.000 n=8+10)
pkg:github.com/cosmos/iavl/benchmarks goos:darwin goarch:amd64
RandomBytes/random-4-8       38.0ns ± 6%    35.3ns ± 5%  -7.17%  (p=0.000 n=10+10)
RandomBytes/random-16-8      53.4ns ± 6%    51.3ns ± 4%  -3.89%  (p=0.015 n=10+10)
RandomBytes/random-32-8      67.3ns ± 7%    66.9ns ± 4%    ~     (p=0.955 n=10+10)
RandomBytes/random-100-8      138ns ± 5%     136ns ± 4%    ~     (p=0.184 n=10+10)
RandomBytes/random-1000-8    1.01µs ±11%    0.97µs ± 5%    ~     (p=0.063 n=10+10)
pkg:github.com/cosmos/iavl/common goos:darwin goarch:amd64
RandBytes10B-8                191ns ± 3%     190ns ± 3%    ~     (p=0.342 n=10+10)
RandBytes100B-8              1.75µs ± 3%    1.72µs ± 3%  -1.57%  (p=0.050 n=10+10)
RandBytes1KiB-8              17.5µs ± 2%    17.4µs ± 1%    ~     (p=0.203 n=10+8)
RandBytes10KiB-8              173µs ± 1%     171µs ± 2%  -1.03%  (p=0.006 n=9+10)
RandBytes100KiB-8            1.73ms ± 1%    1.71ms ± 1%  -0.80%  (p=0.003 n=10+10)
RandBytes1MiB-8              17.8ms ± 1%    17.4ms ± 1%  -2.21%  (p=0.000 n=10+10)

name                       old alloc/op   new alloc/op   delta
pkg:github.com/cosmos/iavl goos:darwin goarch:amd64
ConvertLeafOp-8                768B ± 0%      768B ± 0%    ~     (all equal)
pkg:github.com/cosmos/iavl/common goos:darwin goarch:amd64
RandBytes10B-8                16.0B ± 0%     16.0B ± 0%    ~     (all equal)
RandBytes100B-8                112B ± 0%      112B ± 0%    ~     (all equal)
RandBytes1KiB-8              1.02kB ± 0%    1.02kB ± 0%    ~     (all equal)
RandBytes10KiB-8             10.2kB ± 0%    10.2kB ± 0%    ~     (all equal)
RandBytes100KiB-8             106kB ± 0%     106kB ± 0%    ~     (all equal)
RandBytes1MiB-8              1.05MB ± 0%    1.05MB ± 0%  -0.00%  (p=0.038 n=8+10)

name                       old allocs/op  new allocs/op  delta
pkg:github.com/cosmos/iavl goos:darwin goarch:amd64
ConvertLeafOp-8                24.0 ± 0%      24.0 ± 0%    ~     (all equal)
pkg:github.com/cosmos/iavl/common goos:darwin goarch:amd64
RandBytes10B-8                 1.00 ± 0%      1.00 ± 0%    ~     (all equal)
RandBytes100B-8                1.00 ± 0%      1.00 ± 0%    ~     (all equal)
RandBytes1KiB-8                1.00 ± 0%      1.00 ± 0%    ~     (all equal)
RandBytes10KiB-8               1.00 ± 0%      1.00 ± 0%    ~     (all equal)
RandBytes100KiB-8              1.00 ± 0%      1.00 ± 0%    ~     (all equal)
RandBytes1MiB-8                1.00 ± 0%      1.00 ± 0%    ~     (all equal)

This value is going to compound when used with lots of code in the cosmos-sdk running over a period of time and even starting up. Our goal is to squeeze out every single ounce of performance and reclaim as much as we can, given that the cosmos-sdk is being used at scale now.

I highly encourage and adoption of this pattern all around this package.

Kindly paging some folks who might be interested in this: @alessio @robert-zaremba @ethanfrey @marbar3778 @okwme @erikgrinaker @zmanian @ebuchman.

@odeke-em
Copy link
Contributor Author

odeke-em commented Mar 2, 2021

If there are cases where the bytes are stored and kept in memory for a long time, then perhaps we can't reuse buffers but majority of them will perhaps be conversions between bytes and integers.

@tac0turtle
Copy link
Member

Nice find!! We should definitely look into adopting this, we could release this in a minor release as well.

@robert-zaremba
Copy link
Collaborator

Good job with looking at it. Each % matters.
Will you like to make a PR?

@robert-zaremba
Copy link
Collaborator

Ping. @odeke-em let us know if you will like to handle it (would be good).

@odeke-em
Copy link
Contributor Author

odeke-em commented Mar 5, 2021

@robert-zaremba please go for it, I still have to finish creating benchmarking the cosmos-sdk and on the quest to finish that I discover things that need to be fixed then report them, but otherwise I'll get side tracked. All yours :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants