Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: optimize iteration on nested cache context #13881

Merged
merged 36 commits into from
Dec 16, 2022

Conversation

yihuang
Copy link
Collaborator

@yihuang yihuang commented Nov 16, 2022

Description

Closes: #10310
Solution:

  • only call skipUntilExistsOrInvalid at construction and end of Next, cache the valid status
  • use tidwall btree directly instead of memdb.

run benchmark: go test -run=^$ -bench=. -benchmem ./store/cachekv/benchmark_test.go -count=10.
benchstat compare result:

$ benchstat before.txt after.txt
name                   old time/op    new time/op    delta
DeepContextStack1-12     11.2µs ±12%     5.7µs ± 1%   -48.57%  (p=0.000 n=10+8)
DeepContextStack3-12     32.2µs ±16%     7.1µs ± 2%   -77.99%  (p=0.000 n=9+9)
DeepContextStack10-12     244ms ±17%       0ms ±10%   -99.99%  (p=0.000 n=10+10)
DeepContextStack13-12     16.5s ± 1%      0.0s ± 1%  -100.00%  (p=0.000 n=10+10)

name                   old alloc/op   new alloc/op   delta
DeepContextStack1-12     3.94kB ± 0%    3.03kB ± 0%   -23.12%  (p=0.000 n=10+10)
DeepContextStack3-12     6.33kB ± 0%    3.59kB ± 0%   -43.24%  (p=0.000 n=10+10)
DeepContextStack10-12    15.1kB ± 1%     5.6kB ± 0%   -63.27%  (p=0.000 n=10+10)
DeepContextStack13-12    20.8kB ± 0%     6.4kB ± 0%   -69.20%  (p=0.000 n=10+10)

name                   old allocs/op  new allocs/op  delta
DeepContextStack1-12       50.0 ± 0%      42.0 ± 0%   -16.00%  (p=0.000 n=10+10)
DeepContextStack3-12       74.0 ± 0%      50.0 ± 0%   -32.43%  (p=0.000 n=10+10)
DeepContextStack10-12       172 ± 2%        78 ± 0%   -54.62%  (p=0.000 n=10+10)
DeepContextStack13-12       272 ± 0%        90 ± 0%   -66.91%  (p=0.000 n=10+10)

Author Checklist

All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.

I have...

  • included the correct type prefix in the PR title
  • added ! to the type prefix if API or client breaking change
  • targeted the correct branch (see PR Targeting)
  • provided a link to the relevant issue or specification
  • followed the guidelines for building modules
  • included the necessary unit and integration tests
  • added a changelog entry to CHANGELOG.md
  • included comments for documenting Go code
  • updated the relevant documentation or specification
  • reviewed "Files changed" and left comments if necessary
  • confirmed all CI checks have passed

Reviewers Checklist

All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.

I have...

  • confirmed the correct type prefix in the PR title
  • confirmed ! in the type prefix if API or client breaking change
  • confirmed all author checklist items have been addressed
  • reviewed state machine logic
  • reviewed API design and naming
  • reviewed documentation is accurate
  • reviewed tests and test coverage
  • manually tested (if applicable)

@yihuang
Copy link
Collaborator Author

yihuang commented Nov 16, 2022

The test(02) failure seems like a non-deterministic one unrelated to this PR, I see it fail on some main branch commit as well: https://github.com/cosmos/cosmos-sdk/actions/runs/3467563489/jobs/5792555203

@yihuang
Copy link
Collaborator Author

yihuang commented Nov 16, 2022

After this optimization, the cpu profile points to the runtime stuff, I believe it's the go-routine used in memdb's iteration, why do we use a go-routine to iterate a in-memory data structure?
And the cache context don't need to be thread safe, so I'd suggest we use the btree directly here.

Closes: cosmos#10310
Solution:
- cache the valid status
go.mod Show resolved Hide resolved
@yihuang
Copy link
Collaborator Author

yihuang commented Nov 16, 2022

Anther round of benchmark:

original

$ go test -cpuprofile profile.out -benchtime=10s -bench=. ./store/cachekv/benchmark_test.go -run ^$
goos: darwin
goarch: amd64
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
BenchmarkDeepContextStack1-12     	 1202187	     10004 ns/op
BenchmarkDeepContextStack3-12     	  412293	     29168 ns/op
BenchmarkDeepContextStack10-12    	      61	 184568144 ns/op
BenchmarkDeepContextStack13-12    	       1	11722949259 ns/op
PASS
ok  	command-line-arguments	59.099s

Cache valid status

$ go test -cpuprofile profile.out -benchtime=10s -bench=. ./store/cachekv/benchmark_test.go -run ^$
goos: darwin
goarch: amd64
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
BenchmarkDeepContextStack1-12     	 1275110	      9359 ns/op
BenchmarkDeepContextStack3-12     	  640540	     18609 ns/op
BenchmarkDeepContextStack10-12    	  237964	     50065 ns/op
BenchmarkDeepContextStack13-12    	  182748	     64797 ns/op
PASS
ok  	command-line-arguments	59.894s

Use btree directly

$ go test -cpuprofile profile.out -benchtime=10s -bench=. ./store/cachekv/benchmark_test.go -run ^$
goos: darwin
goarch: amd64
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
BenchmarkDeepContextStack1-12     	 2163434	      5485 ns/op
BenchmarkDeepContextStack3-12     	 1741351	      6830 ns/op
BenchmarkDeepContextStack10-12    	  880039	     13731 ns/op
BenchmarkDeepContextStack13-12    	  698671	     17385 ns/op
PASS
ok  	command-line-arguments	61.765s

@julienrbrt julienrbrt changed the title Optimize iteration on nested cache context perf: optimize iteration on nested cache context Nov 16, 2022
@tac0turtle
Copy link
Member

There is an issue to introduce and rewrite the memdb in favor of tidwall and to stop using the go go routine. This is for cosmos-db. I think we should put this change there and update the repo to cosmos-db like Jacob has done in a pr that is now unblocked. Would this work?

Tidwall has a btreehashmap that is more performant that we may be able to take advantage of as well.

@yihuang
Copy link
Collaborator Author

yihuang commented Nov 16, 2022

There is an issue to introduce and rewrite the memdb in favor of tidwall and to stop using the go go routine. This is for cosmos-db. I think we should put this change there and update the repo to cosmos-db like Jacob has done in a pr that is now unblocked. Would this work?

Tidwall has a btreehashmap that is more performant that we may be able to take advantage of as well.

I see, I guess we can do the first part first, caching the valid status.

@yihuang
Copy link
Collaborator Author

yihuang commented Nov 16, 2022

Actually I think it's maybe ok to have a separate implementation in cachekv, considering the importance of cachekv, and it don't need to be thread-safe like the MemDB, the MemDB is mainly used for mocking db in unit tests.

store/cachekv/btree.go Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Nov 16, 2022

Codecov Report

Merging #13881 (4a27e1c) into main (2739f83) will increase coverage by 0.11%.
The diff coverage is 65.51%.

❗ Current head 4a27e1c differs from pull request most recent head b2eeaf8. Consider uploading reports for the commit b2eeaf8 to get more accurate results

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main   #13881      +/-   ##
==========================================
+ Coverage   56.25%   56.37%   +0.11%     
==========================================
  Files         667      639      -28     
  Lines       56576    54995    -1581     
==========================================
- Hits        31829    31005     -824     
+ Misses      22165    21466     -699     
+ Partials     2582     2524      -58     
Impacted Files Coverage Δ
store/cachekv/internal/mergeiterator.go 0.00% <0.00%> (ø)
store/cachekv/internal/memiterator.go 70.93% <70.93%> (ø)
store/cachekv/internal/btree.go 90.00% <90.00%> (ø)
store/cachekv/store.go 89.49% <100.00%> (+1.46%) ⬆️
x/distribution/simulation/operations.go 80.64% <0.00%> (-9.68%) ⬇️
x/gov/abci.go 90.29% <0.00%> (-6.86%) ⬇️
x/group/internal/math/dec.go 47.14% <0.00%> (-1.39%) ⬇️
x/group/module/module.go 48.05% <0.00%> (-0.10%) ⬇️
server/mock/tx.go 53.33% <0.00%> (ø)
... and 37 more

@yihuang
Copy link
Collaborator Author

yihuang commented Nov 16, 2022

it don't need to be thread-safe like the MemDB

take back this, it do need to be thread-safe, I've enabled the lock option.

rerun benchmark after recent changes:

$ go test -cpuprofile profile.out -benchtime=10s -bench=. ./store/cachekv/benchmark_test.go -run ^$
goos: darwin
goarch: amd64
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
BenchmarkDeepContextStack1-12     	 2169144	      5495 ns/op
BenchmarkDeepContextStack3-12     	 1677499	      7228 ns/op
BenchmarkDeepContextStack10-12    	  886461	     13547 ns/op
BenchmarkDeepContextStack13-12    	  693322	     17535 ns/op
PASS
ok  	command-line-arguments	62.829s

Copy link
Member

@tac0turtle tac0turtle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK. thank you for this PR.

@tac0turtle tac0turtle enabled auto-merge (squash) December 16, 2022 02:32
@tac0turtle tac0turtle merged commit cbee1b3 into cosmos:main Dec 16, 2022
@yihuang yihuang deleted the optim_cachekv branch December 16, 2022 03:02
@tac0turtle
Copy link
Member

@Mergifyio backport release/v0.46.x

@tac0turtle
Copy link
Member

@Mergifyio backport release/v0.47.x

@mergify
Copy link
Contributor

mergify bot commented Dec 16, 2022

backport release/v0.46.x

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request Dec 16, 2022
Co-authored-by: Aleksandr Bezobchuk <[email protected]>
Co-authored-by: Marko <[email protected]>
Closes #10310

(cherry picked from commit cbee1b3)

# Conflicts:
#	CHANGELOG.md
#	go.mod
#	go.sum
#	simapp/go.mod
#	simapp/go.sum
#	store/cachekv/store.go
#	store/cachekv/store_test.go
#	tests/go.mod
#	tests/go.sum
@mergify
Copy link
Contributor

mergify bot commented Dec 16, 2022

backport release/v0.47.x

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request Dec 16, 2022
Co-authored-by: Aleksandr Bezobchuk <[email protected]>
Co-authored-by: Marko <[email protected]>
Closes #10310

(cherry picked from commit cbee1b3)

# Conflicts:
#	CHANGELOG.md
#	go.mod
#	store/cachekv/store.go
@tac0turtle
Copy link
Member

@Mergifyio backport release/v0.45.x

@mergify
Copy link
Contributor

mergify bot commented Dec 16, 2022

backport release/v0.45.x

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request Dec 16, 2022
Co-authored-by: Aleksandr Bezobchuk <[email protected]>
Co-authored-by: Marko <[email protected]>
Closes #10310

(cherry picked from commit cbee1b3)

# Conflicts:
#	CHANGELOG.md
#	go.mod
#	go.sum
#	simapp/go.mod
#	simapp/go.sum
#	store/cachekv/memiterator.go
#	store/cachekv/store.go
#	store/cachekv/store_test.go
#	tests/go.mod
#	tests/go.sum
mmsqe pushed a commit to mmsqe/cosmos-sdk that referenced this pull request Dec 22, 2022
Co-authored-by: Aleksandr Bezobchuk <[email protected]>
Co-authored-by: Marko <[email protected]>
Closes cosmos#10310

(cherry picked from commit cbee1b3)
mmsqe pushed a commit to mmsqe/cosmos-sdk that referenced this pull request Dec 28, 2022
Co-authored-by: Aleksandr Bezobchuk <[email protected]>
Co-authored-by: Marko <[email protected]>
Closes cosmos#10310

(cherry picked from commit cbee1b3)
tac0turtle added a commit that referenced this pull request Jan 5, 2023
…14341)

Co-authored-by: yihuang <[email protected]>
Co-authored-by: marbar3778 <[email protected]>
Co-authored-by: Matt Kocubinski <[email protected]>
@faddat faddat mentioned this pull request Mar 23, 2023
19 tasks
larry0x pushed a commit to larry0x/cosmos-sdk that referenced this pull request May 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Iterator time is exponential in number of Cache wrappings