Problem: production rocksdb configuration is not optimal #813
Solution:
- update related dependencies to allow customizing rocksdb options.
- especially, use rocksdb v7.
- tune rocksdb options.
Signed-off-by: yihuang <[email protected]>
Force-pushed from 571fc2f to 69cdd25.
I still get a `could not determine kind of name for C.rocksdb_lru_cache_options_set_num_shard_bits` build error in grocksdb.
Try a recent version of rocksdb, or use nix; the nix script uses …
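A cgo error of the form "could not determine kind of name for C.X" means the C headers being compiled against do not declare `X`, which here most likely points to an installed RocksDB older than the one this grocksdb version targets. One quick way to confirm is to read `ROCKSDB_MAJOR` from `rocksdb/version.h`. The sketch below is a minimal illustration under stated assumptions: `rocksdbMajor` is a hypothetical helper, the header text is a hard-coded sample (in practice you would read the file from your include path), and the v7 threshold comes from the PR description rather than from a documented minimum.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// rocksdbMajor extracts the ROCKSDB_MAJOR value from the text of
// rocksdb/version.h. Hypothetical helper: it only looks for a line of the
// form "#define ROCKSDB_MAJOR <n>".
func rocksdbMajor(versionHeader string) (int, error) {
	re := regexp.MustCompile(`(?m)^#define\s+ROCKSDB_MAJOR\s+(\d+)`)
	m := re.FindStringSubmatch(versionHeader)
	if m == nil {
		return 0, fmt.Errorf("ROCKSDB_MAJOR not found")
	}
	return strconv.Atoi(m[1])
}

func main() {
	// Sample header text; replace with the contents of your installed
	// rocksdb/version.h (the include path is an assumption).
	header := "#pragma once\n#define ROCKSDB_MAJOR 6\n#define ROCKSDB_MINOR 29\n"
	major, err := rocksdbMajor(header)
	if err != nil {
		panic(err)
	}
	// Assumption: the grocksdb version used by this PR targets RocksDB v7.
	if major < 7 {
		fmt.Printf("found RocksDB %d.x headers; upgrade to v7 before building grocksdb\n", major)
	} else {
		fmt.Printf("found RocksDB %d.x headers\n", major)
	}
}
```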
@JayT106 I'll merge now to unblock some other fixes; feel free to discuss the options picked here further.
I remembered the value …
Can you try …
Tried cronosd …
> Can you try …

Ah, right. I was thinking your changes had already enabled this option. Will do it.
Tried …
* Problem: eth_sendTransaction is not tested
* Problem: json-rpc apis fail for legacy blocks after upgrade (#696)
  Solution:
  - keep the query handler in cosmos-sdk backward-compatible
  - add integration test to check
  * update sdk to upstream
  * ibc-go to rc2
* Problem: file changes detection in workflow is problematic (backport #703) (#705)
  Solution:
  - fix wildcards according the plugin's doc
  - reformat python
  * fix py-lint
* Problem: after v0.9.0 upgrade eth_call failed on old blocks (backport #713) (#719)
  Solution:
  - make grpc query compatible with old format
  * debug
  * fix eth_call
  * fix gravity upgrade test
  * update ethermint to main branch
  * update sdk
* Problem: state streamers are not integrated (backport #702) (#721)
  Solution:
  - integration the basic file streamer
  * add integration test
  * changelog
  * fix build
  * fix lint
  * fix deliver tx event in cosmos-sdk
  * fix integration test
  * Update integration_tests/test_streamer.py
  * update ethermint and fix build
  * add a small cli utility into test_streamer.py
  * fix integration test
  * update sdk to upstream
* Problem: new iavl indexes migration is slow and not optional (#714) (#720)
  Closes: #712
  Solution:
  - Integrate the option introduced in cosmos-sdk
  * Update CHANGELOG.md
* Problem: recent dependencies are not used (backport #729) (#730)
  Solution:
  - update cosmos-sdk to 0.46.2, ibc-go to v5.0.0, ethermint to recent main branch
  Update highlights:
  - new flag to disable fast node migration
  - fix streaming listeners bug
  - fix grpc server panic
  - fix index-eth-tx error on empty db
  * Update CHANGELOG.md
* Problem: chain state is inconsistent if upgrade migration is interrupted (#748)
  Solution:
  - update cosmos-sdk with the fix
  * Update CHANGELOG.md
  * gomod2nix
  * skip streamer test
* Problem: recent fixes in dependencies are not included (#752)
  Solution:
  - update cosmos-sdk and iavl
  * Update CHANGELOG.md
  * fix build
* Problem: binary version is not bump to v1.0.0 (#753)
* Problem: recent fixes in dependencies are not used (#757)
  Solution:
  - cosmos-sdk -> v0.46.4
  - ethermint -> main
  - ibc-go -> v5.0.1
  - add dragonberry ics20 replacement
  * maintain ethermint fork
* Problem: gas used is not backward compatible (#760)
  Solution:
  - revert the changes in ethermint
* Problem: evm execute result is non-deterministic with concurrent grpc query (#761)
  Solution:
  - update dependencies to include the fix
  * Update CHANGELOG.md
  * Update go.mod
* Problem: extra_eips is not cleared on production network (#762)
  Closes: #755
  Solution:
  - add 1.0.0 upgrade plan to clear it
  * Update CHANGELOG.md
  * Update integration_tests/test_upgrade.py
  * fix integration test
* Problem: no error log when iavl set failure trigger app hash mismatch (#763)
  Solution:
  - log the error in cosmos-sdk
  * Update CHANGELOG.md
  * PR merged
* Problem: different result from eth_getProof comparing with Ethereum (#764)
  Solution:
  - cherry-pick solution from ethermint, thanks @mmsqe
  * Update CHANGELOG.md
* Problem: nix exceeds github rate limit occationally in CI (backport #766) (#768)
  Solution:
  - configure access-token
  - update the action plugins
* Problem: fixes in ibc-go v5.1 are not included (#765)
  Solution:
  - make a breaking change to upgrade to ibc-go `v5.1.0`.
  - will do v1.0.0 upgrade on both testnet and mainnet.
  * Update CHANGELOG.md
  * fix lint
  * include cache fix in tendermint
  * update sdk
  * Update CHANGELOG.md
  * Update CHANGELOG.md
  * make different plan name v1.0.0-testnet3 for testnet3
* Problem: london hardfork number failed validation (#771)
  * fix upgrade set parameters
  * changelog
* Problem: formal v0.46.5 cosmos-sdk release is not used (#772)
  Solution:
  - update dependency, should be non-breaking for cronos
  * Update CHANGELOG.md
  * update to v0.46.6
* Problem: final v1.0.0 is not released (#774)
  Solution:
  - update changelog
* Problem: manual prune cmd is not included (backport #781) (#782)
  Solution:
  - add to root cmd
* Problem: cosmos-sdk `v0.46.7` is not used (#790)
  Solution:
  - update dependency
  - `v0.46.7` fix a gov migration issue which affect query votes of old proposals.
  * Update CHANGELOG.md
  * use sdk streamers config
  * fix streamer test
  * fix file streamer integration test
  * changelog
* Problem: discontinued ibc-go version (#802)
  Solution:
  - update ibc-go to v5.2.0.
  - do another coordinated upgrade on testnet3.
  * Update CHANGELOG.md
  * Update app/upgrades.go
* Problem: production rocksdb configuration is not optimal (#813)
  Solution:
  - update related dependencies to allow customize rocksdb options.
  - especially using rocksdb v7.
  - tune rocksdb options.
  * Update Makefile
  * remove rocksdb from niv
  * rocksdb options
  * update flake
  * fix build
  * create_if_missing
  * OptimizeLevelStyleCompaction and IncreaseParallelism
  * remove SetLevelCompactionDynamicLevelBytes and add BlockCache
  * fix integration test
  * comments
* Problem: prometheus metrics is lost (#814)
  Solution:
  - setup correctly in ethermint
  * changelog
* release v1.0.3
* Update CHANGELOG.md
* fix changelog
* fix merge
* Update integration_tests/test_upgrade.py
* fix test
* Update integration_tests/configs/default.jsonnet
* fix test_multiple_attestation_processing
* fix changelog

Signed-off-by: yihuang <[email protected]>
Co-authored-by: mmsqe <[email protected]>
Co-authored-by: mmsqe <[email protected]>
Co-authored-by: Tomas Tauber <[email protected]>
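The #813 commits mention `OptimizeLevelStyleCompaction` and removing `SetLevelCompactionDynamicLevelBytes`. The sketch below restates, based on my reading of RocksDB's `options.cc` (verify against the version you link), the sizes that `OptimizeLevelStyleCompaction` derives from the memtable budget. It leaves out the per-level compression that helper also sets. It then computes the static per-level targets that apply when dynamic level bytes is off. This is plain Go arithmetic, not the grocksdb API. `levelStyleSettings`, `optimizeLevelStyle` and `levelCapacities` are hypothetical names, and the 512 MiB budget and multiplier of 10 are example inputs.

```go
package main

import "fmt"

const MiB = uint64(1) << 20

// levelStyleSettings holds a subset of the fields that, per my reading of
// RocksDB's options.cc, OptimizeLevelStyleCompaction assigns.
type levelStyleSettings struct {
	WriteBufferSize                uint64
	MinWriteBufferNumberToMerge    int
	MaxWriteBufferNumber           int
	Level0FileNumCompactionTrigger int
	TargetFileSizeBase             uint64
	MaxBytesForLevelBase           uint64
}

// optimizeLevelStyle derives the settings from a memtable memory budget.
func optimizeLevelStyle(memtableBudget uint64) levelStyleSettings {
	return levelStyleSettings{
		WriteBufferSize:                memtableBudget / 4,
		MinWriteBufferNumberToMerge:    2,
		MaxWriteBufferNumber:           6,
		Level0FileNumCompactionTrigger: 2,
		TargetFileSizeBase:             memtableBudget / 8, // sst file size at L1
		MaxBytesForLevelBase:           memtableBudget,     // L1 sized like L0
	}
}

// levelCapacities returns the target total size of levels 1..numLevels-1
// under static leveled sizing (dynamic level bytes disabled):
// size(L1) = base, size(Ln) = size(Ln-1) * multiplier.
func levelCapacities(base uint64, multiplier float64, numLevels int) []uint64 {
	caps := make([]uint64, 0, numLevels-1)
	size := float64(base)
	for l := 1; l < numLevels; l++ {
		caps = append(caps, uint64(size))
		size *= multiplier
	}
	return caps
}

func main() {
	s := optimizeLevelStyle(512 * MiB)
	fmt.Printf("write_buffer_size=%dMiB target_file_size_base=%dMiB max_bytes_for_level_base=%dMiB\n",
		s.WriteBufferSize/MiB, s.TargetFileSizeBase/MiB, s.MaxBytesForLevelBase/MiB)
	for i, c := range levelCapacities(s.MaxBytesForLevelBase, 10, 7) {
		fmt.Printf("L%d target: %d MiB\n", i+1, c/MiB)
	}
}
```

With static sizing, each deeper level is a fixed multiple of the one above it. That makes it easy to predict how large the bottommost level, and therefore its sst files, will grow.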
Solution:
During the versiondb migration experiments, I found that the default rocksdb settings are several times slower than the ones I had been experimenting with, because I had tuned the options and run a full compaction on that db beforehand. After I kept the node running with the normal binary for a while, my script's performance suddenly dropped significantly, and the sst files had all been rewritten by the compaction process.
I suspect the main difference is the sst file sizes, and probably the filter and compression types as well. So I plan to tune the rocksdb options for the default binary first, then run the versiondb migration script again to see whether it gets back to the old speed.
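On the filter side, the main knob is bits per key: more bits cost more memory but let reads skip more sst files that do not contain the key. The sketch below uses the textbook Bloom filter approximation (1 - e^(-k/b))^k with k = round(b * ln 2) probes. It is an illustration of the trade-off, not RocksDB's exact filter implementation, which differs in detail between versions and filter formats. `bloomFPR` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"math"
)

// bloomFPR estimates the false-positive rate of a classic Bloom filter with
// bitsPerKey bits per key, using k = round(bitsPerKey * ln 2) probes and the
// approximation (1 - e^(-k/bitsPerKey))^k.
func bloomFPR(bitsPerKey float64) (int, float64) {
	k := int(math.Round(bitsPerKey * math.Ln2))
	if k < 1 {
		k = 1
	}
	fpr := math.Pow(1-math.Exp(-float64(k)/bitsPerKey), float64(k))
	return k, fpr
}

func main() {
	for _, b := range []float64{5, 10, 15} {
		k, p := bloomFPR(b)
		fmt.Printf("bits/key=%.0f probes=%d fpr=%.4f%%\n", b, k, p*100)
	}
}
```

At 10 bits per key this approximation gives 7 probes and roughly a 0.8% false-positive rate.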
Compare Compression Options
```shell
sst_dump --file=sample.sst --command=recompress ...
```
Basically, I picked a slower but higher-compression-ratio setting for the bottommost level; I think our nodes do have the capacity to dedicate one core to doing the heavier compaction gradually.
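The bottommost-level trade-off can be written as a per-level compression plan. The PR does not name the codecs, so the following are assumptions: LZ4 for the middle levels, ZSTD for the bottommost level, and no compression on L0/L1 (which mirrors what `OptimizeLevelStyleCompaction` does, per my reading of `options.cc`). The `Compression` type is a hypothetical stand-in. In RocksDB itself, the relevant options would be `compression_per_level` and `bottommost_compression`.

```go
package main

import "fmt"

// Compression is a hypothetical stand-in for RocksDB compression types.
type Compression string

const (
	NoCompression   Compression = "none"
	LZ4Compression  Compression = "lz4"
	ZSTDCompression Compression = "zstd"
)

// compressionPlan gives the bottommost level a slower, higher-ratio codec and
// the upper levels a fast one. L0/L1 stay uncompressed (an assumption), since
// compaction rewrites their data soon anyway.
func compressionPlan(numLevels int) []Compression {
	plan := make([]Compression, numLevels)
	for l := range plan {
		switch {
		case l == numLevels-1:
			plan[l] = ZSTDCompression
		case l < 2:
			plan[l] = NoCompression
		default:
			plan[l] = LZ4Compression
		}
	}
	return plan
}

func main() {
	// 7 levels matches RocksDB's default num_levels.
	for l, c := range compressionPlan(7) {
		fmt.Printf("L%d: %s\n", l, c)
	}
}
```

Most of the data in a leveled LSM tree lives in the bottommost level, so that is where a higher ratio saves the most space. Its data is also rewritten least often, which keeps the extra CPU cost of the slower codec bounded.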
PR Checklist:
- `make`
- `make test`
- `go fmt`
- `golangci-lint run`
- `go list -json -m all | nancy sleuth`

Thank you for your code, it's appreciated! :)