
Switch to ParityDB by default; deprecate RocksDB #1792

Open · wants to merge 8 commits into master
Conversation

@koute (Contributor) commented Oct 4, 2023

ParityDB seems to be stable and works pretty well, and it's significantly more efficient than RocksDB, so let's finally switch to it by default and get rid of RocksDB.

This PR disables the ability to create new RocksDB databases and makes ParityDB the default database.

Existing RocksDB databases are still supported and will continue to work, but a deprecation warning will be emitted.
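
In sketch form, the new open path behaves roughly like this (a hypothetical function with messages taken from the logs below; not the actual sc-client-db code):

use std::path::Path;

// Hypothetical sketch of the gating behavior; not the actual sc-client-db code.
fn open_rocksdb(db_path: &Path) -> Result<(), String> {
    if db_path.exists() {
        // Existing databases still open, but emit the deprecation warning.
        eprintln!("💀 RocksDB support is deprecated and will be removed on 2024-04-01; please switch to ParityDB");
        return Ok(());
    }
    // Creating a brand new RocksDB database is refused.
    Err("Creating new RocksDB databases is not supported anymore; please use ParityDB".to_string())
}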

Here's what the error looks like when starting a Polkadot node with --database rocksdb and no existing RocksDB database present:

2023-10-04 18:46:02 Parity Polkadot
2023-10-04 18:46:02 ✌️  version 1.1.0-d95ce90cbc3
2023-10-04 18:46:02 ❤️  by Parity Technologies <[email protected]>, 2017-2023
2023-10-04 18:46:02 📋 Chain specification: Development
2023-10-04 18:46:02 🏷  Node name: boring-brick-3060
2023-10-04 18:46:02 👤 Role: AUTHORITY
2023-10-04 18:46:02 💾 Database: RocksDb at /tmp/db/chains/rococo_dev/db/full
Error:
   0: Backend error: Creating new RocksDB databases is not supported anymore; please use ParityDB

Backtrace omitted. Run with RUST_BACKTRACE=1 environment variable to display it.
Run with RUST_BACKTRACE=full to include source snippets.

And here's the deprecation warning if a RocksDB database already exists:

2023-10-04 21:57:42 Substrate Node
2023-10-04 21:57:42 ✌️  version 3.0.0-dev-0db50256685
2023-10-04 21:57:42 ❤️  by Parity Technologies <[email protected]>, 2017-2023
2023-10-04 21:57:42 📋 Chain specification: Development
2023-10-04 21:57:42 🏷  Node name: utter-name-3738
2023-10-04 21:57:42 👤 Role: AUTHORITY
2023-10-04 21:57:42 💾 Database: RocksDb at /tmp/db/chains/dev/db/full
2023-10-04 21:57:45 💀 RocksDB support is deprecated and will be removed on 2024-04-01; please switch to ParityDB
2023-10-04 21:57:45 [0] 💸 generated 1 npos voters, 1 from validators and 0 nominators
2023-10-04 21:57:45 [0] 💸 generated 1 npos targets

@koute koute added I9-optimisation An enhancement to provide better overall performance in terms of time-to-completion for a task. T8-polkadot This PR/Issue is related to/affects the Polkadot network. T9-cumulus This PR/Issue is related to cumulus. T13-deprecation The current issue/pr is, or should be, part of a deprecation process. labels Oct 4, 2023
@koute koute requested review from arkpar and a team October 4, 2023 20:13
@paritytech-cicd-pr

The CI pipeline was cancelled due to the failure of one of the required jobs.
Job name: test-linux-stable 2/3
Logs: https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/3884841

Review thread on this diff excerpt:

)
.map(|_| ())
.map_err(|e| sp_blockchain::Error::Backend(e.to_string()))
crate::utils::open_kvdb_rocksdb::<Block>(db_path, db_type, true, 128)
Review comment (Member):

Why not use parity db in this test?

Comment on lines +40 to +41
let blob = std::io::Cursor::new(include_bytes!("rocksdb_test_database.zip"));
zip::read::ZipArchive::new(blob).unwrap().extract(&tmpdir).unwrap();
Review comment (Member):

Why not just use open_kvdb_rocksdb?

Diff excerpt (old → new):

- fn main() -> sc_cli::Result<()> {
-     command::run()
+ fn main() {
+     if let Err(error) = command::run() {
Review comment (Member):

Why? To use the Display implementation? Can we not make Debug use Display internally?
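
For illustration, one way Debug could reuse Display (a minimal self-contained sketch, not the actual sc_cli code):

use std::fmt;

// Hypothetical error type standing in for sc_cli::Error.
struct CliError(String);

impl fmt::Display for CliError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

// Newtype whose Debug forwards to the inner Display, so
// `fn main() -> Result<(), PrettyError<CliError>>` prints the
// human-readable message instead of the raw Debug output.
struct PrettyError<E: fmt::Display>(E);

impl<E: fmt::Display> fmt::Debug for PrettyError<E> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        fmt::Display::fmt(&self.0, f)
    }
}

fn run() -> Result<(), CliError> {
    Err(CliError("Backend error: ...".to_string()))
}

fn main() -> Result<(), PrettyError<CliError>> {
    run().map_err(PrettyError)
}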

@arkpar (Member) commented Oct 5, 2023

Thank you for taking action ;-)

We're currently working on a major feature in paritydb that will change how state data is stored. That might affect stability. I still think we can make the switch now and roll out this new storage method in a fail-safe or experimental manner.

This PR disables the ability to create new RocksDB databases

This seems a bit harsh. It will break existing setups that people may have for testing, automation, etc.
I suggest deprecating --db=rocksdb with a similar warning message first, then disabling it in a subsequent release.

@bkchr (Member) commented Oct 5, 2023

This seems a bit harsh. It will break existing setups that people may have for testing, automation, etc.

I think it is fine. We want to discourage creating new dbs. You can still use --database rocksdb to open an old db. As long as your stuff continues to work in production, I don't see any problem in this.

@hitchhooker (Contributor):

It took me a month (September) to sync Kusama on ParityDB in archive mode on a 7950X3D + ECC RAM + NVMe drives in Bangkok, with the db constantly stopping sync. RocksDB syncs in a day or two. That major feature update is said to fix this, but it might be something to consider.
The sync experience was so painful that I really hope RocksDB isn't deprecated too soon.

@gituser commented Oct 14, 2023

@koute

ParityDB seems to be stable and works pretty well, and it's significantly more efficient than RocksDB

Can you explain how it is more efficient?
It seems ParityDB takes much more time (i.e. is slower) than RocksDB for the initial sync; in terms of storage ParityDB is the same as RocksDB (~300 GB with --pruning=100000); and ParityDB uses more host memory compared to RocksDB.

RocksDB is imo much more stable than ParityDB, and it shouldn't be removed or deprecated at this stage unless all these issues are fixed first.

@bkchr (Member) commented Oct 15, 2023

@hitchhooker @gituser please open issues in the parity db repo. We need to fix them. However, rocksdb has its own downsides and we don't want to continue to maintain it.

@koute (Contributor, Author) commented Oct 15, 2023

It took me a month (September) to sync Kusama on ParityDB in archive mode on a 7950X3D + ECC RAM + NVMe drives in Bangkok, with the db constantly stopping sync. RocksDB syncs in a day or two. That major feature update is said to fix this, but it might be something to consider.
The sync experience was so painful that I really hope RocksDB isn't deprecated too soon.

Are you sure this was due to ParityDB and not a generic syncing issue related to networking? (I have seen syncing stop in the past too, but that was also on RocksDB and was unrelated to the database used.)

ParityDB uses more host memory compared to RocksDB

Last time I checked, the difference was visibly in favor of ParityDB, unless something has changed since then. From time to time I run performance stress tests where I run a testnet composed of hundreds of nodes on a single machine, and in that test I can run many more nodes when using ParityDB before I hit the memory limit and everything crashes.


Anyway, thank you for your feedback. If you see any problems with ParityDB please make sure to open an issue! If people don't report the problems they see we can't fix them.


@bkchr Perhaps we should go with a little bit softer deprecation here, just in case there are actually still some showstopper problems related to ParityDB?

How about we add a temporary escape hatch, e.g. an --allow-deprecated-create-new-rocksdb-database parameter (or something like that) which will allow people to still create new RocksDB databases if absolutely necessary? This will hopefully still push people who have the --database rocksdb parameter hardcoded in their node invocations to switch to ParityDB, as that will still produce an error, but in case there are actually any showstopper bugs they'll be able to temporarily switch back until we fix the issue.

(So people will have to pass --allow-deprecated-create-new-rocksdb-database --database rocksdb to force the node to create a new RocksDB database.)

And if no new issues are reported, we can just nuke this parameter in the next release or two.

I will also do some testing myself syncing my archive node under ParityDB from scratch a couple of times, just in case.
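
For illustration, a minimal clap sketch of how such an escape hatch could look (the flag wiring and path check are hypothetical; this is not the actual Substrate CLI code):

use clap::Parser;

#[derive(Clone, Copy, PartialEq, clap::ValueEnum)]
enum Database {
    #[value(name = "paritydb")]
    ParityDb,
    #[value(name = "rocksdb")]
    RocksDb,
}

#[derive(Parser)]
struct Cli {
    /// Database backend to use.
    #[arg(long, value_enum, default_value = "paritydb")]
    database: Database,

    /// Escape hatch: allow creating a *new* RocksDB database.
    #[arg(long)]
    allow_deprecated_create_new_rocksdb_database: bool,
}

fn main() {
    let cli = Cli::parse();
    // Hypothetical check for an already-existing database directory.
    let db_exists = std::path::Path::new("/tmp/db/chains/dev/db/full").exists();

    if cli.database == Database::RocksDb
        && !db_exists
        && !cli.allow_deprecated_create_new_rocksdb_database
    {
        eprintln!("Creating new RocksDB databases is not supported anymore; please use ParityDB");
        std::process::exit(1);
    }
}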

@kogeler commented Oct 17, 2023

@bkchr
We are switching from ParityDB to RocksDB for some of our critical nodes right now.

Big DBs can't sync at all if they have a big gap of unsynchronized blocks. I have run into this many times. It is a little bit dangerous in some cases.

ref:

@kogeler commented Oct 17, 2023

This seems a bit harsh. It will break existing setups that people may have for testing, automation, etc.

I think it is fine. We want to discourage creating new dbs. You can still use --database rocksdb to open an old db. As long as your stuff continues to work in production, I don't see any problem in this.

Recently we had some cases where we needed to sync new DBs from scratch. We couldn't do it using ParityDB. We were forced to use binary DB snapshots.

@kogeler commented Oct 17, 2023

@koute

I am absolutely sure that the problem is not in the network or in the machines. We use large instances with huge bandwidth and the fastest disks that the large cloud providers can offer. The issue looks like an overloaded disk subsystem. You can check the screenshots of the load charts in my issue.

@arkpar (Member) commented Oct 17, 2023

Archive DBs are indeed slower at the moment on most VPC/VPS machines. This is mostly because there are a lot of small random writes going on, and the database hits the VPC or hardware IOPS limit: every inserted trie node results in a random write.
Having lots of RAM for the page cache helps with performance here.
paritytech/parity-db#199 should resolve this issue by eliminating random writes, but it is a substantial change (mostly on the trie/substrate side). I estimate it to be ready in 4-6 weeks.

@kogeler commented Oct 18, 2023

Archive DBs are indeed slower at the moment on most VPC/VPS machines. This is mostly because there are a lot of small random writes going on, and the database hits the VPC or hardware IOPS limit: every inserted trie node results in a random write. Having lots of RAM for the page cache helps with performance here. paritytech/parity-db#199 should resolve this issue by eliminating random writes, but it is a substantial change (mostly on the trie/substrate side). I estimate it to be ready in 4-6 weeks.

GCP SSD PD (which I used in the issue) has the following performance parameters:

  • Read throughput per VM (MB/s) - 1,200
  • Write throughput per VM (MB/s) - 1,200
  • Read IOPS per VM - 100,000
  • Write IOPS per VM - 80,000

Also, I used my own bare-metal instances with local NVMe SSD disks. It doesn't matter: ParityDB cannot sync the current Polkadot and Kusama state archive DBs from scratch.

I feel that ParityDB is not stable enough to disable RocksDB.

@kogeler commented Oct 18, 2023

Also, I created feature request #219, which relates to ParityDB snapshots.

It is also relevant because, at the same time, we can neither synchronize the database from scratch nor quickly load it from a snapshot in a predictable time.

That's why we are switching from ParityDB to RocksDB for some of our critical nodes right now.

@hitchhooker (Contributor):

Archive DBs are indeed slower at the moment on most VPC/VPS machines. This is mostly because there are a lot of small random writes going on, and the database hits the VPC or hardware IOPS limit: every inserted trie node results in a random write. Having lots of RAM for the page cache helps with performance here. paritytech/parity-db#199 should resolve this issue by eliminating random writes, but it is a substantial change (mostly on the trie/substrate side). I estimate it to be ready in 4-6 weeks.

My setup is designed to reach 250k IOPS read/write per TB at the hardware level, but I think ZFS as a filesystem might be the limiting factor for random writes (about 2.5x slower than lvm/ext4 for random writes in my benchmarks).

@arkpar (Member) commented Oct 18, 2023

ZFS uses 128k IO pages by default, which is suboptimal for DB workloads, especially ones involving small random writes.
https://openzfs.github.io/openzfs-docs/Performance%20and%20Tuning/Workload%20Tuning.html#database-workloads
For parity-db the ZFS recordsize option should be set to a minimum, e.g. 4k.

@hitchhooker (Contributor) commented Oct 18, 2023

I think the main issue for poor performance is just copy-on-write. I have been running with a 16k recordsize for ParityDB without really improving benchmarks. Will give 4k a try.

disks=("nvme1n1" "nvme2n1" "nvme3n1" "nvme4n1")
zpool create -o ashift=12 tank $(for disk in "${disks[@]}"; do echo "/dev/${disk}p2"; done)

# Disable access time (atime) as it can negatively impact performance
zfs set atime=off tank
# Set recordsize to 16K as most values in the ParityDb are small and values over 16K are rare
zfs set recordsize=16k tank
# throughput safer than latency
zfs set logbias=throughput tank
# Set the primary cache to only metadata, as ParityDb relies on the OS page cache
zfs set primarycache=metadata tank
# Enable compression as it can provide both space and performance benefits
zfs set compression=lz4 tank
# Set redundant metadata to most to protect against data corruption
zfs set redundant_metadata=most tank
# Synchronous writes (sync) should be set to standard to ensure data integrity in case of an unexpected shutdown
zfs set sync=standard tank

echo "Finished setting up ZFS pool and swap partitions"

@ggwpez (Member) commented Oct 18, 2023

I started a Kusama archive sync on the GCP server ref-hw-ggwpez so that anyone from Parity can SSH into it from the web UI.
It is currently at block 2M and already doing 17k write IOPS. It has 4 local NVMes with a combined 680K/360K read/write IOPS.

@bkchr (Member) commented Oct 18, 2023

How about we add a temporary escape hatch, e.g. an --allow-deprecated-create-new-rocksdb-database parameter (or something like that) which will allow people to still create new RocksDB databases if absolutely necessary? This will hopefully still push people who have the --database rocksdb parameter hardcoded in their node invocations to switch to ParityDB, as that will still produce an error, but in case there are actually any showstopper bugs they'll be able to temporarily switch back until we fix the issue.

(So people will have to pass --allow-deprecated-create-new-rocksdb-database --database rocksdb to force the node to create a new RocksDB database.)

Good idea! But let's just do it via --database rocksdb-deprecated-whatever.

@hitchhooker (Contributor):

What I have noticed is that ZFS is really terrible.

[screenshot: benchmark on a single nvme with ext4]

[screenshot: benchmark on 2x these nvme in striped zfs with 4k recordsize]

So I now suspect the issues with syncing archive paritydb databases are actually related to the use of zfs/btrfs instead of plain lvm/ext4. I guess copy-on-write slows down random writes, so it's probably a good idea to recommend avoiding zfs/btrfs for database disks. I think @kogeler also had an alternative filesystem in use.

hitchhooker pushed a commit to ibp-network/polkadot-sdk that referenced this pull request Jun 10, 2024
./scripts/test_bootnodes.sh
Initialized output directory and JSON file.
relaychain: kusama
/usr/local/bin/polkadot-parachain --no-hardware-benchmarks --no-mdns -d /tmp/bootnode_data --chain ./chain-spec/people-kusama.json --relay-chain-rpc-urls wss://rpc.ibp.network/kusama --bootnodes /dns/boot-node.helikon.io/tcp/7510/p2p/12D3KooWM1X4setrMWjwnV8iDkAtYhqFHNkGozdWdq6sawWh5Yhv
2024-06-10 12:41:21 [Parachain] ⚙️  Syncing 115.2 bps, target=#175337 (3 peers), best: #1792 (0x8e56…b18a), finalized #0 (0xc1af…8b3f), ⬇ 943.9kiB/s ⬆ 10.7kiB/s
helikon people-kusama peer_count: 3
relaychain: kusama
/usr/local/bin/polkadot-parachain --no-hardware-benchmarks --no-mdns -d /tmp/bootnode_data --chain ./chain-spec/people-kusama.json --relay-chain-rpc-urls wss://rpc.ibp.network/kusama --bootnodes /dns/boot-node.helikon.io/tcp/7512/wss/p2p/12D3KooWM1X4setrMWjwnV8iDkAtYhqFHNkGozdWdq6sawWh5Yhv
2024-06-10 12:41:47 [Parachain] ⚙️  Syncing 115.2 bps, target=#175339 (2 peers), best: #1024 (0x2970…7678), finalized #0 (0xc1af…8b3f), ⬇ 541.6kiB/s ⬆ 14.4kiB/s 
helikon people-kusama peer_count: 2
All data has been fetched and saved.
@timorleph:

Hi, is making ParityDB the default something you are still working on? If so, when would you expect this to be done?

We are wondering which DB we should use, and knowing your plans for this PR would be helpful.

@bkchr (Member) commented Jul 16, 2024

We're currently working on a major feature in paritydb that will change how state data is stored. That might affect stability. I still think we can make the switch now and roll out this new storage method in a fail-safe or experimental manner.

We are still waiting for this.

@arkpar (Member) commented Jul 17, 2024

The parity-db feature is complete, and the integration into substrate is pending testing in this PR: #4176.
@MattHalpinParity is taking over the effort to get it merged.

@timorleph:
So I assume #4176 is the sole blocker here – do you have an (approximate, of course) expected delivery date? Asking mostly since the last activity there is from a couple of months ago.
