Rocksdb corruption issues on compaction #9009
Linked rocksdb issue:
I've not heard any reports of this issue recently, closing for now.
actually, there are sporadic reports of this (search our discord server)...
I've been waiting to rule out our unusual manual-compaction codepath as a possible cause of this. But even after the write-stall fix was well rolled out, there were still reports. So, I dug in a bit:
making a fun debugging story short, this boils down to rocksdb actually catching single-bit rot from the underlying hardware. So, our next action here would be to properly handle the error, show a message, and abort
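A minimal sketch of what "handle the error, show a message, and abort" could look like, assuming the rust-rocksdb crate; the helper name and message wording here are hypothetical, not actual validator code:

```rust
use rocksdb::{DB, Options};
use std::process;

// Hypothetical helper: open the blockstore, but turn a rocksdb
// corruption error into a clear operator-facing message plus an
// abort, instead of an opaque panic inside a compaction thread.
fn open_blockstore_or_abort(path: &str) -> DB {
    match DB::open(&Options::default(), path) {
        Ok(db) => db,
        // rust-rocksdb reports corruption via an Error whose message
        // contains "Corruption"; match on that string here.
        Err(e) if e.to_string().contains("Corruption") => {
            eprintln!(
                "blockstore at {path} is corrupted ({e}); this usually \
                 indicates bit rot on the underlying disk -- check the \
                 hardware and restore the ledger"
            );
            process::abort();
        }
        Err(e) => panic!("failed to open blockstore at {path}: {e}"),
    }
}
```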
@ryoqun @sakridge As requested, a crash report: We experienced it on
File: https://github.com/calypso-it/sol-rocksdb-panic/blob/main/011080.sst
@yhchiang-sol could you take a look at that report? Is there a way we can recover from a checksum mismatch, maybe by removing the corrupted data and re-downloading it?
Should be something doable, but we need to write some code to make it happen. Since the corruption happens in one or more blocks inside an sst file, we might still be able to read that sst file's metadata to obtain its key range. If so, then we might be able to recover using the following steps:
Note that there would be some subtle differences between the recovered db and the db before the corruption, as the recovered entries will become newer than they used to be; but if we always read the db without a snapshot, then we should be fine.
Note that the assumption here is that we are still able to read the metadata of the sst file that has one or more corrupted blocks. Btw, in case the corrupted instance is still there, can I have a copy of that instance for debug purposes?
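If that metadata is still readable, the key range can be pulled out programmatically. A minimal sketch, assuming the rust-rocksdb crate's `live_files()` API; the function name and path handling are illustrative only:

```rust
use rocksdb::{DB, Options};

// List every live sst file with its key range -- exactly the piece of
// metadata the recovery plan above depends on. This only works if
// rocksdb can still open the database and read the file's metadata.
fn print_sst_key_ranges(db_path: &str) -> Result<(), rocksdb::Error> {
    // Open read-only so we don't disturb the (possibly damaged) database.
    let db = DB::open_for_read_only(&Options::default(), db_path, false)?;
    for f in db.live_files()? {
        println!(
            "{} (cf={}, level={}): {:?}..{:?} ({} entries)",
            f.name, f.column_family_name, f.level,
            f.start_key, f.end_key, f.num_entries
        );
    }
    Ok(())
}
```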
@yhchiang-sol What files do you mean by "corrupted instance"?
Ideally, the entire rocksdb database, but that will definitely be too big to keep.
Thank you. Can you save the sst file as well on the next crash? Btw, can I know whether this happens frequently to all your validators or just one, if you happen to run multiple validators?
(i think this is another data corruption. let me revive the last investigation patch...)
so, this time the root cause wasn't a single bit flip. but there was an oddly zeroed range in the reported sst file:
as far as I understand, sst file serialization isn't aligned that way. Also, sst files won't waste file space with those zeros under usual operation. so, i highly suspect the underlying filesystem/hardware wiped the nicely hex-round block range due to some failure.
... so rocksdb is innocent in this case as well. :)
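For what it's worth, a wiped range like that is easy to check for. A minimal sketch (a hypothetical helper, not something from this thread) that scans a file for fully zeroed, aligned blocks; the 4 KiB block size is an assumption:

```rust
use std::fs::File;
use std::io::Read;

// Report the offsets of fully zeroed, aligned blocks -- the signature
// of a filesystem/hardware wipe rather than normal sst serialization.
fn find_zeroed_blocks(path: &str) -> std::io::Result<Vec<u64>> {
    const BLOCK: usize = 4096; // assumed block size; adjust as needed
    let mut file = File::open(path)?;
    let mut buf = vec![0u8; BLOCK];
    let mut offsets = Vec::new();
    let mut offset = 0u64;
    loop {
        let n = file.read(&mut buf)?;
        if n == 0 {
            break; // end of file
        }
        // Only flag full, aligned blocks; a short tail read is ignored.
        if n == BLOCK && offset % BLOCK as u64 == 0 && buf.iter().all(|&b| b == 0) {
            offsets.push(offset);
        }
        offset += n as u64;
    }
    Ok(offsets)
}
```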
Thanks for adding some data points, @ryoqun. Btw, in case the underlying filesystem/hardware is the root cause, are we still interested in providing a recovery tool to repair the sst file?
@calypso-it hi, did you experience this panic on testnet or mainnet-beta?
Testnet
That was my suspicion, too. I will change the NVMe and see if the issue persists. Nonetheless, auto-recovery/repair would be a nice feature, if the data is repairable.
Good to hear that. I will invest some time into this.
Aghh, a bit of a showstopper for my previously proposed solution based on my WIP PR (#26790): we are not able to obtain the key range of the corrupted sst file even if the corruption only happens in a data block. Output from a normal sst file:
output from a corrupted sst file, where the tool is not able to obtain the sst file metadata. Probably need to change the rocksdb code in order to bypass the check.
Please ignore my previous comment, as it turns out that xxd somehow always pads one more byte, which made the file unreadable to rocksdb. If the corruption only happens in a data block of an sst file without changing the file size, then we are able to obtain the key range of that corrupted file! This opens the possibility that we might be able to replace the corrupted file by copying uncorrupted data for that key range from a healthy validator instance.
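A minimal sketch of that replace-from-a-healthy-instance idea, assuming a recent rust-rocksdb crate (where iterators yield `Result`s); all paths, the key range, and the function name are placeholders:

```rust
use rocksdb::{Direction, IteratorMode, Options, SstFileWriter, DB};

// Copy the corrupted file's key range out of a healthy database into a
// fresh sst file, then ingest that file into the damaged database.
fn rebuild_key_range(
    healthy: &DB,
    damaged: &DB,
    start_key: &[u8],
    end_key: &[u8],
    sst_path: &str,
) -> Result<(), rocksdb::Error> {
    let opts = Options::default();
    let mut writer = SstFileWriter::create(&opts);
    writer.open(sst_path)?;
    // Keys arrive in bytewise order (the default comparator), which is
    // also the order SstFileWriter requires.
    for item in healthy.iterator(IteratorMode::From(start_key, Direction::Forward)) {
        let (key, value) = item?;
        if key.as_ref() > end_key {
            break; // past the corrupted file's key range
        }
        writer.put(&key, &value)?;
    }
    writer.finish()?;
    // Ingested entries become "newer" than the originals -- the subtle
    // difference called out earlier in the thread.
    damaged.ingest_external_file(vec![sst_path])?;
    Ok(())
}
```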
Created a separate issue for recovery tools: #26813
Problem
We've seen messages like this from rocksdb on many validators.
Proposed Solution
Debug and fix.