nimbus spontaneously crashes with "database disk image is malformed" #6425
Comments
Sorry, there is no such information, so I will ask: do you have enough free space on the disk where the database is stored?
Yes, there is more than enough space in both cases (over 200GB free on both hosts).
No disk/filesystem errors as far as I can see.
Yes, but note those are on two separate hosts: on one it failed spontaneously, and on the other it wouldn't start anymore after the update (no issues before the update). I suspect it might be related to something on the network at that time, but I'm not sure...
This error message comes from the SQLite3 code.
sqlite3 you say, so I did this (on the one that failed during write operation):
and got:
I'm not sure how helpful that is...
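For anyone wanting to run a comparable check, here is a minimal sketch that uses Python's standard `sqlite3` module to run SQLite's built-in `PRAGMA integrity_check` (or the faster, less thorough `PRAGMA quick_check`) against a database file. The function name `check_integrity` and the path in the usage note are hypothetical; this is not a Nimbus tool and not necessarily the command used above.

```python
import sqlite3
from pathlib import Path


def check_integrity(db_path: str, quick: bool = False) -> list[str]:
    """Return the report lines of SQLite's consistency check.

    A healthy database yields ["ok"]; anything else describes a problem.
    """
    pragma = "quick_check" if quick else "integrity_check"
    # Open read-only through a file: URI so the check cannot modify the file.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(f"PRAGMA {pragma}").fetchall()
    except sqlite3.DatabaseError as exc:
        # Severe corruption (e.g. "database disk image is malformed")
        # is raised as an exception instead of being returned as rows.
        return [str(exc)]
    finally:
        conn.close()
    return [row[0] for row in rows]
```

Usage would look like `print(check_integrity("/path/to/some.sqlite3"))`, with the path replaced by the real database file. It is best run while the node is stopped, and a full check on a database of this size can take a long time.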
We've never seen this particular error, and it appears to be something happening in the SQLite library itself, given the
Nimbus does not use SQLite3 in a fine-grained enough way to seemingly trigger such an issue unless other random memory corruption or similar issues are happening. It's worth checking, perhaps, if the nodes and hosts in question:
Should one be given to understand that
it's otherwise all defaults, bare metal, ext4, default filesystem mount options?
It is a VM (on Xen), but otherwise plain ext4, and I don't see anything unusual in monitoring at that time (temperature, I/O rates, RAID state etc. all normal). BTW, yesterday two more hosts behaved in an unusual but different way: the OOM killer killed the nimbus process after it quickly grew to over 16GB (it normally sits at around 4GB). This never happened before.
In the meantime, the database crash happened two more times (on yet other hosts), but interestingly, after an automatic service restart (via systemd) it continued normally. Here is one of the crashes:
and the startup:
The database itself is about 158GB, which I assume is the expected size, right?
Yes, that's the expected size.
Describe the bug
After about a month of uptime, the Nimbus beacon node crashed and now refuses to start, complaining that the database is malformed. This happened on two separate hosts about 1h apart.
To Reproduce
Steps to reproduce the behavior:
crash message from 24.5.1
logs from 24.6.0
Additional context
I'm not really sure if those two incidents are related, but since Nimbus was running flawlessly before, and it happened at a similar time on two separate hosts (even in separate physical locations), I suspect they might be.
A few other hosts running nimbus 24.6.0 and 24.5.1 are not affected.