Issues with syncing from scratch and long resyncs in the splitstore #6769
Comments
I'm concerned about directing writes straight into the coldstore. My coldstore is not nearly as performant as my hotstore; my original motivation for switching to the splitstore was that my coldstore SATA SSDs were not fast enough to stay in sync. Switching to the splitstore resolved that, and lotus ran fine for months, until an unrelated issue caused it to stop syncing, and I've been stuck ever since. I think a preferable solution would be to force a compaction when either (a) the compaction is so large it will likely fail due to memory, or (b) the hotstore is out of disk space.
So we have merged support for on-disk marksets in master; this alleviates memory pressure during compaction and may well be enough to bring your node back to life. Can you give it a try?
I would also recommend doing a full (moving) gc in your hotstore to bring it back down to about 55G. You can force one by setting the full-gc frequency option in the splitstore config to 1; the default is 20, which does a full gc every 20 compactions (about once a week). You can restore the default after compacting or leave it at 1 if you wish; moving gc is not all that slow (it takes about 9 min on my nodes).
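For reference, here is a sketch of the relevant splitstore section of config.toml covering both suggestions above (on-disk marksets and a more frequent full gc). The option names (`MarkSetType`, `HotStoreFullGCFrequency`) are my assumption about the lotus config and should be checked against your own config.toml before use:

```toml
[Chainstore]
  EnableSplitstore = true

  [Chainstore.Splitstore]
    # assumed option: "badger" selects an on-disk markset, which reduces
    # memory pressure during compaction (vs. an in-memory "map" markset)
    MarkSetType = "badger"
    # assumed option: 1 forces a full moving gc on every compaction;
    # the default of 20 runs one roughly once a week and can be restored
    # once the hotstore is back down to size
    HotStoreFullGCFrequency = 1
```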
Further work on reducing memory usage in #6949.
Long resyncs should be much better with #8008, which uses an on-disk coldset and eliminates sorting.
From discussion in #5788
It has become apparent that there are two difficult scenarios for splitstore syncing and compacting, both of which blow up the hotstore: syncing from scratch, and long resyncs after the node has fallen far behind the chain.
To fix this, the splitstore needs to detect in Start when it is way out of sync, say by more than CompactionThreshold epochs.
When that is the case, it should switch into a mode where writes are redirected to the coldstore until the node is fully synced, at which point a warmup is run to fetch state object references into the hotstore; the splitstore can then run as normal (and compact, etc.).
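A minimal sketch of that flow, assuming hypothetical types and helper names (the SplitStore fields, OnSyncComplete, warmup, putHot/putCold) rather than the actual lotus splitstore code; only the shape of the logic is taken from the proposal above:

```go
// Illustrative sketch only; all identifiers below are hypothetical and do
// not reflect the real lotus splitstore implementation.
package splitstore

import "sync/atomic"

// ChainEpoch stands in for the chain epoch type (abi.ChainEpoch in lotus).
type ChainEpoch int64

// CompactionThreshold is the epoch gap beyond which the splitstore
// considers itself way out of sync (hypothetical value).
const CompactionThreshold ChainEpoch = 4500

type SplitStore struct {
	baseEpoch ChainEpoch // epoch recorded at the last compaction boundary
	outOfSync int32      // 1 while writes are being redirected to the coldstore
	// hot/cold blockstores elided
}

// Start checks how far the node is behind the current head; if the gap
// exceeds CompactionThreshold, writes are redirected to the coldstore
// until the node has fully synced.
func (s *SplitStore) Start(head ChainEpoch) {
	if head-s.baseEpoch > CompactionThreshold {
		atomic.StoreInt32(&s.outOfSync, 1)
	}
}

// Put routes a write to the coldstore while out of sync, otherwise to the
// hotstore as usual.
func (s *SplitStore) Put(block []byte) error {
	if atomic.LoadInt32(&s.outOfSync) == 1 {
		return s.putCold(block)
	}
	return s.putHot(block)
}

// OnSyncComplete is called once the node has caught up; it runs a warmup
// to fetch reachable state object references into the hotstore, after
// which normal operation (and compaction) resumes.
func (s *SplitStore) OnSyncComplete(head ChainEpoch) {
	if atomic.CompareAndSwapInt32(&s.outOfSync, 1, 0) {
		s.warmup(head)
	}
}

func (s *SplitStore) warmup(head ChainEpoch)     { /* walk state at head, copy into hotstore */ }
func (s *SplitStore) putHot(block []byte) error  { return nil }
func (s *SplitStore) putCold(block []byte) error { return nil }
```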
Original Post:
I've run into an issue. I have limited disk space on my hotstore but lots of storage on my coldstore. Due to an unrelated error, my lotus node lost sync for a while. Now that it is trying to catch up, it never compacts; it fills my hotstore disk and then stops syncing again.
I tried starting lotus with --no-bootstrap in the hope that I could trigger a compaction manually, but there is no way to do that either.
Originally posted by @clinta in #5788 (comment)