Issues with syncing from scratch and long resyncs in the splitstore #6769

Open
vyzo opened this issue Jul 16, 2021 · 5 comments
Assignees: vyzo
Labels: kind/bug

@vyzo (Contributor) commented Jul 16, 2021

From discussion in #5788

It has become apparent that there are two scenarios that cause difficulties for splitstore syncing and compaction, both of which blow up the hotstore:

  • In a sync from scratch (@whyrusleeping wants to do that), everything goes into the hotstore, which will then try to compact once synced and will very likely fail because of memory requirements.
  • In a resync after a long downtime, the same problem arises: the hotstore might be blown up and compaction might have difficulty running.

To fix this, the splitstore will need to detect when it is far out of sync (in Start), say by more than CompactionThreshold.
When that is the case, it should switch into a mode where writes are redirected to the coldstore until the node is fully synced, at which point a warmup is run to fetch state object references; the splitstore can then run as normal (compact, etc).
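
A minimal sketch of that detect-and-redirect flow, in Go, might look like the following. It is illustrative only: apart from Start and CompactionThreshold, which are mentioned above, the names, fields, and the threshold value are assumptions, not the actual lotus splitstore API.

package splitstore

// CompactionThreshold is an assumed epoch distance, for illustration only.
const CompactionThreshold = 5 * 2880

type SplitStore struct {
	baseEpoch int64 // epoch of the last compaction boundary
	coldOnly  bool  // when true, writes bypass the hotstore and go to the coldstore
}

// Start checks how far behind the chain head we are; if we are more than
// CompactionThreshold epochs out of sync, redirect writes to the coldstore.
func (s *SplitStore) Start(headEpoch int64) {
	if headEpoch-s.baseEpoch > CompactionThreshold {
		s.coldOnly = true
	}
}

// OnSyncComplete would be called once the node has fully caught up: resume
// normal hotstore writes and run a warmup to pull state references back in.
func (s *SplitStore) OnSyncComplete(headEpoch int64) {
	if s.coldOnly {
		s.coldOnly = false
		s.baseEpoch = headEpoch
		s.warmup() // walk reachable state and copy references into the hotstore
	}
}

func (s *SplitStore) warmup() {
	// placeholder: the real warmup walks the chain state and populates the hotstore
}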


Original Post:
I've run into an issue. I have limited disk space for my hot store, but lots of storage on my cold store. Due to an unrelated error, my lotus node lost sync for a while. Now that lotus is trying to catch up, it never compacts; it fills my hot store disk and then stops syncing again.

I tried starting lotus with --no-bootstrap in the hope that I could trigger a compaction manually, but there is no way to do that either.

Originally posted by @clinta in #5788 (comment)

@vyzo changed the title from "I've run into an issue. I have limited disk space on my hot store, but lots of storage on my cold store. But due to a different error my lotus lost sync for a while. Now that lotus is trying to catch up, it never compacts and fills my hot store disk, then stops syncing again." to "Support syncing from scratch and long resyncs in the splitstore" on Jul 16, 2021
@vyzo changed the title from "Support syncing from scratch and long resyncs in the splitstore" to "Issues with syncing from scratch and long resyncs in the splitstore" on Jul 16, 2021
@vyzo added the kind/bug label on Jul 16, 2021
@vyzo self-assigned this on Jul 16, 2021
@clinta (Contributor) commented Jul 16, 2021

I'm concerned about directing writes straight into the coldstore. My coldstore is not nearly as performant as my hotstore. My original motivation for switching to the splitstore was that the performance of my coldstore SATA SSDs was not good enough to stay in sync. Switching to the splitstore resolved that, and lotus ran fine for months, until another unrelated issue caused it to stop syncing; since then I've been stuck.

I think forcing a compaction when either (a) the compaction is so large that it will likely fail due to memory, or (b) the hotstore is out of disk space, would be a preferable solution.

@vyzo (Contributor, Author) commented Jul 28, 2021

We have now merged support for on-disk marksets into master; this alleviates memory pressure during compaction and may well be enough to bring your node back to life.

Can you give it a try?
You can enable it by adding this to the config:

[Chainstore.Splitstore]
  MarkSetType = "badger"

@vyzo (Contributor, Author) commented Jul 28, 2021

I would also recommend doing a full (moving) GC on your hotstore to bring it back down to about 55 GB.
You can force a full GC in the next compaction with this setting:

[Chainstore.Splitstore]
  HotStoreFullGCFrequency = 1

The default is 20, which does a full GC every 20 compactions (about once a week); you can restore the default after compacting or leave it at 1 if you wish. Moving GC is not all that slow (it takes about 9 minutes on my nodes).
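
As a rough illustration of how that setting behaves (a sketch only, not the lotus implementation; the function and parameter names here are hypothetical), the frequency acts as a modulus on a running compaction counter:

package splitstore

// shouldDoFullGC sketches the described behavior: a full (moving) GC runs on
// every compaction whose index is a multiple of HotStoreFullGCFrequency.
// A frequency of 1 therefore forces a full GC on every compaction.
func shouldDoFullGC(compactionIndex int64, hotStoreFullGCFrequency uint64) bool {
	if hotStoreFullGCFrequency == 0 {
		return false // assumption: a value of 0 disables full GC entirely
	}
	return compactionIndex%int64(hotStoreFullGCFrequency) == 0
}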

@vyzo (Contributor, Author) commented Jul 30, 2021

Further work on reducing memory usage is in #6949.

@vyzo (Contributor, Author) commented Feb 7, 2022

Long resyncs should be much better with #8008, which uses an on-disk coldset and eliminates sorting.
