
Regen refactor > safer resilient strategy #4005

Open
dapplion opened this issue May 10, 2022 · 4 comments
Labels
prio-medium: Resolve this some time soon (tm).
scope-none: Issues that do not fit within any of the other defined scopes.
scope-security: Issues that fix security issues: DOS, key leak, CVEs.

Comments

@dapplion (Contributor) commented May 10, 2022

Background

Consensus clients need to cache some states to fully participate in the network. States are very heavy, so you can't cache every state you may need, and writing every possible state to disk is not practical either. So what do you do?

  • Keep in memory the few states that you need the most
  • Regenerate from memory or disk (i.e. re-process blocks) to access states that are probabilistically less useful.

So the stateCache and checkpointStateCache handle the first point: deciding which states to keep in memory. The regen module handles the second: providing the ability to regenerate any state within some boundary.
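
For concreteness, here's a minimal TypeScript sketch of those two layers. All names here (`CachedState`, `StateCaches`, `getStateByRoot`) are illustrative assumptions, not Lodestar's actual API:

```ts
type RootHex = string;

// Illustrative stand-in for a full cached beacon state.
interface CachedState {
  root: RootHex;
  slot: number;
  // ...full beacon state data omitted for brevity
}

/** Layer 1: bounded in-memory caches decide which states stay hot. */
interface StateCaches {
  stateCache: Map<RootHex, CachedState>;
  checkpointStateCache: Map<RootHex, CachedState>;
}

/** Layer 2: regen can rebuild any state within some boundary. */
interface IRegen {
  /** Return the state from cache, or re-process blocks to rebuild it. */
  getStateByRoot(stateRoot: RootHex): Promise<CachedState>;
}
```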

Current strategy

  • Keep in memory the most recent 96 states regardless of forks
  • Keep in memory checkpoint states for the most recent ~4 epochs
  • Keep in memory the latest finalized state + the head state
  • Write to disk the latest finalized state, delete the previous finalized state (a sketch of this policy follows below)
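
A rough TypeScript sketch of that retention policy, under assumed names and a count-based prune; the checkpoint-state cache would be pruned analogously, by epoch instead of slot:

```ts
const MAX_STATES = 96; // most recent states kept, regardless of forks

interface StateSummary {
  slot: number;
  isHead: boolean;
  isFinalized: boolean;
}

function pruneToPolicy(states: StateSummary[]): StateSummary[] {
  // Keep the 96 most recent states regardless of fork...
  const bySlotDesc = [...states].sort((a, b) => b.slot - a.slot);
  const kept = new Set<StateSummary>(bySlotDesc.slice(0, MAX_STATES));
  // ...and always pin the head state and the latest finalized state.
  for (const s of states) {
    if (s.isHead || s.isFinalized) kept.add(s);
  }
  return Array.from(kept);
}
```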

This approach works well under good network conditions. Thanks to tree structural sharing, the cost of those 96 states in a linear chain is a very low multiple of the cost of a single state (~1.2x).

However, during attacks, bugs, or a highly forked network, our node quickly runs out of memory or can become unable to follow the chain.

  • If those 96 states differ significantly from each other, structural sharing is not useful, so the total memory could approach 96x the cost of a single state.
  • The same applies to checkpoint states; see a past example of this causing fast OOMs: Fast OOM when syncing close to head #3171
  • Structural sharing is only useful for states that are close to each other. In long periods of non-finality we would regen from the latest finalized state, which could be hours old, potentially DOS-ing ourselves.

Relevant issues:

Improvement goals

So, we can do better. Specifically:

  1. Don't let the state cache cause an OOM if states are too expensive
  2. Limit the max regen cost in all cases = reduce DOS risk
  3. Make regen as cheap as possible by regenerating from both in-memory states and disk states

Proposed strategies

1. Regen from memory and disk

On every checkpoint, write the state to a "hot state db" bucket on disk. On finalization, move some of those states to a "cold state db" or "archive db" bucket. Then, on regen, pick a starting state depending on the distance to the closest available state in memory, if any. This would remove the need to keep the finalized state in memory.
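
A minimal sketch of this strategy, assuming hypothetical names (`memCache`, `hotStateDb`) and a simple distance heuristic:

```ts
interface StateSource {
  slot: number;
  load(): Promise<Uint8Array>; // serialized state bytes to deserialize
}

// Find the latest state at or before the target slot in a set of sources.
function closestAtOrBefore(sources: StateSource[], slot: number): StateSource | null {
  let best: StateSource | null = null;
  for (const s of sources) {
    if (s.slot <= slot && (best === null || s.slot > best.slot)) best = s;
  }
  return best;
}

// Pick the best starting point for regen: prefer a close in-memory state,
// otherwise fall back to a per-checkpoint state persisted in the hot state db.
function findRegenStartingPoint(
  targetSlot: number,
  memCache: StateSource[],
  hotStateDb: StateSource[],
  maxMemDistance: number
): StateSource | null {
  const closestMem = closestAtOrBefore(memCache, targetSlot);
  // Use the in-memory state if it's within the allowed replay distance.
  if (closestMem !== null && targetSlot - closestMem.slot <= maxMemDistance) {
    return closestMem;
  }
  // Otherwise prefer the disk state; it may be closer than anything in memory.
  return closestAtOrBefore(hotStateDb, targetSlot) ?? closestMem;
}
```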

2. Bound regen depending on consumer

Depending on the caller, restrict the work triggered by regen.
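
For example, a sketch in which each regen consumer declares a replay budget; the callers and limits below are assumptions for illustration, not Lodestar's actual values:

```ts
enum RegenCaller {
  GossipBlock = "gossipBlock",
  ValidateGossipAttestation = "validateGossipAttestation",
  ProduceBlock = "produceBlock",
}

// Max number of blocks regen may re-process on behalf of each caller.
const MAX_REGEN_BLOCKS: Record<RegenCaller, number> = {
  [RegenCaller.GossipBlock]: 32,
  [RegenCaller.ValidateGossipAttestation]: 8,
  [RegenCaller.ProduceBlock]: 64,
};

// Reject regen requests whose replay cost exceeds the caller's budget.
function assertRegenAllowed(caller: RegenCaller, blocksToReplay: number): void {
  if (blocksToReplay > MAX_REGEN_BLOCKS[caller]) {
    throw new Error(`regen too expensive for ${caller}: ${blocksToReplay} blocks`);
  }
}
```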

3. WeakRef state cache

Allow the GC to drop states when memory is low. Cache only 3 states behind the current head. Do it behind a flag so modes can be extended for the light client.
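
A minimal sketch using ES2021 `WeakRef`: keep strong references only to the 3 most recent states, so the GC may collect older ones under memory pressure. The class and method names are hypothetical:

```ts
class WeakStateCache<T extends object> {
  private strong: T[] = []; // last N states, protected from GC
  private weak = new Map<string, WeakRef<T>>(); // everything else, collectable

  constructor(private readonly maxStrong = 3) {}

  set(key: string, state: T): void {
    this.weak.set(key, new WeakRef(state));
    this.strong.push(state);
    // Demote the oldest state to weak-only once over the strong limit.
    if (this.strong.length > this.maxStrong) this.strong.shift();
  }

  get(key: string): T | undefined {
    const state = this.weak.get(key)?.deref();
    if (state === undefined) this.weak.delete(key); // collected: drop stale ref
    return state;
  }
}
```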

TBD

Closes #3099

@dapplion added the scope-security label May 10, 2022
@wemeetagain (Member) commented
Regarding loading a state from db, iirc this is expensive, like 6+ seconds. Might be good to benchmark this and get this lower.

@dapplion (Contributor, author) commented May 18, 2022

> Regarding loading a state from db, iirc this is expensive, like 6+ seconds. Might be good to benchmark this and get this lower.

In terms of time to result, the tradeoff math is roughly:

  • Load from disk = deserialize (600ms) + process a few blocks (10ms x block) + hashTreeRoot (8000ms)
  • Advance old state = process many blocks (10ms x block) + process many epoch transitions (600ms x epoch) + hashTreeRoot (?? ms)

Also keep in mind that if you advance an old state significantly, the cost of the final hashTreeRoot can be very high, as the whole state is different.

However, there's a memory limit on the number and fork-ness of states you can keep in memory. In bad network conditions you must drop states to prevent OOMs, so regen from disk must always be available.
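
As a back-of-the-envelope illustration of that tradeoff, using the rough figures quoted above; the final hashTreeRoot cost of the advance path is left as an input since it varies widely:

```ts
// Rough figures from this comment, in milliseconds.
const DESERIALIZE_MS = 600;
const BLOCK_MS = 10;
const EPOCH_TRANSITION_MS = 600;
const DISK_HASH_TREE_ROOT_MS = 8000;

// Cost of loading a nearby state from disk and replaying a few blocks.
function loadFromDiskMs(blocksToReplay: number): number {
  return DESERIALIZE_MS + blocksToReplay * BLOCK_MS + DISK_HASH_TREE_ROOT_MS;
}

// Cost of advancing an old in-memory state across blocks and epoch boundaries.
function advanceOldStateMs(blocks: number, epochs: number, hashTreeRootMs: number): number {
  return blocks * BLOCK_MS + epochs * EPOCH_TRANSITION_MS + hashTreeRootMs;
}
```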

stale bot commented Sep 21, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.

stale bot added the meta-stale label Sep 21, 2022
@philknows removed the meta-stale label Sep 23, 2022
@dapplion added the prio-medium label Sep 29, 2022
@philknows (Member) commented

#6008 should resolve this issue
