Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat:chain:splitstore chain prune #9056

Merged
merged 24 commits into from
Aug 5, 2022
Merged

Conversation

ZenGround0
Copy link
Contributor

@ZenGround0 ZenGround0 commented Jul 19, 2022

Related Issues

#6577

Proposed Changes

Because many changes from the existing coldstore GC PR (#6728) have made it into lotus master and have been heavily modified with the last year of changes this PR is meant to replace #6728.

The scope of this PR is adding correct splitstore coldstore GC, integration tests of the splitstore, and the chain prune command to trigger this manually. Auto cold GC will happen in another PR

Additional Info

There are a few tricky changes to the splitstore itself

  • splitstore correctly serves transaction requests with respect to the right store during the critical section. compactType field and splitstore.go functions are modified to cover this.
  • purge is generalized to delete from the correct store (again compactType achieves this)
  • markset now generally contains both cold and hot store live cids in order to handle transaction protection uniformly. I've optimized puts to not be protected during cold compaction. Reification still needs work and I might do a similar optimization but cold reified objects might show up in hotstore compaction markset to keep things simple.
  • new paths and startup code so coldstore compaction can benefit from checkpoints and recovery

Testing also has some nuanced challenges. The splitstore / itest combined complex relies on the coldstore being the universal blockstore for genesis or any other setup state to be loaded. Coldstore can only be gced if it is backed by a badger blockstore as map blockstores don't implement the BlockstoreGC interface. This means that we need to add an option to itest setup to instantiate a fs repo to get a universal badger blockstore. We don't actually need this if we don't test coldstore gc.

Testing

  • framework writing small amount of hash linked state and then allowing to overwrite it to test whether hot and cold store moved around or deleted that state
  • One hotstore discard happy path test
  • One badger coldstore and chain prune happy path test

Checklist

Before you mark the PR ready for review, please make sure that:

  • All commits have a clear commit message.
  • The PR title is in the form of of <PR type>: <area>: <change being made>
    • example: fix: mempool: Introduce a cache for valid signatures
    • PR type: fix, feat, INTERFACE BREAKING CHANGE, CONSENSUS BREAKING, build, chore, ci, docs,perf, refactor, revert, style, test
    • area: api, chain, state, vm, data transfer, market, mempool, message, block production, multisig, networking, paychan, proving, sealing, wallet, deps
  • This PR has tests for new functionality or change in behaviour
  • If new user-facing features are introduced, clear usage guidelines and / or documentation updates should be included in https://lotus.filecoin.io or Discussion Tutorials.
  • CI is green

@ZenGround0 ZenGround0 marked this pull request as ready for review August 3, 2022 19:34
@ZenGround0 ZenGround0 requested a review from a team as a code owner August 3, 2022 19:34
@@ -255,6 +258,15 @@ func (bm *BlockMiner) MineBlocks(ctx context.Context, blocktime time.Duration) {
defer bm.wg.Done()

for {
if paused := atomic.LoadInt64(&bm.pause); paused == 1 {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm an atomic n00b so I might be getting the usage quite wrong

@@ -283,6 +295,18 @@ func (bm *BlockMiner) InjectNulls(rounds abi.ChainEpoch) {
atomic.AddInt64(&bm.nextNulls, int64(rounds))
}

// Pause compels the miner to wait for a signal to restart
func (bm *BlockMiner) Pause() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These helped me avoid races in test and seem generally useful. Worth checking carefully to make sure I'm not opening up deadlocks.

itests/kit/ensemble.go Outdated Show resolved Hide resolved
@@ -102,6 +102,14 @@ func (b *idstore) Put(ctx context.Context, blk blocks.Block) error {
return b.bs.Put(ctx, blk)
}

func (b *idstore) ForEachKey(f func(cid.Cid) error) error {
Copy link
Contributor Author

@ZenGround0 ZenGround0 Aug 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needed so that badger backed cold store satisfies the interface needed for pruning

@@ -14,35 +14,36 @@ func NewMemory() MemBlockstore {
}

// MemBlockstore is a terminal blockstore that keeps blocks in memory.
type MemBlockstore map[cid.Cid]blocks.Block
// To match behavior of badger blockstore we index by multihash only.
type MemBlockstore map[string]blocks.Block
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something I learned during this work is that the badger backed blockstores store mh only as keys to blocks. Until now the MemBlockstore has had different behavior, only matching when codec also matches. When using a coldstore backed by this MemBlockstore this led to a byzantine failure case. I think this is more correct across the board.

@ZenGround0
Copy link
Contributor Author

I'll add a chain prune discussion tutorial before merging

@ZenGround0 ZenGround0 removed the request for review from vyzo August 4, 2022 03:34
Copy link
Contributor

@magik6k magik6k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass, couple of (mostly nitpicky) comments. I feel like I have a decent idea of what's going on here, and the core logic makes sense.

Would be nice to setup some chaos-style tests for both this and hotstore compaction, crashing the process in the middle of compacting, and seeing if we resume correctly (not as a part of this PR, but before we call splitstore stable)

blockstore/mem.go Outdated Show resolved Hide resolved
api/api_full.go Outdated Show resolved Hide resolved
itests/kit/blockminer.go Outdated Show resolved Hide resolved
itests/kit/blockminer.go Outdated Show resolved Hide resolved
itests/kit/blockminer.go Show resolved Hide resolved
blockstore/splitstore/splitstore.go Show resolved Hide resolved
blockstore/splitstore/splitstore_prune.go Show resolved Hide resolved
blockstore/splitstore/splitstore_prune.go Outdated Show resolved Hide resolved
blockstore/splitstore/splitstore_prune.go Show resolved Hide resolved
blockstore/splitstore/splitstore_prune.go Show resolved Hide resolved
@ZenGround0 ZenGround0 requested a review from magik6k August 5, 2022 04:34
Copy link
Contributor

@magik6k magik6k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing serious to complain about.

Definitely needs lots of testing, and maybe measuring perf in a bunch of scenarios, but that seems to be planned already.

blockstore/splitstore/splitstore.go Show resolved Hide resolved
blockstore/splitstore/splitstore_prune.go Show resolved Hide resolved
@magik6k
Copy link
Contributor

magik6k commented Aug 5, 2022

Nothing serious to complain about.

Well, other than a bunch of itests hanging for whatever reason

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants