Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce splitstore memory usage during chain walks #6949

Merged
merged 16 commits into from
Aug 10, 2021

Conversation

vyzo
Copy link
Contributor

@vyzo vyzo commented Jul 30, 2021

Walking the chain was using a cid.Set, which can grow quite big (30M+ objects, maybe more) leading to signficant transient memory usage.

This adds an ObjectVisitor interface to use instead of a blanket cid.Set. This allows us to use a noop set during compaciton (as we already have the markset to lean on, which will be on-disk with MarkSetType = "badger") and markset-backed ones during warmup/check.

We also reset the walked block set on every iteration, to avoid building up a big one with all the blocks; we can do that because the walk is BFS.

@vyzo vyzo requested a review from Stebalien July 30, 2021 08:01
@vyzo
Copy link
Contributor Author

vyzo commented Jul 30, 2021

Opening as draft until I have tested timing/memory effects in my nodes.

@vyzo vyzo added epic/splitstore team/ignite Issues and PRs being tracked by Team Ignite at Protocol Labs labels Jul 30, 2021
@vyzo
Copy link
Contributor Author

vyzo commented Jul 30, 2021

warmup with on-disk visitor is some 2x slower, takes about an hour now.

@vyzo
Copy link
Contributor Author

vyzo commented Jul 30, 2021

marking is somewhat slower also; but on the bright side, there was no watchdog action whatsoever (which was a usual occurence).

@vyzo vyzo marked this pull request as ready for review July 30, 2021 15:15
@vyzo vyzo requested a review from a team as a code owner July 30, 2021 15:15
@vyzo
Copy link
Contributor Author

vyzo commented Jul 30, 2021

Marked as ready for review, as node burn was successful.

@vyzo
Copy link
Contributor Author

vyzo commented Jul 30, 2021

Unified the markset/visitor dichotomy, which opens up interesting possibilities: we can now check+mark atomically using Visit.

Follow up: parallelize walkChain; yes we can now!

@vyzo vyzo force-pushed the fix/splitstore-memory-usage branch from 2aa060a to 6f59511 Compare July 30, 2021 20:53
@vyzo
Copy link
Contributor Author

vyzo commented Jul 30, 2021

rebasd on master.

Copy link
Member

@Stebalien Stebalien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The general logic looks good, but I feel like we can simplify a bit.

blockstore/splitstore/markset.go Show resolved Hide resolved
blockstore/splitstore/markset_badger.go Show resolved Hide resolved
return err
})

switch err {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could always copy the version, then unlock, then do the rest of this function to simplify the locking a bit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have to hold the lock while doing the fast checks on the maps.

blockstore/splitstore/markset_badger.go Show resolved Hide resolved
blockstore/splitstore/markset_badger.go Outdated Show resolved Hide resolved
blockstore/splitstore/markset_badger.go Outdated Show resolved Hide resolved
blockstore/splitstore/markset_badger.go Outdated Show resolved Hide resolved
blockstore/splitstore/markset_badger.go Show resolved Hide resolved
pend := s.pend
seqno := s.seqno
s.seqno++
s.writing[seqno] = pend
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, with a background worker, we can get rid of this and just have a single "writing" and "pending" set. When the background worker is done flushing the last set, it would check to see if it should flush the next one (e.g., use a channel as a flag).

That should help simplify things a bit.

Copy link
Contributor Author

@vyzo vyzo Aug 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The disadvantage of the background worker is that we can do only one write at a time -- this can do multiple concurrent writes, which is strictly better i think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also loses the timely error return property.

I guess we could have a worker that spawns goroutines to do writes, and somehow bubble up the error to be checked in the next operation, but this gets more complicated than I'd like it to be.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think badger will handle write parallelism under the covers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hrm not sure -- I prefer to be explicitly parallel and also keep the ability to return an error at some writer so that we can abort.

On the same time, I don't feel terribly strongly about this and we could use a background worker who spawns writers and bubbles up the error in some way.

blockstore/splitstore/splitstore_compact.go Outdated Show resolved Hide resolved
@vyzo
Copy link
Contributor Author

vyzo commented Aug 3, 2021

Did a number of simplifications:

  • The MarkSetVisitor interface was widened to be the sum of ObjectVisitor and MakrSet, thus obviating the need for a cast in splitstore code.
  • Removed redundant writers state variable from badger markset.
  • Simplified and deduplicated the code for Has and Visit by introducing tryPending and tryDB helpers.
  • Added ".tmp" suffix to all transient dbs in the markset directory.

@vyzo
Copy link
Contributor Author

vyzo commented Aug 3, 2021

pushed forgotten commit.

@Stebalien
Copy link
Member

Stebalien commented Aug 3, 2021 via email

Copy link
Member

@Stebalien Stebalien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, most of this patch looks fine. But the locking is still a bit tricker than I'd like. I can live with everything except taking the lock in one function and dropping it in another. That's really the only blocking change.

blockstore/splitstore/markset.go Show resolved Hide resolved
blockstore/splitstore/markset_badger.go Outdated Show resolved Hide resolved
blockstore/splitstore/markset_badger.go Outdated Show resolved Hide resolved
blockstore/splitstore/markset_badger.go Outdated Show resolved Hide resolved
blockstore/splitstore/markset_badger.go Show resolved Hide resolved
@vyzo vyzo requested a review from Stebalien August 10, 2021 07:11
Copy link
Member

@Stebalien Stebalien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two non-blocking nits.

blockstore/splitstore/markset_badger.go Show resolved Hide resolved
blockstore/splitstore/markset_badger.go Show resolved Hide resolved
@vyzo vyzo force-pushed the fix/splitstore-memory-usage branch from cbe19e0 to eb0a62e Compare August 10, 2021 07:47
@vyzo vyzo enabled auto-merge August 10, 2021 07:47
@vyzo
Copy link
Contributor Author

vyzo commented Aug 10, 2021

rebased on master for merge, as it was a bit behind.

@vyzo vyzo merged commit c24d4e1 into master Aug 10, 2021
@vyzo vyzo deleted the fix/splitstore-memory-usage branch August 10, 2021 08:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team/ignite Issues and PRs being tracked by Team Ignite at Protocol Labs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants