
vatWarehouse transcript replay must commit changes, and coordinate with crank success/fail rewind #2422

Closed
warner opened this issue Feb 13, 2021 · 2 comments


warner commented Feb 13, 2021

What is the Problem Being Solved?

I wanted to capture an issue with the new #2277 "Vat Warehouse" that came up in conversation with @dckc today. It won't be a problem right now, but there are a couple of different lines of development that are destined to collide in an interesting way.

  • when the VatWarehouse is asked to deliver a message to a vat, it might need to load that vat into memory (if it wasn't already resident)
  • when doing that load, it must replay some number of transcript entries (all of them, unless the vat was loaded from a snapshot, in which case it only needs to replay the entries that occurred after the last snapshot was taken)
  • at present, transcript replay does not cause any kernel state changes: all syscalls are ignored (well, compared against the transcript for consistency, then ignored)
  • however, one path for non-deterministic GC #1872 is to do extra work at the end of each crank to gather FinalizationRegistry callbacks and report the newly-released imports to the kernel with some sort of syscall.drop()
    • on consensus-hosted swingsets (i.e. on-chain), this is required to behave deterministically. We plan to achieve this by running all vats on XS-based workers, instructing the XS engine to perform a full GC sweep at the end of each crank, and giving the supervisor+liveslots time to run (after the vat code has gone quiescent) to let the finalizers fire and send the resulting syscall.drops back up to the kernel
    • on non-consensus swingsets, if we're using local workers, this might not be deterministic: the finalizers may fire earlier or later during the initial execution of a transcript than they do on subsequent replays
    • if a drop happens during replay for a vref that is not in the c-list, we ignore it: we observed and processed this one in the previous run
    • but if the vref is still in the c-list, we must collect it and process the resulting refcount work (see the sketch after this list), because it's not going to happen again: if we ignore it during replay, we'll never release the object
    • the refcount work will make state changes: some collection of c-list removals, refcount decrements, deletions of kernel object/promise table entries, and "hey exporting vat, we don't need this anymore" run-queue work items
      • (we might not do the full refcount work right away, but we'll record at least enough information to let it happen later)
  • so the act of "paging in" a vat will potentially cause state changes
  • these changes need to be committed independently of the crank which provoked the paging-in
    • if that crank fails, any syscalls made during the crank get rolled back, leaving the saved state of the vat coherent so it can be reloaded if/when we attempt the crank again
    • the changes that get committed during page-in should not also get rolled back
    • in particular, the cycle begins with a run-queue item being popped off the queue. Our current "crank failed, abort the txn and record vat-termination handling instead" code assumes that this pop gets unwound as well, and it re-pops the item (so we won't try it again, after the vat is deleted). The two sides of the delivery need to coordinate correctly to make sure only one item is popped from the queue.
    • likewise the crankNum is a database entry, which we increment just before (or after? on which side of the circular fencepost do you stand?) the crank is processed, and we should increment it exactly once, no matter how the crank fares
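
A minimal sketch of the replay-time drop filtering described in the list above, assuming hypothetical kernel-side helpers (`lookupOutbound`, `deleteEntry`, `decrementRefCount`, and `enqueueRefcountWork` are illustrative names, not the actual SwingSet API):

```js
// Hypothetical sketch: handling a syscall.drop that arrives while a vat's
// transcript is being replayed during page-in. Names are illustrative.
function onReplaySyscall(vatID, vso, kernelState) {
  if (vso.type !== 'drop') {
    // Ordinary syscalls are compared against the transcript for consistency
    // and then ignored during replay.
    return;
  }
  const kref = kernelState.clist.lookupOutbound(vatID, vso.vref);
  if (kref === undefined) {
    // The drop was already observed and processed on the original run; the
    // c-list entry is gone, so there is nothing left to do.
    return;
  }
  // The vref is still in the c-list: this drop will never be reported again,
  // so it must take effect now, even though we are "only" replaying.
  kernelState.clist.deleteEntry(vatID, vso.vref, kref);
  kernelState.decrementRefCount(kref);
  // The resulting refcount work (kernel object/promise deletion, notifying
  // the exporting vat) can be recorded now and performed later.
  kernelState.enqueueRefcountWork(kref);
}
```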

Now that I write it up, I see that maybe it wouldn't be fatal to unwind the page-in syscall.drop consequences (in response to a failed crank) too: if the vat is then terminated, we'll be dropping all its imports, so any refcounting consequences will be repeated as part of the vat-termination logic. If we allow the vat to live (so that we might attempt the delivery again in the future, e.g. with a different Meter), then when we re-load it, we'll have another opportunity to observe the drop.

But in general, we need to be very deliberate about the transaction windows, and be clear about what state changes are and are not subject to rewind when a crank fails.
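
As a rough illustration of that separation of commit windows, here is a minimal sketch, assuming hypothetical kernel hooks (`ensureVatOnline`, `commitCrankBuffer`, `abortCrankBuffer`, `recordVatTermination`, and `deliverToVat` are illustrative, not the real kernel API):

```js
// Hypothetical sketch: keep the page-in state changes in their own commit
// window, so a later crank failure cannot unwind them. Names are illustrative.
async function runOneCrank(kernel, runQueueItem) {
  const { vatID } = runQueueItem;

  // Page-in: may replay transcript entries and observe drops that mutate
  // kernel state. Commit that work independently of the crank below.
  await kernel.vatWarehouse.ensureVatOnline(vatID);
  kernel.commitCrankBuffer();

  // crankNum must advance exactly once per crank, no matter how it fares.
  kernel.incrementCrankNum();
  try {
    await kernel.deliverToVat(vatID, runQueueItem);
    kernel.commitCrankBuffer(); // crank succeeded: keep its syscall effects
  } catch (err) {
    kernel.abortCrankBuffer(); // crank failed: unwind the vat's syscalls only
    kernel.recordVatTermination(vatID, err); // the run-queue item stays consumed
    kernel.commitCrankBuffer();
  }
}
```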

@warner warner added enhancement New feature or request SwingSet package: SwingSet labels Feb 13, 2021
@dckc dckc added this to the Testnet: Stress Test Phase milestone Apr 15, 2021

dckc commented Jun 9, 2021

@warner notes complications around evict in #2784 (review)

warner added a commit that referenced this issue Jun 26, 2021
This enhances SwingSet to have a "Vat Warehouse" which limits the number of
"paged-in" vats to some maximum (currently 50). The idea is to conserve
system RAM by allowing idle vats to remain "paged-out", which consumes only
space on disk, until someone sends a message to them. The vat is then paged
in, by creating a new xsnap process and reloading the necessary vat state.

This reload process is greatly accelerated by loading a heap snapshot, if one
is available. We only need to replay the suffix of the transcript that was
recorded after the snapshot was taken, rather than the full (huge)
transcript. Heap snapshots are stored in a new swingstore component named the
"stream store".

For each vat, the warehouse saves a heap snapshot after a configurable number
of deliveries (default 200). In addition, it saves an initial snapshot after
just a few deliveries (default 2), because all contract vats start out with
a large delivery that provides the contract bundle to evaluate. By taking a
snapshot quickly, we can avoid the time needed to re-evaluate that large
bundle on almost all process restarts. This algorithm is a best guess: we'll
refine it as we gather more data about the tradeoff between work now (the
time it takes to create and write a snapshot), the storage space consumed by
those snapshots, and work later (replaying more transcript).
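
A minimal sketch of that scheduling decision, using the `snapshotInitial` and `snapshotInterval` defaults described above (the `shouldSnapshot` function itself is illustrative, not the actual vatWarehouse code):

```js
// Hypothetical sketch of the "snapshot early, then infrequently" heuristic.
function shouldSnapshot(totalDeliveries, deliveriesSinceSnapshot,
                        snapshotInitial = 2, snapshotInterval = 200) {
  // Snapshot early so the large "evaluate the contract bundle" delivery
  // does not have to be replayed on every process restart.
  if (totalDeliveries === snapshotInitial) {
    return true;
  }
  // Afterwards, snapshot periodically to bound the transcript suffix that
  // must be replayed when the vat is next paged in.
  return deliveriesSinceSnapshot >= snapshotInterval;
}
```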

We're estimating that a typical contract snapshot consumes about 300kB
(compressed).

closes #2273
closes #2277
refs #2422
refs #2138 (might close it)


* refactor(replay): hoist handle declaration

* chore(xsnap): clarify names of snapStore temp files for debugging

* feat(swingset): initializeSwingset snapshots XS supervisor

 - solo: add xsnap, tmp dependencies
 - cosmic-swingset: declare dependencies on xsnap, tmp
 - snapshotSupervisor()
 - vk.saveSnapshot(), vk.getLastSnapshot()
   - test: mock vatKeeper needs getLastSnapshot()
 - test(snapstore): update snapshot hash
 - makeSnapstore in solo, cosmic-swingset
   - chore(solo): create xs-snapshots directory
 - more getVatKeeper -> provideVatKeeper
 - startPos arg for replayTranscript()
 - typecheck shows vatAdminRootKref could be missing
 - test pre-SES snapshot size
   - hoist snapSize to test title
   - clarify SES vs. pre-SES XS workers
   - factor bootWorker out of bootSESWorker
   - hoist Kb, relativeSize for sharing between tests

misc:
 - WIP: restore from snapshot
 - hard-code remote style

fix(swingset): don't leak xs-worker in initializeSwingset

When taking a snapshot of the supervisor in initializeSwingset, we
neglected to `.close()` it.

Lack of a name hindered diagnosis, so let's fix that while we're at
it.

* feat(swingset): save snapshot periodically after deliveries

 - vk.saveSnapShot() handles snapshotInterval
   - annotate type of kvStore in makeVatKeeper
   - move getLastSnapshot up for earlier use
   - refactor: rename snapshotDetail to lastSnapshot
   - factor out getTranscriptEnd
 - vatWarehouse.maybeSaveSnapshot()
 - saveSnapshot:
   - don't require snapStore
   - fix startPos type
 - provide snapstore to vatKeeper via kernelKeeper
 - buildKernel: get snapstore out of hostStorage
 - chore: don't try to snapshot a terminated vat

* feat(swingset): load vats from snapshots

 - don't `setBundle` when loading from snapshot
 - provide startPos to replayTranscript()
 - test reloading a vat

* refactor(vatWarehouse): factor out, test LRU logic

* fix(vat-warehouse): remove vatID from LRU when evicting

* chore(vatKeeper): prune debug logging in saveSnapshot (FIXUP)

* feat(swingset): log bringing vats online (esp from snapshot)

 - manager.replayTranscript returns number of entries replayed

* chore: resolve "skip crank buffering?" issue

after discussion with CM:
maybeSaveSnapshot() happens before commitCrank()
so nothing special needed here
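
A minimal sketch of that ordering, with illustrative function names (`deliverToVat` and `processDelivery` are assumptions; `maybeSaveSnapshot` and `commitCrank` are named elsewhere in this change):

```js
// Hypothetical sketch: because maybeSaveSnapshot() runs before commitCrank(),
// the snapshot bookkeeping lands in the same commit as the delivery itself,
// so no extra crank-buffer handling is needed.
async function processDelivery(kernel, vatID, delivery) {
  await kernel.vatWarehouse.deliverToVat(vatID, delivery);
  await kernel.vatWarehouse.maybeSaveSnapshot(vatID); // may update lastSnapshot
  kernel.commitCrank(); // delivery effects and snapshot metadata commit together
}
```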

* chore: prune makeSnapshot arg from evict()

Not only is this option not implemented now, but CM's analysis shows
that adding it would likely be harmful.

* test(swingset): teardown snap-store

* chore(swingset): initial sketch of snapshot reload test

* refactor: let itemCount be not-optional in StreamPosition

* feat: snapshot early then infrequently

  - refactor: move snapshot decision from vk.saveSnapshot() up
    to vw.maybeSaveSnapshot

* test: provide getLastSnapshot to mock vatKeeper

* chore: vattp: turn off managerType local work-around

* chore: vat-warehouse: initial snapshot after 2 deliveries

integration testing shows this is closer to ideal

* chore: prune deterministic snapshot assertion

oops. rebase problem.

* chore: fix test-snapstore ld.asset

rebase / merge problem?!

* chore: never mind supervisorHash optimization

With snapshotInitial at 2, there is little reason to snapshot after
loading the supervisor bundles. The code doesn't carry its own weight.

Plus, it seems to introduce a strange bug with marshal or something...

```
  test/test-home.js:37

   36:   const { board } = E.get(home);
   37:   await t.throwsAsync(
   38:     () => E(board).getValue('148'),

  getting a value for a fake id throws

  Returned promise rejected with unexpected exception:

  Error {
    message: 'Remotable (a string) is already frozen',
  }
```

* docs(swingset): document lastSnapshot kernel DB key

* refactor: capitalize makeSnapStore consistently

* refactor: replayTranscript caller is responsible to getLastSnapshot

* test(swingset): consistent vat-warehouse test naming

* refactor(swingset): compute transcriptSnapshotStats in vatKeeper

In an attempt to avoid reading the lastSnapshot DB key if the
t.endPosition key was enough information to decide to take a snapshot,
the vatWarehouse was peeking into the vatKeeper's business.  Let's go
with code clarity over (un-measured) performance.

* chore: use harden, not freeze; clarify lru

* chore: use distinct fixture directories to avoid collision

The "temporary" snapstore directories used by two different tests began to
overlap when the tests were moved into the same parent dir, and one test was
deleting the directory while the other was still using it (as well as
mingling files at runtime), causing an xsnap process to die with an IO error
if the test were run in parallel.

This changes the two tests to use distinct directories. In the long run,
we should either have them use `mktmp` to build a randomly-named known-unique
directory, or establish a convention where tempdir names match the name of
the test file and case using them, to avoid collisions as we add more tests.

Co-authored-by: Brian Warner <[email protected]>

dckc commented Jun 26, 2021

@warner is this addressed by #2848 for the same reasons #3279 is?

@dckc dckc closed this as completed Jun 30, 2021