"kernel activity hash": add some amount of swingset state into the cosmos AppHash #3442

Closed
warner opened this issue Jul 1, 2021 · 3 comments · Fixed by #3626
Assignees
warner

Labels
cosmic-swingset (package: cosmic-swingset) · enhancement (New feature or request) · SwingSet (package: SwingSet)

Comments


warner commented Jul 1, 2021

What is the Problem Being Solved?

In trying to debug #3428, @mhofman and I compared slog traces of two different validators to confirm that they were running the same cranks. He found variation in the vbank messages (#3435), and we also observed differences in the GC Actions sent into vats between validators which had or had not reloaded a vat from snapshot (due to #3428 causing different GC syscalls to be made slightly earlier).

We've been kind of on the fence about how much variation to allow in the swingsets running on different validators. Currently, the only swingset variations that will cause a validator to break consensus are outbound comms/IBC messages (written into the cosmos-sdk "swingset" Module's state) and messages sent to the vbank Module (also saved in its state). Any other divergence stays silent until one of those two values differs.

It's probably best to detect these kernel-level variations as quickly as possible. Ideally we'd store all consensus-sensitive kernel state in the cosmos DB where it can be rolled up into the AppHash along with everything else. We tried this early on, and found the performance was troublesome, but maybe we should consider trying it again.

Description of the Design

We may benefit from a partial approach, something like:

  • at the beginning of controller.step(), read a "previous kernel state hash" value from the cosmos DB
  • hash it and the next run-queue message together
  • store the result back to the DB

That would at least be sensitive to deviations in the deliveries that the kernel is performing. This would have caught the GC Action variations we observed.
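
As a minimal sketch of that partial approach (assuming a hypothetical string-to-string `kvStore` handle onto the cosmos DB, a hypothetical `updateKernelStateHash` helper, and JSON-serializable run-queue messages; not the actual kernel code):

```js
import { createHash } from 'crypto';

// Fold the previous kernel-state hash and the next run-queue message together,
// then store the result back so the next delivery chains onto it.
function updateKernelStateHash(kvStore, nextMessage) {
  const prev = kvStore.get('kernelStateHash') || ''; // empty at genesis
  const next = createHash('sha256')
    .update(prev)
    .update(JSON.stringify(nextMessage))
    .digest('hex');
  kvStore.set('kernelStateHash', next);
  return next;
}
```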

The next level would be to hash all the DB changes made during a block (the contents of the "block buffer") and fold them in as well. It wouldn't provide any insight into what the variance was, but it would detect it immediately. We've talked about excluding certain kernelDB keys from consensus (vat snapshot ID, transcript prefix offset); we'd need to implement that before adding the block buffer into the hash.

Consensus Considerations

This would be a consensus-breaking change. If we choose to implement it, we should do so before our first release.

Security Considerations

Most chain environments are entirely transparent, but if we anticipate some sort of hybrid model in which the swingset state is somehow concealed from the host-side application, we must consider the information leakage of this activity hash. This leakage can be prevented by including a full-strength random value (256 bits or more) into each hash computation, and not revealing this value to the host application.
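
For example (purely illustrative; `secret` and `blindedHash` are hypothetical names), the blinding could be an HMAC keyed by a random value that is never written into host-visible state:

```js
import { createHmac, randomBytes } from 'crypto';

const secret = randomBytes(32); // full-strength random value, never revealed to the host
const blindedHash = (previousHash, data) =>
  createHmac('sha256', secret).update(previousHash).update(data).digest('hex');
```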

Test Plan

A unit test that modifies the kernel between two replays of the same state (e.g. by swapping two items on the run-queue) and verifies that the two instances finish with different kernel activity hash values.
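
A drastically simplified stand-in for that test (assuming an AVA-style harness and a hypothetical `hashRunQueue` helper; the real test would drive two kernel instances):

```js
import { createHash } from 'crypto';
import test from 'ava';

// Fold run-queue entries into a digest in order, standing in for the kernel's
// activity hash for the purposes of this illustration.
const hashRunQueue = queue => {
  const h = createHash('sha256');
  for (const item of queue) {
    h.update(JSON.stringify(item));
  }
  return h.digest('hex');
};

test('swapping two run-queue items changes the hash', t => {
  const send = { type: 'send', target: 'ko21' };
  const notify = { type: 'notify', kpid: 'kp40' };
  t.not(hashRunQueue([send, notify]), hashRunQueue([notify, send]));
});
```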

warner added the enhancement (New feature or request), SwingSet (package: SwingSet), and cosmic-swingset (package: cosmic-swingset) labels Jul 1, 2021
warner added this to the Testnet: Metering Phase milestone Jul 14, 2021

warner commented Jul 14, 2021

I'm adding this to the Metering Phase, but since adding this feature will expose any existing non-determinism pretty quickly (and we're already finding some from the previous phase), it's really important that we do extensive multi-validator multi-restart testing before the start of that phase.


warner commented Jul 17, 2021

It occurred to me that we don't even need to do any hashing from the kernel side. Simply writing the block buffer into a single key of the cosmos DB at the end of each block should suffice, because it will be sampled immediately (it will change the AppHash for that same block).

That might consume more space in the state vector, but it will be replaced on the next block, so it won't accumulate (except for archival nodes, which record the full history of the state vector). Recording a hash of the current block buffer, instead of the full contents, would reduce that.
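
A rough sketch of that end-of-block step (assuming a hypothetical `blockBuffer` Map of pending key/value writes and a hypothetical host-side `appState` store; recording the digest rather than the raw contents keeps archival growth small):

```js
import { createHash } from 'crypto';

function recordBlockBuffer(appState, blockBuffer) {
  // Sort for a deterministic serialization, then hash the whole buffer.
  const entries = [...blockBuffer.entries()].sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  const digest = createHash('sha256').update(JSON.stringify(entries)).digest('hex');
  appState.save('swingset-block-buffer', digest); // replaced on every block
}
```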

warner added a commit that referenced this issue Jul 19, 2021
The mailbox device tracked the highest inbound message number and ack for
each peer, de-duplicating repeated messages so it could reduce the amount of
kernel activity. Each call to `deliverInbound` would return a boolean to
indicate whether the messages/ack were new, and thus whether the kernel
needed to be cycled.

However, the device was holding this tracking data in non-durable state, so
if/when the kernel was restarted, the state would be lost. A duplicate
message/ack arriving in the restarted process would trigger kernel activity
that would not have run in the original process. These extra cranks caused
divergence between validators when one of them was restarted and the client
sent a duplicate message (such as the pre-emptive `ack` all clients send at
startup). The extra crank does not get very far, because vattp does its own
deduplication, so the divergence was only visible in the slog. But when #3442
is implemented, even a single extra crank will flag the validator as out of
consensus.

The fix is to remove the mailbox device's dedup code, and rely upon vattp for
this function. The test was also updated to match, and a new test (comparing
two parallel kernels, one restarted, one not) was added.

closes #3471
warner self-assigned this Jul 23, 2021

warner commented Jul 30, 2021

My current idea for the design:

  • controller.js is responsible for building a SHA256 hash function (out of the Node.js stdlib crypto module) and passing it as a kernelEndowment
  • the kernel passes this into buildCrankBuffer
  • the crank buffer's commitCrank is updated to:
    • sort all keys of additions lexicographically
    • read a special crankhash key
    • initialize a SHA256 hasher with crankhash (as 64 hex characters)
    • for each addition, update the hasher with `add ${key} ${value}`
    • then do the kvStore.set as usual
    • then sort all keys of deletions lexicographically
    • then for each deletion, update the hasher with `delete ${key}`
    • then do the kvStore.delete as usual
    • then finish the hasher, retrieving the hashed output
    • then do kvStore.set('crankhash', crankhash) (as 64 hex characters)
  • the crankhash key will be stored as hex, since the kvStore is a string-to-string table
  • the crankBuffer, kernel, and controller APIs will be extended with a getCrankHash() method, which returns kvStore.get('crankhash') (as 64 hex characters)
  • in the host application, the end-of-block sequence should look like:

    await controller.run(runPolicy);
    swingstore.commit();
    const crankHash = controller.getCrankHash();
    appState.save('swingset-crankhash', crankHash);
    appState.commit();

In addition, the initializeSwingSet function (which initializes the database) should be updated to save an empty string into crankhash.
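
Condensing the list above into code, a sketch of the proposed commitCrank (assuming the crank buffer tracks its pending writes in an `additions` Map and its pending deletes in a `deletions` Set; the exact record separators are illustrative, not the final implementation):

```js
import { createHash } from 'crypto';

function commitCrank(kvStore, additions, deletions) {
  const hasher = createHash('sha256');
  hasher.update(kvStore.get('crankhash') || ''); // seed with the previous crankhash
  // Writes first, in lexicographic key order.
  for (const key of [...additions.keys()].sort()) {
    hasher.update(`add ${key} ${additions.get(key)}\n`);
    kvStore.set(key, additions.get(key));
  }
  // Then deletions, also sorted.
  for (const key of [...deletions].sort()) {
    hasher.update(`delete ${key}\n`);
    kvStore.delete(key);
  }
  const crankhash = hasher.digest('hex');
  kvStore.set('crankhash', crankhash);
  return crankhash;
}
```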

warner added a commit that referenced this issue Aug 9, 2021
The host application (which uses swingset as a library) is obligated to
provide a "hostStorage" object, which is a trio of kvStore, streamStore, and
snapStore. The host typically uses openLMDBSwingStore() to create this, and
holds onto the "commit block" facet.

Previously the kernelKeeper was built on top of the streamStore, the
snapStore, and an "enhancedCrankBuffer", which was the host-provided
key-value store plus an abortable/committable "crank buffer", plus some extra
convenience methods. The kernel held the abort/commit facet. The kernel was
responsible for extracting the raw kvStore from `hostStorage`, wrapping it,
and passing the enhanced version into `makeKernelKeeper` along with the other
two values.

Instead, we now give the entire "hostStorage" object to `makeKernelKeeper`,
and give the kernel keeper responsibility for doing the wrapping. When the
kernel wants to commit or abort, it calls `kernelKeeper.abortCrank()` or
`commitCrank()`.

This simplifies the `makeKernelKeeper` construction and will provide a place
for `commitCrank` to store a hash of kernel activity.

refs #3442
warner added a commit that referenced this issue Aug 9, 2021
The multiple members of a consensus machine are supposed to perform
identical execution of every crank. To detect any possible divergence as
quickly as possible, the kernel maintains the "crankhash": a
constantly-updated string which incorporates (by SHA256 hash) a copy of every
DB write and delete.

`controller.getCrankHash()` can be run after one or more cranks have
finished, and the resulting hex string can be e.g. stored in a host
application consensus state vector. If two members diverge in a way that
causes their swingset state to differ, they will have different crankhashes,
and the consensus state vectors will diverge. This should cause at least one
of them to fall out of consensus.

Some keys are excluded from consensus: currently just those involving vat
snapshots and the truncation (non-initial starting point) of the transcript.

refs #3442
warner added a commit that referenced this issue Aug 9, 2021
This 'crankhash' covers all kernel state changes, so if any two validators
diverge, their crankhashes should diverge too. By storing them into the
cosmos-sdk state vector, the AppHashes will diverge as well, causing at
least one of them to fall out of consensus. This should let us catch
divergence as soon as possible.

closes #3442
warner added a commit that referenced this issue Aug 10, 2021
This 'activityhash' covers all kernel state changes, so if any two validators
diverge, their activityhashes should diverge too. By storing them into the
cosmos-sdk state vector, the AppHashes will diverge as well, causing at
least one of them to fall out of consensus. This should let us catch
divergence as soon as possible.

closes #3442
warner added a commit that referenced this issue Aug 12, 2021
We're defining the `local.*` namespace as the portion of the DB that is
excluded from consensus (the #3442 kernel activity hash). Vat snapshots are a
local concern: each member of a consensus machine can decide for itself
if/when to make a snapshot. So both the vat->snapshot and the snapshot->vats
tables must be moved to `local.`.
warner added a commit that referenced this issue Aug 12, 2021
The kernelstats are merely incidentally deterministic: unlike c-lists and
run-queues and object/promise state, stats aren't the primary thing we want
to compare across replicas in a consensus machine. And we might want to add
more in the future without causing a compatibility break. So this changes the
DB key from `kernelStats` to `local.kernelStats`, which will exclude them
from the #3442 kernel activity hash.
warner added a commit that referenced this issue Aug 12, 2021
The multiple members of a consensus machine are supposed to perform identical
execution of every crank. To detect any possible divergence as quickly as
possible, the kernel maintains the "activityhash": a constantly-updated
string which incorporates (by SHA256 hash) a copy of every DB write and
delete.

Each crank's DB writes/deletes are accumulated into the "crankhash". If the
crank is committed, its crankhash is hashed into the old activityhash to form
the new "activityhash".

`controller.getActivityhash()` can be run after one or more cranks have
finished, and the resulting hex string can be e.g. stored in a host
application consensus state vector. If two members diverge in a way that
causes their swingset state to differ, they will have different
activityhashes, and the consensus state vectors will diverge. This should
cause at least one of them to fall out of consensus.

Some keys are excluded from consensus: currently just those involving vat
snapshots and the truncation (non-initial starting point) of the transcript.

refs #3442
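
Conceptually, the relationship between the two hashes looks like this (an illustrative sketch, not the literal kernel code):

```js
import { createHash } from 'crypto';

// Each committed crank folds its crankhash into the running activityhash.
const foldCrankIntoActivity = (activityhash, crankhash) =>
  createHash('sha256').update(activityhash).update(crankhash).digest('hex');
```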