feat: stacks signer able to save multiple dkg shares and load it where appropriate #4704
Conversation
Everything LGTM! There are a few instances where it would be helpful to pluralize function names that now return states instead of one state.
Good point. I've updated.
Just have a few questions
The structure of this code is giving me a lot of pause. It seems that there are several places where the signer state can just be willy-nilly reset from saved state, but without any consideration of how it will affect the execution of the signer at that point in the reload. I fear this will lead to hard-to-reproduce bugs and state corruption, because it is hard to see in advance what the downstream consequences can be when changing this state mid-way through a main loop pass.
Instead, can this code be factored to make a state reload happen only at the top-level of the main loop? A reload should cancel any in-progress activities and force the signer to restart whatever it was doing anew.
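For illustration, a minimal sketch of the kind of top-of-loop reload being suggested here (hypothetical types and event names, not the actual stacks-signer code): a reload is only honored at the start of a pass, and it drops any in-progress work.

```rust
use std::collections::VecDeque;

struct SignerState {
    in_progress_round: Option<u64>,
}

enum Event {
    ReloadRequested,
    Dkg(u64),
}

struct Signer {
    state: SignerState,
    events: VecDeque<Event>,
}

impl Signer {
    fn load_from_storage() -> SignerState {
        // Placeholder for reading persisted shares/state from disk or StackerDB.
        SignerState { in_progress_round: None }
    }

    fn run_one_pass(&mut self) {
        // State reloads are only honored here, at the top of a pass, never mid-pass.
        if matches!(self.events.front(), Some(Event::ReloadRequested)) {
            self.events.pop_front();
            // Cancel anything in flight and replace the state wholesale.
            self.state = Self::load_from_storage();
            return; // the next pass starts from the freshly loaded state
        }
        while let Some(event) = self.events.pop_front() {
            match event {
                Event::Dkg(round) => self.state.in_progress_round = Some(round),
                Event::ReloadRequested => {
                    // Defer the reload to the top of the next pass.
                    self.events.push_front(Event::ReloadRequested);
                    break;
                }
            }
        }
    }
}

fn main() {
    let mut signer = Signer {
        state: Signer::load_from_storage(),
        events: VecDeque::from([Event::Dkg(1), Event::ReloadRequested, Event::Dkg(2)]),
    };
    signer.run_one_pass(); // handles Dkg(1), then defers the reload
    signer.run_one_pass(); // performs the reload before touching Dkg(2)
    println!("in-progress round after reload: {:?}", signer.state.in_progress_round);
}
```

The point of the structure is that the rest of the pass never observes a state swap partway through.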
I'm not too happy about the current structure either. While I have attempted to be careful with how and when we load state from the signer, I do share your concern. We've already seen plenty of hard-to-reproduce bugs in the signer so far, and I definitely don't want to add to them, especially since the intention of this change is to make the signer more robust against bugs caused by network delays and timing issues. I don't fully understand right now how exactly we'd be able to make the state reloads happen only at the top level of the main loop (assuming you mean within the signer's main run loop).
I've updated it now so that state reloads only happen in one place. This feels much cleaner than the previous solution. Thank you for bringing this up @jcnelson. Let me know if you have further opinions on how to structure this. For example, if you want state reloads to be explicit in the main loop, I can do some further refactors to achieve this, but I think the solution we have now fits better with the current structure of the signer.
stacks-signer/src/signer.rs (Outdated)

    match self.load_saved_state_for_aggregate_key(approved_aggregate_key) {
        Ok(()) => self.send_dkg_results(&approved_aggregate_key),
        Err(e) => warn!(
            "{self}: Failed to load saved state for key {approved_aggregate_key}: {e}"
        ),
    }
I'm still a little confused by the logic here. update_approved_aggregate_key is invoked to refresh the signer's view of the approved aggregate key. Why should that trigger a load from saved state? Shouldn't the signer's state already be loaded into memory at this point? Why doesn't the signer just load state into memory during process startup (as I think was the general idea behind Jude's suggestion)?
The general logic seems like it should be:
    On boot:
        self.refresh_approved_aggregate_key();
        if self.approved_aggregate_public_key.is_some():
            self.signer_state = self.load_saved_states()[self.aggregate_key];
            if self.signer_state.pending_dkg_results.is_some():
                send_dkg_results();
                self.signer_state.pending_dkg_results.clear();
                self.overwrite_saved_states()

    On refresh aggregate public key:
        if self.approved_aggregate_public_key is changed:
            if self.signer_state.pending_dkg_results.is_some():
                send_dkg_results();
                self.signer_state.pending_dkg_results.clear();
                self.overwrite_saved_states()
That way, the signer sends a DkgResults message after the aggregate key is approved if it's running at the time, or, if it wakes up after the aggregate key has been approved, it sends the message then.
This avoids excessively reloading the signer states, and it avoids potentially resending the DkgResults message every time a state reload occurs.
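A rough Rust rendering of that boot/refresh flow, using hypothetical names rather than the actual stacks-signer types, to make the flushing of pending DkgResults explicit:

```rust
use std::collections::HashMap;

type AggregateKey = [u8; 33];

#[derive(Default, Clone)]
struct SignerState {
    // Serialized DkgResults payload that still needs to be broadcast.
    pending_dkg_results: Option<Vec<u8>>,
}

struct Signer {
    approved_aggregate_public_key: Option<AggregateKey>,
    signer_state: SignerState,
    saved_states: HashMap<AggregateKey, SignerState>,
}

impl Signer {
    fn send_dkg_results(&self, payload: &[u8]) {
        println!("broadcasting DkgResults ({} bytes)", payload.len());
    }

    fn flush_pending_dkg_results(&mut self) {
        if let Some(payload) = self.signer_state.pending_dkg_results.take() {
            self.send_dkg_results(&payload);
            // Overwrite the saved state so a restart does not resend the results.
            if let Some(key) = self.approved_aggregate_public_key {
                self.saved_states.insert(key, self.signer_state.clone());
            }
        }
    }

    fn on_boot(&mut self) {
        // refresh_approved_aggregate_key() would have run just before this point.
        if let Some(key) = self.approved_aggregate_public_key {
            if let Some(state) = self.saved_states.get(&key) {
                self.signer_state = state.clone();
            }
            self.flush_pending_dkg_results();
        }
    }

    fn on_refresh_aggregate_key(&mut self, new_key: Option<AggregateKey>) {
        if new_key != self.approved_aggregate_public_key {
            self.approved_aggregate_public_key = new_key;
            self.flush_pending_dkg_results();
        }
    }
}

fn main() {
    let mut signer = Signer {
        approved_aggregate_public_key: None,
        signer_state: SignerState { pending_dkg_results: Some(vec![0u8; 32]) },
        saved_states: HashMap::new(),
    };
    signer.on_boot(); // no approved key yet, nothing is sent
    signer.on_refresh_aggregate_key(Some([1u8; 33])); // key approved, pending results go out
}
```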
Hmm yeah I think we could do it that way. Two thoughts on the logic:
- On boot the signer won't have any pending DKG results, so we don't have to send DKG results at that point.
- State writes now happen as soon as DKG is finished (i.e. a DkgEnd(_) message is received), which I think makes sense because you want to persist the shares as soon as you have confirmation that you have been able to create an aggregate key.

With these, I think we can simplify the logic to:
    On boot:
        self.refresh_approved_aggregate_key();
        if self.approved_aggregate_public_key.is_some():
            self.signer_state = self.load_saved_states()[self.aggregate_key];

    On refresh aggregate public key:
        if self.approved_aggregate_public_key is changed:
            if self.signer_state.pending_dkg_results.is_some():
                send_dkg_results();
                self.signer_state.pending_dkg_results.clear();

    On DkgEnd received:
        self.save_signer_state()
What do you think about this?
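For the DkgEnd part, a small sketch of what that persistence hook could look like (hypothetical message and signer types, not the real wsts/stacks-signer API):

```rust
enum SignerMessage {
    DkgEnd { success: bool },
    Other,
}

struct Signer {
    serialized_state: Vec<u8>,
}

impl Signer {
    fn save_signer_state(&self) {
        // In the real signer this would write to StackerDB or local storage.
        println!("persisting {} bytes of signer state", self.serialized_state.len());
    }

    fn handle_message(&mut self, msg: SignerMessage) {
        match msg {
            // Persist the shares the moment DKG is confirmed finished, so a crash
            // between DkgEnd and key approval cannot lose them.
            SignerMessage::DkgEnd { success: true } => self.save_signer_state(),
            _ => {}
        }
    }
}

fn main() {
    let mut signer = Signer { serialized_state: vec![0u8; 64] };
    signer.handle_message(SignerMessage::DkgEnd { success: true });
    signer.handle_message(SignerMessage::Other);
}
```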
> Why should that trigger a load from saved state? Shouldn't the signer's state already be loaded into memory at this point?

It probably should be, but the rationale behind the current logic is to ensure that the saved state is always consistent with the approved aggregate key. If we observe a change in the approved aggregate key for any reason, a natural reaction is to reload the state saved for the changed aggregate key.
I think it's possible to have pending DKG results on boot, because the signer may have gone offline between DkgEnd and the key being approved by the network. I think we should still be persisting these states after pending_dkg_results is cleared as well, so I think my suggested logic is about as simplified as it can get. I do agree that we should be persisting on DkgEnd as well.
Actually, a problem with the proposed logic is that if you boot the signer and an approved aggregate key is not yet set, you would never load any persisted state. If you are unlucky with the timing, your signer might boot just before the last vote for an aggregate key arrives and sets that key in the contract.
From that perspective, I think it is safer to load the saved state when you observe a change in the aggregate key.
To prevent excessive state reloads, I could add a condition to only load the saved state when the approved aggregate key differs from the one already present in the state.
So I'd then propose considering something that looks like this:
    On boot:
        Start without any approved_aggregate_key or loaded signer state

    On refresh aggregate public key:
        if self.approved_aggregate_public_key is changed:
            if the new approved_aggregate_public_key is not the same as the one in our signer state:
                self.load_saved_state(approved_aggregate_key)
            if self.signer_state.pending_dkg_results.is_some():
                send_dkg_results();
                self.signer_state.pending_dkg_results.clear();

    On DkgEnd received:
        self.save_signer_state()
This would avoid excessive state reloads while still ensuring that the signer doesn't miss loading the state for a particular aggregate key.
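A minimal sketch of that guard, with hypothetical fields and helpers rather than the PR's actual code: the reload only fires when the observed key differs from the key the in-memory state was loaded for.

```rust
type AggregateKey = [u8; 33];

struct SignerState {
    // The aggregate key these shares/results were loaded for, if any.
    loaded_for_key: Option<AggregateKey>,
    pending_dkg_results: Option<Vec<u8>>,
}

struct Signer {
    approved_aggregate_public_key: Option<AggregateKey>,
    signer_state: SignerState,
}

impl Signer {
    fn load_saved_state(&mut self, key: AggregateKey) {
        // Placeholder: the real implementation would deserialize the stored shares.
        self.signer_state.loaded_for_key = Some(key);
    }

    fn send_dkg_results(&self, payload: &[u8]) {
        println!("broadcasting DkgResults ({} bytes)", payload.len());
    }

    fn on_refresh_approved_aggregate_key(&mut self, observed: Option<AggregateKey>) {
        if observed == self.approved_aggregate_public_key {
            return; // nothing changed, no reload and no resend
        }
        self.approved_aggregate_public_key = observed;
        if let Some(key) = observed {
            // Only reload when the state in memory belongs to a different key.
            if self.signer_state.loaded_for_key != Some(key) {
                self.load_saved_state(key);
            }
            if let Some(results) = self.signer_state.pending_dkg_results.take() {
                self.send_dkg_results(&results);
            }
        }
    }
}

fn main() {
    let mut signer = Signer {
        approved_aggregate_public_key: None,
        signer_state: SignerState { loaded_for_key: None, pending_dkg_results: Some(vec![1, 2, 3]) },
    };
    signer.on_refresh_approved_aggregate_key(Some([7u8; 33]));
    signer.on_refresh_approved_aggregate_key(Some([7u8; 33])); // no-op: key unchanged
}
```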
The way we've addressed this in the past (albeit in very different contexts) is to have two structs: a "preload" struct and a "loaded" struct which is instantiated from a "preload" struct and some additional state. You'd have an UninitializedSigner struct, which represents a Signer that does not yet have its state loaded. Its impl would have a conversion method like into_signer(state), which fully instantiates the Signer from the loaded state once it becomes available.
Then, you'd make it impossible to accidentally have a partially-instantiated or corrupted Signer used somewhere. Given the importance of the signer to chain liveness, I think it's worth the effort to make invalid Signer states unrepresentable here.
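A compact sketch of that two-struct pattern. UninitializedSigner and into_signer are the names from this discussion; everything else is illustrative, not the actual stacks-signer types.

```rust
struct SavedState {
    aggregate_key: [u8; 33],
    shares: Vec<u8>,
}

/// A signer whose persisted state has not been loaded yet: it can watch for the
/// approved aggregate key, but it cannot sign or participate in rounds.
struct UninitializedSigner {
    signer_id: u32,
}

/// A fully instantiated signer; the only way to build one is via into_signer,
/// so a partially-loaded Signer cannot exist.
struct Signer {
    signer_id: u32,
    state: SavedState,
}

impl UninitializedSigner {
    fn into_signer(self, state: SavedState) -> Signer {
        Signer { signer_id: self.signer_id, state }
    }
}

fn main() {
    let pre = UninitializedSigner { signer_id: 0 };
    let state = SavedState { aggregate_key: [0u8; 33], shares: vec![0u8; 32] };
    let signer = pre.into_signer(state);
    println!(
        "signer {} loaded {} share bytes for key starting with {:#04x}",
        signer.signer_id,
        signer.state.shares.len(),
        signer.state.aggregate_key[0]
    );
}
```

The compiler then enforces the boot order: code that needs a Signer simply cannot be reached before the state load happens.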
I agree that it makes a lot of sense to differentiate between an UninitializedSigner and a Signer instead of having the current State field within the signer. It's not clear to me, though, what the relation is between an initialized/uninitialized signer and which DKG shares the signer should have loaded. As far as I know, a Signer may still be initialized without having a known approved aggregate key or any DKG shares.
Closing this PR due to us going in a different direction with the signer.
This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Description
This PR extends the signer persistence to allow for storing shares for multiple concurrent DKG rounds, identified by the aggregate public key for that round. On startup, the signer looks up the approved aggregate key in the contract and loads the corresponding shares.
To ensure the Nakamoto sign coordinator is able to pick up loaded DKG shares, the signers broadcast DkgResults messages containing their polynomial commitments when shares are loaded. The sign coordinator, which previously assumed that any DkgResults message contained the complete set of polynomial commitments, has been extended to also accept individual polynomial commitments in these messages, which it then aggregates across all signers to form the complete set of commitments.

Applicable issues
Additional info
To keep the size stored in StackerDB bounded, we limit the implementation to holding only two sets of shares concurrently, which is enough to handle the scenario described in #4654.
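As a rough illustration of that bound (hypothetical structure and names, not the PR's actual storage format), a store capped at two share sets could evict the oldest set when a third arrives:

```rust
use std::collections::VecDeque;

type AggregateKey = [u8; 33];

struct DkgShares {
    aggregate_key: AggregateKey,
    serialized_state: Vec<u8>,
}

/// Holds at most MAX_SAVED_STATES share sets, newest last.
struct SavedShareStore {
    entries: VecDeque<DkgShares>,
}

const MAX_SAVED_STATES: usize = 2;

impl SavedShareStore {
    fn new() -> Self {
        Self { entries: VecDeque::new() }
    }

    fn insert(&mut self, shares: DkgShares) {
        // Replace an existing entry for the same key, otherwise append.
        self.entries.retain(|e| e.aggregate_key != shares.aggregate_key);
        self.entries.push_back(shares);
        while self.entries.len() > MAX_SAVED_STATES {
            self.entries.pop_front(); // drop the oldest round's shares
        }
    }

    fn get(&self, key: &AggregateKey) -> Option<&DkgShares> {
        self.entries.iter().find(|e| &e.aggregate_key == key)
    }
}

fn main() {
    let mut store = SavedShareStore::new();
    for round in 0u8..3 {
        store.insert(DkgShares { aggregate_key: [round; 33], serialized_state: vec![round] });
    }
    // Only the two most recent rounds survive.
    assert!(store.get(&[0u8; 33]).is_none());
    assert!(store.get(&[2u8; 33]).is_some());
    println!(
        "latest stored round has {} byte(s)",
        store.get(&[2u8; 33]).map(|s| s.serialized_state.len()).unwrap_or(0)
    );
}
```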