Skip to content
This repository has been archived by the owner on Oct 22, 2024. It is now read-only.

Reject finalized updates without a sync committee in next store period #145

Merged
merged 11 commits into from
May 16, 2024

Conversation

claravanstaden
Copy link
Collaborator

@claravanstaden claravanstaden commented May 14, 2024

This change corrects the bug where importing a finalized header in the next sync committee period, without the next sync committee, prevents any subsequent sync committee updates.

For illustration purposes:

Light client period: 1
Import header in period 2 (because next sync committee is already known)
Light client period should now be 2, but it is still 1.
Current and next sync committee now references the wrong periods, and every subsequent sync committee imports fails because of this check: https://github.com/Snowfork/polkadot-sdk/blob/snowbridge/bridges/snowbridge/pallets/ethereum-client/src/lib.rs#L384

More Details

A detailed description of the issue here: https://www.notion.so/snowfork/Prod-Investigation-45a74b16488d44249de26f149dbc3ad1

Resolves: SNO-981

@claravanstaden claravanstaden marked this pull request as ready for review May 14, 2024 13:25
Comment on lines 465 to 466
<CurrentSyncCommittee<T>>::set(<NextSyncCommittee<T>>::get());
<NextSyncCommittee<T>>::kill();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: slightly more readable

Suggested change
<CurrentSyncCommittee<T>>::set(<NextSyncCommittee<T>>::get());
<NextSyncCommittee<T>>::kill();
CurrentSyncCommittee::<T>::set(NextSyncCommittee::<T>::take());

Copy link

@yrong yrong May 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I maybe wrong but would prefer not do this(i.e. <NextSyncCommittee<T>>::kill()), my concern here is that reset NextSyncCommittee is expensive which we should avoid unless abosolutely nessessary, and we only switch SyncCommittee for sync_committee_update(i.e. hand-over update).

Is there a way to only change the logic on relayer side to ensure signature_period == update_finalized_period == store_period for non-handover update?

Be more specific, code here https://github.com/Snowfork/snowbridge/blob/18fe233041e34c3b14f8a23d7c07ef1b4df706ca/relayer/relays/beacon/header/header.go#L251 can we find a proper update(non-handover in 1086) to syncInterimFinalizedUpdate with?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I maybe wrong but would prefer not do this(i.e. <NextSyncCommittee>::kill()), my concern here is that reset NextSyncCommittee is expensive which we should avoid unless abosolutely nessessary.

It is absolutely necessary, otherwise the light client is in an incorrect state. For example, now CurrentSyncCommittee = period 1087 and NextSyncCommittee = period 1087, which is fundamentally incorrect and could open up weird code paths that we do not expect. It could open us up to exploits, perhaps a header that was not signed by the validator set will now be verified successfully.

Also, if we don't clear NextSyncCommittee, the next sync committee import will fail because of this check: https://github.com/Snowfork/polkadot-sdk/blob/snowbridge/bridges/snowbridge/pallets/ethereum-client/src/lib.rs#L384 This is basically what happened in prod.

NextSyncCommittee is expensive

This can only happen once every sync committee period, in any case, right?

Is there a way to only change the logic on relayer side to ensure signature_period == update_finalized_period == store_period for non-handover update?

Our relayers are permissionless, so we have to do an on-chain fix for this. Otherwise anyone else could exploit this bug in our light client and send an update for store_period + 1 and brick the bridge.

The light client allows sending an update in the next period (update for store_period + 1) here: https://github.com/Snowfork/polkadot-sdk/blob/snowbridge/bridges/snowbridge/pallets/ethereum-client/src/lib.rs#L404 So I think the light client spec supports this and so should we.

Copy link

@yrong yrong May 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I understand the light client is in an incorrect state we need to fix with a runtime upgrade.

What I'm trying to fix is add an else branch at verification phase to check signature_period == update_finalized_period == store_period for non-handover update.

Because I can't find a code path to switch current_sync_committee for non-handover update in the spec https://github.com/ethereum/consensus-specs/blob/dev/specs/altair/light-client/sync-protocol.md#apply_light_client_update so I'm a bit hesitate to do that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, fair enough, that's a good point.

What I'm trying to fix is add an else branch at verification phase to check signature_period == update_finalized_period == store_period for non-handover update.

What do you think the else here should do? Something like

else if update_finalized_period == store_period + 1 {
	return Error::<T>::SkippedSyncCommitteePeriod;
}

Because I can't find a code path to switch current_sync_committee for non-handover update in the spec

Me neither, but that's because the consensus spec always has a sync committee in the update, right? Imo this allows headers to be verified in the next period:

And so we should handle it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yrong another practical implication if we disallow the non-sync committee rollover is that if we are in period 1086, and we've received a header in 1087, we won't be able to process any messages until we've received period 1087's sync committee update. Maybe that's OK, just mentioning it.

Copy link

@yrong yrong May 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the consensus spec always has a sync committee in the update

Not sure fully understand your point. Updates withthout sync committee is definetely allowed. Search is_sync_committee_update(update) in the spec we can see logics accordingly.

https://github.com/ethereum/consensus-specs/blob/dev/specs/altair/light-client/sync-protocol.md#apply_light_client_update

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we are in period 1086, and we've received a header in 1087, we won't be able to process any messages until we've received period 1087's sync committee update.

Yeah, I would assume that's the expected behaviour previously before adding syncInterimFinalizedUpdate, right? so I'm expecting we refactor the relayer logic to still respect it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see! But isn't the condition to handle updates in the next period missing in the spec, then? I read through validate_light_client_update again now and I don't see it handled.

// If there was no sync committee in the update, but the finalized header provided
// is in the new period, we need to move the sync committee period forward.
if update_finalized_period == store_period + 1 {
ensure!(<NextSyncCommittee<T>>::exists(), <Error<T>>::NextSyncCommitteeUnknown);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yrong @vgeddes I added this check. One thing I dislike about the Ethereum consensus spec is that the separation between verify_update and apply_update makes it difficult to see what conditions are possible in apply_update and what has already been checked in verify_update.

I don't think it is possible for <NextSyncCommittee> to be empty here because the header validation had to pass BLS signature verification here to get to this point. I would rather err on the side of being too cautious though. Let me know your thoughts.

Copy link

@yrong yrong May 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I would prefer to move all check logics to verification phase and apply update with only actions required.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that sounds good. So are we going to disallow updates in the next period without a sync committee update, like I illustrated here: #145 (comment)

@claravanstaden
Copy link
Collaborator Author

@vgeddes @alistair-singh @yrong This version is what I think is the best solution. I am not against disallowing beacon header updates in the next period though, I just think the spec supports it and it is more flexible so it is the solution I prefer (to allow updates in the next period without a sync committee update).

Let me know what you think.

@claravanstaden claravanstaden changed the title Sync committee handover after finalized header in new period Reject finalized updates without a sync committee in next store period May 15, 2024
@claravanstaden
Copy link
Collaborator Author

@vgeddes @yrong @alistair-singh I've decided to go with the more conservative approach for now, since the potential side-effects are less. Ready for review

Copy link

@vgeddes vgeddes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

bridges/snowbridge/pallets/ethereum-client/src/lib.rs Outdated Show resolved Hide resolved
Copy link

@alistair-singh alistair-singh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@claravanstaden
Copy link
Collaborator Author

A downside to this approach is that when the sync committee rolls over, we'll have to wait for the next sync committee to be available in the finalized header update. This would mean any messages received in the last two epochs of the sync committee period would have to wait 2 additional epochs to be verified. That means at most 4 epochs waiting time (4 x 6.4 minutes) = 25.6 minutes.

https://discord.com/channels/595666850260713488/595701319793377299/1240360783309967483

Copy link

@yrong yrong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@claravanstaden claravanstaden merged commit d95751b into snowbridge May 16, 2024
7 checks passed
@claravanstaden claravanstaden deleted the sync-committee-rollover branch May 16, 2024 06:49
claravanstaden added a commit that referenced this pull request May 16, 2024
#145)

* sync committee handover fix

* remove comment

* Update bridges/snowbridge/pallets/ethereum-client/src/lib.rs

Co-authored-by: Vincent Geddes <[email protected]>

* add extra check

* move checks around

* remove extra validation

* fix compiler error

* reject updates in next period without sync committee

* fix test

* polish

* Update bridges/snowbridge/pallets/ethereum-client/src/lib.rs

Co-authored-by: Vincent Geddes <[email protected]>

---------

Co-authored-by: Vincent Geddes <[email protected]>
claravanstaden added a commit that referenced this pull request May 16, 2024
#145)

* sync committee handover fix

* remove comment

* Update bridges/snowbridge/pallets/ethereum-client/src/lib.rs

Co-authored-by: Vincent Geddes <[email protected]>

* add extra check

* move checks around

* remove extra validation

* fix compiler error

* reject updates in next period without sync committee

* fix test

* polish

* Update bridges/snowbridge/pallets/ethereum-client/src/lib.rs

Co-authored-by: Vincent Geddes <[email protected]>

---------

Co-authored-by: Vincent Geddes <[email protected]>
claravanstaden added a commit that referenced this pull request May 16, 2024
#145)

* sync committee handover fix

* remove comment

* Update bridges/snowbridge/pallets/ethereum-client/src/lib.rs

Co-authored-by: Vincent Geddes <[email protected]>

* add extra check

* move checks around

* remove extra validation

* fix compiler error

* reject updates in next period without sync committee

* fix test

* polish

* Update bridges/snowbridge/pallets/ethereum-client/src/lib.rs

Co-authored-by: Vincent Geddes <[email protected]>

---------

Co-authored-by: Vincent Geddes <[email protected]>
github-merge-queue bot pushed a commit to paritytech/polkadot-sdk that referenced this pull request May 16, 2024
…c committee in next store period (#4478)

While syncing Ethereum consensus updates to the Snowbridge Ethereum
light client, the syncing process stalled due to error
`InvalidSyncCommitteeUpdate` when importing the next sync committee for
period `1087`.

This bug manifested specifically because our light client checkpoint is
a few weeks old (submitted to governance weeks ago) and had to catchup
until a recent block. Since then, we have done thorough testing of the
catchup sync process.

### Symptoms
- Import next sync committee for period `1086` (essentially period
`1087`). Light client store period = `1086`.
- Import header in period `1087`. Light client store period = `1087`.
The current and next sync committee is not updated, and is now in an
outdated state. (current sync committee = `1086` and current sync
committee = `1087`, where it should be current sync committee = `1087`
and current sync committee = `None`)
- Import next sync committee for period `1087` (essentially period
`1088`) fails because the expected next sync committee's roots don't
match.

### Bug
The bug here is that the current and next sync committee's didn't
handover when an update in the next period was received.

### Fix
There are two possible fixes here:
1. Correctly handover sync committees when a header in the next period
is received.
2. Reject updates in the next period until the next sync committee
period is known.

We opted for solution 2, which is more conservative and requires less
changes.

### Polkadot-sdk versions
This fix should be backported in polkadot-sdk versions 1.7 and up.

Snowfork PR: Snowfork#145

---------

Co-authored-by: Vincent Geddes <[email protected]>
claravanstaden added a commit that referenced this pull request May 16, 2024
…c committee in next store period (paritytech#4478)

While syncing Ethereum consensus updates to the Snowbridge Ethereum
light client, the syncing process stalled due to error
`InvalidSyncCommitteeUpdate` when importing the next sync committee for
period `1087`.

This bug manifested specifically because our light client checkpoint is
a few weeks old (submitted to governance weeks ago) and had to catchup
until a recent block. Since then, we have done thorough testing of the
catchup sync process.

### Symptoms
- Import next sync committee for period `1086` (essentially period
`1087`). Light client store period = `1086`.
- Import header in period `1087`. Light client store period = `1087`.
The current and next sync committee is not updated, and is now in an
outdated state. (current sync committee = `1086` and current sync
committee = `1087`, where it should be current sync committee = `1087`
and current sync committee = `None`)
- Import next sync committee for period `1087` (essentially period
`1088`) fails because the expected next sync committee's roots don't
match.

### Bug
The bug here is that the current and next sync committee's didn't
handover when an update in the next period was received.

### Fix
There are two possible fixes here:
1. Correctly handover sync committees when a header in the next period
is received.
2. Reject updates in the next period until the next sync committee
period is known.

We opted for solution 2, which is more conservative and requires less
changes.

### Polkadot-sdk versions
This fix should be backported in polkadot-sdk versions 1.7 and up.

Snowfork PR: #145

---------

Co-authored-by: Vincent Geddes <[email protected]>
claravanstaden added a commit that referenced this pull request May 16, 2024
…c committee in next store period (paritytech#4478)

While syncing Ethereum consensus updates to the Snowbridge Ethereum
light client, the syncing process stalled due to error
`InvalidSyncCommitteeUpdate` when importing the next sync committee for
period `1087`.

This bug manifested specifically because our light client checkpoint is
a few weeks old (submitted to governance weeks ago) and had to catchup
until a recent block. Since then, we have done thorough testing of the
catchup sync process.

### Symptoms
- Import next sync committee for period `1086` (essentially period
`1087`). Light client store period = `1086`.
- Import header in period `1087`. Light client store period = `1087`.
The current and next sync committee is not updated, and is now in an
outdated state. (current sync committee = `1086` and current sync
committee = `1087`, where it should be current sync committee = `1087`
and current sync committee = `None`)
- Import next sync committee for period `1087` (essentially period
`1088`) fails because the expected next sync committee's roots don't
match.

### Bug
The bug here is that the current and next sync committee's didn't
handover when an update in the next period was received.

### Fix
There are two possible fixes here:
1. Correctly handover sync committees when a header in the next period
is received.
2. Reject updates in the next period until the next sync committee
period is known.

We opted for solution 2, which is more conservative and requires less
changes.

### Polkadot-sdk versions
This fix should be backported in polkadot-sdk versions 1.7 and up.

Snowfork PR: #145

---------

Co-authored-by: Vincent Geddes <[email protected]>
claravanstaden added a commit that referenced this pull request May 16, 2024
…c committee in next store period (paritytech#4478)

While syncing Ethereum consensus updates to the Snowbridge Ethereum
light client, the syncing process stalled due to error
`InvalidSyncCommitteeUpdate` when importing the next sync committee for
period `1087`.

This bug manifested specifically because our light client checkpoint is
a few weeks old (submitted to governance weeks ago) and had to catchup
until a recent block. Since then, we have done thorough testing of the
catchup sync process.

### Symptoms
- Import next sync committee for period `1086` (essentially period
`1087`). Light client store period = `1086`.
- Import header in period `1087`. Light client store period = `1087`.
The current and next sync committee is not updated, and is now in an
outdated state. (current sync committee = `1086` and current sync
committee = `1087`, where it should be current sync committee = `1087`
and current sync committee = `None`)
- Import next sync committee for period `1087` (essentially period
`1088`) fails because the expected next sync committee's roots don't
match.

### Bug
The bug here is that the current and next sync committee's didn't
handover when an update in the next period was received.

### Fix
There are two possible fixes here:
1. Correctly handover sync committees when a header in the next period
is received.
2. Reject updates in the next period until the next sync committee
period is known.

We opted for solution 2, which is more conservative and requires less
changes.

### Polkadot-sdk versions
This fix should be backported in polkadot-sdk versions 1.7 and up.

Snowfork PR: #145

---------

Co-authored-by: Vincent Geddes <[email protected]>
claravanstaden added a commit that referenced this pull request May 16, 2024
…c committee in next store period (paritytech#4478)

While syncing Ethereum consensus updates to the Snowbridge Ethereum
light client, the syncing process stalled due to error
`InvalidSyncCommitteeUpdate` when importing the next sync committee for
period `1087`.

This bug manifested specifically because our light client checkpoint is
a few weeks old (submitted to governance weeks ago) and had to catchup
until a recent block. Since then, we have done thorough testing of the
catchup sync process.

### Symptoms
- Import next sync committee for period `1086` (essentially period
`1087`). Light client store period = `1086`.
- Import header in period `1087`. Light client store period = `1087`.
The current and next sync committee is not updated, and is now in an
outdated state. (current sync committee = `1086` and current sync
committee = `1087`, where it should be current sync committee = `1087`
and current sync committee = `None`)
- Import next sync committee for period `1087` (essentially period
`1088`) fails because the expected next sync committee's roots don't
match.

### Bug
The bug here is that the current and next sync committee's didn't
handover when an update in the next period was received.

### Fix
There are two possible fixes here:
1. Correctly handover sync committees when a header in the next period
is received.
2. Reject updates in the next period until the next sync committee
period is known.

We opted for solution 2, which is more conservative and requires less
changes.

### Polkadot-sdk versions
This fix should be backported in polkadot-sdk versions 1.7 and up.

Snowfork PR: #145

---------

Co-authored-by: Vincent Geddes <[email protected]>
claravanstaden added a commit that referenced this pull request May 16, 2024
…c committee in next store period (paritytech#4478)

While syncing Ethereum consensus updates to the Snowbridge Ethereum
light client, the syncing process stalled due to error
`InvalidSyncCommitteeUpdate` when importing the next sync committee for
period `1087`.

This bug manifested specifically because our light client checkpoint is
a few weeks old (submitted to governance weeks ago) and had to catchup
until a recent block. Since then, we have done thorough testing of the
catchup sync process.

### Symptoms
- Import next sync committee for period `1086` (essentially period
`1087`). Light client store period = `1086`.
- Import header in period `1087`. Light client store period = `1087`.
The current and next sync committee is not updated, and is now in an
outdated state. (current sync committee = `1086` and current sync
committee = `1087`, where it should be current sync committee = `1087`
and current sync committee = `None`)
- Import next sync committee for period `1087` (essentially period
`1088`) fails because the expected next sync committee's roots don't
match.

### Bug
The bug here is that the current and next sync committee's didn't
handover when an update in the next period was received.

### Fix
There are two possible fixes here:
1. Correctly handover sync committees when a header in the next period
is received.
2. Reject updates in the next period until the next sync committee
period is known.

We opted for solution 2, which is more conservative and requires less
changes.

### Polkadot-sdk versions
This fix should be backported in polkadot-sdk versions 1.7 and up.

Snowfork PR: #145

---------

Co-authored-by: Vincent Geddes <[email protected]>
hitchhooker pushed a commit to ibp-network/polkadot-sdk that referenced this pull request Jun 5, 2024
…c committee in next store period (paritytech#4478)

While syncing Ethereum consensus updates to the Snowbridge Ethereum
light client, the syncing process stalled due to error
`InvalidSyncCommitteeUpdate` when importing the next sync committee for
period `1087`.

This bug manifested specifically because our light client checkpoint is
a few weeks old (submitted to governance weeks ago) and had to catchup
until a recent block. Since then, we have done thorough testing of the
catchup sync process.

### Symptoms
- Import next sync committee for period `1086` (essentially period
`1087`). Light client store period = `1086`.
- Import header in period `1087`. Light client store period = `1087`.
The current and next sync committee is not updated, and is now in an
outdated state. (current sync committee = `1086` and current sync
committee = `1087`, where it should be current sync committee = `1087`
and current sync committee = `None`)
- Import next sync committee for period `1087` (essentially period
`1088`) fails because the expected next sync committee's roots don't
match.

### Bug
The bug here is that the current and next sync committee's didn't
handover when an update in the next period was received.

### Fix
There are two possible fixes here:
1. Correctly handover sync committees when a header in the next period
is received.
2. Reject updates in the next period until the next sync committee
period is known.

We opted for solution 2, which is more conservative and requires less
changes.

### Polkadot-sdk versions
This fix should be backported in polkadot-sdk versions 1.7 and up.

Snowfork PR: Snowfork#145

---------

Co-authored-by: Vincent Geddes <[email protected]>
TarekkMA pushed a commit to moonbeam-foundation/polkadot-sdk that referenced this pull request Aug 2, 2024
…c committee in next store period (paritytech#4478)

While syncing Ethereum consensus updates to the Snowbridge Ethereum
light client, the syncing process stalled due to error
`InvalidSyncCommitteeUpdate` when importing the next sync committee for
period `1087`.

This bug manifested specifically because our light client checkpoint is
a few weeks old (submitted to governance weeks ago) and had to catchup
until a recent block. Since then, we have done thorough testing of the
catchup sync process.

### Symptoms
- Import next sync committee for period `1086` (essentially period
`1087`). Light client store period = `1086`.
- Import header in period `1087`. Light client store period = `1087`.
The current and next sync committee is not updated, and is now in an
outdated state. (current sync committee = `1086` and current sync
committee = `1087`, where it should be current sync committee = `1087`
and current sync committee = `None`)
- Import next sync committee for period `1087` (essentially period
`1088`) fails because the expected next sync committee's roots don't
match.

### Bug
The bug here is that the current and next sync committee's didn't
handover when an update in the next period was received.

### Fix
There are two possible fixes here:
1. Correctly handover sync committees when a header in the next period
is received.
2. Reject updates in the next period until the next sync committee
period is known.

We opted for solution 2, which is more conservative and requires less
changes.

### Polkadot-sdk versions
This fix should be backported in polkadot-sdk versions 1.7 and up.

Snowfork PR: Snowfork#145

---------

Co-authored-by: Vincent Geddes <[email protected]>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants