[R-W Bridge] Redeploy to start relaying/synching #1675

Closed
3 tasks done
Tracked by #1704
EmmanuellNorbertTulbure opened this issue Dec 1, 2022 · 18 comments
Comments

@bkontur
Contributor

bkontur commented Jan 9, 2023

@serban300
you could also take this one. Right now there is no version guard for Ro/Wo;
we are waiting for the deployment of bridge-hub-rococo-9302: https://github.com/paritytech/devops/issues/1934#issuecomment-1375554017

so I would like to set up bridge-hub-rococo-9302 as the starting version guard for Ro/Wo.
It is not really needed for a testnet, but I would like to set it up to be as close as possible to the later Kusama/Polkadot bridge - just to avoid any surprises in production :)

@serban300
Collaborator

Thanks! I'll assign it to myself then, and I think I'll be able to do it in the next few days.

@serban300 serban300 self-assigned this Jan 9, 2023
@bkontur
Contributor

bkontur commented Jan 9, 2023

@svyatonik
Slava, do we have/need something for monitoring/alerts about "version guard"?

@svyatonik
Contributor

Version guard isn't about monitoring - it is about stopping the process when the relay detects that it is connected to a node that is using an unexpected version. E.g. in the Kusama<>Polkadot bridge, when one of the chains is upgraded, the relay stops (because of the version guard). We then check that the upgrade isn't breaking our bridge, and we have the following options:

  • just recompile the relay with the upgraded bundled chain version and restart it;
  • wait until the other chain is upgraded (e.g. we may need to wait for the Kusama upgrade if Polkadot has been upgraded) - during that time, we shall not start the relay;
  • do some coding, republish and restart the relay;
  • something else that I can't think of now.

In any case - let me prepare a doc (before Thursday) - we need to discuss how we'll be handling runtime upgrades (have some process, some tools, ...). Maybe we can optimize/reduce the number of manual actions. Let's discuss it on Thursday.
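
For readers who have not seen the guard before, here is a minimal sketch of the check described above: compare the version the relay was compiled with against the version reported by the connected node, and stop on any mismatch. This is an illustration only, not the substrate-relay implementation; `RuntimeVersion`, `GuardDecision` and `check_version` are hypothetical names, and the struct keeps only the two fields discussed in this thread.

```rust
/// Hypothetical, simplified stand-in for a chain runtime version.
/// Only the two fields discussed in this thread are kept.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RuntimeVersion {
    pub spec_version: u32,
    pub transaction_version: u32,
}

/// Outcome of a single guard check.
#[derive(Debug, PartialEq, Eq)]
pub enum GuardDecision {
    /// The node runs the version the relay was compiled against.
    Continue,
    /// The node runs a different version, so the relay should stop.
    Abort {
        expected: RuntimeVersion,
        actual: RuntimeVersion,
    },
}

/// Compare the bundled version with the one reported by the node.
/// Any difference in either field is treated as "unexpected".
pub fn check_version(bundled: RuntimeVersion, on_chain: RuntimeVersion) -> GuardDecision {
    if bundled == on_chain {
        GuardDecision::Continue
    } else {
        GuardDecision::Abort {
            expected: bundled,
            actual: on_chain,
        }
    }
}
```

In a real relay this check would presumably run periodically against the live node, and an `Abort` would terminate the process so that one of the options listed above can be chosen manually.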

@bkontur
Contributor

bkontur commented Jan 9, 2023

yes, yes, ok, cool,
I was thinking more about how we receive the information that the relayer was stopped by the guard. Do we get a notification on Matrix, or how?

@svyatonik
Contributor

> yes, yes, ok, cool, I was thinking more about how we receive the information that the relayer was stopped by the guard. Do we get a notification on Matrix, or how?

Most of the dashboards will switch to the NoData state and we'll get alerts (unless the relay is configured to be auto-restarted by devops - needs to be checked), and those alerts will go to the Matrix room. The problem is that we'll keep getting alerts until the relay is restarted, so ideally we'll start preparing the new relay version as soon as there is a release draft, and be ready to restart asap after the upgrade. But again - let's discuss it on Thu - maybe we'll come up with a better solution.

@svyatonik
Contributor

One possible solution that I'm currently leaning towards (to decrease our maintenance burden) is to halt on a transaction-version change instead of a spec-version change. We no longer care about the message payload being delivered before the upgrade (unlike our previous encoded-call messaging), so that will probably work better. But we need to discuss that first.

@serban300
Collaborator

Set the bundled runtime version to {spec_version: 9302, transaction_version: 1} for the R/Wococo bridge hubs.

@bkontur @svyatonik do you want to address the upgrade process as part of this issue, or do we need to do anything else here?

@bkontur
Contributor

bkontur commented Jan 11, 2023

yes, I agree with Slava. Afaik a changed transaction-version means that some extrinsic (order, arguments, ...) was changed, so the encoding of calls is changed;
spec_version is changed with every release and says nothing about extrinsics
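
To make the difference between the two halting rules concrete, here is a rough sketch under the assumptions stated in this thread (spec_version changes with every release, transaction_version only when extrinsic encoding changes). `Versions`, `HaltPolicy` and `should_halt` are hypothetical names for illustration and not part of the relay code.

```rust
/// Hypothetical pair of version fields, as reported by a node
/// or bundled into the relay at compile time.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Versions {
    pub spec_version: u32,
    pub transaction_version: u32,
}

/// The two guard rules compared in this thread.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HaltPolicy {
    /// Stop whenever spec_version differs (i.e. on every runtime release).
    OnSpecVersionChange,
    /// Stop only when transaction_version differs
    /// (i.e. when the encoding of extrinsics may have changed).
    OnTransactionVersionChange,
}

/// Returns true when the relay should stop under the given policy.
pub fn should_halt(policy: HaltPolicy, bundled: Versions, on_chain: Versions) -> bool {
    match policy {
        HaltPolicy::OnSpecVersionChange => bundled.spec_version != on_chain.spec_version,
        HaltPolicy::OnTransactionVersionChange => {
            bundled.transaction_version != on_chain.transaction_version
        }
    }
}
```

Under the second policy, a release that only bumps spec_version would leave the relay running, which is where the reduced maintenance burden would come from.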

@bkontur
Contributor

bkontur commented Jan 11, 2023

@serban300
this is what I did last time for the last release to Live Rococo/Wococo for substrate-relay:

so, you could also verify and try this "process" and propose improvements.

and two more things, I would suggest:

@serban300
Collaborator

Ok, thanks @bkontur for the details! I'll try these steps.

@serban300
Collaborator

The new version of the relayer was deployed.

@bkontur
Contributor

bkontur commented Jan 16, 2023

@serban300
Serban, please try a successful message delivery on live: https://github.com/paritytech/cumulus/tree/bridge-hub-rococo-wococo/parachains/runtimes/bridge-hubs#live-rococorockmine2---wococowockmint
and if it passes, just close this issue

I am working on bumping cumulus bridge-hub-rococo-wococo to the new bridges repo. Once that is done, we will redeploy the BridgeHub runtimes, and we should see how this version guard works :)

@serban300
Collaborator

serban300 commented Jan 16, 2023

I didn't manage to send any message yet, but I don't know if it's because of the relayers. Rockmine2 might be stuck. If I understand correctly, it hasn't generated a new block in 13h. Not sure how to debug this and if there are any logs for Rockmine2.

@serban300
Collaborator

Found Rockmine2 logs

2023-01-16 14:17:36.100  INFO tokio-runtime-worker cumulus-collator: [Parachain] Starting collation. relay_parent=0xf45f7a33369f37ba8924c5737ba56d54f2fda617f2dcbdc4cf55b3ad5e0d3bea at=0xef0e5373cfad9bde6bbc088db0d360aac91af52becf361476c3aca6a903f3bf9
2023-01-16 14:17:36.102  INFO tokio-runtime-worker sc_basic_authorship::basic_authorship: [Parachain] 🙌 Starting consensus session on top of parent 0xef0e5373cfad9bde6bbc088db0d360aac91af52becf361476c3aca6a903f3bf9    
2023-01-16 14:17:36.107  INFO tokio-runtime-worker sc_basic_authorship::basic_authorship: [Parachain] 🎁 Prepared block for proposing at 164620 (2 ms) [hash: 0x2ba5d045b6069b3f76ac6d7116660da9dd596996a4935538f0038fb9474468ed; parent_hash: 0xef0e…3bf9; extrinsics (3): [0xfb81…cbd4, 0x656a…fa9b, 0x83a6…1865]]    
2023-01-16 14:17:36.111  INFO tokio-runtime-worker aura: [Parachain] 🔖 Pre-sealed block for proposal at 164620. Hash now 0x9016afdc2cdb6d6e00d7399c008642fd5356cb49b914c8a78f11c6dea8cea422, previously 0x2ba5d045b6069b3f76ac6d7116660da9dd596996a4935538f0038fb9474468ed.    
2023-01-16 14:17:36.112  WARN tokio-runtime-worker sc_service::client::client: [Parachain] Block import error: State Database error: Too many sibling blocks inserted    
2023-01-16 14:17:36.112  WARN tokio-runtime-worker aura: [Parachain] Error with block built on 0xef0e5373cfad9bde6bbc088db0d360aac91af52becf361476c3aca6a903f3bf9: Import failed: State Database error: Too many sibling blocks inserted    
2023-01-16 14:17:36.112  INFO tokio-runtime-worker cumulus-collator: [Parachain] PoV size { header: 0.181640625kb, extrinsics: 6.5439453125kb, storage_proof: 12.8564453125kb }
2023-01-16 14:17:36.112  INFO tokio-runtime-worker cumulus-collator: [Parachain] Compressed PoV size: 13.2431640625kb
2023-01-16 14:17:36.112 ERROR tokio-runtime-worker cumulus-collator: [Parachain] Failed to collect collation info. error=Application(UnknownBlock("Header was not found in the database: 0x9016afdc2cdb6d6e00d7399c008642fd5356cb49b914c8a78f11c6dea8cea422"))

@serban300
Collaborator

Opened https://github.com/paritytech/devops/issues/2261 for fixing this

@bkontur
Contributor

bkontur commented Jan 24, 2023

@serban300
everything should finally be redeployed with the latest stuff, and it looks like they also fixed Rockmine2 here: https://github.com/paritytech/devops/issues/2190

@serban300
Collaborator

> @serban300 Serban, please try a successful message delivery on live: https://github.com/paritytech/cumulus/tree/bridge-hub-rococo-wococo/parachains/runtimes/bridge-hubs#live-rococorockmine2---wococowockmint and if it passes, just close this issue
>
> I am working on bumping cumulus bridge-hub-rococo-wococo to the new bridges repo. Once that is done, we will redeploy the BridgeHub runtimes, and we should see how this version guard works :)

Done. Checked. The message is successfully delivered. Resolving.
