This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Support running the relay chain node as an extra process #545

Closed
bkchr opened this issue Jul 21, 2021 · 11 comments
Labels
J0-enhancement An additional feature request.


@bkchr
Member

bkchr commented Jul 21, 2021

The Vision

Currently every parachain node includes a relay chain node. This relay chain node is required to get certain information about the relay chain that is important for the parachain, for example to query which parachain block is currently the best one. While this makes it relatively easy to run a parachain node, it also brings some problems, like the high compilation time, as we need to compile both the parachain node and the relay chain node. Another, bigger problem is that parachain developers are required to update their collators whenever there is a new relay chain release that requires a timely update, for example because a new host function was added or something in the client code of the relay chain was fixed.

So, it would be nice to have the relay chain node running as an extra process. Collator operators would just run an extra relay chain node (that could maybe even be shared between multiple parachain nodes, but this is no real initial requirement!) and could freely update the relay chain node. The relay chain node itself could maybe directly bring the functionality required by Cumulus to connect to it, or we provide some sort of wrapper (probably the best way for the first implementation).

The Plan

This feature would be implemented in the following order:

  1. Refactor all usages of the polkadot client so that they sit behind a common trait (or possibly multiple traits). None of the "low-level" functionality of Cumulus should reference polkadot-service or polkadot-client; it should only use these interfaces to talk to the relay chain.
  2. Write an implementation of these traits for the "in-node relay chain", so that we are back on par with the current implementation.
  3. Research the best way to implement the inter-process communication. Maybe some sort of JSON-RPC over https://crates.io/crates/parity-tokio-ipc or similar.
  4. Implement the wrapper and make it work. Running the relay chain as an external process should always be optional: if the feature is compiled in, it should be enabled via a CLI flag or similar.
  5. ...
  6. Profit :P
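Steps 1, 2 and 4 can be illustrated with a minimal sketch, assuming a single trait is enough. All names here (`RelayChainBackend`, `InProcessRelayChain`, `RemoteRelayChain`, `select_backend`, `is_relay_chain_ahead`) are hypothetical and not the actual Cumulus API; the only name taken from this thread is the `--relay-chain-rpc-url` flag, and the remote transport is deliberately stubbed out.

```rust
/// Relay chain block number (assumed to be a plain u32 in this sketch).
pub type BlockNumber = u32;

/// Step 1: everything the "low-level" parachain code may ask the relay chain.
/// A real interface would need many more methods (head data, proofs, ...).
pub trait RelayChainBackend {
    fn best_block_number(&self) -> Result<BlockNumber, String>;
    fn kind(&self) -> &'static str;
}

/// Step 2: the in-node relay chain; the embedded node is faked by one field.
pub struct InProcessRelayChain {
    pub best: BlockNumber,
}

impl RelayChainBackend for InProcessRelayChain {
    fn best_block_number(&self) -> Result<BlockNumber, String> {
        Ok(self.best)
    }
    fn kind(&self) -> &'static str {
        "in-process"
    }
}

/// Steps 3/4: a relay chain node in a separate process, reached via IPC/RPC.
pub struct RemoteRelayChain {
    pub url: String,
}

impl RelayChainBackend for RemoteRelayChain {
    fn best_block_number(&self) -> Result<BlockNumber, String> {
        // Hypothetical: a real implementation would send a request to `self.url`.
        Err(format!("no transport implemented for {}", self.url))
    }
    fn kind(&self) -> &'static str {
        "remote"
    }
}

/// Step 4: the external process is opt-in, selected by a CLI flag.
pub fn select_backend(args: &[&str]) -> Box<dyn RelayChainBackend> {
    let url = args
        .windows(2)
        .find(|pair| pair[0] == "--relay-chain-rpc-url")
        .map(|pair| pair[1].to_string());
    match url {
        Some(url) => Box::new(RemoteRelayChain { url }),
        None => Box::new(InProcessRelayChain { best: 0 }),
    }
}

/// Parachain-side logic depends only on the trait, never on polkadot-service.
pub fn is_relay_chain_ahead(
    backend: &dyn RelayChainBackend,
    seen: BlockNumber,
) -> Result<bool, String> {
    Ok(backend.best_block_number()? > seen)
}
```

The design point is that `is_relay_chain_ahead` compiles against the trait alone, so swapping the in-node relay chain for an external one does not touch any parachain-side code.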

Open Questions

If you want to help out and contribute to this issue, this section lists open questions and tasks where we would appreciate any input.


Here you can find the board with specific sub-tasks for this milestone:
https://github.com/orgs/paritytech/projects/18/views/8

@xlc
Contributor

xlc commented Jul 21, 2021

This should also make it possible to share a relay chain node between multiple parachain nodes?

Will this make it possible for the relay chain part and the parachain part to use different versions of Substrate?

@nuke-web3
Contributor

Would you be able to have remote relay-nodes that collators could connect to? If so, what requirements (latency, bandwidth, etc.) should be outlined?

@bkchr
Member Author

bkchr commented Jul 22, 2021

This should also make it possible to share a relay chain node between multiple parachain nodes?

Yes, as written above. However, I don't see this as an initial requirement for this issue.

Will this make it possible for the relay chain part and the parachain part to use different versions of Substrate?

This is the whole point, or more precisely: parachain collators should not be required to update when the relay chain requires a node update.

@bkchr
Member Author

bkchr commented Jul 22, 2021

Would you be able to have remote relay-nodes that collators could connect to? If so, what requirements (latency, bandwidth, etc.) should be outlined?

As written in the issue, maybe. Since this is not an initial requirement, I don't think we need to outline any of that yet.

@bkchr
Member Author

bkchr commented Jul 22, 2021

I also just realized that this will probably be a little more complicated for collators, as collators do not only read data from the relay chain. They are also called by the overseer when a new PoV needs to be produced, and they hand it back to the overseer. This is more time-critical, but should also be solvable.
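To make the time-critical part concrete, here is a minimal sketch of that round trip seen from the overseer's side: it asks for a PoV and gives up after a deadline. The process boundary is modeled as a thread plus a `std::sync::mpsc` channel, and `PoV` and `request_collation` are hypothetical names, not Cumulus or Polkadot APIs.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for a proof of validity (just raw bytes here).
#[derive(Debug, PartialEq)]
pub struct PoV(pub Vec<u8>);

/// Asks a collator for a PoV and waits at most `deadline` for the answer.
/// `produce` simulates the collator's work; `None` means "nothing to collate".
pub fn request_collation<F>(produce: F, deadline: Duration) -> Result<Option<PoV>, &'static str>
where
    F: FnOnce() -> Option<PoV> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // A send error only means the requester already gave up; ignore it.
        let _ = tx.send(produce());
    });
    match rx.recv_timeout(deadline) {
        Ok(pov) => Ok(pov),
        Err(mpsc::RecvTimeoutError::Timeout) => Err("collator missed the deadline"),
        // The sender was dropped without sending, e.g. `produce` panicked.
        Err(mpsc::RecvTimeoutError::Disconnected) => Err("collator went away"),
    }
}
```

With an in-process collator this wait is effectively a function call; once the relay chain node lives in another process, the same deadline has to cover serialization and transport latency as well.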

@skunert
Contributor

skunert commented Mar 2, 2022

Now that #963 is merged, it is possible to start a parachain full node by passing the address of a relay chain full node. The parachain node will then not create a relay chain node internally, but fetch all needed data via RPC. Be aware that we currently consider this an experimental feature.
Note: Collation is not supported at this time.

Example command (assumes a relay chain full node running locally with its WebSocket RPC on port 9944):
polkadot-collator --tmp --relay-chain-rpc-url "ws://localhost:9944"

@crystalin

I know this is not directly the target of this change, but in the scenario of one relay node serving many parachain nodes, it would be better if the parachain nodes could specify multiple --relay-chain-rpc-url values for redundancy (e.g. 20 parachain nodes pointing to the same 2 relay nodes).
This way, if one of the relay nodes goes down, the parachain nodes would still stay synced.
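Assuming the node could be given several relay chain URLs (the thread so far only describes a single `--relay-chain-rpc-url`), a minimal failover sketch for one-shot requests could look like this. `FailoverClient` and its methods are hypothetical; the transport is abstracted into a closure so the endpoint-rotation logic stands on its own.

```rust
/// Hypothetical client holding several relay chain RPC endpoints.
pub struct FailoverClient {
    urls: Vec<String>,
    /// Index of the endpoint currently in use.
    current: usize,
}

impl FailoverClient {
    pub fn new(urls: Vec<String>) -> Self {
        assert!(!urls.is_empty(), "at least one relay chain URL is required");
        FailoverClient { urls, current: 0 }
    }

    pub fn current_url(&self) -> &str {
        &self.urls[self.current]
    }

    /// Runs `request` against the current endpoint; on error, switches to the
    /// next endpoint (wrapping around) until every endpoint was tried once.
    /// Returns all collected errors if none of them answered.
    pub fn call<T, E>(&mut self, mut request: impl FnMut(&str) -> Result<T, E>) -> Result<T, Vec<E>> {
        let mut errors = Vec::new();
        for _ in 0..self.urls.len() {
            match request(&self.urls[self.current]) {
                Ok(value) => return Ok(value),
                Err(e) => {
                    errors.push(e);
                    self.current = (self.current + 1) % self.urls.len();
                }
            }
        }
        Err(errors)
    }
}
```

This only covers individual requests. Long-lived subscriptions need more care, because switching endpoints must not lose or duplicate block notifications.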

@purestaketdb

Similar to @crystalin's feedback, the two main use cases we are interested in are:

  1. Not requiring updates to the collators when the relay chain version changes. This is not initially supported per the comments of July 2021.

  2. Allowing RPC servers to avoid running their own relay chain nodes. However, for resilience, it would be preferable if either a set of relay nodes could be designated, or if it were supported and safe to run a set of relay chain RPC nodes behind a load balancer.

@skunert
Contributor

skunert commented Apr 11, 2022

Thanks @crystalin and @purestaketdb for the feedback. Having multiple relay chain nodes to connect to is something I have thought about. I can look into it once the collation over RPC feature is ready.
The main challenge will be to sort out subscription handling for this. Currently, we are listening to RPC subscriptions that notify us about new blocks on the relay chain. To switch relay-chain nodes on the fly, we need to gracefully continue a new subscription where the old one left off.
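One way to picture "continuing where the old subscription left off": remember the last relay chain block number that was processed, drop notifications the new node replays, and back-fill any gap before handling the new head. The sketch below assumes a single chain of increasing block numbers (no forks or reorgs), and `ResumableImport` is a hypothetical name; the caller would fetch the returned block numbers from the new node.

```rust
/// Relay chain block number (assumed to be a plain u32 in this sketch).
pub type BlockNumber = u32;

/// Hypothetical tracker that lets a parachain node switch from one relay chain
/// node's "new block" subscription to another's without gaps or duplicates.
#[derive(Default)]
pub struct ResumableImport {
    /// Highest block number handed out so far, if any.
    last_seen: Option<BlockNumber>,
}

impl ResumableImport {
    pub fn new() -> Self {
        Self::default()
    }

    /// Handles a notification for block `n` from whichever subscription is
    /// active and returns the block numbers to process, in order:
    /// - `n` at or below the last seen block (a replay) yields nothing,
    /// - a jump past `last_seen + 1` yields the missing blocks plus `n`,
    /// - the very first notification yields just `n`.
    pub fn on_notification(&mut self, n: BlockNumber) -> Vec<BlockNumber> {
        let to_process: Vec<BlockNumber> = match self.last_seen {
            Some(last) if n <= last => Vec::new(),
            Some(last) => (last + 1..=n).collect(),
            None => vec![n],
        };
        if !to_process.is_empty() {
            self.last_seen = Some(n);
        }
        to_process
    }
}
```

For example, if the old node delivered blocks 10 and 11 and the new node's first notification is block 14, the tracker asks for 12, 13 and 14, and silently drops a later replay of 11.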

@skunert
Contributor

skunert commented Oct 10, 2022

Experimental support for collators that connect to an external relay chain node via RPC is now merged. You can try it by passing the --relay-chain-rpc-url argument together with --collator. Network-related relay chain arguments are still respected, so you can pass an additional set of bootnodes or other network configuration.

Next phase is additional testing to discover potential problems.

Road forward:

  • Investigate connecting to a set of relay chain nodes, as suggested above
  • Improve error handling: network stability is currently assumed and we are not resilient against failing requests (we recommend running the relay chain full node locally)
  • Look into improved logging and debuggability
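For the error-handling item, a common pattern is to retry a failed request a bounded number of times with exponential backoff. This is a generic sketch of that pattern, not necessarily what #1880 implemented; `retry_with_backoff` is a hypothetical helper, and `sleep` is injected so the timing can be checked without actually waiting.

```rust
use std::time::Duration;

/// Calls `request` (passing the 1-based attempt number) up to `max_attempts`
/// times. Between failed attempts it calls `sleep`, doubling the delay each
/// time starting from `initial_delay`. Returns the last error if all fail.
pub fn retry_with_backoff<T, E>(
    max_attempts: u32,
    initial_delay: Duration,
    mut request: impl FnMut(u32) -> Result<T, E>,
    mut sleep: impl FnMut(Duration),
) -> Result<T, E> {
    assert!(max_attempts > 0, "need at least one attempt");
    let mut delay = initial_delay;
    let mut attempt = 1;
    loop {
        match request(attempt) {
            Ok(value) => return Ok(value),
            Err(e) if attempt >= max_attempts => return Err(e),
            Err(_) => {
                sleep(delay);
                delay = delay.saturating_mul(2);
                attempt += 1;
            }
        }
    }
}
```

In a real node `sleep` would be an async timer rather than a blocking call, and retries would be combined with switching to another relay chain node once the current one is considered dead.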

@skunert
Contributor

skunert commented Jan 17, 2023

The points of the last comment have been addressed in #1880. Future enhancements can get their own issues.

@skunert skunert closed this as completed Jan 17, 2023
Projects
Status: Done
Status: done
Development

No branches or pull requests

7 participants