-
Notifications
You must be signed in to change notification settings - Fork 690
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Light sync state needs a proper format #60
Comments
To give an idea, these are the data structures that smoldot uses for a checkpoint: It also has an equivalent JSON format, which could more or less be standardized and copy-pasted into Substrate: Note that this JSON format is a superset of a checkpoint, as for example |
Sounds very interesting definition to me, in the sense that it would be nice that we have a single encoding output and a way to associate a unique hash to such state. The smoldot checkpoint looks good (three content: block production, block finalization and state). Also I think this could be interesting for full client too, so maybe something a bit generic (array of extension/subsystems), which make me think a binary serialization with schema would be good, but having a versioning of the different components should be enough. IIRC implementing this in substrate may be a bit of work (I remember previously that a bit of grandpa history was needed due to implementation, may have be lifted by recent changes), but probably worth doing anyway. |
This is a bit nit-picky, but JSON makes sense to me. Performance for parsing this doesn't matter, and our chain specs are already in JSON. There's no need to make it more complicated to read than it already is. |
My not JSON is not so much about perf (even if I enjoy compactness), more about having unique hash for the same set of info. |
I can not really follow what you are talking here about @cheme. I mean you can just hash the file and get a unique hash? Or what kind of unique hash do you want to have? |
I was targeting a unique snapshot hash for a state, (not unique hash for a encoded snapshot): having a single valid snapshot hash for a given block. So unique set of info at a state (may not be achievable), and unique encoding for this set of infos (so encoding would not allow different ordering of infos, not allow additional spaces...). But in fact it may be something better abstracted from file format (for state there is already header -> state root that does the job). |
What would be the purpose of that hash? Slightly unrelated: a maybe controversial question is which information exactly to include in the checkpoint. Strictly speaking, you could retrieve GrandPa and Babe information from the chain and the checkpoint would just be a finalized block header (or even hash). The problem with doing that is that you need to find a node that hasn't pruned the storage of that finalized block, otherwise you can't retrieve them. If you use warp syncing, then you need the GrandPa info of the starting point in order to verify the warp sync proof, but you don't need the Babe information because the warp sync will bring you to a block that hasn't been pruned and you can retrieve this Babe info from the chain. Including the Babe information, however, makes it possible to immediately start verifying blocks on top of the checkpoint, without first having to do some networking requests. |
Only use case I got in mind is trusted users signing the hash (of course you can sign if there is multiple hash, but it creates possible issues or less convergence if multiple signing are done on different hash: generally with a single snapshot builder everyone can ends up on same hash, but it sounds less good). In other word if you got two snapshot providers (trusted through https), you know both snaps are containing the same content without having to look inside. The babe question sounds tricky, I would attach it. Still a use case could be to download a snapshot in order to warpsync from it afterward. Then no need for it. |
This issue has been mentioned on Polkadot Forum. There might be relevant details there: https://forum.polkadot.network/t/community-bootnodes-and-snapshot-providers/630/1 |
Another idea: maybe the checkpoint could be just a block header and a Merkle proof containing all the information necessary to construct the state of GrandPa and/or the consensus? The Merkle proof would contain the runtime code. The user of the checkpoint builds the runtime, then performs a few runtime calls to get the state of GrandPa/Babe, and these runtime calls use the Merkle proof as well. This would have the advantage of being simple to define, future proof (we can upgrade to sassafras/others without modifying the format), and can be made deterministic if desired. Having to provide the runtime code bothers me a bit, as it inflates checkpoints quite a lot and is necessary just for the idea to work rather than because you fundamentally need that runtime code. |
It would also have the advantage that it's possible to build a checkpoint through the JSON-RPC API without having to introduce a separate JSON-RPC function just for checkpoints. |
If you agree, what I would suggest is to add a new optional field to the chain spec, something like this: "checkpoint": {
"header": "...",
"callProof": "..."
} The two I'm suggesting a new field rather than put the data inside The proof must contain a storage proof of Once implemented in both Substrate and smoldot, and that everything works, we could immediately remove the old format in Substrate. Only smoldot would need to continue supporting the old one until everyone updates their chain specs. Seems simple enough, no? |
If nobody is opposed, I'm going to implement this in smoldot (smol-dot/smoldot#1186), and submit to Substrate a PR that adds a dummy |
To make sure I understood this correctly:
I'm wondering if the storage proof contains the |
I'm confused by your comment because your vocabulary is very weird. Also, there's no server and client in this context, as this is about the chain spec format, though I understand "server" means "the thing that generates a chain spec" and "client" means "the thing that reads a chain spec". |
I looked around a bit and I think I have a better understanding of this now. I've created this PoC PR in substrate which adds the I was wondering if we should also add the execution proof of:
|
You can find here all the runtime calls that smoldot might do in order to determine the state of the chain: https://github.com/paritytech/smoldot/blob/938055a638ec201c022f680c8e8cbd0349e70ed1/src/chain/chain_information/build.rs#L122-L146 As a side note, one issue I've realized with that idea is that smoldot needs to ask a full node for a proof in order to be able to generate a checkpoint. Smoldot reads the initial list of GrandPa and Babe authorities and parameters by reading from the state or a call proof, then later follows updates to these GrandPa/Babe parameters by reading block headers. Unfortunately, it can't convert back the up-to-date parameters into a checkpoint. |
Would it be possible to have a different format that could easily be built by smoldot without involving full nodes? 🙏 If not, is there anything we could add to increase our confidence in the data provided by full nodes and trust the checkpoint smoldot builds? Ideally, we'd build the checkpoint in an agnostic way which doesn't specifically mention the consensus and allows us to include more information in the format, without breaking changes. However, we could always resort to having specific types that the checkpoint needs (https://github.com/paritytech/smoldot/blob/4da02f21a6278f100153d2878756a19bdfb216f2/src/chain/chain_information.rs#L121-L333). I think having this format specified in the spec would still be beneficial for both users and developers. |
The issue is that Babe and GrandPa are themselves not specced properly. There are ambiguous corner cases (that fortunately can never happen in practice, but remain a theoretical possibility in case of a bug somewhere). Also, for example Babe disabled authorities are I believe not used at all and could be entirely removed. Babe and GrandPa are in this state of limbo where they could get clean-ups but nobody wants to clean them up. In this state of things, properly defining a format that relies on specific Babe/GrandPa concepts (instead of an abstraction over the consensus/finality engine like I suggested above in this issue) is kind of complicated and would likely add more to the "things that need to be cleaned up later". |
I've had a go at trying to spec out babe and grandpa in our rpc-spec-v2 repo: https://github.com/paritytech/json-rpc-interface-spec/pull/124/files. The PR feels a bit too complex and the main downside is that we'll have to maintain quite a few fields to only deprecate them later (once we allocate resources for a cleanup work). At the moment, the checkpoint idea looks the cleanest to me.
Could we do anything to make the checkpoint work for an |
Smoldot doesn't "store" any chain spec other than the one that the user provides when connecting to a chain. |
There's been a lot of confusion. To summarize: While it is connected to a chain, smoldot stores strongly-typed consensus data, which contains mainly the list of Babe and GrandPa authorities plus a few other things. That's what it uses to verify blocks. Right now, the checkpoint format found in chain specs directly contains this strongly-typed consensus data. When you pass a checkpoint to smoldot, it simply loads this data in memory and can immediately verify descendants of the block of the checkpoint. This strongly-typed consensus data can also be calculated by doing a runtime call. This runtime call requires accessing the state of the chain, which requires querying a full node. That's what smoldot does as part of the GrandPa warp sync process. The somewhat elegant solution that I suggested above is to store in the chain spec the fragment of the state of the chain necessary in order to compute this strongly-typed consensus data, instead of the strongly-typed consensus data directly. The problem however is that it is not possible to convert from the strongly-typed consensus data back to the state of the chain (you can't do the inverse of a runtime call). While smoldot stores this strongly-typed consensus data in memory, it can't generate the state of the chain without querying a full node. |
I like this idea. |
Thanks for the info @tomaka 🙏
Would it be acceptable from the light-client perspective to not generate a chainSpec if querying a full node fails? Are there other concerns (ie the full node could give us wrong/malicious information 🤔 ? From the user perspective, not receiving the chainSpec some times might be a bit misleading indeed. I'm wondering if we could have an unstable method to implement for full nodes, that may produce the result most of the times for lightclients as well 🤔 I think specing out Babe/Grandpa consensus should be placed at https://github.com/w3f/polkadot-spec, instead of the rpc-v2 spec repo |
The
sync_state_genSyncSpec
RPC call can be used in order to dump the current finalized state of the chain. This state is commonly called a "checkpoint". This is currently used by smoldot in order to have a starting point for syncing, so that it doesn't have to sync from scratch.Unfortunately, this was sloppily implemented.
Code can be found here: https://github.com/paritytech/substrate/blob/b9927cdca03830d07bc1cdd8b8d80a3d32f64bd1/client/sync-state-rpc/src/lib.rs#L132-L178
We literally just copy some SCALE-encoded blob into the chain spec. The SCALE-encoded blob use internal data structures of Substrate, such as
EpochChanges
.This means that any modification to the
EpochChanges
struct or its children structs modifies the format of the checkpoints and requires an update on the smoldot side (or any other piece of code that would want to decode checkpoints).This happened for example in paritytech/substrate#9284, which modifies this struct. Since it's not hinted anywhere that modifying this struct will break things, it is easy to accidentally do it.
An additional problem is that this blob contains a lot of data that is completely unnecessary. Checkpoints should only contain the minimum amount of information necessary to sync from the current finalized block, whereas right now it also contains all the upcoming epoch changes in all non-finalized forks.
Instead, we should properly define what information is found in these checkpoints, and simplify them.
The text was updated successfully, but these errors were encountered: