Run preparation for all existing PVFs on the network before a release updating wasmtime #657

Open
1 of 2 tasks
eskimor opened this issue Apr 11, 2023 · 15 comments

@eskimor
Member

eskimor commented Apr 11, 2023

We assume that once a PVF has passed pre-checking, it will also compile fine in the future. We should therefore make sure that whenever we upgrade wasmtime or change the preparation process in any way, all PVFs already registered on the network would still pass pre-checking with those changes.

  • Provide a tool that automatically scrapes all PVFs from the chain and compiles them (a rough sketch follows below).
  • Include that tool in the release process.
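
A minimal sketch of the compile-check half of such a tool, assuming the PVF blobs have already been scraped from the chain and decompressed to plain Wasm; the directory name and the default wasmtime configuration are illustrative only, and a production check should reuse the node's own PVF preparation code so the exact same settings are exercised:

```rust
// Rough sketch only. Assumes the PVF blobs were already scraped from the
// chain and decompressed into plain Wasm files (on-chain PVFs are
// zstd-compressed behind a magic prefix). Dependencies assumed: the
// `wasmtime` and `anyhow` crates.
use wasmtime::{Config, Engine, Module};

fn main() -> anyhow::Result<()> {
    // Default config for illustration; the real check should use the node's
    // exact preparation settings (deterministic stack limits etc.).
    let engine = Engine::new(&Config::new())?;
    let mut failures = 0usize;

    // Hypothetical layout: one decompressed .wasm blob per registered para.
    for entry in std::fs::read_dir("scraped-pvfs")? {
        let path = entry?.path();
        let wasm = std::fs::read(&path)?;
        match Module::new(&engine, &wasm) {
            Ok(_) => println!("OK      {}", path.display()),
            Err(err) => {
                failures += 1;
                eprintln!("FAILED  {}: {err:#}", path.display());
            }
        }
    }

    if failures > 0 {
        anyhow::bail!("{failures} PVF(s) failed to compile");
    }
    Ok(())
}
```

Running this against all currently registered PVFs as a release-gating CI step would surface compilation regressions before a release ships.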

Why is this important?

If previously working PVFs suddenly stop working, even for a single parachain/parathread, relay chain finality would stall until the issue is resolved. Better to discover such issues before we hit production.

@ordian I believe you already have some tooling in place which could help with this?
@Sophia-Gold Once we have the tooling, we would need to coordinate with the release team and CI/CD to automate those checks.

The safe path would be to run those compilations as part of each release.

@sandreim
Contributor

@eskimor can you detail a bit why the relay chain would stall if a PVF stops working? I was expecting this to either cause disputes or the candidates simply not being backed at all.

@eskimor
Member Author

eskimor commented Apr 11, 2023

Hah, I actually had this elaboration, but deleted it as not relevant. 😆

For the second: you cannot rely on candidates not being backed, because backers can be malicious. For the first: we don't dispute on preparation errors, as those should have been caught by pre-checking; so if any occur, it has to be an issue with the local node. That assumption only holds, though, if node upgrades don't break things.

But even if we rolled that decision back, we would indeed get disputes, which is not better at all: honest nodes would get slashed, and there could be lots of them.

@Sophia-Gold
Contributor

I think we should roll that decision back. It's better to have disputes than the relay chain stopping, and slashes can be refunded through governance if it's our fault.

Of course, we should also still have the tests in this issue.

@sandreim
Contributor

Thanks, that makes sense

@eskimor
Member Author

eskimor commented Apr 16, 2023

> I think we should roll that decision back. It's better to have disputes than the relay chain stopping, and slashes can be refunded through governance if it's our fault.
>
> Of course, we should also still have the tests in this issue.

I was briefly thinking about that as well, but I don't think it improves the situation: if we rolled back the decision, we would have a dispute storm instead, which would result in either security threats or equally bad liveness threats. Regardless of whether we have a finality stall or a dispute storm, we would need to push a fix to recover the network.

There are arguments on both sides about which would be better, a finality stall or a dispute storm, but the reality is that both are pretty bad and we should have quality control in place to prevent either.

@burdges

burdges commented Apr 16, 2023

Afaik there is nothing remotely controversial about your statement:

> We should therefore make sure that whenever we upgrade wasmtime or change the preparation process in any way, all PVFs already registered on the network would still pass pre-checking with those changes.

But when? Do we just say governance should always do this on some range of configurations? Or do we make the chain enforce re-votes on re-build results of PVFs? Or do we schedule re-builds as parathread blocks, so that only approval checkers vote yes or no on the re-build?

We might answer this exactly like we answered the original "who builds" question: should all validators re-build anyway? If yes, then we have no reason to play any fancy approval-checker games; also, everyone then needs to re-build anyway, irrespective of whether governance checks. If no, then those other options still exist.

Anyway, we do not necessarily have that much choice here, depending upon what a wasmtime upgrade means.

@Sophia-Gold
Contributor

Sophia-Gold commented Apr 17, 2023

> I was briefly thinking about that as well, but I don't think it improves the situation: if we rolled back the decision, we would have a dispute storm instead, which would result in either security threats or equally bad liveness threats. Regardless of whether we have a finality stall or a dispute storm, we would need to push a fix to recover the network.
>
> There are arguments on both sides about which would be better, a finality stall or a dispute storm, but the reality is that both are pretty bad and we should have quality control in place to prevent either.

I'm not sure it particularly matters if we're able to prevent this situation through testing, but in my understanding it wouldn't result in a dispute storm unless multiple PVFs fail to compile and multiple backers are dishonest. I don't know where the threshold is for multiple disputes halting the relay chain, but we'd for sure want to be able to slash the backers.

To make sure I understand this, what would we do if this test fails? Would the PVF need to be updated before the release? I'm not sure if this is what @burdges is asking.

@burdges

burdges commented Apr 17, 2023

We always risk a dispute storm if the PVF or candidate works on some node configurations but not others. It's maybe not a "storm" if we have only one bad parachain, but one could imagine many bad parathreads being broken by some crate. And malicious backers are not even necessary. We're less enamored of alternative node configurations now than previously, but they remain a reality.

We'll deactivate any PVF which fails to re-compile under the new wasmtime. We do not give PVF authors an opportunity to stall a relay chain upgrade they dislike by crafting a PVF that breaks it.

All I really said is: If all nodes must rebuild all PVFs anyways then we should just rebuild them all as if they were all re-uploaded afresh, including voting. I do not know if all nodes must rebuild all PVFs anyways.

We could keep both the old and the new wasmtime available, and then slowly transition PVFs from the old wasmtime to the new one. We would transition first all true system parachains/parathreads, second all common-good chains in whatever order, third all full parachains with the ones with more locked DOT going first, and finally all other parathreads in whatever order, maybe by time since last block.

Another suggestion: any wasmtime update includes a random 32-byte nonce, which by convention we should secretly produce as sha3("relaxed update" || some_secret_random_32_bytes). We never reveal these pre-images, except that if we need a fast transition for security reasons, we instead produce this nonce as sha3("security update" || cve_commit), where cve_commit = sha3(cve_number || some_secret_random_32_bytes). We add a "security update reveal" transaction which reveals cve_commit and immediately halts backing of untransitioned parachains. We choose when to reveal cve_commit as a compromise between security and disruption.
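
To make the commit-reveal construction above concrete, here is a rough sketch in Rust (using the sha3 crate; the hash choice of SHA3-256, the plain byte concatenation, and all names are illustrative assumptions, not part of the proposal):

```rust
// Dependency assumed: the `sha3` crate.
use sha3::{Digest, Sha3_256};

// Hash the concatenation of the given byte strings with SHA3-256.
fn sha3_concat(parts: &[&[u8]]) -> [u8; 32] {
    let mut hasher = Sha3_256::new();
    for part in parts {
        hasher.update(part);
    }
    hasher.finalize().into()
}

/// Nonce published with an ordinary ("relaxed") wasmtime update.
fn relaxed_update_nonce(secret: &[u8; 32]) -> [u8; 32] {
    sha3_concat(&[b"relaxed update".as_slice(), secret.as_slice()])
}

/// Commitment to a CVE plus the nonce derived from it for a security update.
fn security_update_nonce(cve_number: &[u8], secret: &[u8; 32]) -> ([u8; 32], [u8; 32]) {
    let cve_commit = sha3_concat(&[cve_number, secret.as_slice()]);
    let nonce = sha3_concat(&[b"security update".as_slice(), cve_commit.as_slice()]);
    (cve_commit, nonce)
}

/// Check run when `cve_commit` is later revealed: if it reproduces the
/// published nonce, the update was a security update and backing of
/// untransitioned parachains can be halted.
fn verify_security_reveal(published_nonce: &[u8; 32], cve_commit: &[u8; 32]) -> bool {
    sha3_concat(&[b"security update".as_slice(), cve_commit.as_slice()]) == *published_nonce
}

fn main() {
    // Toy demo with a fixed "secret" and a placeholder CVE number; in
    // reality the secret would be random and kept private.
    let secret = [7u8; 32];
    let (cve_commit, nonce) = security_update_nonce(b"CVE-2023-XXXX", &secret);
    assert!(verify_security_reveal(&nonce, &cve_commit));
    let _relaxed = relaxed_update_nonce(&secret);
    println!("reveal verifies: ok");
}
```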

@Polkadot-Forum

This issue has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/ux-implications-of-pvf-executor-environment-versioning/2519/27

@eskimor
Member Author

eskimor commented Apr 17, 2023

To be clear, I suggest having Parity re-check all PVFs on some CI machine as part of the release process for a new Parity node version. I don't think this needs to be done on validators, although that CI machine should run exactly the same thing validators would. If any PVF failed this check, we would not release until we fixed the issue:

  1. Either the release is faulty - just fix it.
  2. Or the PVFs relied on some previous flaw in wasmtime that has now been fixed. In that case we would need to ask parachain devs to upgrade their PVFs, or disable them via governance, or both, before the release.

That said, Jeff's suggestion makes sense as well, especially the part about disabling parachains in case of a security vulnerability.

But given that we don't want implementation details like the wasmtime version used to be part of the spec, this seems infeasible.

Any other alternative node implementation team should have similar testing, but that would be their business.

ordian self-assigned this May 10, 2023
@ordian
Member

ordian commented May 16, 2023

The initial version of pvf-checker is available here: https://github.com/paritytech/pvf-checker.

@bkchr
Member

bkchr commented May 16, 2023

Nice! It would be really nice to also go back in history. Actually, it would be much better to test all PVFs that have ever existed.

@ordian
Member

ordian commented May 17, 2023

It can now accept --at-block <hash> to query PVFs at a specific block hash, assuming that block is not pruned on the RPC node.
Testing all PVFs that ever existed would probably require a scraper service that collects all PVFs while syncing from genesis. Some PVFs already do not pass the checks, like para 2268 on Kusama, so I added a --skip <para_id> flag for that.
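
As a hypothetical example (the binary name and block hash are placeholders, with 2268 taken from the comment above), an invocation like pvf-checker --at-block 0xabc… --skip 2268 would check every PVF registered at that block while skipping para 2268.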

Feel free to open an issue on the repo for any feature requests.

@eskimor
Member Author

eskimor commented May 30, 2023

Why would we care about any non-current PVFs? It should be enough if everything that is used right now works; future upgrades will be detected by pre-checking (a bit racy, but still). Checking previously existing PVFs sounds like a nice bonus that can run on a best-effort basis?

@bkchr
Member

bkchr commented May 31, 2023

> Why would we care about any non-current PVFs?

To have a bigger testing space. Maybe the current PVFs do not use some niche feature that was used before. Especially in light of different wasm engines, it is better to test with more input data, imo.

Sophia-Gold transferred this issue from paritytech/polkadot Aug 24, 2023