Run preparation for all existing PVFs on the network before a release updating wasmtime #657

Open
1 of 2 tasks
eskimor opened this issue Apr 11, 2023 · 15 comments

@eskimor
Member

eskimor commented Apr 11, 2023

We assume that once a PVF has passed pre-checking, it will also compile fine in the future. We should therefore make sure that whenever we upgrade wasmtime or change the preparation process in any way, all PVFs already registered on the network would still pass pre-checking with those changes.

  • Provide a tool that automatically scrapes all PVFs from the chain and compiles them (a rough sketch follows below).
  • Include that tool in the release process.
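
A minimal sketch of the compile-check half of such a tool, assuming the PVF blobs have already been scraped from the chain and decompressed to plain Wasm; the directory name and the default wasmtime configuration are illustrative only, and a production check should reuse the node's own PVF preparation code so the exact same settings are exercised:

```rust
// Rough sketch only. Assumes the PVF blobs were already scraped from the
// chain and decompressed into plain Wasm files (on-chain PVFs are
// zstd-compressed behind a magic prefix). Dependencies assumed: the
// `wasmtime` and `anyhow` crates.
use wasmtime::{Config, Engine, Module};

fn main() -> anyhow::Result<()> {
    // Default config for illustration; the real check should use the node's
    // exact preparation settings (deterministic stack limits etc.).
    let engine = Engine::new(&Config::new())?;
    let mut failures = 0usize;

    // Hypothetical layout: one decompressed .wasm blob per registered para.
    for entry in std::fs::read_dir("scraped-pvfs")? {
        let path = entry?.path();
        let wasm = std::fs::read(&path)?;
        match Module::new(&engine, &wasm) {
            Ok(_) => println!("OK      {}", path.display()),
            Err(err) => {
                failures += 1;
                eprintln!("FAILED  {}: {err:#}", path.display());
            }
        }
    }

    if failures > 0 {
        anyhow::bail!("{failures} PVF(s) failed to compile");
    }
    Ok(())
}
```

Running this against all currently registered PVFs as a release-gating CI step would surface compilation regressions before a release ships.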

Why is this important?

If previously working PVFs suddenly stop working, even for a single parachain/parathread, relay chain finality would stall until the issue is resolved. Better to discover such issues before we hit production.

@ordian I believe you already have some tooling in place which could help with this?
@Sophia-Gold Once we have the tooling, we would need to coordinate with the release team and CI/CD to automate those checks.

The safe path would be to run those compilations as part of each release.

@sandreim
Contributor

@eskimor can you detail a bit why the relay chain would stall if a PVF stops working? I was expecting this to either cause disputes or the candidates simply not being backed at all.

@eskimor
Member Author

eskimor commented Apr 11, 2023

Hah, I actually had this elaboration, but deleted it as not relevant. 😆

For the second: you cannot rely on candidates not being backed, because backers can be malicious. For the first: we don't dispute on preparation errors, as those should have been caught by pre-checking; so if any occur, it has to be an issue with the local node. That assumption only holds, though, if node upgrades don't break things.

But even if we rolled that decision back, we would indeed get disputes, which is not better at all: honest nodes would get slashed, and there could be lots of them.

@Sophia-Gold
Contributor

I think we should roll that decision back. It's better to have disputes than the relay chain stopping, and slashes can be refunded through governance if it's our fault.

Of course, we should also still have the tests in this issue.

@sandreim
Contributor

Thanks, that makes sense

@eskimor
Member Author

eskimor commented Apr 16, 2023

> I think we should roll that decision back. It's better to have disputes than the relay chain stopping, and slashes can be refunded through governance if it's our fault.
>
> Of course, we should also still have the tests in this issue.

I was briefly thinking about that as well, but I don't think it improves the situation: if we rolled back the decision, we would have a dispute storm instead, which would result in either security threats or equally bad liveness threats. Regardless of whether we have a finality stall or a dispute storm, we would need to push a fix to recover the network.

There are arguments on both sides about which would be better, a finality stall or a dispute storm, but the reality is that both are pretty bad and we should have quality control in place to prevent either.

@burdges

burdges commented Apr 16, 2023

Afaik there is nothing remotely controversial about your statement:

> We should therefore make sure that whenever we upgrade wasmtime or change the preparation process in any way, all PVFs already registered on the network would still pass pre-checking with those changes.

But when? Do we just say governance should always do this on some range of configurations? Or do we make the chain enforce re-votes on re-build results of PVFs? Or do we schedule re-builds as parathread blocks, so that only approval checkers vote yes or no on the re-build?

We might answer this exactly like we answered the original "who builds" question: should all validators re-build anyway? If yes, then we have no reason to play any fancy approval-checker games; also, everyone then needs to re-build anyway, irrespective of whether governance checks. If no, then those other options still exist.

Anyway, we do not necessarily have that much choice here, depending upon what a wasmtime upgrade means.

@Sophia-Gold
Contributor

Sophia-Gold commented Apr 17, 2023

> I was briefly thinking about that as well, but I don't think it improves the situation: if we rolled back the decision, we would have a dispute storm instead, which would result in either security threats or equally bad liveness threats. Regardless of whether we have a finality stall or a dispute storm, we would need to push a fix to recover the network.
>
> There are arguments on both sides about which would be better, a finality stall or a dispute storm, but the reality is that both are pretty bad and we should have quality control in place to prevent either.

I'm not sure it particularly matters if we're able to prevent this situation through testing, but in my understanding it wouldn't result in a dispute storm unless multiple PVFs fail to compile and multiple backers are dishonest. I don't know where the threshold is for multiple disputes halting the relay chain, but we'd for sure want to be able to slash the backers.

To make sure I understand this, what would we do if this test fails? Would the PVF need to be updated before the release? I'm not sure if this is what @burdges is asking.

@burdges

burdges commented Apr 17, 2023

We always risk a dispute storm if the PVF or candidate works on some node configurations but not others. It's maybe not a "storm" if we have only one bad parachain, but one could imagine many bad parathreads being broken by some crate. And malicious backers are not even necessary. We're less enamored of alternative node configurations now than previously, but they remain a reality.

We'll deactivate any PVF which fails to re-compile under the new wasmtime. We do not give PVF authors an opportunity to stall a relay chain upgrade they dislike by crafting a PVF that breaks it.

All I really said is: If all nodes must rebuild all PVFs anyways then we should just rebuild them all as if they were all re-uploaded afresh, including voting. I do not know if all nodes must rebuild all PVFs anyways.

We could keep both the old and the new wasmtime available, and then slowly transition PVFs from the old wasmtime to the new one. We would transition first all true system parachains/parathreads, second all common-good chains in whatever order, third all full parachains with the ones with more locked DOT going first, and finally all other parathreads in whatever order, maybe by time since last block.

Another suggestion: any wasmtime update includes a random 32-byte nonce, which by convention we should secretly produce as sha3("relaxed update" || some_secret_random_32_bytes). We never reveal these pre-images, except that if we need a fast transition for security reasons, we instead produce this nonce as sha3("security update" || cve_commit), where cve_commit = sha3(cve_number || some_secret_random_32_bytes). We add a "security update reveal" transaction which reveals cve_commit and immediately halts backing of untransitioned parachains. We choose when to reveal cve_commit as a compromise between security and disruption.
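
To make the commit-reveal construction above concrete, here is a rough sketch in Rust (using the sha3 crate; the hash choice of SHA3-256, the plain byte concatenation, and all names are illustrative assumptions, not part of the proposal):

```rust
// Dependency assumed: the `sha3` crate.
use sha3::{Digest, Sha3_256};

// Hash the concatenation of the given byte strings with SHA3-256.
fn sha3_concat(parts: &[&[u8]]) -> [u8; 32] {
    let mut hasher = Sha3_256::new();
    for part in parts {
        hasher.update(part);
    }
    hasher.finalize().into()
}

/// Nonce published with an ordinary ("relaxed") wasmtime update.
fn relaxed_update_nonce(secret: &[u8; 32]) -> [u8; 32] {
    sha3_concat(&[b"relaxed update".as_slice(), secret.as_slice()])
}

/// Commitment to a CVE plus the nonce derived from it for a security update.
fn security_update_nonce(cve_number: &[u8], secret: &[u8; 32]) -> ([u8; 32], [u8; 32]) {
    let cve_commit = sha3_concat(&[cve_number, secret.as_slice()]);
    let nonce = sha3_concat(&[b"security update".as_slice(), cve_commit.as_slice()]);
    (cve_commit, nonce)
}

/// Check run when `cve_commit` is later revealed: if it reproduces the
/// published nonce, the update was a security update and backing of
/// untransitioned parachains can be halted.
fn verify_security_reveal(published_nonce: &[u8; 32], cve_commit: &[u8; 32]) -> bool {
    sha3_concat(&[b"security update".as_slice(), cve_commit.as_slice()]) == *published_nonce
}

fn main() {
    // Toy demo with a fixed "secret" and a placeholder CVE number; in
    // reality the secret would be random and kept private.
    let secret = [7u8; 32];
    let (cve_commit, nonce) = security_update_nonce(b"CVE-2023-XXXX", &secret);
    assert!(verify_security_reveal(&nonce, &cve_commit));
    let _relaxed = relaxed_update_nonce(&secret);
    println!("reveal verifies: ok");
}
```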

@Polkadot-Forum

This issue has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/ux-implications-of-pvf-executor-environment-versioning/2519/27

@eskimor
Member Author

eskimor commented Apr 17, 2023

To be clear, I suggest having Parity re-check all PVFs on some CI machine as part of the release process for a new Parity node version. I don't think this needs to be done on validators, although that CI machine should run exactly the same thing validators would. If any PVF failed this check, we would not release until we fixed the issue:

  1. Either the release is faulty - just fix it.
  2. Or the PVFs relied on some previous flaw in wasmtime that has now been fixed. In that case we would need to ask parachain devs to upgrade their PVFs, or disable them via governance, or both, before the release.

That said, Jeff's suggestion makes sense as well, especially the part about disabling parachains in case of a security vulnerability.

But given that we don't want implementation details like the wasmtime version used to be part of the spec, this seems infeasible.

Any other alternative node implementation team should have similar testing, but that would be their business.

ordian self-assigned this May 10, 2023
@ordian
Member

ordian commented May 16, 2023

The initial version of pvf-checker is available here: https://github.com/paritytech/pvf-checker.

@bkchr
Member

bkchr commented May 16, 2023

Nice! It would be really nice to also go back in history. Actually, it would be much better to test all PVFs that have ever existed.

@ordian
Member

ordian commented May 17, 2023

It can now accept --at-block <hash> to query PVFs at a specific block hash, assuming that block is not pruned on the RPC node.
Testing all PVFs that ever existed would probably require a scraper service that collects all PVFs while syncing from genesis. Some PVFs already do not pass the checks, like para 2268 on Kusama, so I added a --skip <para_id> flag for that.
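
As a hypothetical example (the binary name and block hash are placeholders, with 2268 taken from the comment above), an invocation like pvf-checker --at-block 0xabc… --skip 2268 would check every PVF registered at that block while skipping para 2268.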

Feel free to open an issue on the repo for any feature requests.

@eskimor
Member Author

eskimor commented May 30, 2023

Why would we care about any non-current PVFs? It should be enough if everything that is used right now works; future upgrades will be detected by pre-checking (a bit racy, but still). Checking previously existing PVFs sounds like a nice bonus that can run on a best-effort basis?

@bkchr
Member

bkchr commented May 31, 2023

> Why would we care about any non-current PVFs?

To have a bigger testing space. Maybe the current PVFs do not use some niche feature that was used before. Especially in light of different wasm engines, it is better to test with more input data, imo.

Sophia-Gold transferred this issue from paritytech/polkadot Aug 24, 2023