
Warp sync zombienet tests: add basic BEEFY checks #2854

Merged
serban300 merged 12 commits into paritytech:master from beefy-test-chain on Jan 11, 2024

Conversation

@serban300 (Contributor) commented Jan 4, 2024:

Part of #2787

This is an initial PR that adds some basic BEEFY checks to the warp sync zombienet tests. To be more specific, it does the following:

  • Changes the snapshot used by the warp sync zombienet tests to one built from an updated version of the kitchensink runtime that supports BEEFY
  • Adds some basic BEEFY checks to the warp sync zombienet tests
  • Deduplicates some params of the warp sync zombienet tests, making them easier to extend

Note:
For the setup used by 0002-validators-warp-sync, BEEFY doesn't catch up. It detects the mandatory blocks but can't get the proofs for them, probably because no validator node is started from the snapshot (they are all started with warp sync). I would like to investigate this as part of a different issue and merge this PR as it is, in order to have some basic BEEFY warp sync tests to start with.
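
For illustration, here is a minimal sketch of what such a BEEFY check can look like in zombienet's .zndsl test DSL. The network file name, node names, and timeout are placeholders, and substrate_beefy_best_block is assumed to be the Prometheus metric Substrate exposes for the best BEEFY-finalized block; this is a sketch, not the exact assertion set from this PR:

Description: warp sync with basic BEEFY checks (sketch; placeholder values)
Network: ./test-warp-sync.toml
Creds: config

alice: reports substrate_beefy_best_block is at least 1 within 180 seconds
bob: reports substrate_beefy_best_block is at least 1 within 180 seconds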

@serban300 serban300 self-assigned this Jan 4, 2024
@serban300 serban300 requested a review from a team as a code owner January 4, 2024 13:59
@serban300 serban300 changed the title from "[WIP] Add BEEFY finality to warp sync zombienet tests" to "[WIP][Debug] Add BEEFY finality to warp sync zombienet tests" on Jan 4, 2024
@serban300 serban300 added the T10-tests label (This PR/Issue is related to tests) on Jan 4, 2024
@serban300 serban300 added the R0-silent label (Changes should not be mentioned in any release notes) on Jan 8, 2024
@serban300 serban300 force-pushed the beefy-test-chain branch 24 times, most recently from feaff7e to 5768b5d, on January 10, 2024 08:57
@serban300 serban300 changed the title from "[WIP][Debug] Warp sync zombienet tests: add basic BEEFY checks" to "Warp sync zombienet tests: add basic BEEFY checks" on Jan 10, 2024
@acatangiu (Contributor) left a comment:

I didn't go into detailed review of chain-spec.json changes.

Everything else looks good.


chain_spec_path = "chain-spec.json"

[[relaychain.nodes]]
name = "alice"
validator = true
args = ["--state-pruning archive --blocks-pruning archive"]
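# Per the review discussion below: archive pruning here is a workaround so that
# peers serving warp proofs don't hit "State for block ... already discarded"
# (see #2568); the database would not otherwise need to be a full archive.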
Contributor commented:

Why is this needed? To my understanding, we should need neither state nor block bodies for warp syncing.

@serban300 (Contributor, author) commented:

I'm not sure why it's needed, but without it the nodes from which the warp proof is downloaded generate errors like "State for block ... already discarded" when they are requested for warp sync fragments, and the warp sync freezes. But good point. I can investigate this separately. Will take a note.

@michalkucharczyk (Contributor) commented Jan 10, 2024:

Do you see this log "State for block ... already discarded" every time?
I was investigating that some time ago, and it was happening sometimes (10%, maybe 20% of the time, depending on CPU load): #2568

When preparing the database snapshot for the warp-sync test we should keep all blocks, so adding this option is desirable.

@serban300 (Contributor, author) commented:

> Do you see this log "State for block ... already discarded" every time?

Yes, for me this was happening very often, definitely more than 50% of the runs. Maybe also because the new db is bigger.

Contributor commented:

Sorry, second thought: we don't actually need these params. It will indirectly fix #2568, but the problem will still be there. The database actually does not need to be a full archive.

@serban300 (Contributor, author) commented:

> Please also re-upload db-snapshot created w/o archive mode.

@michalkucharczyk I just checked this and indeed it solves the warp sync issue, but the chain isn't finalizing blocks because bob can't talk to alice. Is it ok if I leave the archive db for the moment? I added a comment that this is a workaround and that it should be removed after fixing #2568.

@serban300 (Contributor, author) commented:

@michalkucharczyk sorry for the ping. WDYT? Would you have any concerns about merging this PR as it is and then switching to a pruned snapshot in a future PR, after #2568 is fixed?

Contributor commented:

I'd prefer to have the DB without archive mode (because this is closer to the scenario widely used in production).

But yeah, let's use the archive DB here. I have some ideas for fixing #2568. Please also leave a comment there (just so we don't forget about this). I would also appreciate it if you left the actual toml file that was used for generating the DB in the repo (with an appropriate comment).

Thank you.

@serban300 (Contributor, author) commented:

Ok, thanks! Done.

Contributor commented:

ty!

@serban300 (Contributor, author) commented:

@michalkucharczyk just adding you as a reviewer in case you want to take a look at this PR, which brings some changes to the zombienet tests for warp sync; judging by git blame, it looks like you added these tests.

@michalkucharczyk (Contributor) commented:

Please also re-upload db-snapshot created w/o archive mode.

@Lederstrumpf (Contributor) left a comment:

lgtm; would be nice if @skunert's suggestion can be integrated (in a followup)

Review thread on substrate/zombienet/0001-basic-warp-sync/README.md (outdated, resolved)
@serban300 serban300 merged commit 578960f into paritytech:master Jan 11, 2024
124 checks passed
@michalkucharczyk (Contributor) commented:

@serban300 could it be that the timeout for BEEFY is too small?
The test failed here: https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/5436619#L3319

@serban300 (Contributor, author) commented:

Yes, it looks like we should increase the timeout. Will do.
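
For instance, such a fix would increase the "within" bound on a BEEFY assertion in the .zndsl test file. A sketch, with an assumed metric name and placeholder values rather than the exact change:

alice: reports substrate_beefy_best_block is at least 1 within 300 seconds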

bgallois pushed a commit to duniter/duniter-polkadot-sdk that referenced this pull request Mar 25, 2024
Labels
R0-silent (Changes should not be mentioned in any release notes), T10-tests (This PR/Issue is related to tests)
5 participants