Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid doing two validations on transactionsRoot and ReceiptsRoot during sync #5679

Closed
wants to merge 6 commits into from

Conversation

ahamlat
Copy link
Contributor

@ahamlat ahamlat commented Jul 6, 2023

PR description

This PR enhances the Sync process by eliminating duplicate validations for transactions root and receipts root. We've noticed in CPU profiles during sync that receipts root and transactions root are calculated twice. The first time when data is retrieved from other peers, and the second time during block import when validating block body. This PR implements a mechanism to maintain a record of previously validated transactions and receipts roots, ensuring that redundant validations are avoided.

Sync time improvement (on a 4 vCPU/ 32 GiB RAM VM)

image

Without this PR
Checkpoint sync time : 19 hours 54 minutes

With this PR
Checkpoint sync time : 16 hours 42 minutes

CPU profiling

We can notice in the profiling screenshots that Besu does the receipts root validation only once with this PR when getting the receipts from the peers.

Before this PR

image

After this PR

image

@github-actions
Copy link

github-actions bot commented Jul 6, 2023

  • I thought about documentation and added the doc-change-required label to this PR if updates are required.
  • I thought about the changelog and included a changelog update if required.
  • If my PR includes database changes (e.g. KeyValueSegmentIdentifier) I have thought about compatibility and performed forwards and backwards compatibility tests

Signed-off-by: Ameziane H <[email protected]>
…hether or not we already calculated the receipts root

Signed-off-by: Ameziane H <[email protected]>
@ahamlat ahamlat marked this pull request as ready for review July 7, 2023 13:49
@fab-10
Copy link
Contributor

fab-10 commented Jul 7, 2023

The performance improvements is very good, and we definitely need to avoid doing the same work twice, but I want to propose a different implementation, that avoid the modification to the data structures, and specifically avoid making them mutable like for Block, and is also more future proof, since we can't assume that validation when reading from network will not change in future.
My alternative proposal is to use a local cache with the validation results in the validator class, this approach does not imply where the validation have been done, and could be also easily applied also to other validation that we could do more than once.
I am available to discuss and help if you think this alternative implementation is a viable option.

Signed-off-by: Ameziane H <[email protected]>
@ahamlat
Copy link
Contributor Author

ahamlat commented Jul 7, 2023

We can discuss another implementation. My point here is to avoid any overhead in terms of performances, I thought about keeping track of what couple header/receipts was already verified, but this means we need to do another comparaison on header hash (byte array comparaison) which is expensive.

@matkt
Copy link
Contributor

matkt commented Jul 10, 2023

I think the second proposition (cache) seems to be interesting. Having all of the validation result in cache can simplify the code (we just have to modify the validator and not the other parts of code). and can also optimize if we ask several times the same block (not sure if we have this case but maybe in case of a retry)

@non-fungible-nelson non-fungible-nelson added the TeamChupa GH issues worked on by Chupacabara Team label Jul 11, 2023
@macfarla
Copy link
Contributor

@ahamlat is this PR one you're going to come back to or is the outcome to go with a different approach?

@ahamlat
Copy link
Contributor Author

ahamlat commented Sep 21, 2023

@macfarla We didn't decided yet. @fab-10 and @matkt are in favor to ship this PR as it is ?

@non-fungible-nelson
Copy link
Contributor

@ahamlat @fab-10 @matkt - can we agree on an implementation? We could ship this as-is? The improvements are already substantial. Or in favor of a small rework?

@garyschulte
Copy link
Contributor

Can we circle back to this PR and determine whether this is the approach we want to take?

@fab-10
Copy link
Contributor

fab-10 commented Jan 17, 2024

I think a cache based approach, like the one done for #6375 is better

@ahamlat
Copy link
Contributor Author

ahamlat commented Jan 17, 2024

I agree with @fab-10, let me think about how to include better caching mecanism

@siladu
Copy link
Contributor

siladu commented Mar 1, 2024

@ahamlat is this still relevant/in progress?

@ahamlat
Copy link
Contributor Author

ahamlat commented Sep 10, 2024

Closing this PR, I started another one with a better implementation #7595

@ahamlat ahamlat closed this Sep 10, 2024
@ahamlat ahamlat mentioned this pull request Sep 10, 2024
8 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
TeamChupa GH issues worked on by Chupacabara Team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants