Add RootCache performance test #4374

Merged
merged 2 commits into from
Aug 7, 2022

@dapplion dapplion commented Aug 6, 2022

Motivation

RootCache may become unnecessary after merging SSZ v2.

Description

  • Add performance test for RootCache
  • Move RootCache out of registerAttestationInBlock

Locally I get:

  RootCache.getBlockRootAtSlot
    ✔ RootCache.getBlockRootAtSlot - 250000 vs - 7PWei                 1.164144e+8 ops/s    8.590000 ns/op        -     573120 runs   7.15 s
    ✔ state getBlockRootAtSlot - 250000 vs - 7PWei                        974725.4 ops/s    1.025930 us/op        -       7795 runs   1.47 s

RootCache is faster, and always will be, since it's just a Map.get. But the state lookup is reasonably fast too: 1 us * 128 attestations per block is about 0.13 ms of processing time per block that would be added if we remove the RootCache. @tuyennhv what do you think?
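
The estimate above can be reproduced with a few lines of arithmetic. This is a minimal sketch that plugs in the two local benchmark figures quoted in this comment and assumes exactly one root lookup per attestation; the function and constant names are illustrative and not part of the codebase.

```typescript
// Figures taken from the local benchmark run quoted above.
const ROOT_CACHE_NS_PER_OP = 8.59; // RootCache.getBlockRootAtSlot
const STATE_NS_PER_OP = 1025.93; // state getBlockRootAtSlot (1.025930 us/op)
const ATTESTATIONS_PER_BLOCK = 128; // count used in the comment above

/**
 * Extra time per block, in milliseconds, if every lookup goes through the
 * state instead of the cache. Assumes one lookup per attestation.
 */
function extraMsPerBlock(stateNs: number, cacheNs: number, lookups: number): number {
  const extraNs = (stateNs - cacheNs) * lookups;
  return extraNs / 1e6; // ns -> ms
}

const extraMs = extraMsPerBlock(STATE_NS_PER_OP, ROOT_CACHE_NS_PER_OP, ATTESTATIONS_PER_BLOCK);
console.log(extraMs.toFixed(2) + " ms per block"); // prints "0.13 ms per block"
```

Subtracting the cache cost barely changes the result, which is why the rounded 1 us * 128 figure is a fair shorthand.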

@dapplion dapplion requested a review from a team as a code owner August 6, 2022 09:22
@dapplion dapplion mentioned this pull request Aug 6, 2022
2 tasks
@dapplion dapplion changed the title Dapplion/root cache perf Add RootCache performance test Aug 6, 2022
@dapplion dapplion mentioned this pull request Aug 6, 2022

github-actions bot commented Aug 6, 2022

Performance Report

✔️ no performance regression detected

Full benchmark results
Columns: Benchmark suite, Current (f662292), Previous (1da9558), Ratio (current / previous)
getPubkeys - index2pubkey - req 1000 vs - 250000 vc 2.1974 ms/op 2.2951 ms/op 0.96
getPubkeys - validatorsArr - req 1000 vs - 250000 vc 78.532 us/op 80.114 us/op 0.98
BLS verify - blst-native 1.8618 ms/op 1.8636 ms/op 1.00
BLS verifyMultipleSignatures 3 - blst-native 3.8032 ms/op 3.8016 ms/op 1.00
BLS verifyMultipleSignatures 8 - blst-native 8.1883 ms/op 8.1873 ms/op 1.00
BLS verifyMultipleSignatures 32 - blst-native 29.694 ms/op 29.688 ms/op 1.00
BLS aggregatePubkeys 32 - blst-native 39.068 us/op 39.703 us/op 0.98
BLS aggregatePubkeys 128 - blst-native 152.68 us/op 152.82 us/op 1.00
getAttestationsForBlock 173.83 ms/op 170.15 ms/op 1.02
isKnown best case - 1 super set check 429.00 ns/op 446.00 ns/op 0.96
isKnown normal case - 2 super set checks 419.00 ns/op 426.00 ns/op 0.98
isKnown worse case - 16 super set checks 419.00 ns/op 424.00 ns/op 0.99
CheckpointStateCache - add get delete 9.3850 us/op 9.3240 us/op 1.01
validate gossip signedAggregateAndProof - struct 4.2650 ms/op 4.2895 ms/op 0.99
validate gossip attestation - struct 2.0304 ms/op 2.0394 ms/op 1.00
altair verifyImport mainnet_s3766816:31 8.8206 s/op 8.4311 s/op 1.05
pickEth1Vote - no votes 2.1927 ms/op 2.3136 ms/op 0.95
pickEth1Vote - max votes 24.355 ms/op 27.514 ms/op 0.89
pickEth1Vote - Eth1Data hashTreeRoot value x2048 11.880 ms/op 12.684 ms/op 0.94
pickEth1Vote - Eth1Data hashTreeRoot tree x2048 21.847 ms/op 24.352 ms/op 0.90
pickEth1Vote - Eth1Data fastSerialize value x2048 1.6294 ms/op 1.5249 ms/op 1.07
pickEth1Vote - Eth1Data fastSerialize tree x2048 17.542 ms/op 13.500 ms/op 1.30
bytes32 toHexString 1.1400 us/op 1.0830 us/op 1.05
bytes32 Buffer.toString(hex) 772.00 ns/op 707.00 ns/op 1.09
bytes32 Buffer.toString(hex) from Uint8Array 1.0160 us/op 921.00 ns/op 1.10
bytes32 Buffer.toString(hex) + 0x 794.00 ns/op 684.00 ns/op 1.16
Object access 1 prop 0.42000 ns/op 0.34800 ns/op 1.21
Map access 1 prop 0.30600 ns/op 0.28500 ns/op 1.07
Object get x1000 17.981 ns/op 17.434 ns/op 1.03
Map get x1000 1.2460 ns/op 0.99600 ns/op 1.25
Object set x1000 139.55 ns/op 111.64 ns/op 1.25
Map set x1000 87.051 ns/op 69.399 ns/op 1.25
Return object 10000 times 0.37870 ns/op 0.36620 ns/op 1.03
Throw Error 10000 times 5.9277 us/op 5.8734 us/op 1.01
enrSubnets - fastDeserialize 64 bits 3.3500 us/op 2.6350 us/op 1.27
enrSubnets - ssz BitVector 64 bits 818.00 ns/op 817.00 ns/op 1.00
enrSubnets - fastDeserialize 4 bits 457.00 ns/op 365.00 ns/op 1.25
enrSubnets - ssz BitVector 4 bits 813.00 ns/op 761.00 ns/op 1.07
prioritizePeers score -10:0 att 32-0.1 sync 2-0 107.31 us/op 88.325 us/op 1.21
prioritizePeers score 0:0 att 32-0.25 sync 2-0.25 124.10 us/op 124.55 us/op 1.00
prioritizePeers score 0:0 att 32-0.5 sync 2-0.5 228.55 us/op 196.90 us/op 1.16
prioritizePeers score 0:0 att 64-0.75 sync 4-0.75 509.90 us/op 445.97 us/op 1.14
prioritizePeers score 0:0 att 64-1 sync 4-1 468.26 us/op 459.14 us/op 1.02
RateTracker 1000000 limit, 1 obj count per request 203.65 ns/op 183.27 ns/op 1.11
RateTracker 1000000 limit, 2 obj count per request 157.27 ns/op 137.28 ns/op 1.15
RateTracker 1000000 limit, 4 obj count per request 131.78 ns/op 121.38 ns/op 1.09
RateTracker 1000000 limit, 8 obj count per request 122.91 ns/op 110.36 ns/op 1.11
RateTracker with prune 5.5820 us/op 4.8720 us/op 1.15
array of 16000 items push then shift 3.2184 us/op 3.0885 us/op 1.04
LinkedList of 16000 items push then shift 29.856 ns/op 26.854 ns/op 1.11
array of 16000 items push then pop 272.39 ns/op 237.82 ns/op 1.15
LinkedList of 16000 items push then pop 23.598 ns/op 22.578 ns/op 1.05
array of 24000 items push then shift 4.5784 us/op 4.5870 us/op 1.00
LinkedList of 24000 items push then shift 31.105 ns/op 29.432 ns/op 1.06
array of 24000 items push then pop 210.99 ns/op 212.78 ns/op 0.99
LinkedList of 24000 items push then pop 24.247 ns/op 22.399 ns/op 1.08
intersect bitArray bitLen 8 11.585 ns/op 11.585 ns/op 1.00
intersect array and set length 8 180.21 ns/op 171.76 ns/op 1.05
intersect bitArray bitLen 128 72.104 ns/op 72.146 ns/op 1.00
intersect array and set length 128 2.5157 us/op 2.2754 us/op 1.11
Buffer.concat 32 items 1.9870 ns/op 1.9040 ns/op 1.04
pass gossip attestations to forkchoice per slot 3.6539 ms/op 5.0610 ms/op 0.72
computeDeltas 3.8987 ms/op 3.1833 ms/op 1.22
computeProposerBoostScoreFromBalances 937.51 us/op 908.21 us/op 1.03
altair processAttestation - 250000 vs - 7PWei normalcase 4.3781 ms/op 4.3258 ms/op 1.01
altair processAttestation - 250000 vs - 7PWei worstcase 6.5087 ms/op 5.8048 ms/op 1.12
altair processAttestation - setStatus - 1/6 committees join 209.40 us/op 211.21 us/op 0.99
altair processAttestation - setStatus - 1/3 committees join 400.00 us/op 398.12 us/op 1.00
altair processAttestation - setStatus - 1/2 committees join 554.33 us/op 563.49 us/op 0.98
altair processAttestation - setStatus - 2/3 committees join 713.02 us/op 717.32 us/op 0.99
altair processAttestation - setStatus - 4/5 committees join 998.65 us/op 1.0053 ms/op 0.99
altair processAttestation - setStatus - 100% committees join 1.1733 ms/op 1.2062 ms/op 0.97
altair processBlock - 250000 vs - 7PWei normalcase 29.897 ms/op 28.118 ms/op 1.06
altair processBlock - 250000 vs - 7PWei normalcase hashState 34.222 ms/op 39.995 ms/op 0.86
altair processBlock - 250000 vs - 7PWei worstcase 82.497 ms/op 89.033 ms/op 0.93
altair processBlock - 250000 vs - 7PWei worstcase hashState 101.79 ms/op 98.889 ms/op 1.03
phase0 processBlock - 250000 vs - 7PWei normalcase 4.7392 ms/op 4.8621 ms/op 0.97
phase0 processBlock - 250000 vs - 7PWei worstcase 53.628 ms/op 48.357 ms/op 1.11
altair processEth1Data - 250000 vs - 7PWei normalcase 839.50 us/op 919.12 us/op 0.91
Tree 40 250000 create 898.72 ms/op 845.11 ms/op 1.06
Tree 40 250000 get(125000) 294.29 ns/op 296.17 ns/op 0.99
Tree 40 250000 set(125000) 2.5097 us/op 2.8274 us/op 0.89
Tree 40 250000 toArray() 33.292 ms/op 33.305 ms/op 1.00
Tree 40 250000 iterate all - toArray() + loop 34.045 ms/op 33.561 ms/op 1.01
Tree 40 250000 iterate all - get(i) 112.35 ms/op 114.00 ms/op 0.99
MutableVector 250000 create 16.172 ms/op 14.100 ms/op 1.15
MutableVector 250000 get(125000) 14.779 ns/op 14.748 ns/op 1.00
MutableVector 250000 set(125000) 683.70 ns/op 707.41 ns/op 0.97
MutableVector 250000 toArray() 7.6551 ms/op 8.0500 ms/op 0.95
MutableVector 250000 iterate all - toArray() + loop 7.8623 ms/op 8.1962 ms/op 0.96
MutableVector 250000 iterate all - get(i) 3.2831 ms/op 3.5696 ms/op 0.92
Array 250000 create 7.1863 ms/op 7.4622 ms/op 0.96
Array 250000 clone - spread 3.8425 ms/op 3.9085 ms/op 0.98
Array 250000 get(125000) 1.4860 ns/op 1.5990 ns/op 0.93
Array 250000 set(125000) 1.4910 ns/op 1.6570 ns/op 0.90
Array 250000 iterate all - loop 167.93 us/op 167.91 us/op 1.00
effectiveBalanceIncrements clone Uint8Array 300000 95.341 us/op 100.77 us/op 0.95
effectiveBalanceIncrements clone MutableVector 300000 1.1240 us/op 1.1940 us/op 0.94
effectiveBalanceIncrements rw all Uint8Array 300000 255.14 us/op 252.60 us/op 1.01
effectiveBalanceIncrements rw all MutableVector 300000 225.58 ms/op 226.73 ms/op 0.99
phase0 afterProcessEpoch - 250000 vs - 7PWei 188.98 ms/op 202.40 ms/op 0.93
phase0 beforeProcessEpoch - 250000 vs - 7PWei 109.07 ms/op 70.840 ms/op 1.54
altair processEpoch - mainnet_e81889 598.37 ms/op 584.39 ms/op 1.02
mainnet_e81889 - altair beforeProcessEpoch 156.05 ms/op 149.58 ms/op 1.04
mainnet_e81889 - altair processJustificationAndFinalization 23.875 us/op 22.657 us/op 1.05
mainnet_e81889 - altair processInactivityUpdates 12.179 ms/op 11.106 ms/op 1.10
mainnet_e81889 - altair processRewardsAndPenalties 97.457 ms/op 93.442 ms/op 1.04
mainnet_e81889 - altair processRegistryUpdates 5.1640 us/op 5.4360 us/op 0.95
mainnet_e81889 - altair processSlashings 917.00 ns/op 1.3810 us/op 0.66
mainnet_e81889 - altair processEth1DataReset 1.3990 us/op 1.3070 us/op 1.07
mainnet_e81889 - altair processEffectiveBalanceUpdates 2.4542 ms/op 2.5663 ms/op 0.96
mainnet_e81889 - altair processSlashingsReset 7.6610 us/op 9.0140 us/op 0.85
mainnet_e81889 - altair processRandaoMixesReset 8.3070 us/op 6.6750 us/op 1.24
mainnet_e81889 - altair processHistoricalRootsUpdate 1.5310 us/op 953.00 ns/op 1.61
mainnet_e81889 - altair processParticipationFlagUpdates 4.1140 us/op 3.4230 us/op 1.20
mainnet_e81889 - altair processSyncCommitteeUpdates 1.3610 us/op 783.00 ns/op 1.74
mainnet_e81889 - altair afterProcessEpoch 197.17 ms/op 193.25 ms/op 1.02
phase0 processEpoch - mainnet_e58758 555.36 ms/op 512.42 ms/op 1.08
mainnet_e58758 - phase0 beforeProcessEpoch 246.21 ms/op 241.86 ms/op 1.02
mainnet_e58758 - phase0 processJustificationAndFinalization 23.641 us/op 28.706 us/op 0.82
mainnet_e58758 - phase0 processRewardsAndPenalties 123.02 ms/op 81.406 ms/op 1.51
mainnet_e58758 - phase0 processRegistryUpdates 10.948 us/op 10.180 us/op 1.08
mainnet_e58758 - phase0 processSlashings 1.0180 us/op 862.00 ns/op 1.18
mainnet_e58758 - phase0 processEth1DataReset 1.0450 us/op 857.00 ns/op 1.22
mainnet_e58758 - phase0 processEffectiveBalanceUpdates 2.1824 ms/op 2.0069 ms/op 1.09
mainnet_e58758 - phase0 processSlashingsReset 6.0550 us/op 5.6360 us/op 1.07
mainnet_e58758 - phase0 processRandaoMixesReset 8.5930 us/op 6.1060 us/op 1.41
mainnet_e58758 - phase0 processHistoricalRootsUpdate 804.00 ns/op 987.00 ns/op 0.81
mainnet_e58758 - phase0 processParticipationRecordUpdates 4.5970 us/op 4.9700 us/op 0.92
mainnet_e58758 - phase0 afterProcessEpoch 161.17 ms/op 160.25 ms/op 1.01
phase0 processEffectiveBalanceUpdates - 250000 normalcase 2.6299 ms/op 2.6362 ms/op 1.00
phase0 processEffectiveBalanceUpdates - 250000 worstcase 0.5 2.9759 ms/op 3.5447 ms/op 0.84
altair processInactivityUpdates - 250000 normalcase 41.288 ms/op 41.163 ms/op 1.00
altair processInactivityUpdates - 250000 worstcase 50.878 ms/op 41.196 ms/op 1.24
phase0 processRegistryUpdates - 250000 normalcase 10.157 us/op 9.2670 us/op 1.10
phase0 processRegistryUpdates - 250000 badcase_full_deposits 465.91 us/op 432.13 us/op 1.08
phase0 processRegistryUpdates - 250000 worstcase 0.5 227.53 ms/op 215.29 ms/op 1.06
altair processRewardsAndPenalties - 250000 normalcase 141.91 ms/op 130.96 ms/op 1.08
altair processRewardsAndPenalties - 250000 worstcase 132.86 ms/op 88.424 ms/op 1.50
phase0 getAttestationDeltas - 250000 normalcase 13.407 ms/op 15.113 ms/op 0.89
phase0 getAttestationDeltas - 250000 worstcase 13.532 ms/op 15.780 ms/op 0.86
phase0 processSlashings - 250000 worstcase 5.6048 ms/op 5.5991 ms/op 1.00
altair processSyncCommitteeUpdates - 250000 291.61 ms/op 293.34 ms/op 0.99
BeaconState.hashTreeRoot - No change 491.00 ns/op 481.00 ns/op 1.02
BeaconState.hashTreeRoot - 1 full validator 62.269 us/op 64.101 us/op 0.97
BeaconState.hashTreeRoot - 32 full validator 629.11 us/op 719.20 us/op 0.87
BeaconState.hashTreeRoot - 512 full validator 7.0099 ms/op 6.3749 ms/op 1.10
BeaconState.hashTreeRoot - 1 validator.effectiveBalance 81.507 us/op 77.565 us/op 1.05
BeaconState.hashTreeRoot - 32 validator.effectiveBalance 1.2049 ms/op 1.2026 ms/op 1.00
BeaconState.hashTreeRoot - 512 validator.effectiveBalance 15.797 ms/op 15.984 ms/op 0.99
BeaconState.hashTreeRoot - 1 balances 63.316 us/op 60.557 us/op 1.05
BeaconState.hashTreeRoot - 32 balances 664.47 us/op 575.06 us/op 1.16
BeaconState.hashTreeRoot - 512 balances 6.0324 ms/op 5.6359 ms/op 1.07
BeaconState.hashTreeRoot - 250000 balances 94.848 ms/op 94.626 ms/op 1.00
aggregationBits - 2048 els - zipIndexesInBitList 36.116 us/op 29.156 us/op 1.24
regular array get 100000 times 67.412 us/op 67.391 us/op 1.00
wrappedArray get 100000 times 67.448 us/op 67.416 us/op 1.00
arrayWithProxy get 100000 times 28.697 ms/op 34.242 ms/op 0.84
ssz.Root.equals 534.00 ns/op 479.00 ns/op 1.11
byteArrayEquals 523.00 ns/op 458.00 ns/op 1.14
shuffle list - 16384 els 11.095 ms/op 11.001 ms/op 1.01
shuffle list - 250000 els 167.05 ms/op 163.07 ms/op 1.02
processSlot - 1 slots 13.432 us/op 11.761 us/op 1.14
processSlot - 32 slots 1.8378 ms/op 1.7814 ms/op 1.03
getEffectiveBalanceIncrementsZeroInactive - 250000 vs - 7PWei 413.20 us/op 638.42 us/op 0.65
getCommitteeAssignments - req 1 vs - 250000 vc 5.2813 ms/op 5.2805 ms/op 1.00
getCommitteeAssignments - req 100 vs - 250000 vc 7.3278 ms/op 7.3645 ms/op 1.00
getCommitteeAssignments - req 1000 vs - 250000 vc 7.8127 ms/op 7.9839 ms/op 0.98
RootCache.getBlockRootAtSlot - 250000 vs - 7PWei 9.5300 ns/op
state getBlockRootAtSlot - 250000 vs - 7PWei 1.1993 us/op
computeProposers - vc 250000 17.289 ms/op 19.378 ms/op 0.89
computeEpochShuffling - vc 250000 170.44 ms/op 164.83 ms/op 1.03
getNextSyncCommittee - vc 250000 286.06 ms/op 284.97 ms/op 1.00

by benchmarkbot/action

@wemeetagain wemeetagain merged commit a601e81 into unstable Aug 7, 2022
@wemeetagain wemeetagain deleted the dapplion/root-cache-perf branch August 7, 2022 14:46

twoeths commented Aug 9, 2022

> RootCache is faster, and always will be, since it's just a Map.get. But the state lookup is reasonably fast too: 1 us * 128 attestations per block is about 0.13 ms of processing time per block that would be added if we remove the RootCache. @tuyennhv what do you think?

@dapplion I think ssz-v2 should have the same mechanism to cache loaded nodes. Do you know why it takes more time than RootCache? I'd drop it only when the performance of the two is comparable.

dapplion commented Aug 9, 2022

> I think ssz-v2 should have the same mechanism to cache loaded nodes. Do you know why it takes more time than RootCache?

To get a root each impl has to:

  • ssz v1: traverse the state tree, traverse blockRoots tree, deserialize node to root
  • ssz v2: use cache to jump directly to root node, deserialize node to root
  • RootCache: single Map.get()

RootCache will always be faster, since it doesn't have to deserialize anything. I don't think it's ever possible for the ssz impl to be as fast as the RootCache. The question is whether the optimization justifies having a RootCache there.
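
To make the comparison concrete, here is a minimal sketch of a Map-backed cache in front of a slower lookup. It is not the actual RootCache implementation: `RootCacheSketch`, `StateRootLookup`, and `fakeStateLookup` are hypothetical names, and the fake lookup only stands in for the tree traversal and deserialization described in the list above.

```typescript
type Slot = number;
type Root = Uint8Array;

/** Hypothetical stand-in for the slower state lookup (traversal + deserialization). */
type StateRootLookup = (slot: Slot) => Root;

class RootCacheSketch {
  private readonly blockRoots = new Map<Slot, Root>();
  private readonly lookupFromState: StateRootLookup;

  constructor(lookupFromState: StateRootLookup) {
    this.lookupFromState = lookupFromState;
  }

  /** First call per slot pays for the state lookup; later calls are a single Map.get. */
  getBlockRootAtSlot(slot: Slot): Root {
    let root = this.blockRoots.get(slot);
    if (root === undefined) {
      root = this.lookupFromState(slot);
      this.blockRoots.set(slot, root);
    }
    return root;
  }
}

// Usage: count how often the slow path is hit.
let stateLookups = 0;
const fakeStateLookup: StateRootLookup = (slot) => {
  stateLookups++;
  return new Uint8Array(32).fill(slot % 256); // dummy 32-byte root
};

const rootCache = new RootCacheSketch(fakeStateLookup);
const firstRoot = rootCache.getBlockRootAtSlot(5);
const secondRoot = rootCache.getBlockRootAtSlot(5);
console.log(stateLookups, firstRoot === secondRoot); // prints "1 true"
```

The second call returns the same byte array without touching the state, which is the saving under discussion.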

twoeths commented Aug 10, 2022

> > I think ssz-v2 should have the same mechanism to cache loaded nodes. Do you know why it takes more time than RootCache?
>
> To get a root each impl has to:
>
>   • ssz v1: traverse the state tree, traverse blockRoots tree, deserialize node to root
>   • ssz v2: use cache to jump directly to root node, deserialize node to root
>   • RootCache: single Map.get()
>
> RootCache will always be faster, since it doesn't have to deserialize anything. I don't think it's ever possible for the ssz impl to be as fast as the RootCache. The question is whether the optimization justifies having a RootCache there.

I see. Tbh, I still don't feel comfortable with any change that's slower 😃

  • RootCache is also used for the validatorMonitor metric
  • This is all about calling hashObjectToByteArray() 128 times (256 with validatorMonitor) for the same data in the same flow vs calling it once and caching it. I'd go with the second option.
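
The trade-off in the last bullet can be sketched by counting conversions with and without a cache. This is an illustration under stated assumptions: `convertNodeToRoot` is a hypothetical stand-in for the per-lookup conversion (it does not use the real hashObjectToByteArray signature), and all 128 attestations are assumed to reference the same slot.

```typescript
type SlotNumber = number;

let conversions = 0;

/** Hypothetical stand-in for the node-to-bytes conversion done on each uncached lookup. */
function convertNodeToRoot(slot: SlotNumber): Uint8Array {
  conversions++;
  return new Uint8Array(32).fill(slot % 256); // dummy 32-byte root
}

/** Returns how many conversions are needed to resolve a root for every attestation slot. */
function countConversions(attestationSlots: SlotNumber[], useCache: boolean): number {
  conversions = 0;
  const rootsBySlot = new Map<SlotNumber, Uint8Array>();
  for (const slot of attestationSlots) {
    if (!useCache) {
      convertNodeToRoot(slot); // converts again on every lookup
      continue;
    }
    if (!rootsBySlot.has(slot)) {
      rootsBySlot.set(slot, convertNodeToRoot(slot)); // converts once per distinct slot
    }
  }
  return conversions;
}

// Assumed case: 128 attestations that all reference the same slot.
const attestationSlots: SlotNumber[] = new Array<SlotNumber>(128).fill(100);
const withoutCache = countConversions(attestationSlots, false);
const withCache = countConversions(attestationSlots, true);
console.log(withoutCache, withCache); // prints "128 1"
```

With attestations spread over several slots, the cached count would equal the number of distinct slots rather than 1.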

3 participants