PrepareNextSlot scheduler #4209

Merged on Jul 6, 2022 (5 commits)

@twoeths twoeths commented Jun 28, 2022

Motivation

  • We want to prepare the next payload even when a slot is skipped, if one of our connected validators is the proposer for the next slot
  • The current PrecomputeEpochTransition does not work in Bellatrix because we always do an advanced state transition in importBlock()

Description

  • Rewrite the PrepareNextSlot scheduler to run 4s before the next slot
    • For all forks, we precompute the epoch transition 4s before the next epoch (the same as what we have implemented so far)
    • For Bellatrix, always do the advanced state transition in order to prepare the next payload if one of the connected validators is the block proposer
      • Max 1 epoch transition; this usually happens at the end of an epoch
      • The 4s duration was double-checked against Lighthouse; I think this should be enough to prepare the next payload
      • Technically we can get the proposer list if prepareEpoch === headEpoch, but processSlot() is cheap enough and we need the next state to prepare the payload anyway. If prepareEpoch === headEpoch + 1, we need to do the epoch transition first to know the proposers
  • Remove the processSlot() call in importBlock()
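
The decision described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `decidePrepareNextSlot`, `PrepareDecision`, and the `isConnectedProposer` callback are hypothetical names, and reading "max 1 epoch transition" as "skip when the head is more than one epoch behind" is an assumption.

```typescript
// Illustrative sketch only: every name here is hypothetical, not the Lodestar API.
const SLOTS_PER_EPOCH = 32; // mainnet preset value

type ForkName = "phase0" | "altair" | "bellatrix";

interface PrepareDecision {
  /** Advance the head state to prepareSlot ahead of time. */
  runStateTransition: boolean;
  /** Start building an execution payload for prepareSlot. */
  preparePayload: boolean;
}

function computeEpochAtSlot(slot: number): number {
  return Math.floor(slot / SLOTS_PER_EPOCH);
}

function decidePrepareNextSlot(
  headSlot: number,
  clockSlot: number,
  fork: ForkName,
  // Assumed to be answered from the advanced state for prepareSlot.
  isConnectedProposer: (slot: number) => boolean
): PrepareDecision {
  const prepareSlot = clockSlot + 1;
  const epochsToCross = computeEpochAtSlot(prepareSlot) - computeEpochAtSlot(headSlot);

  // Assumption: "max 1 epoch transition" means we skip the work when the head is further behind.
  if (epochsToCross > 1) {
    return {runStateTransition: false, preparePayload: false};
  }

  const isFirstSlotOfEpoch = prepareSlot % SLOTS_PER_EPOCH === 0;
  // All forks precompute the epoch transition; Bellatrix always advances the state.
  const runStateTransition = fork === "bellatrix" || isFirstSlotOfEpoch;
  // Only Bellatrix has an execution payload to prepare.
  const preparePayload = fork === "bellatrix" && isConnectedProposer(prepareSlot);

  return {runStateTransition, preparePayload};
}
```

For example, under these assumptions `decidePrepareNextSlot(63, 63, "altair", () => false)` asks for the state transition (slot 64 opens a new epoch) but no payload, while any Bellatrix slot with a connected proposer asks for both.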

[Screenshot: Screen Shot 2022-06-28 at 09 54 40]

Closes #4054

github-actions bot commented Jun 28, 2022

Performance Report

✔️ no performance regression detected

Full benchmark results
Columns: benchmark suite, current (ce307a9), previous (0b4c4bf), ratio (current / previous)
altair processAttestation - 250000 vs - 7PWei normalcase 4.1622 ms/op 3.3217 ms/op 1.25
altair processAttestation - 250000 vs - 7PWei worstcase 7.1367 ms/op 5.3437 ms/op 1.34
altair processAttestation - setStatus - 1/6 committees join 259.95 us/op 197.99 us/op 1.31
altair processAttestation - setStatus - 1/3 committees join 454.94 us/op 380.83 us/op 1.19
altair processAttestation - setStatus - 1/2 committees join 628.73 us/op 534.70 us/op 1.18
altair processAttestation - setStatus - 2/3 committees join 829.47 us/op 688.04 us/op 1.21
altair processAttestation - setStatus - 4/5 committees join 1.1075 ms/op 946.89 us/op 1.17
altair processAttestation - setStatus - 100% committees join 1.3015 ms/op 1.1213 ms/op 1.16
altair processBlock - 250000 vs - 7PWei normalcase 29.299 ms/op 26.953 ms/op 1.09
altair processBlock - 250000 vs - 7PWei normalcase hashState 36.112 ms/op 33.085 ms/op 1.09
altair processBlock - 250000 vs - 7PWei worstcase 89.154 ms/op 79.224 ms/op 1.13
altair processBlock - 250000 vs - 7PWei worstcase hashState 113.63 ms/op 94.359 ms/op 1.20
phase0 processBlock - 250000 vs - 7PWei normalcase 4.0771 ms/op 3.6561 ms/op 1.12
phase0 processBlock - 250000 vs - 7PWei worstcase 53.442 ms/op 41.118 ms/op 1.30
altair processEth1Data - 250000 vs - 7PWei normalcase 759.84 us/op 719.71 us/op 1.06
Tree 40 250000 create 817.24 ms/op 740.85 ms/op 1.10
Tree 40 250000 get(125000) 336.46 ns/op 255.59 ns/op 1.32
Tree 40 250000 set(125000) 2.8372 us/op 2.1483 us/op 1.32
Tree 40 250000 toArray() 36.866 ms/op 29.222 ms/op 1.26
Tree 40 250000 iterate all - toArray() + loop 37.012 ms/op 28.508 ms/op 1.30
Tree 40 250000 iterate all - get(i) 130.35 ms/op 97.567 ms/op 1.34
MutableVector 250000 create 19.166 ms/op 13.906 ms/op 1.38
MutableVector 250000 get(125000) 15.916 ns/op 13.402 ns/op 1.19
MutableVector 250000 set(125000) 700.25 ns/op 534.30 ns/op 1.31
MutableVector 250000 toArray() 8.4071 ms/op 6.2177 ms/op 1.35
MutableVector 250000 iterate all - toArray() + loop 8.3957 ms/op 6.9772 ms/op 1.20
MutableVector 250000 iterate all - get(i) 3.9334 ms/op 3.1084 ms/op 1.27
Array 250000 create 6.9711 ms/op 5.4166 ms/op 1.29
Array 250000 clone - spread 4.5169 ms/op 2.4035 ms/op 1.88
Array 250000 get(125000) 1.8720 ns/op 1.0670 ns/op 1.75
Array 250000 set(125000) 1.9180 ns/op 1.0400 ns/op 1.84
Array 250000 iterate all - loop 203.46 us/op 148.09 us/op 1.37
effectiveBalanceIncrements clone Uint8Array 300000 134.20 us/op 92.091 us/op 1.46
effectiveBalanceIncrements clone MutableVector 300000 789.00 ns/op 617.00 ns/op 1.28
effectiveBalanceIncrements rw all Uint8Array 300000 298.61 us/op 222.75 us/op 1.34
effectiveBalanceIncrements rw all MutableVector 300000 178.63 ms/op 189.16 ms/op 0.94
phase0 afterProcessEpoch - 250000 vs - 7PWei 222.31 ms/op 173.83 ms/op 1.28
phase0 beforeProcessEpoch - 250000 vs - 7PWei 76.652 ms/op 94.236 ms/op 0.81
altair processEpoch - mainnet_e81889 641.99 ms/op 559.83 ms/op 1.15
mainnet_e81889 - altair beforeProcessEpoch 146.40 ms/op 137.59 ms/op 1.06
mainnet_e81889 - altair processJustificationAndFinalization 31.856 us/op 23.015 us/op 1.38
mainnet_e81889 - altair processInactivityUpdates 13.562 ms/op 11.272 ms/op 1.20
mainnet_e81889 - altair processRewardsAndPenalties 151.11 ms/op 129.02 ms/op 1.17
mainnet_e81889 - altair processRegistryUpdates 7.3280 us/op 4.6560 us/op 1.57
mainnet_e81889 - altair processSlashings 2.0450 us/op 888.00 ns/op 2.30
mainnet_e81889 - altair processEth1DataReset 2.0740 us/op 862.00 ns/op 2.41
mainnet_e81889 - altair processEffectiveBalanceUpdates 3.1586 ms/op 2.2376 ms/op 1.41
mainnet_e81889 - altair processSlashingsReset 13.066 us/op 8.2660 us/op 1.58
mainnet_e81889 - altair processRandaoMixesReset 15.625 us/op 8.2710 us/op 1.89
mainnet_e81889 - altair processHistoricalRootsUpdate 3.1400 us/op 1.1990 us/op 2.62
mainnet_e81889 - altair processParticipationFlagUpdates 6.2510 us/op 4.7880 us/op 1.31
mainnet_e81889 - altair processSyncCommitteeUpdates 1.4600 us/op 1.1340 us/op 1.29
mainnet_e81889 - altair afterProcessEpoch 242.23 ms/op 185.03 ms/op 1.31
phase0 processEpoch - mainnet_e58758 660.15 ms/op 486.99 ms/op 1.36
mainnet_e58758 - phase0 beforeProcessEpoch 281.79 ms/op 221.26 ms/op 1.27
mainnet_e58758 - phase0 processJustificationAndFinalization 32.120 us/op 22.017 us/op 1.46
mainnet_e58758 - phase0 processRewardsAndPenalties 86.219 ms/op 112.29 ms/op 0.77
mainnet_e58758 - phase0 processRegistryUpdates 17.497 us/op 12.550 us/op 1.39
mainnet_e58758 - phase0 processSlashings 1.6740 us/op 1.0640 us/op 1.57
mainnet_e58758 - phase0 processEth1DataReset 1.7570 us/op 1.2230 us/op 1.44
mainnet_e58758 - phase0 processEffectiveBalanceUpdates 2.4774 ms/op 1.9063 ms/op 1.30
mainnet_e58758 - phase0 processSlashingsReset 10.475 us/op 6.0520 us/op 1.73
mainnet_e58758 - phase0 processRandaoMixesReset 12.798 us/op 8.5460 us/op 1.50
mainnet_e58758 - phase0 processHistoricalRootsUpdate 1.8850 us/op 1.2040 us/op 1.57
mainnet_e58758 - phase0 processParticipationRecordUpdates 10.890 us/op 7.0180 us/op 1.55
mainnet_e58758 - phase0 afterProcessEpoch 190.85 ms/op 151.07 ms/op 1.26
phase0 processEffectiveBalanceUpdates - 250000 normalcase 3.4620 ms/op 2.8409 ms/op 1.22
phase0 processEffectiveBalanceUpdates - 250000 worstcase 0.5 3.8028 ms/op 2.8143 ms/op 1.35
altair processInactivityUpdates - 250000 normalcase 48.085 ms/op 25.342 ms/op 1.90
altair processInactivityUpdates - 250000 worstcase 37.552 ms/op 30.729 ms/op 1.22
phase0 processRegistryUpdates - 250000 normalcase 15.507 us/op 10.753 us/op 1.44
phase0 processRegistryUpdates - 250000 badcase_full_deposits 482.83 us/op 354.82 us/op 1.36
phase0 processRegistryUpdates - 250000 worstcase 0.5 240.99 ms/op 197.54 ms/op 1.22
altair processRewardsAndPenalties - 250000 normalcase 150.68 ms/op 81.414 ms/op 1.85
altair processRewardsAndPenalties - 250000 worstcase 114.36 ms/op 112.97 ms/op 1.01
phase0 getAttestationDeltas - 250000 normalcase 16.371 ms/op 12.224 ms/op 1.34
phase0 getAttestationDeltas - 250000 worstcase 17.132 ms/op 12.888 ms/op 1.33
phase0 processSlashings - 250000 worstcase 6.5258 ms/op 4.6787 ms/op 1.39
altair processSyncCommitteeUpdates - 250000 350.36 ms/op 269.97 ms/op 1.30
BeaconState.hashTreeRoot - No change 737.00 ns/op 554.00 ns/op 1.33
BeaconState.hashTreeRoot - 1 full validator 80.296 us/op 59.076 us/op 1.36
BeaconState.hashTreeRoot - 32 full validator 755.10 us/op 599.98 us/op 1.26
BeaconState.hashTreeRoot - 512 full validator 7.3390 ms/op 6.2156 ms/op 1.18
BeaconState.hashTreeRoot - 1 validator.effectiveBalance 99.446 us/op 70.213 us/op 1.42
BeaconState.hashTreeRoot - 32 validator.effectiveBalance 1.7178 ms/op 1.2450 ms/op 1.38
BeaconState.hashTreeRoot - 512 validator.effectiveBalance 19.532 ms/op 14.596 ms/op 1.34
BeaconState.hashTreeRoot - 1 balances 72.761 us/op 52.685 us/op 1.38
BeaconState.hashTreeRoot - 32 balances 638.42 us/op 505.86 us/op 1.26
BeaconState.hashTreeRoot - 512 balances 6.1919 ms/op 4.8173 ms/op 1.29
BeaconState.hashTreeRoot - 250000 balances 106.85 ms/op 79.977 ms/op 1.34
aggregationBits - 2048 els - zipIndexesInBitList 41.251 us/op 33.326 us/op 1.24
regular array get 100000 times 80.446 us/op 59.476 us/op 1.35
wrappedArray get 100000 times 80.948 us/op 59.571 us/op 1.36
arrayWithProxy get 100000 times 35.777 ms/op 26.933 ms/op 1.33
ssz.Root.equals 610.00 ns/op 493.00 ns/op 1.24
byteArrayEquals 606.00 ns/op 492.00 ns/op 1.23
shuffle list - 16384 els 13.323 ms/op 10.721 ms/op 1.24
shuffle list - 250000 els 195.65 ms/op 143.48 ms/op 1.36
processSlot - 1 slots 16.073 us/op 10.686 us/op 1.50
processSlot - 32 slots 2.3214 ms/op 1.5259 ms/op 1.52
getEffectiveBalanceIncrementsZeroInactive - 250000 vs - 7PWei 503.37 us/op 379.89 us/op 1.33
getCommitteeAssignments - req 1 vs - 250000 vc 6.3766 ms/op 5.2758 ms/op 1.21
getCommitteeAssignments - req 100 vs - 250000 vc 8.7756 ms/op 7.2743 ms/op 1.21
getCommitteeAssignments - req 1000 vs - 250000 vc 9.4359 ms/op 7.7335 ms/op 1.22
computeProposers - vc 250000 22.393 ms/op 16.469 ms/op 1.36
computeEpochShuffling - vc 250000 197.36 ms/op 146.24 ms/op 1.35
getNextSyncCommittee - vc 250000 332.80 ms/op 269.57 ms/op 1.23
pass gossip attestations to forkchoice per slot 4.1912 ms/op 3.1378 ms/op 1.34
computeDeltas 3.7963 ms/op 3.0972 ms/op 1.23
computeProposerBoostScoreFromBalances 1.0869 ms/op 907.65 us/op 1.20
getPubkeys - index2pubkey - req 1000 vs - 250000 vc 2.4585 ms/op 2.1741 ms/op 1.13
getPubkeys - validatorsArr - req 1000 vs - 250000 vc 92.873 us/op 76.260 us/op 1.22
BLS verify - blst-native 2.2288 ms/op 1.8590 ms/op 1.20
BLS verifyMultipleSignatures 3 - blst-native 4.5223 ms/op 3.8042 ms/op 1.19
BLS verifyMultipleSignatures 8 - blst-native 9.8071 ms/op 8.1923 ms/op 1.20
BLS verifyMultipleSignatures 32 - blst-native 35.703 ms/op 29.686 ms/op 1.20
BLS aggregatePubkeys 32 - blst-native 47.336 us/op 39.640 us/op 1.19
BLS aggregatePubkeys 128 - blst-native 184.09 us/op 153.66 us/op 1.20
getAttestationsForBlock 80.743 ms/op 63.259 ms/op 1.28
isKnown best case - 1 super set check 515.00 ns/op 436.00 ns/op 1.18
isKnown normal case - 2 super set checks 504.00 ns/op 423.00 ns/op 1.19
isKnown worse case - 16 super set checks 505.00 ns/op 373.00 ns/op 1.35
CheckpointStateCache - add get delete 12.549 us/op 10.046 us/op 1.25
validate gossip signedAggregateAndProof - struct 5.1111 ms/op 3.7666 ms/op 1.36
validate gossip attestation - struct 2.4373 ms/op 1.7896 ms/op 1.36
altair verifyImport mainnet_s3766816:31 7.5474 s/op 6.0608 s/op 1.25
pickEth1Vote - no votes 2.3876 ms/op 2.1143 ms/op 1.13
pickEth1Vote - max votes 27.884 ms/op 26.085 ms/op 1.07
pickEth1Vote - Eth1Data hashTreeRoot value x2048 13.735 ms/op 12.406 ms/op 1.11
pickEth1Vote - Eth1Data hashTreeRoot tree x2048 24.605 ms/op 22.310 ms/op 1.10
pickEth1Vote - Eth1Data fastSerialize value x2048 1.8981 ms/op 1.6283 ms/op 1.17
pickEth1Vote - Eth1Data fastSerialize tree x2048 21.026 ms/op 21.140 ms/op 0.99
bytes32 toHexString 1.2400 us/op 1.1170 us/op 1.11
bytes32 Buffer.toString(hex) 800.00 ns/op 720.00 ns/op 1.11
bytes32 Buffer.toString(hex) from Uint8Array 1.0390 us/op 894.00 ns/op 1.16
bytes32 Buffer.toString(hex) + 0x 803.00 ns/op 725.00 ns/op 1.11
Object access 1 prop 0.42300 ns/op 0.37200 ns/op 1.14
Map access 1 prop 0.33500 ns/op 0.30500 ns/op 1.10
Object get x1000 20.393 ns/op 17.762 ns/op 1.15
Map get x1000 1.1560 ns/op 0.97200 ns/op 1.19
Object set x1000 139.15 ns/op 120.12 ns/op 1.16
Map set x1000 83.845 ns/op 74.532 ns/op 1.12
Return object 10000 times 0.43080 ns/op 0.37380 ns/op 1.15
Throw Error 10000 times 7.0673 us/op 5.9862 us/op 1.18
enrSubnets - fastDeserialize 64 bits 3.3050 us/op 2.7140 us/op 1.22
enrSubnets - ssz BitVector 64 bits 899.00 ns/op 752.00 ns/op 1.20
enrSubnets - fastDeserialize 4 bits 455.00 ns/op 437.00 ns/op 1.04
enrSubnets - ssz BitVector 4 bits 887.00 ns/op 765.00 ns/op 1.16
prioritizePeers score -10:0 att 32-0.1 sync 2-0 108.79 us/op 86.211 us/op 1.26
prioritizePeers score 0:0 att 32-0.25 sync 2-0.25 144.27 us/op 108.90 us/op 1.32
prioritizePeers score 0:0 att 32-0.5 sync 2-0.5 245.30 us/op 200.24 us/op 1.23
prioritizePeers score 0:0 att 64-0.75 sync 4-0.75 537.00 us/op 425.30 us/op 1.26
prioritizePeers score 0:0 att 64-1 sync 4-1 536.48 us/op 407.44 us/op 1.32
RateTracker 1000000 limit, 1 obj count per request 219.49 ns/op 172.69 ns/op 1.27
RateTracker 1000000 limit, 2 obj count per request 172.30 ns/op 130.65 ns/op 1.32
RateTracker 1000000 limit, 4 obj count per request 143.62 ns/op 111.87 ns/op 1.28
RateTracker 1000000 limit, 8 obj count per request 127.09 ns/op 99.344 ns/op 1.28
RateTracker with prune 5.7220 us/op 4.1350 us/op 1.38
array of 16000 items push then shift 3.7176 us/op 2.8808 us/op 1.29
LinkedList of 16000 items push then shift 28.076 ns/op 24.524 ns/op 1.14
array of 16000 items push then pop 298.62 ns/op 229.20 ns/op 1.30
LinkedList of 16000 items push then pop 23.109 ns/op 20.944 ns/op 1.10
array of 24000 items push then shift 5.3903 us/op 4.5775 us/op 1.18
LinkedList of 24000 items push then shift 29.520 ns/op 26.709 ns/op 1.11
array of 24000 items push then pop 258.00 ns/op 202.73 ns/op 1.27
LinkedList of 24000 items push then pop 23.258 ns/op 21.199 ns/op 1.10
intersect bitArray bitLen 8 13.780 ns/op 11.694 ns/op 1.18
intersect array and set length 8 202.46 ns/op 169.63 ns/op 1.19
intersect bitArray bitLen 128 83.783 ns/op 72.098 ns/op 1.16
intersect array and set length 128 2.5075 us/op 2.2847 us/op 1.10

by benchmarkbot/action

@twoeths twoeths marked this pull request as ready for review June 28, 2022 13:56
@twoeths twoeths requested a review from a team as a code owner June 28, 2022 13:56
// At 1/3 slot time before the next slot, we either prepare payload or precompute epoch transition
await sleep(slotMs - slotMs / SCHEDULER_LOOKAHEAD_FACTOR, this.signal);

const {slot: headSlot, blockRoot: headRoot} = this.chain.forkChoice.getHead();
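
The sleep duration in the excerpt above can be worked through with concrete numbers. This is a small sketch under stated assumptions: 12-second slots, `SCHEDULER_LOOKAHEAD_FACTOR = 3` (inferred from the "1/3 slot time" comment and the 4s figure in the description, not shown in the excerpt), and a sleep that starts at the beginning of the slot. `msUntilPrepare` is a hypothetical helper.

```typescript
// Assumed values for illustration; neither constant's value appears in the excerpt.
const SECONDS_PER_SLOT = 12; // mainnet slot duration
const SCHEDULER_LOOKAHEAD_FACTOR = 3; // inferred from the "1/3 slot time" comment

/** Milliseconds to wait from slot start until the scheduler should wake up. */
function msUntilPrepare(slotMs: number, lookaheadFactor: number): number {
  return slotMs - slotMs / lookaheadFactor;
}

const slotMs = SECONDS_PER_SLOT * 1000;
const delayMs = msUntilPrepare(slotMs, SCHEDULER_LOOKAHEAD_FACTOR);
const leadMs = slotMs - delayMs; // time left before the next slot begins

console.log(`sleep ${delayMs} ms, wake up ${leadMs} ms before the next slot`);
// prints "sleep 8000 ms, wake up 4000 ms before the next slot"
```

With these values the scheduler wakes 8s into the slot, which leaves the 4s window mentioned in the description.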
Should we try optimizing the head for the next slot, i.e. process the attestations of this clockSlot as well and call updateHead? It would greatly reduce the possibility of reorgs.


g11tech commented Jun 28, 2022

LGTM @tuyennhv, looks amazing! Just a small optimization point if we want to do it.

@g11tech g11tech mentioned this pull request Jun 28, 2022
22 tasks
wemeetagain previously approved these changes Jul 4, 2022
@wemeetagain wemeetagain left a comment

LGTM, file conflict tho

@twoeths twoeths force-pushed the tuyen/advanced-prepare-payload branch from 272fc60 to eadb139 Compare July 6, 2022 01:33
@wemeetagain wemeetagain merged commit 1b03e81 into unstable Jul 6, 2022
@wemeetagain wemeetagain deleted the tuyen/advanced-prepare-payload branch July 6, 2022 02:52
Linked issue: Optimize issuing advance engine fcU for prepare payload