Add circuit breaker for builder #4488

Merged: 5 commits, Aug 30, 2022

@g11tech g11tech commented Aug 27, 2022

Motivation
A malevolent or buggy builder can take down the entire network by not propagating blocks.

This PR adds a circuit breaker that disables the builder in bad network conditions.
Description
Closes #4483
Manual test chronology on ropsten-1:

  1. BN started with the builder initialized as disabled. A validator produceBlock call comes in: the builder flow errors because the builder is disabled, and the engine-produced block is used.
  • image
  2. After some time, the prepareNextSlot scheduler runs for an impending proposal. Health is evaluated, the builder status is checked via the API, the builder is enabled, and the next slot is proposed using the builder block.
  • image
  3. After this, the builder stays enabled and the validator keeps using the builder.
  • image
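
The chronology above can be sketched as a small state holder. This is a minimal illustration, not the code merged in this PR: the class name `BuilderCircuitBreaker`, its methods, and the two boolean inputs are all hypothetical stand-ins for the health evaluation and the builder status API call.

```typescript
// Hypothetical names throughout; this is not Lodestar's actual implementation.
type BlockSource = "builder" | "engine";

class BuilderCircuitBreaker {
  // The builder starts out disabled, as in step 1 of the chronology.
  private enabled = false;

  get isEnabled(): boolean {
    return this.enabled;
  }

  // Called ahead of an impending proposal (step 2): enable the builder only
  // when the chain looks healthy AND the builder reports itself as available.
  updateStatus(chainHealthy: boolean, builderAvailable: boolean): void {
    this.enabled = chainHealthy && builderAvailable;
  }

  // Decide where the next block comes from. Whenever the builder is
  // disabled, fall back to the local execution engine.
  chooseSource(): BlockSource {
    return this.enabled ? "builder" : "engine";
  }
}

const breaker = new BuilderCircuitBreaker();
const atStartup = breaker.chooseSource(); // builder not yet enabled
breaker.updateStatus(true, true);
const afterHealthCheck = breaker.chooseSource(); // healthy chain, builder up
breaker.updateStatus(false, true);
const afterUnhealthy = breaker.chooseSource(); // circuit breaker trips
```

In this sketch the fallback is the default path, so a builder that never reports a good status simply never gets used.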

@g11tech g11tech requested a review from a team as a code owner August 27, 2022 08:45
@g11tech g11tech force-pushed the g11tech/builder-circuit-breaker branch from 7771b13 to a82e277 Compare August 27, 2022 08:52
github-actions bot commented Aug 27, 2022

Performance Report

✔️ no performance regression detected

Full benchmark results
Benchmark suite Current: e802c50 Previous: 506630e Ratio
getPubkeys - index2pubkey - req 1000 vs - 250000 vc 2.4324 ms/op 2.1565 ms/op 1.13
getPubkeys - validatorsArr - req 1000 vs - 250000 vc 80.487 us/op 69.709 us/op 1.15
BLS verify - blst-native 2.6039 ms/op 1.6419 ms/op 1.59
BLS verifyMultipleSignatures 3 - blst-native 5.3866 ms/op 3.3636 ms/op 1.60
BLS verifyMultipleSignatures 8 - blst-native 11.867 ms/op 7.2327 ms/op 1.64
BLS verifyMultipleSignatures 32 - blst-native 43.433 ms/op 26.240 ms/op 1.66
BLS aggregatePubkeys 32 - blst-native 58.500 us/op 35.022 us/op 1.67
BLS aggregatePubkeys 128 - blst-native 227.77 us/op 134.35 us/op 1.70
getAttestationsForBlock 186.68 ms/op 169.54 ms/op 1.10
isKnown best case - 1 super set check 511.00 ns/op 439.00 ns/op 1.16
isKnown normal case - 2 super set checks 485.00 ns/op 426.00 ns/op 1.14
isKnown worse case - 16 super set checks 508.00 ns/op 424.00 ns/op 1.20
CheckpointStateCache - add get delete 10.738 us/op 9.2560 us/op 1.16
validate gossip signedAggregateAndProof - struct 6.0433 ms/op 4.2604 ms/op 1.42
validate gossip attestation - struct 2.8828 ms/op 2.0315 ms/op 1.42
altair verifyImport mainnet_s3766816:31 5.8434 s/op 4.6083 s/op 1.27
pickEth1Vote - no votes 2.4943 ms/op 1.9568 ms/op 1.27
pickEth1Vote - max votes 24.951 ms/op 22.652 ms/op 1.10
pickEth1Vote - Eth1Data hashTreeRoot value x2048 13.858 ms/op 10.653 ms/op 1.30
pickEth1Vote - Eth1Data hashTreeRoot tree x2048 23.306 ms/op 22.073 ms/op 1.06
pickEth1Vote - Eth1Data fastSerialize value x2048 1.9088 ms/op 1.2860 ms/op 1.48
pickEth1Vote - Eth1Data fastSerialize tree x2048 17.527 ms/op 12.503 ms/op 1.40
bytes32 toHexString 1.2850 us/op 897.00 ns/op 1.43
bytes32 Buffer.toString(hex) 854.00 ns/op 586.00 ns/op 1.46
bytes32 Buffer.toString(hex) from Uint8Array 1.2480 us/op 804.00 ns/op 1.55
bytes32 Buffer.toString(hex) + 0x 838.00 ns/op 612.00 ns/op 1.37
Object access 1 prop 0.42100 ns/op 0.31900 ns/op 1.32
Map access 1 prop 0.35300 ns/op 0.26400 ns/op 1.34
Object get x1000 16.983 ns/op 15.750 ns/op 1.08
Map get x1000 1.0820 ns/op 0.98900 ns/op 1.09
Object set x1000 118.64 ns/op 108.83 ns/op 1.09
Map set x1000 78.135 ns/op 65.583 ns/op 1.19
Return object 10000 times 0.41870 ns/op 0.33060 ns/op 1.27
Throw Error 10000 times 8.0912 us/op 5.2111 us/op 1.55
enrSubnets - fastDeserialize 64 bits 3.0550 us/op 2.5590 us/op 1.19
enrSubnets - ssz BitVector 64 bits 855.00 ns/op 668.00 ns/op 1.28
enrSubnets - fastDeserialize 4 bits 445.00 ns/op 367.00 ns/op 1.21
enrSubnets - ssz BitVector 4 bits 825.00 ns/op 660.00 ns/op 1.25
prioritizePeers score -10:0 att 32-0.1 sync 2-0 106.33 us/op 93.682 us/op 1.13
prioritizePeers score 0:0 att 32-0.25 sync 2-0.25 151.81 us/op 124.33 us/op 1.22
prioritizePeers score 0:0 att 32-0.5 sync 2-0.5 268.01 us/op 214.60 us/op 1.25
prioritizePeers score 0:0 att 64-0.75 sync 4-0.75 589.61 us/op 452.96 us/op 1.30
prioritizePeers score 0:0 att 64-1 sync 4-1 608.20 us/op 470.17 us/op 1.29
RateTracker 1000000 limit, 1 obj count per request 209.90 ns/op 183.83 ns/op 1.14
RateTracker 1000000 limit, 2 obj count per request 157.86 ns/op 139.03 ns/op 1.14
RateTracker 1000000 limit, 4 obj count per request 130.71 ns/op 114.83 ns/op 1.14
RateTracker 1000000 limit, 8 obj count per request 113.96 ns/op 101.87 ns/op 1.12
RateTracker with prune 5.3890 us/op 4.4300 us/op 1.22
array of 16000 items push then shift 5.1634 us/op 3.1857 us/op 1.62
LinkedList of 16000 items push then shift 32.737 ns/op 29.239 ns/op 1.12
array of 16000 items push then pop 255.09 ns/op 232.10 ns/op 1.10
LinkedList of 16000 items push then pop 25.133 ns/op 24.228 ns/op 1.04
array of 24000 items push then shift 7.7907 us/op 4.0384 us/op 1.93
LinkedList of 24000 items push then shift 36.207 ns/op 31.000 ns/op 1.17
array of 24000 items push then pop 222.93 ns/op 209.29 ns/op 1.07
LinkedList of 24000 items push then pop 27.638 ns/op 22.978 ns/op 1.20
intersect bitArray bitLen 8 12.510 ns/op 10.300 ns/op 1.21
intersect array and set length 8 195.52 ns/op 149.67 ns/op 1.31
intersect bitArray bitLen 128 69.239 ns/op 71.301 ns/op 0.97
intersect array and set length 128 2.3861 us/op 2.2563 us/op 1.06
Buffer.concat 32 items 2.2470 ns/op 1.9430 ns/op 1.16
pass gossip attestations to forkchoice per slot 5.9963 ms/op 5.0002 ms/op 1.20
computeDeltas 5.3263 ms/op 5.2547 ms/op 1.01
computeProposerBoostScoreFromBalances 865.69 us/op 812.70 us/op 1.07
altair processAttestation - 250000 vs - 7PWei normalcase 4.8977 ms/op 3.9656 ms/op 1.24
altair processAttestation - 250000 vs - 7PWei worstcase 7.1567 ms/op 6.3713 ms/op 1.12
altair processAttestation - setStatus - 1/6 committees join 244.59 us/op 194.89 us/op 1.26
altair processAttestation - setStatus - 1/3 committees join 438.51 us/op 372.32 us/op 1.18
altair processAttestation - setStatus - 1/2 committees join 626.27 us/op 516.32 us/op 1.21
altair processAttestation - setStatus - 2/3 committees join 842.37 us/op 739.45 us/op 1.14
altair processAttestation - setStatus - 4/5 committees join 1.1754 ms/op 1.0292 ms/op 1.14
altair processAttestation - setStatus - 100% committees join 1.4059 ms/op 1.1053 ms/op 1.27
altair processBlock - 250000 vs - 7PWei normalcase 37.177 ms/op 28.926 ms/op 1.29
altair processBlock - 250000 vs - 7PWei normalcase hashState 49.668 ms/op 40.486 ms/op 1.23
altair processBlock - 250000 vs - 7PWei worstcase 114.03 ms/op 91.273 ms/op 1.25
altair processBlock - 250000 vs - 7PWei worstcase hashState 138.69 ms/op 91.184 ms/op 1.52
phase0 processBlock - 250000 vs - 7PWei normalcase 4.7627 ms/op 4.0165 ms/op 1.19
phase0 processBlock - 250000 vs - 7PWei worstcase 67.076 ms/op 47.711 ms/op 1.41
altair processEth1Data - 250000 vs - 7PWei normalcase 1.0436 ms/op 1.1943 ms/op 0.87
Tree 40 250000 create 961.07 ms/op 857.90 ms/op 1.12
Tree 40 250000 get(125000) 314.17 ns/op 324.32 ns/op 0.97
Tree 40 250000 set(125000) 2.7495 us/op 2.4159 us/op 1.14
Tree 40 250000 toArray() 35.402 ms/op 36.723 ms/op 0.96
Tree 40 250000 iterate all - toArray() + loop 40.435 ms/op 36.450 ms/op 1.11
Tree 40 250000 iterate all - get(i) 128.10 ms/op 114.13 ms/op 1.12
MutableVector 250000 create 13.384 ms/op 23.956 ms/op 0.56
MutableVector 250000 get(125000) 16.555 ns/op 13.219 ns/op 1.25
MutableVector 250000 set(125000) 669.62 ns/op 526.48 ns/op 1.27
MutableVector 250000 toArray() 7.0052 ms/op 6.9111 ms/op 1.01
MutableVector 250000 iterate all - toArray() + loop 6.6970 ms/op 6.7873 ms/op 0.99
MutableVector 250000 iterate all - get(i) 3.6382 ms/op 2.9053 ms/op 1.25
Array 250000 create 6.2293 ms/op 6.5648 ms/op 0.95
Array 250000 clone - spread 2.6858 ms/op 2.6393 ms/op 1.02
Array 250000 get(125000) 1.1930 ns/op 1.1550 ns/op 1.03
Array 250000 set(125000) 1.2160 ns/op 1.1580 ns/op 1.05
Array 250000 iterate all - loop 139.52 us/op 170.61 us/op 0.82
effectiveBalanceIncrements clone Uint8Array 300000 145.08 us/op 88.579 us/op 1.64
effectiveBalanceIncrements clone MutableVector 300000 688.00 ns/op 813.00 ns/op 0.85
effectiveBalanceIncrements rw all Uint8Array 300000 301.23 us/op 252.69 us/op 1.19
effectiveBalanceIncrements rw all MutableVector 300000 209.10 ms/op 198.85 ms/op 1.05
phase0 afterProcessEpoch - 250000 vs - 7PWei 205.36 ms/op 203.85 ms/op 1.01
phase0 beforeProcessEpoch - 250000 vs - 7PWei 179.85 ms/op 151.66 ms/op 1.19
altair processEpoch - mainnet_e81889 790.50 ms/op 745.36 ms/op 1.06
mainnet_e81889 - altair beforeProcessEpoch 227.39 ms/op 222.65 ms/op 1.02
mainnet_e81889 - altair processJustificationAndFinalization 75.313 us/op 19.891 us/op 3.79
mainnet_e81889 - altair processInactivityUpdates 11.125 ms/op 11.079 ms/op 1.00
mainnet_e81889 - altair processRewardsAndPenalties 203.30 ms/op 194.74 ms/op 1.04
mainnet_e81889 - altair processRegistryUpdates 16.783 us/op 4.0470 us/op 4.15
mainnet_e81889 - altair processSlashings 4.2680 us/op 1.0990 us/op 3.88
mainnet_e81889 - altair processEth1DataReset 4.6730 us/op 872.00 ns/op 5.36
mainnet_e81889 - altair processEffectiveBalanceUpdates 3.1901 ms/op 2.6306 ms/op 1.21
mainnet_e81889 - altair processSlashingsReset 24.579 us/op 5.0270 us/op 4.89
mainnet_e81889 - altair processRandaoMixesReset 27.830 us/op 7.0300 us/op 3.96
mainnet_e81889 - altair processHistoricalRootsUpdate 4.6420 us/op 861.00 ns/op 5.39
mainnet_e81889 - altair processParticipationFlagUpdates 14.321 us/op 3.9840 us/op 3.59
mainnet_e81889 - altair processSyncCommitteeUpdates 3.8950 us/op 1.2430 us/op 3.13
mainnet_e81889 - altair afterProcessEpoch 210.42 ms/op 193.86 ms/op 1.09
phase0 processEpoch - mainnet_e58758 797.55 ms/op 698.92 ms/op 1.14
mainnet_e58758 - phase0 beforeProcessEpoch 344.56 ms/op 298.11 ms/op 1.16
mainnet_e58758 - phase0 processJustificationAndFinalization 73.370 us/op 29.984 us/op 2.45
mainnet_e58758 - phase0 processRewardsAndPenalties 166.92 ms/op 153.23 ms/op 1.09
mainnet_e58758 - phase0 processRegistryUpdates 37.496 us/op 9.0020 us/op 4.17
mainnet_e58758 - phase0 processSlashings 4.2590 us/op 717.00 ns/op 5.94
mainnet_e58758 - phase0 processEth1DataReset 3.8540 us/op 756.00 ns/op 5.10
mainnet_e58758 - phase0 processEffectiveBalanceUpdates 2.5784 ms/op 2.0276 ms/op 1.27
mainnet_e58758 - phase0 processSlashingsReset 23.649 us/op 4.8610 us/op 4.87
mainnet_e58758 - phase0 processRandaoMixesReset 27.257 us/op 5.2170 us/op 5.22
mainnet_e58758 - phase0 processHistoricalRootsUpdate 5.0330 us/op 845.00 ns/op 5.96
mainnet_e58758 - phase0 processParticipationRecordUpdates 28.513 us/op 5.3590 us/op 5.32
mainnet_e58758 - phase0 afterProcessEpoch 175.16 ms/op 160.42 ms/op 1.09
phase0 processEffectiveBalanceUpdates - 250000 normalcase 2.5205 ms/op 2.6387 ms/op 0.96
phase0 processEffectiveBalanceUpdates - 250000 worstcase 0.5 2.8169 ms/op 3.0807 ms/op 0.91
altair processInactivityUpdates - 250000 normalcase 78.293 ms/op 46.468 ms/op 1.68
altair processInactivityUpdates - 250000 worstcase 53.298 ms/op 59.030 ms/op 0.90
phase0 processRegistryUpdates - 250000 normalcase 30.900 us/op 12.692 us/op 2.43
phase0 processRegistryUpdates - 250000 badcase_full_deposits 514.67 us/op 515.63 us/op 1.00
phase0 processRegistryUpdates - 250000 worstcase 0.5 309.35 ms/op 272.41 ms/op 1.14
altair processRewardsAndPenalties - 250000 normalcase 178.58 ms/op 164.22 ms/op 1.09
altair processRewardsAndPenalties - 250000 worstcase 181.11 ms/op 163.48 ms/op 1.11
phase0 getAttestationDeltas - 250000 normalcase 13.873 ms/op 12.302 ms/op 1.13
phase0 getAttestationDeltas - 250000 worstcase 13.952 ms/op 13.233 ms/op 1.05
phase0 processSlashings - 250000 worstcase 7.1880 ms/op 5.4134 ms/op 1.33
altair processSyncCommitteeUpdates - 250000 324.38 ms/op 265.55 ms/op 1.22
BeaconState.hashTreeRoot - No change 527.00 ns/op 529.00 ns/op 1.00
BeaconState.hashTreeRoot - 1 full validator 80.624 us/op 54.874 us/op 1.47
BeaconState.hashTreeRoot - 32 full validator 1.0071 ms/op 630.33 us/op 1.60
BeaconState.hashTreeRoot - 512 full validator 8.2939 ms/op 7.8598 ms/op 1.06
BeaconState.hashTreeRoot - 1 validator.effectiveBalance 207.62 us/op 80.004 us/op 2.60
BeaconState.hashTreeRoot - 32 validator.effectiveBalance 1.4534 ms/op 1.2741 ms/op 1.14
BeaconState.hashTreeRoot - 512 validator.effectiveBalance 18.688 ms/op 16.940 ms/op 1.10
BeaconState.hashTreeRoot - 1 balances 75.283 us/op 64.363 us/op 1.17
BeaconState.hashTreeRoot - 32 balances 842.52 us/op 651.72 us/op 1.29
BeaconState.hashTreeRoot - 512 balances 8.4222 ms/op 6.0489 ms/op 1.39
BeaconState.hashTreeRoot - 250000 balances 131.78 ms/op 101.70 ms/op 1.30
aggregationBits - 2048 els - zipIndexesInBitList 32.125 us/op 28.816 us/op 1.11
regular array get 100000 times 55.462 us/op 67.416 us/op 0.82
wrappedArray get 100000 times 56.409 us/op 67.409 us/op 0.84
arrayWithProxy get 100000 times 34.719 ms/op 34.829 ms/op 1.00
ssz.Root.equals 513.00 ns/op 446.00 ns/op 1.15
byteArrayEquals 505.00 ns/op 464.00 ns/op 1.09
shuffle list - 16384 els 12.307 ms/op 11.175 ms/op 1.10
shuffle list - 250000 els 176.62 ms/op 164.71 ms/op 1.07
processSlot - 1 slots 20.328 us/op 13.186 us/op 1.54
processSlot - 32 slots 2.2918 ms/op 1.6535 ms/op 1.39
getEffectiveBalanceIncrementsZeroInactive - 250000 vs - 7PWei 545.23 us/op 335.41 us/op 1.63
getCommitteeAssignments - req 1 vs - 250000 vc 5.5120 ms/op 5.2798 ms/op 1.04
getCommitteeAssignments - req 100 vs - 250000 vc 7.7124 ms/op 7.3470 ms/op 1.05
getCommitteeAssignments - req 1000 vs - 250000 vc 8.1746 ms/op 7.8097 ms/op 1.05
RootCache.getBlockRootAtSlot - 250000 vs - 7PWei 9.3200 ns/op 10.500 ns/op 0.89
state getBlockRootAtSlot - 250000 vs - 7PWei 1.2069 us/op 989.48 ns/op 1.22
computeProposers - vc 250000 19.547 ms/op 15.870 ms/op 1.23
computeEpochShuffling - vc 250000 181.41 ms/op 169.29 ms/op 1.07
getNextSyncCommittee - vc 250000 327.95 ms/op 267.97 ms/op 1.22

by benchmarkbot/action

// an api call to the builder to check its status, so we will not await for
// its completion here as we want to trigger local execution here without
// delay
this.chain.updateBuilderStatus(clockSlot).catch((e) => {
Contributor

Without a more in-depth review this looks problematic. Since this is async, there's a race condition between producing the block and updating the status. Could it be that the builder is disabled too late and the block is already produced?

Why not check the status of the previous epoch at the end of every epoch (or at some fixed point, say the 2nd slot) and update the builder status with it?

  • That gives you constant status metrics
  • You can subscribe to the block event instead of calling getSlotsPresent() and keep a real-time cumulative count of blocks per epoch, which can also go into a metric
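
A rough sketch of the event-based counting suggested above, under stated assumptions: `EpochBlockCounter` and its methods are hypothetical, `onBlock` stands in for whatever handler would be registered on the node's block event, and each slot is assumed to be reported at most once. `SLOTS_PER_EPOCH = 32` is the mainnet preset value.

```typescript
// Hypothetical helper; not part of the PR under review.
const SLOTS_PER_EPOCH = 32; // mainnet preset

class EpochBlockCounter {
  // epoch number -> count of blocks seen in that epoch
  private readonly blocksByEpoch = new Map<number, number>();

  // Intended to be wired to a block event; assumes one call per slot at most.
  onBlock(slot: number): void {
    const epoch = Math.floor(slot / SLOTS_PER_EPOCH);
    this.blocksByEpoch.set(epoch, (this.blocksByEpoch.get(epoch) ?? 0) + 1);
  }

  // Cumulative number of blocks observed so far in the given epoch.
  blocksIn(epoch: number): number {
    return this.blocksByEpoch.get(epoch) ?? 0;
  }

  // Slots of a completed epoch for which no block was observed.
  missedIn(epoch: number): number {
    return SLOTS_PER_EPOCH - this.blocksIn(epoch);
  }
}

// Usage: epoch 0 with blocks in every slot except slots 5 and 6.
const counter = new EpochBlockCounter();
for (let slot = 0; slot < SLOTS_PER_EPOCH; slot++) {
  if (slot !== 5 && slot !== 6) counter.onBlock(slot);
}
const blocksInEpoch0 = counter.blocksIn(0);
const missedInEpoch0 = counter.missedIn(0);
```

Both values could be exported as metrics, and the missed count of the previous epoch could feed the enable/disable decision.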

Contributor Author

👍 for counting the slots via the block event and adding the corresponding metric.

However, with regard to calling updateBuilderStatus here: disabling is synchronous, because the builder's check-status API is only called when enabling. If that call doesn't respond within a second, it should be fine to leave the builder disabled for the next slot's block.

I think it's a good idea to check the builder status here before proposing the next slot.
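
One way to make the sync/async split explicit, as a hedged sketch rather than the code in this PR: the class `BuilderStatus`, the injected `checkStatus` callback, and the one-second default timeout are assumptions drawn from the description above.

```typescript
// Hypothetical sketch; the real updateBuilderStatus may be structured differently.
type StatusCheck = () => Promise<boolean>;

class BuilderStatus {
  enabled = false;
  private readonly checkStatus: StatusCheck;
  private readonly timeoutMs: number;

  constructor(checkStatus: StatusCheck, timeoutMs = 1000) {
    this.checkStatus = checkStatus;
    this.timeoutMs = timeoutMs;
  }

  // Deliberately NOT an async function: the disable path completes before
  // this call returns, so block production can never observe a stale "enabled".
  update(chainHealthy: boolean): Promise<void> {
    if (!chainHealthy) {
      this.enabled = false; // synchronous disable, no API call involved
      return Promise.resolve();
    }
    return this.tryEnable();
  }

  // Only enabling needs the builder's status API. If the check fails or does
  // not answer in time, the builder stays disabled for the next proposal.
  private async tryEnable(): Promise<void> {
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timeout = new Promise<boolean>((resolve) => {
      timer = setTimeout(() => resolve(false), this.timeoutMs);
    });
    try {
      this.enabled = await Promise.race([this.checkStatus(), timeout]);
    } catch {
      this.enabled = false;
    } finally {
      if (timer !== undefined) clearTimeout(timer);
    }
  }
}

// Usage: a previously enabled builder is disabled before update() returns.
const status = new BuilderStatus(async () => true, 50);
status.enabled = true;
void status.update(false);
const enabledRightAfterDisable = status.enabled;
```

Because the caller does not need to await the returned promise for the disable path, it can trigger local execution without delay, which matches the intent stated in the code comment under review.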

Contributor

However with regard to calling updateBuilderStatus here: for disabling, this is synchronous because the builder's check status api is only being called for enabling

Got it. Can you then find a way to make it more obvious in the code that disabling is sync?

Contributor Author

done

@g11tech g11tech force-pushed the g11tech/builder-circuit-breaker branch from 1f8d9de to 4afca90 Compare August 29, 2022 17:12
@g11tech g11tech enabled auto-merge (squash) August 29, 2022 18:56
@dapplion dapplion disabled auto-merge August 30, 2022 15:00
dapplion previously approved these changes Aug 30, 2022
Contributor

@dapplion dapplion left a comment

Excellent!

Discussed offline to change the ALLOWED_FAULTS value to not follow the spec, as allowing only 1 fault is too sensitive.
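
To illustrate why a threshold of 1 is sensitive, here is a minimal sketch of a fault check. The function `isBuilderAllowed`, the window name `FAULT_INSPECTION_WINDOW`, and both numeric values are assumptions for illustration; the comment above does not state the value that was finally chosen.

```typescript
// Illustrative values only; the PR discussion does not give the final numbers.
const FAULT_INSPECTION_WINDOW = 32; // assumed: number of recent slots inspected
const ALLOWED_FAULTS = 4; // assumed: missed slots tolerated in that window

// Returns true when the number of missed slots in the inspection window
// does not exceed the allowed number of faults.
function isBuilderAllowed(
  slotsPresent: number,
  window: number = FAULT_INSPECTION_WINDOW,
  allowedFaults: number = ALLOWED_FAULTS
): boolean {
  if (slotsPresent < 0 || slotsPresent > window) {
    throw new RangeError("slotsPresent must be between 0 and window");
  }
  const faults = window - slotsPresent;
  return faults <= allowedFaults;
}

// With only 1 allowed fault, two missed slots out of 32 already trip the breaker.
const strictWithTwoMissed = isBuilderAllowed(30, 32, 1);
// A more lenient threshold tolerates the same two missed slots.
const lenientWithTwoMissed = isBuilderAllowed(30, 32, 4);
```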

@dapplion dapplion enabled auto-merge (squash) August 30, 2022 15:19
@dapplion dapplion merged commit f48bef0 into unstable Aug 30, 2022
@dapplion dapplion deleted the g11tech/builder-circuit-breaker branch August 30, 2022 15:39
Development

Successfully merging this pull request may close these issues.

Failsafe circuit breaker for builder API
2 participants