Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decrypt keystores in a thread pool #5357

Merged
merged 13 commits into from
Apr 20, 2023

Conversation

wemeetagain
Copy link
Member

@wemeetagain wemeetagain commented Apr 11, 2023

Motivation

Resolves #4179

Description

  • Refactor the decryptKeystoreDefinitions function
  • Use threads library worker pool implementation
  • poolSize code copied from bls worker thread impl

@github-actions
Copy link
Contributor

github-actions bot commented Apr 11, 2023

Performance Report

✔️ no performance regression detected

Full benchmark results
Benchmark suite Current: 8ce64f4 Previous: 7af22d7 Ratio
getPubkeys - index2pubkey - req 1000 vs - 250000 vc 908.54 us/op 647.57 us/op 1.40
getPubkeys - validatorsArr - req 1000 vs - 250000 vc 45.595 us/op 61.862 us/op 0.74
BLS verify - blst-native 1.2295 ms/op 1.3093 ms/op 0.94
BLS verifyMultipleSignatures 3 - blst-native 2.5025 ms/op 3.0422 ms/op 0.82
BLS verifyMultipleSignatures 8 - blst-native 5.4081 ms/op 5.6455 ms/op 0.96
BLS verifyMultipleSignatures 32 - blst-native 19.610 ms/op 20.870 ms/op 0.94
BLS aggregatePubkeys 32 - blst-native 25.815 us/op 27.192 us/op 0.95
BLS aggregatePubkeys 128 - blst-native 98.664 us/op 106.47 us/op 0.93
getAttestationsForBlock 52.477 ms/op 67.217 ms/op 0.78
isKnown best case - 1 super set check 257.00 ns/op 267.00 ns/op 0.96
isKnown normal case - 2 super set checks 252.00 ns/op 275.00 ns/op 0.92
isKnown worse case - 16 super set checks 250.00 ns/op 263.00 ns/op 0.95
CheckpointStateCache - add get delete 4.9690 us/op 6.1550 us/op 0.81
validate gossip signedAggregateAndProof - struct 2.7221 ms/op 2.9605 ms/op 0.92
validate gossip attestation - struct 1.3076 ms/op 1.3863 ms/op 0.94
pickEth1Vote - no votes 1.2698 ms/op 1.7276 ms/op 0.73
pickEth1Vote - max votes 9.7631 ms/op 14.358 ms/op 0.68
pickEth1Vote - Eth1Data hashTreeRoot value x2048 8.8061 ms/op 10.117 ms/op 0.87
pickEth1Vote - Eth1Data hashTreeRoot tree x2048 14.938 ms/op 17.135 ms/op 0.87
pickEth1Vote - Eth1Data fastSerialize value x2048 640.03 us/op 790.40 us/op 0.81
pickEth1Vote - Eth1Data fastSerialize tree x2048 7.0342 ms/op 6.3514 ms/op 1.11
bytes32 toHexString 498.00 ns/op 755.00 ns/op 0.66
bytes32 Buffer.toString(hex) 336.00 ns/op 451.00 ns/op 0.75
bytes32 Buffer.toString(hex) from Uint8Array 542.00 ns/op 653.00 ns/op 0.83
bytes32 Buffer.toString(hex) + 0x 339.00 ns/op 469.00 ns/op 0.72
Object access 1 prop 0.15300 ns/op 0.32000 ns/op 0.48
Map access 1 prop 0.15500 ns/op 0.19800 ns/op 0.78
Object get x1000 7.4070 ns/op 10.067 ns/op 0.74
Map get x1000 0.59600 ns/op 0.73700 ns/op 0.81
Object set x1000 52.939 ns/op 88.930 ns/op 0.60
Map set x1000 42.615 ns/op 58.098 ns/op 0.73
Return object 10000 times 0.23640 ns/op 0.26960 ns/op 0.88
Throw Error 10000 times 4.0809 us/op 4.6279 us/op 0.88
fastMsgIdFn sha256 / 200 bytes 3.4020 us/op 3.7450 us/op 0.91
fastMsgIdFn h32 xxhash / 200 bytes 270.00 ns/op 356.00 ns/op 0.76
fastMsgIdFn h64 xxhash / 200 bytes 369.00 ns/op 503.00 ns/op 0.73
fastMsgIdFn sha256 / 1000 bytes 11.605 us/op 12.629 us/op 0.92
fastMsgIdFn h32 xxhash / 1000 bytes 429.00 ns/op 481.00 ns/op 0.89
fastMsgIdFn h64 xxhash / 1000 bytes 471.00 ns/op 589.00 ns/op 0.80
fastMsgIdFn sha256 / 10000 bytes 105.70 us/op 112.03 us/op 0.94
fastMsgIdFn h32 xxhash / 10000 bytes 1.9770 us/op 2.0840 us/op 0.95
fastMsgIdFn h64 xxhash / 10000 bytes 1.3700 us/op 1.5220 us/op 0.90
enrSubnets - fastDeserialize 64 bits 1.2840 us/op 1.6630 us/op 0.77
enrSubnets - ssz BitVector 64 bits 476.00 ns/op 920.00 ns/op 0.52
enrSubnets - fastDeserialize 4 bits 180.00 ns/op 258.00 ns/op 0.70
enrSubnets - ssz BitVector 4 bits 526.00 ns/op 725.00 ns/op 0.73
prioritizePeers score -10:0 att 32-0.1 sync 2-0 104.11 us/op 213.68 us/op 0.49
prioritizePeers score 0:0 att 32-0.25 sync 2-0.25 128.43 us/op 180.34 us/op 0.71
prioritizePeers score 0:0 att 32-0.5 sync 2-0.5 160.81 us/op 220.50 us/op 0.73
prioritizePeers score 0:0 att 64-0.75 sync 4-0.75 292.79 us/op 376.11 us/op 0.78
prioritizePeers score 0:0 att 64-1 sync 4-1 368.28 us/op 497.26 us/op 0.74
array of 16000 items push then shift 1.6568 us/op 2.2567 us/op 0.73
LinkedList of 16000 items push then shift 8.8380 ns/op 13.908 ns/op 0.64
array of 16000 items push then pop 85.814 ns/op 122.55 ns/op 0.70
LinkedList of 16000 items push then pop 8.9100 ns/op 9.7110 ns/op 0.92
array of 24000 items push then shift 2.3803 us/op 2.7216 us/op 0.87
LinkedList of 24000 items push then shift 8.6300 ns/op 10.797 ns/op 0.80
array of 24000 items push then pop 75.513 ns/op 90.693 ns/op 0.83
LinkedList of 24000 items push then pop 8.3350 ns/op 9.6320 ns/op 0.87
intersect bitArray bitLen 8 12.977 ns/op 15.008 ns/op 0.86
intersect array and set length 8 73.492 ns/op 129.30 ns/op 0.57
intersect bitArray bitLen 128 42.989 ns/op 49.285 ns/op 0.87
intersect array and set length 128 1.0686 us/op 1.3947 us/op 0.77
Buffer.concat 32 items 2.7030 us/op 3.2790 us/op 0.82
Uint8Array.set 32 items 2.3830 us/op 3.2550 us/op 0.73
pass gossip attestations to forkchoice per slot 3.0124 ms/op 4.8384 ms/op 0.62
computeDeltas 2.9720 ms/op 4.0282 ms/op 0.74
computeProposerBoostScoreFromBalances 1.8025 ms/op 1.8683 ms/op 0.96
altair processAttestation - 250000 vs - 7PWei normalcase 2.2932 ms/op 3.1467 ms/op 0.73
altair processAttestation - 250000 vs - 7PWei worstcase 3.3941 ms/op 4.7276 ms/op 0.72
altair processAttestation - setStatus - 1/6 committees join 146.57 us/op 146.96 us/op 1.00
altair processAttestation - setStatus - 1/3 committees join 285.02 us/op 287.25 us/op 0.99
altair processAttestation - setStatus - 1/2 committees join 375.95 us/op 428.40 us/op 0.88
altair processAttestation - setStatus - 2/3 committees join 471.04 us/op 491.41 us/op 0.96
altair processAttestation - setStatus - 4/5 committees join 664.55 us/op 688.63 us/op 0.97
altair processAttestation - setStatus - 100% committees join 767.47 us/op 797.84 us/op 0.96
altair processBlock - 250000 vs - 7PWei normalcase 15.373 ms/op 19.936 ms/op 0.77
altair processBlock - 250000 vs - 7PWei normalcase hashState 25.446 ms/op 32.018 ms/op 0.79
altair processBlock - 250000 vs - 7PWei worstcase 49.272 ms/op 52.943 ms/op 0.93
altair processBlock - 250000 vs - 7PWei worstcase hashState 69.671 ms/op 70.634 ms/op 0.99
phase0 processBlock - 250000 vs - 7PWei normalcase 1.9690 ms/op 2.3635 ms/op 0.83
phase0 processBlock - 250000 vs - 7PWei worstcase 28.528 ms/op 28.658 ms/op 1.00
altair processEth1Data - 250000 vs - 7PWei normalcase 464.23 us/op 545.61 us/op 0.85
vc - 250000 eb 1 eth1 1 we 0 wn 0 - smpl 15 7.9950 us/op 11.733 us/op 0.68
vc - 250000 eb 0.95 eth1 0.1 we 0.05 wn 0 - smpl 219 24.357 us/op 38.369 us/op 0.63
vc - 250000 eb 0.95 eth1 0.3 we 0.05 wn 0 - smpl 42 11.453 us/op 13.655 us/op 0.84
vc - 250000 eb 0.95 eth1 0.7 we 0.05 wn 0 - smpl 18 8.7030 us/op 10.182 us/op 0.85
vc - 250000 eb 0.1 eth1 0.1 we 0 wn 0 - smpl 1020 99.601 us/op 112.83 us/op 0.88
vc - 250000 eb 0.03 eth1 0.03 we 0 wn 0 - smpl 11777 675.41 us/op 698.26 us/op 0.97
vc - 250000 eb 0.01 eth1 0.01 we 0 wn 0 - smpl 16384 925.58 us/op 941.05 us/op 0.98
vc - 250000 eb 0 eth1 0 we 0 wn 0 - smpl 16384 900.31 us/op 923.90 us/op 0.97
vc - 250000 eb 0 eth1 0 we 0 wn 0 nocache - smpl 16384 2.3383 ms/op 2.5736 ms/op 0.91
vc - 250000 eb 0 eth1 1 we 0 wn 0 - smpl 16384 1.4835 ms/op 1.5783 ms/op 0.94
vc - 250000 eb 0 eth1 1 we 0 wn 0 nocache - smpl 16384 4.0040 ms/op 4.4945 ms/op 0.89
Tree 40 250000 create 340.96 ms/op 310.02 ms/op 1.10
Tree 40 250000 get(125000) 194.45 ns/op 199.09 ns/op 0.98
Tree 40 250000 set(125000) 1.0249 us/op 952.81 ns/op 1.08
Tree 40 250000 toArray() 20.915 ms/op 36.471 ms/op 0.57
Tree 40 250000 iterate all - toArray() + loop 21.550 ms/op 33.504 ms/op 0.64
Tree 40 250000 iterate all - get(i) 71.445 ms/op 95.297 ms/op 0.75
MutableVector 250000 create 11.917 ms/op 17.334 ms/op 0.69
MutableVector 250000 get(125000) 7.1870 ns/op 11.327 ns/op 0.63
MutableVector 250000 set(125000) 372.97 ns/op 761.50 ns/op 0.49
MutableVector 250000 toArray() 3.0813 ms/op 6.0349 ms/op 0.51
MutableVector 250000 iterate all - toArray() + loop 2.9790 ms/op 6.3177 ms/op 0.47
MutableVector 250000 iterate all - get(i) 1.5274 ms/op 3.0386 ms/op 0.50
Array 250000 create 2.8811 ms/op 4.9800 ms/op 0.58
Array 250000 clone - spread 1.2433 ms/op 5.0129 ms/op 0.25
Array 250000 get(125000) 0.67300 ns/op 2.1940 ns/op 0.31
Array 250000 set(125000) 0.67500 ns/op 2.2960 ns/op 0.29
Array 250000 iterate all - loop 109.50 us/op 97.301 us/op 1.13
effectiveBalanceIncrements clone Uint8Array 300000 30.367 us/op 71.980 us/op 0.42
effectiveBalanceIncrements clone MutableVector 300000 373.00 ns/op 1.0040 us/op 0.37
effectiveBalanceIncrements rw all Uint8Array 300000 170.68 us/op 185.98 us/op 0.92
effectiveBalanceIncrements rw all MutableVector 300000 85.144 ms/op 175.70 ms/op 0.48
phase0 afterProcessEpoch - 250000 vs - 7PWei 115.12 ms/op 121.71 ms/op 0.95
phase0 beforeProcessEpoch - 250000 vs - 7PWei 34.427 ms/op 38.440 ms/op 0.90
altair processEpoch - mainnet_e81889 313.31 ms/op 357.56 ms/op 0.88
mainnet_e81889 - altair beforeProcessEpoch 54.857 ms/op 77.734 ms/op 0.71
mainnet_e81889 - altair processJustificationAndFinalization 18.741 us/op 19.624 us/op 0.96
mainnet_e81889 - altair processInactivityUpdates 5.6770 ms/op 5.8527 ms/op 0.97
mainnet_e81889 - altair processRewardsAndPenalties 69.640 ms/op 67.584 ms/op 1.03
mainnet_e81889 - altair processRegistryUpdates 4.7460 us/op 7.2830 us/op 0.65
mainnet_e81889 - altair processSlashings 725.00 ns/op 1.3460 us/op 0.54
mainnet_e81889 - altair processEth1DataReset 812.00 ns/op 1.4600 us/op 0.56
mainnet_e81889 - altair processEffectiveBalanceUpdates 1.3842 ms/op 2.3745 ms/op 0.58
mainnet_e81889 - altair processSlashingsReset 4.7100 us/op 12.164 us/op 0.39
mainnet_e81889 - altair processRandaoMixesReset 4.5210 us/op 12.384 us/op 0.37
mainnet_e81889 - altair processHistoricalRootsUpdate 1.0200 us/op 2.2950 us/op 0.44
mainnet_e81889 - altair processParticipationFlagUpdates 3.0290 us/op 6.0180 us/op 0.50
mainnet_e81889 - altair processSyncCommitteeUpdates 606.00 ns/op 1.4480 us/op 0.42
mainnet_e81889 - altair afterProcessEpoch 127.84 ms/op 154.57 ms/op 0.83
phase0 processEpoch - mainnet_e58758 381.06 ms/op 485.29 ms/op 0.79
mainnet_e58758 - phase0 beforeProcessEpoch 151.73 ms/op 200.51 ms/op 0.76
mainnet_e58758 - phase0 processJustificationAndFinalization 19.839 us/op 36.152 us/op 0.55
mainnet_e58758 - phase0 processRewardsAndPenalties 65.166 ms/op 82.292 ms/op 0.79
mainnet_e58758 - phase0 processRegistryUpdates 8.8480 us/op 16.520 us/op 0.54
mainnet_e58758 - phase0 processSlashings 597.00 ns/op 1.7650 us/op 0.34
mainnet_e58758 - phase0 processEth1DataReset 566.00 ns/op 1.5700 us/op 0.36
mainnet_e58758 - phase0 processEffectiveBalanceUpdates 1.0115 ms/op 1.9639 ms/op 0.52
mainnet_e58758 - phase0 processSlashingsReset 3.8330 us/op 9.4800 us/op 0.40
mainnet_e58758 - phase0 processRandaoMixesReset 4.5180 us/op 14.804 us/op 0.31
mainnet_e58758 - phase0 processHistoricalRootsUpdate 1.4530 us/op 2.1700 us/op 0.67
mainnet_e58758 - phase0 processParticipationRecordUpdates 4.2830 us/op 13.341 us/op 0.32
mainnet_e58758 - phase0 afterProcessEpoch 101.63 ms/op 130.22 ms/op 0.78
phase0 processEffectiveBalanceUpdates - 250000 normalcase 1.2169 ms/op 3.3240 ms/op 0.37
phase0 processEffectiveBalanceUpdates - 250000 worstcase 0.5 1.5815 ms/op 3.5231 ms/op 0.45
altair processInactivityUpdates - 250000 normalcase 27.486 ms/op 50.070 ms/op 0.55
altair processInactivityUpdates - 250000 worstcase 27.935 ms/op 43.586 ms/op 0.64
phase0 processRegistryUpdates - 250000 normalcase 7.6900 us/op 22.738 us/op 0.34
phase0 processRegistryUpdates - 250000 badcase_full_deposits 303.44 us/op 528.22 us/op 0.57
phase0 processRegistryUpdates - 250000 worstcase 0.5 136.16 ms/op 236.26 ms/op 0.58
altair processRewardsAndPenalties - 250000 normalcase 65.016 ms/op 83.466 ms/op 0.78
altair processRewardsAndPenalties - 250000 worstcase 68.623 ms/op 91.067 ms/op 0.75
phase0 getAttestationDeltas - 250000 normalcase 7.1133 ms/op 13.530 ms/op 0.53
phase0 getAttestationDeltas - 250000 worstcase 8.3640 ms/op 14.238 ms/op 0.59
phase0 processSlashings - 250000 worstcase 3.6500 ms/op 6.3006 ms/op 0.58
altair processSyncCommitteeUpdates - 250000 184.00 ms/op 232.59 ms/op 0.79
BeaconState.hashTreeRoot - No change 288.00 ns/op 472.00 ns/op 0.61
BeaconState.hashTreeRoot - 1 full validator 56.426 us/op 76.246 us/op 0.74
BeaconState.hashTreeRoot - 32 full validator 570.31 us/op 730.50 us/op 0.78
BeaconState.hashTreeRoot - 512 full validator 5.9867 ms/op 9.8773 ms/op 0.61
BeaconState.hashTreeRoot - 1 validator.effectiveBalance 67.928 us/op 98.458 us/op 0.69
BeaconState.hashTreeRoot - 32 validator.effectiveBalance 982.62 us/op 1.4364 ms/op 0.68
BeaconState.hashTreeRoot - 512 validator.effectiveBalance 12.262 ms/op 18.744 ms/op 0.65
BeaconState.hashTreeRoot - 1 balances 52.955 us/op 65.632 us/op 0.81
BeaconState.hashTreeRoot - 32 balances 483.43 us/op 573.96 us/op 0.84
BeaconState.hashTreeRoot - 512 balances 4.6829 ms/op 6.9995 ms/op 0.67
BeaconState.hashTreeRoot - 250000 balances 77.913 ms/op 112.88 ms/op 0.69
aggregationBits - 2048 els - zipIndexesInBitList 17.809 us/op 48.217 us/op 0.37
regular array get 100000 times 34.333 us/op 76.364 us/op 0.45
wrappedArray get 100000 times 34.221 us/op 71.541 us/op 0.48
arrayWithProxy get 100000 times 15.816 ms/op 19.747 ms/op 0.80
ssz.Root.equals 581.00 ns/op 808.00 ns/op 0.72
byteArrayEquals 583.00 ns/op 827.00 ns/op 0.70
shuffle list - 16384 els 7.2053 ms/op 9.2346 ms/op 0.78
shuffle list - 250000 els 104.21 ms/op 136.28 ms/op 0.76
processSlot - 1 slots 8.7780 us/op 15.744 us/op 0.56
processSlot - 32 slots 1.4046 ms/op 2.2115 ms/op 0.64
getEffectiveBalanceIncrementsZeroInactive - 250000 vs - 7PWei 36.903 ms/op 50.565 ms/op 0.73
getCommitteeAssignments - req 1 vs - 250000 vc 3.0728 ms/op 4.1626 ms/op 0.74
getCommitteeAssignments - req 100 vs - 250000 vc 4.5335 ms/op 6.1764 ms/op 0.73
getCommitteeAssignments - req 1000 vs - 250000 vc 4.7744 ms/op 7.4565 ms/op 0.64
RootCache.getBlockRootAtSlot - 250000 vs - 7PWei 5.6300 ns/op 6.5400 ns/op 0.86
state getBlockRootAtSlot - 250000 vs - 7PWei 711.07 ns/op 1.2068 us/op 0.59
computeProposers - vc 250000 11.450 ms/op 16.892 ms/op 0.68
computeEpochShuffling - vc 250000 109.29 ms/op 151.55 ms/op 0.72
getNextSyncCommittee - vc 250000 188.91 ms/op 297.02 ms/op 0.64
computeSigningRoot for AttestationData 15.271 us/op 19.720 us/op 0.77
hash AttestationData serialized data then Buffer.toString(base64) 2.6940 us/op 3.9235 us/op 0.69
toHexString serialized data 1.3243 us/op 2.9025 us/op 0.46
Buffer.toString(base64) 405.55 ns/op 635.13 ns/op 0.64

by benchmarkbot/action

dapplion
dapplion previously approved these changes Apr 12, 2023
Copy link
Contributor

@dapplion dapplion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good, some minor comments

@nflaig
Copy link
Member

nflaig commented Apr 15, 2023

the results importing 2000 keys on

CPU(s):                          12
Thread(s) per core:              2
Core(s) per socket:              6
Socket(s):                       1

unstable: 19.7 keys/m

Apr-15 13:43:59.725[]                 info: Lodestar network=goerli, version=v1.6.0/9eb2ca3, commit=9eb2ca36ad4f12fa7cf39cffd96a33e41a55bf4b
Apr-15 13:43:59.727[]                 info: Connecting to LevelDB database path=/home/devops/goerli/data/validator/validator-db
Apr-15 13:44:09.763[]                 info: 0% of keystores imported. current=1 total=2000 rate=6.00keys/m
Apr-15 13:44:19.763[]                 info: 0% of keystores imported. current=4 total=2000 rate=18.00keys/m
Apr-15 13:44:29.764[]                 info: 0% of keystores imported. current=8 total=2000 rate=24.00keys/m
Apr-15 13:44:39.763[]                 info: 1% of keystores imported. current=11 total=2000 rate=18.00keys/m
Apr-15 13:44:49.763[]                 info: 1% of keystores imported. current=14 total=2000 rate=18.00keys/m
Apr-15 13:44:59.764[]                 info: 1% of keystores imported. current=18 total=2000 rate=24.00keys/m
Apr-15 13:45:09.764[]                 info: 1% of keystores imported. current=21 total=2000 rate=18.00keys/m
Apr-15 13:45:19.763[]                 info: 1% of keystores imported. current=24 total=2000 rate=18.00keys/m
Apr-15 13:45:29.764[]                 info: 1% of keystores imported. current=27 total=2000 rate=18.00keys/m
Apr-15 13:45:39.764[]                 info: 2% of keystores imported. current=31 total=2000 rate=24.00keys/m
Apr-15 13:45:49.763[]                 info: 2% of keystores imported. current=34 total=2000 rate=18.00keys/m
Apr-15 13:45:59.763[]                 info: 2% of keystores imported. current=37 total=2000 rate=18.00keys/m
Apr-15 13:46:09.764[]                 info: 2% of keystores imported. current=40 total=2000 rate=18.00keys/m
Apr-15 13:46:19.764[]                 info: 2% of keystores imported. current=44 total=2000 rate=24.00keys/m
Apr-15 13:46:29.763[]                 info: 2% of keystores imported. current=47 total=2000 rate=18.00keys/m

with transferable objects: 100.5 keys/m

Apr-15 13:48:07.727[]                 info: Lodestar network=goerli, version=v1.6.0/cayman/parallel-keystore-decrypt/0096e30, commit=0096e304ac3333a291d28ff2f40b19deb948ff88
Apr-15 13:48:07.729[]                 info: Connecting to LevelDB database path=/home/devops/goerli/data/validator/validator-db
Apr-15 13:48:17.753[]                 info: 100% of keystores imported. current=0 total=2000 rate=0.00keys/m
Apr-15 13:48:27.755[]                 info: 1% of keystores imported. current=24 total=2000 rate=143.97keys/m
Apr-15 13:48:37.757[]                 info: 2% of keystores imported. current=36 total=2000 rate=71.99keys/m
Apr-15 13:48:47.756[]                 info: 3% of keystores imported. current=56 total=2000 rate=120.01keys/m
Apr-15 13:48:57.757[]                 info: 4% of keystores imported. current=71 total=2000 rate=89.99keys/m
Apr-15 13:49:07.756[]                 info: 4% of keystores imported. current=88 total=2000 rate=102.01keys/m
Apr-15 13:49:17.758[]                 info: 5% of keystores imported. current=108 total=2000 rate=119.98keys/m
Apr-15 13:49:27.761[]                 info: 6% of keystores imported. current=117 total=2000 rate=53.98keys/m
Apr-15 13:49:37.761[]                 info: 7% of keystores imported. current=143 total=2000 rate=156.00keys/m
Apr-15 13:49:47.761[]                 info: 8% of keystores imported. current=155 total=2000 rate=72.00keys/m
Apr-15 13:49:57.761[]                 info: 9% of keystores imported. current=179 total=2000 rate=144.00keys/m

without transferable objects: 102.2 keys/m

Apr-15 13:51:55.473[]                 info: Lodestar network=goerli, version=v1.6.0/65a8668, commit=65a86684c54005c935e99e5eeab7497801dab1d7
Apr-15 13:51:55.476[]                 info: Connecting to LevelDB database path=/home/devops/goerli/data/validator/validator-db
Apr-15 13:52:05.507[]                 info: 100% of keystores imported. current=0 total=2000 rate=0.00keys/m
Apr-15 13:52:15.507[]                 info: 1% of keystores imported. current=24 total=2000 rate=144.00keys/m
Apr-15 13:52:25.509[]                 info: 2% of keystores imported. current=36 total=2000 rate=71.99keys/m
Apr-15 13:52:35.508[]                 info: 3% of keystores imported. current=60 total=2000 rate=144.01keys/m
Apr-15 13:52:45.509[]                 info: 4% of keystores imported. current=72 total=2000 rate=71.99keys/m
Apr-15 13:52:55.509[]                 info: 5% of keystores imported. current=91 total=2000 rate=114.01keys/m
Apr-15 13:53:05.521[]                 info: 5% of keystores imported. current=108 total=2000 rate=101.87keys/m
Apr-15 13:53:15.520[]                 info: 6% of keystores imported. current=122 total=2000 rate=84.01keys/m
Apr-15 13:53:25.521[]                 info: 7% of keystores imported. current=141 total=2000 rate=113.99keys/m
Apr-15 13:53:35.521[]                 info: 8% of keystores imported. current=167 total=2000 rate=156.00keys/m
Apr-15 13:53:45.523[]                 info: 9% of keystores imported. current=178 total=2000 rate=65.99keys/m

Copy link
Member

@nflaig nflaig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issues found during testing

  • incorrect initial log info: 100% of keystores imported. current=0 total=2000 rate=0.00keys/m, should show 0 instead of 100%
  • keystore cache does not seem to work, the file is created after import but on next startup is removed and keys are loaded from scratch

@wemeetagain
Copy link
Member Author

I think we can do better than 1 decryption per thread at a time.

ChatGPT was helpful here:

--

The memory usage of Scrypt depends on the specific parameters chosen. These parameters include:

N: The CPU/memory cost factor, which must be a power of 2 (e.g., 2^14)
r: The block size (typically 8)
p: The parallelization factor (typically 1)

The memory usage can be roughly calculated with the following formula:

Memory Usage = 128 * N * r * p bytes

For example, using the parameters N=2^14 (16,384), r=8, and p=1, the memory usage would be:

Memory Usage = 128 * 16,384 * 8 * 1 = 16,777,216 bytes, or approximately 16 MB.

--

The default used by the deposit cli is here

Comes out to ~256MB.

So since each thread has its own heap, we can really do quite a bit more decryptions at a time. Ofc taking into account (conservatively) the total amount of free memory available (os.freeMem()).

I'd like to see what this looks like and add some benchmarks here so we can keep track of this.

@wemeetagain
Copy link
Member Author

Issues found during testing

* incorrect initial log `info: 100% of keystores imported. current=0 total=2000 rate=0.00keys/m`, should show 0 instead of 100%

* keystore cache does not seem to work, the file is created after import but on next startup is removed and keys are loaded from scratch

👀 will look into that

@nflaig
Copy link
Member

nflaig commented Apr 16, 2023

incorrect initial log info: 100% of keystores imported. current=0 total=2000 rate=0.00keys/m, should show 0 instead of 100%

this happens due to the fact that processProgress is called before onDecrypt is called to update the current value

const processProgress = (): void => {

It seems like the problem is just how we calculate the percentage here

percentage: current && total ? (current / total) * 100 : 100,

if current=0 (which is the default) then this will always result in 100 due to 0 being falsy

> 0 && 2000 ? (0 / 2000) * 100 : 100
100

cc @nazarhussain

@nazarhussain
Copy link
Contributor

Apr-15 13:48:17.753[] info: 100% of keystores imported. current=0 total=2000 rate=0.00keys/m

I believe there was a test for this case, if missed we should add it.

@wemeetagain
Copy link
Member Author

@nflaig curious to see what your benchmark shows now after the latest commit

@nflaig
Copy link
Member

nflaig commented Apr 18, 2023

curious to see what your benchmark shows now after the latest commit

It does not look great on my machine (or at least not better than before), it is overloaded which I think just causes bad overall performance (note: I am also running geth and several lodestar instance on that host)

image

This is the result of calculateThreadpoolConcurrency:

numWorkers: 12
jobConcurrency: 8

Those took around 30GB of RAM and put all CPUs under full load (also happened before) as you can see above, probably need to adjust the params somehow to be a bit more conservative.

Just as reference, htop from previously tested state, commit 0096e304ac3333a291d28ff2f40b19deb948ff88:

image

It does look like memory utilization is better but does not seem to improve performance, I think CPU is just the bottleneck?

This is the log output from latest commit, it looks like decrypt batches take too long now

Apr-18 08:22:21.205[]                 info: 0% of keystores imported. current=0 total=2000 rate=0.00keys/m
Apr-18 08:22:31.213[]                 info: 0% of keystores imported. current=0 total=2000 rate=0.00keys/m
Apr-18 08:22:41.215[]                 info: 0% of keystores imported. current=0 total=2000 rate=0.00keys/m
Apr-18 08:22:51.217[]                 info: 0% of keystores imported. current=0 total=2000 rate=0.00keys/m
Apr-18 08:23:01.222[]                 info: 0% of keystores imported. current=0 total=2000 rate=0.00keys/m
Apr-18 08:23:11.222[]                 info: 4% of keystores imported. current=88 total=2000 rate=528.00keys/m
Apr-18 08:23:21.223[]                 info: 4% of keystores imported. current=88 total=2000 rate=0.00keys/m
Apr-18 08:23:31.227[]                 info: 4% of keystores imported. current=88 total=2000 rate=0.00keys/m
Apr-18 08:23:41.229[]                 info: 4% of keystores imported. current=88 total=2000 rate=0.00keys/m
Apr-18 08:23:51.233[]                 info: 4% of keystores imported. current=88 total=2000 rate=0.00keys/m
Apr-18 08:24:01.245[]                 info: 4% of keystores imported. current=88 total=2000 rate=0.00keys/m
Apr-18 08:24:11.245[]                 info: 8% of keystores imported. current=152 total=2000 rate=384.00keys/m
Apr-18 08:24:21.253[]                 info: 7% of keystores imported. current=136 total=2000 rate=-95.92keys/m
Apr-18 08:24:31.265[]                 info: 7% of keystores imported. current=136 total=2000 rate=0.00keys/m
Apr-18 08:24:41.273[]                 info: 7% of keystores imported. current=136 total=2000 rate=0.00keys/m
Apr-18 08:24:51.281[]                 info: 7% of keystores imported. current=136 total=2000 rate=0.00keys/m
Apr-18 08:25:01.289[]                 info: 7% of keystores imported. current=136 total=2000 rate=0.00keys/m
Apr-18 08:25:11.299[]                 info: 12% of keystores imported. current=233 total=2000 rate=581.42keys/m
Apr-18 08:25:21.301[]                 info: 14% of keystores imported. current=288 total=2000 rate=329.93keys/m
Apr-18 08:25:31.309[]                 info: 14% of keystores imported. current=288 total=2000 rate=0.00keys/m
Apr-18 08:25:41.317[]                 info: 14% of keystores imported. current=288 total=2000 rate=0.00keys/m
Apr-18 08:25:51.325[]                 info: 14% of keystores imported. current=288 total=2000 rate=0.00keys/m
Apr-18 08:26:01.333[]                 info: 14% of keystores imported. current=288 total=2000 rate=0.00keys/m

Some logs show negative rate for keys/m, likely related to how onDecrypt is called, the rate will always be a bit inaccurate.

Apr-18 08:24:21.253[]                 info: 7% of keystores imported. current=136 total=2000 rate=-95.92keys/m

Problems mentioned earlier are fixed now

  • loading from keystore cache ✅
  • initial log shows 0% ✅

@wemeetagain
Copy link
Member Author

It does not look great

Ok will revert the last commit so we just have a job concurrency of 1.

issues around force vs ignoreLockFile

Lets keep the top-level CLI flag for now (so as to not introduce a breaking change to users) and just use ignoreLockFile downstream.

And explicitly assign args when calling decryptKeystoreDefinitions to get type safety.

Copy link
Member

@nflaig nflaig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, just few final remarks after more in-depth review. Everything works as expected now and the performance boost is pretty amazing, on average it is now 5x faster and in combination with the keystore cache I think the UX for big operators is pretty good now

const pool = Pool(
() =>
spawn<DecryptKeystoreWorkerAPI>(new Worker("./worker.js"), {
// A Lodestar Node may do very expensive task at start blocking the event loop and causing
Copy link
Member

@nflaig nflaig Apr 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment seems a bit out of place, should probably be revised. In case of loading the keys there is really nothing else going on as this is a blocking operation

if (typeof navigator !== "undefined") {
defaultPoolSize = navigator.hardwareConcurrency ?? 4;
} else {
defaultPoolSize = (await import("node:os")).cpus().length;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

noting here that we should change this line to use os.availableParallelism() once we upgrade to node v20

/**
* Decrypt a single keystore definition, returning the secret key as a Uint8Array
*
* NOTE: This is memory-intensive process, since decrypting the keystore involves running a key derivation function (either pbkdf2 or scrypt)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this note is correct but based on the performance tests it looks like CPU is rather the bottleneck that prevents us from increasing the job concurrency to further speed up the key import

With job concurrency of 1

image

packages/cli/test/utils/keymanagerTestRunners.ts Outdated Show resolved Hide resolved
@wemeetagain wemeetagain merged commit ffc0d16 into unstable Apr 20, 2023
@wemeetagain wemeetagain deleted the cayman/parallel-keystore-decrypt branch April 20, 2023 15:13
@nflaig
Copy link
Member

nflaig commented Apr 28, 2023

@wemeetagain I noticed that we have a similar logic in the keymanager API. Is supporting thread pool decryption something we should consider there as well or do we expect that this API will generally be used to import less overall keys?

const secretKey = bls.SecretKey.fromBytes(await keystore.decrypt(password));

@wemeetagain
Copy link
Member Author

Probably wouldn't hurt to support decryption in parallel here. 🤷

@dapplion
Copy link
Contributor

@wemeetagain I noticed that we have a similar logic in the keymanager API. Is supporting thread pool decryption something we should consider there as well or do we expect that this API will generally be used to import less overall keys?

const secretKey = bls.SecretKey.fromBytes(await keystore.decrypt(password));

Yes definitely, that should be done in a thread and limit with a queue to prevent killing the vc process

@wemeetagain
Copy link
Member Author

🎉 This PR is included in v1.8.0 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Optimize UX loading +100 keystores on vc
4 participants