
Implement new logging policy #5359

Closed

philknows opened this issue Apr 13, 2023 · 7 comments
Labels
prio-high (Resolve issues as soon as possible), scope-logging (Issue about logs: hygiene, format issues, improvements), scope-ux (Issues for CLI UX or general consumer UX)

Comments

philknows (Member) commented Apr 13, 2023

This issue is to track the implementation of #5299 into Lodestar's logging. After the Shapella hard fork, we experienced a consistent stream of ignorable error messages relating to QUEUE_ERROR_QUEUE_MAX_LENGTH, caused by the flow of large batches of BLSToExecution messages, which created a poor logging UX for users of Lodestar during the fork transition. This is a wake-up call for us to urgently implement the new logging policy in the next attainable release to fix our poor logging UX.

There are many errors we would like to fix listed in #4400, which should now also abide by the new policy.

philknows added the prio-high, scope-logging and scope-ux labels on Apr 13, 2023
philknows added this to the v1.9.0 milestone on Apr 13, 2023
nflaig (Member) commented Apr 23, 2023

Potential candidate to move from warn to debug, as this log happens quite frequently even when there is nothing wrong with the node itself. This was mentioned by a user in Discord, but I also see this log a lot and don't know what to do with it.

this.logger.warn("Unclean disconnect", {peer: prettyPrintPeerId(peer)}, e as Error);

warn: Unclean disconnect peer=16...A3GmAv Socket close timeout
CodeError: Socket close timeout
    at Timeout._onTimeout (file:///usr/app/node_modules/@libp2p/tcp/src/socket-to-conn.ts:143:28)
    at listOnTimeout (node:internal/timers:569:17)
    at processTimers (node:internal/timers:512:7)

Probably someone with more experience in the networking stack needs to assess this, but I don't see how this log would be actionable to an end user. If a user is not able to get sufficient peers, the debug logs should be looked at instead.
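
For illustration, something along these lines could work, gating the level on known-benign causes. This is just a sketch; the Logger interface, onPeerDisconnect function and isBenignDisconnect helper are made up for the example and are not actual Lodestar code:

// Illustrative sketch only: keep "Unclean disconnect" at debug for known-benign
// causes (e.g. libp2p socket close timeouts) and reserve warn for anything unexpected.
interface Logger {
  debug(msg: string, ctx?: Record<string, unknown>, e?: Error): void;
  warn(msg: string, ctx?: Record<string, unknown>, e?: Error): void;
}

// Hypothetical helper: the benign case above surfaces as "Socket close timeout",
// so a simple message check is enough for this sketch.
function isBenignDisconnect(e: Error): boolean {
  return /socket close timeout/i.test(e.message);
}

function onPeerDisconnect(logger: Logger, peerIdStr: string, e: Error): void {
  if (isBenignDisconnect(e)) {
    logger.debug("Unclean disconnect", {peer: peerIdStr}, e);
  } else {
    logger.warn("Unclean disconnect", {peer: peerIdStr}, e);
  }
}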

nflaig (Member) commented Apr 30, 2023

These should be less verbose if the cause is Service Unavailable: Node is syncing. We do not need the stack trace, and should consider logging this as a warning: it is expected, but the user should still be aware that something is not yet working.

Error: Error on getProposerDuties - Service Unavailable: Node is syncing - headSlot 5529152 currentSlot 5529417
    at Function.assert (file:///usr/app/packages/api/src/utils/client/httpClient.ts:44:13)
    at BlockDutiesService.pollBeaconProposers (file:///usr/app/packages/validator/src/services/blockDuties.ts:155:14)
    at processTicksAndRejections (node:internal/process/task_queues:95:5)
    at BlockDutiesService.pollBeaconProposersAndNotify (file:///usr/app/packages/validator/src/services/blockDuties.ts:129:5)
    at BlockDutiesService.runBlockDutiesTask (file:///usr/app/packages/validator/src/services/blockDuties.ts:87:9)
    at Clock.runAtMostEvery (file:///usr/app/packages/validator/src/util/clock.ts:96:7)
Apr-30 13:23:24.418[]                error: Failed to download attester duties epoch=172794 Failed to obtain attester duty - Service Unavailable: Node is syncing - headSlot 5529152 currentSlot 5529417
Error: Failed to obtain attester duty - Service Unavailable: Node is syncing - headSlot 5529152 currentSlot 5529417
    at Function.assert (file:///usr/app/packages/api/src/utils/client/httpClient.ts:44:13)
    at AttestationDutiesService.pollBeaconAttestersForEpoch (file:///usr/app/packages/validator/src/services/attestationDuties.ts:214:14)
    at processTicksAndRejections (node:internal/process/task_queues:95:5)
    at AttestationDutiesService.pollBeaconAttesters (file:///usr/app/packages/validator/src/services/attestationDuties.ts:163:7)
    at async Promise.all (index 1)
    at AttestationDutiesService.runDutiesTasks (file:///usr/app/packages/validator/src/services/attestationDuties.ts:124:5)
    at Clock.runAtMostEvery (file:///usr/app/packages/validator/src/util/clock.ts:96:7)
Apr-30 13:23:24.423[]                error: Failed to download SyncDuties epoch=172794 Failed to obtain SyncDuties - Service Unavailable: Node is syncing - headSlot 5529152 currentSlot 5529417
Error: Failed to obtain SyncDuties - Service Unavailable: Node is syncing - headSlot 5529152 currentSlot 5529417
    at Function.assert (file:///usr/app/packages/api/src/utils/client/httpClient.ts:44:13)
    at SyncCommitteeDutiesService.pollSyncCommitteesForEpoch (file:///usr/app/packages/validator/src/services/syncCommitteeDuties.ts:233:14)
    at processTicksAndRejections (node:internal/process/task_queues:95:5)
    at SyncCommitteeDutiesService.pollSyncCommittees (file:///usr/app/packages/validator/src/services/syncCommitteeDuties.ts:178:7)
    at async Promise.all (index 1)
    at SyncCommitteeDutiesService.runDutiesTasks (file:///usr/app/packages/validator/src/services/syncCommitteeDuties.ts:140:5)
    at Clock.runAtMostEvery (file:///usr/app/packages/validator/src/util/clock.ts:96:7)
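
For illustration, the duties services could classify the error before logging, along the lines of this sketch. The Logger shape, isNodeSyncingError helper and logDutiesError function are made up for the example; a real check should probably key off the HTTP status or error code rather than the message text:

// Illustrative sketch: collapse "Node is syncing" API failures into a single warn
// line (message only, no stack trace), and keep full error logging for everything else.
interface Logger {
  warn(msg: string, ctx?: Record<string, unknown>): void;
  error(msg: string, ctx?: Record<string, unknown>, e?: Error): void;
}

function isNodeSyncingError(e: Error): boolean {
  return e.message.includes("Node is syncing");
}

function logDutiesError(logger: Logger, dutyType: string, epoch: number, e: Error): void {
  if (isNodeSyncingError(e)) {
    // Expected while the beacon node is still syncing; the user should be aware,
    // but does not need a stack trace repeated by every duties service.
    logger.warn(`Failed to download ${dutyType} duties`, {epoch, reason: e.message});
  } else {
    logger.error(`Failed to download ${dutyType} duties`, {epoch}, e);
  }
}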

dapplion (Contributor) commented May 2, 2023

There's a not-so-great attempt at controlling these types of error logs, see

isSyncing(e: Error) {

but it should be improved and applied to all paths that query the API.
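
As a rough sketch, the check could live in a single wrapper applied at every API call site, so the classification sits in one place. The names below (callApi, the Logger shape) are illustrative and not the existing isSyncing code:

// Illustrative sketch: wrap beacon API calls so "node is syncing" failures are
// reported uniformly, instead of each validator service re-implementing the check.
interface Logger {
  warn(msg: string, ctx?: Record<string, unknown>): void;
  error(msg: string, ctx?: Record<string, unknown>, e?: Error): void;
}

function isSyncing(e: Error): boolean {
  // A real check could inspect the HTTP status / error code instead of the message.
  return e.message.includes("Node is syncing");
}

async function callApi<T>(logger: Logger, routeId: string, fn: () => Promise<T>): Promise<T | null> {
  try {
    return await fn();
  } catch (e) {
    if (isSyncing(e as Error)) {
      logger.warn(`${routeId} not available yet, node is syncing`, {reason: (e as Error).message});
    } else {
      logger.error(`Error on ${routeId}`, {}, e as Error);
    }
    return null;
  }
}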

nflaig (Member) commented May 23, 2023

Another candidate we might want to rework.

error: Error on pollBeaconProposers  request to http://localhost:9596/eth/v1/validator/duties/proposer/177958 failed, reason: connect ECONNREFUSED 127.0.0.1:9596
FetchError: request to http://localhost:9596/eth/v1/validator/duties/proposer/177958 failed, reason: connect ECONNREFUSED 127.0.0.1:9596
    at ClientRequest.<anonymous> (/home/devops/goerli/lodestar/node_modules/node-fetch/lib/index.js:1461:11)
    at ClientRequest.emit (node:events:513:28)
    at Socket.socketErrorListener (node:_http_client:502:9)
    at Socket.emit (node:events:513:28)
    at emitErrorNT (node:internal/streams/destroy:151:8)
    at emitErrorCloseNT (node:internal/streams/destroy:116:3)
    at processTicksAndRejections (node:internal/process/task_queues:82:21)

Right now, this is quite noisy, since similar errors are spammed by different internal services of the VC.

I think we could handle this similarly to the node-is-syncing case and log a single-line error (no stack trace) stating that the node is unreachable/offline, including just the error message.
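
As a rough sketch of that behavior (the isNodeOffline check keying off node-fetch's ECONNREFUSED code/message and the logApiFailure helper are made up for the example):

// Illustrative sketch: when the beacon node is unreachable, log a single error line
// saying so, instead of spamming full FetchError stack traces from every VC service.
interface Logger {
  error(msg: string, ctx?: Record<string, unknown>, e?: Error): void;
}

function isNodeOffline(e: Error): boolean {
  // node-fetch system errors carry a code such as ECONNREFUSED; fall back to the
  // message in case the error shape differs.
  const code = (e as Error & {code?: string}).code;
  return code === "ECONNREFUSED" || e.message.includes("ECONNREFUSED");
}

function logApiFailure(logger: Logger, routeId: string, e: Error): void {
  if (isNodeOffline(e)) {
    logger.error(`Beacon node is offline or unreachable (${routeId})`, {reason: e.message});
  } else {
    logger.error(`Error on ${routeId}`, {}, e);
  }
}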

philknows (Member, Author) commented:

Agreed, the ECONNREFUSED with the IP address is already a pretty good explanation and clearly tells the user what the problem is so they can action it.

nflaig (Member) commented Jun 26, 2023

We create these two logs when a block error happens:

Jun-25 23:09:25.255[chain]           debug: Block error slot=5935547 code=BLOCK_ERROR_ALREADY_KNOWN, root=0xdc3da9c9eaf50c1fbb2a140e006dc6f8007feaee195eb4ed8ea7f18c9ff2dbb8
Error: BLOCK_ERROR_ALREADY_KNOWN
    at verifyBlocksSanityChecks (file:///home/devops/goerli/lodestar/packages/beacon-node/src/chain/blocks/verifyBlocksSanityChecks.ts:102:15)
    at BeaconChain.processBlocks (file:///home/devops/goerli/lodestar/packages/beacon-node/src/chain/blocks/index.ts:61:82)
    at JobItemQueue.BlockProcessor.jobQueue.JobItemQueue.maxLength [as itemProcessor] (file:///home/devops/goerli/lodestar/packages/beacon-node/src/chain/blocks/index.ts:27:30)
    at JobItemQueue.runJob (file:///home/devops/goerli/lodestar/packages/beacon-node/src/util/queue/itemQueue.ts:101:33)
    at JobItemQueue.runJob (file:///home/devops/goerli/lodestar/packages/beacon-node/src/util/queue/itemQueue.ts:118:15)
    at runNextTicks (node:internal/process/task_queues:60:5)
    at listOnTimeout (node:internal/timers:540:9)
    at processTimers (node:internal/timers:514:7)
Jun-25 23:09:25.255[network]         error: Error receiving block slot=5935547, peer=16Uiu2HAmR7yebF8ip58VVxCcbnSv27WFys9osJBncAi15wCSdGdk code=BLOCK_ERROR_ALREADY_KNOWN, root=0xdc3da9c9eaf50c1fbb2a140e006dc6f8007feaee195eb4ed8ea7f18c9ff2dbb8
Error: BLOCK_ERROR_ALREADY_KNOWN
    at verifyBlocksSanityChecks (file:///home/devops/goerli/lodestar/packages/beacon-node/src/chain/blocks/verifyBlocksSanityChecks.ts:102:15)
    at BeaconChain.processBlocks (file:///home/devops/goerli/lodestar/packages/beacon-node/src/chain/blocks/index.ts:61:82)
    at JobItemQueue.BlockProcessor.jobQueue.JobItemQueue.maxLength [as itemProcessor] (file:///home/devops/goerli/lodestar/packages/beacon-node/src/chain/blocks/index.ts:27:30)
    at JobItemQueue.runJob (file:///home/devops/goerli/lodestar/packages/beacon-node/src/util/queue/itemQueue.ts:101:33)
    at JobItemQueue.runJob (file:///home/devops/goerli/lodestar/packages/beacon-node/src/util/queue/itemQueue.ts:118:15)
    at runNextTicks (node:internal/process/task_queues:60:5)
    at listOnTimeout (node:internal/timers:540:9)
    at processTimers (node:internal/timers:514:7)

The error log might look scary to users and raise concerns (see Discord). AFAIK those errors are somewhat expected to happen; we could consider logging all of them at debug, or only a subset based on error code.

logger.error("Error receiving block", {slot: signedBlock.message.slot, peer: peerIdStr}, e as Error);
