-
-
Notifications
You must be signed in to change notification settings - Fork 291
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: offline error message when node is shutting down #5797
Conversation
Performance Report✔️ no performance regression detected Full benchmark results
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As the EL status changes rely on error.code
returned from fetch, we might wanna merge #5811 first to make sure EL status still works and adapt error codes here in this PR if required.
I added a bunch of tests in the PR to make sure error codes still work as expected, see fetch.test.ts.
Based on codes defined here
lodestar/packages/beacon-node/src/execution/engine/utils.ts
Lines 39 to 40 in 59ec5fe
export const HTTP_FATAL_ERROR_CODES = ["ECONNREFUSED", "ENOTFOUND", "EAI_AGAIN"]; | |
export const HTTP_CONNECTION_ERROR_CODES = ["ECONNRESET", "ECONNABORTED"]; |
It seems like EAI_AGAIN
, ECONNRESET
, and ECONNABORTED
are not covered, @nazarhussain if by any chance you know how to produce them I would add them to the test cases.
import {ErrorJsonRpcResponse, HttpRpcError} from "../../../../src/eth1/provider/jsonRpcHttpClient.js"; | ||
|
||
describe("execution/engine/utils", () => { | ||
describe("getExecutionEngineState", () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests are great!
As this relies on error.code
returned from fetch, we might wanna merge #5811 first to make sure EL status is still works.
Tests would not catch if error.code
returned from fetch changes, will have to cover all cases in fetch.test.ts to make sure we can rely on those error codes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are very standard error codes must comply with any fetch
implementation. Testing regressively native implementations is not suggested. But for node/browser compatibility of fetch we should have some basic tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I doubt the error codes will ever change, I just tested different cases in regard to custom FetchError
added in #5811 but these are rather meant to test if FetchError
works as expected.
Since we use native fetch now it is even more unlikely we get unexpected errors. As per my understanding the codes you are checking right now for EL offline/auth failed should be alright and it is anyhow hard produce specific error codes (#5811 (comment)).
For those e need to create a custom server implementation which reset the client connections. I don't think it's important enough to mock a server for now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some minor remarks, would suggest another review from someone else, maybe @g11tech who is more familiar with this code.
} | ||
} | ||
|
||
function getExecutionEngineStateForPayloadError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
function getExecutionEngineStateForPayloadError( | |
function getELStateFromPayloadError( |
) | ||
// If there are errors by EL like connection refused, internal error, they need to be | ||
// treated separate from being INVALID. For now, just pass the error upstream. | ||
.catch((e: Error): EngineApiRpcReturnTypes[typeof method] => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this error handling is required upstream in processBlocks (as well as to not downgrade the peer in case of UNAVAILABLE)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
restore the previous error hadling in new payload
65d84c2
to
9ade9c4
Compare
@@ -30,7 +30,7 @@ export enum ExecutePayloadStatus { | |||
UNSAFE_OPTIMISTIC_STATUS = "UNSAFE_OPTIMISTIC_STATUS", | |||
} | |||
|
|||
export enum ExecutionEngineState { | |||
export enum ExecutionState { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i am thinking if ExecutionState
might cause confusion over reference to actual execution state, in that case original ExecutionEngineState
is preferable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I use ExecutionEngineState
, then would prefer to use the same vocabulary/wording to other places for variables and functions. That will lead to almost similar to what we had earlier. Will it be ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Motivation
There should not be a false error log.
Description
Currently when node is shutting down, it shows an error message that EL engine is turned offline, which is not the valid case.
Steps to test or reproduce