Tweak timeouts for fetching validators from BN #3237

Open
michaelsproul opened this issue Aug 22, 2024 · 0 comments
Labels
protocol Protocol Team tickets

Comments

@michaelsproul

🐞 Bug Report

Description

Based on reports from Stakely, Charon sometimes times out when requesting validator info from Lighthouse:

14:51:30.489 ERRO vapi Validator api 5xx response: fetching non-cached validators from BN: beacon api validators: http request timeout: context deadline exceeded {"status_code": 500, "message": "Internal server error", "duration": "2.001067392s", "label": "validators", "url": "http://beaconnode:5051/eth/v1/beacon/states/head/validators?id=XXX", "method": "Get", "vapi_endpoint": "get_validator"}
app/eth2wrap/eth2wrap.go:206 .wrapError
app/eth2wrap/eth2wrap_gen.go:648 .Validators
core/validatorapi/validatorapi.go:1021 .Validators
core/validatorapi/router.go:1372 .getValidatorsByID
core/validatorapi/router.go:388 .func7
core/validatorapi/router.go:311 .func1
app/app.go:954 .func2

Part of the reason for this is that Lighthouse considers these requests low-priority. I have opened a PR on Lighthouse to change this:

However, in the meantime I think there are probably some changes Charon could make to improve reliability.

The error log seems to show a request for /eth/v1/beacon/states/head/validators?id=XXX with a single ID. It's possible that timeouts could be avoided by batching multiple IDs in one request. Based on my reading of go-eth2-client, it already has the ability to use the more efficient POST method which can handle an unbounded number of pubkey requests in one go:

https://github.com/attestantio/go-eth2-client/blob/490d07a8e0c258f4528d3039109696679d79787d/http/validators.go#L81

Further, it could be good to give charon users the ability to adjust the timeouts used for communicating with the beacon node. I couldn't find in the charon code where the timeout is set, but it seems to be 2s based on the error. On beacon nodes that are struggling under load (or heavily deprioritising charon's requests as in the case of Lighthouse) a fixed timeout that is too short is just going to lead to indefinitely repeating requests. Giving users the ability to lengthen this timeout could mitigate this. Dynamic timeouts a la exponential backoff could also be an option, but are more complicated to implement.

Has this worked before in a previous version?

Not sure.

🔬 Minimal Reproduction

Run charon with Lighthouse and 1000+ inactive validator keys.

🔥 Error

See above.

🌍 Your Environment

Not sure. I can check with Stakely.

@github-actions github-actions bot added the protocol Protocol Team tickets label Aug 22, 2024