Support longer checkin intervals when the agent status has not changed #2257
Comments
How will the action queue for scheduled actions be checked with a longer poll time? EDIT: I've added a separate timer to dispatch scheduled actions in a managed agent #2344
Changed the description to "Support longer checkin intervals when the agent status has not changed" since we aren't going to increase the default timeout when this issue closes.
After a quick clarification with @cmacknz:
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)
We've been doing scale testing over the past few months using a ~30 minute long poll duration (rather than the current default of 5m), and we are seeing much better results for very large clusters.
We're now ready to make this the default setting for Fleet Server and Agent. These changes can happen independently and do not necessarily need to land in the same release, though that would be preferred. The corresponding Fleet Server changes are tracked in:
There is some additional complexity to changing this on the Agent side, as we currently have an issue where Agent will not re-checkin with Fleet Server when its health status changes. If we update the long polling interval to 30 minutes, this could result in the agent status in the UI being up to 30 minutes stale, rather than only 5 minutes stale.
To avoid this kind of regression, we need to update Agent to also cancel the current checkin and start a new one when its status changes. However, we will cap this to at most once every 5 minutes to avoid any extra load on large Fleets. Whether Agent should report status changes more often than that will be investigated separately from this change, see #1946.
Tasks
The client-side timeout in Agent should be longer than both the Fleet Server long poll (28m) and the proxy's timeout (30m 20s), so that the client is never the first to give up on a healthy connection. We'll keep a similar buffer here of roughly 5 minutes over the proxy's timeout, which puts the client timeout at 35 minutes.
Change fleet.timeout to 35 minutes: elastic-agent/internal/pkg/remote/config.go, line 49 in c097697