[Stack Monitoring] More detailed event loop diagnostics #134452
Comments
Pinging @elastic/kibana-core (Team:Core)
Putting the "Platform Observability" label on this. When we decide to add this, it may be a metric from platform observability or added to the existing Stack Monitoring app.
Specifically for testing performance optimisations, the maximum recorded event loop delay is often more useful than the mean: a single request causing an event loop delay of more than 1s is a problem even if the mean over 5s is much lower.
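To illustrate the point about mean versus max, here is a minimal sketch. The sample values, the 100 ms tick and the helper names `meanOf` and `maxOf` are made up for illustration and are not taken from Kibana's code.

```typescript
// Hypothetical samples: event loop delay in milliseconds, one per 100 ms tick
// over a 5 s collection window (50 samples). A single tick was blocked for 1200 ms.
const delaysMs: number[] = Array.from({ length: 50 }, (_, i) => (i === 20 ? 1200 : 2));

// Arithmetic mean of the samples.
function meanOf(values: number[]): number {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Largest sample in the window.
function maxOf(values: number[]): number {
  return values.reduce((m, v) => Math.max(m, v), -Infinity);
}

const meanDelayMs = meanOf(delaysMs); // (49 * 2 + 1200) / 50 = 25.96
const maxDelayMs = maxOf(delaysMs); // 1200

console.log({ meanDelayMs, maxDelayMs });
```

With these numbers the mean stays below 26 ms while the max shows the 1.2 s stall, which is the spike the comment is concerned about.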
There does not seem to be an easy way to pinpoint the line in the code that blocked the event loop. However, we can use our existing monitoring infrastructure to try to correlate event loop delays with ongoing / recent requests. We can leverage the information stored in our overview cluster:
Thus, whenever we detect a substantial event loop delay on a given project, we can search the proxy logs and list the requests that were taking place around that time. If Kibana has been blocked for, say, 10 seconds, there must be a request that took at least 10 seconds to respond. While this does not directly pinpoint the line of code that caused the delay, it is a good starting point for investigating and dispatching to the right team. This is the goal of the newly introduced [Serverless] Event Loop Delays dashboard (see PR). Also, in line with recent discussions, and with Rudolf's last comment, I am updating the
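The correlation described above could be sketched roughly as follows. The `ProxyLogEntry` and `DelayEvent` shapes, their field names, and the `candidateRequests` helper are assumptions for illustration only; the real proxy log schema and dashboard query are not shown in this thread.

```typescript
// Hypothetical shape of a proxy log record (field names are assumptions).
interface ProxyLogEntry {
  path: string;
  startMs: number; // request start, epoch milliseconds
  durationMs: number; // time taken to respond
}

// Hypothetical shape of a detected event loop delay.
interface DelayEvent {
  detectedAtMs: number; // when the delay was recorded, epoch milliseconds
  delayMs: number; // how long the event loop was blocked
}

// List the requests that overlap the blocked window, slowest first.
function candidateRequests(event: DelayEvent, logs: ProxyLogEntry[]): ProxyLogEntry[] {
  const blockedFromMs = event.detectedAtMs - event.delayMs;
  return logs
    .filter((r) => {
      const endMs = r.startMs + r.durationMs;
      // Overlap test: started before the delay was recorded and
      // finished after the blocked window began.
      return r.startMs <= event.detectedAtMs && endMs >= blockedFromMs;
    })
    .sort((a, b) => b.durationMs - a.durationMs);
}

// Made-up data: a 10 s delay recorded at t = 100 000 ms.
const sampleEvent: DelayEvent = { detectedAtMs: 100_000, delayMs: 10_000 };
const sampleLogs: ProxyLogEntry[] = [
  { path: "/api/slow", startMs: 89_000, durationMs: 11_500 },
  { path: "/api/status", startMs: 95_000, durationMs: 6_000 },
  { path: "/api/early", startMs: 50_000, durationMs: 200 },
  { path: "/api/late", startMs: 101_000, durationMs: 50 },
];

const suspects = candidateRequests(sampleEvent, sampleLogs);
console.log(suspects.map((r) => r.path));
```

Sorting by duration puts the requests that lasted at least as long as the delay at the top, which are the most likely starting points. Requests that merely arrived while the loop was blocked also appear in the list, so the result narrows the search rather than identifying a culprit.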
## Summary

Part of elastic#134452

By using `mean` we're missing out on relevant spikes in event loop delays.
want to flag that we can already correlate event loop delay with requests via
There are two ways to get more details about Kibana's event loop:
`metrics.ops` logger

The `metrics.ops` logger will log the following fields:
It would be useful if both diagnostics contained all the values in the `IntervalHistogram` type:
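As a rough sketch of what exposing all histogram values could look like, the snippet below assumes the type mirrors the histogram returned by Node's `monitorEventLoopDelay()` from `perf_hooks`, which reports `min`, `max`, `mean`, `stddev`, `exceeds` and percentiles in nanoseconds. The `HistogramLike` interface, the `EventLoopDiagnostics` output shape and the `toDiagnostics` helper are hypothetical names and are not Kibana's actual types.

```typescript
// Structural subset of the histogram returned by Node's
// `monitorEventLoopDelay()`; all values are in nanoseconds.
interface HistogramLike {
  min: number;
  max: number;
  mean: number;
  stddev: number;
  exceeds: number; // count of samples above the histogram's upper bound
  percentile(p: number): number;
}

// Hypothetical output shape, in milliseconds, for a log line or API response.
interface EventLoopDiagnostics {
  min_ms: number;
  max_ms: number;
  mean_ms: number;
  stddev_ms: number;
  exceeds: number;
  p50_ms: number;
  p75_ms: number;
  p95_ms: number;
  p99_ms: number;
}

const NS_PER_MS = 1e6;

// Convert every histogram value to milliseconds so that no field is dropped.
function toDiagnostics(h: HistogramLike): EventLoopDiagnostics {
  const ms = (ns: number): number => ns / NS_PER_MS;
  return {
    min_ms: ms(h.min),
    max_ms: ms(h.max),
    mean_ms: ms(h.mean),
    stddev_ms: ms(h.stddev),
    exceeds: h.exceeds,
    p50_ms: ms(h.percentile(50)),
    p75_ms: ms(h.percentile(75)),
    p95_ms: ms(h.percentile(95)),
    p99_ms: ms(h.percentile(99)),
  };
}
```

In a Node process, the object returned by `monitorEventLoopDelay({ resolution: 10 })` satisfies `HistogramLike` after `enable()` has been called, so it can be passed to `toDiagnostics` directly.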