RefreshStore can cause performance degradation on ES data nodes #5777

Open
tanasegabriel opened this issue Mar 15, 2019 · 0 comments


Current Behavior

Graylog has this neat feature where it can "stream" logs in real time, with a specified refresh interval:
image

If this is used with a query that returns a lot of results, a large number of search contexts are opened on each of the nodes being queried. If the refresh interval is very short, these contexts might not be cleared before the query is repeated, which puts a lot of stress on the nodes.

We ran a query that returned 6M results every two seconds, and this is the effect on our ES cluster of 35 data nodes:

image
image

This correlates with massive CPU spikes across all of the data nodes:
image

Elastic acknowledged this issue and added a soft limit for the maximum number of open search contexts. However, this was only released in version 6.6.0.

Expected Behavior

The "streaming" of logs should not cause performance degradation on ES's side.

Possible Solution

Graylog should limit the number of search contexts that it opens if the previous ones were not closed.
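As a rough illustration of that idea (not Graylog's actual code), the refresh timer could skip a tick whenever the previous search has not finished yet. The class name `GuardedRefresher` and the `run_search` callable are hypothetical names used only for this sketch:

```python
import threading


class GuardedRefresher:
    """Runs a search on each refresh tick, but never overlaps two runs."""

    def __init__(self, run_search):
        # run_search is a hypothetical callable that executes the query.
        self._run_search = run_search
        # Held for as long as a search is in progress.
        self._in_flight = threading.Lock()
        # Number of ticks dropped because a search was still running.
        self.skipped = 0

    def tick(self):
        """Call once per refresh interval. Returns True if a search ran."""
        if not self._in_flight.acquire(blocking=False):
            # The previous search has not finished: do not open more contexts.
            self.skipped += 1
            return False
        try:
            self._run_search()
        finally:
            self._in_flight.release()
        return True
```

With a guard like this, a 2-second interval against a query that takes longer than 2 seconds would simply drop the overlapping refreshes instead of stacking new search contexts on the data nodes.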

It would be great if Graylog would allow disabling / overriding the option to "stream" logs from the config file.

Alternatively, Graylog could allow disabling that option from the Search Configuration settings in the UI, or allow changing the available refresh interval options there (this is currently possible for Surrounding Timeranges and Relative Timeranges).
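A minimal sketch of how such an override might behave, assuming two hypothetical settings (`enabled` and `min_interval_ms`) that do not exist in Graylog today:

```python
def allowed_refresh_intervals(options_ms, enabled=True, min_interval_ms=0):
    """Return the refresh intervals (in ms) that the UI may offer.

    enabled and min_interval_ms are hypothetical admin settings:
    enabled=False hides the auto-refresh feature entirely, and
    min_interval_ms drops any interval shorter than the configured floor.
    """
    if not enabled:
        return []
    return [ms for ms in sorted(options_ms) if ms >= min_interval_ms]
```

For example, with options of 1, 2, 5 and 10 seconds and a floor of 5000 ms, only the 5 and 10 second intervals would remain selectable.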

Steps to Reproduce

  1. Create a "fat" index set containing at least 6M log lines and attach a stream to it.
  2. Run a query that returns all of the 6M log lines every 2 seconds.
  3. Observe the number of open search contexts and the resource usage on Elasticsearch.
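For step 3, one way to watch the open search contexts is the Elasticsearch nodes stats API, which reports `open_contexts` per node under `indices.search`. The sketch below assumes the cluster is reachable at `http://localhost:9200` without authentication; adjust the URL for your setup:

```python
import json
from urllib.request import urlopen


def open_contexts_by_node(stats):
    """Map node name -> open search contexts from a parsed nodes-stats response."""
    return {
        # Fall back to the node ID if the response carries no node name.
        node.get("name", node_id): node["indices"]["search"]["open_contexts"]
        for node_id, node in stats["nodes"].items()
    }


def fetch_open_contexts(base_url="http://localhost:9200"):
    # base_url is an assumption; point it at any node of the cluster.
    with urlopen(f"{base_url}/_nodes/stats/indices/search") as resp:
        return open_contexts_by_node(json.load(resp))
```

Calling `fetch_open_contexts()` repeatedly while the query refreshes should show whether the per-node counts keep climbing between refreshes.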

Context

Graylog seems to be causing damage to ES if there's a pattern of high usage and abusive queries.
There's no way to mitigate this unless we upgrade Elasticsearch, and even then we'll only be able to limit the number of open search contexts, which means ES will return errors once that limit is reached.

Your Environment

  • Graylog Version: 2.5.0
  • Elasticsearch Version: 6.3.0
  • MongoDB Version: 4.0.6
  • Operating System: server - Amazon Linux, client - macOS Mojave
  • Browser version: Google Chrome | 72.0.3626.121 (Official Build) (64-bit)
@tanasegabriel changed the title from "RefreshState can cause performance degradation on ES data nodes" to "RefreshStore can cause performance degradation on ES data nodes" on Mar 15, 2019