
[Fleet] Improve data streams API efficiency #116428

Open
hop-dev opened this issue Oct 27, 2021 · 13 comments
Labels: bug (Fixes for quality problems that affect the customer experience), impact:medium (Addressing this issue will have a medium level of impact on the quality/strength of our product), Team:Fleet (Team label for Observability Data Collection Fleet team)

Comments

@hop-dev (Contributor) commented Oct 27, 2021

Kibana version:

7.15.0, 7.16.0, master

Description of the problem including expected versus actual behavior:

Originally pointed out by @joshdover here:

The data stream view can be quite slow to load when there are a lot of streams. We currently get all data streams in one request without pagination and perform an aggregation per data stream.

This issue is to look into ways of improving the performance, current options discussed:

1. Using the data stream name to extract the type, dataset and namespace instead of aggregating (see the name-parsing sketch after this list)

Currently, there is no guarantee that the constant_keyword values in the data match the data stream name. @ruflin suggested we could put in a feature request for Elasticsearch to validate the constant_keyword values against the data stream name, which would allow us to rely on this link.

However, we are now looking at adding another aggregation as part of elastic/integrations#768, so there may no longer be a big efficiency gain to be found here.

2. Introducing pagination

We could introduce pagination to limit the work we do per request; however, there would be some challenges.

3. Combine individual aggregations into one aggregation
I am not sure this is possible. We would need to find a way to use filters and sub-aggregations to get the namespace, dataset and type for each data stream in one query. I believe we would need to distinguish each data stream using a filter query, and the only way to distinguish them would be to use the very values we are querying for! (A possible single-query shape is sketched below.)
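
Regarding option 3: one possible shape for a single query, offered only as a sketch, would be to aggregate on the data_stream.* constant_keyword fields directly, since each unique (type, dataset, namespace) tuple corresponds to exactly one data stream. This assumes Elasticsearch 7.12+ (where the multi_terms aggregation exists) and the 8.x @elastic/elasticsearch JS client; the index pattern and sizes below are placeholder assumptions, not the Fleet code.

```typescript
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

// Sketch only: one search across all backing indices instead of one
// aggregation request per data stream.
async function getDataStreamSummaries() {
  const resp = await client.search({
    index: 'logs-*-*,metrics-*-*,traces-*-*', // placeholder index pattern
    size: 0,
    aggs: {
      datastreams: {
        multi_terms: {
          // Each unique (type, dataset, namespace) tuple maps to one data stream.
          terms: [
            { field: 'data_stream.type' },
            { field: 'data_stream.dataset' },
            { field: 'data_stream.namespace' },
          ],
          size: 10000,
        },
        aggs: {
          // Last-activity timestamp per data stream, in the same request.
          last_activity: { max: { field: '@timestamp' } },
        },
      },
    },
  });

  const buckets = (resp.aggregations?.datastreams as any)?.buckets ?? [];
  return buckets.map((b: any) => ({
    type: b.key[0],
    dataset: b.key[1],
    namespace: b.key[2],
    lastActivity: b.last_activity?.value_as_string,
  }));
}
```

Whether one large aggregation like this is actually cheaper than many small ones would need benchmarking; it avoids the per-stream round trips but still fans out to every backing index.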
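
Regarding option 1: the naming scheme means the fields can in principle be derived from the name itself, with the caveat above that nothing guarantees the indexed constant_keyword values match the name. A minimal sketch, assuming the standard {type}-{dataset}-{namespace} format:

```typescript
// Sketch only: derive type/dataset/namespace from a data stream name such as
// "logs-nginx.access-default". Assumes the namespace contains no dashes; if
// both dataset and namespace contain dashes, the split is ambiguous.
interface DataStreamNameParts {
  type: string;
  dataset: string;
  namespace: string;
}

function parseDataStreamName(name: string): DataStreamNameParts | undefined {
  const first = name.indexOf('-');
  const last = name.lastIndexOf('-');
  if (first === -1 || first === last) {
    return undefined; // not in the expected {type}-{dataset}-{namespace} form
  }
  return {
    type: name.slice(0, first),
    // The dataset may itself contain dashes, so take everything between the
    // first and last separators.
    dataset: name.slice(first + 1, last),
    namespace: name.slice(last + 1),
  };
}

// parseDataStreamName('logs-nginx.access-default')
// => { type: 'logs', dataset: 'nginx.access', namespace: 'default' }
```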

Steps to reproduce:

  1. Set up Fleet & Fleet Server
  2. Create an agent policy with many integrations to create many data streams
  3. Go to /app/fleet/data-streams
  4. Note that the page can be quite slow to load
@hop-dev added the Team:Fleet label on Oct 27, 2021
@elasticmachine (Contributor)

Pinging @elastic/fleet (Team:Fleet)

@joshdover (Contributor)

@elastic/kibana-stack-management have you all solved the problem of optimizing your usage of the Data Streams stats API? I noticed that by default, stats are excluded from your Data Streams UI (you have to switch on a toggle in the top right). Curious if there's any history behind this decision and whether we should also consider excluding stats by default or removing them from the list view entirely.

@cjcenizal (Contributor)

@joshdover We haven't had an opportunity to revisit that functionality since it was first implemented. Because loading the data stream stats requires hitting a separate API (https://github.com/elastic/kibana/pull/75107/files#diff-0db7f035e2e41be22bac202848c325fabf209f626b8a934d09cce5e9e074941bR34), and I think the stats themselves might take a while to fetch, retrieving the data streams along with their stats can be slow. I recommend pinging the ES Data Management team for more detailed and up-to-date info.

@joshdover changed the title from "[Fleet] Improve data streams API efficiencey" to "[Fleet] Improve data streams API efficiency" on Jan 26, 2022
@joshdover (Contributor)

This continues to be a problem for what I expect to be most Fleet customers. In my test cluster, I have ~60 data streams with ~300 backing indices, and the request to GET /api/fleet/data_streams times out in Kibana after 2 minutes, resulting in a 502 error in Cloud, likely from the proxy layer (backend closed connection).

I don't think this is anywhere close to a large amount of data (I'm only ingesting data from ~6 integrations on 2 laptops that aren't even always in use).

@jen-huang I'm going to add this to our iteration board to look at in the next testing cycle. I think we should try to get a fix in for the 7.x series as well.

@joshdover added the bug and impact:medium labels on Jan 26, 2022
@joshdover (Contributor)

joshdover commented Jan 26, 2022

I did some digging in our production data and I'm seeing that about 2.5% of customers who attempted to use this page were affected by this bug in the last 7 days. I haven't dug further, but my guess is this affects our largest, most mature adopters of Fleet, an important segment. While the incidence rate isn't incredibly high, the implied 97.5% success rate isn't exactly a great SLA. I think prioritizing this is the right call.

@thunderwood19

@joshdover

Any update on this? I am one of the affected customers who relies heavily on Fleet. If I can help with any logs/testing, I would be more than happy to!

@joshdover (Contributor)

Hi @thunderwood19, we have this prioritized to be worked on soon but have not yet dug in further. In the meantime, I do suggest using the UI in Stack Management > Index Management > Data streams.


Related to this, in #126067 it was discovered that the user needs the manage cluster privilege in order to call the Data stream stats API. This limits the usability of this page now that we're allowing non-superusers to use Fleet.

I think this requirement gives us further reason to explore decoupling the request to the Data stream stats API from fetching the list of data streams. If we loaded the stats separately, we may be able to show the main list quicker while also providing a more progressive UI for users with lower privileges.
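
A rough sketch of what that decoupling could look like on the client side. This is a sketch under assumptions: it presumes the existing GET /api/fleet/data_streams response shape of { data_streams: [...] }, that the list call no longer embeds stats, and a /api/fleet/data_streams/stats endpoint that is hypothetical and used purely for illustration.

```typescript
// Sketch only: render the list immediately, then hydrate stats in a second,
// privilege-gated call so users without the `manage` cluster privilege still
// see the basic table.
interface DataStreamRow {
  name: string;
  sizeBytes?: number; // filled in later if the stats call succeeds
}

type HttpGet = (path: string) => Promise<any>;

async function loadDataStreamsPage(
  httpGet: HttpGet,
  render: (rows: DataStreamRow[]) => void
): Promise<void> {
  // 1. Fast call: the list of data streams (assumes no embedded stats).
  const { data_streams } = await httpGet('/api/fleet/data_streams');
  let rows: DataStreamRow[] = data_streams.map((ds: any) => ({ name: ds.name }));
  render(rows); // the table is usable before stats arrive

  // 2. Slower, privilege-gated call (hypothetical endpoint).
  try {
    const stats = await httpGet('/api/fleet/data_streams/stats');
    rows = rows.map((row) => ({ ...row, sizeBytes: stats[row.name]?.size_in_bytes }));
    render(rows);
  } catch {
    // e.g. 403 for users lacking the privilege: keep showing the basic list.
  }
}
```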

@joshdover (Contributor)

@thunderwood19 Have you had a chance to test this on 8.1? We've made some improvements and I'm no longer seeing this issue as widespread in our production data or in my personal cluster on Elastic Cloud.

@thunderwood19

> @thunderwood19 Have you had a chance to test this on 8.1? We've made some improvements and I'm no longer seeing this issue as widespread in our production data or in my personal cluster on Elastic Cloud.

Yep! I let my support know yesterday; I can see the data streams via the Fleet GUI just fine now on 8.1.0.

@joshdover (Contributor)

Fantastic to hear. @jen-huang I'm going to de-prioritize this for now.

@joshdover (Contributor)

Some improvements are being made in #130973 to switch to using the terms enum API instead of aggregations for some of the calculations, which increases the request count but should be a big improvement on overall perf.

Pagination would still be welcome to avoid the N+1 query problem we have right now.
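
For reference, the terms enum API can enumerate the values of a keyword/constant_keyword field cheaply because it reads the field's terms dictionary rather than running an aggregation over documents. A sketch of what such a call looks like with the 8.x @elastic/elasticsearch JS client; the index pattern and size are assumptions, not the actual code from #130973.

```typescript
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

// Sketch only: list the distinct namespaces present across logs data streams
// via the terms enum API instead of a terms aggregation.
async function getLogNamespaces(): Promise<string[]> {
  const resp = await client.termsEnum({
    index: 'logs-*-*', // placeholder index pattern
    field: 'data_stream.namespace',
    size: 1000, // caps the number of returned terms
  });
  return resp.terms;
}
```

The trade-off noted above applies: one such request per field and index pattern raises the request count, but the overall cost should still be lower than the aggregation-based approach.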

@nimarezainia (Contributor)

@joshdover What remains for us to do in this regard? Should we track this for 8.5 (for Fleet scaling)?

@joshdover (Contributor)

I think we mostly need to do the pagination work at this point. I don't think it's super high priority right now though; it doesn't affect control plane scaling, mostly the data plane.
