Asynchronous processor and exporter for query insights data #11296

ansjcy · 2023-11-21T21:41:40Z

Is your feature request related to a problem? Please describe.
We haven't provided an efficient way to export the “top queries with latency” data collected in #11295. A generic, asynchronous processor and exporter should be created to handle this data for query insights.
(Parent RFC: #11186 )

Describe the solution you'd like
As part of #11186, we need to implement an asynchronous processor and exporter to handle the data for query insight features. In the first iteration, the processor should handle query latency data asynchronously and enqueue it to the aggregator implemented in #11295, and also export the aggregated data to an OpenSearch index. This framework can potentially be used by other query insight features in the future to avoid adding blocking logic to the core search path.
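To make the "no blocking logic in the search path" idea concrete, here is a minimal sketch in Python (the plugin itself is Java, so this is illustration only). `AsyncProcessor`, `submit`, and the `aggregate` callback are hypothetical names, not part of any OpenSearch API. The search path only does a non-blocking enqueue; a background worker hands records to the aggregator, and records are dropped rather than delaying a search when the queue is full.

```python
import queue
import threading


class AsyncProcessor:
    """Hypothetical processor: the search path enqueues, a worker aggregates."""

    def __init__(self, aggregate, capacity=1000):
        self._records = queue.Queue(maxsize=capacity)
        self._aggregate = aggregate  # assumed callback, e.g. the top-N aggregator
        self.dropped = 0             # records rejected because the queue was full
        self.failed = 0              # records whose aggregation raised
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, record):
        """Called on the search path. Never blocks; returns False on drop."""
        try:
            self._records.put_nowait(record)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def _run(self):
        while True:
            record = self._records.get()
            if record is None:       # sentinel pushed by close()
                return
            try:
                self._aggregate(record)
            except Exception:
                self.failed += 1     # keep the worker alive on bad records

    def close(self):
        """Drain everything already queued, then stop the worker."""
        self._records.put(None)
        self._worker.join()


aggregated = []
processor = AsyncProcessor(aggregated.append)
processor.submit({"query": "match_all", "latency_ms": 42})
processor.close()
```

Dropping on overflow is one possible policy; whether losing a few records is acceptable for top queries is a design decision this sketch does not settle.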

Describe alternatives you've considered
In the future, we can potentially leverage the OPTL collector when it becomes available. With that we can send traces/spans to OPTL collectors, where the collector takes responsibility for necessary calculations, aggregations and export. This strategy could further reduce the impact on the OpenSearch process.

Additional context
Please refer to RFC: #11186

ansjcy · 2024-02-09T19:53:46Z

The processor is done as part of the plugin implementation: #11903

We need to follow up on the exporter:

Which exporters do we want to expose to users?
Configuration endpoints, etc.

ansjcy · 2024-03-21T21:57:08Z

I want to further elaborate on the exporter component in the query insights framework. As shown in the architecture diagram (see #11429), we want generic, asynchronous exporters to export the query insights data. In the first phase, we should focus on the use case of exporting top n queries data (since that’s the only insights data we have in memory now - #11904), while keeping in mind that the exporter should be generic enough to handle other use cases and export to other sinks.

Exporter types and configurations

Configuration endpoints should be provided for exporters for every type of top n queries (by latency, by resource usage, etc.). For development and debugging purposes, a debug exporter that exports to stdout will be provided.

search.insights.top_queries.latency.exporter.type: debug

Additionally, the initial exporter release will include a local index exporter with minimal configurations required, such as the rolling index pattern to export to. By default, we can store the top n queries data in a daily rolling index named topQueries-YYYY.MM.dd. The user can define their own prefix and date pattern to configure daily, weekly, or monthly rolling indexes:

search.insights.top_queries.latency.exporter.type: internal_opensearch
search.insights.top_queries.latency.exporter.config.index: "'my_top_queries-'YYYY.MM.dd"
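As a rough illustration of how such a pattern could resolve to a concrete index name, here is a small Python sketch. `resolve_index_name` is a hypothetical helper, not the plugin's code. It assumes single-quoted text is a literal prefix and that `YYYY`, `MM`, and `dd` mean calendar year, month, and day; real Java date formatters may interpret `YYYY` differently (for example as a week-based year), so treat the mapping as an assumption.

```python
import re
from datetime import date

# Assumed mapping from the pattern letters used above to strftime codes.
_TOKENS = {"YYYY": "%Y", "yyyy": "%Y", "MM": "%m", "dd": "%d"}

# One alternative per kind of pattern part: 'quoted literal', date token,
# or any other single character (such as the "." separators).
_PART = re.compile(r"'([^']*)'|(YYYY|yyyy|MM|dd)|(.)", re.DOTALL)


def resolve_index_name(pattern: str, day: date) -> str:
    """Expand a rolling index pattern for the given day (illustrative only)."""
    parts = []
    for literal, token, other in _PART.findall(pattern):
        if token:
            parts.append(day.strftime(_TOKENS[token]))
        elif other.isalpha():
            # Unquoted letters outside the supported tokens are rejected
            # instead of being silently copied into the index name.
            raise ValueError(f"unsupported pattern letter: {other!r}")
        else:
            parts.append(literal or other)
    return "".join(parts)


print(resolve_index_name("'my_top_queries-'YYYY.MM.dd", date(2024, 3, 21)))
# prints: my_top_queries-2024.03.21
```

With the same helper, a monthly rolling index would just omit the day, e.g. `"'my_top_queries-'YYYY.MM"`.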

Future implementations may include exporters for other sinks such as log4j, webhook, external_opensearch, etc., each with their specific configuration details. But we need to make sure the configuration endpoints under exporter.config for different sinks are as similar as possible.
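One way to keep the settings uniform across sinks is a single exporter interface plus a lookup by `type`. The Python sketch below is only an illustration of that shape; `Exporter`, `DebugExporter`, `EXPORTER_TYPES`, and `create_exporter` are hypothetical names, and only the debug (stdout) sink is filled in.

```python
import json
import sys
from abc import ABC, abstractmethod


class Exporter(ABC):
    """Hypothetical common interface: every sink takes the same config dict."""

    def __init__(self, config):
        self.config = config

    @abstractmethod
    def export(self, records):
        """Write a batch of top-queries records to the sink."""


class DebugExporter(Exporter):
    """Writes one JSON line per record to a stream (stdout by default)."""

    def __init__(self, config, stream=None):
        super().__init__(config)
        self._stream = stream if stream is not None else sys.stdout

    def export(self, records):
        for record in records:
            self._stream.write(json.dumps(record, sort_keys=True) + "\n")


# New sinks would register here under their `exporter.type` value.
EXPORTER_TYPES = {"debug": DebugExporter}


def create_exporter(settings):
    """Build an exporter from {'type': ..., 'config': {...}} style settings."""
    exporter_type = settings.get("type")
    if exporter_type not in EXPORTER_TYPES:
        raise ValueError(f"unknown exporter type: {exporter_type!r}")
    return EXPORTER_TYPES[exporter_type](settings.get("config", {}))


exporter = create_exporter({"type": "debug"})
exporter.export([{"latency_ms": 42, "query": "match_all"}])
```

Because every exporter receives the same `config` mapping, a future sink would only add its own keys under `exporter.config` without changing how exporters are created.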

Proposed Implementation

In the top n queries implementation, search request data is accumulated and then drained, at a fixed interval, into an in-memory priority queue. Once the data reaches the end of the window, the priority queue is rotated to the "last window snapshot". The asynchronous exporter logic can be triggered during this window rotation.
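The rotation hook described above can be sketched as follows. This is a simplified Python illustration, not the plugin's Java code: `TopQueriesWindow`, `add`, and `rotate` are hypothetical names, and the executor stands in for whatever thread pool the real implementation would use. A bounded min-heap keeps the n slowest queries; on rotation the heap becomes the last window snapshot and the export is handed to the executor so the caller does not wait on the sink.

```python
import heapq
import itertools
from concurrent.futures import ThreadPoolExecutor


class TopQueriesWindow:
    """Hypothetical top-N store that exports asynchronously on rotation."""

    def __init__(self, top_n, export, executor):
        self._top_n = top_n
        self._export = export          # assumed callback, e.g. exporter.export
        self._executor = executor
        self._heap = []                # min-heap of (latency_ms, seq, record)
        self._seq = itertools.count()  # tie-breaker so records are never compared
        self.last_window_snapshot = []

    def add(self, latency_ms, record):
        entry = (latency_ms, next(self._seq), record)
        if len(self._heap) < self._top_n:
            heapq.heappush(self._heap, entry)
        elif latency_ms > self._heap[0][0]:
            # Slower than the fastest query currently kept: replace it.
            heapq.heapreplace(self._heap, entry)

    def rotate(self):
        """End the current window and trigger the export in the background."""
        ordered = sorted(self._heap, reverse=True)
        snapshot = [(latency, record) for latency, _, record in ordered]
        self._heap = []
        self.last_window_snapshot = snapshot
        return self._executor.submit(self._export, snapshot)


exported = []
with ThreadPoolExecutor(max_workers=1) as pool:
    window = TopQueriesWindow(2, exported.append, pool)
    for latency_ms, query in [(10, "a"), (20, "b"), (30, "c")]:
        window.add(latency_ms, query)
    window.rotate().result()  # wait here only so the example is deterministic

print(exported)  # prints: [[(30, 'c'), (20, 'b')]]
```

The snapshot is a fresh list, so the export can run while the next window is already accumulating data.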

Further consideration:

Access Control: We need to implement access control for the top queries indices to restrict access to authorized personnel only. This ensures data security and compliance with privacy regulations.
Retention Policy: We should set up index lifecycle policies to manage the retention and deletion of old top queries indices. This helps optimize storage usage and ensures data is retained for the required duration based on business and regulatory requirements.

deshsidd · 2024-03-30T00:18:12Z

@ansjcy Thanks for the proposed exporter solution and the PR that followed. The proposal looks good to me overall. For the following setting: search.insights.top_queries.latency.exporter.type: internal_opensearch, do we want to be more specific and use internal_opensearch_index, local_opensearch_index, or something along these lines?

Furthermore, are we planning to have a different index for top queries by CPU and memory in the future, or to use the same index? If it's the former, we may want to make this clear in the naming here: search.insights.top_queries.latency.exporter.config.index: "'my_top_queries-'YYYY.MM.dd"

ansjcy added the enhancement and untriaged labels Nov 21, 2023

ansjcy added this to Performance Roadmap Nov 21, 2023

ansjcy moved this to Now (This Quarter) in Performance Roadmap Nov 21, 2023

ansjcy self-assigned this Nov 21, 2023

ansjcy added the Search:Query Insights label Nov 21, 2023

This was referenced Dec 7, 2023

[Draft] Query Insight Plugin with Top Queries feature #11506

Closed

[META] Generic Query Insights Framework #11522

Open

msfroh removed the untriaged label Jan 31, 2024

github-project-automation bot added this to Search Project Board Jan 31, 2024

github-project-automation bot moved this to 🆕 New in Search Project Board Jan 31, 2024

getsaurabh02 moved this from Now (This Quarter) to In Progress in Performance Roadmap Mar 22, 2024

ansjcy mentioned this issue Mar 29, 2024

Query insights exporters implementation #12982

Merged

8 tasks

getsaurabh02 moved this from In Progress to In-Review in Performance Roadmap Apr 8, 2024

ansjcy mentioned this issue Apr 9, 2024

Add index permissions for query insights exporters opensearch-project/security#4229

Merged

3 tasks

ansjcy changed the title ~~Asynchronous processor and exporter for query insight data~~ Asynchronous processor and exporter for query insights data Apr 10, 2024

getsaurabh02 added the v2.15.0 label May 28, 2024

getsaurabh02 added this to OpenSearch Roadmap May 31, 2024

github-project-automation bot moved this to Planned work items in OpenSearch Roadmap May 31, 2024

msfroh closed this as completed in #12982 Jun 6, 2024

github-project-automation bot moved this from 🆕 New to ✅ Done in Search Project Board Jun 6, 2024

github-project-automation bot moved this from In-Review to Done in Performance Roadmap Jun 6, 2024
