Asynchronous processor and exporter for query insights data #11296

Closed
ansjcy opened this issue Nov 21, 2023 · 3 comments · Fixed by #12982
Assignees
Labels
enhancement Enhancement or improvement to existing feature or request Search:Query Insights v2.15.0 Issues and PRs related to version 2.15.0

Comments

@ansjcy
Member

ansjcy commented Nov 21, 2023

Is your feature request related to a problem? Please describe.
We haven't provided an efficient way to export the “top queries with latency” data collected in #11295. A generic, asynchronous processor and exporter should be created to handle this data for query insights.
(Parent RFC: #11186 )

Describe the solution you'd like
As part of #11186, we need to implement an asynchronous processor and exporter to handle the data for query insights features. In the first iteration, the processor should be able to handle query latency data asynchronously, enqueue it to the aggregator implemented in #11295, and export the aggregated data to an OpenSearch index. This framework can potentially be used by other query insights features in the future to avoid adding blocking logic to the core search path.
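To illustrate the non-blocking handoff described above, here is a minimal sketch in Python rather than the plugin's Java. The `AsyncProcessor` class, its bounded queue, and the drop-when-full policy are all assumptions made for illustration, not the design implemented in #11903.

```python
import queue
import threading


class AsyncProcessor:
    """Hypothetical sketch: move records off the request path via a bounded queue."""

    def __init__(self, handler, capacity=1000):
        self._queue = queue.Queue(maxsize=capacity)
        self._handler = handler  # e.g. the aggregator's "add" method
        self.dropped = 0
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, record):
        # Never block the caller: drop the record if the queue is full (assumed policy).
        try:
            self._queue.put_nowait(record)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def _run(self):
        # Background thread: drain records in order until the None sentinel arrives.
        while True:
            record = self._queue.get()
            if record is None:
                return
            self._handler(record)

    def close(self):
        # Send the sentinel and wait for the worker to finish draining.
        self._queue.put(None)
        self._worker.join()
```

The search path would only ever call `submit`, which returns immediately. Aggregation and export cost is paid on the worker thread.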

Describe alternatives you've considered
In the future, we can potentially leverage the OpenTelemetry (OTel) collector when it becomes available. With that, we can send traces/spans to OTel collectors, where the collector takes responsibility for the necessary calculations, aggregations, and export. This strategy could further reduce the impact on the OpenSearch process.

Additional context
Please refer to RFC: #11186

@ansjcy
Member Author

ansjcy commented Feb 9, 2024

The processor is done as part of the plugin implementation: #11903

We need to follow up on the exporter:

  • which exporters we want to expose to the users
  • configuration endpoints etc.

@ansjcy
Member Author

ansjcy commented Mar 21, 2024

I want to further elaborate on the exporter component in the query insights framework. As shown in the architecture diagram (see #11429), we want generic, asynchronous exporters to export the query insights data. In the first phase, we should focus on exporting top n queries data (since that’s the only insights data we have in memory now - #11904), while keeping in mind that the exporter should be generic enough to handle other use cases and export to other sinks.

Exporter types and configurations

Configuration endpoints should be provided for exporters for every type of top n queries (by latency, by resource usage, etc.). For development and debugging purposes, a debug exporter that exports to stdout will be provided.

search.insights.top_queries.latency.exporter.type: debug

Additionally, the initial exporter release will include a local index exporter with minimal configurations required, such as the rolling index pattern to export to. By default, we can store the top n queries data in a daily rolling index named topQueries-YYYY.MM.dd. The user can define their own prefix and date pattern to configure daily, weekly, or monthly rolling indexes:

search.insights.top_queries.latency.exporter.type: internal_opensearch
search.insights.top_queries.latency.exporter.config.index: "'my_top_queries-'YYYY.MM.dd"
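As a rough illustration of how such a pattern could resolve to a concrete index name, the Python sketch below treats single-quoted text as a literal prefix and expands only the `YYYY`, `MM`, and `dd` tokens. The function name `resolve_index_name` and the token mapping are assumptions for illustration. The real exporter would presumably rely on a Java date formatter with its own pattern semantics.

```python
import re
from datetime import date

# Assumed mapping from the pattern letters used in the proposal to strftime codes.
_TOKENS = {"YYYY": "%Y", "MM": "%m", "dd": "%d"}


def resolve_index_name(pattern, day):
    """Expand a pattern such as "'my_top_queries-'YYYY.MM.dd" for the given date."""
    parts = []
    # Each match is either a single-quoted literal or an unquoted date section.
    for literal, dated in re.findall(r"'([^']*)'|([^']+)", pattern):
        if literal:
            parts.append(literal)  # literals are copied through unchanged
        else:
            parts.append(
                re.sub(r"YYYY|MM|dd", lambda m: day.strftime(_TOKENS[m.group(0)]), dated)
            )
    return "".join(parts)


print(resolve_index_name("'my_top_queries-'YYYY.MM.dd", date(2024, 3, 21)))
# prints my_top_queries-2024.03.21
```

Dropping `.dd` from the pattern yields one index per month, which is how a single setting can cover daily and monthly rollover.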

Future implementations may include exporters for other sinks such as log4j, webhook, external_opensearch, etc., each with their specific configuration details. But we need to make sure the configuration endpoints under exporter.config for different sinks are as similar as possible.
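One way to keep the `exporter.config` surface uniform across sinks is a small registry keyed by `exporter.type`, where every exporter takes the same config mapping and exposes the same `export` method. The Python sketch below is only an assumption about how that could look. The class names, `EXPORTER_TYPES`, and `create_exporter` are hypothetical, and the local index exporter is a stand-in that collects records instead of writing to OpenSearch.

```python
import json
import sys


class DebugExporter:
    """Writes each record as one JSON line to a stream (stdout by default)."""

    def __init__(self, config, stream=None):
        self.stream = stream or sys.stdout

    def export(self, records):
        for record in records:
            self.stream.write(json.dumps(record, sort_keys=True) + "\n")


class LocalIndexExporter:
    """Stand-in for the local index sink: only collects what would be indexed."""

    def __init__(self, config):
        self.index_pattern = config["index"]
        self.pending = []

    def export(self, records):
        self.pending.extend(records)


# Hypothetical registry: new sinks (log4j, webhook, ...) would add one entry each.
EXPORTER_TYPES = {
    "debug": DebugExporter,
    "internal_opensearch": LocalIndexExporter,
}


def create_exporter(settings):
    """Build an exporter from a {"type": ..., "config": {...}} mapping."""
    exporter_type = settings["type"]
    if exporter_type not in EXPORTER_TYPES:
        raise ValueError(f"unknown exporter type: {exporter_type}")
    return EXPORTER_TYPES[exporter_type](settings.get("config", {}))
```

With this shape, switching sinks changes only the `type` value and the keys under `config`. The calling code stays the same.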

Proposed Implementation

In the top n queries implementation, search request data is accumulated and then drained into an in-memory priority queue at a fixed interval. Once the data reaches the end of the window, the priority queue is rotated to the "last window snapshot". The asynchronous exporter logic can be triggered during this window rotation.
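The rotation hook can be sketched as follows. This is a simplified Python model, not the plugin's code: `TopQueriesWindow`, `on_rotate`, and the latency-keyed min-heap are assumptions, and the callback is invoked synchronously here, whereas the proposal would hand the snapshot to an asynchronous exporter.

```python
import heapq


class TopQueriesWindow:
    """Hypothetical model of a top-n window with an export hook on rotation."""

    def __init__(self, top_n, on_rotate):
        self.top_n = top_n
        self.on_rotate = on_rotate  # assumed hook, e.g. an exporter's export()
        self._heap = []  # min-heap of (latency_ms, seq, record)
        self._seq = 0  # tie-breaker so records themselves are never compared
        self.last_snapshot = []

    def add(self, latency_ms, record):
        item = (latency_ms, self._seq, record)
        self._seq += 1
        if len(self._heap) < self.top_n:
            heapq.heappush(self._heap, item)
        elif latency_ms > self._heap[0][0]:
            # Evict the current smallest latency to keep only the top n.
            heapq.heappushpop(self._heap, item)

    def rotate(self):
        # Freeze the current window (slowest first), start a new one, then export.
        self.last_snapshot = [rec for _, _, rec in sorted(self._heap, reverse=True)]
        self._heap = []
        self.on_rotate(self.last_snapshot)
        return self.last_snapshot
```

Because export happens only at rotation, the exporter sees at most `top_n` records per window regardless of query volume.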

Further consideration:

  • Access Control: We need to implement access control for the top queries indices to restrict access to authorized personnel only. This ensures data security and compliance with privacy regulations.
  • Retention Policy: We should set up index lifecycle policies to manage the retention and deletion of old top queries indices. This helps optimize storage usage and ensures data is retained for the required duration based on business and regulatory requirements.
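For the retention point, the selection logic could look roughly like the Python sketch below. It assumes daily indices named `<prefix>YYYY.MM.dd`. The function `expired_indices` and its parameters are hypothetical, and a real deployment would more likely delegate this to an index lifecycle policy than to custom code.

```python
from datetime import date, datetime, timedelta


def expired_indices(index_names, prefix, today, retention_days):
    """Return the daily indices whose date suffix is older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue  # not a top queries index
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date; leave the index alone
        if day < cutoff:
            expired.append(name)
    return expired
```

Indices that do not match the prefix or carry an unparseable suffix are skipped, so the sketch never selects anything it cannot positively date.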

@getsaurabh02 getsaurabh02 moved this from Now (This Quarter) to In Progress in Performance Roadmap Mar 22, 2024
@deshsidd
Contributor

@ansjcy Thanks for the proposed exporter solution and the PR that followed. The proposal looks good to me overall. For the setting search.insights.top_queries.latency.exporter.type: internal_opensearch, do we want to be more specific and use internal_opensearch_index, local_opensearch_index, or something along those lines?

Furthermore, are we planning to have a different index for top queries by CPU and memory in the future, or to use the same index? If it's the former, we may want to make this clear in the naming here: search.insights.top_queries.latency.exporter.config.index: "'my_top_queries-'YYYY.MM.dd"

@getsaurabh02 getsaurabh02 moved this from In Progress to In-Review in Performance Roadmap Apr 8, 2024
@ansjcy ansjcy changed the title Asynchronous processor and exporter for query insight data Asynchronous processor and exporter for query insights data Apr 10, 2024
@getsaurabh02 getsaurabh02 added the v2.15.0 Issues and PRs related to version 2.15.0 label May 28, 2024
@github-project-automation github-project-automation bot moved this to Planned work items in OpenSearch Roadmap May 31, 2024
@github-project-automation github-project-automation bot moved this from 🆕 New to ✅ Done in Search Project Board Jun 6, 2024
@github-project-automation github-project-automation bot moved this from In-Review to Done in Performance Roadmap Jun 6, 2024