Provide query history extension to QueryTracker #12185

Closed
qinghui-xu opened this issue Jan 7, 2019 · 5 comments

@qinghui-xu

The Presto UI fetches query history from QueryTracker. The current implementation of QueryTracker keeps recent queries in an in-memory data structure with a limited capacity. As a result, queries can expire from the history once they are older than the configured query.min-expire-age (default: 15 minutes).
Since query.min-expire-age cannot be set very high without the retained queries taking up the whole heap, we might need an extension to the current QueryTracker that can offload queries to a data store and fetch them back when needed.
Also, the current in-memory history is lost if the coordinator fails over or restarts, so keeping the history in a data store would also solve the persistence issue.

Judging from the community discussions, people are already asking the same question and trying to address it in an ad hoc way: https://groups.google.com/forum/#!searchin/presto-users/query$20history%7Csort:date

We therefore suggest adding a pluggable extension API for this purpose, and we could also provide plugin implementations backed by an RDBMS, Elasticsearch, or a KV store.
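A minimal sketch of what such a pluggable API could look like. `QueryRecord`, `QueryHistoryStore`, and `InMemoryQueryHistoryStore` are hypothetical names, not part of Presto's SPI. The in-memory default mirrors the capacity-bounded behavior described above by evicting the oldest entry first; it does not model query.min-expire-age. A plugin backed by an RDBMS, Elasticsearch, or a KV store would implement the same interface.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical value type: a minimal summary of a finished query.
final class QueryRecord {
    final String queryId;
    final String state;
    final String sql;

    QueryRecord(String queryId, String state, String sql) {
        this.queryId = queryId;
        this.state = state;
        this.sql = sql;
    }
}

// Hypothetical plugin interface for query history (not an existing Presto API).
interface QueryHistoryStore {
    void store(QueryRecord record);

    Optional<QueryRecord> get(String queryId);
}

// Default implementation: bounded in-memory history.
// Insertion-ordered LinkedHashMap, so the oldest stored query is evicted first.
final class InMemoryQueryHistoryStore implements QueryHistoryStore {
    private final Map<String, QueryRecord> records;

    InMemoryQueryHistoryStore(int maxEntries) {
        this.records = new LinkedHashMap<String, QueryRecord>(16, 0.75f, false) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, QueryRecord> eldest) {
                // Drop the oldest entry once the capacity limit is exceeded.
                return size() > maxEntries;
            }
        };
    }

    @Override
    public synchronized void store(QueryRecord record) {
        records.put(record.queryId, record);
    }

    @Override
    public synchronized Optional<QueryRecord> get(String queryId) {
        return Optional.ofNullable(records.get(queryId));
    }
}
```

Keeping the in-memory store as just one implementation of the interface is what lets it stay the out-of-the-box default while persistent stores are swapped in through configuration.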

@findepi
Contributor

findepi commented Jan 7, 2019

What you suggest requires an external store (e.g. an object store like S3) and configuring Presto so that it can read from and write to it.

I would recommend:

  1. create an event listener that dumps your query info into an object store (e.g. S3)
    • of course, this calls for a reusable component, so that we don't have to reinvent this piece over and over
  2. configure a Hive connector so that you can read the information back
  3. query your historical query info using SQL

Then, would you still need to change how in-memory query tracking works?
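Step 1 could look roughly like the following. In Presto, the actual hook is the `EventListener` SPI and its `queryCompleted` callback; to keep the sketch self-contained, `CompletedQuery` and `JsonLinesHistoryWriter` are hypothetical stand-ins, and the output format (one JSON object per line, written to any `Writer`) is an assumption chosen so that a JSON-format Hive table with matching columns could expose the data to SQL in steps 2 and 3.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.time.Instant;

// Hypothetical, simplified stand-in for the data carried by a query-completed event.
final class CompletedQuery {
    final String queryId;
    final String user;
    final String sql;
    final Instant endTime;

    CompletedQuery(String queryId, String user, String sql, Instant endTime) {
        this.queryId = queryId;
        this.user = user;
        this.sql = sql;
        this.endTime = endTime;
    }
}

// Hypothetical listener: appends one JSON object per completed query to a sink
// (in practice a file or an object-store upload stream).
final class JsonLinesHistoryWriter {
    private final Writer sink;

    JsonLinesHistoryWriter(Writer sink) {
        this.sink = sink;
    }

    synchronized void queryCompleted(CompletedQuery query) {
        String line = "{\"query_id\":\"" + escape(query.queryId) + "\","
                + "\"user\":\"" + escape(query.user) + "\","
                + "\"query\":\"" + escape(query.sql) + "\","
                + "\"end_time\":\"" + query.endTime + "\"}\n";
        try {
            sink.write(line);
            sink.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Escapes quotes, backslashes, and control characters so each record stays on one line.
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"': out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }
}
```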

@piyushnarang
Contributor

Thanks for getting back to us, @findepi. The scenario we have in mind is that we want to be able to view queries issued in the past (maybe a day or two back) in the Presto UI, so that we can use the UI to dig into which stages took more time, etc. This is also useful for our users: they get a query link when submitting a query, and that link could potentially still work a couple of days later.

We do have a hook in place that writes completed queries to Kafka and then to HDFS, but I'm not sure whether we can view these historical queries in the Web UI (the /ui/query.html?query_id endpoint).

We were looking at something similar to the YARN / Spark job history (with much more limited retention), as it's something our users ask for a lot: their query links expire within a couple of hours on our cluster.

@qinghui-xu
Author

Thanks for the quick feedback.
We find the Presto UI very nice and helpful for tracking the behavior of user queries. For a typical user, it would be much more convenient to have a single place to see query history, especially when they already have a Presto UI link for a specific query that they want to share with us (for troubleshooting or tuning). The pain point is that the query often expires a few hours after completion, and users are confused.
As Piyush mentioned, we can set up a hook to export queries somewhere, but it is currently not possible to display those queries in the UI.
Also, this question seems to come up from time to time: https://groups.google.com/forum/#!searchin/presto-users/query$20history%7Csort:date
So it would be nice to have this capability and to avoid people solving it in an ad hoc way.

Regarding the query tracker itself, I think the current behavior (keeping history in memory) should remain the out-of-the-box default. My suggestion is to have other implementations that can be plugged in according to users' needs.

@dain
Contributor

dain commented Jan 8, 2019

Assuming we want to do this, we would likely need to plug it in at the REST interface rather than in the query tracker (which is part of the query manager). Query management is already a very complex system for the currently running queries, and I would not want to burden it further with query history.
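A minimal sketch of that layering, assuming a hypothetical `QueryInfoResolver` sitting behind the query-info REST endpoint: the live query manager is consulted first, and the history store is only a fallback for queries it no longer knows about, so the query management code itself stays untouched. Both lookups are plain `Function`s and query info is reduced to a string, purely to keep the example self-contained.

```java
import java.util.Optional;
import java.util.function.Function;

// Hypothetical resolver used by a REST resource; neither lookup is a real Presto API.
final class QueryInfoResolver {
    private final Function<String, Optional<String>> liveQueries;
    private final Function<String, Optional<String>> queryHistory;

    QueryInfoResolver(Function<String, Optional<String>> liveQueries,
                      Function<String, Optional<String>> queryHistory) {
        this.liveQueries = liveQueries;
        this.queryHistory = queryHistory;
    }

    // Prefer the query manager's view; fall back to history only when it has no entry.
    Optional<String> resolve(String queryId) {
        Optional<String> live = liveQueries.apply(queryId);
        if (live.isPresent()) {
            return live;
        }
        return queryHistory.apply(queryId);
    }
}
```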

@stale

stale bot commented Jan 10, 2021

This issue has been automatically marked as stale because it has not had any activity in the last 2 years. If you feel that this issue is important, just comment and the stale tag will be removed; otherwise it will be closed in 7 days. This is an attempt to ensure that our open issues remain valuable and relevant so that we can keep track of what needs to be done and prioritize the right things.

stale bot added the stale label on Jan 10, 2021
stale bot closed this as completed on Jan 17, 2021

4 participants