Provide query history extension to QueryTracker #12185
Comments
What you suggest requires setting up an external store (e.g. an object store such as S3) and configuring Presto so that it can read from and write to it. I would recommend:
then - would you still need to change how in-memory query tracking works?
Thanks for getting back, @findepi. The scenario we have in mind is being able to view queries issued in the past (maybe a day or two back) in the Presto UI, so that we can use the UI to dig into which stages took more time, etc. This is also useful for our users: they get a query link when submitting a query, and that link would potentially still work a couple of days later. We do have a hook in place to write completed queries out to Kafka and then HDFS, but I'm not sure whether we can view these historic queries using the Web UI (the We were looking at something similar to the YARN / Spark job history (with a much more limited retention, as it's something our users ask for a lot because their query links expire in a couple of hours on our cluster).
Thanks for the quick feedback. Regarding the query tracker itself, I think the current behavior (keeping history in memory) should remain the out-of-the-box default. My suggestion is to allow other implementations that can be plugged in according to users' needs.
Assuming we want to do this, we would likely need to plug it in at the REST interface instead of in the query tracker (which is in the query manager). Query management is already a very complex system for the currently running queries, and I would not want to further burden it with query history.
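To make the "plug in at the REST interface" idea concrete, here is a minimal sketch of a lookup that first asks the in-memory tracker and only then falls back to an external history store. `QueryInfoResolver` and both lookup functions are hypothetical names invented for illustration; they are not existing Presto classes, and query info is represented as a plain JSON string to keep the example self-contained.

```java
import java.util.Optional;
import java.util.function.Function;

// Hypothetical helper a REST resource could delegate to; not an existing Presto class.
class QueryInfoResolver
{
    // Assumed lookup into the in-memory query tracker (running and recent queries).
    private final Function<String, Optional<String>> liveLookup;
    // Assumed lookup into an external history store (older, expired queries).
    private final Function<String, Optional<String>> historyLookup;

    QueryInfoResolver(
            Function<String, Optional<String>> liveLookup,
            Function<String, Optional<String>> historyLookup)
    {
        this.liveLookup = liveLookup;
        this.historyLookup = historyLookup;
    }

    // Prefer the live answer; consult history only when the tracker no longer has the query.
    Optional<String> resolve(String queryId)
    {
        Optional<String> live = liveLookup.apply(queryId);
        if (live.isPresent()) {
            return live;
        }
        return historyLookup.apply(queryId);
    }
}
```

With this shape the query manager stays untouched: the history store is only ever consulted from the REST layer, after the in-memory lookup misses.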
This issue has been automatically marked as stale because it has not had any activity in the last 2 years. If you feel that this issue is important, just comment and the stale tag will be removed; otherwise it will be closed in 7 days. This is an attempt to ensure that our open issues remain valuable and relevant so that we can keep track of what needs to be done and prioritize the right things.
The Presto UI fetches query history from QueryTracker. The current implementation of QueryTracker keeps recent queries in an in-memory data structure with a capacity limit, so queries older than the configured `query.min-expire-age` (default: 15 min) expire from the history.

Because `query.min-expire-age` cannot be set too high without the history taking up the whole heap, we might need an extension to the current QueryTracker that can offload queries to a data store and fetch them back when needed.

Also, the current in-memory history is lost if the coordinator fails over or restarts, so keeping history in a data store would also solve the persistence issue.
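As a rough illustration of the offloading idea, the sketch below models a bounded in-memory history that hands a query to a callback at the moment it is evicted. It is a simplified model written for this issue, not the actual QueryTracker code: the class name `BoundedQueryHistory`, the `maxHistory` limit, and the `offload` callback are all assumptions, and entries are assumed to arrive in end-time order.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

// Hypothetical, simplified model of a bounded in-memory history; not Presto's QueryTracker.
class BoundedQueryHistory
{
    private static final class Entry
    {
        final String queryId;
        final Instant endTime;

        Entry(String queryId, Instant endTime)
        {
            this.queryId = queryId;
            this.endTime = endTime;
        }
    }

    private final Deque<Entry> finished = new ArrayDeque<>();
    private final int maxHistory;           // assumed capacity limit
    private final Duration minExpireAge;    // plays the role of query.min-expire-age
    private final Consumer<String> offload; // assumed hook that writes to a data store

    BoundedQueryHistory(int maxHistory, Duration minExpireAge, Consumer<String> offload)
    {
        this.maxHistory = maxHistory;
        this.minExpireAge = minExpireAge;
        this.offload = offload;
    }

    // Record a completed query; callers are assumed to add entries in end-time order.
    void queryFinished(String queryId, Instant endTime)
    {
        finished.addLast(new Entry(queryId, endTime));
    }

    // Evict the oldest entries while over capacity and older than the minimum age,
    // passing each evicted query ID to the offload hook instead of just dropping it.
    void prune(Instant now)
    {
        Instant horizon = now.minus(minExpireAge);
        while (finished.size() > maxHistory && finished.peekFirst().endTime.isBefore(horizon)) {
            offload.accept(finished.removeFirst().queryId);
        }
    }

    int size()
    {
        return finished.size();
    }
}
```

The point of the callback is that heap usage stays bounded by `maxHistory` while the evicted queries remain retrievable from wherever `offload` stores them.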
It seems people are already asking the same question and trying to address it in an ad hoc way, judging by the community discussions: https://groups.google.com/forum/#!searchin/presto-users/query$20history%7Csort:date
We thus suggest adding a pluggable extension API for this purpose, and we could also provide some plugin implementations backed by an RDBMS, Elasticsearch, or a KV store.
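One possible shape for such an extension API is sketched below. The interface `QueryHistoryStore`, its two methods, and the map-backed implementation are hypothetical and only meant to start the discussion; they are not part of Presto's existing SPI, and a real plugin would replace the map with a database, Elasticsearch, or KV-store client.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical extension point for query history; not part of Presto's SPI.
interface QueryHistoryStore
{
    // Persist the serialized info of a completed query under its query ID.
    void save(String queryId, String queryInfoJson);

    // Return the stored info, or empty if the store has no such query.
    Optional<String> load(String queryId);
}

// Map-backed stand-in for a real plugin backed by a database or KV store.
class MapQueryHistoryStore
        implements QueryHistoryStore
{
    private final Map<String, String> entries = new ConcurrentHashMap<>();

    @Override
    public void save(String queryId, String queryInfoJson)
    {
        entries.put(queryId, queryInfoJson);
    }

    @Override
    public Optional<String> load(String queryId)
    {
        return Optional.ofNullable(entries.get(queryId));
    }
}
```

Keeping the interface to a save and a load by query ID would leave retention, indexing, and listing to each plugin, with the in-memory behavior remaining the default when no plugin is configured.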