Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hue export to csv or excel results in re-execution of the query #3420

Open
1 task done
satvik1992 opened this issue Jul 31, 2023 · 10 comments
Open
1 task done

Hue export to csv or excel results in re-execution of the query #3420

satvik1992 opened this issue Jul 31, 2023 · 10 comments
Assignees
Labels
BUG Issue type for reporting failure due to bug in functionality

Comments

@satvik1992
Copy link

Is there an existing issue for this?

  • I have searched the existing issues

Description

Hue export to csv or excel results in re-execution of the query . This is a performance issue because a complex query will need time to execute and re-execution of a complex query during download is a overhead .

The version of hue used is 4.7.1

We are currently using Hue presto with hive meta-store and Mariadb on a Kubernetes Environment .

Download QueryExecution

Steps To Reproduce

  1. Run a complex query that takes a considerable amount of time to fetch results
  2. Select the download to csv or excel option

Logs

============= logs when we execute the query =====================
[31/Jul/2023 03:34:02 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 1ms 200 0
[31/Jul/2023 03:34:05 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 2ms 200 0
[31/Jul/2023 03:34:05 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 2ms 200 0
[31/Jul/2023 03:34:08 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 2ms 200 0
[31/Jul/2023 03:34:08 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 2ms 200 0
[31/Jul/2023 03:34:11 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 2ms 200 0
[31/Jul/2023 03:34:11 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 2ms 200 0
[31/Jul/2023 03:34:14 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 1ms 200 0
[31/Jul/2023 03:34:14 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 1ms 200 0
[31/Jul/2023 03:34:17 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 2ms 200 0
[31/Jul/2023 03:34:17 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 2ms 200 0
[31/Jul/2023 03:34:18 -0700] presto INFO select * from hue__tmp limit 1000

============== logs when download to csv/excel option is clicked ==============

[31/Jul/2023 03:34:26 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 2ms 200 0
[31/Jul/2023 03:34:26 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 2ms 200 0
[31/Jul/2023 03:34:29 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 2ms 200 0
[31/Jul/2023 03:34:29 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 2ms 200 0
[31/Jul/2023 03:34:32 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 1ms 200 0
[31/Jul/2023 03:34:32 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 1ms 200 0
[31/Jul/2023 03:34:35 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 2ms 200 0
[31/Jul/2023 03:34:35 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 2ms 200 0
[31/Jul/2023 03:34:38 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 2ms 200 0
[31/Jul/2023 03:34:38 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 2ms 200 0
[31/Jul/2023 03:34:41 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 1ms 200 0
[31/Jul/2023 03:34:41 -0700] access INFO 10.223.204.174 -anon- - "GET /desktop/debug/is_alive HTTP/1.1" returned in 1ms 200 0
[31/Jul/2023 03:34:41 -0700] presto INFO select * from hue__tmp limit 1000

Hue version

4.7.1

@satvik1992 satvik1992 added the BUG Issue type for reporting failure due to bug in functionality label Jul 31, 2023
@wachoo
Copy link

wachoo commented Aug 2, 2023

it may not be a bug.
it is better to review the code in this flle: desktop\libs\notebook\src\notebook\connectors\base.py,
and there is a function 'download' and a class 'ExecutionWrapper' about how to download csv/excel.

def download(self, notebook, snippet, file_format='csv'):
    from beeswax import data_export #TODO: Move to notebook?
    from beeswax import conf

    result_wrapper = ExecutionWrapper(self, notebook, snippet)

    max_rows = conf.DOWNLOAD_ROW_LIMIT.get()
    max_bytes = conf.DOWNLOAD_BYTES_LIMIT.get()

    content_generator = data_export.DataAdapter(result_wrapper, max_rows=max_rows, max_bytes=max_bytes)
    return export_csvxls.create_generator(content_generator, file_format)

@bjornalm
Copy link
Collaborator

@satvik1992 Thanks for reporting. Are you using sqlalchemy? As I understand Hue doesn't cache the result for sqlalchemy interface, and that is why the question is executed a second time when when exporting.

@satvik1992
Copy link
Author

@bjornalm yes we are using sqlalchemy and configuration is as below :-

interpreters: |
  [[[hive]]]
  name = Hive
  interface=hiveserver2
  
  [[[mysql]]]
  name = Mariadb
  interface=sqlalchemy
  options='{"url": "mysql://hue:hue@xxx-xxxxx-mariadb:3306/hue"}'

  [[[presto]]]
  name = Presto
  interface=sqlalchemy
  options='{"url": "presto://xxx-xxxxx-coordinator:80/hive/default"}'

@bjornalm
Copy link
Collaborator

@Harshg999 we discussed this briefly. Are there any planned improvements for sqlalchemy, if not should we add it to the agenda?

@twoyang0917
Copy link

+1

@github-actions
Copy link

This issue is stale because it has been open 30 days with no activity and is not labeled "Prevent stale". Remove "stale" label or comment or this will be closed in 10 days.

Copy link

This issue is stale because it has been open 30 days with no activity and is not labeled "Prevent stale". Remove "stale" label or comment or this will be closed in 10 days.

@github-actions github-actions bot added the Stale label Dec 17, 2023
@bjornalm bjornalm removed the Stale label Dec 18, 2023
Copy link

This issue is stale because it has been open 30 days with no activity and is not labeled "Prevent stale". Remove "stale" label or comment or this will be closed in 10 days.

@satvik1992
Copy link
Author

Can we re-open this issue ?

Could you please inform which interfaces support caching the query results from presto in hue?

Also, can hue's task server be used for caching the query results ? If yes then could please provide a resource for configuring this?

Can cached results be stored to S3? If yes could you please provide the resources for this

@bjornalm
Copy link
Collaborator

bjornalm commented Oct 9, 2024

Ping @Harshg999 :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
BUG Issue type for reporting failure due to bug in functionality
Projects
None yet
Development

No branches or pull requests

5 participants