Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(sqllab): Add timeout on fetching query results #29959

Merged
merged 4 commits into from
Sep 12, 2024

Conversation

justinpark
Copy link
Member

SUMMARY

This commit adds the timeout on fetching query results.
Additionally, a 'Retry without timeout' option is provided so that when a timeout issue occurs, users can attempt to proceed by ignoring the error.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

after--frontend-timeout.mov

TESTING INSTRUCTIONS

SET SQLLAB_QUERY_RESULT_TIMEOUT with a small timeout
Run a query async
Check the error message and then click "Retry" to get the result without timeout errors

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@dosubot dosubot bot added the sqllab Namespace | Anything related to the SQL Lab label Aug 16, 2024
Copy link

codecov bot commented Aug 16, 2024

Codecov Report

Attention: Patch coverage is 90.90909% with 1 line in your changes missing coverage. Please review.

Project coverage is 72.52%. Comparing base (76d897e) to head (65cd417).
Report is 726 commits behind head on master.

Files with missing lines Patch % Lines
...frontend/src/SqlLab/components/ResultSet/index.tsx 75.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           master   #29959       +/-   ##
===========================================
+ Coverage   60.48%   72.52%   +12.03%     
===========================================
  Files        1931     1976       +45     
  Lines       76236    79638     +3402     
  Branches     8568     9072      +504     
===========================================
+ Hits        46114    57759    +11645     
+ Misses      28017    21155     -6862     
+ Partials     2105      724     -1381     
Flag Coverage Δ
hive 48.87% <100.00%> (-0.30%) ⬇️
javascript 62.12% <90.00%> (+4.40%) ⬆️
mysql 76.66% <100.00%> (?)
postgres 76.73% <100.00%> (?)
presto 53.38% <100.00%> (-0.42%) ⬇️
python 83.69% <100.00%> (+20.20%) ⬆️
sqlite 76.18% <100.00%> (?)
unit 60.44% <100.00%> (+2.82%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@michael-s-molina
Copy link
Member

@justinpark we currently have

# Timeout duration for SQL Lab synchronous queries
SQLLAB_TIMEOUT = int(timedelta(seconds=30).total_seconds())

# The MAX duration a query can run for before being killed by celery.
SQLLAB_ASYNC_TIME_LIMIT_SEC = int(timedelta(hours=6).total_seconds())

Can't we reuse one of these for this? I'm thinking that we could expand the scope of SQLLAB_TIMEOUT (changing the comment) to be valid for both sync and async queries.

@michael-s-molina
Copy link
Member

michael-s-molina commented Aug 16, 2024

Another question is about retry without time limit. Isn't this dangerous and probably inaccurate? I'm assuming all database engines will impose timeouts and this setting will not bypass this? I'm also assuming that a timeout was set for a reason, so giving users the ability to override this seems dangerous. If you're thinking about zombie frontend requests, wouldn't #29956 be sufficient to handle these cases?

@michael-s-molina michael-s-molina added the review:checkpoint Last PR reviewed during the daily review standup label Aug 16, 2024
@justinpark
Copy link
Member Author

justinpark commented Aug 19, 2024

Can't we reuse one of these for this? I'm thinking that we could expand the scope of SQLLAB_TIMEOUT (changing the comment) to be valid for both sync and async queries.

@michael-s-molina QUERY_RESULT_TIMEOUT is different from the existing SQLLAB_TIMEOUT and SQLLAB_ASYNC_TIME_LIMIT_SEC. This refers to a request that occurs in an ASYNC QUERY, which is different from the actual query execution time. It is a request to retrieve the completed query result from the cache. In formula form, it can be expressed as follows:

SQLLAB_TIMEOUT = SQLLAB_ASYNC_TIME_LIMIT_SEC + SQLLAB_QUERY_RESULT_TIMEOUT

The reason for delays related to the timeout is due to locking caused by updates to the query table, which leads to delays in querying the information linked to the resultsKey of the query ID. To address this issue, a separate timeout is configured.

Another question is about retry without time limit. Isn't this dangerous and probably inaccurate? I'm assuming all database engines will impose timeouts and this setting will not bypass this? I'm also assuming that a timeout was set for a reason, so giving users the ability to override this seems dangerous. If you're thinking about zombie frontend requests, wouldn't #29956 be sufficient to handle these cases?

The locking delay in synchronizing query status was resolved with PR 29956. This patch also addresses the intermittent issue of the fetching state freezing (when the timer stops and results are retrieved) that occasionally occurred after the query status was confirmed as successful.

Making a request without a timeout applies the max request time related on the HTTP server, which is the same as the existing result retrieval process. The new addition is simply an update to the UX, where pending states are cut off slightly earlier than the HTTP server’s max request timeout. If a request requires more time than the configured timeout, users can re-initiate the request using the original method, depending on their preference.

@michael-s-molina michael-s-molina added hold:testing! On hold for testing and removed review:checkpoint Last PR reviewed during the daily review standup labels Aug 19, 2024
@mistercrunch
Copy link
Member

/testenv up

Copy link
Contributor

@mistercrunch Ephemeral environment spinning up at http://34.215.131.40:8080. Credentials are admin/admin. Please allow several minutes for bootstrapping and startup.

@justinpark
Copy link
Member Author

Screenshot 2024-09-11 at 12 13 32 PM
Updated the button label to "Retry fetching results" which will run as-is (which means no overrides of timeout)

@justinpark justinpark removed the hold:testing! On hold for testing label Sep 11, 2024
@justinpark justinpark merged commit ff3b86b into apache:master Sep 12, 2024
38 of 39 checks passed
Copy link
Contributor

Ephemeral environment shutdown and build artifacts deleted.

@justinpark
Copy link
Member Author

@sadpandajoe Could you add this fix on 4.1?

sadpandajoe pushed a commit that referenced this pull request Sep 12, 2024
@sadpandajoe sadpandajoe added the v4.1 Label added by the release manager to track PRs to be included in the 4.1 branch label Sep 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size/M sqllab Namespace | Anything related to the SQL Lab v4.1 Label added by the release manager to track PRs to be included in the 4.1 branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants