[sqllab] fix: return pandas records in execute_sql_statements #9102
d52832b to 15a4dce
Ideally, we'll always want to serialize to msgpack (vs json) when persisting to the results backend due to the performance gains. Instead of skipping msgpack serialization, can the logic used to fetch stored results and rehydrate the tab state perform the proper deserialization? I'm still not clear where the issue lies with synchronous queries returning bad data. Is msgpack modifying the Previously,
15a4dce to 460b46d
Codecov Report
@@          Coverage Diff           @@
##           master   #9102   +/-   ##
======================================
  Coverage    59.1%   59.1%
======================================
  Files         372     372
  Lines       11922   11922
  Branches     2919    2919
======================================
  Hits         7046    7046
  Misses       4694    4694
  Partials      182     182
460b46d to f3dff5a
f3dff5a to 435ac95
@robdiciuccio Looks like it's the pyarrow data that's getting sent to the frontend, thus it ends up as
CATEGORY
Choose one
SUMMARY
The SQLLAB_BACKEND_PERSISTENCE feature stores query results in the results backend, which is then used to rehydrate tab state. Data is incorrectly being returned in serialized binary form because of the assumption that if we're storing results, the query is being executed asynchronously. This PR fixes that assumption by also checking whether we're looking to return results (in which case we also use the df_to_records and JSON serialization approach).
This is obviously not the most performant way of handling this, as we're doing pyarrow + msgpack to store the results in the results backend as well as pandas + JSON to return the results. However, it's the easiest way to get SQLLAB_BACKEND_PERSISTENCE + RESULTS_BACKEND_USE_MSGPACK working together without a significant rewrite/refactor.
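To illustrate the decision described above, here is a minimal sketch; it is not the actual code in execute_sql_statements. The helper name `build_payloads` and its parameters are hypothetical, and zlib-compressed JSON bytes stand in for the pyarrow + msgpack path so that the example runs with the standard library only. The point is that storing and returning are checked independently, so a synchronous query that is also persisted still gets plain JSON records back.

```python
import json
import zlib


def build_payloads(rows, store_results, return_results):
    """Hypothetical helper: pick which serialized forms to produce.

    `rows` is a list of dicts, standing in for the records a
    DataFrame would yield. Returns (stored, returned); either may be None.
    """
    stored = None
    returned = None
    if store_results:
        # Stand-in for the binary (pyarrow + msgpack) form written to
        # the results backend. Assumption: any opaque bytes blob will do
        # for illustration purposes.
        stored = zlib.compress(json.dumps(rows).encode("utf-8"))
    if return_results:
        # Records + JSON for the response sent to the frontend, produced
        # even when a binary copy is also being stored.
        returned = json.dumps({"data": rows})
    return stored, returned


# Synchronous query with backend persistence enabled: both forms exist.
stored, returned = build_payloads([{"a": 1}], store_results=True, return_results=True)
```

The cost noted in the summary is visible here: when both flags are set, the same rows are serialized twice.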
TEST PLAN
ADDITIONAL INFORMATION
REVIEWERS
@robdiciuccio @betodealmeida