-
Notifications
You must be signed in to change notification settings - Fork 13.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: see Presto row and array data types #7391
Conversation
Codecov Report
@@ Coverage Diff @@
## lyft-release-sp8 #7391 +/- ##
==================================================
+ Coverage 64.89% 65% +0.1%
==================================================
Files 428 429 +1
Lines 20979 21081 +102
Branches 2337 2337
==================================================
+ Hits 13615 13704 +89
- Misses 7240 7253 +13
Partials 124 124
Continue to review full report at Codecov.
|
superset/db_engine_specs.py
Outdated
@@ -804,6 +809,21 @@ class PrestoEngineSpec(BaseEngineSpec): | |||
date_add('day', 1, CAST({col} AS TIMESTAMP))))", | |||
} | |||
|
|||
type_map = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this name be made a bit more self-documenting as to what it is mapping to what? (presto_to_?? ) If presto is the keys, what are the values? (upon first glance it looks like sqlalchemy abstract types, but then I see that mysql value there...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes. This is a mapping copied from the pyhive package (package we use for most of our presto needs). It's basically a mapping of strings to SQL object types defined by sqlalchemy. There isn't a tinyint data type defined by sqlalchemy, which is why we use mysql. But I agree it seems leaky. I can create a skeleton for tinyint and should create one for ROW object. From the looks of it, it looks like we just use its string representation in our code base so far.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created skeleton types and moved types to its own file.
superset/db_engine_specs.py
Outdated
@@ -804,6 +809,21 @@ class PrestoEngineSpec(BaseEngineSpec): | |||
date_add('day', 1, CAST({col} AS TIMESTAMP))))", | |||
} | |||
|
|||
type_map = { | |||
'boolean': types.Boolean, | |||
'tinyint': mysql.MSTinyInteger, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mysql
... just calling it out as a divergence from the impl agnostic types that all of the others are using. It feels "leaky".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created skeleton types and moved types to its own file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like skeleton types are now causing CI failures:
AttributeError: 'GenericTypeCompiler' object has no attribute 'visit_ARRAY'
During handling of the above exception, another exception occurred:
... lots of stuff ...
sqlalchemy.exc.UnsupportedCompilationError: Compiler <sqlalchemy.sql.compiler.GenericTypeCompiler object at 0x7f9f6189ed68> can't render element of type <class 'superset.models.sql_types.presto_sql_types.ARRAY'>
As annoying as it would be to revert the types you added (and going back to the original question I sort of raised) I wonder if it is a viable path to just leave it with the mysql type there? Obviously it would be less "correct" than having a well-defined presto dialect but... not sure if the superset project should be the entity that owns that. :-)
@john-bodley @graceguo-supercat @michellethomas @mistercrunch Does that value get passed around to other bits of code that might actually care if it is actually a mysql column? How bad is it to use the only existing built-in representation of tinyint here, under the mysql dialect?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Talked to Dave offline. I fixed the CI issues. So we can continue to have our skeleton types and not have to rely on other db classes like mysql.
superset/db_engine_specs.py
Outdated
@@ -814,6 +834,164 @@ def get_view_names(cls, inspector, schema): | |||
""" | |||
return [] | |||
|
|||
@classmethod | |||
def _create_column_info(cls, column, name, data_type): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you mind providing type hints for all arguments and the return type? It's a suggestion we have for all new Python methods and helps with type-checking and readability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was able to add type hints and returns for all new methods except for the my_db parameter in the method select_star because it exposed a circular dependency. I created issue #7412 to address refactoring for this file.
Also @john-bodley @mistercrunch @graceguo-supercat @michellethomas ... as discussed previously, in order to prevent divergence we would like to merge this over to master as directly as possible after merge, so if you don't mind please review as though it is going to |
Since my original question/concern about the usage of the mysql dialect is addressed by adding your presto dialect standin... I think I am 👍 |
} | ||
|
||
@classmethod | ||
def _get_full_name(cls, names: list) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for lists and dictionaries can you use the typing
List
and Dict
objects respectively? This should require for you to explicitly define the types, i.e., List[str]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@john-bodley We merged this earlier than we should have so I opened a new PR at #7415 to address feedback in a new commit... then we can merge all of it to master in one go, with feedback. :-)
* Merge lastest from master into lyft-release-sp8 (#7405) * filter out all nan series (#7313) * improve not rich tooltip (#7345) * Create issue_label_bot.yaml (#7341) * fix: do not save colors without a color scheme (#7347) * [wtforms] Strip leading/trailing whitespace (#7084) * [schema] Updating the datasources schema (#5451) * limit tables/views returned if schema is not provided (#7358) * limit tables/views returned if schema is not provided * fix typo * improve code performance * handle the case when table name or view name does not present a schema * Add type anno (#7342) * Updated local dev instructions to include missing step * First pass at type annotations * [schema] Updating the base column schema (#5452) * Update 937d04c16b64_update_datasources.py (#7361) * Feature flag for client cache (#7348) * Feature flag for client cache * Fix integration test * Revert "Fix integration test" This reverts commit 58434ab. * Feature flag for client cache * Fix integration tests * Add feature flag to config.py * Add another feature check * Fix more integration tests * Fix raw HTML in SliceAdder (#7338) * remove backendSync.json (#7331) * [bubbles] issue when using duplicated metrics (#7087) * SUPERSET-7: Docker compose config version breaks on Ubuntu 16.04 (#7359) * SUPERSET-8: Update text in docs copyright footer (#7360) * SUPERSET-7: Docker compose config version breaks on Ubuntu 16.04 * SUPERSET-8: Extra text in docs copyright footer * [schema] Adding commits and removing unnecessary foreign-key definitions (#7371) * Store last selected dashboard in sessionStorage (#7181) * Store last selected dashboard in sessionStorage * Fix tests * [schema] Updating the base metric schema (#5453) * Fix NoneType bug & fill the test recipients with original recipients if empty (#7365) * feat: see Presto row and array data types (#7391) * feat: see Presto row and array data types * fix: address PR comments * fix: lint and build issues * fix: add types * add stronger type hints where possible * fix: lint issues and add select_star func in Hive * add missing pkg init * fix: build issues * fix: pylint issues * fix: use logging instead of print
* Merge lastest from master into lyft-release-sp8 (#7405) * filter out all nan series (#7313) * improve not rich tooltip (#7345) * Create issue_label_bot.yaml (#7341) * fix: do not save colors without a color scheme (#7347) * [wtforms] Strip leading/trailing whitespace (#7084) * [schema] Updating the datasources schema (#5451) * limit tables/views returned if schema is not provided (#7358) * limit tables/views returned if schema is not provided * fix typo * improve code performance * handle the case when table name or view name does not present a schema * Add type anno (#7342) * Updated local dev instructions to include missing step * First pass at type annotations * [schema] Updating the base column schema (#5452) * Update 937d04c16b64_update_datasources.py (#7361) * Feature flag for client cache (#7348) * Feature flag for client cache * Fix integration test * Revert "Fix integration test" This reverts commit 58434ab. * Feature flag for client cache * Fix integration tests * Add feature flag to config.py * Add another feature check * Fix more integration tests * Fix raw HTML in SliceAdder (#7338) * remove backendSync.json (#7331) * [bubbles] issue when using duplicated metrics (#7087) * SUPERSET-7: Docker compose config version breaks on Ubuntu 16.04 (#7359) * SUPERSET-8: Update text in docs copyright footer (#7360) * SUPERSET-7: Docker compose config version breaks on Ubuntu 16.04 * SUPERSET-8: Extra text in docs copyright footer * [schema] Adding commits and removing unnecessary foreign-key definitions (#7371) * Store last selected dashboard in sessionStorage (#7181) * Store last selected dashboard in sessionStorage * Fix tests * [schema] Updating the base metric schema (#5453) * Fix NoneType bug & fill the test recipients with original recipients if empty (#7365) * feat: see Presto row and array data types (#7391) * feat: see Presto row and array data types * fix: address PR comments * fix: lint and build issues * fix: add types * Incorporate feedback from initial PR (prematurely merged to lyft-release-sp8) (#7415) * add stronger type hints where possible * fix: lint issues and add select_star func in Hive * add missing pkg init * fix: build issues * fix: pylint issues * fix: use logging instead of print * feat: view presto row objects in data grid * fix: address feedback * fix: spacing
* Merge lastest from master into lyft-release-sp8 (#7405) * filter out all nan series (#7313) * improve not rich tooltip (#7345) * Create issue_label_bot.yaml (#7341) * fix: do not save colors without a color scheme (#7347) * [wtforms] Strip leading/trailing whitespace (#7084) * [schema] Updating the datasources schema (#5451) * limit tables/views returned if schema is not provided (#7358) * limit tables/views returned if schema is not provided * fix typo * improve code performance * handle the case when table name or view name does not present a schema * Add type anno (#7342) * Updated local dev instructions to include missing step * First pass at type annotations * [schema] Updating the base column schema (#5452) * Update 937d04c16b64_update_datasources.py (#7361) * Feature flag for client cache (#7348) * Feature flag for client cache * Fix integration test * Revert "Fix integration test" This reverts commit 58434ab. * Feature flag for client cache * Fix integration tests * Add feature flag to config.py * Add another feature check * Fix more integration tests * Fix raw HTML in SliceAdder (#7338) * remove backendSync.json (#7331) * [bubbles] issue when using duplicated metrics (#7087) * SUPERSET-7: Docker compose config version breaks on Ubuntu 16.04 (#7359) * SUPERSET-8: Update text in docs copyright footer (#7360) * SUPERSET-7: Docker compose config version breaks on Ubuntu 16.04 * SUPERSET-8: Extra text in docs copyright footer * [schema] Adding commits and removing unnecessary foreign-key definitions (#7371) * Store last selected dashboard in sessionStorage (#7181) * Store last selected dashboard in sessionStorage * Fix tests * [schema] Updating the base metric schema (#5453) * Fix NoneType bug & fill the test recipients with original recipients if empty (#7365) * feat: see Presto row and array data types (#7391) * feat: see Presto row and array data types * fix: address PR comments * fix: lint and build issues * fix: add types * Incorporate feedback from initial PR (prematurely merged to lyft-release-sp8) (#7415) * add stronger type hints where possible * fix: lint issues and add select_star func in Hive * add missing pkg init * fix: build issues * fix: pylint issues * fix: use logging instead of print * feat: view presto row objects in data grid * fix: address feedback * fix: spacing * Workaround for no results returned (#7442) * feat: view presto row objects in data grid (#7436) * feat: view presto row objects in data grid * fix: address feedback * fix: spacing * feat: Scheduling queries from SQL Lab (#7416) * Lightweight pipelines POC * Add docs * Minor fixes * Remove Lyft URL * Use enum * Minor fix * Fix unit tests * Mark props as required
* filter out all nan series (#7313) * improve not rich tooltip (#7345) * Create issue_label_bot.yaml (#7341) * fix: do not save colors without a color scheme (#7347) * [wtforms] Strip leading/trailing whitespace (#7084) * [schema] Updating the datasources schema (#5451) * limit tables/views returned if schema is not provided (#7358) * limit tables/views returned if schema is not provided * fix typo * improve code performance * handle the case when table name or view name does not present a schema * Add type anno (#7342) * Updated local dev instructions to include missing step * First pass at type annotations * [schema] Updating the base column schema (#5452) * Update 937d04c16b64_update_datasources.py (#7361) * Feature flag for client cache (#7348) * Feature flag for client cache * Fix integration test * Revert "Fix integration test" This reverts commit 58434ab. * Feature flag for client cache * Fix integration tests * Add feature flag to config.py * Add another feature check * Fix more integration tests * Fix raw HTML in SliceAdder (#7338) * remove backendSync.json (#7331) * [bubbles] issue when using duplicated metrics (#7087) * SUPERSET-7: Docker compose config version breaks on Ubuntu 16.04 (#7359) * SUPERSET-8: Update text in docs copyright footer (#7360) * SUPERSET-7: Docker compose config version breaks on Ubuntu 16.04 * SUPERSET-8: Extra text in docs copyright footer * [schema] Adding commits and removing unnecessary foreign-key definitions (#7371) * Store last selected dashboard in sessionStorage (#7181) * Store last selected dashboard in sessionStorage * Fix tests * [schema] Updating the base metric schema (#5453) * Fix NoneType bug & fill the test recipients with original recipients if empty (#7365) * Added living goods as among the users of Superset (#7407) * Added living goods as among the users of Superset Living Goods is a non profit organisation with operation in africa and the middle east. We work in community health use data heavily on day to day. Superset is our platform of choice for dashboards. * Update README.md * [dashboard] allow user re-order top-level tabs (#7390) * [SQL Lab] Increase timeout threshold for offline check (#7411) * Bump FAB to 2.0.0 (#7323) * Bump FAB to 2.0.0 * [tests] whitelist SecurityApi login and refresh endpoints * [style] Fix, C812 missing trailing commas * [security] Remove SUPERSET_UPDATE_PERMS flag Registering sources needs to be performed after the views are initialized on UPDATE_PERMS=False configuration * [docs] New, FAB_UPDATE_PERMS and flask fab cli * [docs] Fix, db upgrade needs to come first, create-admin needs a db * [cli] New, superset init bootstraps all permissions for FAB and Superset * [style] Fix, flakes * [annotations] Improves UX on annotation validation, start_dttm, end_dttm (#7326) * Setting renderTrigger on label_colors (#7410) * Refactor out controlUtils.js module + unit tests (#7350) * [WiP]refactor out a controlUtils.js file * unit tests * add missing license * Addressing comments * feature: see Presto row and array data types (#7413) * Merge lastest from master into lyft-release-sp8 (#7405) * filter out all nan series (#7313) * improve not rich tooltip (#7345) * Create issue_label_bot.yaml (#7341) * fix: do not save colors without a color scheme (#7347) * [wtforms] Strip leading/trailing whitespace (#7084) * [schema] Updating the datasources schema (#5451) * limit tables/views returned if schema is not provided (#7358) * limit tables/views returned if schema is not provided * fix typo * improve code performance * handle the case when table name or view name does not present a schema * Add type anno (#7342) * Updated local dev instructions to include missing step * First pass at type annotations * [schema] Updating the base column schema (#5452) * Update 937d04c16b64_update_datasources.py (#7361) * Feature flag for client cache (#7348) * Feature flag for client cache * Fix integration test * Revert "Fix integration test" This reverts commit 58434ab. * Feature flag for client cache * Fix integration tests * Add feature flag to config.py * Add another feature check * Fix more integration tests * Fix raw HTML in SliceAdder (#7338) * remove backendSync.json (#7331) * [bubbles] issue when using duplicated metrics (#7087) * SUPERSET-7: Docker compose config version breaks on Ubuntu 16.04 (#7359) * SUPERSET-8: Update text in docs copyright footer (#7360) * SUPERSET-7: Docker compose config version breaks on Ubuntu 16.04 * SUPERSET-8: Extra text in docs copyright footer * [schema] Adding commits and removing unnecessary foreign-key definitions (#7371) * Store last selected dashboard in sessionStorage (#7181) * Store last selected dashboard in sessionStorage * Fix tests * [schema] Updating the base metric schema (#5453) * Fix NoneType bug & fill the test recipients with original recipients if empty (#7365) * feat: see Presto row and array data types (#7391) * feat: see Presto row and array data types * fix: address PR comments * fix: lint and build issues * fix: add types * add stronger type hints where possible * fix: lint issues and add select_star func in Hive * add missing pkg init * fix: build issues * fix: pylint issues * fix: use logging instead of print * Removed --console-log and superset runserver (#7421) * Fixes dashboard export button missing download and #7353 (#7427) * Added additional German translations to string file (#6604) * Added additional German translations to string file Updates to German translation files as per directions * Removed messages.json * [fix] Fixing SQL parsing issue (#7374) * add chinese translate (#7402) * Quick fix to address deadlock issue (#7434) * feat: view presto row objects in data grid (#7445) * Merge lastest from master into lyft-release-sp8 (#7405) * filter out all nan series (#7313) * improve not rich tooltip (#7345) * Create issue_label_bot.yaml (#7341) * fix: do not save colors without a color scheme (#7347) * [wtforms] Strip leading/trailing whitespace (#7084) * [schema] Updating the datasources schema (#5451) * limit tables/views returned if schema is not provided (#7358) * limit tables/views returned if schema is not provided * fix typo * improve code performance * handle the case when table name or view name does not present a schema * Add type anno (#7342) * Updated local dev instructions to include missing step * First pass at type annotations * [schema] Updating the base column schema (#5452) * Update 937d04c16b64_update_datasources.py (#7361) * Feature flag for client cache (#7348) * Feature flag for client cache * Fix integration test * Revert "Fix integration test" This reverts commit 58434ab. * Feature flag for client cache * Fix integration tests * Add feature flag to config.py * Add another feature check * Fix more integration tests * Fix raw HTML in SliceAdder (#7338) * remove backendSync.json (#7331) * [bubbles] issue when using duplicated metrics (#7087) * SUPERSET-7: Docker compose config version breaks on Ubuntu 16.04 (#7359) * SUPERSET-8: Update text in docs copyright footer (#7360) * SUPERSET-7: Docker compose config version breaks on Ubuntu 16.04 * SUPERSET-8: Extra text in docs copyright footer * [schema] Adding commits and removing unnecessary foreign-key definitions (#7371) * Store last selected dashboard in sessionStorage (#7181) * Store last selected dashboard in sessionStorage * Fix tests * [schema] Updating the base metric schema (#5453) * Fix NoneType bug & fill the test recipients with original recipients if empty (#7365) * feat: see Presto row and array data types (#7391) * feat: see Presto row and array data types * fix: address PR comments * fix: lint and build issues * fix: add types * Incorporate feedback from initial PR (prematurely merged to lyft-release-sp8) (#7415) * add stronger type hints where possible * fix: lint issues and add select_star func in Hive * add missing pkg init * fix: build issues * fix: pylint issues * fix: use logging instead of print * feat: view presto row objects in data grid * fix: address feedback * fix: spacing * feat: Scheduling queries from SQL Lab (#7416) (#7446) * Merge lastest from master into lyft-release-sp8 (#7405) * filter out all nan series (#7313) * improve not rich tooltip (#7345) * Create issue_label_bot.yaml (#7341) * fix: do not save colors without a color scheme (#7347) * [wtforms] Strip leading/trailing whitespace (#7084) * [schema] Updating the datasources schema (#5451) * limit tables/views returned if schema is not provided (#7358) * limit tables/views returned if schema is not provided * fix typo * improve code performance * handle the case when table name or view name does not present a schema * Add type anno (#7342) * Updated local dev instructions to include missing step * First pass at type annotations * [schema] Updating the base column schema (#5452) * Update 937d04c16b64_update_datasources.py (#7361) * Feature flag for client cache (#7348) * Feature flag for client cache * Fix integration test * Revert "Fix integration test" This reverts commit 58434ab. * Feature flag for client cache * Fix integration tests * Add feature flag to config.py * Add another feature check * Fix more integration tests * Fix raw HTML in SliceAdder (#7338) * remove backendSync.json (#7331) * [bubbles] issue when using duplicated metrics (#7087) * SUPERSET-7: Docker compose config version breaks on Ubuntu 16.04 (#7359) * SUPERSET-8: Update text in docs copyright footer (#7360) * SUPERSET-7: Docker compose config version breaks on Ubuntu 16.04 * SUPERSET-8: Extra text in docs copyright footer * [schema] Adding commits and removing unnecessary foreign-key definitions (#7371) * Store last selected dashboard in sessionStorage (#7181) * Store last selected dashboard in sessionStorage * Fix tests * [schema] Updating the base metric schema (#5453) * Fix NoneType bug & fill the test recipients with original recipients if empty (#7365) * feat: see Presto row and array data types (#7391) * feat: see Presto row and array data types * fix: address PR comments * fix: lint and build issues * fix: add types * Incorporate feedback from initial PR (prematurely merged to lyft-release-sp8) (#7415) * add stronger type hints where possible * fix: lint issues and add select_star func in Hive * add missing pkg init * fix: build issues * fix: pylint issues * fix: use logging instead of print * feat: view presto row objects in data grid * fix: address feedback * fix: spacing * Workaround for no results returned (#7442) * feat: view presto row objects in data grid (#7436) * feat: view presto row objects in data grid * fix: address feedback * fix: spacing * feat: Scheduling queries from SQL Lab (#7416) * Lightweight pipelines POC * Add docs * Minor fixes * Remove Lyft URL * Use enum * Minor fix * Fix unit tests * Mark props as required * feat: Add `validate_sql_json` endpoint for checking that a given sql query is valid for the chosen database (#7422) (#7462) merge from lyft-release-sp8 to master * Adds missing metric sum__SP_RUR_TOTL (#7452) * Late import for optional lib pyhive (#7471) * Late import for optional lib pyhive * fix * fix: calendar heatmap examples (#7375) Fixing a set of examples that trip on ValueError vs TypeError * bugfix: Improve support for special characters in schema and table names (#7297) * Bugfix to SQL Lab to support tables and schemas with characters that require quoting * Remove debugging prints * Add uri encoding to secondary tables call * Quote schema names for presto * Quote selected_schema on Snowflake, MySQL and Hive * Remove redundant parens * Add python unit tests * Add js unit test * Fix flake8 linting error * [dashboard] After update filter, trigger new queries when charts are visible (#7233) * trigger query when chart is visible * add integration test * fix: alter sql columns to long text #7463 (#7476) Merge lyft-release-sp8@7bfe7bc to master * Refactor ConsoleLog (#7428) * Revised Chinese translation (#7464) * add chinese translate * edit chinese translation * druid connector: avoid using 'dimensions' for scan queries (#7377) After the following PyDruid change (contained in version 0.5.2) the Superset Histogram charts rendered with Druid data are broken: druid-io/pydruid@0a59a70 Bump the pydruid requirements accordingly in setup.py Issue: #7368
CATEGORY
Choose one
SUMMARY
If a column is an array or row, we did not show their data type in the schema view or its contents. For example, say we have ColumnA and its type is row(nest_obj double).
Previous Display:
ColumnA NULL
Current Display:
ColumnA ROW
ColumnA.nested_obj double
Why did we implement our own logic to handle row and array data types? It is because we use the pyhive library to get columns but they do not handle complex data types. They see row and array objects as unknown types and then assign it as a null type. Fortunately, the information we receive from Presto is predictable and that's why we have a string parser to make sense of the reply.
TEST PLAN
Tested manually and wrote unit tests
REVIEWERS
@xtinec @bearcage @DiggidyDave @datability-io @betodealmeida @mistercrunch @john-bodley @graceguo-supercat @michellethomas