Add support for period character in table names #7453

Merged
merged 9 commits into from
May 26, 2019

villebro
Member

@villebro villebro commented May 4, 2019

CATEGORY

  • Bug Fix
  • Enhancement (new features, refinement)
  • Refactor
  • Add tests
  • Build / Development Environment
  • Documentation

SUMMARY

In SQL Lab, table names are currently assumed to follow one of these conventions:

  • schema.table or
  • table

The frontend handles this by assuming that a period in the table name always separates the schema from the table name. SQLAlchemy inspectors have no standardized way of returning table names (some return schema.table, others only table), so names can arrive in either format. Some databases (at least Apache Drill and Postgres) support periods in schema and table names, and their inspectors don't return schema-prefixed table names. This causes problems when querying tables, because TableSelector strips away everything before the period character. This PR moves this logic from the frontend to the backend and makes the behavior configurable per engine.
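
To illustrate the ambiguity described above, here is a minimal Python sketch. It is not the actual TableSelector code (which lives in the JS frontend); `naive_split` is a hypothetical helper, and splitting on the last period is an assumption made for the example.

```python
def naive_split(name):
    """Hypothetical helper: treat everything before the last period as the schema."""
    schema, sep, table = name.rpartition(".")
    if not sep:
        # No period at all: the whole string is the table name.
        return (None, name)
    return (schema, table)


# A schema-prefixed name is split as intended.
print(naive_split("public.users"))

# A table that is literally named "test.table" (returned without a schema
# prefix by the inspector) is wrongly split into schema "test", table "table".
print(naive_split("test.table"))
```

The second call shows why a plain string cannot tell the frontend whether the period is a separator or part of the table name.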

This proposal changes table name handling in the following way:

  1. An attribute try_remove_schema_from_table_name is added to db_engine_specs (defaults to True). When True, get_table_names() and get_view_names() check whether a table name starts with the schema name followed by a period and, if so, remove the schema name from the table name. Example: schema.table becomes table, while table remains unchanged.
  2. Table names are passed to the frontend as dicts of the form {'schema': 'schema_name', 'table': 'table_name'}. Previously they were strings in either table or schema.table format. This removes any ambiguity in the frontend.
  3. SQL Lab UI now works in the following way:
    • If no schema is selected, table/view names are displayed in the dropdown as schema.table.
    • If a schema is selected, tables are shown as table only in the dropdown.
      Filtering also supports this, i.e. when no schema is chosen, the filter substring makes the comparison assuming that the table name is schema.table, making it possible to include the schema name in the filter string.
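
The backend behavior in points 1 and 2 can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `strip_schema_prefix` and `to_frontend_payload` are hypothetical names, and the `try_remove` parameter only stands in for the try_remove_schema_from_table_name attribute.

```python
from typing import Dict, List


def strip_schema_prefix(schema: str, name: str, try_remove: bool = True) -> str:
    """Drop a leading 'schema.' from name when stripping is enabled (hypothetical helper)."""
    prefix = schema + "."
    if try_remove and name.startswith(prefix):
        return name[len(prefix):]
    return name


def to_frontend_payload(
    schema: str, names: List[str], try_remove: bool = True
) -> List[Dict[str, str]]:
    """Build the unambiguous dict form handed to the frontend (hypothetical helper)."""
    return [
        {"schema": schema, "table": strip_schema_prefix(schema, name, try_remove)}
        for name in names
    ]


# Inspector output that mixes prefixed and unprefixed names.
print(to_frontend_payload("public", ["public.users", "orders"]))

# With stripping disabled, a table really named "test.table" in schema "test" is kept whole.
print(to_frontend_payload("test", ["test.table"], try_remove=False))
```

The per-engine switch matters for the second case: with stripping enabled, a table genuinely named test.table inside a schema called test would lose its prefix.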

SCREENSHOTS

When no schema is chosen, table names are displayed as schema.table:
Screenshot 2019-05-09 at 21 09 07

If a schema is selected, only the table name is shown (in this case the table name is test.table, not table in schema test):
Screenshot 2019-05-09 at 21 09 43
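
The display and filtering rules shown in the screenshots can be sketched as below. The real logic is in the JS frontend; this Python version is only an approximation, `display_label` and `filter_entries` are hypothetical names, and plain case-sensitive substring matching is an assumption.

```python
from typing import Dict, List, Optional


def display_label(entry: Dict[str, str], selected_schema: Optional[str]) -> str:
    """Show 'table' when a schema is selected, otherwise 'schema.table'."""
    if selected_schema:
        return entry["table"]
    return "{}.{}".format(entry["schema"], entry["table"])


def filter_entries(
    entries: List[Dict[str, str]], substring: str, selected_schema: Optional[str] = None
) -> List[Dict[str, str]]:
    """Keep entries whose displayed label contains the filter substring."""
    return [e for e in entries if substring in display_label(e, selected_schema)]


entries = [
    {"schema": "public", "table": "test.table"},
    {"schema": "sales", "table": "orders"},
]

# No schema selected: labels include the schema, so the filter may too.
print([display_label(e, None) for e in entries])
print(filter_entries(entries, "sales.or"))

# Schema selected: only the table name is shown and matched.
print(display_label(entries[0], "public"))
```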

TEST PLAN

Tested locally on Postgres and SQLite. JS unit tests were updated to match the new data structures, and Python unit tests were added to test table name fetching.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Changes UI
  • Requires DB Migration.
  • Confirm DB Migration upgrade and downgrade tested.
  • Introduces new feature or API
  • Removes existing feature or API

REVIEWERS

@cgivre

@codecov-io

codecov-io commented May 10, 2019

Codecov Report

Merging #7453 into master will increase coverage by 0.05%.
The diff coverage is 37.03%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #7453      +/-   ##
==========================================
+ Coverage   65.17%   65.23%   +0.05%     
==========================================
  Files         433      433              
  Lines       21428    21433       +5     
  Branches     2360     2358       -2     
==========================================
+ Hits        13966    13981      +15     
+ Misses       7342     7332      -10     
  Partials      120      120
Impacted Files                                          Coverage Δ
superset/cli.py                                         36.01% <0%> (ø) ⬆️
.../assets/src/SqlLab/components/SqlEditorLeftBar.jsx   40.42% <0%> (+3.88%) ⬆️
superset/utils/core.py                                  88.21% <100%> (+0.06%) ⬆️
superset/assets/src/components/TableSelector.jsx        84.16% <100%> (-0.52%) ⬇️
superset/security.py                                    75% <100%> (+0.22%) ⬆️
superset/db_engine_specs.py                             62.35% <35.71%> (+0.87%) ⬆️
superset/models/core.py                                 83.74% <69.23%> (+0.28%) ⬆️
superset/views/core.py                                  72.82% <8%> (-0.1%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 1ae000a...75387cd.

@villebro
Member Author

@cgivre please take a look at this PR. If this works out this should make merging your Drill PR slightly easier.

@cgivre
Contributor

cgivre commented May 10, 2019

@villebro I'll take a look this weekend. This pretty much will solve the issues with the Drill integration.

@@ -279,33 +280,32 @@ def convert_dttm(cls, target_type, dttm):
         return "'{}'".format(dttm.strftime('%Y-%m-%d %H:%M:%S'))
 
     @classmethod
-    def fetch_result_sets(cls, db, datasource_type):
-        """Returns a list of tables [schema1.table1, schema2.table2, ...]
+    def get_all_datasource_names(cls, db, datasource_type: str) \
Member Author

db wasn't type annotated because annotating it caused a circular import (models.core already has a reference to db_engine_specs). This should probably be refactored, but that is outside the scope of this PR.
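
For reference, one common way to annotate a parameter without triggering a circular import is a `TYPE_CHECKING` guard combined with a string annotation. The sketch below is not what this PR does (the PR leaves db unannotated); `ExampleEngineSpec` and `describe` are hypothetical names, and the import path is assumed from the module named in the comment above.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated only by static type checkers, never at runtime,
    # so it cannot trigger the circular import.
    from superset.models.core import Database  # assumed import path


class ExampleEngineSpec:
    """Hypothetical stand-in for an engine spec class."""

    @classmethod
    def describe(cls, db: "Database") -> str:
        # The string annotation is never resolved at runtime.
        return type(db).__name__


# Runs without superset installed, because the guarded import is skipped.
print(ExampleEngineSpec.describe(object()))
```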

@villebro villebro changed the title [WIP] Refine table name handling in SQL Lab Add support for period in table name May 11, 2019
@villebro villebro changed the title Add support for period in table name Add support for period character in table name May 11, 2019
@villebro villebro changed the title Add support for period character in table name Add support for period character in table names May 11, 2019
@cgivre
Contributor

cgivre commented May 12, 2019

@villebro I tried this out and it worked really well even without the Drill PR! I think pretty much the only thing that Drill needs now for the integration is the time grains! Thanks for your help with this.
LGTM +1

@villebro
Member Author

Thanks for verifying that this works, @cgivre. @mistercrunch @john-bodley @betodealmeida I would really appreciate help reviewing this, as I think this is one of the last hurdles to supporting engines that rely on the presence of non-standard characters in schema/table names.

@villebro
Member Author

Kind reminder to committers that this is pending review. It would be great to get this reviewed and merged so that the Drill integration, which is blocked by this PR, can be finalized.

@john-bodley
Member

john-bodley commented May 20, 2019

@villebro my main comment, which is somewhat related to #7490, is this: the cluster/schema/table name construct can be quite complicated, and historically we've often flattened these names into a single string (and then split it or used regular expressions to extract the components). Should we move towards using a class (possibly a dataclass) to represent these objects everywhere in a canonical way?

@villebro
Member Author

Good point @john-bodley, having a dedicated class with proper parsing/formatting functionality would probably be a good idea. Do you feel this should be addressed as part of this PR, or in a new PR?

@john-bodley
Member

I think a separate PR is fine as it’s probably a large change.
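
For illustration only, the dataclass idea discussed above might look something like the sketch below. It is not part of this PR or of any follow-up; `TableRef` and its methods are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class TableRef:
    """Hypothetical canonical representation of a table reference."""

    table: str
    schema: Optional[str] = None

    def qualified_name(self) -> str:
        """Format as 'schema.table', or just 'table' when no schema is set."""
        if self.schema:
            return "{}.{}".format(self.schema, self.table)
        return self.table

    def to_dict(self) -> Dict[str, Optional[str]]:
        """Match the dict shape this PR hands to the frontend."""
        return {"schema": self.schema, "table": self.table}


# The components stay separate, so a period inside the table name is unambiguous.
ref = TableRef(table="test.table", schema="public")
print(ref.qualified_name())
print(ref.to_dict())
```

Keeping schema and table as separate fields avoids the split-and-regex approach entirely; the flattened string is produced only for display.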

@villebro villebro merged commit f7d3413 into apache:master May 26, 2019
@mistercrunch mistercrunch added the 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels label Feb 28, 2024
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/L 🚢 0.34.0
5 participants