Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Autocomplete in the table browser in SQL lab is broken - Fix part 2 #7770

Merged
merged 6 commits into from
Jul 1, 2019

Conversation

khtruong
Copy link
Contributor

@khtruong khtruong commented Jun 24, 2019

CATEGORY

Choose one

  • Bug Fix
  • Enhancement (new features, refinement)
  • Refactor
  • Add tests
  • Build / Development Environment
  • Documentation

SUMMARY

This is the complete fix for autocomplete in the table browser in SQL lab. We use the createFilterOptions API from the react-select-fast-filter-options library for autocomplete. The API expects a list of option objects where its properties value and label are strings. The bug is that we were passing objects to the property value, which caused the algorithm to not work properly. Now we explicitly pass strings to the API.

Two additional changes I added:

  1. So this entire component is buggy and doesn't work as expected. We have one function that will add the option if it is missing to the list with just the table name. I believe the original intent was to display something while we were fetching the tables. This led to a few bugs:
  • We may be adding a table that no longer exists to the list, which doesn't make sense. So when the user selects this table, it just breaks silently because the table doesn't exist.
  • Even if the table exists, we did not populate this option with the schema and so when we make a request to get the table, again it breaks silently because we get the table with just its name and no schema information.

I feel that if we really wanted to display something while we fetch the tables, we should do it properly with tests. So I just removed the buggy code for now and thought we can revisit it later if that is a feature we really want.

  1. I had to update the unit tests to return promises because if you do not return a promise, Jest just calls the promise and continues onto the next test. This caused false positives. There were two tests that are broken and need to be fixed. I filed TableSelector handle error tests do not run correctly #7768 to track this so that this PR did not contain unrelated changes.

TEST PLAN

Manually and extended unit tests

REVIEWERS

@mistercrunch @betodealmeida @DiggidyDave @xtinec

@@ -191,12 +199,6 @@ export default class TableSelector extends React.PureComponent {
this.onChange();
});
}
addOptionIfMissing(options, value) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@khtruong we don't need this anymore?

Copy link
Contributor Author

@khtruong khtruong Jun 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I'm sorry. I totally forgot to mention in my description. So this entire component is buggy and doesn't work as expected. What this function does is that if the option is missing, then we add it to the list with just the table name. I believe the original intent was to display something while we were fetching the tables. This led to a few bugs:

  1. We may be adding a table that no longer exists to the list, which doesn't make sense. So when the user selects this table, it just breaks silently because the table doesn't exist.
  2. Even if the table exists, we did not populate this option with the schema and so when we make a request to get the table, again it breaks silently because we get the table with just its name and no schema information.

I felt that if we really wanted to display something while we fetch the tables, we should do it properly with tests. So I just removed the buggy code for now and thought we can revisit it later if that is a feature we really want.

The component is still buggy but it's less buggy. I'll add this explanation to the description. Let me know your thoughts.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, make sense!

@codecov-io
Copy link

codecov-io commented Jun 25, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@15426fe). Click here to learn what that means.
The diff coverage is 50%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master    #7770   +/-   ##
=========================================
  Coverage          ?   65.89%           
=========================================
  Files             ?      459           
  Lines             ?    22007           
  Branches          ?     2417           
=========================================
  Hits              ?    14502           
  Misses            ?     7384           
  Partials          ?      121
Impacted Files Coverage Δ
superset/views/core.py 73.07% <0%> (ø)
superset/assets/src/components/TableSelector.jsx 84.16% <100%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 15426fe...000ccf6. Read the comment docs.

Copy link
Member

@mistercrunch mistercrunch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that <TableSelector /> is also used in the <TableEditor />, where it's used in somewhat different ways. If a chart points to a table and that physical table gets dropped, we still need to see it there.

Really in that case we should have some sort of red exclamation point with a tooltip highlighting that fact.

@mistercrunch
Copy link
Member

Another element of complexity that I was dealing with when I generalized that component (for <TableEditor />) is the [optional, see this.props.database.allow_multi_schema_metadata_fetch] feature where if the schema isn't selected you can in theory search (as in autocomplete) across all tables across all schemas. This feature makes the code opaque and depends on eager caching techniques in the backend (I think at Airbnb there's a cron refreshing that cache from the catalog every 5).

Clearly searching/ across schemas is super useful, but adds another layer of complexity on top or an already heavy component.

@mistercrunch
Copy link
Member

mistercrunch commented Jun 26, 2019

A bit more context. I think recently in #7297, there was an effort to move from using "fully-qualified" table names as in {schema}.{tableName} to using objects instead as in {schema, tableName}. I think that notation was the way that we dealt with schemas when allow_multi_schema_metadata_fetch == true. The driver for the change was that Apache Drill had tables name with dots in them, and that was throwing off some if the dot-spliting logic...

@villebro
Copy link
Member

The breaking change @mistercrunch is referring to was introduced in #7453 , where fully qualified table names were replaced by objects. It seems the consequences were more far reaching than were understood at the time..

@khtruong
Copy link
Contributor Author

Note that <TableSelector /> is also used in the <TableEditor />, where it's used in somewhat different ways. If a chart points to a table and that physical table gets dropped, we still need to see it there.

Really in that case we should have some sort of red exclamation point with a tooltip highlighting that fact.

I didn't see TableEditor but I did see TableSelector in DataSourceEditor and verified this fix there. There was also another issue when a user explores a chart based off a query. The query name would then be added to the list of tables, which doesn't make sense.

I could file a Github to track the ideal experience where there is some sort of red bang with a tooltip letting the user know what the physical table was. Otherwise putting addOptionIfMissing brings back all the issues highlighted earlier. Let me know what you think.

@khtruong
Copy link
Contributor Author

@villebro @mistercrunch Thanks for the backstory! Wow, I can see that there was a lot of work done to fix the previous issues. One question: Will there be a regression if the option value is not specifically an object {table: ..., schema: ...}? I did not use fully-qualified table names. I just pulled out table and schema into separate properties in the parent object and it seems fine. Am I missing something?

@villebro
Copy link
Member

@khtruong I think any solution that makes it possible to unambiguously separate schema from table name should be fine. IIRC one option that was proposed was to keep using a fully qualified string, but apply escaping where necessary. I also remember that @john-bodley is/was working on introducing the concept of a cluster, i.e. one level of abstraction above schema (cluster.schema.table), and who knows what some future engine brings along (I can see some cloud DW introducing region, too).. So perhaps good to leave room for future expansion, while at the same time trying to keep it as simple as possible. Which might be a tall order 😄

@khtruong
Copy link
Contributor Author

@mistercrunch Let me know if you prefer a specific change. I think my changes will not bring back issues you saw when we used the fully qualified names in the past because the table name and schema name are still separate properties. And I can file a Github issue to track the ideal experience where there is some sort of red bang with a tooltip letting the user know if the physical table is missing.

@mistercrunch
Copy link
Member

I just wanted to make sure you have all the context for the complexity in this area. I'm ok with the change if there are no regressions across the board.

The caching feature (from cross schema table name autocomplete) I mentioned seems brittle, and someone at Airbnb (where they use the feature) might want to confirm that this works there.

@khtruong
Copy link
Contributor Author

I just wanted to make sure you have all the context for the complexity in this area. I'm ok with the change if there are no regressions across the board.

The caching feature (from cross schema table name autocomplete) I mentioned seems brittle, and someone at Airbnb (where they use the feature) might want to confirm that this works there.

Gotcha! I don't think this should affect the caching feature since that is a backend operation and this change is mostly frontend changes and does not modify the schema/table name itself.

I have filed issue #7798 as well.

@villebro
Copy link
Member

If it's of any help I'll be happy to run this PR through some testing, as the Presto + Pular and Drill issues that were the basis for my PRs that were mentioned here are still fresh in my memory. Will be able to do a thorough testing during the weekend.

@khtruong
Copy link
Contributor Author

If it's of any help I'll be happy to run this PR through some testing, as the Presto + Pular and Drill issues that were the basis for my PRs that were mentioned here are still fresh in my memory. Will be able to do a thorough testing during the weekend.

That'd be awesome @villebro ! Thanks!

Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functionally this works 100 % with all difficult edge cases I used to validate the Presto+Pulsar and Drill PRs. Apart from aligning the object structures across the frontend and backend this LGTM. Incidentally, what is the role of title, as it seems to always have the same value as label?

value: o.value.table,
schema: o.value.schema,
label: o.label,
title: o.label,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid confusion down the line I think it might be a good idea to align what is returned by the backend with this new structure. Something along these lines (didn't test so might contain typos):

diff --git a/superset/assets/spec/javascripts/sqllab/fixtures.js b/superset/assets/spec/javascripts/sqllab/fixtures.js
index 99e740c3..2b737fef 100644
--- a/superset/assets/spec/javascripts/sqllab/fixtures.js
+++ b/superset/assets/spec/javascripts/sqllab/fixtures.js
@@ -343,16 +343,22 @@ export const databases = {
 export const tables = {
   options: [
     {
-      value: { schema: 'main', table: 'birth_names' },
+      value: 'birth_names',
+      schema: 'main',
       label: 'birth_names',
+      title: 'birth_names',
     },
     {
-      value: { schema: 'main', table: 'energy_usage' },
+      value: 'energy_usage',
+      schema: 'main',
       label: 'energy_usage',
+      title: 'energy_usage',
     },
     {
-      value: { schema: 'main', table: 'wb_health_population' },
+      value: 'wb_health_population',
+      schema: 'main',
       label: 'wb_health_population',
+      title: 'wb_health_population',
     },
   ],
 };
diff --git a/superset/assets/src/components/TableSelector.jsx b/superset/assets/src/components/TableSelector.jsx
index 4c9ef7a9..dc5d075c 100644
--- a/superset/assets/src/components/TableSelector.jsx
+++ b/superset/assets/src/components/TableSelector.jsx
@@ -108,10 +108,10 @@ export default class TableSelector extends React.PureComponent {
         `${encodeURIComponent(this.props.schema)}/${encodeURIComponent(input)}`),
     }).then(({ json }) => {
       const options = json.options.map(o => ({
-        value: o.value.table,
-        schema: o.value.schema,
+        value: o.value,
+        schema: o.schema,
         label: o.label,
-        title: o.label,
+        title: o.title,
       }));
       return ({ options });
     });
@@ -138,10 +138,10 @@ export default class TableSelector extends React.PureComponent {
        return SupersetClient.get({ endpoint })
         .then(({ json }) => {
           const options = json.options.map(o => ({
-            value: o.value.table,
-            schema: o.value.schema,
+            value: o.value,
+            schema: o.schema,
             label: o.label,
-            title: o.label,
+            title: o.title,
           }));
           this.setState(() => ({
             filterOptions: createFilterOptions({ options }),
diff --git a/superset/views/core.py b/superset/views/core.py
index 06727f7a..8355ec39 100755
--- a/superset/views/core.py
+++ b/superset/views/core.py
@@ -1627,14 +1627,16 @@ class Superset(BaseSupersetView):
             max_tables = max_items * len(tables) // total_items
             max_views = max_items * len(views) // total_items
 
-        def get_datasource_value(ds_name: utils.DatasourceName) -> Dict[str, str]:
-            return {'schema': ds_name.schema, 'table': ds_name.table}
-
-        table_options = [{'value': get_datasource_value(tn),
-                          'label': get_datasource_label(tn)}
+        table_options = [{'value': tn.table,
+                          'schema': tn.schema,
+                          'label': get_datasource_label(tn),
+                          'title': get_datasource_label(tn)}
                          for tn in tables[:max_tables]]
-        table_options.extend([{'value': get_datasource_value(vn),
-                               'label': f'[view] {get_datasource_label(vn)}'}
+        table_options.extend([{'value': vn.table,
+                               'schema': vn.schema,
+                               'label': f'[view] {get_datasource_label(vn)}',
+                               'title': f'[view] {get_datasource_label(vn)}',
+                               }
                               for vn in views[:max_views]])
         payload = {
             'tableLength': len(tables) + len(views),

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good! I'll update the code. As for the title, I am not sure what the history is on that attribute and why we need it.

value: o.value.table,
schema: o.value.schema,
label: o.label,
title: o.label,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

@villebro
Copy link
Member

One problem that @mistercrunch pointed out is that if value always contains only the table name, it will not be possible to search over the fully qualified name if no schema is selected multi schema fetch is enabled. However, if we add a property table with only the table name, and provide fully qualified names in value when no schema is selected (otherwise just table name) similar to how label is now populated, searching over both schema and table will be possible.

I think these changes go slightly beyond the original intent of this PR, so I propose we merge this as is, and I can put together a new PR that tackles this stuff I've just proposed (should be fairly quick).

@khtruong
Copy link
Contributor Author

khtruong commented Jul 1, 2019

One problem that @mistercrunch pointed out is that if value always contains only the table name, it will not be possible to search over the fully qualified name if no schema is selected multi schema fetch is enabled. However, if we add a property table with only the table name, and provide fully qualified names in value when no schema is selected (otherwise just table name) similar to how label is now populated, searching over both schema and table will be possible.

I think these changes go slightly beyond the original intent of this PR, so I propose we merge this as is, and I can put together a new PR that tackles this stuff I've just proposed (should be fairly quick).

Got it. That would be awesome if you have time. Otherwise, we can file an issue to track it for the future.

@betodealmeida betodealmeida merged commit 963dce6 into apache:master Jul 1, 2019
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.34.0 labels Feb 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/L 🚢 0.34.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants