[SIP-95] Proposal for Catalog Support in SQL Lab #22862

JELGT2011 · 2023-01-26T00:15:18Z

[SIP] Proposal for Catalog Support in SQL Lab

Motivation

The current SQL Lab explorer is not very intuitive as there is no way to explore catalogs, and can only explore the default catalog.

Proposed Change

Add a new Catalog dropdown in the sqllab editor (similar to schema).

New or Changed Public Interfaces

This should only affect the sqllab query tab. The Catalog dropdown will be similar enough to the Schema explorer. The word schema is kind of overloaded throughout Superset, so it's not entirely clear to me yet whether a new endpoint is necessary.

New dependencies

None

Migration Plan and Compatibility

None

Rejected Alternatives

None

The text was updated successfully, but these errors were encountered:

betodealmeida · 2023-01-26T00:51:06Z

I would love to see this, but there are few complications. The biggest one is that we need support from the SQL Alchemy dialects, and I'm not sure how many support dynamic catalogs. For example, both Hive and Trino require the catalog to be specified when the connection is created:

trino://<username>:<password>@<host>:<port>/<catalog>/<schema>
hive://<host>:<port>/<catalog>

For engines like this the user would have to choose a default catalog when the database is added, and each DB engine spec in Superset would have to know how to replace the catalog in the URL, in case a user chooses one different from the default after the DB has been created.

It's definitely doable, and something we should do. We've been slowly adding support for catalogs:

superset/superset/sql_parse.py

Line 172 in 383313b

catalog: Optional[str] = None

Are you planning to work on this, or are you just proposing it? If you're not going to work on it I can start taking a look.

JELGT2011 · 2023-01-26T00:55:05Z

I was planning to work on this, but of course I’d love help as I’ve never made a contribution to Superset. I was planning on doing a smaller task first (adding a configurable header/footer) since I figured this one might be complex.

betodealmeida · 2023-01-26T00:56:08Z

I was planning to work on this, but of course I’d love help as I’ve never made a contribution to Superset. I was planning on doing a smaller task first (adding a configurable header/footer) since I figured this one might be complex.

Ah, great! From the wording of the SIP I wasn't sure, and people are welcome to propose changes even if they're not going to work on it. I'm happy to help in any way you need, let me know!

afilipchik · 2023-01-26T18:00:54Z

hey @betodealmeida, good to see you are still pushing Superset forward!

We worked around originally by concatenating catalog with schema, but patching got nasty, as we would pass it thought to Trino driver.

Here is how it looks:

betodealmeida · 2023-01-26T18:44:24Z

I think a good approach for this would be modifying BaseEngineSpec.get_engine to allow passing a catalog. The method would then pass it to get_sqla_engine_with_context, which would call a DB engine specific method to mutate the SQLAlchemy URI so it uses the requested catalog. Something like (pseudocode):

BaseEngineSpec:

    @classmethod
    def get_engine(
        cls,
        database: Database,
        catalog: Optional[str] = None,
        schema: Optional[str] = None,
        source: Optional[QuerySource] = None,
    ) -> ContextManager[Engine]:
        # get_sqla_engine_with_context needs to call database.engine_spec.apply_catalog_to_sqlalchemy_uri
        return database.get_sqla_engine_with_context(
            catalog=catalog,
            schema=schema,
            source=source
        )
    
    @classmethod
    def apply_catalog_to_sqlalchemy_uri(cls, database: Database, catalog: str) -> URL:
        return database.sqlalchemy_uri

    @classmethod
    def get_catalog_names(
        cls,
        database: Database,
        inspect: Inspector,
    ) -> Set[str]:
        return set()

Individual DB engine specs like Trino/Hive/Snowflake/BigQuery would implement custom apply_catalog_to_sqlalchemy_uri methods to build the SQLAlchemy URI for a given catalog. They would also implement custom get_catalog_names to list all available catalogs, since there's no standard SQLAlchemy way of doing that, as far as I know.

betodealmeida · 2023-03-14T16:30:27Z

Actually, we would need the apply_catalog_to_sqlalchemy_uri to also pass schema:

    @classmethod
    def apply_catalog_to_sqlalchemy_uri(
        cls,
        database: Database,
        catalog: Optional[str],
        schema: Optional[str],
    ) -> URL:
        try:
            connect_args = database.get_extra()["engine_params"]["connect_args"]
        except KeyError:
            connect_args = {}

        return database.sqlalchemy_uri, connect_args

For Postgres, this would look like this (untested):

    @classmethod
    def apply_catalog_to_sqlalchemy_uri(
        cls,
        database: Database,
        catalog: Optional[str],
        schema: Optional[str],
    ) -> Tuple[URL, Dict[str, Any]]:
        url, connect_args = super().apply_catalog_to_sqlalchemy_uri(database, catalog, schema)

        if catalog:
            # for Postgres a catalog is a database
            url = url.set(database=catalog)

        if schema:
            connect_args["options"] = f"-csearch_path={schema}"

        return url, connect_args

See the discussion at #23356.

betodealmeida · 2023-03-14T19:02:51Z

Actually we already have a method to change the schema (adjust_database_uri), we should piggyback on that.

betodealmeida · 2023-03-17T17:08:18Z

While working on an issue with schemas, I've done some foundational work for supporting catalogs:

Once we have that, I think we would still need the following steps:

Implement catalog-level permissions (@dpgaspar can probably help here). We also need to modify the parser to extract the catalog from queries, and have the security manager handle catalogs when checking for access.
Implement a get_catalog_names in the DB engine specs that support catalogs (Presto, Trino, Snowflake, BigQuery, Postgres, ...?). Also update their adjust_engine_params methods to modify the SQL Alchemy URI to use the chosen catalog.
Add a catalog dropdown to SQL Lab, and make sure the selected catalog is being passed to _get_sqla_engine in the database model, and from there passed to adjust_engine_params.
Modify the dataset model and add a catalog attribute.
Add a catalog dropdown to the dataset creator.

betodealmeida · 2023-03-20T16:45:11Z

I'm going to work on (1) and (2) this week, then we can put the SIP to vote and you (or someone else) can work on the UI part, @JELGT2011.

JELGT2011 · 2023-03-20T18:09:25Z

that's awesome news @betodealmeida! I've been a little delayed trying to upgrade our existing instance of superset, but I could definitely do some frontend changes (3), (4?), and (5). I think I have a draft work in progress on it already so that works out.

rusackas · 2023-06-07T17:36:18Z

Numbering this as SIP-95. If you'd like to continue pursuing this, pleas submit it for a DISCUSS thread on the Superset @dev mailing list, so we can start progressing it toward a vote. Thanks!

rusackas · 2023-10-03T20:28:48Z

It seems like this work is incrementally moving forward, but the SIP itself has still not been brought up for a DISCUSS thread or a VOTE thread on the Apache mailing list. @JELGT2011 / @betodealmeida is there any need or intent to continue with the SIP process here?

kdazzle · 2023-10-05T16:16:05Z

It would be huge to get this going for Databricks. I could probably contribute - especially since part of the approach was laid out for different engines

guenp · 2023-12-05T18:50:59Z

I also support putting this SIP-95 up for a DISCUSS/VOTE thread! @motherduckdb has a similar scenario where a single DuckDB instance can connect to multiple databases (see https://duckdb.org/docs/sql/statements/attach.html). It would be great to be able to use the Catalog drop-down to scope the Schema options.

rusackas · 2024-02-01T20:22:46Z

@guenp please feel free to kick off the Discussion thread on the [email protected] mailing list! Let me know if you need help with that, you can ping me here or on Superset Slack

rusackas · 2024-04-03T15:56:52Z

Hi @JELGT2011 / @guenp if anyone wants to move this SIP forward, please put it up for a [DISCUSS] thread on the mailing list. Otherwise, it's at risk of being closed/discarded.

betodealmeida · 2024-04-03T15:59:01Z

I'll bring it up for a discussion, I don't think we're far away from finishing this.

john-bodley · 2024-04-04T22:48:05Z

@betodealmeida regarding your comment in the "[DISCUSS] [SIP-95] Proposal for Catalog Support in SQL Lab" email,

I'd like to increase the scope of SIP-95 by introducing catalogs not only in SQL Lab, but throughout the whole application.

I totally agree as it likely is a necessary requirement if one wanted to explore a SQL Lab result set.

betodealmeida · 2024-04-05T16:32:04Z

@john-bodley right, I just called that out because the original SIP mentioned only SQL Lab in the title, but I don't see how that would work. 🙃

rusackas · 2024-05-07T15:52:05Z

@betodealmeida I think you just need to close the vote for this with a [RESULT] thread

betodealmeida · 2024-05-07T22:53:05Z

@betodealmeida I think you just need to close the vote for this with a [RESULT] thread

Done!

betodealmeida · 2024-05-09T22:56:47Z

It's official, we now have catalog support in Apache Superset. Currently Postgres and Databricks are supported, and I have a PR out for:

Trino
Presto
BigQuery
Snowflake

Adding support to new DB engine specs should be straightforward, requiring just 3 methods. I suggest using #28416 as a reference implementation.

rusackas assigned betodealmeida, villebro, john-bodley and eschutho Jan 26, 2023

betodealmeida mentioned this issue Mar 16, 2023

feat(postgresql): dynamic schema #23401

Merged

9 tasks

This was referenced Mar 21, 2023

feat(DB engine spec): get_catalog_names #23447

Merged

feat(bigquery): get_catalog_names #23461

Merged

This was referenced Apr 6, 2023

feat(presto): get_catalog_names #23599

Merged

feat(snowflake): get_catalog_names #23602

Merged

rusackas added the sip Superset Improvement Proposal label Jun 7, 2023

rusackas changed the title ~~[SIP] Proposal for Catalog Support in SQL Lab~~ [SIP-95] Proposal for Catalog Support in SQL Lab Jun 7, 2023

dungdm93 mentioned this issue Aug 9, 2023

implement get_catalog_names trinodb/trino-python-client#401

Merged

betodealmeida mentioned this issue Jan 2, 2024

[SIP-111] Proposal for improved database, schema, and table selection UI in SQL Lab sidebar #26395

Open

This was referenced Apr 16, 2024

feat(SIP-95): new endpoint for extra table metadata #28063

Merged

feat(SIP-95): new endpoint for table metadata #28122

Merged

This was referenced Apr 30, 2024

chore: clean up DB create command #28246

Merged

feat(SIP-95): permissions for catalogs #28317

Merged

betodealmeida mentioned this issue May 7, 2024

feat(SIP-95): catalogs in SQL Lab and datasets #28376

Merged

9 tasks

betodealmeida mentioned this issue May 9, 2024

feat: add support for catalogs #28416

Merged

9 tasks

betodealmeida closed this as completed May 9, 2024

betodealmeida mentioned this issue Jul 19, 2024

fix: remove old constraint #29649

Open

9 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[SIP-95] Proposal for Catalog Support in SQL Lab #22862

[SIP-95] Proposal for Catalog Support in SQL Lab #22862

JELGT2011 commented Jan 26, 2023 •

edited

Loading

betodealmeida commented Jan 26, 2023

JELGT2011 commented Jan 26, 2023

betodealmeida commented Jan 26, 2023

afilipchik commented Jan 26, 2023

betodealmeida commented Jan 26, 2023 •

edited

Loading

betodealmeida commented Mar 14, 2023 •

edited

Loading

betodealmeida commented Mar 14, 2023

betodealmeida commented Mar 17, 2023

betodealmeida commented Mar 20, 2023

JELGT2011 commented Mar 20, 2023

rusackas commented Jun 7, 2023

rusackas commented Oct 3, 2023

kdazzle commented Oct 5, 2023 •

edited

Loading

guenp commented Dec 5, 2023 •

edited

Loading

rusackas commented Feb 1, 2024

rusackas commented Apr 3, 2024

betodealmeida commented Apr 3, 2024

john-bodley commented Apr 4, 2024

betodealmeida commented Apr 5, 2024

rusackas commented May 7, 2024

betodealmeida commented May 7, 2024

betodealmeida commented May 9, 2024

[SIP-95] Proposal for Catalog Support in SQL Lab #22862

[SIP-95] Proposal for Catalog Support in SQL Lab #22862

Comments

JELGT2011 commented Jan 26, 2023 • edited Loading

[SIP] Proposal for Catalog Support in SQL Lab

Motivation

Proposed Change

New or Changed Public Interfaces

New dependencies

Migration Plan and Compatibility

Rejected Alternatives

betodealmeida commented Jan 26, 2023

JELGT2011 commented Jan 26, 2023

betodealmeida commented Jan 26, 2023

afilipchik commented Jan 26, 2023

betodealmeida commented Jan 26, 2023 • edited Loading

betodealmeida commented Mar 14, 2023 • edited Loading

betodealmeida commented Mar 14, 2023

betodealmeida commented Mar 17, 2023

betodealmeida commented Mar 20, 2023

JELGT2011 commented Mar 20, 2023

rusackas commented Jun 7, 2023

rusackas commented Oct 3, 2023

kdazzle commented Oct 5, 2023 • edited Loading

guenp commented Dec 5, 2023 • edited Loading

rusackas commented Feb 1, 2024

rusackas commented Apr 3, 2024

betodealmeida commented Apr 3, 2024

john-bodley commented Apr 4, 2024

betodealmeida commented Apr 5, 2024

rusackas commented May 7, 2024

betodealmeida commented May 7, 2024

betodealmeida commented May 9, 2024

JELGT2011 commented Jan 26, 2023 •

edited

Loading

betodealmeida commented Jan 26, 2023 •

edited

Loading

betodealmeida commented Mar 14, 2023 •

edited

Loading

kdazzle commented Oct 5, 2023 •

edited

Loading

guenp commented Dec 5, 2023 •

edited

Loading