Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: improve schema security #23385

Merged
merged 1 commit into from
Mar 17, 2023
Merged

chore: improve schema security #23385

merged 1 commit into from
Mar 17, 2023

Conversation

betodealmeida
Copy link
Member

@betodealmeida betodealmeida commented Mar 15, 2023

SUMMARY

Superset supports permissions for schema-level access control; for example, a given user role might have access to only the schema foo in a given database. The problem is that it's really hard to figure out the schema of an unqualified table name. Eg, for the following query:

SELECT * FROM bar;

When a given user runs the query above, Superset needs to know what is the schema where table bar lives, so that it can check if the user has permission to access that schema. But figuring out the schema is not trivial, and depends on the database engine spec and how the database configured.

For example, for Postgres the schema will usually be public, the default one, regardless of the schema selected in SQL Lab. But it's possible to specify a different search path when creating the database, eg:

engine = create_engine("postgres://...", connect_args={"options": "-csearch_path=secret"})

For a database configured as above, the schema for the table bar in the query would be secret, and not public. To make things worse, the search path in Postgres can point to multiple schemas, in which case Superset has no idea where the data is coming from!

Other databases can specify the default schema in the SQLAlchemy URI; MySQL is one of them (though it calls it a "database"):

mysql://user:password@host/schema

In this case we can fetch the default schema for unqualified table names from the URI.

Finally, some database engine specs (like Trino, Presto, Snowflake, MySQL) can modify the schema dynamically on a per-query basis, in which case we can simply use the schema that is selected in the dropdown in SQL Lab:

@classmethod
def adjust_database_uri(
cls, uri: URL, selected_schema: Optional[str] = None
) -> URL:
database = uri.database
if selected_schema and database:
selected_schema = parse.quote(selected_schema, safe="")
if "/" in database:
database = database.split("/")[0] + "/" + selected_schema
else:
database += "/" + selected_schema
uri = uri.set(database=database)
return uri

To improve the situation this PR introduces a new method to DB engine specs, get_default_schema_for_query, along a couple helper methods. This method allows the security manager to know in which default schema a given query is running, so it can validate access to unqualified table names.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@codecov
Copy link

codecov bot commented Mar 15, 2023

Codecov Report

Merging #23385 (708bc87) into master (ec6318b) will increase coverage by 0.10%.
The diff coverage is 95.31%.

❗ Current head 708bc87 differs from pull request most recent head 8e04ad8. Consider uploading reports for the commit 8e04ad8 to get more accurate results

@@            Coverage Diff             @@
##           master   #23385      +/-   ##
==========================================
+ Coverage   67.44%   67.55%   +0.10%     
==========================================
  Files        1907     1907              
  Lines       73493    73546      +53     
  Branches     7976     7976              
==========================================
+ Hits        49571    49687     +116     
+ Misses      21873    21810      -63     
  Partials     2049     2049              
Flag Coverage Δ
hive 52.75% <46.87%> (?)
mysql 78.38% <46.87%> (-0.06%) ⬇️
postgres 78.48% <67.18%> (-0.03%) ⬇️
presto 52.68% <46.87%> (-0.02%) ⬇️
python 82.30% <95.31%> (+0.20%) ⬆️
sqlite 76.95% <60.93%> (-0.04%) ⬇️
unit 52.52% <75.00%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...ins/plugin-chart-table/src/DataTable/DataTable.tsx 39.82% <ø> (ø)
...d/src/SqlLab/components/AceEditorWrapper/index.tsx 59.25% <ø> (ø)
...frontend/src/SqlLab/components/ResultSet/index.tsx 62.65% <ø> (ø)
...rontend/src/components/DropdownContainer/index.tsx 65.00% <ø> (ø)
...uperset-frontend/src/components/FaveStar/index.tsx 100.00% <ø> (ø)
...et-frontend/src/components/FlashProvider/index.tsx 90.00% <ø> (ø)
...et-frontend/src/components/ListView/CrossLinks.tsx 88.23% <ø> (ø)
...ponents/ReportModal/HeaderReportDropdown/index.tsx 67.50% <ø> (ø)
...erset-frontend/src/components/Select/CustomTag.tsx 75.00% <ø> (ø)
...et-frontend/src/components/TruncatedList/index.tsx 76.92% <ø> (ø)
... and 29 more

... and 9 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice improvement! Thanks for this hard work, this will set a much better foundation for schema security! 👍

"""
# default schema varies on a per-query basis
if cls.supports_dynamic_schema:
return query.schema
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@betodealmeida judging from the description, the dropdown from sqllab has already been assigned as query.schema at this point?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right!

@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 3.0.0 labels Mar 13, 2024
@mistercrunch mistercrunch deleted the default_schema branch March 26, 2024 16:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/L 🚢 3.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants