Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: isolate examples database by default #25003

Merged
merged 3 commits into from
Aug 17, 2023

Conversation

dpgaspar
Copy link
Member

@dpgaspar dpgaspar commented Aug 16, 2023

SUMMARY

This PR places the examples database on it's own database, previously the examples database tables were being created on Superset's metadata database by default.
Setting the examples database on it's own database will stop giving users access to Superset's metadata database through SQLLab avoiding several possible attack vectors.

By default without using docker-compose, the examples database will be set on it's own SQLite db. Using docker-compose the postgres container will create a specific user for the examples db and the db itself.
Removed the default that set's the SQLALCHEMY_EXAMPLES_URI to be equal to SQLALCHEMY_DATABASE_URI when not set.

Before:

Screenshot 2023-08-16 at 17 58 13

After:

Screenshot 2023-08-16 at 15 27 39

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@john-bodley
Copy link
Member

/testenv up

@john-bodley john-bodley added the review:checkpoint Last PR reviewed during the daily review standup label Aug 16, 2023
@github-actions
Copy link
Contributor

@john-bodley Ephemeral environment creation is currently limited to committers.

Copy link
Member

@betodealmeida betodealmeida left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, i just left a question about #23399. I think it could be good to have the examples DB with DML and file uploads enabled, for a better initial experience with Superset.

or current_app.config["SQLALCHEMY_DATABASE_URI"]
)
return get_or_create_db("examples", db_uri)
return get_or_create_db("examples", current_app.config["SQLALCHEMY_EXAMPLES_URI"])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding #23399, should we have the examples DB have DML on and with file uploads enabled?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left a comment on #23399, no strong opinions about allowing DML/upload on the examples db, yet I do see this as a readonly db

@dpgaspar dpgaspar merged commit 269c992 into apache:master Aug 17, 2023
35 checks passed
@dpgaspar dpgaspar deleted the chore/isolate-examples-db branch August 17, 2023 10:50
@michael-s-molina michael-s-molina added the v3.0 Label added by the release manager to track PRs to be included in the 3.0 branch label Aug 17, 2023
@john-bodley john-bodley removed the review:checkpoint Last PR reviewed during the daily review standup label Aug 17, 2023
michael-s-molina pushed a commit that referenced this pull request Aug 17, 2023
@sfirke
Copy link
Member

sfirke commented Aug 18, 2023

Someone in Slack reported that they are trying to follow the docker-compose setup instructions and are getting the error message:

The environment variable EXAMPLES_USER was missing, abort...

I was surprised to see zero hits for EXAMPLES_USER in Superset GitHub issues, so went looking to see if there was a recent PR that related to this variable, and found this one. Do folks think this could have broken the default deployment in that way?

@michael-s-molina
Copy link
Member

@dpgaspar @sfirke @betodealmeida The problem is because when running with docker-compose-non-dev, it's using the docker/.env file which does not have the examples variables. My question is, for non-dev environments should we have the examples database? If yes, then the fix is just adding these variables to docker/.env. If not, then we need to skip the lookup for them in pythonpath_dev/superset_config.py. What do you think?

@sfirke
Copy link
Member

sfirke commented Aug 18, 2023

I don't feel strongly but think it would be nice to offer admins the option of examples or not. So add the necessary values to the .env file. Not sure if the examples should default to enabled or not for that deployment.

@misiekofski
Copy link

Just tried to go with docs and got the same issue when running : docker-compose -f docker-compose-non-dev.yml up

superset_worker_beat    |     EXAMPLES_USER = get_env_variable("EXAMPLES_USER")
superset_worker_beat    |   File "/app/docker/pythonpath_dev/superset_config.py", line 44, in get_env_variable
superset_worker_beat    |     raise OSError(error_msg)
superset_worker_beat    | OSError: The environment variable EXAMPLES_USER was missing, abort...

@sfirke
Copy link
Member

sfirke commented Aug 21, 2023

Did this PR from @sebastianliebscher and merged 2 hours ago by @michael-s-molina address this problem?

@michael-s-molina
Copy link
Member

@sfirke @misiekofski #25055 fixes the reported error.

@@ -1452,7 +1452,7 @@ def EMAIL_HEADER_MUTATOR( # pylint: disable=invalid-name,unused-argument

# URI to database storing the example data, points to
# SQLALCHEMY_DATABASE_URI by default if set to `None`
SQLALCHEMY_EXAMPLES_URI = None
SQLALCHEMY_EXAMPLES_URI = "sqlite:///" + os.path.join(DATA_DIR, "examples.db")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dpgaspar are we supporting sqlite as a valid examples db in production?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as before since our default for the metadata db is also sqlite https://github.com/apache/superset/blob/master/superset/config.py#L187
The difference is before this PR both schemas were on the same sqlite db

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have an approved SIP to drop support for SQL Lite. I've added a ticket to the 4.0 board to ensure we execute this in the next major release.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do discourage people to use sqlite, though, so I'm wondering if we should put something in UPDATING.md that encourage people to change this value to a different db type?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/M v3.0 Label added by the release manager to track PRs to be included in the 3.0 branch 🍒 3.0.0 🍒 3.0.1 🍒 3.0.2 🍒 3.0.3 🍒 3.0.4 🚢 3.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants