Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dbt is creating schemas when they already exists because of case-sensitive comparisons #1651

Closed
edmundyan opened this issue Aug 1, 2019 · 1 comment · Fixed by #1663
Closed
Labels
enhancement New feature or request good_first_issue Straightforward + self-contained changes, good for new contributors! snowflake
Milestone

Comments

@edmundyan
Copy link
Contributor

Issue

In Snowflake, DBT is trying to create a schema that already exists. Because my snowflake role does not have create schema privs, I get an Insufficient privileges to operate on database 'foo' error when dbt tries to create the schema.

Issue description

Stacktrace:

2019-08-01 18:05:01,735 (MainThread): On master: select distinct schema_name
    from CENTRAL_DATA.information_schema.schemata
    where catalog_name ilike 'CENTRAL_DATA'
2019-08-01 18:05:04,211 (MainThread): SQL status: SUCCESS 9 in 2.48 seconds
2019-08-01 18:05:04,223 (MainThread): Creating schema "CENTRAL_DATA"."cdm".
2019-08-01 18:05:04,227 (MainThread): Using snowflake connection "master".
2019-08-01 18:05:04,227 (MainThread): On master: BEGIN
2019-08-01 18:05:04,417 (MainThread): SQL status: SUCCESS 1 in 0.19 seconds
2019-08-01 18:05:04,417 (MainThread): Using snowflake connection "master".
2019-08-01 18:05:04,417 (MainThread): On master: create schema if not exists CENTRAL_DATA.cdm
2019-08-01 18:05:04,727 (MainThread): Snowflake error: 003001 (42501): 018dec4d-0307-9857-0000-0379146884aa: SQL access control error:
Insufficient privileges to operate on database 'CENTRAL_DATA'
2019-08-01 18:05:04,728 (MainThread): On master: ROLLBACK
2019-08-01 18:05:05,336 (MainThread): Connection 'master' was left open.
2019-08-01 18:05:05,337 (MainThread): On master: Close
2019-08-01 18:05:05,953 (MainThread): Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'end', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1080a42e8>, <snowplow_tracker.self_describi
ng_json.SelfDescribingJson object at 0x106d629b0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x106d62748>]}
2019-08-01 18:05:06,031 (MainThread): Flushing usage events
2019-08-01 18:05:06,031 (MainThread): /Users/edmund.yan/.pyenv/versions/3.6.2/envs/central-data/lib/python3.6/site-packages/dbt/main.py:97: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead

2019-08-01 18:05:06,031 (MainThread): Encountered an error:
2019-08-01 18:05:06,031 (MainThread): /Users/edmund.yan/.pyenv/versions/3.6.2/envs/central-data/lib/python3.6/site-packages/dbt/main.py:98: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead

2019-08-01 18:05:06,031 (MainThread): Database Error
  003001 (42501): 018dec4d-0307-9857-0000-0379146884aa: SQL access control error:
  Insufficient privileges to operate on database 'CENTRAL_DATA'
2019-08-01 18:05:06,050 (MainThread): Traceback (most recent call last):
  File "/Users/edmund.yan/.pyenv/versions/3.6.2/envs/central-data/lib/python3.6/site-packages/dbt/adapters/snowflake/connections.py", line 77, in exception_handler
    yield
  File "/Users/edmund.yan/.pyenv/versions/3.6.2/envs/central-data/lib/python3.6/site-packages/dbt/adapters/sql/connections.py", line 60, in add_query
    cursor.execute(sql, bindings)
  File "/Users/edmund.yan/.pyenv/versions/3.6.2/envs/central-data/lib/python3.6/site-packages/snowflake/connector/cursor.py", line 558, in execute
    errvalue)
  File "/Users/edmund.yan/.pyenv/versions/3.6.2/envs/central-data/lib/python3.6/site-packages/snowflake/connector/errors.py", line 97, in errorhandler_wrapper
    cursor.errorhandler(connection, cursor, errorclass, errorvalue)
  File "/Users/edmund.yan/.pyenv/versions/3.6.2/envs/central-data/lib/python3.6/site-packages/snowflake/connector/errors.py", line 73, in default_errorhandler
    done_format_msg=errorvalue.get(u'done_format_msg'))
snowflake.connector.errors.ProgrammingError: 003001 (42501): 018dec4d-0307-9857-0000-0379146884aa: SQL access control error:
Insufficient privileges to operate on database 'CENTRAL_DATA'

Results

https://github.com/fishtown-analytics/dbt/blob/399b33822a85493a8790df787eed59e951aecf6c/core/dbt/task/runnable.py#L306-L324

I believe there is a bug in create_schemas() and case sensitivity. In Snowflake, list_schemas() will return schema names in upper-case. However, if you define your schemas in your dbt project or profiles in lowercase, the set subtraction of required_schemas - existing_schemas will not identify that the schema already exists and will re-create it.

When I change my dbt_project.yml and profiles.yml to use only upper-case schema names, it works as expected.

System information

The output of dbt --version:

0.14.0

The python version you're using (probably the output of python --version)

Python 3.6.2

Proposed Solutions

  1. dbt does not support case-sensitive schemas
    This looks like the easiest route. If you look at lookup schemas in a case-insensitive way #1411, we already consider databases to be case insensitive.

  2. dbt supports case-sensitive schemas
    If we go this route, I think the default should be case insensitive. However, if you double quote your schema name, it will evaluate as case-sensitive. This seems like a better solution, but I'm unsure of all the downstream call points that would need to be updated to support this.

Happy to help submit a patch if you give me some direction on what the correct fix should be :)

@drewbanin
Copy link
Contributor

Thanks for the report @edmundyan! Casing + quoting on Snowflake can be a bit of a nightmare!

I think it makes sense to make this schema lookup case-insensitive. One good approach could be to just upper-case (or lower-case) the schema names in each set (required_schemas and existing_schemas), then compute the set-difference.

dbt in general will do very undesirable things if users try to have two distinct schema names that are distinguished only in their casing. I think it's ok to continue to assume that users won't (or can't) do that here.

We'd love a PR for this one! Let me know if there's anything I can do to help out. Check out the contributing guide for some info on how to get set up for development with dbt.

@drewbanin drewbanin added enhancement New feature or request good_first_issue Straightforward + self-contained changes, good for new contributors! snowflake labels Aug 2, 2019
@drewbanin drewbanin added this to the 0.14.1 milestone Aug 6, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good_first_issue Straightforward + self-contained changes, good for new contributors! snowflake
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants