Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support cluster_by on snowflake #728

Closed
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
28 changes: 27 additions & 1 deletion dbt/include/global_project/macros/adapters/snowflake.sql
Original file line number Diff line number Diff line change
@@ -1,7 +1,33 @@
{% macro clusterby(_clusterby) %}
{%- if _clusterby is not none -%}
cluster by (
{%- if _clusterby is string -%}
{%- set _clusterby = [_clusterby] -%}
{%- endif -%}
{%- for item in _clusterby -%}
"{{ item }}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're currently experiencing a ton of pain around quoting on Snowflake. I think our default approach going forward will be to not quote identifiers in dbt, but instead ask the user to explicitly quote fields like this if desired. Do you buy that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another benefit of not quoting:

You made a great example of using cluster_by in the sample.dbt_project.yml file below:

                clusterby: ["date_day", "ad_group_id % 10"]

If dbt adds quotes, then I think the expression ad_group_id % 10 will look like an identifier, and Snowflake will do the wrong thing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I buy that! We are definitely in the "don't quote identifiers and let the database do it's thing" camp. Allowing users to override that behavior on a case-by-case basis makes sense.

We have an edge case in our Vertica to Snowflake migration, where we manually created a handful of views that quote identifiers lower-cased versions of the table names. This is so that our pre-existing Tableau reports, which quote a lower-cased version of the table name, continue to function without change.

{%- if not loop.last -%},{%- endif -%}
{%- endfor -%}
)
{%- endif -%}
{%- endmacro -%}

{% macro snowflake__create_table_as(temporary, relation, sql) -%}
{%- set _clusterby = config.get('clusterby') -%}

{% if temporary %}
use schema {{ schema }};
{% endif %}

{{ default__create_table_as(temporary, relation, sql) }}
{# FIXME: We cannot call default__create_table_as here
as it terminates the return string with a semicolon.
Conversely, we have introduced code-copy as
a potential source of drift. #}
create {% if temporary: -%}temporary{%- endif %} table
{{ relation.include(schema=(not temporary)) }}
as (
{{ sql }}
)
{{ clusterby(_clusterby) }}
;
{% endmacro %}
9 changes: 5 additions & 4 deletions dbt/model.py
Original file line number Diff line number Diff line change
Expand Up @@ -18,12 +18,13 @@ class SourceConfig(object):
'schema',
'enabled',
'materialized',
'dist',
'sort',
'dist', # redshift
'sort', # redshift
'clusterby', # snowflake
'sql_where',
'unique_key',
'sort_type',
'bind'
'sort_type', # redshift
'bind' # redshift
]

def __init__(self, active_project, own_project, fqn, node_type):
Expand Down
7 changes: 4 additions & 3 deletions dbt/utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -20,11 +20,12 @@
'schema',
'enabled',
'materialized',
'dist',
'sort',
'dist', # redshift
'sort', # redshift
'clusterby', # snowflake
'sql_where',
'unique_key',
'sort_type',
'sort_type', # redshift
'pre-hook',
'post-hook',
'vars',
Expand Down
1 change: 1 addition & 0 deletions sample.dbt_project.yml
Original file line number Diff line number Diff line change
Expand Up @@ -120,6 +120,7 @@ models:
adwords_ads:
enabled: true
materialized: table
clusterby: ["date_day", "ad_group_id % 10"]

# Applies to all SQL files found under ./models/snowplow/
snowplow:
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,8 @@
config(
materialized = "table",
sort = 'first_name',
dist = 'first_name'
dist = 'first_name',
clusterby = 'first_name',
)
}}

Expand Down
4 changes: 2 additions & 2 deletions test/integration/018_adapter_ddl_tests/test_adapter_ddl.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,14 +10,14 @@ def setUp(self):

@property
def schema(self):
return "adaper_ddl_018"
return "adapter_ddl_018"

@property
def models(self):
return "test/integration/018_adapter_ddl_tests/models"

@attr(type='postgres')
def test_sort_and_dist_keys_are_nops_on_postgres(self):
def test_clusterby_and_sort_and_dist_keys_are_nops_on_postgres(self):
self.run_dbt(['run'])

self.assertTablesEqual("seed","materialized")