Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support cluster_by on snowflake #728

Closed

Conversation

jon-rtr
Copy link
Contributor

@jon-rtr jon-rtr commented Apr 7, 2018

Issue #634 add cluster_by support for snowflake data warehouses.

Issue dbt-labs#634 add cluster_by support for snowflake data warehouses.
@drewbanin drewbanin self-requested a review April 8, 2018 22:26
{%- set _clusterby = [_clusterby] -%}
{%- endif -%}
{%- for item in _clusterby -%}
"{{ item }}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're currently experiencing a ton of pain around quoting on Snowflake. I think our default approach going forward will be to not quote identifiers in dbt, but instead ask the user to explicitly quote fields like this if desired. Do you buy that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another benefit of not quoting:

You made a great example of using cluster_by in the sample.dbt_project.yml file below:

                clusterby: ["date_day", "ad_group_id % 10"]

If dbt adds quotes, then I think the expression ad_group_id % 10 will look like an identifier, and Snowflake will do the wrong thing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I buy that! We are definitely in the "don't quote identifiers and let the database do it's thing" camp. Allowing users to override that behavior on a case-by-case basis makes sense.

We have an edge case in our Vertica to Snowflake migration, where we manually created a handful of views that quote identifiers lower-cased versions of the table names. This is so that our pre-existing Tableau reports, which quote a lower-cased version of the table name, continue to function without change.

@drewbanin
Copy link
Contributor

@jon-rtr I played around with this a little bit, and I think we still need to address the fact that Snowflake requires the table DDL to be specified along with create table ... as () when setting a cluster by.

For example, this works:

create table dbt_dbanin.debug (
    id int
) cluster by (id)
as (
    select 1 as id
);

but this does not:

create table dbt_dbanin.debug (
    id int
) cluster by (id)
as (
    select 1 as id
);

Do you know if it's possible to use cluster by without passing in table DDL?

@drewbanin drewbanin added help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors snowflake labels Jul 3, 2018
@drewbanin
Copy link
Contributor

@jon-rtr we're closing out some stale PRs -- did you have a chance to give this any more thought? I'm definitely keen to add support for cluster by on Snowflake, but I think there are some major aspects of dbt that need to change in order to make this happen. Let's discuss in #634 -- we can re-open this when we come to a good conclusion for implementation

@drewbanin drewbanin closed this Aug 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors snowflake
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants