Adding incremental logic to Snowflake plugins #1307
@Blakewell thanks for the PR! This looks really great!
Some comments here about changes we should make to support hooks properly on Snowflake incremental models. The original code you submitted looks like it was based on the BigQuery incremental implementation, which does use the merge statement but does not support transactions.
In addition to the inline comments, let's add a Snowflake-specific implementation of get_merge_sql, as we do in BigQuery. It would be good to add this for consistency, but it will also be helpful when we (eventually) split these plugins into their own repos.

I'd add a file at plugins/snowflake/dbt/include/snowflake/macros/materializations/merge.sql with the following contents:
{% macro snowflake__get_merge_sql(target, source, unique_key, dest_columns) %}
    {{ common_get_merge_sql(target, source, unique_key, dest_columns) }}
{% endmacro %}
Overall this looks really great and will be a welcomed addition to dbt! Once you've made these changes, I'll kick off dbt's automated test suite. We're already testing incremental models pretty thoroughly, so the existing tests should cover this code change well.
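(For illustration, a model that would exercise this code path might look like the following - a hypothetical model file, not part of this PR; the config keys, `is_incremental()`, and `{{ this }}` are standard dbt constructs:)

{{ config(
    materialized='incremental',
    unique_key='id'
) }}

select id, event_name, updated_at
from raw_schema.events
{% if is_incremental() %}
-- on incremental runs, only pull rows newer than what's already in the target table
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}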
Hey @Blakewell - this change looks good to me! Separately, I see that the integration tests are failing here. I think I see what's happening, and it looks pretty unpleasant: dbt's generated merge statement does not insert any rows when the `on false` predicate is used (i.e. when no unique_key is supplied). Here's a minimal reproduction:

-- Create a table with 1 row
create table dbt_dbanin.snowflake_merge_example as (
    select 1 as id
);

-- Try to insert a second row using a merge statement with a FALSE predicate
merge into dbt_dbanin.snowflake_merge_example
using (
    select 2 as id
) as new_data
on false
when not matched then insert (id) values (id);

-- The destination table still contains only 1 row
select * from dbt_dbanin.snowflake_merge_example;

I can't find anything in the Snowflake docs or forums which remarks upon using a constant-false predicate in a merge statement. Super interestingly, I'm able to get this query to work as expected by adding a no-op `when matched` clause:

merge into dbt_dbanin.snowflake_merge_example
using (
    select 2 as id
) as new_data
on false
-- This should _never_ be matched. Use a no-op update to play it safe
when matched then update set snowflake_merge_example.id = snowflake_merge_example.id
-- with the "when matched" clause added above, this insert is now executed by Snowflake
when not matched then insert (id) values (id);

If we can get some more information that this is an appropriate and intended way to use merge statements on Snowflake, then I think my recommendation would be to override the implementation of the get_merge_sql macro for Snowflake. That would probably look like this (via this macro):

{% macro snowflake__get_merge_sql(target, source, unique_key, dest_columns) -%}
    {%- set dest_cols_csv = dest_columns | map(attribute="name") | join(', ') -%}

    merge into {{ target }} as DBT_INTERNAL_DEST
    using {{ source }} as DBT_INTERNAL_SOURCE

    {% if unique_key %}
        on DBT_INTERNAL_SOURCE.{{ unique_key }} = DBT_INTERNAL_DEST.{{ unique_key }}
    {% else %}
        on FALSE
    {% endif %}

    {% if unique_key %}
    when matched then update set
        {% for column in dest_columns -%}
            {{ column.name }} = DBT_INTERNAL_SOURCE.{{ column.name }}
            {%- if not loop.last %}, {%- endif %}
        {%- endfor %}
    {#-- this is new for Snowflake. Always include a `when matched` clause,
        but make it a no-op if no unique_key is specified #}
    {% else %}
    when matched then update set
        DBT_INTERNAL_DEST.{{ dest_columns[0].name }} = DBT_INTERNAL_DEST.{{ dest_columns[0].name }}
    {% endif %}

    when not matched then insert
        ({{ dest_cols_csv }})
    values
        ({{ dest_cols_csv }})
{% endmacro %}

I just threw a lot at you here. Let me know what you think about all this! If you have some ability to get in touch with Snowflake support for clarification, that would be super helpful and welcomed :)

Edit: updated the new Snowflake merge SQL code above.
@drewbanin - I can definitely reach out to my Snowflake contacts, but I lean towards thinking that BigQuery may be the one with the non-standard implementation. For example, Snowflake appears to be heavily influenced by Oracle in a lot of its implementation, and Snowflake's interpretation of merge appears to be no different from Oracle's merge statement. SQL Server also follows a similar merge syntax. I'll go ahead and override Snowflake's get_merge_sql as suggested above, and if needed I can reach out to Snowflake support to get more insight.
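(For reference, a minimal sketch of the merge shape shared by Oracle, SQL Server, and Snowflake - the table and column names here are hypothetical, and minor dialect differences apply, e.g. Oracle wraps the ON condition in parentheses and omits "as" before aliases:)

merge into customers as t
using (select 1 as id, 'active' as status) as s
on t.id = s.id
-- update existing rows on a key match
when matched then update set status = s.status
-- insert rows that have no match in the target
when not matched then insert (id, status) values (s.id, s.status);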
@drewbanin - although, in thinking about this more, perhaps what you are getting at is the situation where you do not have a unique key (i.e. an append incremental instead of a merge incremental).
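(Concretely, for a model with no unique_key, the proposed snowflake__get_merge_sql macro above would render something like the following - the table names are hypothetical, though the __dbt_tmp suffix mirrors dbt's temp-table naming convention:)

merge into analytics.my_model as DBT_INTERNAL_DEST
using analytics.my_model__dbt_tmp as DBT_INTERNAL_SOURCE
on FALSE
-- no-op update: the FALSE predicate guarantees this branch never fires
when matched then update set
    DBT_INTERNAL_DEST.id = DBT_INTERNAL_DEST.id
-- every source row is unmatched, so this appends all new rows
when not matched then insert
    (id)
values
    (id);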
Just a heads up: I'm going to open a PR soon to pick this current one up and implement create or replace. My current work lies here (I forked from this current fork and branch): https://github.com/bastienboutonnet/dbt/tree/snowflake_create_or_replace
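(For context, the create-or-replace approach mentioned here rebuilds the table in a single Snowflake statement rather than merging into it - a sketch with hypothetical table names:)

-- atomically replace the target table with the freshly built result set
create or replace table analytics.my_model as (
    select * from analytics.my_model__dbt_tmp
);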
The code in this PR was merged in #1409 -- closing this one! Thanks for your initial help and hard work on this, @Blakewell :)