
Adding incremental logic to Snowflake plugins #1307

Closed

Conversation

Blakewell

No description provided.

Contributor

@drewbanin drewbanin left a comment


@Blakewell thanks for the PR! This looks really great!

Some comments here about changes we should make to support hooks properly on Snowflake incremental models. The original code you submitted looks like it was based on the BigQuery incremental implementation, which does use the merge statement but does not support transactions.

In addition to the inline comments, let's add a Snowflake-specific implementation of get_merge_sql, as we do for BigQuery. It would be good to add this for consistency, and it will also be helpful when we (eventually) split these plugins into their own repos.

I'd add a file at plugins/snowflake/dbt/include/snowflake/macros/materializations/merge.sql with the following contents:

{% macro snowflake__get_merge_sql(target, source, unique_key, dest_columns) %}
    {{ common_get_merge_sql(target, source, unique_key, dest_columns) }}
{% endmacro %}

Overall this looks really great and will be a welcome addition to dbt! Once you've made these changes, I'll kick off dbt's automated test suite. We're already testing incremental models pretty thoroughly, so the existing tests should cover this code change well.

@drewbanin
Contributor

drewbanin commented Mar 2, 2019

Hey @Blakewell - this change looks good to me!

Separately, I see that the integration tests are failing here. I think I know what's happening, and it's pretty unpleasant.

dbt's MERGE implementation is based on the BigQuery spec. If you cmd+f that spec for on false, you'll see that BigQuery supports using the merge statement with an always-false merge predicate. It looks to me like this is not supported on Snowflake. Here's a Snowflake example which reproduces the difference:

-- Create a table with 1 row
create table dbt_dbanin.snowflake_merge_example as (
    select 1 as id
);


-- Try to insert a second row using a merge statement with a FALSE predicate
merge into dbt_dbanin.snowflake_merge_example
using (
    select 2 as id
) as new_data

on false
when not matched then insert (id) values (id);

-- The destination table still contains only 1 row
select * from dbt_dbanin.snowflake_merge_example;

I can't find anything in the Snowflake docs or forums which remarks upon using ON FALSE as a merge predicate! As such, it's not clear to me if BigQuery is the one with the non-standard implementation, or if this is some sort of Snowflake bug. I think it would be a good idea to reach out to the Snowflake folks to get some insight into how this is intended to work.

Super interestingly, I'm able to get this query to work as expected by adding a when matched clause, even though the ON FALSE predicate should never be matched! This query works as intended, inserting a record into the destination table:

merge into dbt_dbanin.snowflake_merge_example
using (
    select 2 as id
) as new_data

on false
-- This should _never_ be matched. Use a no-op update to play it safe
when matched then update set snowflake_merge_example.id = snowflake_merge_example.id

-- with the "when matched" clause added above, this insert is now executed by Snowflake
when not matched then insert (id) values (id);

If we can get confirmation that this is an appropriate and intended way to use merge statements on Snowflake, then I think my recommendation would be to override the implementation of the merge statement for Snowflake.

That would probably look something like this (overriding the get_merge_sql macro):

{% macro snowflake__get_merge_sql(target, source, unique_key, dest_columns) -%}
    {%- set dest_cols_csv = dest_columns | map(attribute="name") | join(', ') -%}

    merge into {{ target }} as DBT_INTERNAL_DEST
    using {{ source }} as DBT_INTERNAL_SOURCE

    {% if unique_key %}
        on DBT_INTERNAL_SOURCE.{{ unique_key }} = DBT_INTERNAL_DEST.{{ unique_key }}
    {% else %}
        on FALSE
    {% endif %}

    {% if unique_key %}
    when matched then update set
        {% for column in dest_columns -%}
            {{ column.name }} = DBT_INTERNAL_SOURCE.{{ column.name }}
            {%- if not loop.last %}, {%- endif %}
        {%- endfor %}

    {#-- this is new for Snowflake. Always include a `when matched` clause,
      --    but make it a no-op if no unique_key is specified #}
    {% else %}
    when matched then update set
      DBT_INTERNAL_DEST.{{ dest_columns[0].name }} = DBT_INTERNAL_DEST.{{ dest_columns[0].name }}

    {% endif %}

    when not matched then insert
        ({{ dest_cols_csv }})
    values
        ({{ dest_cols_csv }})

{% endmacro %}

I just threw a lot at you here. Let me know what you think about all this! If you have some ability to get in touch with Snowflake support for clarification, that would be super helpful and welcomed :)

Edit: updated new Snowflake merge SQL code

@Blakewell
Author

@drewbanin - I can definitely reach out to my Snowflake contacts, but I lean towards BigQuery being the one with the non-standard implementation. Snowflake appears to be heavily influenced by Oracle in much of its implementation, and Snowflake's interpretation of merge appears to be no different from Oracle's merge statement. SQL Server also follows a similar merge syntax, as sketched below.
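
For comparison, here's roughly what that looks like in SQL Server syntax (a sketch only; the table and values are made up for illustration):

-- Illustrative SQL Server-style merge: an explicit join predicate with
-- both matched and not-matched branches spelled out
merge into dbo.target_table as t
using (select 2 as id) as s
on t.id = s.id
when matched then
    update set t.id = s.id
when not matched then
    insert (id) values (s.id);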

I'll go ahead and override Snowflake's get_merge_sql as suggested above, and if needed I can reach out to Snowflake support for more insight.

@Blakewell
Author

@drewbanin - although, thinking about this more, perhaps what you are getting at is the situation where you do not have a unique key (i.e. an append-style incremental instead of a merge-style incremental).
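
In that case, with no unique key there's nothing to match on, so the merge effectively degenerates into inserting every source row. A rough sketch against the example table above:

-- Append-only sketch (illustrative): equivalent to what a merge with an
-- on FALSE predicate should do, i.e. insert all source rows
insert into dbt_dbanin.snowflake_merge_example (id)
select id
from (select 2 as id) as new_data;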

@bastienboutonnet
Contributor

bastienboutonnet commented Apr 22, 2019

Just a heads up: I'm going to open a PR soon to pick this one up and implement create or replace for Snowflake, to solve the issues with destructive/non-destructive runs.

My current work lives here (forked from this fork and branch): https://github.com/bastienboutonnet/dbt/tree/snowflake_create_or_replace
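
For context, a rough sketch of the create or replace approach (relation names here are illustrative, not from my branch):

-- Illustrative create or replace: atomically swaps in the new table,
-- with no separate drop step and no window where the relation is missing
create or replace table analytics.my_incremental_model as (
    select * from analytics.staging_source
);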

@drewbanin
Contributor

The code in this PR was merged in #1409 -- closing this one!

Thanks for your initial help and hard work with this one @Blakewell :)

@drewbanin drewbanin closed this May 15, 2019