
Adding incremental logic to Snowflake plugins #1307

Closed

Conversation

Blakewell

No description provided.

Contributor

@drewbanin drewbanin left a comment


@Blakewell thanks for the PR! This looks really great!

Some comments here about changes we should make to support hooks properly on Snowflake incremental models. The original code you submitted looks like it was based on the BigQuery incremental implementation, which does use the merge statement but does not support transactions.

In addition to the inline comments, let's add a Snowflake-specific implementation of get_merge_sql, as we do for BigQuery. It would be good to add this for consistency, and it will also be helpful when we (eventually) split these plugins into their own repos.

I'd add a file at plugins/snowflake/dbt/include/snowflake/macros/materializations/merge.sql with the following contents:

{% macro snowflake__get_merge_sql(target, source, unique_key, dest_columns) %}
    {{ common_get_merge_sql(target, source, unique_key, dest_columns) }}
{% endmacro %}

Overall this looks really great and will be a welcome addition to dbt! Once you've made these changes, I'll kick off dbt's automated test suite. We're already testing incremental models pretty thoroughly, so the existing tests should cover this code change well.

@drewbanin
Contributor

drewbanin commented Mar 2, 2019

Hey @Blakewell - this change looks good to me!

Separately, I see that the integration tests are failing here. I think I know what's happening, and it's pretty unpleasant.

dbt's MERGE implementation is based on the BigQuery spec. If you cmd+f that spec for on false, you'll see that BigQuery supports using the merge statement with an always-false merge predicate. It looks to me like this is not supported on Snowflake. Here's a Snowflake example which reproduces the difference:

-- Create a table with 1 row
create table dbt_dbanin.snowflake_merge_example as (
    select 1 as id
);


-- Try to insert a second row using a merge statement with a FALSE predicate
merge into dbt_dbanin.snowflake_merge_example
using (
    select 2 as id
) as new_data

on false
when not matched then insert (id) values (id);

-- The destination table still contains only 1 row
select * from dbt_dbanin.snowflake_merge_example;

I can't find anything in the Snowflake docs or forums which remarks upon using ON FALSE as a merge predicate! As such, it's not clear to me if BigQuery is the one with the non-standard implementation, or if this is some sort of Snowflake bug. I think it would be a good idea to reach out to the Snowflake folks to get some insight into how this is intended to work.

Super interestingly, I'm able to get this query to work as expected by adding a when matched clause, even though the ON FALSE predicate should never be matched! This query works as intended, inserting a record into the destination table:

merge into dbt_dbanin.snowflake_merge_example
using (
    select 2 as id
) as new_data

on false
-- This should _never_ be matched. Use a no-op update to play it safe
when matched then update set snowflake_merge_example.id = snowflake_merge_example.id

-- with the "when matched" clause added above, this insert is now executed by Snowflake
when not matched then insert (id) values (id);

If we can get confirmation that this is an appropriate and intended way to use merge statements on Snowflake, then I think my recommendation would be to override the implementation of the merge statement for Snowflake.

That would probably look something like this (overriding the get_merge_sql macro):

{% macro snowflake__get_merge_sql(target, source, unique_key, dest_columns) -%}
    {%- set dest_cols_csv = dest_columns | map(attribute="name") | join(', ') -%}

    merge into {{ target }} as DBT_INTERNAL_DEST
    using {{ source }} as DBT_INTERNAL_SOURCE

    {% if unique_key %}
        on DBT_INTERNAL_SOURCE.{{ unique_key }} = DBT_INTERNAL_DEST.{{ unique_key }}
    {% else %}
        on FALSE
    {% endif %}

    {% if unique_key %}
    when matched then update set
        {% for column in dest_columns -%}
            {{ column.name }} = DBT_INTERNAL_SOURCE.{{ column.name }}
            {%- if not loop.last %}, {%- endif %}
        {%- endfor %}

    {#-- this is new for Snowflake. Always include a `when matched` clause,
      --    but make it a no-op if no unique_key is specified #}
    {% else %}
    when matched then update set
      DBT_INTERNAL_DEST.{{ dest_columns[0].name }} = DBT_INTERNAL_DEST.{{ dest_columns[0].name }}

    {% endif %}

    when not matched then insert
        ({{ dest_cols_csv }})
    values
        ({{ dest_cols_csv }})

{% endmacro %}

I just threw a lot at you here. Let me know what you think about all this! If you have some ability to get in touch with Snowflake support for clarification, that would be super helpful and welcomed :)

Edit: updated new Snowflake merge SQL code

@Blakewell
Author

@drewbanin - I can definitely reach out to my Snowflake contacts, but I lean towards BigQuery being the one with the non-standard implementation. Snowflake appears to be heavily influenced by Oracle in much of its implementation, and Snowflake's interpretation of merge appears to be no different from Oracle's merge statement. SQL Server also follows a similar merge syntax, as sketched below.
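
For comparison, here's roughly what that looks like in SQL Server syntax (a sketch only; the table and values are made up for illustration):

-- Illustrative SQL Server-style merge: an explicit join predicate with
-- both matched and not-matched branches spelled out
merge into dbo.target_table as t
using (select 2 as id) as s
on t.id = s.id
when matched then
    update set t.id = s.id
when not matched then
    insert (id) values (s.id);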

I'll go ahead and override Snowflake's get_merge_sql as suggested above, and if needed I can reach out to Snowflake support for more insight.

@Blakewell
Author

@drewbanin - although, thinking about this more, perhaps what you are getting at is the situation where you do not have a unique key (i.e. an append-style incremental instead of a merge-style incremental).
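
In that case, with no unique key there's nothing to match on, so the merge effectively degenerates into inserting every source row. A rough sketch against the example table above:

-- Append-only sketch (illustrative): equivalent to what a merge with an
-- on FALSE predicate should do, i.e. insert all source rows
insert into dbt_dbanin.snowflake_merge_example (id)
select id
from (select 2 as id) as new_data;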

@bastienboutonnet
Contributor

bastienboutonnet commented Apr 22, 2019

Just a heads up: I'm going to open a PR soon to pick this one up and implement create or replace for Snowflake, to solve the issues with destructive/non-destructive runs.

My current work lives here (forked from this fork and branch): https://github.com/bastienboutonnet/dbt/tree/snowflake_create_or_replace
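
For context, a rough sketch of the create or replace approach (relation names here are illustrative, not from my branch):

-- Illustrative create or replace: atomically swaps in the new table,
-- with no separate drop step and no window where the relation is missing
create or replace table analytics.my_incremental_model as (
    select * from analytics.staging_source
);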

@drewbanin
Contributor

The code in this PR was merged in #1409 -- closing this one!

Thanks for your initial help and hard work with this one @Blakewell :)

@drewbanin drewbanin closed this May 15, 2019