
Snowflake create or replace #1409

Merged (34 commits) on May 15, 2019

Conversation

@bastienboutonnet (Contributor) commented on Apr 22, 2019

I had a few chats with @drewbanin about solving an issue caused by Snowflake's lack of proper transactions: tables would end up truncated or dropped, and therefore unavailable, while full-refreshing incremental models or re-generating regular tables.

I originally suggested doing table swaps, but @drewbanin suggested we use create or replace instead, which makes a lot more sense and is neater to implement (no temporary tables to create and clean up, etc.).

Regarding incremental logic for Snowflake, @drewbanin pointed out that work had already started on using merge instead of inserts in #1307, so it made sense to build on top of that PR to address the on false issue (well, work around it) and rework the materialisation logic for incremental runs and tables.

Aims:

Incremental Materialisation/Merge:

  • When no unique_key is provided, we revert to a regular insert ..., since this case seemed to cause issues with on false (an illustrative statement follows this list).
  • Use merge for incremental models when a unique_key is provided (this part of the code remains pretty much unchanged from the PR referred to above).
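As a rough illustration only (the table and column names below are invented, not taken from this PR), the no-unique_key path boils down to a plain insert of the new rows:

insert into analytics.my_incremental_model (id, value)
select id, value
from analytics.my_incremental_model__dbt_tmp;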

Full-refresh and table materialisations

Leverage create or replace in Snowflake for full refreshes and table materialisations (an illustrative statement follows this list):

  • it's atomic
  • no downtime, no empty or missing tables
  • no need to worry about destructive vs non-destructive (makes it possible to remove --non-destructive in future versions)
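To illustrate (the schema and model names below are made up for the example), a table materialisation can then be reduced to a single atomic statement:

create or replace table analytics.my_model as (
    select *
    from analytics.stg_orders
);

Because Snowflake only swaps in the new table once the statement completes, downstream queries never see a dropped, empty, or half-built table.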

Relates to the following issues:

#525
#1101

@drewbanin (Contributor)

Thanks for opening this PR @bastienboutonnet - will give this a look today :)

drewbanin self-requested a review on April 24, 2019 at 00:14
@drewbanin (Contributor) left a comment

Some cosmetic comments here, and a couple of areas to simplify these materializations even further. I really like your approach for reconciling that issue with the on false clause in the incremental materialization.

This is really stellar! Happy to discuss any of the comments I dropped in here, otherwise, let me know when this is ready for another look. At that point, I'll kick off the integration tests and we can hopefully get this merged :D

@drewbanin (Contributor)

This PR closes #1379, #1101, #1414

@bastienboutonnet (Contributor, Author)

@drewbanin thanks a lot for reviewing this. I implemented most of your feedback. I still have a question regarding the --non-destructive block. But other than that I think we could merge pretty soon


{%- if unique_key is none -%}
{# -- if no unique_key is provided run regular insert as Snowflake may complain #}
insert into {{ target_relation }} ({{ dest_cols_csv }})
Contributor (commenting on the snippet above):

This is a really good fix for the on false issue with Snowflake's merge statements. Do you think it makes sense to put this logic here, or should we move it into the Snowflake implementation of get_merge_sql?

I like the idea of making materializations represent business logic instead of database logic, as they become a lot more generalizable. Curious what you think!

Contributor (Author):

I think that makes total sense! I was actually feeling a bit "awkward" about having this logic sit there, but didn't think too much about where else it could live. This is a very good suggestion, so I'm going to go ahead and change it as you suggest.

Contributor:

Great! I think this would be the place to implement it. If unique_key is provided, we can proceed with common_get_merge_sql; otherwise, we should return the insert statement you've built here.
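As a minimal sketch of the override being described here, assuming the Snowflake-specific macro is named snowflake__get_merge_sql and takes the same arguments as the common implementation (both assumptions, not the final code from this PR):

{% macro snowflake__get_merge_sql(target, source, unique_key, dest_columns) -%}
    {%- set dest_cols_csv = dest_columns | map(attribute='name') | join(', ') -%}
    {%- if unique_key is none -%}
        {#-- no unique_key: plain insert, since merge ... on false trips up Snowflake #}
        insert into {{ target }} ({{ dest_cols_csv }})
        select {{ dest_cols_csv }} from {{ source }}
    {%- else -%}
        {#-- unique_key provided: defer to the shared merge implementation #}
        {{ common_get_merge_sql(target, source, unique_key, dest_columns) }}
    {%- endif -%}
{%- endmacro %}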

Contributor (Author):

Yep, it's exactly what I just started doing!

Contributor (Author):

One thing: I realised there are no incremental deletes anymore, and the merge statement doesn't call a delete. Do you think we need it here?

Contributor:

The previous implementation of incremental models on Snowflake used delete statements to approximate an upsert. Before, we did:

create temp table as (select * from model code)
delete from destination table where destination unique key = temp table unique key
insert into destination table (select * from temp table)

So, records were only deleted if they were going to be immediately re-inserted. We'd actually prefer not to call a delete, and instead use the merge to update these rows in-place. This should be handled by the when matched clause in the merge statement.

I do think there's a conversation to be had about performance. I wonder if there's any difference between:

  1. Deleting existing records and reinserting them (with new values)
  2. Updating existing records in place

An example:

Destination table:

  unique_key | value
  ---------- | -----
  1          | abc
  2          | def

Temp table (generated from the model select):

  unique_key | value
  ---------- | -----
  2          | ghi
  3          | xyz

Desired destination table state:

  unique_key | value
  ---------- | -----
  1          | abc
  2          | ghi
  3          | xyz

So, there are two ways to accomplish this desired end-state. We can either (pseudocode):

1. delete + insert

delete from destination table where id = 2
insert into destination table (id, value) values (2, ghi), (3, xyz)

2. update + insert (via merge)

merge into destination table
from temp table
when matched update -- updates row with id = 2
when not matched insert -- adds rows with id = 3
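To make option 2 concrete, here is roughly what that merge looks like as actual Snowflake SQL for the example above (the table names and aliases are placeholders, not what dbt generates):

merge into destination_table as dest
using temp_table as src
    on dest.unique_key = src.unique_key
when matched then
    update set value = src.value                                   -- updates the row with unique_key = 2
when not matched then
    insert (unique_key, value) values (src.unique_key, src.value); -- adds the row with unique_key = 3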

This does raise an interesting question about edge-case behavior with merge. What happens if there are duplicate unique_ids in either 1) the destination table or 2) the staging table?

Previously, it was straightforward to understand how the delete + insert pattern behaved. While having a duplicated unique_key would probably lead to undesirable results, the insert and delete queries would execute successfully.

With the merge implementation, I think users will see an error about non-deterministic results if their unique_key is not unique! All told, I think this will actually be a good thing, as it should help alert users to bugs in their model code.

Contributor (Author):

Good catch. Based on what you say, here's what I think: merge is definitely the preferable option, and unless there's a really good reason for it, you should get an error if you're trying to insert dupes. There is probably something broken with the source in that case.

Alternatively, we could add support for the ERROR_ON_NONDETERMINISTIC_MERGE session parameter (when FALSE, Snowflake picks one of the duplicated rows and applies it), but there doesn't seem to be a clear way to control which row gets picked, and I think this is just bad anyway. I don't really see the point of inserting a dupe row, so I agree with your last point in that comment, and I think the current implementation is fine.
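For reference only (to illustrate the alternative being set aside here), that session parameter would be toggled like this in Snowflake:

alter session set ERROR_ON_NONDETERMINISTIC_MERGE = false;

With it set to false, Snowflake applies one arbitrary matching row instead of raising an error, so keeping the default of true and letting the error surface seems like the better way to catch duplicated unique_keys.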

@drewbanin (Contributor)

This PR is in really good shape! Just one comment about non-destructive mode, and maybe an interesting discussion to have about the job of the get_merge_sql statement, but otherwise I really like all of this!

Can you take a pass through and remove/update any "todo" comments in here? Definitely let me know if you still have outstanding questions about these things :)

@drewbanin (Contributor)

@bastienboutonnet just fixed a merge conflict (we updated dev/wilt-chamberlain) and the tests should be running now!

@bastienboutonnet (Contributor, Author) commented on Apr 30, 2019

Awesome! Should I be worried that it looks like many tests are failing?

@drewbanin (Contributor) left a comment

Approved! Thanks for your hard work here @bastienboutonnet - this is going to be a really wonderful addition to dbt on Snowflake ❄️ 🎉 💯
