Add non-destructive functionality to Snowflake table materializations #1972

Closed
wants to merge 18 commits

Conversation

liveandletbri

Problem

Snowflake's time travel functionality is most easily usable when a table is not dropped and re-created every time it is refreshed. For incrementally-loaded models, this is fine: the table is only dropped when we perform a full refresh. But for materialized: table models, this is an issue, since these tables are dropped regularly (for us, many of them hourly).

Time travel is very useful for troubleshooting, as it allows you to see exactly what data a table contained at an exact time. However, each time the table is dropped, the "time travel history" starts over at that point. I cannot query a table's contents from 12:15 PM if it was dropped and re-built any time between then and now. Well, not easily.

Snowflake can show you a table's history using the show tables history command. For a table refreshed hourly, you might see something like this:

table_name    date_dropped
my_table      NULL
my_table      12/3/2019 3:00 PM
my_table      12/3/2019 2:00 PM
my_table      12/3/2019 1:00 PM
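
For reference, the command that produces output like this looks roughly like the following (a sketch; the like pattern is an assumption, and the history window you can see depends on your account's time travel retention settings):

show tables history like 'my_table';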

It's currently 3:45 PM. To find what the data looked like for my_table at 12:15 PM, I would have to do this:

alter table my_table rename to my_table_current; --set aside production version of table so nothing is using the name "my_table" anymore
undrop table my_table; --allows me to uncover the version of the table dropped at 3:00, which now takes up the unique name "my_table"
alter table my_table rename to my_table_3PM; --move the 3PM version out of the way, freeing up the name again
undrop table my_table;
alter table my_table rename to my_table_2PM;
undrop table my_table;
alter table my_table rename to my_table_1PM; --now I finally have the version of the table that was dropped at 1:00, so I can restore the current version back to its original name
alter table my_table_current rename to my_table;
drop table my_table_3PM;
drop table my_table_2PM;

select *
from my_table_1PM 
...

Solution

By adding the non-destructive functionality back, tables can be preserved so investigation is as easy as this:

SELECT * 
FROM my_table at (timestamp => '2019-12-03 12:15:00'::timestamp_ltz)
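
Incidentally, Snowflake's time travel clause also accepts a relative offset in seconds, which is handy when you don't have an exact timestamp on hand (a small aside, not part of this change):

SELECT * 
FROM my_table at (offset => -60*60*3) -- table state as of three hours ago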

I know that the --non-destructive flag was removed in 0.14.0 (#1419), but I'm hoping this version of its functionality will be easier to maintain for the following reasons:

  • It only affects table materializations, so you won't have any issues with view or incremental models
  • It's a config argument, not a flag, so you won't have to account for its functionality in any files other than the table materialization file(s)
  • On that note, I've set it up to only apply to Snowflake's table materializations, since according to your docs it isn't useful on any other platform.
  • It uses delete instead of truncate to avoid auto-committing the transaction (see the sketch after this list)
  • It should fail the same way an incremental load would fail if you changed the columns on your table, so there's nothing "tricky" or "pernicious" when columns change (but shoutout to Adding Full Refresh on schema change option for model config #1850 which would be a cool fix for all of that)
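
Concretely, the refresh this materialization issues boils down to something like the following (a minimal sketch of the pattern rather than the actual Jinja in the PR; my_table and my_source are hypothetical names):

-- first run only: create the table if it doesn't exist yet (CREATE is DDL and auto-commits)
create table if not exists my_table as
select * from my_source where false;

begin;
delete from my_table; -- delete, not truncate: truncate is DDL in Snowflake and would auto-commit
insert into my_table select * from my_source;
commit;

Because the delete and insert share one transaction, readers never see a half-refreshed table, and since the table is never dropped, its time travel history stays intact.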

Questions

As a first-time contributor to dbt, I have a few questions:

  1. Is it alright that I use delete explicitly, rather than adding functionality to the adapter so I could write {{ adapter.delete_relation }}? I saw the existing file already used insert explicitly, and here's another file that uses delete explicitly.
  2. When I started, the Snowflake table materialization file did not have any of the intermediate or backup relation logic that populates, renames, and drops tables. All it had was a create_table_as call. I can see that logic in the core table file, though. How does dbt know, when I run my refreshes against Snowflake, to use this logic?
  3. As a follow-up to question 2, is there any way I can reduce the code in the Snowflake table file, given that a lot of it already exists in the core table file?

@cla-bot

cla-bot bot commented Dec 4, 2019

Thanks for your pull request, and welcome to our community! We require contributors to sign our Contributor License Agreement and we don't seem to have your signature on file. Check out this article for more information on why we have a CLA.

In order for us to review and merge your code, please submit the Individual Contributor License Agreement form attached above. If you have questions about the CLA, or if you believe you've received this message in error, don't hesitate to ping @drewbanin.

CLA has not been signed by users: @liveandletbri

@liveandletbri
Author

I just signed the CLA

@liveandletbri
Author

I spoke with @beckjake and he recommended that I make this custom materialization. I'm all for that! Closing this PR then.

@drewbanin
Contributor

hey @liveandletbri - thanks for opening this PR! Glad you got in touch with Jake - this is a neat PR, but it's not something I anticipate adding support for in dbt-core. I think a custom materialization is a really good idea here :)

I did want to tell you that this PR compelled me to add some more info to our contributing guide for dbt. We want to be supportive of folks contributing code back to dbt, and I felt that our policies around how new features are contributed to the project were poorly specified. No action is required on your part here - just wanted to give you a heads up that this exists now :D
