Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix snapshot to always drop the staging table #96

Merged
merged 2 commits into from
May 20, 2022

Conversation

ueshin
Copy link
Collaborator

@ueshin ueshin commented May 17, 2022

Description

We introduced Delta constraints at #71 and now that snapshot query could fail when it's violating the constraints.
In that case, the temporary view keeps existing because snapshot uses a permanent view and drop it later.

We should always drop them.

Copy link
Collaborator

@allisonwang-db allisonwang-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am curious why aren't the '__dbt_tmp' tables created by dbt snapshot automatically deleted? Is this by design?

If these tmp tables are actually being used, we can always just delete them after each unit test (and it won't affect the production workload).

@@ -0,0 +1,16 @@
{% macro statement_with_staging_table(name=None, staging_table=None, fetch_result=False, auto_begin=True) -%}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can add a comment here (clean up staging table generated by dbt snapshot)

@ueshin
Copy link
Collaborator Author

ueshin commented May 19, 2022

why aren't the '__dbt_tmp' tables created by dbt snapshot automatically deleted?

It is supposed to be deleted at https://github.com/databricks/dbt-databricks/pull/96/files#diff-c139bdc07e2116a9e3beb0cf3110fb25dc2b8f0f36b907595c83f2944ee631eaL105-L107, but if the query fails, it's unreachable.

@@ -102,10 +106,6 @@

{{ adapter.commit() }}

{% if staging_table is defined %}
{% do post_snapshot(staging_table) %}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default implementation of post_snapshot seems to be a noop
https://github.com/dbt-labs/dbt-core/blob/e7218d3e99837f0139fb7ecd367d3bdf1135a961/core/dbt/include/global_project/macros/materializations/snapshots/helpers.sql#L17-L23
And dbt-spark doesn't seem to implement it either. Where is the logic of post_snapshot defined?

Copy link
Collaborator Author

@ueshin ueshin May 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 This doesn't show up in Github's search result..

@ueshin
Copy link
Collaborator Author

ueshin commented May 20, 2022

Thanks! merging.

@ueshin ueshin merged commit e649410 into databricks:main May 20, 2022
@ueshin ueshin deleted the snapshot branch May 20, 2022 04:05
ueshin added a commit to ueshin/dbt-databricks that referenced this pull request May 20, 2022
### Description

We introduced Delta constraints at databricks#71 and now that snapshot query could fail when it's violating the constraints.
In that case, the temporary view keeps existing because snapshot uses a permanent view and drop it later.

We should always drop them.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants