(#525) drop existing relation at end of full-refresh incremental build #1682
Changes from 3 commits
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
|
||
{% macro incremental_upsert(tmp_relation, target_relation, unique_key=none, statement_name="main") %} | ||
{%- set dest_columns = adapter.get_columns_in_relation(target_relation) -%} | ||
{%- set dest_cols_csv = dest_columns | map(attribute='quoted') | join(', ') -%} | ||
|
||
{%- if unique_key is not none -%} | ||
delete | ||
from {{ target_relation }} | ||
where ({{ unique_key }}) in ( | ||
select ({{ unique_key }}) | ||
from {{ tmp_relation.include(schema=False, database=False) }} | ||
); | ||
{%- endif %} | ||
|
||
insert into {{ target_relation }} ({{ dest_cols_csv }}) | ||
( | ||
select {{ dest_cols_csv }} | ||
from {{ tmp_relation }} | ||
); | ||
{%- endmacro %} |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -346,6 +346,36 @@ def execute_model(self, model, materialization, sql_override=None, | |
|
||
return res | ||
|
||
@available.parse_none | ||
I'm going to be pedantic about types here: This should probably be |
||
def is_replaceable(self, relation, conf_partition, conf_cluster): | ||
""" | ||
Check if a given partition and clustering column spec for a table | ||
can replace an existing relation in the database. BigQuery does not | ||
allow tables to be replaced with another table that has a different | ||
partitioning spec. This method returns True if the given config spec is | ||
identical to that of the existing table. | ||
""" | ||
try: | ||
table = self.connections.get_bq_table( | ||
database=relation.database, | ||
schema=relation.schema, | ||
identifier=relation.identifier | ||
) | ||
except google.cloud.exceptions.NotFound: | ||
return True | ||
|
||
table_partition = table.time_partitioning | ||
if table_partition is not None: | ||
table_partition = table_partition.field | ||
|
||
table_cluster = table.clustering_fields | ||
|
||
if isinstance(conf_cluster, str): | ||
conf_cluster = [conf_cluster] | ||
|
||
return table_partition == conf_partition \ | ||
and table_cluster == conf_cluster | ||
Does order matter? If not, we should compare

Yeah, the order is significant: clustering works like ordering, whereby the table is clustered by the first clustering key, then the second, and so on. This query fails if you run it twice, swapping the order of the clustering keys on the second run:
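
Separately from the BigQuery query mentioned above, here is a minimal Python sketch of why the comparison at the end of `is_replaceable` stays a list comparison rather than a set comparison. `specs_match` and the column names are hypothetical; the function only mirrors the last few lines of the method and does not talk to BigQuery.

```python
def specs_match(table_partition, table_cluster, conf_partition, conf_cluster):
    """Hypothetical stand-in for the final comparison in is_replaceable."""
    # A single clustering column may be configured as a bare string.
    if isinstance(conf_cluster, str):
        conf_cluster = [conf_cluster]
    # List equality is positional, so ["a", "b"] != ["b", "a"].
    return table_partition == conf_partition and table_cluster == conf_cluster


# Assumed spec of an existing table (illustrative values only).
existing_partition = "created_at"
existing_cluster = ["customer_id", "region"]

same_order = specs_match(
    existing_partition, existing_cluster, "created_at", ["customer_id", "region"]
)
swapped = specs_match(
    existing_partition, existing_cluster, "created_at", ["region", "customer_id"]
)
print(same_order, swapped)
```

Comparing `set(table_cluster) == set(conf_cluster)` would treat the swapped spec as identical, which is the case the reply says must be rejected.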
|
||
|
||
@available.parse_none | ||
def alter_table_add_columns(self, relation, columns): | ||
|
||
|
Is this `include()` necessary/even correct for all databases? I think whatever made your `tmp_relation` should be giving you the correct include policy.

This is such a great catch! Yes, this macro should definitely expect the `tmp_relation` to already have a valid include policy for the given database.
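
To make the point concrete, a rough Python sketch under loud assumptions: `ToyRelation` and `upsert_sql` are hypothetical stand-ins, not dbt's relation class or the macro itself. The idea is that whoever creates the temp relation sets the include policy once, and the upsert logic renders it as given instead of overriding it.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToyRelation:
    """Hypothetical relation with an include policy; not dbt's real class."""
    database: str
    schema: str
    identifier: str
    include_database: bool = True
    include_schema: bool = True

    def include(self, database=True, schema=True):
        # Return a copy with a different include policy.
        return replace(self, include_database=database, include_schema=schema)

    def render(self):
        parts = []
        if self.include_database:
            parts.append(self.database)
        if self.include_schema:
            parts.append(self.schema)
        parts.append(self.identifier)
        return ".".join(parts)


def upsert_sql(tmp, target, unique_key):
    # Render tmp exactly as the caller configured it; no include() override here.
    return (
        f"delete from {target.render()} where ({unique_key}) in "
        f"(select ({unique_key}) from {tmp.render()}); "
        f"insert into {target.render()} select * from {tmp.render()};"
    )


# The caller decides the policy, e.g. an unqualified temp table name.
tmp = ToyRelation("analytics", "dbt", "model__dbt_tmp").include(
    database=False, schema=False
)
target = ToyRelation("analytics", "dbt", "model")
sql = upsert_sql(tmp, target, "id")
print(sql)
```

With this shape, both the `delete` and the `insert` reference the temp relation the same way, whatever policy the caller chose.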