[CT-3373] [Feature] Incremental Strategy: 'insert_unmatched' #9056
Comments
Thanks for publishing the logic you've developed -- I bet it will come in handy for some folks 🧠 Rather than add new incremental strategies in dbt-core, we'd rather encourage the ecosystem through custom incremental strategies. There's probably still some room to make these easier to author. In particular, #9223 (comment) does a demo of your proposed strategy.
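(For anyone who wants this behavior today, here is a minimal sketch of it as a custom strategy. It leans on dbt's documented hook for custom incremental strategies, a macro named `get_incremental_<strategy>_sql` that receives a single `arg_dict`, and it assumes a single-column `unique_key`; the names are illustrative and the sketch is untested.)

```sql
{% macro get_incremental_insert_unmatched_sql(arg_dict) %}

  {%- set target = arg_dict["target_relation"] -%}
  {%- set source = arg_dict["temp_relation"] -%}
  {%- set unique_key = arg_dict["unique_key"] -%}
  {%- set dest_columns = arg_dict["dest_columns"] -%}
  {%- set dest_cols_csv = get_quoted_csv(dest_columns | map(attribute="name")) -%}

  merge into {{ target }} as DBT_INTERNAL_DEST
  using {{ source }} as DBT_INTERNAL_SOURCE
  on DBT_INTERNAL_SOURCE.{{ unique_key }} = DBT_INTERNAL_DEST.{{ unique_key }}

  -- no "when matched" clause: rows that already exist are skipped entirely
  when not matched then insert
      ({{ dest_cols_csv }})
  values
      ({{ dest_cols_csv }})

{% endmacro %}
```

A model would then opt in with `incremental_strategy='insert_unmatched'` in its config.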
Is this your first time submitting a feature request?
Describe the feature
Overview
A new incremental strategy should be defined with the following behavior: when a source row does not match an existing row in the destination table, insert it; when it does match, do nothing.
We might call this strategy `insert_unmatched` because it inserts only the rows that are not matched.

Implementation
This should be as simple as creating a new strategy that is identical to `default__get_merge_sql` but with the `when matched` chunk of code removed (see the excerpt below). This results in a merge statement that includes a `when not matched` clause only.

(Note that I've tested that this works on Snowflake. I'm not sure about other platforms.)
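For reference, the chunk to remove is the `when matched then update set` block of `default__get_merge_sql`. It looks roughly like this; the exact text varies across dbt-core versions, so treat it as an illustrative excerpt rather than a verbatim copy:

```sql
{% if unique_key %}
when matched then update set
    {% for column_name in update_columns -%}
        {{ column_name }} = DBT_INTERNAL_SOURCE.{{ column_name }}
        {%- if not loop.last %}, {%- endif %}
    {%- endfor %}
{% endif %}
```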
Describe alternatives you've considered
Rather than making this a separate incremental strategy, we could also have a config argument passed to the existing merge strategy, maybe something like `update_matches`. The logic in `default__get_merge_sql` could then be updated to emit the `when matched` clause only when that config is enabled, as sketched below.
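A sketch of that gate, using the `update_matches` name proposed above (illustrative, not tested; defaulting to `true` preserves today's behavior):

```sql
{%- set update_matches = config.get('update_matches', default=true) -%}

{% if unique_key and update_matches %}
when matched then update set
    {% for column_name in update_columns -%}
        {{ column_name }} = DBT_INTERNAL_SOURCE.{{ column_name }}
        {%- if not loop.last %}, {%- endif %}
    {%- endfor %}
{% endif %}
```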
Who will this benefit?
Suppose that:

- your source rows are immutable: once a row is created, it never changes, and
- rows can arrive late, but only within some bounded window (say, 3 days).
(This is actually quite common!)
If you are building an incremental model, you need to reprocess a trailing lookback window on each run so that late-arriving rows are picked up, for example with a filter like this (the `recent_sessions` and `stg_sessions` names are illustrative):

```sql
with recent_sessions as (
    select *
    from {{ ref('stg_sessions') }}
    where session_start > dateadd(day, -3, current_date)
)
```

In this case, any matches that are detected when the merge statement runs can safely be assumed to be full duplicates of rows already existing in the destination table, and can be discarded. (These are just your artificially created duplicates due to the lookback period.)
However, the current merge strategy overwrites the duplicated rows, which is a waste of compute. Skipping them instead can be a significant saving when the lookback period is long relative to the interval between job runs, since a large number of duplicate rows would otherwise be re-processed on every run.
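Concretely, the merge emitted by the proposed strategy would have this shape (relation and column names are illustrative); the duplicates from the lookback window match on the key and are simply left alone:

```sql
merge into analytics.sessions as DBT_INTERNAL_DEST
using analytics.sessions__dbt_tmp as DBT_INTERNAL_SOURCE
on DBT_INTERNAL_SOURCE.session_id = DBT_INTERNAL_DEST.session_id

-- no "when matched" clause: duplicate rows are not rewritten
when not matched then insert
    (session_id, session_start, user_id)
values
    (session_id, session_start, user_id)
```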
Are you interested in contributing this feature?
Sure
Anything else?
No response