[CT-3373] [Feature] Incremental Strategy: 'insert_unmatched' #9056

dbernett-amplify · 2023-11-11T14:20:03Z

Is this your first time submitting a feature request?

I have read the expectations for open source contributors
I have searched the existing issues, and I could not find an existing issue for this feature
I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

Overview

A new incremental strategy should be defined that has the following behavior:

When the row in the source table has no match in the destination table, it is inserted
When the row in the source table has a match in the destination table, nothing happens (i.e. the row in the destination table is left unchanged).

We might call this strategy insert_unmatched because it inserts only the rows that are not matched.

Implementation

This should be as simple as creating a new strategy that is identical to default__get_merge_sql but has the following chunk of code removed:

    {% if unique_key %}
    when matched then update set
        {% for column_name in update_columns -%}
            {{ column_name }} = DBT_INTERNAL_SOURCE.{{ column_name }}
            {%- if not loop.last %}, {%- endif %}
        {%- endfor %}
    {% endif %}

This results in a merge statement that includes a when not matched clause only.

(Note that I've tested that this works for Snowflake. Not sure about other platforms.)

Describe alternatives you've considered

Rather than making this a separate incremental strategy, we could also have a config argument passed to the existing merge strategy. Maybe something like update_matches then the logic in default__get_merge_sql could be updaed to:

    {% if unique_key and update_matches %}
    when matched then update set
        {% for column_name in update_columns -%}
            {{ column_name }} = DBT_INTERNAL_SOURCE.{{ column_name }}
            {%- if not loop.last %}, {%- endif %}
        {%- endfor %}
    {% endif %}

Who will this benefit?

Suppose that:

You are certain that your source table contains no duplicates.
You have late-arriving data in your source table (i.e. rows can arrive in an order different from the one specified by their timestamp)
(This is actually quite common!)

If you are building an incremental model, you need to:

Set a look back period so that you don't miss the late-arriving data (e.g., where session_start > dateadd(day, -3, current_date))
Define a unique key. Because now you have an artificial duplicates problem that you created by re-processing the same data more than once (due to 1 above).

In this case, any matches that are detected when the merge statement is run can safely be assumed to be full duplicates of the row already existing in the destination table and can be discarded. (These are just your artificially created duplicates due to the lookback period.)

However, the current "merge" strategy overwrites the duplicated rows, which is a waste of compute. This can be a significant savings if the lookback period is fairly long relative to the time between job runs, resulting in re-processing a large number of duplicate rows.

Are you interested in contributing this feature?

Sure

Anything else?

No response

The text was updated successfully, but these errors were encountered:

dbeatty10 · 2023-11-13T18:56:28Z

Related comments from other issues:

[CT-2190] [Feature] Incremental delete and merge update #7057 (comment)
[CT-2023] [Feature] Create an append_unique incremental strategy #6848 (comment)
[CT-1555] Allow for Only INSERT Operation (without UPDATE) with unique_key in MERGE DML for Incremental Models #6320 (comment)

See also: #9060

dbeatty10 · 2023-11-14T06:40:34Z

Two other related comments:

dbeatty10 · 2024-01-10T18:59:32Z

Thanks for publishing the logic you've developed -- I bet it will come in handy for some folks 🧠

Rather than add new incremental strategies in dbt-core, we'd rather encourage the ecosystem through:

Enabling custom incremental strategies, and
Foster sharing that custom logic through the dbt Package Hub

There's probably still some room to make these easier to author for custom incremental strategies that make use of MERGE statements, and we've opened #9223 as a result. Would definitely welcome your insights and opinions in that issue 🙏

In particular, #9223 (comment) does a demo of your proposed insert_unmatched logic using a custom incremental materialization.

dbernett-amplify added enhancement New feature or request triage labels Nov 11, 2023

github-actions bot changed the title ~~[Feature] Incremental Strategy: 'insert_unmatched'~~ [CT-3373] [Feature] Incremental Strategy: 'insert_unmatched' Nov 11, 2023

dbernett-amplify mentioned this issue Nov 12, 2023

[CT-3375] [Feature] Allow Users to Specify Merge Clauses for Incremental Models #9060

Closed

3 tasks

dbeatty10 added the incremental Incremental modeling with dbt label Jan 2, 2024

dbeatty10 mentioned this issue Jan 6, 2024

[CT-3462] [Feature] Additional configurability of incremental merge strategy #9223

Open

3 tasks

dbeatty10 closed this as not planned Won't fix, can't repro, duplicate, stale Jan 10, 2024

dbeatty10 added wontfix Not a bug or out of scope for dbt-core and removed triage labels Jan 10, 2024

dbeatty10 mentioned this issue Jul 3, 2024

[Feature] New table strategy (Truncate+Insert) #10373

Closed

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[CT-3373] [Feature] Incremental Strategy: 'insert_unmatched' #9056

[CT-3373] [Feature] Incremental Strategy: 'insert_unmatched' #9056

dbernett-amplify commented Nov 11, 2023 •

edited

Loading

dbeatty10 commented Nov 13, 2023

dbeatty10 commented Nov 14, 2023

dbeatty10 commented Jan 10, 2024 •

edited

Loading

[CT-3373] [Feature] Incremental Strategy: 'insert_unmatched' #9056

[CT-3373] [Feature] Incremental Strategy: 'insert_unmatched' #9056

Comments

dbernett-amplify commented Nov 11, 2023 • edited Loading

Is this your first time submitting a feature request?

Describe the feature

Overview

Implementation

Describe alternatives you've considered

Who will this benefit?

Are you interested in contributing this feature?

Anything else?

dbeatty10 commented Nov 13, 2023

dbeatty10 commented Nov 14, 2023

dbeatty10 commented Jan 10, 2024 • edited Loading

dbernett-amplify commented Nov 11, 2023 •

edited

Loading

dbeatty10 commented Jan 10, 2024 •

edited

Loading