Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-2755] [Bug] Allow on_schema_change = fail for contracted incremental models #7975

Closed
2 tasks done
Tracked by #7372
graciegoheen opened this issue Jun 28, 2023 · 2 comments · Fixed by #8006
Closed
2 tasks done
Tracked by #7372
Assignees
Labels
bug Something isn't working
Milestone

Comments

@graciegoheen
Copy link
Contributor

graciegoheen commented Jun 28, 2023

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

When I add a contract to an incremental model with on_schema_change = fail, I get the following error message:

Invalid value for on_schema_change: fail. Models materialized as incremental with contracts enabled must set on_schema_change to 'append_new_columns'

Expected Behavior

There's not a clear reason to prevent contracted incremental models from having on_schema_change = fail - which triggers an error message when the source and target schemas diverge. This config forces you to execute a full-refresh if you add a new column or remove and old column from your incremental model. Removing an old column would also break your "contract" - so there shouldn't be an issue with doubling up on errors here.

I also don't see a strong reason to prevent on_schema_change = ignore, but at the minimum the "safest" incremental strategy (on_schema_change = fail) should be allowed for contracted models.

Steps To Reproduce

  1. Add a contact to an incremental model with on_schema_change = fail
  2. Execute a dbt build

Relevant log output

No response

Environment

- OS:
- Python:
- dbt:

Which database adapter are you using with dbt?

No response

Additional Context

No response

@graciegoheen graciegoheen added bug Something isn't working triage labels Jun 28, 2023
@github-actions github-actions bot changed the title [Bug] Allow on_schema_change = fail for contracted incremental models [CT-2755] [Bug] Allow on_schema_change = fail for contracted incremental models Jun 28, 2023
@jtcohen6 jtcohen6 added this to the v1.5.x milestone Jun 29, 2023
@jtcohen6
Copy link
Contributor

Thanks @graciegoheen! You're totally right that we should support on_schema_change: fail. Would you like to do the honors? ;)

if (
self.contract.enforced
and self.materialized == "incremental"
and self.on_schema_change != "append_new_columns"
):
raise ValidationError(
f"Invalid value for on_schema_change: {self.on_schema_change}. Models "
"materialized as incremental with contracts enabled must set "
"on_schema_change to 'append_new_columns'"
)

Why only support these two, and not all four? This is an opinionated choice, but really:

  • sync_all_columns: is exactly the same as append_new_columns, but with the ability to remove columns that get deleted, a thing you're specifically not supposed to do with contracted models (unless you're bumping the version)
  • ignore means that your contract check will succeed, assuming you've coordinated yaml + SQL updates (tmp table == yaml) — but the resulting columns in the target table will not actually reflect the columns described in your contract! You need to full-refresh to make the stated contract and resulting table line up. At the point where you must full-refresh, that's really what on_schema_change: fail is for. (Or on_schema_change: full_refresh, if/when we can implement it: feat: add full_refresh option to on_schema_change of incremental models #6412)

The biggest reason in favor of supporting on_schema_change = ignore is that it's the default, such that a user could contract an incremental model without needing to touch other configs. Why it's the default is really for historical reasons: incremental models came first (ca. 2016), on_schema_change handling much later (v0.21 / 2021). If we were to do it over again from the start, I'd think on_schema_change = fail should be the default; ignore frequently leads to surprising outcomes, because you've changed a model's SQL, it's successfully built, but the new columns are not actually in the resulting table.

@TianyiLi10
Copy link

Does python-model support on_schema_change feature ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants