Adding a new column is not a breaking contract change #7333

jtcohen6 · 2023-04-12T11:35:10Z

resolves #7332

Description

If there's a mismatch in the checksum, then we need to dig into why to understand if the change is actually breaking. I'm not sure if this is the most performant way to do it, but it yields the intended functionality.

I also renamed the exception from ModelContractError to ContractBreakingChangeError, and reworked the message that we display to users:

$ dbt ls -s state:modified --state state/
11:31:16  Running with dbt=1.5.0-b5
11:31:17  Found 2 models, 0 tests, 0 snapshots, 1 analysis, 536 macros, 0 operations, 1 seed file, 0 sources, 0 exposures, 0 metrics, 0 groups
11:31:17  Encountered an error:
Breaking Change to Contract Error in model sometable (models/sometable.sql)
  While comparing to previous project state, dbt detected a breaking change to an enforced contract.

  The contract's enforcement has been disabled.

  Columns were removed:
   - order_name

  Columns with data_type changes:
   - order_id (number -> int)

  Consider making an additive (non-breaking) change instead, if possible.
  Otherwise, create a new model version: https://docs.getdbt.com/docs/collaborate/publish/model-versions

I first did this with fancier string formatting within CompiledNode.same_contract, but I added another commit that took the approach of storing more structured information on ContractBreakingChangeError, and pretty-formatting the message there.

Checklist

I have read the contributing guide and understand what's expected of me
I have signed the CLA
I have run this code in development and it appears to resolve the stated issue
This PR includes tests, or tests are not required/relevant for this PR
I have opened an issue to add/update docs, or docs changes are not required/relevant for this PR
I have run changie new to create a changelog entry

MichelleArk · 2023-04-12T13:44:59Z

core/dbt/contracts/graph/nodes.py

-
-        if breaking_change_reasons:
-            raise (ModelContractError(reasons=" and ".join(breaking_change_reasons), node=self))
+            for key, value in sorted(old.columns.items()):


In terms of performance - we're not iterating over columns for every model, just those with a contract change. This should extend also nicely to detecting breaking changes based on constraint modifications - we could add the constraints to the existing contract checksum and dig into the specific breaking change reasons from there.

That was my instinct as well. The checksum is the fastest way to detect that there have been no changes. If there are changes, we dig in from there, and it's worth iterating over every column to construct the most thorough & helpful error message.

Yes. Something like this was always going to have to happen for constraints.

jtcohen6 · 2023-04-12T13:49:44Z

core/dbt/exceptions.py

    CODE = 10016
-    MESSAGE = "Contract Error"
+    MESSAGE = "Breaking Change to Contract"


I changed this, but I don't actually know where this MESSAGE appears. Should, this and the type below, be Contract Breaking Change Error for consistency with the exception class name?

I think it appears in a generic message that packages up compilation errors, but I'm not sure it does anywhere else. Let me look for it...

A few exceptions use it to construct the actual "message", but mostly it's not. Another thing we might want to clean up at some point. So I don't think it matters here.

Other message use a "type" property to construct a "{type} Error", but mostly DbtInternalError

MichelleArk · 2023-04-12T14:15:11Z

core/dbt/contracts/graph/nodes.py

+                    column_type_changes=column_type_changes,
+                    node=self,
+                )
+            )
        else:
            # no breaking changes
            return True


It might just be a confusing comment - but this method should return True when there are no changes (breaking or otherwise) to the contract, and False if there is a change to the contract but it is non-breaking. Are we actually handling the latter case?

It does look like the logic isn't right here. I think we need to keep track of adds and return False, so we need "elif columns_added, return False, else True.

I think that the else wouldn't actually be hit, but it does feel better to have it there.

I'm taking a look at this logic now. It's genuinely tricky. I'll add more inline comments.

This should be returning False.

In practice, it's likely that a contract change would be accompanied by a change to the model definition, such that same_body would return False — but we should still have the logic be right here.

I'll add a test that should catch this as well, by selecting just state:modified.contract.

I inverted the logic a bit, so now it's:

Was the contract being enforced previously?

no → return False or True, but never ContractBreakingChangeError

If so, do the checksums match up?

yes → return True

If the checksums don't match up, are there any changes we consider breaking?

breaking → throw a ContractBreakingChangeError with details

non-breaking, still a change → return False

Let me know if you think that makes more or less sense than before

MichelleArk · 2023-04-12T20:37:22Z

core/dbt/contracts/graph/nodes.py

-            # Breaking change: throw an error
-            # Note: we don't have contract.checksum for current node, so build
+            # Breaking change: the contract was previously enforced, and it no longer is
+            # Note: we don't have contract.checksum for current node, so build it now
            self.build_contract_checksum()


have we not already done this up here?

ah, good catch - I'd added it up there, but it actually isn't needed (?) - we only need it in the case that self.contract.enforced is False (?) - but it still seems like calling it here wouldn't actually do anything...

dbt-core/core/dbt/contracts/graph/nodes.py

Lines 571 to 575 in 6b42a71

def build_contract_checksum(self):

# We don't need to construct the checksum if the model does not

# have contract enforced, because it won't be used.

# This needs to be executed after contract config is set

if self.contract.enforced is True:

I'm pretty sure we can remove both calls to build_contract_checksum(). It's being called within NodePatchParser.parse_patch

Oh, that's a change. Initially I only had it building when contract.enforced is True, and it's been changed so that it always builds.

Actually no, it's "build_contract_checksum" that checks for contract enforced. So it is a no-op, but it probably shouldn't be, because Michelle wanted to also capture that there were additional changes to the columns. So we should remove the check for contract.enforced in build_contract_checksum and only call it in parse_patch if contract.enforced is True, and THEN it won't be a no-op.

Boy this logic makes your head hurt.

@gshank Right 😅

We want the full column spec to be captured in the checksum, so that additions would cause a checksum mismatch — but if the checksums don't match, we then need to dig in and understand whether a column was actually removed/changed, or just added.

I'm happy with the current implementation for now, and it seems to satisfy the test cases we've added (reflecting in-the-wild use cases we'd expect) — but we can open up another ticket to revisit this logic.

MichelleArk

Just one question, not necessarily blocking. Thanks so much for fixing this ✨

gshank · 2023-04-12T20:56:43Z

core/dbt/contracts/graph/nodes.py

+
+        # If the checksums match up, the contract has not changed, so same_contract: True
+        if self.contract.checksum == old.contract.checksum:
+            return True


I think this is wrong. If old.contract.enforced is True and self.contract.enforced is False it's a breaking change even when checksums are still equal, and this will always return True for that case. Also the "build_contract_checksum" only needs to run for self.contract.enforced is False, and the line above runs it for both True and False.

If old.contract.enforced is True and self.contract.enforced is False it's a breaking change even when checksums are still equal, and this will always return True for that case.

Yes! Good point! Let me add a test for this

Also the "build_contract_checksum" only needs to run for self.contract.enforced is False, and the line above runs it for both True and False.

Per thread above, I don't think we need to call build_contract_checksum at any point within same_contract

Per other thread, we want to capture that the contract has been disabled AND the columns are different.

If old.contract.enforced is True and self.contract.enforced is False it's a breaking change even when checksums are still equal, and this will always return True for that case.

After looking more closely, the current formulation of the logic does work, because self.contract.checksum is None!

ipdb> self.contract.checksum == old.contract.checksum False ipdb> self.contract Contract(enforced=False, checksum=None) ipdb> old.contract Contract(enforced=True, checksum='1698cf5a415f00ae0dee2c6e97bb51c020d46955847b2e9cec53a8e40d1afb13')

I do agree that a more explicit condition for this would be better, rather than implicitly depending on that behavior to remain the case, so I've added it in

jtcohen6 · 2023-04-12T22:19:42Z

I'm going to merge this for inclusion in v1.5.0-rc1. Even if the logic is a bit verbose, and perhaps still not 100% perfect, we have better test coverage now than before, and it feels like a step closer to the behavior that we want.

jtcohen6 added 3 commits April 12, 2023 13:21

Adding a new column is not a breaking contract change

afa4757

Add changelog entry

897ab08

More structured exception

80c2857

jtcohen6 requested review from a team as code owners April 12, 2023 11:35

jtcohen6 requested review from emmyoop and gshank April 12, 2023 11:35

cla-bot bot added the cla:yes label Apr 12, 2023

jtcohen6 requested a review from MichelleArk April 12, 2023 11:35

gshank approved these changes Apr 12, 2023

View reviewed changes

MichelleArk reviewed Apr 12, 2023

View reviewed changes

jtcohen6 commented Apr 12, 2023

View reviewed changes

jtcohen6 mentioned this pull request Apr 12, 2023

[CT-2197] [Feature] Detect breaking changes to constraints in state:modified check #7065

Closed

MichelleArk reviewed Apr 12, 2023

View reviewed changes

same_contract: False if non-breaking changes

f0a30ce

MichelleArk reviewed Apr 12, 2023

View reviewed changes

MichelleArk approved these changes Apr 12, 2023

View reviewed changes

gshank reviewed Apr 12, 2023

View reviewed changes

PR feedback: rm build_contract_checksum, more comments

e9e58ee

jtcohen6 merged commit 2ddf296 into main Apr 12, 2023

jtcohen6 deleted the jerco/fix-7332 branch April 12, 2023 22:19

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adding a new column is not a breaking contract change #7333

Adding a new column is not a breaking contract change #7333

jtcohen6 commented Apr 12, 2023 •

edited

Loading

MichelleArk Apr 12, 2023

jtcohen6 Apr 12, 2023 •

edited

Loading

gshank Apr 12, 2023

jtcohen6 Apr 12, 2023

gshank Apr 12, 2023

gshank Apr 12, 2023

gshank Apr 12, 2023

MichelleArk Apr 12, 2023

gshank Apr 12, 2023

gshank Apr 12, 2023

jtcohen6 Apr 12, 2023

jtcohen6 Apr 12, 2023 •

edited

Loading

jtcohen6 Apr 12, 2023

MichelleArk Apr 12, 2023

jtcohen6 Apr 12, 2023

jtcohen6 Apr 12, 2023

gshank Apr 12, 2023

gshank Apr 12, 2023

gshank Apr 12, 2023

jtcohen6 Apr 12, 2023 •

edited

Loading

MichelleArk left a comment

gshank Apr 12, 2023

jtcohen6 Apr 12, 2023

gshank Apr 12, 2023

jtcohen6 Apr 12, 2023 •

edited

Loading

jtcohen6 commented Apr 12, 2023

	def build_contract_checksum(self):
	# We don't need to construct the checksum if the model does not
	# have contract enforced, because it won't be used.
	# This needs to be executed after contract config is set
	if self.contract.enforced is True:

Adding a new column is not a breaking contract change #7333

Adding a new column is not a breaking contract change #7333

Conversation

jtcohen6 commented Apr 12, 2023 • edited Loading

Description

Checklist

Choose a reason for hiding this comment

jtcohen6 Apr 12, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jtcohen6 Apr 12, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jtcohen6 Apr 12, 2023 • edited Loading

Choose a reason for hiding this comment

MichelleArk left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jtcohen6 Apr 12, 2023 • edited Loading

Choose a reason for hiding this comment

jtcohen6 commented Apr 12, 2023

jtcohen6 commented Apr 12, 2023 •

edited

Loading

jtcohen6 Apr 12, 2023 •

edited

Loading

jtcohen6 Apr 12, 2023 •

edited

Loading

jtcohen6 Apr 12, 2023 •

edited

Loading

jtcohen6 Apr 12, 2023 •

edited

Loading