-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding a new column is not a breaking contract change #7333
Conversation
core/dbt/contracts/graph/nodes.py
Outdated
|
||
if breaking_change_reasons: | ||
raise (ModelContractError(reasons=" and ".join(breaking_change_reasons), node=self)) | ||
for key, value in sorted(old.columns.items()): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In terms of performance - we're not iterating over columns for every model, just those with a contract change. This should extend also nicely to detecting breaking changes based on constraint modifications - we could add the constraints to the existing contract checksum and dig into the specific breaking change reasons from there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was my instinct as well. The checksum
is the fastest way to detect that there have been no changes. If there are changes, we dig in from there, and it's worth iterating over every column to construct the most thorough & helpful error message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Something like this was always going to have to happen for constraints.
CODE = 10016 | ||
MESSAGE = "Contract Error" | ||
MESSAGE = "Breaking Change to Contract" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed this, but I don't actually know where this MESSAGE
appears. Should, this and the type
below, be Contract Breaking Change Error
for consistency with the exception class name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it appears in a generic message that packages up compilation errors, but I'm not sure it does anywhere else. Let me look for it...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few exceptions use it to construct the actual "message", but mostly it's not. Another thing we might want to clean up at some point. So I don't think it matters here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other message use a "type" property to construct a "{type} Error", but mostly DbtInternalError
core/dbt/contracts/graph/nodes.py
Outdated
column_type_changes=column_type_changes, | ||
node=self, | ||
) | ||
) | ||
else: | ||
# no breaking changes | ||
return True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might just be a confusing comment - but this method should return True when there are no changes (breaking or otherwise) to the contract, and False if there is a change to the contract but it is non-breaking. Are we actually handling the latter case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does look like the logic isn't right here. I think we need to keep track of adds and return False, so we need "elif columns_added, return False, else True.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the else wouldn't actually be hit, but it does feel better to have it there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm taking a look at this logic now. It's genuinely tricky. I'll add more inline comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be returning False
.
In practice, it's likely that a contract change would be accompanied by a change to the model definition, such that same_body
would return False
— but we should still have the logic be right here.
I'll add a test that should catch this as well, by selecting just state:modified.contract
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I inverted the logic a bit, so now it's:
- Was the contract being enforced previously?
- no → return
False
orTrue
, but neverContractBreakingChangeError
- no → return
- If so, do the checksums match up?
- yes →
return True
- yes →
- If the checksums don't match up, are there any changes we consider breaking?
- breaking → throw a
ContractBreakingChangeError
with details - non-breaking, still a change →
return False
- breaking → throw a
Let me know if you think that makes more or less sense than before
core/dbt/contracts/graph/nodes.py
Outdated
# Breaking change: throw an error | ||
# Note: we don't have contract.checksum for current node, so build | ||
# Breaking change: the contract was previously enforced, and it no longer is | ||
# Note: we don't have contract.checksum for current node, so build it now | ||
self.build_contract_checksum() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have we not already done this up here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, good catch - I'd added it up there, but it actually isn't needed (?) - we only need it in the case that self.contract.enforced is False
(?) - but it still seems like calling it here wouldn't actually do anything...
dbt-core/core/dbt/contracts/graph/nodes.py
Lines 571 to 575 in 6b42a71
def build_contract_checksum(self): | |
# We don't need to construct the checksum if the model does not | |
# have contract enforced, because it won't be used. | |
# This needs to be executed after contract config is set | |
if self.contract.enforced is True: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure we can remove both calls to build_contract_checksum()
. It's being called within NodePatchParser.parse_patch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, that's a change. Initially I only had it building when contract.enforced is True, and it's been changed so that it always builds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually no, it's "build_contract_checksum" that checks for contract enforced. So it is a no-op, but it probably shouldn't be, because Michelle wanted to also capture that there were additional changes to the columns. So we should remove the check for contract.enforced in build_contract_checksum and only call it in parse_patch if contract.enforced is True, and THEN it won't be a no-op.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Boy this logic makes your head hurt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gshank Right 😅
We want the full column spec to be captured in the checksum, so that additions would cause a checksum mismatch — but if the checksums don't match, we then need to dig in and understand whether a column was actually removed/changed, or just added.
I'm happy with the current implementation for now, and it seems to satisfy the test cases we've added (reflecting in-the-wild use cases we'd expect) — but we can open up another ticket to revisit this logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one question, not necessarily blocking. Thanks so much for fixing this ✨
|
||
# If the checksums match up, the contract has not changed, so same_contract: True | ||
if self.contract.checksum == old.contract.checksum: | ||
return True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is wrong. If old.contract.enforced is True and self.contract.enforced is False it's a breaking change even when checksums are still equal, and this will always return True for that case. Also the "build_contract_checksum" only needs to run for self.contract.enforced is False, and the line above runs it for both True and False.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If old.contract.enforced is True and self.contract.enforced is False it's a breaking change even when checksums are still equal, and this will always return True for that case.
Yes! Good point! Let me add a test for this
Also the "build_contract_checksum" only needs to run for self.contract.enforced is False, and the line above runs it for both True and False.
Per thread above, I don't think we need to call build_contract_checksum
at any point within same_contract
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per other thread, we want to capture that the contract has been disabled AND the columns are different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If old.contract.enforced is True and self.contract.enforced is False it's a breaking change even when checksums are still equal, and this will always return True for that case.
After looking more closely, the current formulation of the logic does work, because self.contract.checksum
is None
!
ipdb> self.contract.checksum == old.contract.checksum
False
ipdb> self.contract
Contract(enforced=False, checksum=None)
ipdb> old.contract
Contract(enforced=True, checksum='1698cf5a415f00ae0dee2c6e97bb51c020d46955847b2e9cec53a8e40d1afb13')
I do agree that a more explicit condition for this would be better, rather than implicitly depending on that behavior to remain the case, so I've added it in
I'm going to merge this for inclusion in v1.5.0-rc1. Even if the logic is a bit verbose, and perhaps still not 100% perfect, we have better test coverage now than before, and it feels like a step closer to the behavior that we want. |
resolves #7332
Description
If there's a mismatch in the
checksum
, then we need to dig into why to understand if the change is actually breaking. I'm not sure if this is the most performant way to do it, but it yields the intended functionality.I also renamed the exception from
ModelContractError
toContractBreakingChangeError
, and reworked the message that we display to users:I first did this with fancier string formatting within
CompiledNode.same_contract
, but I added another commit that took the approach of storing more structured information onContractBreakingChangeError
, and pretty-formatting the message there.Checklist
changie new
to create a changelog entry