Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Renaming or removing a contracted model should raise a BreakingChange warning/error #10116

Closed
2 tasks done
Tracked by #10125
jtcohen6 opened this issue May 9, 2024 · 4 comments · Fixed by #10221
Closed
2 tasks done
Tracked by #10125
Assignees
Labels
bug Something isn't working High Severity bug with significant impact that should be resolved in a reasonable timeframe model_contracts multi_project user docs [docs.getdbt.com] Needs better documentation

Comments

@jtcohen6
Copy link
Contributor

jtcohen6 commented May 9, 2024

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

When a contracted model changes or disables its contract, dbt raises a warning (UnversionedBreakingChange) or error (ContractBreakingChangeError) within the same_contract check triggered by state:modified selection/comparison.

However, the same warning/error does not occur if the user simply deletes or renames a contracted model. I believe it should — until/unless unless that contracted model has passed a defined deprecation_date, at which point deletion is allowed.

Expected Behavior

If a contracted model is present in the comparison manifest, and missing in the current manifest, raise a warning (if unversioned) or error (if versioned) accordingly. Add a new type of "breaking change" for "renamed or removed."

Naïve implementation for illustrative purposes only:

class StateSelectorMethod(SelectorMethod):
    ...
    def check_for_deleted_contracted_models(self) -> None:
        old_contracted_nodes = set(
            k for k, v in self.previous_state.manifest.nodes.items() if v.config.contract.enforced
        )
        new_contracted_nodes = set(
            k for k, v in self.manifest.nodes.items() if v.config.contract.enforced
        )
        for unique_id in old_contracted_nodes - new_contracted_nodes:
            node = self.previous_state.manifest.nodes[unique_id]
            if (
                node.deprecation_date
                and node.deprecation_date < datetime.datetime.now().astimezone()
            ):
                # Passed its deprecation_date, so deletion is allowed
                continue
            if node.version is None:
                print("WARNING! Unversioned contracted model renamed or removed")
                # This should be warn_or_error(UnversionedBreakingChange), and we probably want a new "breaking_change" reason
            else:
                print("ERROR! Versioned contracted model renamed or removed")
                # This should raise ContractBreakingChangeError
    ...
    def search(self, included_nodes: Set[UniqueId], selector: str) -> Iterator[UniqueId]:
        if self.previous_state is None or self.previous_state.manifest is None:
            raise DbtRuntimeError("Got a state selector method, but no comparison manifest")
        self.check_for_deleted_contracted_models()
        ...

Steps To Reproduce

  1. Create a model configured with contract: {enforced: true}
  2. Parse the project / run the model
  3. Move the manifest to a folder named state/
  4. Delete or rename the model
  5. dbt list -s state:modified --state
  6. No warning or error

Relevant log output

No response

Environment

- OS: macOS
- Python: 3.10.11
- dbt: v1.8 / main

Which database adapter are you using with dbt?

No response

Additional Context

No response

@jtcohen6 jtcohen6 added the High Severity bug with significant impact that should be resolved in a reasonable timeframe label May 10, 2024
@ChenyuLInx
Copy link
Contributor

@MichelleArk link an example of the current implementation for similar errors in the contracted model.

@jtcohen6
Copy link
Contributor Author

The existing call sites for UnversionedBreakingChange and ContractBreakingChangeError are in same_contents:

def same_contract(self, old, adapter_type=None) -> bool:
# If the contract wasn't previously enforced:
if old.contract.enforced is False and self.contract.enforced is False:
# No change -- same_contract: True
return True
if old.contract.enforced is False and self.contract.enforced is True:
# Now it's enforced. This is a change, but not a breaking change -- same_contract: False
return False
# Otherwise: The contract was previously enforced, and we need to check for changes.
# Happy path: The contract is still being enforced, and the checksums are identical.
if self.contract.enforced is True and self.contract.checksum == old.contract.checksum:
# No change -- same_contract: True
return True
# Otherwise: There has been a change.
# We need to determine if it is a **breaking** change.
# These are the categories of breaking changes:
contract_enforced_disabled: bool = False
columns_removed: List[str] = []
column_type_changes: List[Dict[str, str]] = []
enforced_column_constraint_removed: List[
Dict[str, str]
] = [] # column_name, constraint_type
enforced_model_constraint_removed: List[Dict[str, Any]] = [] # constraint_type, columns
materialization_changed: List[str] = []
if old.contract.enforced is True and self.contract.enforced is False:
# Breaking change: the contract was previously enforced, and it no longer is
contract_enforced_disabled = True
# TODO: this avoid the circular imports but isn't ideal
from dbt.adapters.base import ConstraintSupport
from dbt.adapters.factory import get_adapter_constraint_support
constraint_support = get_adapter_constraint_support(adapter_type)
column_constraints_exist = False
# Next, compare each column from the previous contract (old.columns)
for old_key, old_value in sorted(old.columns.items()):
# Has this column been removed?
if old_key not in self.columns.keys():
columns_removed.append(old_value.name)
# Has this column's data type changed?
elif old_value.data_type != self.columns[old_key].data_type:
column_type_changes.append(
{
"column_name": str(old_value.name),
"previous_column_type": str(old_value.data_type),
"current_column_type": str(self.columns[old_key].data_type),
}
)
# track if there are any column level constraints for the materialization check late
if old_value.constraints:
column_constraints_exist = True
# Have enforced columns level constraints changed?
# Constraints are only enforced for table and incremental materializations.
# We only really care if the old node was one of those materializations for breaking changes
if (
old_key in self.columns.keys()
and old_value.constraints != self.columns[old_key].constraints
and old.materialization_enforces_constraints
):
for old_constraint in old_value.constraints:
if (
old_constraint not in self.columns[old_key].constraints
and constraint_support[old_constraint.type] == ConstraintSupport.ENFORCED
):
enforced_column_constraint_removed.append(
{
"column_name": old_key,
"constraint_name": old_constraint.name,
"constraint_type": ConstraintType(old_constraint.type),
}
)
# Now compare the model level constraints
if old.constraints != self.constraints and old.materialization_enforces_constraints:
for old_constraint in old.constraints:
if (
old_constraint not in self.constraints
and constraint_support[old_constraint.type] == ConstraintSupport.ENFORCED
):
enforced_model_constraint_removed.append(
{
"constraint_name": old_constraint.name,
"constraint_type": ConstraintType(old_constraint.type),
"columns": old_constraint.columns,
}
)
# Check for relevant materialization changes.
if (
old.materialization_enforces_constraints
and not self.materialization_enforces_constraints
and (old.constraints or column_constraints_exist)
):
materialization_changed = [old.config.materialized, self.config.materialized]
# If a column has been added, it will be missing in the old.columns, and present in self.columns
# That's a change (caught by the different checksums), but not a breaking change
# Did we find any changes that we consider breaking? If there's an enforced contract, that's
# a warning unless the model is versioned, then it's an error.
if (
contract_enforced_disabled
or columns_removed
or column_type_changes
or enforced_model_constraint_removed
or enforced_column_constraint_removed
or materialization_changed
):
breaking_changes = []
if contract_enforced_disabled:
breaking_changes.append(
"Contract enforcement was removed: Previously, this model had an enforced contract. It is no longer configured to enforce its contract, and this is a breaking change."
)
if columns_removed:
columns_removed_str = "\n - ".join(columns_removed)
breaking_changes.append(f"Columns were removed: \n - {columns_removed_str}")
if column_type_changes:
column_type_changes_str = "\n - ".join(
[
f"{c['column_name']} ({c['previous_column_type']} -> {c['current_column_type']})"
for c in column_type_changes
]
)
breaking_changes.append(
f"Columns with data_type changes: \n - {column_type_changes_str}"
)
if enforced_column_constraint_removed:
column_constraint_changes_str = "\n - ".join(
[
f"'{c['constraint_name'] if c['constraint_name'] is not None else c['constraint_type']}' constraint on column {c['column_name']}"
for c in enforced_column_constraint_removed
]
)
breaking_changes.append(
f"Enforced column level constraints were removed: \n - {column_constraint_changes_str}"
)
if enforced_model_constraint_removed:
model_constraint_changes_str = "\n - ".join(
[
f"'{c['constraint_name'] if c['constraint_name'] is not None else c['constraint_type']}' constraint on columns {c['columns']}"
for c in enforced_model_constraint_removed
]
)
breaking_changes.append(
f"Enforced model level constraints were removed: \n - {model_constraint_changes_str}"
)
if materialization_changed:
materialization_changes_str = (
f"{materialization_changed[0]} -> {materialization_changed[1]}"
)
breaking_changes.append(
f"Materialization changed with enforced constraints: \n - {materialization_changes_str}"
)
if self.version is None:
warn_or_error(
UnversionedBreakingChange(
contract_enforced_disabled=contract_enforced_disabled,
columns_removed=columns_removed,
column_type_changes=column_type_changes,
enforced_column_constraint_removed=enforced_column_constraint_removed,
enforced_model_constraint_removed=enforced_model_constraint_removed,
breaking_changes=breaking_changes,
model_name=self.name,
model_file_path=self.original_file_path,
),
node=self,
)
else:
raise (
ContractBreakingChangeError(
breaking_changes=breaking_changes,
node=self,
)
)

However, in the case of a model that's been renamed/removed, we can't rely on same_contents, because that is called once for each node within the current project.

Acceptance criteria

  • Add new attribute (renamed_or_removed: bool) to UnversionedBreakingChange
  • Add check for old nodes missing in current manifest to state:modified (naïve example above). Depending on whether the node is versioned, and whether it's passed its deprecation_date, fire UnversionedBreakingChange event or raise ContractBreakingChangeError error. These should include "The model has been renamed or removed" within their list of breaking_changes.
  • Add tests

@MichelleArk
Copy link
Contributor

@FishtownBuildBot
Copy link
Collaborator

Opened a new issue in dbt-labs/docs.getdbt.com: dbt-labs/docs.getdbt.com#5574

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working High Severity bug with significant impact that should be resolved in a reasonable timeframe model_contracts multi_project user docs [docs.getdbt.com] Needs better documentation
Projects
None yet
4 participants