-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
dbt Constraints / model contracts #6271
Conversation
Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide. |
1 similar comment
Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide. |
core/dbt/contracts/graph/parsed.py
Outdated
@@ -17,6 +17,7 @@ | |||
|
|||
from dbt.dataclass_schema import dbtClassMixin, ExtensibleDbtClassMixin | |||
|
|||
from dbt.exceptions import CompilationException |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed no exceptions exist in this file except for the one I introduce. Is that intentional?
core/dbt/contracts/graph/unparsed.py
Outdated
@@ -93,6 +93,8 @@ class HasDocs(AdditionalPropertiesMixin, ExtensibleDbtClassMixin, Replaceable): | |||
description: str = "" | |||
meta: Dict[str, Any] = field(default_factory=dict) | |||
data_type: Optional[str] = None | |||
constraint: Optional[str] = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It only needs to be inserted here as external tables and sources are not in scope for this feature
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could there ever be multiple constraints? Should this be a list (or either a string or a list)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooooo great question. For developer UX, I'd bias towards a list even though it may be more "efficient" to allow arbitrary strings, developer mental load better reasons about multiple constraints as a list.
And yes, postgres allows multiple constraints: https://www.postgresql.org/docs/current/ddl-constraints.html#id-1.5.4.6.6
I'll make the change for the constraints
config to be an optional list
String Format:
Harder to reason about separate constraints
# models/my_custom_model_config.yml
models:
- name: my_model
columns:
- name: email
data_type: string
constraints: not null unique
List Format:
Easier to reason about separate constraints
# models/my_custom_model_config.yml
models:
- name: my_model
columns:
- name: email
data_type: string
constraints: ['not null','unique']
core/dbt/exceptions.py
Outdated
@@ -1000,6 +1000,12 @@ def warn(msg, node=None): | |||
return "" | |||
|
|||
|
|||
# TODO: remove this later as Sung created this to avoid errors with running dbt-snowflake==1.3.0 | |||
def warn_or_error(msg, node=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason this function was removed from the main
branch while 1.3.latest
still has it? https://github.com/dbt-labs/dbt-core/blob/1.3.latest/core/dbt/exceptions.py#L1090
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, this was related to work that Emily has been doing on more structured logging/exceptions. Just opened a ticket for it: #6297
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the selector error message was fixed, but is there a reason warn_or_error
doesn't exist in the main
branch but in 1.3.latest
? @jtcohen6
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sungchun12 It still exists, it just moved:
- from dbt.exceptions import warn_or_error
+ from dbt.events.functions import warn_or_error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahhhh I know what happened, I needed to update my snowflake adapter to the main branch vs. relying on 1.3.latest
Thanks Jerco! I'm unblocked!
@@ -180,6 +180,7 @@ def from_test_block( | |||
) | |||
|
|||
|
|||
# TODO: create a ConstraintsBuilder equivalent | |||
class TestBuilder(Generic[Testable]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should NOT need to build a ConstraintsBuilder
because the configs extend column_info
within the schemas.py
file.
core/dbt/parser/models.py
Outdated
@@ -177,6 +177,21 @@ def verify_python_model_code(node): | |||
raise ParsingException("No jinja in python model code is allowed", node=node) | |||
|
|||
|
|||
# TODO: Do we need to do this? | |||
# def verify_model_constraints(node): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to verify within this model because that happens parsed.py
schemas/dbt/manifest/v8.json
Outdated
@@ -585,6 +585,10 @@ | |||
"type": "boolean", | |||
"default": true | |||
}, | |||
"constraints_enabled": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do you test the manifest?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
quis verificat ipsos verificantia? quis manifestat ipsos manifesta?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
google translate:
Who verifies the verifiers? Who makes them manifest?
I've reviewed + refactored tests on the dbt-core PR, with the specific goal of having a clearer separation between:
There are now only two "adapter zone" test classes that need inheriting/modifying. I'll take a pass over the adapter PRs tomorrow or Wednesday. Ideally, I'd like to get all of this merged by Wednesday (Feb 1), but I'm going to wait until |
cast('2019-01-01' as date) as date_day | ||
""" | ||
|
||
my_model_error_sql = """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks the same as my_model_sql
-- what's the expected error? @jtcohen6
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dave-connors-3 Good catch! Ended up implementing this by other means below:
dbt-core/tests/adapter/dbt/tests/adapter/constraints/test_constraints.py
Lines 235 to 237 in dcf7062
# Make a contract-breaking change to the model | |
my_model_with_nulls = my_model_sql.replace("1 as id", "null as id") | |
write_file(my_model_with_nulls, "models", "my_model.sql") |
Let's cut my_model_error_sql
entirely
|
||
# Verify the previous table still exists | ||
old_model_exists_sql = """ | ||
select id from dbt.{0}.my_model where id = 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jtcohen6 I think this will also need the templated database!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks to @dbeatty10 @McKnight-42 @MichelleArk for doing some live PR walkthrough & review earlier!
I'll work on fixing up the merge conflicts. Let's make sure all the TODOs are captured in our tracking issues, then let's get this baby merged 😎
{%- set col_name = col['name'] -%} | ||
{%- set column_names_config_only = column_names_config_only.append(col_name) -%} | ||
{%- endfor -%} | ||
{%- set sql_file_provided_columns = get_columns_in_query(sql) -%} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making an assumption here that column ordering is consistent (and deterministic) in all our data platforms.
Follow-on TODO: We prefer option 2 in https://github.com/dbt-labs/dbt-core/pull/6271/files#r1069332715, let's make sure dbt always ensures order itself, and doesn't make yaml-ordering matter so much
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @MichelleArk for opening #6975!
{% if config.get('constraints_enabled', False) %} | ||
{{ get_assert_columns_equivalent(sql) }} | ||
{{ get_columns_spec_ddl() }} | ||
{% endif %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Follow-up (related to ongoing discussion in #6750): We might want to distinguish between the "contract" and the "constraints." That distinction would make it possible for users to define (e.g.) check
constraints on Postgres/Databricks, without providing the full column specification in a yaml file. (This is already the case in dbt-databricks
.)
What ought to be part of the contract (for detecting breaking changes, #6869)? "Data shape" for sure (column names, types, nullability). Additional constraints? Tests? We need to decide
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assigning myself to #6750 for further refinement, before we pick this up for implementation
schemas/dbt/manifest/v8.json
Outdated
@@ -173,7 +173,7 @@ | |||
} | |||
}, | |||
"additionalProperties": false, | |||
"description": "WritableManifest(metadata: dbt.contracts.graph.manifest.ManifestMetadata, nodes: Mapping[str, Union[dbt.contracts.graph.nodes.AnalysisNode, dbt.contracts.graph.nodes.SingularTestNode, dbt.contracts.graph.nodes.HookNode, dbt.contracts.graph.nodes.ModelNode, dbt.contracts.graph.nodes.RPCNode, dbt.contracts.graph.nodes.SqlNode, dbt.contracts.graph.nodes.GenericTestNode, dbt.contracts.graph.nodes.SnapshotNode, dbt.contracts.graph.nodes.SeedNode]], sources: Mapping[str, dbt.contracts.graph.nodes.SourceDefinition], macros: Mapping[str, dbt.contracts.graph.nodes.Macro], docs: Mapping[str, dbt.contracts.graph.nodes.Documentation], exposures: Mapping[str, dbt.contracts.graph.nodes.Exposure], metrics: Mapping[str, dbt.contracts.graph.nodes.Metric], selectors: Mapping[str, Any], disabled: Optional[Mapping[str, List[Union[dbt.contracts.graph.nodes.AnalysisNode, dbt.contracts.graph.nodes.SingularTestNode, dbt.contracts.graph.nodes.HookNode, dbt.contracts.graph.nodes.ModelNode, dbt.contracts.graph.nodes.RPCNode, dbt.contracts.graph.nodes.SqlNode, dbt.contracts.graph.nodes.GenericTestNode, dbt.contracts.graph.nodes.SnapshotNode, dbt.contracts.graph.nodes.SeedNode, dbt.contracts.graph.nodes.SourceDefinition]]]], parent_map: Optional[Dict[str, List[str]]], child_map: Optional[Dict[str, List[str]]])", | |||
"description": "WritableManifest(metadata: dbt.contracts.graph.manifest.ManifestMetadata, nodes: Mapping[str, Union[dbt.contracts.graph.nodes.AnalysisNode, dbt.contracts.graph.nodes.SingularTestNode, dbt.contracts.graph.nodes.HookNode, dbt.contracts.graph.nodes.ModelNode, dbt.contracts.graph.nodes.RPCNode, dbt.contracts.graph.nodes.SqlNode, dbt.contracts.graph.nodes.GenericTestNode, dbt.contracts.graph.nodes.SnapshotNode, dbt.contracts.graph.nodes.SeedNode]], sources: Mapping[str, dbt.contracts.graph.nodes.SourceDefinition], macros: Mapping[str, dbt.contracts.graph.nodes.Macro], docs: Mapping[str, dbt.contracts.graph.nodes.Documentation], exposures: Mapping[str, dbt.contracts.graph.nodes.Exposure], metrics: Mapping[str, dbt.contracts.graph.nodes.Metric], selectors: Mapping[str, Any], disabled: Union[Mapping[str, List[Union[dbt.contracts.graph.nodes.AnalysisNode, dbt.contracts.graph.nodes.SingularTestNode, dbt.contracts.graph.nodes.HookNode, dbt.contracts.graph.nodes.ModelNode, dbt.contracts.graph.nodes.RPCNode, dbt.contracts.graph.nodes.SqlNode, dbt.contracts.graph.nodes.GenericTestNode, dbt.contracts.graph.nodes.SnapshotNode, dbt.contracts.graph.nodes.SeedNode, dbt.contracts.graph.nodes.SourceDefinition]]], NoneType], parent_map: Union[Dict[str, List[str]], NoneType], child_map: Union[Dict[str, List[str]], NoneType])", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO before merging: Regenerate this given latest changes.
FOR LATER: Bump the manifest schema version to v9
(for v1.5) — until/unless we decide to change our approach to manifest/artifact versioning :) cc @MichelleArk let's talk more about this with @dbt-labs/core-language
Lots of code written & reviewed since!
Thanks team for getting this done. I just shouted out loud, "FINALLY!" With the utmost glee. |
resolves #6079
Description
This is a building block to empower multi-project experiences in the future.
Brings defining data types for columns within a SQL select statement as native to dbt. This PR focuses on injecting new config defaults within the manifest, and introduces compilation defense code to ensure configs follow a clear standard. Other adapter PRs will pair with this one to ensure configs cascade appropriately.
Note: It is the responsibility of the adapter to provide extra guardrails for configs like
check
that do not work universally across databases.Related Adapter Pull Requests
dbt Core Test Cases
True or False
values)data_type
configs are enforceddbt-postgres Adapter Test Cases
Checklist
changie new
to create a changelog entry