Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-2196] Make constraints ddl agnostic to column order #7064

Closed
Tracked by #6747
MichelleArk opened this issue Feb 27, 2023 · 5 comments
Closed
Tracked by #6747

[CT-2196] Make constraints ddl agnostic to column order #7064

MichelleArk opened this issue Feb 27, 2023 · 5 comments
Assignees
Labels
model_contracts multi_project user docs [docs.getdbt.com] Needs better documentation

Comments

@MichelleArk
Copy link
Contributor

MichelleArk commented Feb 27, 2023

The foundational work on constraints introduced pre-flight contract validation on column names, expecting that the user-provided column names are identical both in name and order the column names obtained from the model SQL. This is currently by design, so that the database associates any configured constraints to the correct columns from the query result.

However, it will be cumbersome for users to ensure that the order of configured columns in a model yaml file always matches with the order the database provides.

We can mitigate this while still ensuring any constraints are applied to the correct columns by wrapping user-provided SQL in a subquery which SELECTs columns in the order they are provided in the schema yaml (which is the same order used to generate the constraints ddl).

Implementation Notes:

  • If contract: True, wrap the user-provided SQL in a SELECT .. statement generated from model['columns'] (the user-provided columns).

Could look something like:

{% macro <adapter>__create_table_as(temporary, relation, sql) -%}
    ... 
    {% if config.get('contract', False) %}
      {{ get_assert_columns_equivalent(sql) }}
      {{ get_columns_spec_ddl() }}
      {%- set sql = get_select_subquery(sql) %}
    {% endif %}
  as (
    {{ sql }}
  );
{%- endmacro %}

{% macro get_select_subquery(sql) -%}
  {{ return(adapter.dispatch('get_select_subquery', 'dbt')(sql)) }}
{% endmacro %}

{% macro default__get_select_subquery(sql) -%}
    select 
    {% for column in model['columns'] %}
      {{ column }}{{ ", " if not loop.last }}
    {%- endfor %}
    from (
        {{ sql }}
    ) as model_subq
{%- endmacro %}
  • This will require changes to adapter-specific macros for create table statements.
  • We won't need to do this for view materializations as they don't support constraints.

Relevant discussions: #6271 (comment), https://github.com/dbt-labs/dbt-core/pull/6271/files#r1069332715

@github-actions github-actions bot changed the title Make constraints ddl agnostic to column order [CT-2196] Make constraints ddl agnostic to column order Feb 27, 2023
@jtcohen6
Copy link
Contributor

What this should look like, roughly:

create table dbt_jcohen.my_model (
    -- returned by get_columns_spec_ddl
    column_a integer not null,
    column_b text,
    ...
)
as (
-- this part is new
select
    column_a, column_b, ...
from (
    -- user's model SQL could return these columns in a diff order
    select
        ... as column_b,
        ... as column_a,
        ...
) model_subq -- subqueries must be aliased on Postgres
);

Where column_a, column_b, ... is templated out from the column names defined in the user's yaml ({{ model['columns'] }}), in the order defined in that yaml.

Note that this is quite similar to, but slightly different from, get_columns_spec_ddl, which needs to also include the columns' data type and constraints. Perhaps we should call that macro twice, with different arguments?

@MichelleArk
Copy link
Contributor Author

MichelleArk commented Mar 7, 2023

In terms of integration testing, BaseConstraintsRuntimeEnforcement should need to be updated to test the scenario where a model's columns are represented in different ordering between the schema yaml and sql definitions.

We won't be able to remove BaseConstraintsColumnsEqual.test__constraints_wrong_column_order until #6975 is closed.

@gshank
Copy link
Contributor

gshank commented Mar 9, 2023

I can't create a test with schema columns different than the SQL because an error occurs: 19:04:40 Compilation Error in model my_model (models/my_model.sql)
19:04:40 Please ensure the name, data_type, order, and number of columns in your yml file match the columns in your SQL file.

Am I supposed to remove that check?

@MichelleArk
Copy link
Contributor Author

Ah right, because the model contract (name, data_type, order (for now)) enforcement occurs before the ddl generation. So we won't even get to the ddl generation as long as model contract enforcement depends on contract order.

We do want to remove that check as part of #6975. I'd be comfortable with removing that test as part of this work assuming the PR to close #6975 is merged in coordination with the PR associated with this issue.

@jtcohen6
Copy link
Contributor

Resolved by #7161

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
model_contracts multi_project user docs [docs.getdbt.com] Needs better documentation
Projects
None yet
Development

No branches or pull requests

3 participants