Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

group by field from related model instead of field from original model #360

Conversation

kirsinger
Copy link

This is a:

  • bug fix PR with no breaking changes — please ensure the base branch is master
  • new functionality — please ensure the base branch is the latest dev/ branch
  • a breaking change — please ensure the base branch is the latest dev/ branch

Description & motivation

The cardinality_equality test is grouping by the wrong field in the related model. Rather than using the field specified as a parameter to the test, it's using the field from the original model. This means if the field doesn't exist on the related model, you'll get a "column does not exist" error when running the test.

For example consider two models: car and make.

models:
  - name: car
    colums:
      - name: id
      - name: make_id

  - name: make
    colums:
      - name: id
      - name: name

Each car has a make (eg: Ford, Hyundai etc). Each make is represented at least once in our set of cars (ie: there's at least one Ford, one Hyundai etc).

Given this, I would expect the cardinality of the make_id column in the car model to equal the cardinality of the id column in the make model, such that the following test would pass:

models:
  - name: car
    colums:
      - name: id
      - name: make_id
        tests:
          - dbt_utils.cardinality_equality:
              to: ref('make')
              field: id

Instead you'll get an error when the test runs along the lines of:

Column "make_id" does not exist in table "make"

This is because the GROUP BY statement that's grouping the related model (ie: the model specified by the to parameter in the test) is referencing the column that's being tested, rather than the column specified as the field parameter.

The fix here was simply to use the field parameter in the GROUP BY for the related model.

Checklist

  • I have verified that these changes work locally on the following warehouses (Note: it's okay if you do not have access to all warehouses, this helps us understand what has been covered)
    • BigQuery
    • Postgres
    • Redshift
    • Snowflake
  • I have updated the README.md (if applicable)
  • I have added tests & descriptions to my models (and macros if applicable)
  • I have added an entry to CHANGELOG.md

I don't think any README or test updates are necessary. Unsure if I need to add an entry to the CHANGELOG.md. Please advise if there's something required here.

@kirsinger kirsinger requested a review from clrcrl as a code owner May 6, 2021 08:25
@joellabes
Copy link
Contributor

This is fixed in #334 which has been merged but not released yet. In the meantime, utils 0.6.3 doesn’t have the regression included so you can switch back to that as a workaround!

@kirsinger
Copy link
Author

Ah nice, thanks for flagging that. I'll pin the old version until the new release.

@kirsinger kirsinger closed this May 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants