Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add precision + exclude_columns option to equality test #765

Merged
merged 19 commits into from
Mar 5, 2024

Conversation

rlh1994
Copy link
Contributor

@rlh1994 rlh1994 commented Feb 6, 2023

resolves #757, resolves #734, resolves #785, replaces #737, resolves #828

This is a:

  • documentation update
  • bug fix with no breaking changes
  • new functionality
  • a breaking change

All pull requests from community contributors should target the main branch (default).

Description & motivation

This adds an argument to the equality test that allows for floating point columns (column is_numeric or is_float) to be more easily compared and reducing false errors raised. It is fully backwards compatible (default arguments retain existing behaviour and code), and has been tested on and supports BQ/Postgres/Databricks/Snowflake/Redshift.

It also includes a few new tests both positive and negative assertions.

It also includes the work done in #737 to save rebasing if one of these was merged independently.

Finally, it adds a fix for #785 that ensures when no compare/ignore columns list is provided that both tables have the same columns.

Checklist

  • This code is associated with an Issue which has been triaged and accepted for development.
  • I have verified that these changes work locally on the following warehouses (Note: it's okay if you do not have access to all warehouses, this helps us understand what has been covered)
    • BigQuery
    • Postgres
    • Redshift
    • Snowflake
    • Databricks
  • I followed guidelines to ensure that my changes will work on "non-core" adapters by:
    • dispatching any new macro(s) so non-core adapters can also use them (e.g. the star() source)
    • using the limit_zero() macro in place of the literal string: limit 0
    • using dbt.type_* macros instead of explicit datatypes (e.g. dbt.type_timestamp() instead of TIMESTAMP
  • I have updated the README.md (if applicable)
  • I have added tests & descriptions to my models (and macros if applicable)
  • I have added an entry to CHANGELOG.md

@rlh1994 rlh1994 force-pushed the add-precision-to-equality-test branch 2 times, most recently from 90327e1 to 5ff1911 Compare February 7, 2023 12:01
@rlh1994 rlh1994 changed the title add precision option to equality test add precision + exclude_columns option to equality test Feb 7, 2023
@rlh1994 rlh1994 force-pushed the add-precision-to-equality-test branch from 5ff1911 to 31405d5 Compare February 7, 2023 13:00
@adammarples
Copy link

adammarples commented Apr 5, 2023

@rlh1994 this PR would be very useful to me. Any idea why the tests are failing? It looks like errors in seeding on redshift but it's not clear to me why.

@rlh1994
Copy link
Contributor Author

rlh1994 commented Apr 5, 2023

@adammarples I don't think it's related to this change as it seems to be failing across multiple PRs, I was just waiting for @joellabes to review

Copy link
Contributor

@joellabes joellabes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You and @brunocostalopes have done some great stuff here @rlh1994!

I particularly love the robust set of tests.

I will want to come back and do a deeper read of the specific jinja in the equality.sql file itself, but it makes sense to do that in one hit alongside any changes necessary with #785, as I imagine that'll lead to some changes too.

The good news is that the tests are now up and running again on Redshift, so we should be able to move much faster and more confidence.

@rlh1994
Copy link
Contributor Author

rlh1994 commented May 2, 2023

@joellabes I've added it in, but a few notes on this final change:

  1. I've made it a separate block, which means that currently it hits the warehouse twice for the model to get the columns - I tried to do this as part of the normal get_columns... part but it seems like set() alters the object it is acting on (at least they way I was using it for the generator that's returned) which cause a load of new failures when it shouldn't have. I'm not aware if there is the equivalent of a deep_copy in dbt/jinja so I've done it this way just to be sure.
  2. It feels like this should actually raise a compiler error rather than fail the test itself, but I wasn't sure how to add tests for such a case, let me know your preference on this and I can swap it out to raise an error instead.
  3. I think this would now qualify as a breaking change, as previously passing tests would now fail. A workaround would be to add a new argument to the macro to enable/disable this check and set the default (until a breaking change version is released) to be false.
  4. I had to make a change to one of the other tests as this highlighted that the tables had different columns (which means it works!)

@nachimehta
Copy link

also looking forward to this, at the moment the equality silently passes if your comparison model has more columns than the tested model, which seems like unexpected behavior. I believe your fix rectifies this issue.

@joellabes
Copy link
Contributor

joellabes commented May 16, 2023

  • I'm not aware if there is the equivalent of a deep_copy in dbt/jinja so I've done it this way just to be sure.

Me either, I think this is OK

  • It feels like this should actually raise a compiler error rather than fail the test itself, but I wasn't sure how to add tests for such a case, let me know your preference on this and I can swap it out to raise an error instead.

Yeah it should raise a compiler error. I think that means we can't have a test running for it 😬 but better to do the right thing on the end user's side.

  • I think this would now qualify as a breaking change, as previously passing tests would now fail. A workaround would be to add a new argument to the macro to enable/disable this check and set the default (until a breaking change version is released) to be false.

They’ll fail because the data is invalid though, right? So I am OK with that - it's a bug that we're fixing as opposed to deviating from documented behaviour

4. I had to make a change to one of the other tests as this highlighted that the tables had different columns (which means it works!)

Ha! Great news 🎉

Will go over this properly soon

@rlh1994 rlh1994 force-pushed the add-precision-to-equality-test branch 5 times, most recently from 778c45e to cda354a Compare May 22, 2023 08:29
@rlh1994
Copy link
Contributor Author

rlh1994 commented Jul 4, 2023

Hey @joellabes just wondering if you know when you'll have a chance to look at this please?

@rlh1994 rlh1994 mentioned this pull request Sep 25, 2023
17 tasks
@rlh1994 rlh1994 force-pushed the add-precision-to-equality-test branch from 983d292 to 4ef3c06 Compare September 25, 2023 15:02
@rlh1994
Copy link
Contributor Author

rlh1994 commented Oct 24, 2023

Redshift failing due to a deadlock, not a failing test

@brunocostalopes
Copy link
Contributor

@rlh1994 I noticed you've updated the equality test and renamed the arguments from ignore to exclude, just a note that when discussing with @joellabes he was of the opinion that we should name them "ignore" rather than "exclude". See here: #734 (comment)

@rlh1994
Copy link
Contributor Author

rlh1994 commented Oct 25, 2023

@rlh1994 I noticed you've updated the equality test and renamed the arguments from ignore to exclude, just a note that when discussing with @joellabes he was of the opinion that we should name them "ignore" rather than "exclude". See here: #734 (comment)

Ah okay, I went with exclude based on this comment here: #829 (comment)

To be honest I have no strong feelings either way and can easily change it once this gets reviewed 😅

@brunocostalopes
Copy link
Contributor

Oh, I missed that PR. So we now have the same feature implemented in #737 (my original PR) #765 (which includes those changes) and #829 (new PR). I wonder if any of these will ever get merged :(

@seub
Copy link

seub commented Oct 25, 2023

@brunocostalopes Sorry I had missed your PR as well! It appears that there's a real need for these ignore/exclude columns! ;)

It would indeed help avoid a stack of redundant PRs if some of them finally got reviewed/approved @joellabes

Copy link
Contributor

@gwenwindflower gwenwindflower left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Left one comment for consideration but otherwise this seems good to me and everything is passing now in CI. Thanks again for your patience on this!

macros/generic_tests/equality.sql Outdated Show resolved Hide resolved
@gwenwindflower gwenwindflower added this pull request to the merge queue Mar 5, 2024
Merged via the queue into dbt-labs:main with commit 23da1f4 Mar 5, 2024
4 checks passed
@gwenwindflower
Copy link
Contributor

Love where you landed with the new comments @rlh1994 ! Thanks again for the patience, this is a fantastic addition to the package, really appreciate the work on this one. 💜

@jomccr
Copy link

jomccr commented Mar 7, 2024

This has already helped me a ton, thanks for this contribution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
8 participants