Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Equality exclude columns #829

Closed
wants to merge 6 commits into from
Closed

Conversation

seub
Copy link

@seub seub commented Aug 30, 2023

resolves #828

This is a:

  • documentation update
  • bug fix with no breaking changes
  • new functionality
  • a breaking change

All pull requests from community contributors should target the main branch (default).

Description & motivation

This PR adds an optional exclude_columns to the equality test (similar to compare_columns).

Checklist

  • This code is associated with an Issue which has been triaged and accepted for development.
  • I have verified that these changes work locally on the following warehouses (Note: it's okay if you do not have access to all warehouses, this helps us understand what has been covered)
    • BigQuery
    • Postgres
    • Redshift
    • Snowflake
  • I followed guidelines to ensure that my changes will work on "non-core" adapters by:
    • dispatching any new macro(s) so non-core adapters can also use them (e.g. the star() source)
    • using the limit_zero() macro in place of the literal string: limit 0
    • using dbt.type_* macros instead of explicit datatypes (e.g. dbt.type_timestamp() instead of TIMESTAMP
  • I have updated the README.md (if applicable)
  • I have added tests & descriptions to my models (and macros if applicable)
  • I have added an entry to CHANGELOG.md

@seub
Copy link
Author

seub commented Aug 30, 2023

I created this PR before the associated issue (#828) was triaged and accepted for development, my apologies! I had forgotten that all PRs must be associated with an issue, oops.

For potential reviewers (if the PR makes it that far), I do have one important technical question: I replaced

{%- if not compare_columns -%}
    {%- do dbt_utils._is_ephemeral(model, 'test_equality') -%}
    {%- set compare_columns = adapter.get_columns_in_relation(model) | map(attribute='quoted') -%}
{%- endif -%}

by

{%- if not compare_columns -%}
    {%- do dbt_utils._is_ephemeral(model, 'test_equality') -%}
    {%- set compare_columns = adapter.get_columns_in_relation(model) | map(attribute='name') | list -%}
{%- endif -%}

I don't know why map(attribute='quoted') was there. I figured it would be fine to get rid of it because when compare_columns is provided, the columns aren't quoted. The new version seems to work fine (all integration tests passed) but it would be good if someone who knows the intention behind the "quoted" in the current version could confirm that my change won't break something.

Edit: Looks like integration tests failed in Snowflake, maybe it's related. Some help would be welcome.

@seub
Copy link
Author

seub commented Aug 30, 2023

Update: I figured out what the deal was with Snowflake and quoted columns. I fixed it with a new version that is a bit uglier but seems to work. However it's now failing for BigQuery apparently. Some help would be great 🙏

@seub
Copy link
Author

seub commented Aug 31, 2023

Update: I think I finally figured out how to handle quoting in a satisfactory way (tricky! But I learned something). Note: I tested locally that everything works as expected in Snowflake when some columns are quoted (yuk).

I think the PR is good to go! Thanks for reviewing 🙏

Copy link
Contributor

@joellabes joellabes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At a glance this looks good! One note is that only one of compare_columns and exclude_columns should be accepted (see union_relations for an example)

I'll try and find someone for a more robust review and getting this merged 🎉

@seub
Copy link
Author

seub commented Sep 20, 2023

Thank you @joellabes !

About forbidding to have both compare_columns and exclude_columns: I considered that; I also saw an example of that in the macro get_merge_update_columns in dbt-core here.

We could easily forbid it explicitly, but imho there's no need. The final set of columns that will be compared is the subset of compare_columns that are not in exclude_columns, that seems pretty natural.

@rlh1994
Copy link
Contributor

rlh1994 commented Sep 25, 2023

I think this is a duplicate of #765 (although that uses ignore_columns, no particular reason)?

@seub
Copy link
Author

seub commented Oct 24, 2023

@rlh1994 Sorry for the slow response!

Indeed it is duplicate, my bad for missing it. Do you know what's the hold up with your PR? Looks like it's been open for more than 6 months.

If I compare what I wrote to your PR, I only see 2 tiny differences:

  • Terminology: I went for exclude_columns instead of your ignore_columns. Either is perfectly reasonable, I went for exclude to mimic the merge_exclude_columns in dbt-core.
  • My PR allows to have both compare_columns and exclude_columns. I don't think it hurts to allow both, but it also makes sense to not allow it.

I tried to be very careful about quoting (depending on whether compare_columns is provided) and I think my PR works. I was pleased to see that your logic does exactly the same as mine. I use get_quoted_csv in one case and you don't but the result is the same.

I'm looking forward to your PR being merged and I can close this one. Unless you want to split yours to only add precision? I don't think it's worth it, unless something about that is holding up your PR from being approved?

@rlh1994
Copy link
Contributor

rlh1994 commented Oct 24, 2023

@seub I think exclude_columns makes more sense, I'll swap mine over to that. Mine also additionally checks if one of the tables is missing one of the columns (which was previously missed in one direction, I can't remember which).

I don't know of any reasons the PR isn't progressing tbh, I think it's ready to go but just awaiting a review

@dbeatty10 dbeatty10 added the enhancement New feature or request label Apr 18, 2024
@dbeatty10
Copy link
Contributor

Thank you @seub and @rlh1994 !

Closing this in favor of #765

@dbeatty10 dbeatty10 closed this Apr 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add optional exclude_columns to the equality test
4 participants