Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to ignore columns in equality test #737

Conversation

brunocostalopes
Copy link
Contributor

@brunocostalopes brunocostalopes commented Dec 8, 2022

resolves #734

This is a:

  • documentation update
  • bug fix with no breaking changes
  • new functionality
  • a breaking change

All pull requests from community contributors should target the main branch (default).

Description & motivation

This is to include an option to ignore columns in the equality test. Currently the only way to do this is to use the compare_columns option and include all columns we want to compare, leaving out any that we want to exclude. In certain cases this might result in large yml files, difficult to maintain. The ignore_columns option fixes that.

Checklist

  • This code is associated with an Issue which has been triaged and accepted for development.
  • I have verified that these changes work locally on the following warehouses (Note: it's okay if you do not have access to all warehouses, this helps us understand what has been covered)
    • BigQuery
    • Postgres
    • Redshift
    • Snowflake
  • I followed guidelines to ensure that my changes will work on "non-core" adapters by:
    • dispatching any new macro(s) so non-core adapters can also use them (e.g. the star() source)
    • using the limit_zero() macro in place of the literal string: limit 0
    • using dbt.type_* macros instead of explicit datatypes (e.g. dbt.type_timestamp() instead of TIMESTAMP
  • I have updated the README.md (if applicable)
  • I have added tests & descriptions to my models (and macros if applicable)
  • I have added an entry to CHANGELOG.md

@brunocostalopes brunocostalopes marked this pull request as ready for review December 8, 2022 00:40
@brunocostalopes brunocostalopes marked this pull request as draft December 8, 2022 10:08
@brunocostalopes brunocostalopes marked this pull request as ready for review December 8, 2022 12:36
@joellabes
Copy link
Contributor

Thanks @brunocostalopes - seems like something odd is happening with the CI, I'll have to look into that! Code itself looks good though 🎉

@moreaupascal56
Copy link

Hello! Nice, would love this feature! Just a suggestion:

don't you think you could remove the elements from compare_columns instead of creating a second list ?
the loop would be:

for column in ignore_columns:
compare_columns.remove(column)

So it would use only existing objects and loop only on ignore_columns elements (which schould be fewer than compare_columns ones) .
I do not really use Jinja a lot but the remove syntax is a Python one so it is maybe possible.

@brunocostalopes
Copy link
Contributor Author

@moreaupascal56 thanks for the tip. I think you are right. If we remove instead of append it makes the code neater and I've pushed those changes.

I've also tried your other suggestion of looping through the elements in the ignore list first but that would make the code a bit more complex when removing from the original compare list so I've decided against it as I don't think the potential performance gains would be that relevant here.

@joellabes seems like the checks are passing again, thank you for looking into it. I think the PR is now ready for your review.

@pauldalewilliams
Copy link

Excited to see this come through - have been running into a need for this during legacy code conversion.

@pauldalewilliams
Copy link

@joellabes @deanna-minnick Can this PR get reviewed and approved? I'd love to have this feature available.

@brunocostalopes
Copy link
Contributor Author

brunocostalopes commented Jan 24, 2023

I've just realised that the latest change I've pushed has a bug which I can't explain.

This is the code with the issue:

    {%- for column in compare_columns -%}
        {%- if column | lower in ignore_columns -%}
            {% do compare_columns.remove(column) %}
        {%- endif %}
    {%- endfor %}

whenever a match is found and a column is removed from compare_columns, the next column in the list is ignored and it is not tested against the ignore_columns list.

I can't understand why this is happening. If anyone has a clue please let me know, I would be happy to fix. In the meantime, I'm going to revert back to the previous version which, even thought it is not the tidiest implementation, it seems to work well.

This reverts commit 4112453.
@pauldalewilliams
Copy link

I've just realised that the latest change I've pushed has a bug which I can't explain.

This is the code with the issue:

    {%- for column in compare_columns -%}
        {%- if column | lower in ignore_columns -%}
            {% do compare_columns.remove(column) %}
        {%- endif %}
    {%- endfor %}

whenever a match is found and a column is removed from compare_columns, the next column in the list is ignored and it is not tested against the ignore_columns list.

I can't understand why this is happening. If anyone has a clue please let me know, I would be happy to fix. In the meantime, I'm going to revert back to the previous version which, even thought it is not the tidiest implementation, it seems to work well.

I believe @moreaupascal56's suggestion was that you wouldn't loop over compare_columns anyway. Why not try it like this? Should avoid the bug you're seeing and be a faster loop anyway.

    {%- for column in ignore_columns -%}
        {%- if column | lower in compare_columns -%}
            {% do compare_columns.remove(column) %}
        {%- endif %}
    {%- endfor %}

@brunocostalopes
Copy link
Contributor Author

Thanks for the suggestion @pauldalewilliams. I've tried your suggestion but I don't think it works without adding more complexity to handle the different upper/lower case scenarios. I think I prefer the current solution.

@joellabes is there still interest in adding this feature? If so, anything else I can do to help merging? Happy to take any additional suggestions on board and make the necessary changes.

@github-actions
Copy link

This PR has been marked as Stale because it has been open with no activity as of late. If you would like the PR to remain open, please comment on the PR or else it will be closed in 7 days.

@github-actions github-actions bot added the Stale label Oct 30, 2023
Copy link

github-actions bot commented Nov 6, 2023

Although we are closing this PR as stale, it can still be reopened to continue development. Just add a comment to notify the maintainers.

@github-actions github-actions bot closed this Nov 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add option to ignore columns in equality test
5 participants