Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

equality test passes when first table has a subset of second tables columns #785

Closed
1 of 5 tasks
rlh1994 opened this issue Apr 25, 2023 · 3 comments · Fixed by #765
Closed
1 of 5 tasks

equality test passes when first table has a subset of second tables columns #785

rlh1994 opened this issue Apr 25, 2023 · 3 comments · Fixed by #765
Labels
bug Something isn't working

Comments

@rlh1994
Copy link
Contributor

rlh1994 commented Apr 25, 2023

Describe the bug

When the equality test is used to compare two models and compare_columns is not provided the macro currently gets the list of columns from the first model only. If there are additional or different columns in this model when compared with compare_model the sql will fail to run and return an error, however when this first model contains a subset of the columns from the compare column no error occurs.

Steps to reproduce

  1. Create a project, add dbt-utils as as dependency, then create two seeds (for simplicity, the same applies to models), example:
    # data1.csv
    col1,col2
    a,b
    c,d
    
    # data2.csv
    col1,col2,col3
    a,b,y
    c,d,z
    
  2. Add a test on the first seed with compare_model set to the second seed, and the inverse
    version: 2
    
    seeds:
    - name: data1
      tests:
        - dbt_utils.equality:
            compare_model: ref('data2')
    
    - name: data2
      tests:
        - dbt_utils.equality:
            compare_model: ref('data1')
    
    
  3. run dbt deps, dbt seed, dbt test. The first test should pass, but the second will fail.

Expected results

Both tests would fail in the same way, or a custom alert would warn me the columns are different.

Actual results

Test 1 passes, test 2 fails.

Screenshots and log output

11:17:41  Running with dbt=1.4.5
11:17:42  Found 2 models, 2 tests, 0 snapshots, 0 analyses, 403 macros, 0 operations, 2 seed files, 0 sources, 0 exposures, 0 metrics
11:17:42  
11:17:42  Concurrency: 1 threads (target='postgres')
11:17:42  
11:17:42  1 of 2 START test dbt_utils_equality_data1_ref_data2_ .......................... [RUN]
11:17:42  1 of 2 PASS dbt_utils_equality_data1_ref_data2_ ................................ [PASS in 0.06s]
11:17:42  2 of 2 START test dbt_utils_equality_data2_ref_data1_ .......................... [RUN]
11:17:42  2 of 2 ERROR dbt_utils_equality_data2_ref_data1_ ............................... [ERROR in 0.04s]
11:17:42  
11:17:42  Finished running 2 tests in 0 hours 0 minutes and 0.23 seconds (0.23s).
11:17:42  
11:17:42  Completed with 1 error and 0 warnings:
11:17:42  
11:17:42  Database Error in test dbt_utils_equality_data2_ref_data1_ (models/test.yml)
11:17:42    column "col3" does not exist
11:17:42    LINE 48:     select "col1", "col2", "col3" from b
11:17:42                                        ^
11:17:42    HINT:  There is a column named "col3" in table "*SELECT* 1", but it cannot be referenced from this part of the query.
11:17:42    compiled Code at target/run/dbt_demo/models/test.yml/dbt_utils_equality_data2_ref_data1_.sql
11:17:42  
11:17:42  Done. PASS=1 WARN=0 ERROR=1 SKIP=0 TOTAL=2

System information

Which database are you using dbt with?

  • postgres
  • redshift
  • bigquery
  • snowflake
  • other (specify: ____________)

The output of dbt --version:

Core:
  - installed: 1.4.5
  - latest:    1.4.6 - Update available!

  Your version of dbt-core is out of date!
  You can find instructions for upgrading here:
  https://docs.getdbt.com/docs/installation

Plugins:
  - databricks: 1.4.2 - Update available!
  - bigquery:   1.4.3 - Up to date!
  - snowflake:  1.4.2 - Up to date!
  - redshift:   1.4.0 - Up to date!
  - postgres:   1.4.5 - Update available!
  - spark:      1.4.1 - Up to date!

Additional context

{%- if not compare_columns -%}
{%- do dbt_utils._is_ephemeral(model, 'test_equality') -%}
{%- set compare_columns = adapter.get_columns_in_relation(model) | map(attribute='quoted') -%}
{%- endif -%}

Are you interested in contributing the fix?

Sure, I think it should just need to pull the columns from both models and compare them, I could add it as part of #765 to save touching it twice if that's easier.

@joellabes
Copy link
Contributor

Yep I think doing this as part of 765 makes sense! Thanks @rlh1994

@joellabes joellabes removed the triage label May 2, 2023
rlh1994 added a commit to rlh1994/dbt-utils that referenced this issue May 2, 2023
rlh1994 added a commit to rlh1994/dbt-utils that referenced this issue May 22, 2023
rlh1994 added a commit to rlh1994/dbt-utils that referenced this issue May 22, 2023
rlh1994 added a commit to rlh1994/dbt-utils that referenced this issue Sep 25, 2023
@github-actions
Copy link

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@github-actions github-actions bot added the Stale label Oct 30, 2023
@rlh1994
Copy link
Contributor Author

rlh1994 commented Oct 30, 2023

I Have A Permit 7648

@github-actions github-actions bot removed the Stale label Oct 31, 2023
github-merge-queue bot pushed a commit that referenced this issue Mar 5, 2024
* add exclude columns to equality test

* add precision option to equality test

* CI fix?

* CI fix 2.0

* Update CHANGELOG.md

* Check for subset of columns (Close #785)

* cast type

* cast type across warehouses

* swap to copiler error, account for ignore columns

* Update CL

* allow for different cased names

* fix CL

* linting

* Rename to exclude_columns

* Fix typo

* Add package-lock.yaml to .gitignore

* Update comments

---------

Co-authored-by: bruno <[email protected]>
Co-authored-by: Joel Labes <[email protected]>
Co-authored-by: gwen windflower <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants