
Show unexpected values when accepted_values test fails #3265

Closed
brylie opened this issue Apr 15, 2021 · 5 comments

Labels: dbt tests (Issues related to built-in dbt testing functionality), enhancement (New feature or request), stale (Issues that have gone stale)

Comments

brylie commented Apr 15, 2021

When an accepted_values test fails, print the unexpected values.

I am currently writing a test against a column with many possible values, where we want to know if any new values appear in the column that might affect our analytics.

When the accepted_values test fails, it shows a message like:

Got N results, expected 0.

However, it doesn't display any information about what those unexpected results are.

Showing the unexpected results would make it easier to investigate and modify the test.

I can help contribute to this feature.

brylie added the enhancement and triage labels Apr 15, 2021
jtcohen6 added the dbt tests label and removed the triage label Apr 15, 2021
jtcohen6 (Contributor) commented Apr 15, 2021

@brylie What a cool idea! I think that the requisite pieces may just be falling into place, so that we could make this happen.

Back in #2593, I made a detailed proposal for how tests could store their (failing) results in database tables. To date, each schema test query has returned a single numeric value—count(*) or whatever else—so it would have been impossible to save and present an actual table of failing values. In order to make that happen, we've been working on creating a standard test "materialization" (#3181), and implementing it for schema generic tests as well (#3192). Those generic tests will now return a set of rows—whether that's simply rows from the table where a column is null, or rows from an aggregate query that rolls up counts by unacceptable values.

All of which is to say, if we're doing all that and can save those results in a database table, in addition to grabbing the total failure count/value, we could also print a few of those results to stdout using good old results.print_table():

Failure in test accepted_values__orders__order_status (models/resources.yml)
  Found 7 unacceptable values of orders.order_status, erroring because >0.
  
    | VALUE_FIELD |      COUNT |
    | ----------- | ---------- |
    | blue        | 17,452,443 |
    | red         |  6,105,277 |
    | green       |  1,111,547 |
  
    For full results:
    --------------------------------------------------------------------------------
    select * from dev_jcohen__dbt_test_results.accepted_values__orders__order_status
    --------------------------------------------------------------------------------

  compiled SQL at target/compiled/models/resources.yml/schema_test/accepted_values__orders__order_status.sql

I think we'd put in limitations like:

  • Print top 5 rows, and up to 5 columns
  • An optional property of generic test blocks that defines order_by (in the case above, order by 2 desc; see the sketch after this list)
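
As a rough illustration of those two limitations taken together (this is only a sketch reusing the table and column names from the example output above, not the actual test SQL; the accepted-values list here is made up), the query behind the printed table might end up shaped like:

select
    order_status as value_field,
    count(*) as count
from orders
where order_status not in ('placed', 'shipped', 'completed', 'returned')  -- hypothetical accepted list
group by 1
order by 2 desc  -- supplied via the optional order_by property
limit 5          -- print top 5 rows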

Let me know what you think about the above, and let's see if we can sneak in this (or something like it) alongside the other test functionality coming in v0.20.0 :)

brylie (Author) commented Apr 16, 2021

Yeah, that seems like a good idea.

How does the error message "Got N results, expected 0." get the row count? Would it be possible to run a select distinct(column) from result when the error message is generated?
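
For illustration, assuming the failing rows do get stored in a table as proposed above, a distinct-values query along those lines might look like the following sketch (the schema and table names are the hypothetical ones from the earlier example):

select distinct value_field
from dev_jcohen__dbt_test_results.accepted_values__orders__order_status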

jtcohen6 (Contributor) commented

I think this imagines us rewriting the default accepted_values test to be:

with all_values as (

    select
        {{ column_name }} as value_field,
        count(*) as count

    from {{ model }}
    group by 1

),

validation_errors as (

    select
        value_field,
        count

    from all_values
    where value_field not in (
        {% for value in values -%}
            {% if quote_values -%}
            '{{ value }}'
            {%- else -%}
            {{ value }}
            {%- endif -%}
            {%- if not loop.last -%},{%- endif %}
        {%- endfor %}
    )
)

select * from validation_errors

jtcohen6 mentioned this issue May 8, 2021
jtcohen6 (Contributor) commented

I was able to get this working (sorta) with some janky code that leverages the existing functionality around dbt seed --show. That functionality is quite old and not well maintained, and the steps I took below are not ones I'd recommend following if we tried to do this for real. But, fun to play around with!

Steps:

  • core/dbt/main.py: Add --show flag to test subparser
  • core/dbt/flags.py: Add SHOW, set from args
  • core/dbt/include/global_project/macros/materializations/test.sql: Add logic to the if should_store_failures() conditional branch that, if flags.SHOW, runs an extra statement named results that wraps {{ main_sql }} and fetches results into the context, as sketched after this list
  • core/dbt/task/test.py: In the execute_test method, load the bonus results statement from the context, and pass its agate table form into the agate_table property of the RunResult returned by the task's execute method
  • core/dbt/task/printer.py: Include logic in print_run_result_error that checks for result.agate_table and, if available, prints it using the same logic as in the seed task
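
A very rough sketch of that test.sql change (not the actual diff; flags.SHOW is the hypothetical flag described above, and the rest of the materialization is elided):

{% if should_store_failures() %}

    {# existing store_failures logic stays as-is #}

    {% if flags.SHOW %}
        {# hypothetical: fetch the failing rows into the context so the test task can print them #}
        {% call statement('results', fetch_result=True) %}
            {{ main_sql }}
        {% endcall %}
    {% endif %}

{% endif %}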

All of that for:

$ dbt test --show --store-failures
Running with dbt=0.21.0-rc1
Found 1 model, 1 test, 1 snapshot, 0 analyses, 350 macros, 0 operations, 1 seed file, 0 sources, 0 exposures

20:06:50 | Concurrency: 5 threads (target='dev')
20:06:50 |
20:06:50 | 1 of 1 START test unique_country_codes_two_letter_iso_code........... [RUN]
20:06:50 | 1 of 1 FAIL 247 unique_country_codes_two_letter_iso_code............. [FAIL 247 in 0.06s]
20:06:50 |
20:06:50 | Finished running 1 test in 0.16s.

Completed with 1 error and 0 warnings:

Failure in test unique_country_codes_two_letter_iso_code (data/whatever.yml)
  Got 247 results, configured to fail if != 0

  compiled SQL at target/compiled/my_new_project/data/whatever.yml/unique_country_codes_two_letter_iso_code.sql

See test failures:
---------------------------------------------------------------------------------------------
select * from "jerco"."dbt_jcohen_dbt_test__audit"."unique_country_codes_two_letter_iso_code"
---------------------------------------------------------------------------------------------
Random sample:
--------------
| unique_field | n_records |
| ------------ | --------- |
| PA           |         2 |
| HM           |         2 |
| NC           |         2 |
| PE           |         2 |
| NR           |         2 |
| KY           |         2 |
| RO           |         2 |
| FO           |         2 |
| UZ           |         2 |
| GA           |         2 |
| ...          |       ... |


Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1

github-actions bot commented

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

github-actions bot added the stale label Mar 30, 2022
github-actions bot closed this as completed Apr 7, 2022