[CT-3417] [Feature] more usable compiled code output from uniqueness tests #9139

Mikael-Thorup · 2023-11-23T15:49:12Z

Is this your first time submitting a feature request?

I have read the expectations for open source contributors
I have searched the existing issues, and I could not find an existing issue for this feature
I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

whenever I have a failing uniqueness test, I end up spending too long rewriting the compiled output from dbt test, e.g.

select
    id as unique_field,
    count(*) as n_records

from some_model
where id is not null
group by id
having count(*) > 1

into

with
    src as (select * from some_model),
    dupes as (
        select 
            id as unique_field, 
            count(*) as n_records

        from src
        where id is not null
        group by id
        having count(*) > 1
    )
select *
from src
where id in (select unique_field from dupes)

in order to find out e.g. which of the upstream models caused the duplicates. I'd propose the compiled code from dbt test uniqueness tests follows the following format instead:

with
    src as (select * from some_model),
    dupes as (
        select 
            id as unique_field, 
            count(*) as n_records

        from src
        where id is not null
        group by id
        having count(*) > 1
    )
select * from dupes
--select * from src where id in (select unique_field from dupes)

this would make it easier rewrite the output: comment out select * from dupes and uncomment the line below.

Describe alternatives you've considered

my current alternative is to have the abovementioned query saved, but then I need to rewrite both the unique column, and the table name every time.

Who will this benefit?

debugging models with failing uniqueness tests.

Are you interested in contributing this feature?

I'd love to but I'd need some help as I mostly know sql and jinja

Anything else?

wondering if this would also be useful for not_null tests and potentially other tests, but it's a lot easier rewriting

select some_column
from some_table
where some_column is null

into

select *
from some_table
where some_column is null

so this wouldn't benefit as much from the approach I'm suggesting.

The text was updated successfully, but these errors were encountered:

dbeatty10 · 2023-11-28T02:26:19Z

Thanks for proposing this idea @Mikael-Thorup 💡

Downsides

This would change the schema and number of rows for anyone that has store_failures_as or store_failures configured.

But the primary reason for the way it is can be explained by #4777 -- including all columns is more expensive in some databases.

Upside

Your proposal would allow seeing the full context of the failed test by showing the whole row(s).

You can see how the not_null test tries to balance both considerations here by including all columns when using store_failures, but only the affected column otherwise.

Workaround

It is possible for you to override this behavior on your end.

For example, you can implement your desired output by creating a file in your macros directory with the following content:

{% macro default__test_unique(model, column_name) %}
with
    src as (select * from {{ model }}),
    dupes as (
        select 
            {{ column_name }} as unique_field, 
            count(*) as n_records

        from src
        where {{ column_name }} is not null
        group by {{ column_name }}
        having count(*) > 1
    )
select *
from src
where {{ column_name }} in (select unique_field from dupes)
{% endmacro %}

You can also override the other built-in generic data tests listed here in a similar fashion.

I didn't fully test the example above, so your mileage may vary.

Summary

For the reasons above, we aren't keen to change this. We might reconsider if we hear a lot of folks clamoring for this or if we decide the positive impact would be worth it.

In the meantime, I'm going to close this as "not planned", but we'll stay open to feedback if anyone wants to comment on this issue (or creates a similar issue) in the future.

Mikael-Thorup · 2023-11-29T09:35:10Z

Hi @dbeatty10 , thanks for the writeup! I think I have a workaround to avoid the downsides:

with
    dupes as (
        select 
            id as unique_field, 
            count(*) as n_records

        from {{ref('some_model')}}
        where id is not null
        group by id
        having count(*) > 1
    )
select unique_field, n_records from dupes
--select * from {{ref('some_model')}} where id in (select unique_field from dupes)

But also very OK if you don't wanna re-open the PR; I can just implement this through the workaround you mention 🙏

Mikael-Thorup added enhancement New feature or request triage labels Nov 23, 2023

github-actions bot changed the title ~~[Feature] more usable compiled code output from uniqueness tests~~ [CT-3417] [Feature] more usable compiled code output from uniqueness tests Nov 23, 2023

dbeatty10 closed this as not planned Won't fix, can't repro, duplicate, stale Nov 28, 2023

dbeatty10 added wontfix Not a bug or out of scope for dbt-core and removed triage labels Nov 28, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[CT-3417] [Feature] more usable compiled code output from uniqueness tests #9139

[CT-3417] [Feature] more usable compiled code output from uniqueness tests #9139

Mikael-Thorup commented Nov 23, 2023

dbeatty10 commented Nov 28, 2023

Mikael-Thorup commented Nov 29, 2023

[CT-3417] [Feature] more usable compiled code output from uniqueness tests #9139

[CT-3417] [Feature] more usable compiled code output from uniqueness tests #9139

Comments

Mikael-Thorup commented Nov 23, 2023

Is this your first time submitting a feature request?

Describe the feature

Describe alternatives you've considered

Who will this benefit?

Are you interested in contributing this feature?

Anything else?

dbeatty10 commented Nov 28, 2023

Downsides

Upside

Workaround

Summary

Mikael-Thorup commented Nov 29, 2023