You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I have searched the existing issues, and I could not find an existing issue for this feature
I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion
Describe the feature
whenever I have a failing uniqueness test, I end up spending too long rewriting the compiled output from dbt test, e.g.
select
id as unique_field,
count(*) as n_records
from some_model
where id is not nullgroup by id
havingcount(*) >1
into
with
src as (select*from some_model),
dupes as (
select
id as unique_field,
count(*) as n_records
from src
where id is not nullgroup by id
havingcount(*) >1
)
select*from src
where id in (select unique_field from dupes)
in order to find out e.g. which of the upstream models caused the duplicates. I'd propose the compiled code from dbt test uniqueness tests follows the following format instead:
with
src as (select*from some_model),
dupes as (
select
id as unique_field,
count(*) as n_records
from src
where id is not nullgroup by id
havingcount(*) >1
)
select*from dupes
--select * from src where id in (select unique_field from dupes)
this would make it easier rewrite the output: comment out select * from dupes and uncomment the line below.
Describe alternatives you've considered
my current alternative is to have the abovementioned query saved, but then I need to rewrite both the unique column, and the table name every time.
Who will this benefit?
debugging models with failing uniqueness tests.
Are you interested in contributing this feature?
I'd love to but I'd need some help as I mostly know sql and jinja
Anything else?
wondering if this would also be useful for not_null tests and potentially other tests, but it's a lot easier rewriting
select some_column
from some_table
where some_column is null
into
select*from some_table
where some_column is null
so this wouldn't benefit as much from the approach I'm suggesting.
The text was updated successfully, but these errors were encountered:
github-actionsbot
changed the title
[Feature] more usable compiled code output from uniqueness tests
[CT-3417] [Feature] more usable compiled code output from uniqueness tests
Nov 23, 2023
But the primary reason for the way it is can be explained by #4777 -- including all columns is more expensive in some databases.
Upside
Your proposal would allow seeing the full context of the failed test by showing the whole row(s).
You can see how the not_null test tries to balance both considerations here by including all columns when using store_failures, but only the affected column otherwise.
Workaround
It is possible for you to override this behavior on your end.
For example, you can implement your desired output by creating a file in your macros directory with the following content:
{% macro default__test_unique(model, column_name) %}
with
src as (select*from {{ model }}),
dupes as (
select
{{ column_name }} as unique_field,
count(*) as n_records
from src
where {{ column_name }} is not nullgroup by {{ column_name }}
havingcount(*) >1
)
select*from src
where {{ column_name }} in (select unique_field from dupes)
{% endmacro %}
You can also override the other built-in generic data tests listed here in a similar fashion.
I didn't fully test the example above, so your mileage may vary.
Summary
For the reasons above, we aren't keen to change this. We might reconsider if we hear a lot of folks clamoring for this or if we decide the positive impact would be worth it.
In the meantime, I'm going to close this as "not planned", but we'll stay open to feedback if anyone wants to comment on this issue (or creates a similar issue) in the future.
Hi @dbeatty10 , thanks for the writeup! I think I have a workaround to avoid the downsides:
with
dupes as (
select
id as unique_field,
count(*) as n_records
from {{ref('some_model')}}
where id is not nullgroup by id
havingcount(*) >1
)
select unique_field, n_records from dupes
--select * from {{ref('some_model')}} where id in (select unique_field from dupes)
But also very OK if you don't wanna re-open the PR; I can just implement this through the workaround you mention 🙏
Is this your first time submitting a feature request?
Describe the feature
whenever I have a failing uniqueness test, I end up spending too long rewriting the compiled output from
dbt test
, e.g.into
in order to find out e.g. which of the upstream models caused the duplicates. I'd propose the compiled code from
dbt test
uniqueness tests follows the following format instead:this would make it easier rewrite the output: comment out
select * from dupes
and uncomment the line below.Describe alternatives you've considered
my current alternative is to have the abovementioned query saved, but then I need to rewrite both the
unique
column, and the table name every time.Who will this benefit?
debugging models with failing uniqueness tests.
Are you interested in contributing this feature?
I'd love to but I'd need some help as I mostly know sql and jinja
Anything else?
wondering if this would also be useful for not_null tests and potentially other tests, but it's a lot easier rewriting
into
so this wouldn't benefit as much from the approach I'm suggesting.
The text was updated successfully, but these errors were encountered: