Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

User-supplied fail_calc for tests #3321

Closed
jtcohen6 opened this issue May 5, 2021 · 2 comments
Closed

User-supplied fail_calc for tests #3321

jtcohen6 opened this issue May 5, 2021 · 2 comments
Assignees
Labels
dbt tests Issues related to built-in dbt testing functionality enhancement New feature or request
Milestone

Comments

@jtcohen6
Copy link
Contributor

jtcohen6 commented May 5, 2021

Describe the feature

Today, the 'test' materialization hard-codes count(*) as the way to calculate failures from a test:

https://github.com/fishtown-analytics/dbt/blob/26fb58bd1b08218781b182918f5ed4dec3f735d9/core/dbt/include/global_project/macros/materializations/test.sql#L4-L7

I think that makes sense 90% of the time in the general case, but we want to give users the ability to customize the "failure calculation" if they'd like it to be something other than count(*). This is important for backward compatibility, since schema tests could previously calculate and return whatever numeric value they wanted. In the wild, this could be as simple as sum(column) instead of count(*), or it could be as complex as the dbt_utils.equality test:

select count(*) from unioned) +
        (select abs(
            (select count(*) from a_minus_b) -
            (select count(*) from b_minus_a)
            )

I'm hopeful this is quite straightforward to implement—it's just a matter of pulling in the fail_calc and templating it into the materialization.

Questions

  • Should fail_calc be a test config or a test property? I lean toward property, since I think this is an essential component of the test definition and less like something that wants to be set for many different types of tests at once, e.g. from dbt_project.yml. (In a post-Set configs in schema.yml files #2401 world, this is hopefully a less meaningful distinction!)
  • Could this have potentially strange interactions with % values of warn_if / error_if (Net-new test configs #3258)? Yes! I don't think we need to solve for every edge case there now.
@jtcohen6 jtcohen6 added enhancement New feature or request dbt tests Issues related to built-in dbt testing functionality labels May 5, 2021
@jtcohen6 jtcohen6 added this to the Margaret Mead milestone May 5, 2021
@kwigley kwigley self-assigned this May 11, 2021
@jtcohen6
Copy link
Contributor Author

Update: We're going to make this a test config, following the pattern sketched out in #3258.

I don't think it makes a ton of sense for users to override the default configs set by the generic test definition, but hey, they'll be able to if they want to. I could even see this being compelling—let's say you want the unique test to calculate failure as the number of original rows containing a duplicate, rather than the length of the set of duplicate values.

@jtcohen6
Copy link
Contributor Author

jtcohen6 commented Jun 2, 2021

Resolved by #3392

@jtcohen6 jtcohen6 closed this as completed Jun 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dbt tests Issues related to built-in dbt testing functionality enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants