
Make complex tests’ purposes clearer in compiled SQL and dbt docs #2959

Closed
joellabes opened this issue Dec 16, 2020 · 5 comments
Labels
dbt tests (Issues related to built-in dbt testing functionality) · enhancement (New feature or request) · stale (Issues that have gone stale)

Comments

@joellabes
Contributor

Describe the feature

  • When a custom schema test fails, the first thing I do is download the compiled SQL and run it myself to see what's gone wrong.
  • Backsolving the test's purpose from the file's name and the contents of the query is harder than it needs to be: what exactly does dbt_utils_expression_is_true_paypal__subscribers_is_trial_or_sub_active_is_true__is_trial_expired_is_false_and_is_subscription_expired_is_false mean?
  • I currently include a comment in the YAML file explaining in English what each test does, but that's not where I'm looking when a test fails:
        # active accounts can't have expired
        - dbt_utils.expression_is_true:
            expression: 'is_trial_expired is false and is_subscription_expired is false'
            condition: 'is_trial_or_sub_active is true'
        
        # inactive accounts must have expired
        - dbt_utils.expression_is_true:
            expression: 'is_trial_expired or is_subscription_expired'
            condition: 'is_trial_or_sub_active is false'
        
        # inactive accounts may only have been a trial or a subscription
        - dbt_utils.expression_is_true:
            expression: 'not (is_trial_expired and is_subscription_expired)'
  • Instead of a YAML comment, I could add a description argument to the test macro and include it as a comment in the compiled SQL, as sketched below.
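
A rough sketch of what that could look like, assuming a hypothetical description argument (dbt_utils.expression_is_true doesn't accept one today):

        # `description` is hypothetical; expression_is_true has no such argument yet
        - dbt_utils.expression_is_true:
            expression: 'is_trial_expired is false and is_subscription_expired is false'
            condition: 'is_trial_or_sub_active is true'
            description: "active accounts can't have expired"

The macro could then render the description as a leading -- comment in the compiled query, so it's the first thing you see when you open the file.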

I then got to thinking about how custom tests display in dbt docs, and thought it would be nice to show that description there too, instead of the stringified kwargs.

Describe alternatives you've considered

  • Stopping at "include the description in the SQL query", which wouldn't require any changes to dbt itself, but would mean maintaining our own fork of the dbt-utils tests.
  • Waiting for Schema test blocks (#1173), which looks like a relative of this issue, albeit with a much broader scope.

Who will this benefit?

  • If appropriate, a description property could be exposed to the built-in dbt tests as well as the dbt-utils ones, in which case it would help everyone who wants to debug a failing test (see the sketch below). Admittedly, not_null doesn't need a lot of explanation, but perhaps it's useful to say why a column shouldn't be null.
  • People consuming dbt docs. (In fact, this part might better live in the dbt-docs repo, but I'm not sure about the interplay before doc-rendering time. I assume a description property would need to be explicitly added to the manifest, but I'm not sure that's true, given that I can already pass arbitrary params to this macro...)
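
For illustration, a description on a built-in test might look like this (the property is hypothetical and not supported by dbt today; the model and column names are made up):

        models:
          - name: paypal__subscribers
            columns:
              - name: subscription_id  # hypothetical column
                tests:
                  - not_null:
                      # hypothetical property, not currently supported
                      description: "a null here means the billing sync dropped a row"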

Are you interested in contributing this feature?

The docs test suite and I are close personal friends at this point; let's do it.

@joellabes joellabes added the enhancement and triage labels Dec 16, 2020
@jtcohen6 jtcohen6 removed the triage label Dec 17, 2020
@jtcohen6 jtcohen6 added this to the Oh-Twenty [TBD] milestone Dec 17, 2020
@jtcohen6
Contributor

@joellabes I 1000% agree with everything you've laid out above. Our current plan is for v0.20.0 to be "the testing release," including a lot of what's covered in #1173 and #2593.

In particular, I really, really want:

  • a saner unique_id for schema tests, instead of the kwarg portmanteau you pasted above
  • the ability to configure a description within schema test macros/blocks and data tests, to populate in manifest.json and dbt-docs
  • the ability to write a human-friendly, plain-language failure message describing what was expected and what's gone wrong, to populate in CLI output and in run_results.json (see the sketch after this list)
  • a much faster time-to-debug, between seeing that a test has failed and finding the failing records, by (optionally) storing those records in the database
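
To make the second and third bullets concrete, a sketch (both description and failure_message are hypothetical properties, named here only for illustration):

        - dbt_utils.expression_is_true:
            expression: 'is_trial_expired or is_subscription_expired'
            condition: 'is_trial_or_sub_active is false'
            # hypothetical properties illustrating the wishlist above
            description: "inactive accounts must have expired"
            failure_message: "Found inactive accounts whose trial and subscription are both unexpired."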

All of those are things you touch on above, more or less, and with a cogency that I appreciate. I'm going to leave this issue open and add it to our v0.20.0 milestone. Even if we don't open PRs to address it head-on, it's right in the spirit of where I hope to go.

@joellabes
Contributor Author

Awesome, thanks! Looking forward to seeing what comes of it all 👀

@jtcohen6 jtcohen6 added the dbt tests label Dec 31, 2020
@jtcohen6 jtcohen6 removed this from the Margaret Mead milestone May 11, 2021
@github-actions
Contributor

github-actions bot commented Nov 8, 2021

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

@github-actions github-actions bot added the stale label Nov 8, 2021
@joellabes
Contributor Author

@jtcohen6 I missed the stale warning on this one. Here's where I think we stand from your four bullets earlier in the year:

  • "a saner unique_id for schema tests, instead of the kwarg portmanteau you pasted above": completed in #3616
  • "a much faster time-to-debug, between seeing that a test has failed and finding the failing records, by (optionally) storing those records in the database": completed in #3316 (config sketched below)
  • "the ability to configure a description within schema test macros/blocks and data tests, to populate in manifest.json and dbt-docs": waiting on #3249
  • "the ability to write a human-friendly, plain-language failure message describing what was expected and what's gone wrong, to populate in CLI output and in run_results.json": also waiting on #3249
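
For reference, the store_failures config that shipped looks roughly like this (the model and column names here are examples only):

        models:
          - name: paypal__subscribers
            columns:
              - name: subscription_id  # example column
                tests:
                  - not_null:
                      config:
                        store_failures: true  # persist failing rows to an audit table

There's also a dbt test --store-failures flag to enable it for a whole invocation.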

As long as #3249 is going to pick up the torch for these improvements, I'm happy for this one to fade into the night; I just want it to stay open somewhere 🥺

@EPNickBosma

👋 I recently had a fever dream 🥵 where dbt tests were failing, but failing in human readable ways ❗ It revived my interest in the idea of making test failures easier to parse/solve, especially at scale.

Then when I was feeling better and back at work, I found on Slack that our new Lead Engineer had run into a test failure while I was away and couldn't quite trace back up the stack and figure out why it was happening. It was a test that affected our model snapshots, so it was effectively blocking one of our snapshot jobs until it could be solved.

I had all the context in my head, so if I had been around that day I could have helped. I wasn't, and dbt doesn't give us anywhere to stash context that might be useful to others. For example, tests themselves don't have descriptions, and when tests fail there is (I'm sorry to have to say) nothing inherently human-friendly about the logging and output of those failures.

This is to say, when something fails, we get the what and the where, but much less (I would say) of the why and the how. We get roughly half of the puzzle; even when the other half is known, we have nowhere to make it available.

I think, returning to Jeremy's comment above, something like this is incredibly important for efficient stack maintenance:

the ability to write a human-friendly, plain-language failure message describing what was expected and what's gone wrong, to populate in CLI output and in run_results.json

When I leave my current employer, everyone who had anything to do with implementing the stack will be gone, and all that context will go with them. There will still be people expected to maintain the stack, but without the context the stack's founders had, they'll be dealing with known unknowns rather than known knowns until they get up to speed. Then the cycle will repeat when they leave, and on it goes, ad nauseam.

IMO there should definitely be somewhere (a test config called explanation or context? 🤔) to stash what I have in my brain about why and how tests fail, which would ultimately make them easier to resolve. To illustrate the point a little:

[image attachment]
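
A sketch of that idea, reusing the example test from the issue body (the explanation config is hypothetical; nothing like it exists in dbt today):

        - dbt_utils.expression_is_true:
            expression: 'is_trial_expired or is_subscription_expired'
            condition: 'is_trial_or_sub_active is false'
            config:
              # hypothetical config, surfaced in CLI output and docs on failure
              explanation: >
                This test guards the snapshot job. If it fails, check whether the
                upstream billing export ran before the subscription status sync.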

I can't speak to how easy this would be to implement, but I can speak to how much of a product improvement it would be.
