
Schema test blocks #1173

Closed

drewbanin opened this issue Dec 5, 2018 · 4 comments

drewbanin commented Dec 5, 2018

Describe the feature

dbt should support blocks that represent schema test definitions. A rough sketch of what these could look like:

{% test schema_test_name(....arguments) %}

{{ config(...test_configs) }}

select ...

{% endtest %}

Implementation ideas:

  • The name of the test block should dictate how the calling schema.yml file calls this test
  • The config block should help provide a rendered name and description for the resulting test node
  • The test should return a set of failing rows for the test, rather than a count(*) directly

Important considerations:

  • databases like BigQuery charge based on data scanned, so a naive translation of existing count(*) style tests to a select * style test could be really expensive. Let's keep that in mind as we build out this functionality
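
One way to reconcile those two points (purely a sketch, using a hypothetical model and column): let the test author write a query that returns the failing rows, and have dbt wrap that body in an aggregate when it executes the test, so the query that actually runs is still a count:

-- what the test author writes: return the failing rows
select *
from {{ ref('orders') }}
where order_id is null

-- what dbt might actually execute: wrap the body and count the failures
select count(*) as failures
from (
    select *
    from {{ ref('orders') }}
    where order_id is null
) dbt_internal_test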

Configuration ideas:

To support dynamic test node names and descriptions, the config block should accept two configs:

  • name
  • description

A quick and imperfect example:

{% test unique(model, column_name) %}

{{ config(
  name="test_unique_{{ model.name }}.{{ column_name }}",
  description="Assert that {{ column_name }} is unique in {{ model }}"
) }}
....

I know, I know, curlies inside of curlies. Let's figure out a sensible way to make these names and descriptions easy to write.
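
One option (purely a sketch, not a settled syntax) would be to lean on Jinja's ~ string-concatenation operator inside config() instead of nesting curlies:

{% test unique(model, column_name) %}

{{ config(
  name="test_unique_" ~ model.name ~ "." ~ column_name,
  description="Assert that " ~ column_name ~ " is unique in " ~ model.name
) }}
....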

When this feature is live, we should deprecate the old-style schema test macros and mark them for removal in a future release.

Additional context

  • The compiled SQL would be more inspectable in a query runner
    • We should think hard about how to 1) generate a test query and 2) generate a diagnosis/debugging query from the original select statement
  • We could build in support for things like a max error count, severity, or enable/disable flags at the config level
  • We could add support for recording failing rows in a table for inspection (Record failing rows for tests into an auditable table #903)
@drewbanin added the enhancement, schema tests, and help_wanted labels on Dec 5, 2018
@drewbanin added the 1.0.0 label and removed the help_wanted label on Mar 31, 2020
@drewbanin changed the title from "New schema test macro syntax" to "Schema test blocks" on Mar 31, 2020
@drewbanin mentioned this issue on Mar 31, 2020
@jtcohen6 added this to the Oh-Twenty [TBD] milestone on Nov 13, 2020
@jtcohen6 added the dbt tests label on Dec 31, 2020

jtcohen6 commented Mar 8, 2021

I wanted to share some of our latest thinking on schema test blocks, which we're hoping to include in v0.20.0. Necessary caveats: We're still working through the details and various implications; this is just the tip of the iceberg; anything below is still subject to change.

Semantics

First off, I'd like to ditch the names "schema" and "data" tests, along with many of the current distinctions between them. The two should simply offer the dbt developer two angles into the same functionality:

  1. It's easy to define a one-off test as a SQL file. Write a query that returns the rows you don't want from the specific model(s). It's executed by dbt test every time.
  2. If you want to reuse that test query for multiple models/columns, rather than copying and pasting, you can refactor the SQL to accept arguments. This is the reusable/generic/special-sort-of-macro test block proposed by this issue, and it would take the place of the test__ macro currently used.

This issue is about the latter.
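
For comparison, the former (a one-off test) would just be a SQL file that selects the rows you don't want; a hypothetical example, with made-up file and model names:

# tests/assert_no_negative_order_totals.sql

select *
from {{ ref('orders') }}
where order_total < 0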

Proposed syntax

# macros/reusable_tests.sql

{% test not_null(model, column_name, adapter = 'default') %}

    select * from {{ model }}
    where {{ column_name }} is null

{% endtest %}

# macros/schema.yml

tests:
 - name: not_null
   adapter: default
   description: "Assert that {{ column_name }} is not null in {{ model }}"
   fail_msg: "Found {{ result }} null values of {{ model }}.{{ column_name }}, which is {{ warn_or_error }}"
   fail_calc: "count(*)"   # this is default
   args:
     - name: model
       description: "The model to test"
     - name: column_name
       description: "The column or expression to test"

The YAML properties are optional (just the SQL definition is sufficient to define this test on models/columns/etc.), but they're really what makes the test block shine and takes it a step beyond a standard macro. In this world, packages of custom tests offer more than just a reusable bit of SQL: they offer a complete, packaged asset with richer behavior in docs/artifacts (a la #2959).
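
For context, applying a block-defined test from a model's schema.yml would presumably look just like applying a schema test today (model and column names here are hypothetical):

# models/schema.yml

models:
  - name: orders
    columns:
      - name: order_id
        tests:
          - not_null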

Most custom schema tests existing today could upgrade easily (see the before/after sketch below) by:

  • Switching {% macro test__test_name(...) %} ... {% endmacro %} to {% test test_name(...) %} ... {% endtest %}
  • Switching the final select count(*) to select *. If the test's failure calculation should be something other than count(*), it will also need to define fail_calc.
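
A rough before-and-after sketch of that upgrade, using a hypothetical is_positive test and the macro naming convention shown above:

# before: macros/test_is_positive.sql

{% macro test__is_positive(model, column_name) %}

    select count(*)
    from {{ model }}
    where {{ column_name }} <= 0

{% endmacro %}

# after: macros/reusable_tests.sql

{% test is_positive(model, column_name) %}

    select *
    from {{ model }}
    where {{ column_name }} <= 0

{% endtest %}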

Tactical questions

  • How much should we disambiguate between the two types of tests? Should they have different resource types, or both just be called test (in which case, names must be unique between them)?
    • Should the generic test blocks be defined in the macros/ path, or the tests/ path alongside one-off tests?
    • Each type would be defining different YAML resource properties: reusable tests are more like macros, and one-off tests ought to be more like other node types (Allow instances of generic data tests to be documented #2578). Would this be too confusing, to have a single type of YAML block (tests:), wherein some entries define a macro-esque thing while other entries define an analysis-y thing? I imagine we'd say it's best practice to keep them separate.
  • How much standard macro functionality should test blocks still support?
    • Should the test block appear in the documentation site? As a functional resource (similar to macros)? Each specific instantiation, as a node in the project / DAG?
    • Should a test block still be "callable" as a macro? E.g. should a test block named my_test return its SQL if called as test__my_test(model, column_name) (say, by another macro)? I don't feel this is critical—users can find straightforward ways to work around it if we remove this functionality, i.e. by using standard macros—but we should come up with a documented and consistent answer here. (E.g., today, the incremental materialization is technically callable as a macro named dbt.materialization_incremental.)

@clausherther
Contributor

Love this idea. A couple of initial thoughts, more to come:

  • Tests really make a lot of sense as a first-level resource type. Personally, I don't see a difference between macro-style tests and the older data tests; as you point out, it's just a matter of parameterization.
  • I like the idea of a separate tests folder for these to better organize them away from macros. Would that also allow me to have a macro and a test with the same name? (e.g. expression_is_true)?
  • I don't think it's necessary for tests to be callable. If I needed I would revert back to a macro that gets called from the test.


jtcohen6 commented Mar 9, 2021

Would that also allow me to have a macro and a test with the same name? (e.g. expression_is_true)?

A macro ({% macro expression_is_true() %}) and a test ({% test expression_is_true() %}), yes, I think so!

As for a one-off test (tests/expression_is_true.sql) and a generic test ({% test expression_is_true() %}) with the same name: I'm not sure, leaning no.

@jtcohen6
Contributor

Would this be too confusing, to have a single type of YAML block (tests:), wherein some entries define a macro-esque thing while other entries define an analysis-y thing?

I've thought about this more: No! I think many of these properties would be shared between both types of tests. I totally see a one-off test (tests/*.sql) wanting to define description, fail_msg, and fail_calc.
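
A hypothetical sketch of what that could look like for a one-off test under the same tests: block (names made up):

# tests/schema.yml

tests:
  - name: assert_no_negative_order_totals
    description: "Orders should never have a negative total"
    fail_msg: "Found {{ result }} orders with a negative total"
    fail_calc: "count(*)"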

Of course, some properties (adapter, args, and the special variables model and column_name) would be specific to the generic test block, reflecting the fact that it is more macro-like.

I think both test types could also define docs and columns, the way analyses do today, which impacts their appearance in the documentation site. By default, one-off tests should appear more like nodes, and generic tests should appear more like macros/materializations.

The fact that both test types would have the majority of their resource properties in common feels like a strong argument in favor of centralizing them under a single tests: YAML block. Disallowing one-off tests from being called tests/unique.sql or tests/not_null.sql feels like a small price to pay for that improved consistency.

I imagine we'd say it's best practice to keep them separate.

Having said all of the above, I think this is definitely still true.
