Run tests IFF all first-order parents are selected #2891

jtcohen6 · 2020-11-16T20:47:29Z

Describe the feature

See prior discussion: #1827, #2132, misc state:modified conversations

Run a test only if all first-order parents are included in the selection criteria. This would be most immediately relevant for:

- relationships tests
- data tests with multiple parents

This would also handle cases where a schema test has a property (tag:, state:) that its one parent model does not have. Since its single parent would not be included in the selection criteria, given the proposed condition, the test should not run.
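As a rough illustration of the proposed condition, here is a minimal Python sketch. It is not dbt's implementation: `select_tests`, `test_parents`, and the model and test names are all hypothetical, and the graph is reduced to a dict from each test to its direct parents.

```python
def select_tests(test_parents, selected):
    """Return the tests whose first-order parents are ALL in `selected`.

    test_parents: dict mapping a test name to the set of its direct parents
    selected: set of node names matched by the selection criteria
    """
    return sorted(
        test
        for test, parents in test_parents.items()
        if parents and parents <= selected  # every direct parent is selected
    )


# Hypothetical project: one single-parent test and one relationships test.
test_parents = {
    "unique_orders_id": {"orders"},
    "relationships_orders_customers": {"orders", "customers"},
}

# Only `orders` is selected: the relationships test is skipped.
only_orders = select_tests(test_parents, {"orders"})

# Both parents are selected: both tests run.
both_models = select_tests(test_parents, {"orders", "customers"})
```

With only `orders` selected, the single-parent test is kept and the two-parent relationships test drops out, which is the behavior described above.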

Describe alternatives you've considered

How useful vs. confusing is this idea?
Do we change default behavior? Or add a flag (--all-parents) to enable this?

Who will this benefit?

Users who want finer-grained control over which tests they're running, especially:

- using complex selection syntax
- running in CI environments


balmasi · 2021-03-05T20:08:29Z

I just ran into this in the following scenario for a slim CI build:

I have models A <--- B where B has a foreign key to A.

I also have a relationship test defined on model B that points to A.

I touch model A, so my slim CI build becomes dbt run -m A, followed by a dbt test -m A.

This fails since dbt selects the relationship test defined on model B because the relationship test depends on both A and B.

dbt is selecting the test if ANY parent is selected, while I was expecting it to run the test only if ALL parents were selected.

Shouldn't running the test when ALL dependent nodes are selected be the default behaviour?

I would much prefer to have this be the default and have a flag for the current behaviour rather than the other way around.

Are there some use-cases I'm missing that this would completely break?

Workaround
I had to dbt test --exclude test_name:relationships which is not ideal.

Jeremy pointed out that I can use the --defer feature to fall back on prod, but that comes with its own challenges of state management.

jtcohen6 · 2021-03-10T11:50:52Z

@balmasi Thanks for the comment! A couple of thoughts here:

Slim CI: The reason we added support for dbt test --defer in dbt v0.19.0 is exactly for the reason you're describing: It often happens that, in a Slim CI run, you will have built one parent model of a multi-pronged test but not the other(s). The defer functionality gives dbt the ability to "fail over" to the prod version of an upstream model if it's unbuilt/missing in the CI schema.
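The "fail over" idea can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not how dbt resolves references internally: real deferral works from a manifest supplied via `--state`, while `resolve_relation` and the `ci` / `prod` schema names here are hypothetical.

```python
def resolve_relation(model, built_in_ci, ci_schema="ci", prod_schema="prod"):
    """Pick the schema that a reference to `model` should point at.

    Use the CI schema if the model was built in this run,
    otherwise fall back to the production schema.
    """
    schema = ci_schema if model in built_in_ci else prod_schema
    return f"{schema}.{model}"


# Only model_a was rebuilt in this CI run; model_b exists only in prod.
built_in_ci = {"model_a"}
relations = [resolve_relation(m, built_in_ci) for m in ("model_a", "model_b")]
```

A relationships test between the two models would then compare the CI build of `model_a` against the prod build of `model_b`.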

In my view, relationships tests don't square with CI builds. You're frequently working with a limited subset of data, whether that's a date cutoff or a random sample. The other three builtin dbt tests—unique, not_null, and accepted_values—are no likelier to fail on a subset vs. the complete dataset, but tests on referential integrity are more likely to fail on arbitrary subsets. For that reason, I think dbt test --exclude test_name:relationships actually makes decent sense in a CI run.
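To make the subset argument concrete, here is a small Python sketch with made-up data. The helper names and the sample values are hypothetical; the point is only that sampling the parent table can create orphaned foreign keys, while a uniqueness check on the sample still passes.

```python
def orphaned_keys(child_fks, parent_ids):
    """Foreign-key values in the child that have no match in the parent."""
    return sorted(set(child_fks) - set(parent_ids))


def has_duplicates(values):
    """True if any value appears more than once (a failing `unique` test)."""
    return len(values) != len(set(values))


# Full dataset: every order points at an existing customer.
customers = [1, 2, 3, 4]
order_customer_ids = [1, 2, 2, 3, 4]
full_orphans = orphaned_keys(order_customer_ids, customers)

# CI sample: only some customers were built, but all orders are present.
sampled_customers = [1, 3]
sample_orphans = orphaned_keys(order_customer_ids, sampled_customers)

# Uniqueness is unaffected by sampling: a subset of unique ids is still unique.
unique_fails_on_sample = has_duplicates(sampled_customers)
```

Here the relationship check passes on the full data and fails on the sample, even though nothing about the models themselves is wrong.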

> Shouldn't running the test when ALL dependent nodes are selected be the default behaviour?

I take your point here, and I'm sympathetic to this argument. This is something we've considered changing in the default behavior (#1827, #2132); I'm still figuring out how we should implement it in practice.

This isn't how selection works for other resource types: If you dbt run -m model_a+, that selects model_c even if model_a is not its only parent. Say model_c also depends on model_b: then dbt run -m model_a+ in a fresh schema would fail on model_c, because model_b is missing. dbt introduced the @ selection operator so that you could dbt run -m @model_a and know that it would build whatever is needed.
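The difference between the two operators can be sketched on the three-model graph from this example. This is an illustrative Python model of the semantics, not dbt's code: `plus_selector` and `at_selector` are hypothetical names, and `@` is modeled as "the node, its descendants, and every ancestor of those descendants".

```python
# Each model maps to the set of models it depends on directly.
parents = {
    "model_a": set(),
    "model_b": set(),
    "model_c": {"model_a", "model_b"},
}


def descendants(parents, node):
    """All nodes reachable downstream of `node`."""
    found, frontier = set(), [node]
    while frontier:
        current = frontier.pop()
        for child, child_parents in parents.items():
            if current in child_parents and child not in found:
                found.add(child)
                frontier.append(child)
    return found


def ancestors(parents, node):
    """All nodes upstream of `node`."""
    found, frontier = set(), [node]
    while frontier:
        current = frontier.pop()
        for parent in parents[current]:
            if parent not in found:
                found.add(parent)
                frontier.append(parent)
    return found


def plus_selector(parents, node):
    """Sketch of `node+`: the node and everything downstream of it."""
    return {node} | descendants(parents, node)


def at_selector(parents, node):
    """Sketch of `@node`: `node+` plus the ancestors of everything in it."""
    selected = plus_selector(parents, node)
    for member in list(selected):
        selected |= ancestors(parents, member)
    return selected


plus_result = plus_selector(parents, "model_a")
at_result = at_selector(parents, "model_a")
```

`plus_result` leaves out `model_b`, which is why `model_c` would fail in a fresh schema, while `at_result` pulls it in.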

Granted, test selection is different from other resource selection because of the "magic jump" baked into it: when I test model_a, that really means executing the tests that directly depend on model_a. In that sense, dbt test -m model_a is more like selecting -m model_a+1.

What's the appeal of that "magic jump"? It enables consistency of syntax between test and other commands, to enable things like dbt run -m model_a && dbt test -m model_a, or (someday) the same thing written as dbt build -m model_a (see #2743). We could modulate the magic, and make it only select tests if all their parents are all the way present. That still gets us the consistent syntax we want, without any surprises.

In that world, if we had a relationships test between model_a and model_b:

1. dbt test -m model_a or -m model_b: Not selected. A parent is missing, so the "magic jump" doesn't take effect.
2. dbt test -m model_a model_b: Yes, selected. All parents are present, the "magic jump" says go for it!
3. dbt test -m model_a+, -m model_a+1, -m @model_a: Yes, selected. The test node is a child of model_a and directly included in the selection criteria; no "magic jump" necessary.

Today, all of the above select and execute that relationships test. In the first case, the intuitive syntax of the "magic jump" actually backfires, resulting in the unintuitive case you've encountered.

@drewbanin I'd be curious to get your thoughts here! I started this comment ready to say that we shouldn't do this... in the course of writing it out, I've managed to convince myself that it might just manage to thread the needle the way we'd want.

smomen · 2021-03-26T19:54:23Z

+1-ing this. In our use case, we need to do partial dbt runs in production in a separate environment due to sensitive (source) patient data in that environment. But we’d like to share the same models, so we have a monolithic project. So, some models just don’t run in this sensitive environment.

We’d like to follow a blue/green deployment and test before we deploy our data, but the selection of irrelevant relationship tests prevents this. (more on whether this is truly irrelevant below)

So, to recap:

we run dbt run -m source:sensitive+, and then dbt test -m source:sensitive+; the latter fails because it selects relationship tests that involve source:sensitive+, and the ones where the parent is not in source:sensitive+ fail.

I’m torn on whether this is fundamentally problematic or just requires configurability.

If there’s a relationship test, testing model A -(to)-> B, and I run dbt test -m B, would I expect the relationship test to run?

B certainly partakes in the relationship, so rerunning it might have broken a relationship b/w A (if it exists) and B... so yes?
But if A does not exist, I’m not sure I would fail the test - there’s no relationship to test, and that’s not a problem with B/that can be solved by changing B/rerunning dbt run -m B... and, it’s “true” that all foreign key values in A (of which there are none - feels like this is a 0 vs null dichotomy?) exist in B. But certainly would expect dbt test -m A to fail if A is the parent of the relationship.
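The "0 vs null" distinction above can be spelled out with a small Python sketch. It is purely illustrative: `relationship_status` is a hypothetical helper, an empty list stands for an A that exists with no rows, and `None` stands for an A that was never built.

```python
def relationship_status(child_fks, parent_ids):
    """Outcome of checking that every foreign key in A exists in B.

    child_fks: foreign-key values from A, or None if A was never built
    parent_ids: key values present in B
    """
    if child_fks is None:
        return "error"  # the relation is missing, so the check cannot run
    missing = [fk for fk in child_fks if fk not in parent_ids]
    return "fail" if missing else "pass"


b_ids = {1, 2, 3}

empty_a = relationship_status([], b_ids)       # A exists but has no rows
missing_a = relationship_status(None, b_ids)   # A was never built
broken_a = relationship_status([1, 9], b_ids)  # A has an orphaned key
```

An empty A passes vacuously, since there are no foreign keys to violate the relationship; a missing A is a different outcome, and that is the case the selection change would avoid by not running the test at all.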

I don’t think dbt test --exclude test_name:relationships is a viable workaround, as it throws the baby out with the bathwater - there are other legitimate relationship tests I do want to run (e.g. B -(to)-> C in the scenario above).

Deferring - I’m not very familiar with this, but “falling back” to another version of the model is not possible for us - there is no other version that exists.

The only two workarounds we’ve found:

- hardcode the tests that should be excluded during these runs (unfortunate, because we now have to maintain dependencies elsewhere/it leaks knowledge about the dependency graph into our test command, creating divergence/likely missed test coverage in the future)
- or create dummy versions of our missing models during these runs (requiring some hacking up of those models/reduced maintainability).

jtcohen6 added the enhancement New feature or request label Nov 16, 2020

jtcohen6 added this to the Oh-Twenty [TBD] milestone Nov 16, 2020

jtcohen6 mentioned this issue Nov 16, 2020

'+' model selector with dbt test looks for incorrect models #2132

Closed

jtcohen6 changed the title ~~Run tests IFF all parents are selected~~ Run tests IFF all first-order parents are selected Nov 17, 2020

jtcohen6 added the dbt tests Issues related to built-in dbt testing functionality label Dec 31, 2020

jtcohen6 mentioned this issue Mar 10, 2021

[Q1C2] More consistent, configurable tests #3066

Closed

jtcohen6 mentioned this issue Mar 30, 2021

Remove the distinction between --select and --models flags when working with node types #3210

Closed

jtcohen6 added the 1.0.0 Issues related to the 1.0.0 release of dbt label Apr 6, 2021

jtcohen6 mentioned this issue Apr 7, 2021

Be less greedy in test selection expansion #3235

Merged

4 tasks

jtcohen6 closed this as completed in #3235 Apr 27, 2021

jtcohen6 mentioned this issue May 21, 2021

Unexpected trigger of downstream data tests when the model is used as an argument of those data tests #3382

Closed

5 tasks

jtcohen6 mentioned this issue Jun 25, 2021

dbt 0.20.0rc1 not finding test on model itself #3496

Closed

5 tasks

jtcohen6 mentioned this issue Jul 15, 2021

Relationships tests do not run on 0.20.0 when -m model_name is used. #3571

Closed

5 tasks

jtcohen6 added the node selection Functionality and syntax for selecting DAG nodes label Aug 2, 2021

joellabes mentioned this issue Aug 9, 2021

Model level tests to do not run when using -m flag (e.g., relationships and others) #3706

Closed

5 tasks

joellabes mentioned this issue Oct 17, 2021

Return to eager test selection by default, with an option to tone it down #4082

Closed

1 task

jtcohen6 mentioned this issue Apr 7, 2022

[CT-468] [Feature] ref() should be treated as model parent in generic test arguments #5006

Closed

1 task
