
[Feature] Unit Tests Should Support ref & source statements when specifying rows with sql #10227

Open
ernestoongaro opened this issue May 27, 2024 · 3 comments
Labels: enhancement, triage, unit tests

Comments

@ernestoongaro

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

I was speaking to Roberto Zagni. He has a table with some sample rows he'd like to reference easily, something like this:

```yaml
given:
  - input: ref('stg_tpch_customers')
    format: sql
    rows: |
      select customer_key from {{ target.database ~ '.' ~ target.schema }}.stg_tpch_customers where is_testing_row = true
```

He would like to use a ref() or perhaps a source() statement instead.

Describe alternatives you've considered

{{ target.database ~ '.' ~ target.schema }}.<table_name>
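As a sketch of this workaround (the model name `dim_customers` and the expectation table name are illustrative; only `stg_tpch_customers` and the target-qualified naming come from the example above), a complete unit test could spell out the fully qualified table names with `target` variables inside the SQL fixtures:

```yaml
unit_tests:
  - name: test_customer_filtering        # hypothetical test name
    model: dim_customers                 # hypothetical model under test
    given:
      - input: ref('stg_tpch_customers')
        format: sql
        rows: |
          select customer_key
          from {{ target.database ~ '.' ~ target.schema }}.stg_tpch_customers
          where is_testing_row = true
    expect:
      format: sql
      rows: |
        select customer_key
        from {{ target.database ~ '.' ~ target.schema }}.dim_customers__expectation
```

This works because the table name is assembled as a plain string, so dbt never needs to resolve a `ref()` or `source()` dependency inside the fixture — which is exactly why it is less maintainable: renames and cross-environment differences are no longer tracked for you.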

Who will this benefit?

No response

Are you interested in contributing this feature?

No response

Anything else?

No response

@ernestoongaro added the enhancement, triage and unit tests labels May 27, 2024
@RobMcZagBDS

Thank you @ernestoongaro

The current unit-test functionality is perfectly suited to very specific and narrow testing, where you pick just the few columns you need, mock the macro calls to the return values you expect, and check that you get the desired output.
It is a lot like unit tests that mock everything except the one tiny bit of complex logic you want to test.
That is great power, but it is like painting a whole building with the same brush you would use for the fine details of an art painting.

The typical case I had in mind is a wider validation of a model. You have some sample data, you have isolated a few rows that cover the different use cases, and you have received or manually verified the expected output for them. You would then keep these input and expected rows in a table and use them, adding more rows as new use cases are found. One great example is adding the use-case data when you find and fix a bug.

In this situation it is also useful to note that the output / expectation of one model easily becomes the input for the next model to test, so you could almost visualize the sequence as a chain of known inputs and their expected outputs down the lineage.

I would suggest a format of ref or source, and then as rows we could put the query that references the ref or source.
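One way the suggestion could look (purely hypothetical syntax illustrating the request — dbt does not support resolving `source()` inside a SQL fixture today, and the `fixtures` source and table name are made up):

```yaml
given:
  - input: ref('stg_tpch_customers')
    format: sql
    rows: |
      select customer_key
      from {{ source('fixtures', 'stg_tpch_customers__fixture') }}
      where is_testing_row = true
```

The appeal is that the fixture table then participates in dbt's normal dependency resolution, instead of being a hardcoded string that silently breaks on renames or environment changes.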

@dbeatty10 (Contributor)

@RobMcZagBDS and @ernestoongaro thank you both for raising this issue 🤩

After discussing with @graciegoheen, this isn't something we’d prioritize anytime soon, but we will continue to listen for how many folks are asking for this.

A large reason for our prioritization is the complexity that would be involved in implementing this. Here is a summary of some of the obstacles identified by @gshank:

  • SQL fixtures: Each one would need to be compiled, requiring additional code and refactoring.
  • Extra fields: New fields like compiled_sql would need to be added.
  • Dependency handling: Handling dependencies (depends_on) would be difficult because fixture nodes are created dynamically and don’t exist during the initial parsing stage. The unit testing manifest might need separate depends_on structures for each fixture.
  • Additional unknowns: There may be other things that would be difficult to handle or that we'd have a hard time detecting.

@RobMcZag

Thank you Doug, Grace, Gerda and Ernesto for looking into it.
I understand the technical difficulties, and we can somehow cope with the current limitations, even if it is not as elegant and maintainable as it would be if we could use a source() reference.

Maybe it is just my feeling, but I would prefer to have a "FIXTURES" schema containing tables like "TABLE_XXX" and "TABLE_XXX__EXPECTATION" (when the expectation is not the same as the next input in the pipeline, "TABLE_YYY"), and to select the rows and columns with SQL, rather than keeping a similar collection of CSV or SQL files in a folder inside the repository.
We can do that with the current SQL feature by hardcoding the DB & SCHEMA. Do we have variables in the context?

BTW, it would be nice if the docs gave a better description of what is expected from a SQL fixture.
My gut feeling is that it's a piece of SQL that, when run, returns the desired rows and columns, but I'm not 100% sure and have not yet experimented with it.
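(For what it's worth, that gut feeling matches my understanding of the documented examples: a `format: sql` fixture is any query whose result set supplies the mocked rows and columns, typically a union of literal selects. A minimal sketch, with column names taken from the example earlier in this thread:

```sql
-- a SQL fixture is just a query whose result set becomes the mocked input
select 1 as customer_key, true as is_testing_row
union all
select 2 as customer_key, false as is_testing_row
```

Since it is an arbitrary SELECT, that is also what makes the target-qualified-table workaround above possible.)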
