Allow use of sources as unit testing inputs #9059
Codecov Report
Additional details and impacted files
@@ Coverage Diff @@
## unit_testing_feature_branch #9059 +/- ##
============================================================
Coverage 86.80% 86.81%
============================================================
Files 181 181
Lines 27057 27075 +18
============================================================
+ Hits 23488 23505 +17
- Misses 3569 3570 +1
@@ -263,8 +271,10 @@ def create_from(
node: ResultNode,
**kwargs: Any,
) -> Self:
if node.resource_type == NodeType.Source:
if not isinstance(node, SourceDefinition):
if node.resource_type == NodeType.Source or isinstance(node, UnitTestSourceDefinition):
Can we simplify the logic here?
How? We can't set the resource_type to Source because that breaks execution.
I looked at taking out the special casing of UnitTestSourceDefinition, but there are subtle differences in how quoting is specified for sources and for models, so I think it's best to actually use relation.create_from_source to get the quoting right.
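For readers following the thread, here is a minimal sketch of the dispatch being discussed. The classes below are simplified stand-ins, not the real dbt-core types, and `pick_relation_builder` is a hypothetical helper that only reports which path would be taken. It assumes, as the comment above says, that a UnitTestSourceDefinition keeps a model resource_type so it can execute, and therefore needs an explicit isinstance check to reach the source path.

```python
from dataclasses import dataclass
from enum import Enum


class NodeType(str, Enum):
    # Simplified stand-in for dbt's NodeType enum.
    Model = "model"
    Source = "source"


@dataclass
class ModelNode:
    name: str
    resource_type: NodeType = NodeType.Model


@dataclass
class SourceDefinition:
    name: str
    resource_type: NodeType = NodeType.Source


@dataclass
class UnitTestSourceDefinition(ModelNode):
    # Keeps resource_type == Model so it can be executed like a model,
    # but it should still be rendered with source quoting rules.
    source_name: str = ""


def pick_relation_builder(node) -> str:
    """Hypothetical helper: name the builder that would handle this node."""
    if node.resource_type == NodeType.Source or isinstance(node, UnitTestSourceDefinition):
        return "create_from_source"
    return "create_from_node"
```

Checking resource_type alone would send a UnitTestSourceDefinition down the model path, which is why the condition carries the extra isinstance clause.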
source_name=original_input_node.source_name,  # needed for source lookup
)
# Sources need to go in the sources dictionary in order to create the right lookup
self.unit_test_manifest.sources[input_node.unique_id] = input_node  # type: ignore
Do we anticipate any issues from having the sources dictionary contain a unique_id key that is prefixed with model instead of source here?
It doesn't seem to care. We don't actually check the unique_id prefix that I can recall. If somebody starts parsing the unit_test_manifest, I suppose it might be confusing. But right now we're putting it in two places, so one of them will be wrong.
This probably isn't worth spending tons of time on, but I think it could be possible to avoid adding the node to manifest.sources and instead do the lookup from the .nodes collection in UnitTestRuntimeSourceResolver, since the unique_id will include source_name. That would be similar to what's done here: https://github.com/dbt-labs/dbt-core/blob/unit_testing_feature_branch/core/dbt/context/providers.py#L578
I'm not entirely sure which is more readable or less complex in this case. I can imagine that maintaining UnitTestSourceDefinitions across both dictionaries could be error-prone, though.
> But right now we're putting it in two places, so one of them will be wrong.
Given that UnitTestSourceDefinition is a ModelNode, I think having it in nodes is 'more' correct
I think the lookup behavior of sources and nodes is subtly different with regard to the meaning of package=None, so I don't think looking up sources as though they were nodes is worth it.
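To make the trade-off in this thread concrete, here is a rough sketch of the "register in both dictionaries" approach. `ToyManifest`, `InputNode`, `register_input`, and `find_source` are hypothetical simplifications and not dbt-core APIs; the real manifest lookups are more involved, including the package handling mentioned above.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class InputNode:
    unique_id: str
    name: str
    source_name: Optional[str] = None  # set only for source-backed inputs


@dataclass
class ToyManifest:
    # Hypothetical stand-in for the unit test manifest.
    nodes: Dict[str, InputNode] = field(default_factory=dict)
    sources: Dict[str, InputNode] = field(default_factory=dict)

    def find_source(self, source_name: str, table_name: str) -> Optional[InputNode]:
        # Source lookups key on (source_name, table_name), not on the unique_id prefix.
        for node in self.sources.values():
            if node.source_name == source_name and node.name == table_name:
                return node
        return None


def register_input(manifest: ToyManifest, node: InputNode) -> None:
    # Source-backed inputs go in `sources` so source() calls can resolve them...
    if node.source_name is not None:
        manifest.sources[node.unique_id] = node
    # ...and every input goes in `nodes` so it can be compiled and executed.
    manifest.nodes[node.unique_id] = node
```

Both dictionaries hold the same object, so the maintenance concern raised above is about keeping the two registrations in sync rather than about duplicated data.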
# Sources need to go in the sources dictionary in order to create the right lookup
self.unit_test_manifest.sources[input_node.unique_id] = input_node  # type: ignore

# Both ModelNode and UnitTestSourceDefinition need to go in nodes dictionary
for my own understanding - is this to enable cte injection?
Yeah. There's code in compilation.py that looks up the existence of the cte in the nodes dictionary: if cte.id not in manifest.nodes:
In theory we could also check for a UnitTestSourceDefinition and look it up in sources, but that didn't feel like an improvement.
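As an illustration of why the nodes dictionary matters for CTE injection, here is a small sketch of the membership check quoted above. `InjectedCTE` here is a simplified stand-in and `missing_ctes` is a hypothetical helper; the actual compilation code does more than report missing ids.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class InjectedCTE:
    # Simplified stand-in: an id pointing at a manifest node, plus its SQL.
    id: str
    sql: str


def missing_ctes(ctes: List[InjectedCTE], nodes: Dict[str, object]) -> List[str]:
    """Return ids of CTEs whose backing node is absent from the nodes dictionary."""
    return [cte.id for cte in ctes if cte.id not in nodes]
```

If a source-backed input were registered only in sources, its id would show up in this list, which matches the reason given for also adding it to nodes.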
core/dbt/parser/unit_tests.py
Outdated
"resource_type": NodeType.Model,
"package_name": package_name,
"original_file_path": original_input_node.original_file_path,
"unique_id": f"model.{package_name}.{input_name}",
I think we may need to include source_name in input_name to avoid clobbering sources that share a table_name but have different source_names when they are inserted into manifest.nodes and manifest.sources below.
Good point.
I've changed this to include the source_name. This does make for pretty long unique_ids. Do we have any concerns about that? It's not like we're using that name to construct tables or anything...
So far I've noticed this issue crop up in #9015. I think we could shorten the node name for CTE generation (since it doesn't need to be unique) but keep the unique_id longer.
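The clobbering problem and the fix discussed in this thread can be sketched as follows. The `__` separator and both helper names are assumptions made for illustration; the thread only establishes that source_name was added to the identifier, not the exact format.

```python
from typing import Dict, List, Tuple


def table_only_id(package_name: str, source_name: str, table_name: str) -> str:
    # Original scheme: the table name alone, so the source name is ignored.
    return f"model.{package_name}.{table_name}"


def source_qualified_id(package_name: str, source_name: str, table_name: str) -> str:
    # Assumed format: prefix the table name with its source name.
    return f"model.{package_name}.{source_name}__{table_name}"


def build_nodes(inputs: List[Tuple[str, str]], make_id) -> Dict[str, Tuple[str, str]]:
    # Later entries silently overwrite earlier ones that share an id.
    return {make_id("pkg", source, table): (source, table) for source, table in inputs}


inputs = [("crm", "users"), ("billing", "users")]
clobbered = build_nodes(inputs, table_only_id)
distinct = build_nodes(inputs, source_qualified_id)
```

With the table-only scheme the two `users` tables collapse into one dictionary entry, while the source-qualified scheme keeps both at the cost of longer ids.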
package when looking up source
* Initial implementation of unit testing (from pr #2911)
  Co-authored-by: Michelle Ark <[email protected]>
* 8295 unit testing artifacts (#8477)
* unit test config: tags & meta (#8565)
* Add additional functional test for unit testing selection, artifacts, etc (#8639)
* Enable inline csv format in unit testing (#8743)
* Support unit testing incremental models (#8891)
* update unit test key: unit -> unit-tests (#8988)
* convert to use unit test name at top level key (#8966)
* csv file fixtures (#9044)
* Unit test support for `state:modified` and `--defer` (#9032)
  Co-authored-by: Michelle Ark <[email protected]>
* Allow use of sources as unit testing inputs (#9059)
* Use daff for diff formatting in unit testing (#8984)
* Fix #8652: Use seed file from disk for unit testing if rows not specified in YAML config (#9064)
  Co-authored-by: Michelle Ark <[email protected]>
  Fix #8652: Use seed value if rows not specified
* Move unit testing to test and build commands (#9108)
* Enable unit testing in non-root packages (#9184)
* convert test to data_test (#9201)
* Make fixtures files full-fledged members of manifest and enable partial parsing (#9225)
* In build command run unit tests before models (#9273)
---------
Co-authored-by: Michelle Ark <[email protected]>
Co-authored-by: Michelle Ark <[email protected]>
Co-authored-by: Emily Rockman <[email protected]>
Co-authored-by: Jeremy Cohen <[email protected]>
Co-authored-by: Kshitij Aranke <[email protected]>
resolves #8507
Problem
We want to support the use of sources as inputs in unit test cases.
Solution
Created a UnitTestSourceDefinition object, which acts as a source for the purpose of resolving "source" calls but acts as a model for the purpose of executing the test case.