Add testing #22
Conversation
@jtcohen6: I would love to hear what you think about this set-up!
The main goal is to find a way to test our macros more easily. I expect that this could be a lightweight set-up that allows dbt-spark users to define pytest tests for their macros.
There are some integration points with dbt-core that I would like to get your feedback on.
tests/test_macros.py
Outdated
```python
dbt.tracking.active_user = User(os.getcwd())


class _SparkConnectionManager(SparkConnectionManager):
```
I have opened an issue in dbt-spark to add a pyspark connection to the package. Once that is in place, this workaround is no longer needed.
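For reference, a sketch of what such a connection manager can look like while the workaround is still needed. This is illustrative only: the exact handle and state contract expected by dbt-spark is an assumption here.

```python
from dbt.adapters.spark.connections import SparkConnectionManager
from pyspark.sql import SparkSession


class _SparkConnectionManager(SparkConnectionManager):
    """Sketch: back dbt connections with a local, in-memory SparkSession."""

    @classmethod
    def open(cls, connection):
        if connection.state == "open":
            return connection
        # Use a local Spark session instead of a thrift/ODBC connection.
        # In practice the handle may need a cursor()-style wrapper to
        # satisfy the adapter's expectations.
        connection.handle = SparkSession.builder.master("local[*]").getOrCreate()
        connection.state = "open"
        return connection
```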
Really interesting, responded in the issue.
tests/test_macros.py
Outdated
```python
@pytest.mark.parametrize(
    "macro_generator", ["macro.spark_utils.get_tables"], indirect=True
)
def test_spark_session(
```
This is what I consider the power of this set-up: write a pytest unit test to test your dbt macros!
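For illustration, a complete test along these lines could look as follows. The spark_session fixture and the table/assertion details are assumptions for the sake of the example, not code from this diff.

```python
import pytest


@pytest.mark.parametrize(
    "macro_generator", ["macro.spark_utils.get_tables"], indirect=True
)
def test_get_tables(spark_session, macro_generator) -> None:
    # Create a table in the in-memory session, run the macro, and check
    # that the new table shows up in the macro's output.
    spark_session.sql("CREATE TABLE default.example (id INT) USING parquet")
    tables = macro_generator()
    assert "default.example" in tables
```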
This is really neat!!!
We've long wanted/needed a way to write reasonable unit tests for dbt macros. (We've done it before, but in ways that are pretty hard to untangle/replicate.)
If we could find a way to do this over and over, with minimal setup, it would open up better testing for:
- adapter plugin development (write + test each macro as you go)
- package development (such as this package)
- custom macro/materialization development in users' own projects
tests/test_macros.py
Outdated
```python
from dbt.parser.manifest import ManifestLoader
from dbt.tracking import User
from pyspark.sql import SparkSession
from sodaspark.scan import Connection
```
See the issue in dbt-spark about replacing the Connection with a reference to the SparkSession.
tests/test_macros.py
Outdated
```python
from sodaspark.scan import Connection


dbt.tracking.active_user = User(os.getcwd())
```
dbt complained about not having an active_user (it was None). If possible, I would like to remove this.
Did you see these complaints when you instantiated RuntimeConfig? Most of our integration tests expect active_user to be None (i.e. anonymous tracking disabled).
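If the intent is just to keep tracking off, an alternative might be to disable it explicitly rather than to fake an active user; a sketch, assuming dbt-core's do_not_track helper:

```python
import dbt.tracking

# Disable anonymous tracking instead of assigning a fake active_user.
dbt.tracking.do_not_track()
```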
tests/test_macros.py
Outdated
```python
args = Args()

# Sets the Spark plugin in dbt.adapters.factory.FACTORY
config = RuntimeConfig.from_args(args)
```
dbt is written as a command-line tool, and its entrypoint, the dbt CLI, uses the parsed arguments heavily. I tried to find a balance between reusing what is in dbt-core and keeping the dependencies minimal: here we still use the from_args class method to create a RuntimeConfig, but with a smaller Args class.
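Concretely, the smaller Args class is roughly of this shape. The exact fields that RuntimeConfig.from_args reads are an assumption in this sketch.

```python
from dataclasses import dataclass
from typing import Optional

from dbt.config.runtime import RuntimeConfig


@dataclass
class Args:
    """Minimal stand-in for dbt's parsed command-line arguments."""

    project_dir: str = "."
    profiles_dir: str = "."
    profile: Optional[str] = None
    target: Optional[str] = None
    threads: Optional[int] = None
    vars: str = "{}"  # dbt parses --vars from a YAML/JSON string


config = RuntimeConfig.from_args(Args())
```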
Heard. For our part, we know that:
- RuntimeConfig is a messy grab-bag
- We need to decouple the CLI / arg-parsing logic from the actual entrypoints to the dbt-core application: Loosely Couple CLI (dbt-core#4179) and Split out dbt front end for many benefits (dbt-core#4475)
Yes, that would help. Also, see the comment in the updated code: "requires a profile in your project which also exists in your profiles file". I would like to not require a profile to be present; especially for dbt packages this does not make sense. Also, when running the test suite in CI, a profile is not necessarily present.
tests/test_macros.py
Outdated
```python
# Sets the Spark plugin in dbt.adapters.factory.FACTORY
config = RuntimeConfig.from_args(args)

register_adapter(config)
```
This stores the Spark adapter in a global variable (dbt.adapters.factory.FACTORY).
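In other words, the two factory calls pair up like this:

```python
from dbt.adapters.factory import get_adapter, register_adapter

register_adapter(config)       # stores the Spark plugin in the global FACTORY
adapter = get_adapter(config)  # later calls read that same global state
```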
tests/test_macros.py
Outdated
```python
def adapter(config: RuntimeConfig) -> AdapterContainer:
    adapter = get_adapter(config)

    connection_manager = _SparkConnectionManager(adapter.config)
```
As mentioned before, I would like to move this into dbt-spark.
@JCZuurmond Very neat, thanks for sketching this out! I remember your issue last year that encouraged us to lean much more into pytest built-ins (dbt-labs/dbt-core#3193), reinventing fewer wheels and making it easier for external contributors (especially ones with some python know-how) to jump in.
As you mention, a lot of the initial code in this PR is easier setup/mocking for a dbt-spark connection in python. Let's continue that conversation over in dbt-labs/dbt-spark#272. While it's obviously relevant to this PR/package, I sense it would be a much lighter lift for other adapters/databases.
So, the best bits are right at the end: I'm particularly taken by how you've used @pytest.fixture to mock macro inputs and outputs. This feels incredibly compelling to me. @gshank @emmyoop I'd be especially interested in getting your eyes and thoughts on this approach, as we think about our next-generation approach to testing dbt (with modularity and extensibility top of mind).
As just said in dbt-labs/dbt-spark#272, I will have a first go at a package for unit testing. I would like some guidance on where to integrate with dbt-core.
@jtcohen6: Could you have another look? I have created a package (GitHub repo) that contains the set-up logic for testing. It is written as a pytest plugin.
I think it would be best to add the Spark session stuff to the dbt-spark repo. This makes pytest-dbt-core warehouse-independent. I'll continue the discussion about this in that issue.
This PR is not ready to be merged until that issue is resolved.
@JCZuurmond This is really neat!! Your work has sparked and anticipated many conversations internally, as we're working to refactor our own testing framework. If you're interested, check out dbt-labs/dbt-core#4691. We're just exiting the exploratory phase, and experimenting toward a set of concrete proposals. I'd love to hear your feedback over there, and (if you're up for it) find ways to include you in our ongoing work.
@jtcohen6: Curious to get feedback again! Do you think this way of testing benefits the project? Will you merge this PR?
@JCZuurmond I don't have any real opposition to including this sort of testing, as a neat experiment of what's possible for macro unit testing when you have a "database" that can run in-memory. Some neat possibilities for SQLite / DuckDB, perhaps? (cc @dbeatty10)
In practice, this package isn't one I'm able to regularly maintain, so I can't promise taking this any further than where you've brought it to date. I am interested in finding other folks who can help out — at dbt Labs, or potentially in the wider community.
Thanks for reviewing the PR, @jtcohen6. I have not tested the library against adapters other than Spark. The lib is written to be adapter-agnostic, but I expect it is easiest to use with in-memory databases. I am keen to include others on the project, or to move parts of it to another project; if you could help with that, that would be great. I have ideas about what I want to change further; however, I prefer to include others first, to hear what they think and to allow the library to grow to be community-based.
@JCZuurmond Last ask before we merge this: Could you update the README with a quick overview of this test? Plus, some code annotation in tests/test_macros.py?
@jtcohen6: Could you have another look?
The CI fails due to …
.github/workflows/workflow.yml
Outdated
```yaml
- name: Run pytest
  shell: bash
  run: DBT_PROFILES_DIR=$PWD pytest tests/
```
I see - this tries to run tests/functional/test_utils.py as well (added in #25), which has other requirements (specified in dev-requirements.txt).

Options:

- Unify: Add python -m pip install -r dev-requirements.txt to the "Install dependencies" step above (or, better yet, unify dev-requirements + test-requirements into a single file) and try running the tests with --profile session. I imagine that a few may fail (as in dbt-spark), but the rest should succeed, and we can mark the failing ones to skip on the "session" profile.
- Separate: Treat these as two totally independent testing frameworks, which is really what they are. Create tests/functional for the tests using the dbt-core functional-testing framework, and tests/unit for the tests using the pytest-dbt-core unit-testing framework. Have clearly separate READMEs and files used by each: separate requirements, separate profile setup, all that jazz.

What do you think? I lean toward option 2, "separate": then tests/unit could be a clear, self-contained, real-world demonstration of the potential of pytest-dbt-core.
+1 to option 2, "separate" (especially different folders tests/unit vs. tests/functional).

Separate folders won't undermine the functionality at all, and it will make the delineations between the different frameworks more clear.

Side note: I don't see a problem with them sharing a single dev-requirements.txt, though.
I moved test_macros.py into a unit subfolder, and I merged the test-requirements.txt into the dev-requirements.txt. I think it is confusing to have a test-requirements.txt that does not contain all dependencies for the functional tests.

I had to move the conftest.py into the functional subdirectory because it was interfering with the unit tests.
Huzzah!! 🎉
@JCZuurmond Thanks for all your work to get this over the last mile. Excited to have this package as a demo of different approaches to testing complex macros.
This PR is a proposal for adding testing using pytest. I would like to try this first here. If we are happy with the set-up, I want to move this into a python package so that it can be used for all dbt-spark projects.