- About this document
- Getting the code
- Running dbt-spark in development
- Testing
- Updating Docs
- Submitting a Pull Request
This document is a guide intended for folks interested in contributing to dbt-spark. Below, we document the process by which members of the community should create issues and submit pull requests (PRs) in this repository. It is not intended as a guide for using dbt-spark, and it assumes a certain level of familiarity with Python concepts such as virtualenvs, pip, Python modules, and so on. This guide assumes you are using macOS or Linux and are comfortable with the command line.
If you wish to contribute, we highly suggest reading the dbt-core contribution guide if you haven't already. Almost all of the information there is applicable to contributing here, too!
Please note that all contributors to dbt-spark must sign the Contributor License Agreement to have their Pull Request merged into the dbt-spark codebase. If you are unable to sign the CLA, then the dbt-spark maintainers will unfortunately be unable to merge your Pull Request. You are, however, welcome to open issues and comment on existing ones.
You will need git in order to download and modify the dbt-spark source code. You can find directions here on how to install git.
If you are not a member of the dbt-labs GitHub organization, you can contribute to dbt-spark by forking the dbt-spark repository. For a detailed overview on forking, check out the GitHub docs on forking. In short, you will need to:
- fork the dbt-spark repository
- clone your fork locally
- check out a new branch for your proposed changes
- push changes to your fork
- open a pull request against dbt-labs/dbt-spark from your forked repository
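The clone and branch steps above can be sketched as a small shell helper. This is only an illustration: the function name `start_contribution`, its two arguments, and the local directory name are assumptions made for the example, and the fork URL shown in the comment is a placeholder for your own fork.

```shell
# Sketch of the fork workflow. The function name, its arguments, and the
# local directory name "dbt-spark" are illustrative assumptions.
#   $1: clone URL of YOUR fork (hypothetical form:
#       https://github.com/<your-username>/dbt-spark.git)
#   $2: name of the branch that will hold your proposed changes
start_contribution() {
    fork_url="$1"
    branch="$2"
    git clone "$fork_url" dbt-spark || return 1   # clone your fork locally
    cd dbt-spark || return 1                       # work inside the clone
    git checkout -b "$branch"                      # new branch for your changes
}

# Once you have committed your work, publish the branch to your fork:
#   git push --set-upstream origin <your-branch>
# Then open a pull request against dbt-labs/dbt-spark on GitHub.
```

Forking itself happens in the GitHub web UI, so it is not part of the helper.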
If you are a member of the dbt-labs GitHub organization, you will have push access to the dbt-spark repo. Rather than forking dbt-spark to make your changes, just clone the repository, check out a new branch, and push directly to that branch.
First, make sure that you set up your virtualenv as described in Setting up an environment. Ensure you have the latest version of pip installed with pip install --upgrade pip. Next, install the latest dbt-spark dependencies:
pip install -e . -r dev-requirements.txt
When dbt-spark is installed this way, any changes you make to the dbt-spark source code will be reflected immediately in your next dbt-spark run.
To confirm you have the correct version of dbt-core installed, please run dbt --version and which dbt.
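As a rough sketch, the two checks can be combined into one guarded helper. The function name `check_dbt` is an assumption made for this example; `command -v dbt` reports the same path that `which dbt` would.

```shell
# Print the dbt executable found on PATH and its version report,
# or a hint if dbt is missing. "check_dbt" is an illustrative name.
check_dbt() {
    if command -v dbt >/dev/null 2>&1; then
        command -v dbt   # path of the dbt executable; expect it inside your virtualenv
        dbt --version    # version report for dbt-core and installed plugins
    else
        echo "dbt not found on PATH; is your virtualenv active?" >&2
        return 1
    fi
}

check_dbt || true
```

If the printed path points outside your virtualenv, you are most likely running a different dbt installation than the one you just installed in editable mode.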
dbt-spark uses test credentials specified in a test.env file in the root of the repository. This test.env file is git-ignored, but please be extra careful to never check in credentials or other sensitive information when developing. To create your test.env file, copy the provided example file, then supply your relevant credentials.
cp test.env.example test.env
$EDITOR test.env
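If you want to double-check that git really ignores your credentials file before committing, something like the following may help. The helper name `is_git_ignored` is an assumption made for this example; `git check-ignore -q` exits with status 0 only when the path matches an ignore rule.

```shell
# Returns 0 only when the given path (default: test.env) is ignored by git.
# "is_git_ignored" is an illustrative name, not part of dbt-spark.
is_git_ignored() {
    git check-ignore -q "${1:-test.env}"
}

# Run from the root of the repository.
if is_git_ignored test.env 2>/dev/null; then
    echo "test.env is ignored by git"
else
    echo "WARNING: test.env is not ignored (or this is not a git repository)"
fi
```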
There are a few methods for running tests locally.
To run functional tests we rely on dagger. This launches a virtual container or containers to test against.
pip install -r dagger/requirements.txt
python dagger/run_dbt_spark_tests.py --profile databricks_sql_endpoint --test-path tests/functional/adapter/test_basic.py::TestSimpleMaterializationsSpark::test_base
--profile: required, this is the kind of Spark connection to test against
options:
- "apache_spark"
- "spark_session"
- "databricks_sql_endpoint"
- "databricks_cluster"
- "databricks_http_cluster"
--test-path: optional, this is the path to the test file you want to run. If not specified, all tests will be run.
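To illustrate how the two options combine, here is a minimal wrapper sketch. The function name `run_functional_tests` and its return code for an unknown profile are assumptions made for this example; the profile names and the script path are taken from the text above.

```shell
# Illustrative wrapper around the dagger test runner.
#   $1: profile name (required), one of the options listed above
#   $2: test path (optional); when omitted, all tests are run
run_functional_tests() {
    profile="$1"
    test_path="${2:-}"
    case "$profile" in
        apache_spark|spark_session|databricks_sql_endpoint|databricks_cluster|databricks_http_cluster) ;;
        *) echo "unknown profile: $profile" >&2; return 2 ;;
    esac
    if [ -n "$test_path" ]; then
        python dagger/run_dbt_spark_tests.py --profile "$profile" --test-path "$test_path"
    else
        python dagger/run_dbt_spark_tests.py --profile "$profile"
    fi
}

# Example calls, run from the repository root:
#   run_functional_tests apache_spark
#   run_functional_tests spark_session tests/functional/adapter/test_basic.py
```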
Finally, you can also run a specific test or group of tests using pytest directly (if you have all the dependencies set up on your machine). With a Python virtualenv active and dev dependencies installed, you can do things like:
# run all functional tests
python -m pytest --profile databricks_sql_endpoint tests/functional/
# run specific functional tests
python -m pytest --profile databricks_sql_endpoint tests/functional/adapter/test_basic.py
# run all unit tests in a file
python -m pytest tests/unit/test_adapter.py
# run a specific unit test
python -m pytest tests/unit/test_adapter.py::TestSparkAdapter::test_profile_with_database
Many changes will require an update to the dbt-spark docs. Here are some useful resources.
- Docs are here.
- The docs repo for making changes is located here.
- The changes made are likely to impact one or both of Spark Profile and Spark Configs.
- We ask every community member who makes a user-facing change to open an issue or PR regarding doc changes.
We use changie to generate CHANGELOG entries. Note: Do not edit the CHANGELOG.md directly. Your modifications will be lost.
Follow the steps to install changie for your system.
Once changie is installed and your PR is created, simply run changie new and changie will walk you through the process of creating a changelog entry. Commit the file that's created and your changelog entry is complete!
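The create-then-commit sequence might look like the sketch below. The function name and commit message are illustrative, and the `.changes/unreleased/` location is an assumption based on changie's default layout; check `.changie.yaml` in your checkout if the generated file lands elsewhere.

```shell
# Sketch only: "add_changelog_entry" and the commit message are illustrative.
# Assumption: changie writes new entries under .changes/unreleased/
# (changie's default layout).
add_changelog_entry() {
    changie new || return 1                    # interactive: answer the prompts
    git add .changes/unreleased/ || return 1   # stage the generated entry file
    git commit -m "Add changelog entry"
}
```

Run it from the repository root on your PR branch, then push as usual.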
You don't need to worry about which dbt-spark version your change will go into. Just create the changelog entry with changie, and open your PR against the main branch. All merged changes will be included in the next minor version of dbt-spark. The Core maintainers may choose to "backport" specific changes in order to patch older minor versions. In that case, a maintainer will take care of that backport after merging your PR, before releasing the new version of dbt-spark.
dbt Labs provides a CI environment to test changes to the dbt-spark adapter, and periodic checks against the development version of dbt-core through GitHub Actions.
A dbt-spark maintainer will review your PR. They may suggest code revisions for style or clarity, or request that you add unit or functional test(s). These are good things! We believe that, with a little bit of help, anyone can contribute high-quality code.
Once all requests and questions have been answered, the dbt-spark maintainer can trigger CI testing.
Once all tests are passing and your PR has been approved, a dbt-spark maintainer will merge your changes into the active development branch. And that's it! Happy developing 🎉