Add support to run dbt Python models #375

Merged: tatiana merged 12 commits into main from issue-182-py-models on Jul 23, 2023

tatiana (Collaborator) commented on Jul 19, 2023

Add support to run dbt Python models, as described in the official documentation: https://docs.getdbt.com/docs/build/python-models

Add an example of Cosmos running Python models on Databricks (as of dbt 1.6, Python models are not supported on Postgres).
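
For readers unfamiliar with the format, a dbt Python model is a .py file that defines a model(dbt, session) function and returns a DataFrame. The following is a minimal sketch only: the upstream model name stg_orders and the materialization are assumptions for illustration and are not taken from the jaffle_shop_python project in this PR.

```python
def model(dbt, session):
    # Configure the model; "table" is a materialization dbt supports for Python models.
    dbt.config(materialized="table")
    # dbt.ref returns a DataFrame for the upstream model (a PySpark DataFrame on Databricks).
    # "stg_orders" is a hypothetical upstream model name.
    orders = dbt.ref("stg_orders")
    # dbt persists the returned DataFrame as this model's relation.
    return orders
```

dbt calls this function itself and supplies both arguments, so the file needs no entry point of its own.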

The dbt example can be run by itself from the directory dev/dags/dbt/jaffle_shop_python by exporting the following environment variables:

  • DATABRICKS_HOST: Databricks host, similar to: dbc-some-id.cloud.databricks.com
  • DATABRICKS_WAREHOUSE_ID: Databricks SQL Warehouse ID from connection HTTP path (example: ca312a2206dfb361)
  • DATABRICKS_TOKEN: User Databricks access token
  • DATABRICKS_CLUSTER_ID: Databricks cluster ID (example: 0201-094352-wab833sb)

And running:

dbt build
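
Because a missing variable otherwise only surfaces once dbt tries to connect, a small pre-flight check can help. This is a sketch under stated assumptions: the helper names missing_variables and run_dbt_build are hypothetical, and the project directory is the one named above.

```python
import os
import subprocess

# The four variables listed in this PR description.
REQUIRED_VARS = (
    "DATABRICKS_HOST",
    "DATABRICKS_WAREHOUSE_ID",
    "DATABRICKS_TOKEN",
    "DATABRICKS_CLUSTER_ID",
)


def missing_variables(env):
    """Return the required variable names that are unset or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def run_dbt_build(project_dir="dev/dags/dbt/jaffle_shop_python"):
    """Run `dbt build` in project_dir, failing early if configuration is missing."""
    missing = missing_variables(os.environ)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    # check=True raises CalledProcessError if dbt exits with a non-zero status.
    subprocess.run(["dbt", "build"], cwd=project_dir, check=True)
```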

To validate the feature from a Cosmos perspective, set up the databricks_default connection. One way to do this is with an environment variable:

export AIRFLOW_CONN_DATABRICKS_DEFAULT='databricks://@dbc-<account-id>.cloud.databricks.com?token=<access-token>&http_path=/sql/1.0/warehouses/<warehouse-id>'
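
The connection URI can also be assembled programmatically, which avoids shell-quoting mistakes around the & and / characters. A minimal sketch: the function name build_databricks_conn_uri is hypothetical, and it assumes Airflow decodes percent-encoded query values when it parses the URI.

```python
from urllib.parse import urlencode


def build_databricks_conn_uri(host, token, warehouse_id):
    """Build a databricks:// connection URI in the shape shown above."""
    query = urlencode(
        {
            "token": token,
            # urlencode percent-encodes the slashes in the HTTP path.
            "http_path": f"/sql/1.0/warehouses/{warehouse_id}",
        }
    )
    # The empty user-info part ("@host") mirrors the URI used in this PR.
    return f"databricks://@{host}?{query}"
```

The result can be assigned to os.environ["AIRFLOW_CONN_DATABRICKS_DEFAULT"] before Airflow starts.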

From a previously set-up Airflow environment, run the example_cosmos_python_models DAG. An example of how to execute it from the command line:

airflow dags test example_cosmos_python_models `date -Iseconds`
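
On systems where date does not accept -Iseconds, the same command line can be built from Python. This is a sketch only; dags_test_command is a hypothetical helper, not part of Cosmos or Airflow.

```python
from datetime import datetime


def dags_test_command(dag_id, now=None):
    """Return the `airflow dags test` argument list with an ISO 8601 timestamp."""
    # astimezone() attaches the local UTC offset, like `date -Iseconds` does.
    moment = (now or datetime.now()).astimezone()
    return ["airflow", "dags", "test", dag_id, moment.isoformat(timespec="seconds")]
```

The returned list can be passed directly to subprocess.run.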

This feature was validated with load_mode=LoadMode.DBT_LS and LoadMode.CUSTOM. Example of the rendered DAG:
[Screenshot of the rendered DAG, taken 2023-07-19]
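
When a project is parsed from its files instead of through dbt itself, the dependencies of a Python model have to be discovered from its source. As an illustration of the idea only, and not the parser Cosmos ships in cosmos/dbt/parser/project.py, the hypothetical helper below uses Python's ast module to collect the names passed to dbt.ref(...).

```python
import ast


def extract_refs(python_model_source):
    """Return the set of model names passed as string literals to dbt.ref(...)."""
    refs = set()
    for node in ast.walk(ast.parse(python_model_source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "ref"
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "dbt"
            and node.args
            and isinstance(node.args[0], ast.Constant)
            and isinstance(node.args[0].value, str)
        ):
            refs.add(node.args[0].value)
    return refs
```

This sketch only handles literal string arguments on an object named dbt; dynamically built names would be missed.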

Review can be simplified by checking the commits individually, especially between 796d6e8 and 78a7d31.

The downside of this change is that the integration tests are now slower to run and depend on the following environment variables being set in the CI:

  • AIRFLOW_CONN_DATABRICKS_DEFAULT
  • DATABRICKS_CLUSTER_ID
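
For contributors without Databricks credentials, one possible mitigation is to skip such tests when the variables are absent. The sketch below uses the standard library's unittest.skipUnless; the helper and the test class are hypothetical and are not how the Cosmos test suite is organised.

```python
import os
import unittest

# The two variables the CI needs, as listed in this PR description.
REQUIRED_CI_VARS = ("AIRFLOW_CONN_DATABRICKS_DEFAULT", "DATABRICKS_CLUSTER_ID")


def databricks_env_ready(env=None):
    """Return True only if every required variable is set and non-empty."""
    env = os.environ if env is None else env
    return all(env.get(name) for name in REQUIRED_CI_VARS)


@unittest.skipUnless(databricks_env_ready(), "Databricks credentials are not configured")
class PythonModelsIntegrationTest(unittest.TestCase):
    def test_environment_is_configured(self):
        # Hypothetical placeholder: a real test would run the example DAG here.
        self.assertTrue(databricks_env_ready())
```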

Closes: #182

netlify bot commented Jul 19, 2023

👷 Deploy Preview for amazing-pothos-a3bca0 processing.

🔨 Latest commit: dc71269
🔍 Latest deploy log: https://app.netlify.com/sites/amazing-pothos-a3bca0/deploys/64bd0fb2f334da0007805fb3

codecov bot commented Jul 19, 2023

Codecov Report

Patch coverage: 90.24% and project coverage change: +0.04 🎉

Comparison is base (cdfa0dd) 90.98% compared to head (1fd4b35) 91.03%.

❗ Current head 1fd4b35 differs from pull request most recent head dc71269. Consider uploading reports for the commit dc71269 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #375      +/-   ##
==========================================
+ Coverage   90.98%   91.03%   +0.04%     
==========================================
  Files          45       45              
  Lines        1542     1539       -3     
==========================================
- Hits         1403     1401       -2     
+ Misses        139      138       -1     
Impacted Files                  Coverage Δ
cosmos/dbt/parser/project.py    90.24% <89.47%> (+1.51%) ⬆️
cosmos/__init__.py              100.00% <100.00%> (ø)
cosmos/dbt/graph.py             100.00% <100.00%> (ø)

... and 4 files with indirect coverage changes


@tatiana tatiana marked this pull request as ready for review July 19, 2023 23:07
@tatiana tatiana requested a review from a team as a code owner July 19, 2023 23:07
@tatiana tatiana requested a review from a team July 19, 2023 23:07
tatiana (Collaborator, Author) commented on Jul 21, 2023

The only failing check is test coverage, and it relates to a part of the code that was already in the code base and was not previously tested. Since this PR is already quite extensive, I suggest we make an exception and disregard this check.

@tatiana tatiana merged commit 1058544 into main Jul 23, 2023
38 of 39 checks passed
@tatiana tatiana deleted the issue-182-py-models branch July 23, 2023 23:06
Linked issue closed by this pull request: Support Python models in Cosmos