Add support for InvocationMode.DBT_RUNNER for local execution mode #836

Closed

jbandoro
Collaborator

@jbandoro jbandoro commented Feb 6, 2024

Description

This PR adds dbtRunner programmatic invocation for ExecutionMode.LOCAL. Rather than creating a new execution mode (e.g. ExecutionMode.LOCAL_DBT_RUNNER) along with a full set of child operators, I added a new config option, ExecutionConfig.invocation_mode, which can be set to InvocationMode.DBT_RUNNER. This lets users who already use local execution mode switch to dbt runner and see performance improvements.
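
As a rough illustration of the design choice described above, the sketch below uses stand-in classes (not the actual Cosmos source) to show the invocation mode carried as a config field instead of as a separate execution mode. The enum values and the SUBPROCESS default are assumptions made for this example.

```python
from dataclasses import dataclass
from enum import Enum


class ExecutionMode(Enum):
    # Stand-in for Cosmos' ExecutionMode; only the member discussed here.
    LOCAL = "local"


class InvocationMode(Enum):
    # Stand-in for the InvocationMode added in this PR; values are assumed.
    SUBPROCESS = "subprocess"
    DBT_RUNNER = "dbt_runner"


@dataclass
class ExecutionConfig:
    # Hypothetical simplified config: the invocation mode is an extra knob
    # on the existing local execution mode, not a new execution mode.
    execution_mode: ExecutionMode = ExecutionMode.LOCAL
    invocation_mode: InvocationMode = InvocationMode.SUBPROCESS


# Existing local-mode users opt in by changing a single field.
config = ExecutionConfig(invocation_mode=InvocationMode.DBT_RUNNER)
print(config.execution_mode.value, config.invocation_mode.value)
```

Keeping the mode as a field means no operator class has to be duplicated per invocation style, which is the trade-off the description argues for.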

The dbtRunnerResult makes it easy to tell whether the dbt run succeeded, so logs no longer need to be parsed. They are still written to the operator's log:

(image: screenshot of the operator log output)
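
A minimal sketch of checking the result object instead of parsing logs. It assumes dbt-core >= 1.5, where `dbtRunner().invoke()` returns a result with `success`, `exception` and `result` attributes. The helper names `describe_result` and `run_dbt` are hypothetical, and the stand-in results exist only so the example runs without dbt installed.

```python
from types import SimpleNamespace


def describe_result(result) -> str:
    """Summarize an object shaped like dbt's dbtRunnerResult.

    Relies only on the `success` and `exception` attributes.
    """
    if result.success:
        return "dbt invocation succeeded"
    if result.exception is not None:
        return f"dbt invocation raised: {result.exception}"
    return "dbt invocation finished with failures"


def run_dbt(args):
    # Imported lazily so this module loads without dbt-core installed.
    from dbt.cli.main import dbtRunner  # requires dbt-core >= 1.5

    return describe_result(dbtRunner().invoke(args))


# Stand-in results for illustration; real ones come from dbtRunner().invoke().
ok = SimpleNamespace(success=True, exception=None, result=None)
failed = SimpleNamespace(success=False, exception=None, result=None)
print(describe_result(ok))
print(describe_result(failed))
```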

Performance Testing

After #827 was added, I modified it slightly to use the postgres adapter instead of sqlite, because the latest dbt-core version supported for sqlite is 1.4, while programmatic invocation requires >=1.5.0. Comparing subprocess to dbt runner for 10 models, I got the following results:

  1. InvocationMode.SUBPROCESS:
Ran 10 models in 23.77661895751953 seconds
NUM_MODELS=10
TIME=23.77661895751953
  2. InvocationMode.DBT_RUNNER:
Ran 10 models in 8.390100002288818 seconds
NUM_MODELS=10
TIME=8.390100002288818

So InvocationMode.DBT_RUNNER is almost 3x faster (about 2.8x in this run). It can speed up DAG runs that contain many models which execute relatively quickly, since there seems to be a speedup of 1-2s per task.
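
The ratio and the per-model saving can be checked directly from the two timings reported above. This is just arithmetic on the numbers in this PR, and it assumes the time saved is spread evenly across the 10 models.

```python
subprocess_seconds = 23.77661895751953
dbt_runner_seconds = 8.390100002288818
num_models = 10

# Overall speedup of dbt runner relative to subprocess invocation.
speedup = subprocess_seconds / dbt_runner_seconds

# Average time saved per model, assuming the saving is spread evenly.
saved_per_model = (subprocess_seconds - dbt_runner_seconds) / num_models

print(f"speedup: {speedup:.2f}x")                  # speedup: 2.83x
print(f"saved per model: {saved_per_model:.2f}s")  # saved per model: 1.54s
```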

One thing I found while working on this: when you parse a project with the runner, the manifest is stored in the result and can be reused in subsequent commands to avoid reparsing. This could be a useful way to cache the manifest if we use dbt runner for dbt ls parsing, and it could speed up the initial render as well.
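
A sketch of the manifest reuse mentioned above, assuming dbt-core >= 1.5, where the result of a `parse` invocation exposes the manifest as `result` and `dbtRunner` accepts a `manifest` argument. The function name `parse_then_run` and the `runner_cls` parameter are hypothetical; the parameter exists only so the flow can be exercised without dbt installed.

```python
def parse_then_run(project_dir, runner_cls=None):
    """Parse the project once, then reuse the manifest for a later command."""
    if runner_cls is None:
        # Imported lazily so this module loads without dbt-core installed.
        from dbt.cli.main import dbtRunner  # requires dbt-core >= 1.5

        runner_cls = dbtRunner

    # The parse result carries the manifest in its `result` attribute.
    parse_result = runner_cls().invoke(["parse", "--project-dir", project_dir])
    manifest = parse_result.result

    # A runner initialised with the manifest can skip reparsing the project.
    cached_runner = runner_cls(manifest=manifest)
    return cached_runner.invoke(["run", "--project-dir", project_dir])
```

Whether this helps for dbt ls rendering in Cosmos is the open question raised in the paragraph above; the sketch only shows the reuse pattern itself.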

I initially thought it would be easy to make this work for virtualenv execution too, because I assumed the entire execute method ran inside the virtualenv. That is not the case: the virtualenv operator creates a virtualenv and then passes the executable path to a subprocess. It may still be possible to make this work for virtualenv, but that would be better suited to a follow-up PR.
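
To illustrate why the virtualenv case differs, here is a simplified sketch (not the Cosmos operator code) of running dbt through a virtualenv's executable path. The helper names and the `bin/dbt` layout are assumptions. Because the command runs in a child process, packages installed in the virtualenv are never importable from the parent interpreter, so an in-process dbtRunner call would not pick them up.

```python
import os
import subprocess


def venv_dbt_executable(venv_dir: str) -> str:
    # Layout of a standard virtualenv; Windows uses "Scripts" instead of "bin".
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    return os.path.join(venv_dir, bin_dir, "dbt")


def build_dbt_command(venv_dir: str, args: list) -> list:
    # Only the executable path crosses the boundary into the virtualenv.
    return [venv_dbt_executable(venv_dir), *args]


def run_in_venv(venv_dir: str, args: list) -> subprocess.CompletedProcess:
    # dbt executes in a separate process that uses the virtualenv's packages.
    return subprocess.run(
        build_dbt_command(venv_dir, args), capture_output=True, text=True
    )


print(build_dbt_command("/tmp/venv", ["run", "--select", "my_model"]))
```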

Related Issue(s)

closes #717

Breaking Change?

None

Checklist

  • I have made corresponding changes to the documentation (if required)
  • I have added tests that prove my fix is effective or that my feature works - added unit tests and integration tests.

netlify bot commented Feb 6, 2024

Deploy Preview for sunny-pastelito-5ecb04 canceled.

Name Link
🔨 Latest commit f761a8a
🔍 Latest deploy log https://app.netlify.com/sites/sunny-pastelito-5ecb04/deploys/65d00500ebc18600082acee7

codecov bot commented Feb 6, 2024

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (9af7067) 94.72% compared to head (f0e03be) 94.70%.

❗ Current head f0e03be differs from pull request most recent head f761a8a. Consider uploading reports for the commit f761a8a to get more accurate results

| Files | Patch % | Lines |
| --- | --- | --- |
| cosmos/dbt/parser/output.py | 96.29% | 1 Missing ⚠️ |
| cosmos/operators/local.py | 98.27% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #836      +/-   ##
==========================================
- Coverage   94.72%   94.70%   -0.02%     
==========================================
  Files          56       56              
  Lines        2520     2589      +69     
==========================================
+ Hits         2387     2452      +65     
- Misses        133      137       +4     


@jbandoro jbandoro changed the title from "WIP - Add support for dbt runner" to "Add support for Invocation.DBT_RUNNER for local and virtualenv execution modes" Feb 6, 2024
@jbandoro jbandoro changed the title from "Add support for Invocation.DBT_RUNNER for local and virtualenv execution modes" to "Add support for InvocationMode.DBT_RUNNER for local and virtualenv execution modes" Feb 6, 2024
@jbandoro jbandoro marked this pull request as ready for review February 6, 2024 23:04
@jbandoro jbandoro requested a review from a team as a code owner February 6, 2024 23:04
@dosubot dosubot bot added the size:XL This PR changes 500-999 lines, ignoring generated files. label Feb 6, 2024
@jbandoro jbandoro added this to the 1.4.0 milestone Feb 6, 2024
@dosubot dosubot bot added area:execution Related to the execution environment/mode, like Docker, Kubernetes, Local, VirtualEnv, etc dbt:run Primarily related to dbt run command or functionality execution:local Related to Local execution environment labels Feb 6, 2024
Collaborator

@jlaneve jlaneve left a comment

This is looking great! Will try it out locally over the weekend.

One thing I haven't thought a ton about is what config we should pass to the performance tests. IMO the perf tests should cover the "best case" scenario (Cosmos appropriately tuned for performance) so that we're always pushing the boundary rather than measuring the default. Thoughts on including a change in this PR so the perf tests use this new method?

One other thought: do you think it's worth doing any auto-discovery to infer which invocation method is used? i.e. if you don't explicitly specify one, should we:

  • try to import the dbt runner, if it works, great - we can use the more performant method
  • if it doesn't work, no problem, we default to subprocess
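
The auto-discovery idea above could look roughly like the sketch below. It is only an illustration of the suggestion, not something this PR implements: `discover_invocation_mode` is a hypothetical helper, the enum is a stand-in, and the module and attribute names are parameters so the fallback path can be exercised without dbt installed.

```python
import importlib
from enum import Enum


class InvocationMode(Enum):
    # Stand-in enum; values are assumed for this example.
    SUBPROCESS = "subprocess"
    DBT_RUNNER = "dbt_runner"


def discover_invocation_mode(
    module_name: str = "dbt.cli.main", attr: str = "dbtRunner"
) -> InvocationMode:
    """Prefer the dbt runner when it can be imported, else use subprocess."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # dbt is not importable from this interpreter.
        return InvocationMode.SUBPROCESS
    # The module may exist without exposing the runner (e.g. an older dbt).
    if hasattr(module, attr):
        return InvocationMode.DBT_RUNNER
    return InvocationMode.SUBPROCESS


print(discover_invocation_mode().value)
```

An explicitly configured invocation mode would presumably still take precedence, with discovery used only when none is specified.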

@jbandoro jbandoro changed the title from "Add support for InvocationMode.DBT_RUNNER for local and virtualenv execution modes" to "Add support for InvocationMode.DBT_RUNNER for local execution mode" Feb 17, 2024
@jbandoro
Collaborator Author

@jlaneve I'm closing this PR and opening up #850 because I couldn't update the GH action and have it run with the updates here on my forked branch.

@jbandoro jbandoro closed this Feb 17, 2024
@dosubot dosubot bot removed this from the 1.4.0 milestone Feb 17, 2024
Labels
area:execution Related to the execution environment/mode, like Docker, Kubernetes, Local, VirtualEnv, etc dbt:run Primarily related to dbt run command or functionality execution:local Related to Local execution environment size:XL This PR changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support to DBT_RUNNER execution mode
2 participants