Commit

Add docs
pankajkoti committed Sep 30, 2024
1 parent cc48161 commit 1068025
Showing 3 changed files with 63 additions and 1 deletion.
2 changes: 2 additions & 0 deletions dev/dags/simple_dag_async.py
@@ -18,6 +18,7 @@
),
)

# [START airflow_async_execution_mode_example]
simple_dag_async = DbtDag(
# dbt/cosmos-specific parameters
project_config=ProjectConfig(
@@ -35,3 +36,4 @@
tags=["simple"],
operator_args={"install_deps": True},
)
# [END airflow_async_execution_mode_example]
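
The hunks above omit most of the DAG body. As a sketch, the full ``DbtDag`` definition in this file plausibly reads as follows, assuming the standard Cosmos config objects; the project path, profile, and schedule are illustrative, not the committed values:

    from datetime import datetime

    from cosmos import DbtDag, ExecutionConfig, ExecutionMode, ProjectConfig

    # [START airflow_async_execution_mode_example]
    simple_dag_async = DbtDag(
        # dbt/cosmos-specific parameters
        project_config=ProjectConfig(
            "/usr/local/airflow/dags/dbt/jaffle_shop",  # illustrative project path
        ),
        profile_config=profile_config,  # assumed to be defined earlier in the file
        execution_config=ExecutionConfig(execution_mode=ExecutionMode.AIRFLOW_ASYNC),
        # normal dag parameters
        schedule_interval="@daily",
        start_date=datetime(2023, 1, 1),
        catchup=False,
        dag_id="simple_dag_async",
        tags=["simple"],
        operator_args={"install_deps": True},
    )
    # [END airflow_async_execution_mode_example]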
21 changes: 21 additions & 0 deletions docs/configuration/cosmos-conf.rst
@@ -126,6 +126,27 @@ This page lists all available Airflow configurations that affect ``astronomer-cosmos``.
- Default: ``None``
- Environment Variable: ``AIRFLOW__COSMOS__REMOTE_CACHE_DIR_CONN_ID``

.. _remote_target_path:

`remote_target_path`_:
(Introduced in Cosmos 1.7.0) The path to the remote target directory. This is the directory to which Cosmos
remotely copies and stores the files that dbt generates in the dbt project's target directory. The value
for the remote target path can use any of the schemes supported by the
`Airflow Object Store <https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/objectstorage.html>`_
feature introduced in Airflow 2.8.0 (e.g. ``s3://your_s3_bucket/target_dir/``, ``gs://your_gs_bucket/target_dir/``,
``abfs://your_azure_container/target_dir``, etc.)

- Default: ``None``
- Environment Variable: ``AIRFLOW__COSMOS__REMOTE_TARGET_PATH``

.. _remote_target_path_conn_id:

`remote_target_path_conn_id`_:
(Introduced in Cosmos 1.7.0) The connection ID for the remote target path. If this is not set, the default
Airflow connection ID for the path's scheme will be used.

- Default: ``None``
- Environment Variable: ``AIRFLOW__COSMOS__REMOTE_TARGET_PATH_CONN_ID``
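
As an illustration, both settings could be supplied as environment variables; the bucket name and connection ID below are placeholders, and any scheme supported by Airflow Object Storage (``s3://``, ``gs://``, ``abfs://``, ...) works for the path:

.. code-block:: python

    import os

    # Placeholder remote path and Airflow connection ID.
    os.environ["AIRFLOW__COSMOS__REMOTE_TARGET_PATH"] = "s3://your_s3_bucket/target_dir/"
    os.environ["AIRFLOW__COSMOS__REMOTE_TARGET_PATH_CONN_ID"] = "aws_default"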

[openlineage]
~~~~~~~~~~~~~
41 changes: 40 additions & 1 deletion docs/getting_started/execution-modes.rst
@@ -12,12 +12,13 @@ Cosmos can run ``dbt`` commands using eight different approaches, called ``execution modes``:
5. **aws_eks**: Run ``dbt`` commands from AWS EKS Pods managed by Cosmos (requires a pre-existing Docker image)
6. **azure_container_instance**: Run ``dbt`` commands from Azure Container Instances managed by Cosmos (requires a pre-existing Docker image)
7. **gcp_cloud_run_job**: Run ``dbt`` commands from GCP Cloud Run Job instances managed by Cosmos (requires a pre-existing Docker image)
8. **airflow_async**: (Introduced in Cosmos 1.7.0) Run the dbt resources from your dbt project asynchronously by submitting the corresponding compiled SQL to Apache Airflow's `Deferrable operators <https://airflow.apache.org/docs/apache-airflow/stable/authoring-and-scheduling/deferring.html>`__

The choice of the ``execution mode`` can vary based on each user's needs and concerns. For more details, check each execution mode described below.


.. list-table:: Execution Modes Comparison
- :widths: 20 20 20 20 20
+ :widths: 25 25 25 25
:header-rows: 1

* - Execution Mode
@@ -52,6 +53,10 @@ The choice of the ``execution mode`` can vary based on each user's needs and concerns.
- Slow
- High
- No
* - Airflow Async
- Medium
- None
- Yes

Local
-----
@@ -238,6 +243,40 @@ Each task will create a new Cloud Run Job execution, giving full isolation. The …
},
)

Airflow Async
-------------

.. versionadded:: 1.7.0

(**Experimental**)

The ``airflow_async`` execution mode is a way to run the dbt resources from your dbt project using Apache Airflow's
`Deferrable operators <https://airflow.apache.org/docs/apache-airflow/stable/authoring-and-scheduling/deferring.html>`__.
This execution mode can be preferable when you have long-running resources that you want to run asynchronously by
leveraging Airflow's deferrable operators. It can potentially deliver higher task throughput, since more dbt nodes
run in parallel without blocking Airflow's worker slots.
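
For instance, selecting the mode is a matter of setting ``execution_mode`` on the ``ExecutionConfig`` (a sketch, assuming ``ExecutionMode.AIRFLOW_ASYNC`` is importable from ``cosmos`` like the other execution modes):

.. code-block:: python

    from cosmos import ExecutionConfig, ExecutionMode

    # Sketch: opt in to the async execution mode; the remaining DbtDag
    # arguments are unchanged from the other execution modes.
    execution_config = ExecutionConfig(execution_mode=ExecutionMode.AIRFLOW_ASYNC)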

In this mode, Cosmos adds a new operator, ``DbtCompileAirflowAsyncOperator``, as a root task in the DAG. The task runs
the ``dbt compile`` command on your dbt project, which outputs compiled SQL files in the project's target directory.
As part of the same task run, these compiled SQL files are then uploaded to the remote path set via the
:ref:`remote_target_path` configuration. The subsequent tasks in the DAG fetch the compiled SQL from that remote path
and run it asynchronously using operators such as ``DbtRunAirflowAsyncOperator``.
You may observe that the compile task takes a bit longer to run due to the latency of uploading the compiled SQL files;
however, this is a one-time overhead, and the subsequent tasks run asynchronously, utilising Airflow's
deferrable operators with the compiled SQL supplied to them.
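
The remote artifacts can be addressed with Airflow's Object Storage API. Below is a sketch of fetching one compiled model from the configured path; the base path, connection ID, and file layout are illustrative assumptions, not Cosmos's actual layout:

.. code-block:: python

    from airflow.io.path import ObjectStoragePath

    # Illustrative base path mirroring the remote_target_path setting.
    base = ObjectStoragePath("s3://your_s3_bucket/target_dir/", conn_id="aws_default")

    # Hypothetical location of a compiled model under that base path.
    compiled_sql = (base / "compiled" / "jaffle_shop" / "models" / "customers.sql").read_text()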

Note that the ``airflow_async`` execution mode currently has the following limitations and is released as Experimental:

1. Only the ``model`` dbt resource type is run asynchronously using Airflow's deferrable operators. All other resource types are executed synchronously using dbt commands, as in the ``local`` execution mode.
2. Only BigQuery is supported as the target database. If a profile target other than BigQuery is specified, Cosmos will raise an error stating that the target database is not supported with this execution mode (see the profile sketch after this list).
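
For illustration, a BigQuery-backed ``profile_config`` might look like the sketch below; the connection ID, project, and dataset are placeholders, and ``GoogleCloudServiceAccountFileProfileMapping`` is just one of the BigQuery profile mappings Cosmos provides:

.. code-block:: python

    from cosmos import ProfileConfig
    from cosmos.profiles import GoogleCloudServiceAccountFileProfileMapping

    # Placeholder connection and BigQuery coordinates.
    profile_config = ProfileConfig(
        profile_name="default",
        target_name="dev",
        profile_mapping=GoogleCloudServiceAccountFileProfileMapping(
            conn_id="gcp_conn",
            profile_args={"project": "my-gcp-project", "dataset": "my_dataset"},
        ),
    )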

Example DAG:

.. literalinclude:: ../../dev/dags/simple_dag_async.py
:language: python
:start-after: [START airflow_async_execution_mode_example]
:end-before: [END airflow_async_execution_mode_example]

.. _invocation_modes:

Invocation Modes
================
