diff --git a/docs/configuration/index.rst b/docs/configuration/index.rst
index 919ed9b1e..ec69c1f52 100644
--- a/docs/configuration/index.rst
+++ b/docs/configuration/index.rst
@@ -20,6 +20,7 @@ Cosmos offers a number of configuration options to customize its behavior. For m
     Scheduling
     Testing Behavior
     Selecting & Excluding
+    Partial Parsing
     Operator Args
     Compiled SQL
     Logging
diff --git a/docs/configuration/partial-parsing.rst b/docs/configuration/partial-parsing.rst
new file mode 100644
index 000000000..3b59149c0
--- /dev/null
+++ b/docs/configuration/partial-parsing.rst
@@ -0,0 +1,55 @@
+.. _partial-parsing:
+
+Partial parsing
+===============
+
+Starting in version 1.4, Cosmos tries to leverage dbt's partial parsing (``partial_parse.msgpack``) to speed up both task execution and DAG parsing (if using ``LoadMode.DBT_LS``).
+
+This feature is bound to `dbt partial parsing limitations `_.
+For example, ``dbt`` requires the same ``--vars``, ``--target``, ``--profile``, and ``profiles.yml`` environment variables (as read by the ``env_var()`` macro) across dbt commands; otherwise it reparses the project from scratch.
+
+Profile configuration
+---------------------
+
+To meet the dbt requirement of using the same profile across commands, Cosmos users should do one of the following:
+
+* If using a Cosmos profile mapping (``ProfileConfig(profile_mapping=...)``), disable mocked profile mappings by setting ``render_config=RenderConfig(enable_mock_profile=False)``
+* Declare their own ``profiles.yml`` file via ``ProfileConfig(profiles_yml_filepath=...)``
+
+If users don't follow these guidelines, Cosmos will use different profiles to parse the dbt project and to run tasks, and they will not benefit from dbt partial parsing.
+Their logs will contain multiple ``INFO`` messages similar to the following, meaning that Cosmos is not using partial parsing:
+
+.. code-block::
+
+    13:33:16 Unable to do partial parsing because profile has changed
+    13:33:16 Unable to do partial parsing because env vars used in profiles.yml have changed
+
+dbt vars
+--------
+
+If the Airflow scheduler and worker processes run on the same node, users must ensure the dbt ``--vars`` flag is the same in the ``RenderConfig`` and ``ExecutionConfig``.
+
+Otherwise, users may see messages similar to the following in their logs:
+
+.. code-block::
+
+    [2024-03-14, 17:04:57 GMT] {{subprocess.py:94}} INFO - Unable to do partial parsing because config vars, config profile, or config target have changed
+
+Caching
+-------
+
+If the dbt project ``target`` directory has a ``partial_parse.msgpack``, Cosmos will attempt to use it.
+
+There is a chance, however, that the file is stale or was generated in a way that differs from how Cosmos runs the dbt commands.
+
+Therefore, after running a dbt command, Cosmos also caches the most up-to-date ``partial_parse.msgpack`` file in the `system temporary directory `_.
+With this, unless there are code changes, each Airflow node should run the dbt command with a full dbt project parse only once, and benefit from partial parsing from then onwards.
+
+It is possible to override the directory that Cosmos uses for caching with the Airflow configuration ``[cosmos][cache_dir]`` or the environment variable ``AIRFLOW__COSMOS__CACHE_DIR``.
+
+To turn off caching, set the Airflow configuration ``[cosmos][enable_cache]`` to ``False`` or the environment variable ``AIRFLOW__COSMOS__ENABLE_CACHE`` to ``0``.
+
+Disabling
+---------
+
+To switch off partial parsing in Cosmos, use the argument ``partial_parse=False`` in the ``ProjectConfig``.
diff --git a/docs/getting_started/execution-modes.rst b/docs/getting_started/execution-modes.rst
index 6b611b777..1765144d9 100644
--- a/docs/getting_started/execution-modes.rst
+++ b/docs/getting_started/execution-modes.rst
@@ -56,7 +56,10 @@ The ``local`` execution mode assumes a ``dbt`` binary is reachable within the Ai
 
 If ``dbt`` was not installed as part of the Cosmos packages, users can define a custom path to ``dbt`` by declaring the argument ``dbt_executable_path``.
 
-By default, if Cosmos sees a ``partial_parse.msgpack`` in the target directory of the dbt project directory when using ``local`` execution, it will use this for partial parsing to speed up task execution. Due to the way that dbt `partial parsing works `_, it does not work with Cosmos profile mapping classes. To benefit from this feature, users have to set the ``profiles_yml_filepath`` argument in ``ProfileConfig``. It is possible to turned off partial parsing in Cosmos by setting ``partial_parse=False`` in the ``ProjectConfig``.
+.. note::
+    Starting in version 1.4, Cosmos tries to leverage dbt partial parsing (``partial_parse.msgpack``) to speed up task execution.
+    This feature is bound to `dbt partial parsing limitations `_.
+    Learn more: :ref:`partial-parsing`.
 
 When using the ``local`` execution mode, Cosmos converts Airflow Connections into a native ``dbt`` profiles file (``profiles.yml``).
 
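
To illustrate the cache settings documented in the Caching section of ``partial-parsing.rst``, here is a minimal Python sketch of how a cache directory and an on/off switch could be resolved from the two environment variables. It is not Cosmos's implementation: the helper names ``cache_enabled`` and ``resolve_cache_dir``, the ``cosmos`` subdirectory, the assumption that caching is on when the variable is unset, and the set of strings treated as false are all hypothetical choices made for this example.

```python
import os
import tempfile
from pathlib import Path

# Assumption: these strings mean "disabled". Airflow's own boolean parsing
# may accept a different set of values.
_FALSE_VALUES = {"0", "false", "f", "no", "n", "off"}


def cache_enabled(env=None):
    """Return False only when AIRFLOW__COSMOS__ENABLE_CACHE holds a false-like value.

    Hypothetical helper: caching is assumed to be enabled when the variable is unset.
    """
    env = os.environ if env is None else env
    raw = env.get("AIRFLOW__COSMOS__ENABLE_CACHE", "1")
    return raw.strip().lower() not in _FALSE_VALUES


def resolve_cache_dir(env=None):
    """Return the override from AIRFLOW__COSMOS__CACHE_DIR, else a temp-dir default.

    Hypothetical helper: the "cosmos" subdirectory name is an assumption.
    """
    env = os.environ if env is None else env
    override = env.get("AIRFLOW__COSMOS__CACHE_DIR")
    if override:
        return Path(override)
    # Fall back to the system temporary directory, as the docs describe.
    return Path(tempfile.gettempdir()) / "cosmos"


if __name__ == "__main__":
    print("cache enabled:", cache_enabled())
    print("cache dir:", resolve_cache_dir())
```

Passing a plain dictionary instead of ``os.environ`` makes the lookup easy to exercise without touching the real environment, for example ``cache_enabled({"AIRFLOW__COSMOS__ENABLE_CACHE": "0"})``.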