Update docs structure and add getting started guides (#374)
## Description

<!-- Add a brief but complete description of the change. -->

This PR updates the docs site with a better structure. It also adds
getting started guides for each of the popular ways of running Airflow.

## Related Issue(s)

Closes: #219
Closes: #316 
Closes: #307 

## Breaking Change?

<!-- If this introduces a breaking change, specify that here. -->

## Checklist

- [ ] I have made corresponding changes to the documentation (if
required)
- [ ] I have added tests that prove my fix is effective or that my
feature works

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Harel Shein <[email protected]>
3 people authored and tatiana committed Aug 9, 2023
1 parent e9fdd6a commit 68e1474
Showing 42 changed files with 1,094 additions and 902 deletions.
4 changes: 4 additions & 0 deletions .gitignore
@@ -1,3 +1,7 @@
# cosmos-specific ignores
# these files get autogenerated
docs/profiles/*

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
18 changes: 12 additions & 6 deletions README.rst
@@ -39,14 +39,23 @@ ___________________

You can render an Airflow Task Group using the ``DbtTaskGroup`` class. Here's an example with the jaffle_shop project:


.. code-block:: python
from pendulum import datetime
from airflow import DAG
from airflow.operators.empty import EmptyOperator
from cosmos.providers.dbt.task_group import DbtTaskGroup
from cosmos.task_group import DbtTaskGroup
profile_config = ProfileConfig(
profile_name="default",
target_name="dev",
profile_mapping=PostgresUserPasswordProfileMapping(
conn_id="airflow_db",
profile_args={"schema": "public"},
),
)
with DAG(
dag_id="extract_dag",
@@ -56,11 +65,8 @@ You can render an Airflow Task Group using the ``DbtTaskGroup`` class. Here's an
e1 = EmptyOperator(task_id="pre_dbt")
dbt_tg = DbtTaskGroup(
dbt_project_name="jaffle_shop",
conn_id="airflow_db",
profile_args={
"schema": "public",
},
project_config=ProjectConfig("jaffle_shop"),
profile_config=profile_config,
)
e2 = EmptyOperator(task_id="post_dbt")
3 changes: 1 addition & 2 deletions cosmos/hooks/subprocess.py
@@ -44,8 +44,7 @@ def run_command(
:param env: Optional dict containing environment variables to be made available to the shell
environment in which ``command`` will be executed. If omitted, ``os.environ`` will be used.
Note, that in case you have Sentry configured, original variables from the environment
will also be passed to the subprocess with ``SUBPROCESS_`` prefix. See
:doc:`/administration-and-deployment/logging-monitoring/errors` for details.
will also be passed to the subprocess with ``SUBPROCESS_`` prefix.
:param output_encoding: encoding to use for decoding stdout
:param cwd: Working directory to run the command in.
If None (default), the command is run in a temporary directory.
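For illustration, a minimal usage sketch of this hook is below. The ``FullOutputSubprocessHook`` class name and the ``exit_code``/``output`` result fields are assumptions based on this module's naming, not confirmed by the excerpt above.

.. code-block:: python

    # Hedged sketch: hook and result-field names are assumed, not taken from the diff above.
    from cosmos.hooks.subprocess import FullOutputSubprocessHook

    hook = FullOutputSubprocessHook()
    result = hook.run_command(
        command=["dbt", "run"],
        env={"DBT_PROFILES_DIR": "/path/to/profiles"},  # omit to inherit os.environ
        cwd="/path/to/dbt/project",  # None runs the command in a temporary directory
    )
    print(result.exit_code, result.output)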
Empty file added docs/__init__.py
Empty file.
19 changes: 5 additions & 14 deletions docs/conf.py
@@ -4,6 +4,8 @@
# Add the project root to the path so we can import the package
sys.path.insert(0, os.path.abspath("../"))

from docs.generate_mappings import generate_mapping_docs # noqa: E402

# Configuration file for the Sphinx documentation builder.
#
# For the full list of built-in configuration values, see the documentation:
@@ -20,11 +22,9 @@
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration

extensions = [
"autoapi.extension",
# "autoapi.extension",
"sphinx.ext.autodoc",
"sphinx.ext.autosummary",
"sphinx.ext.autosectionlabel",
"sphinx_tabs.tabs",
]

add_module_names = False
@@ -48,16 +48,7 @@
"image_light": "cosmos-icon.svg",
"image_dark": "cosmos-icon.svg",
},
"footer_items": ["copyright"],
"footer_start": ["copyright"],
}


def skip_logger_objects(app, what, name, obj, skip, options):
if "logger" in name:
skip = True

return skip


def setup(sphinx):
sphinx.connect("autoapi-skip-member", skip_logger_objects)
generate_mapping_docs()
17 changes: 17 additions & 0 deletions docs/configuration/compiled-sql.rst
@@ -0,0 +1,17 @@
.. _compiled-sql:

Compiled SQL
====================

When using the local execution mode, Cosmos will store the compiled SQL for each model in the ``compiled_sql`` field of the task's ``template_fields``. This allows you to view the compiled SQL in the Airflow UI.

If you'd like to disable this feature, you can set ``should_store_compiled_sql=False`` on the local operator (or via the ``operator_args`` parameter on the DAG/Task Group). For example:

.. code-block:: python

    from cosmos import DbtDag

    DbtDag(
        operator_args={"should_store_compiled_sql": False},
        # ...,
    )
5 changes: 5 additions & 0 deletions docs/configuration/execution-config.rst
@@ -0,0 +1,5 @@
Execution Config
==================

Cosmos supports multiple ways of executing your dbt models.
For more information, see the `execution modes <../getting_started/execution-modes.html>`_ page.
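As a minimal sketch, the execution mode is selected through the ``execution_config`` argument; the ``ExecutionMode`` import path below is an assumption:

.. code-block:: python

    from cosmos import DbtDag, ExecutionConfig
    from cosmos.constants import ExecutionMode  # assumed import path

    DbtDag(
        execution_config=ExecutionConfig(
            execution_mode=ExecutionMode.DOCKER,
        ),
        # ...,
    )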
22 changes: 12 additions & 10 deletions docs/dbt/docs.rst → docs/configuration/generating-docs.rst
@@ -1,7 +1,9 @@
.. _generating-docs:

Generating Docs
================
===============

dbt allows you to generate static documentation on your models, tables, and more. You can read more about it in the `official documentation <https://docs.getdbt.com/docs/building-a-dbt-project/documentation>`_. For an example of what the docs look like with the ``jaffle_shop`` project, check out `this site <http://cosmos-docs.s3-website-us-east-1.amazonaws.com/>`_.
dbt allows you to generate static documentation on your models, tables, and more. You can read more about it in the `official dbt documentation <https://docs.getdbt.com/docs/building-a-dbt-project/documentation>`_. For an example of what the docs look like with the ``jaffle_shop`` project, check out `this site <http://cosmos-docs.s3-website-us-east-1.amazonaws.com/>`_.

Many users choose to generate and serve these docs on a static website. This is a great way to share your data models with your team and other stakeholders.

@@ -20,7 +22,7 @@ Examples
Upload to S3
~~~~~~~~~~~~~~~~~~~~~~~

S3 supports serving static files directly from a bucket. To learn more (and to set it up), check out the `official documentation <https://docs.aws.amazon.com/AmazonS3/latest/dev/WebsiteHosting.html>`_.
S3 supports serving static files directly from a bucket. To learn more (and to set it up), check out the `official S3 documentation <https://docs.aws.amazon.com/AmazonS3/latest/dev/WebsiteHosting.html>`_.

You can use the :class:`~cosmos.operators.DbtDocsS3Operator` to generate and upload docs to an S3 bucket. The following code snippet shows how to do this with the default jaffle_shop project:

@@ -32,14 +34,14 @@ You can use the :class:`~cosmos.operators.DbtDocsS3Operator` to generate and upl
generate_dbt_docs_aws = DbtDocsS3Operator(
task_id="generate_dbt_docs_aws",
project_dir="path/to/jaffle_shop",
conn_id="airflow_db",
schema="public",
profile_config=profile_config,
# docs-specific arguments
aws_conn_id="test_aws",
bucket_name="test_bucket",
)
Upload to Azure Blob Storage
~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Azure Blob Storage supports serving static files directly from a container. To learn more (and to set it up), check out the `official documentation <https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blob-static-website>`_.

@@ -53,8 +55,8 @@ You can use the :class:`~cosmos.operators.DbtDocsAzureStorageOperator` to genera
generate_dbt_docs_azure = DbtDocsAzureStorageOperator(
task_id="generate_dbt_docs_azure",
project_dir="path/to/jaffle_shop",
conn_id="airflow_db",
schema="public",
profile_config=profile_config,
# docs-specific arguments
azure_conn_id="test_azure",
container_name="$web",
)
@@ -99,7 +101,7 @@ If you want to run custom code after the docs are generated, you can use the :cl
generate_dbt_docs = DbtDocsOperator(
task_id="generate_dbt_docs",
project_dir="path/to/jaffle_shop",
conn_id="airflow_db",
schema="public",
profile_config=profile_config,
# docs-specific arguments
callback=upload_docs,
)
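For reference, a hypothetical ``upload_docs`` callback might look like the sketch below. It assumes the callback receives the path of the dbt project directory whose ``target/`` folder holds the generated files; the helper shown here is illustrative, not the definition used elsewhere in this file.

.. code-block:: python

    import os

    from airflow.providers.amazon.aws.hooks.s3 import S3Hook


    def upload_docs(project_dir: str) -> None:
        # hypothetical callback: push the generated docs files to S3
        hook = S3Hook(aws_conn_id="test_aws")
        for file_name in ["index.html", "manifest.json", "catalog.json"]:
            hook.load_file(
                filename=os.path.join(project_dir, "target", file_name),
                key=file_name,
                bucket_name="test_bucket",
                replace=True,
            )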
22 changes: 22 additions & 0 deletions docs/configuration/index.rst
@@ -0,0 +1,22 @@
.. _configuration:

Configuration
=============

Cosmos offers a number of configuration options to customize its behavior. For more info, check out the links on the left or the table of contents below.

.. toctree::
:caption: Contents:

Project Config <project-config>
Profile Config <profile-config>
Execution Config <execution-config>
Render Config <render-config>

Parsing Methods <parsing-methods>
Configuring Lineage <lineage>
Generating Docs <generating-docs>
Scheduling <scheduling>
Testing Behavior <testing-behavior>
Selecting & Excluding <selecting-excluding>
Compiled SQL <compiled-sql>
81 changes: 81 additions & 0 deletions docs/configuration/lineage.rst
@@ -0,0 +1,81 @@
.. _lineage:

Configuring Lineage
===================

Cosmos uses the `dbt-ol <https://openlineage.io/blog/dbt-with-marquez/>`_ wrapper to emit lineage events to OpenLineage. Follow the instructions below to ensure Cosmos is configured properly to do this.

With a Virtual Environment
--------------------------

1. Add steps to your ``Dockerfile`` to create the virtual environment and wrap the dbt executable

.. code-block:: Docker

    FROM quay.io/astronomer/astro-runtime:7.2.0

    # install python virtualenv to run dbt
    WORKDIR /usr/local/airflow
    COPY dbt-requirements.txt ./
    RUN python -m venv dbt_venv && source dbt_venv/bin/activate && \
        pip install --no-cache-dir -r dbt-requirements.txt && deactivate

    # wrap the executable from the venv so that dbt-ol can access it
    RUN echo -e '#!/bin/bash' > /usr/bin/dbt && \
        echo -e 'source /usr/local/airflow/dbt_venv/bin/activate && dbt "$@"' >> /usr/bin/dbt

    # ensure all users have access to the executable
    RUN chmod -R 777 /usr/bin/dbt
2. Create a ``dbt-requirements.txt`` file with the following contents. If you're using a data
   warehouse other than Redshift, replace ``dbt-redshift`` with the adapter you're using
   (e.g. ``dbt-bigquery``, ``dbt-snowflake``, etc.)

.. code-block:: text

    dbt-redshift
    openlineage-dbt
3. Add the following to your ``requirements.txt`` file

.. code-block:: text

    astronomer-cosmos
4. When instantiating a Cosmos object, be sure to point the ``dbt_executable_path`` parameter at
   the installed ``dbt-ol`` executable

.. code-block:: python

    jaffle_shop = DbtTaskGroup(
        # ...,
        execution_config=ExecutionConfig(
            dbt_executable_path="/usr/local/airflow/dbt_venv/bin/dbt-ol",
        ),
    )
With the Base Cosmos Python Package
-----------------------------------

If you're using the base Cosmos Python package, then you'll need to install the dbt-ol wrapper
using the ``[dbt-openlineage]`` extra.

1. Add the following to your ``requirements.txt`` file

.. code-block:: text

    astronomer-cosmos[dbt-openlineage]
2. When instantiating a Cosmos object, be sure to point the ``dbt_executable_path`` parameter at
   the installed ``dbt-ol`` executable

.. code-block:: python

    jaffle_shop = DbtTaskGroup(
        # ...,
        execution_config=ExecutionConfig(
            dbt_executable_path="/usr/local/airflow/dbt_venv/bin/dbt-ol",
        ),
    )
98 changes: 98 additions & 0 deletions docs/configuration/parsing-methods.rst
@@ -0,0 +1,98 @@
.. _parsing-methods:

Parsing Methods
===============

Cosmos offers several options to parse your dbt project:

- ``automatic``. Tries to find a user-supplied ``manifest.json`` file. If it can't find one, it will run ``dbt ls`` to generate one. If that fails, it will use Cosmos' dbt parser.
- ``dbt_manifest``. Parses a user-supplied ``manifest.json`` file. This can be generated manually with dbt commands or via a CI/CD process.
- ``dbt_ls``. Parses a dbt project directory using the ``dbt ls`` command.
- ``custom``. Uses Cosmos' custom dbt parser, which extracts dependencies from your dbt's model code.

There are benefits and drawbacks to each method:

- ``dbt_manifest``: You have to generate the manifest file on your own. When using the manifest, Cosmos gets a complete set of metadata about your models. However, Cosmos uses its own selecting & excluding logic to determine which models to run, which may not be as robust as dbt's.
- ``dbt_ls``: Cosmos will generate the manifest file for you. This method uses dbt's metadata AND dbt's selecting/excluding logic. This is the most robust method. However, this requires the dbt executable to be installed on your machine (either on the host directly or in a virtual environment).
- ``custom``: Cosmos will parse your project and model files for you. This means that Cosmos will not have access to dbt's metadata. However, this method does not require the dbt executable to be installed on your machine.

If you're using the ``local`` execution mode, you should use the ``dbt_ls`` method.

If you're using the ``docker`` or ``kubernetes`` execution modes, you should use either the ``dbt_manifest`` or ``custom`` method.
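For example, a Kubernetes deployment would typically pair a pre-built manifest with the ``kubernetes`` execution mode. A rough sketch (the ``ExecutionMode``/``LoadMode`` import path is an assumption):

.. code-block:: python

    from cosmos import DbtDag, ExecutionConfig, ProjectConfig, RenderConfig
    from cosmos.constants import ExecutionMode, LoadMode  # assumed import path

    DbtDag(
        project_config=ProjectConfig(manifest_path="/path/to/manifest.json"),
        render_config=RenderConfig(load_mode=LoadMode.DBT_MANIFEST),
        execution_config=ExecutionConfig(execution_mode=ExecutionMode.KUBERNETES),
        # ...,
    )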


``automatic``
-------------

When you don't supply an argument to the ``load_mode`` parameter (or you supply the value ``"automatic"``), Cosmos will attempt the other methods in order:

1. Use a pre-existing ``manifest.json`` file (``dbt_manifest``)
2. Try to generate a ``manifest.json`` file from your dbt project (``dbt_ls``)
3. Use Cosmos' dbt parser (``custom``)

To use this method, you don't need to supply any additional config. This is the default.
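If you'd rather be explicit, the equivalent setting can presumably be passed directly (``LoadMode.AUTOMATIC`` is assumed to be the corresponding enum value):

.. code-block:: python

    DbtDag(
        render_config=RenderConfig(
            load_mode=LoadMode.AUTOMATIC,
        ),
        # ...,
    )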

``dbt_manifest``
----------------

If you already have a ``manifest.json`` file created by dbt, Cosmos will parse the manifest to generate your DAG.

You can supply a ``manifest_path`` parameter on the DbtDag / DbtTaskGroup with a path to a ``manifest.json`` file.

To use this:

.. code-block:: python

    DbtDag(
        project_config=ProjectConfig(
            manifest_path="/path/to/manifest.json",
        ),
        render_config=RenderConfig(
            load_mode=LoadMode.DBT_MANIFEST,
        ),
        # ...,
    )
``dbt_ls``
----------

.. note::

This only works for the ``local`` execution mode.

If you don't have a ``manifest.json`` file, Cosmos will attempt to generate one from your dbt project. It does this by running ``dbt ls`` and parsing the output.

When Cosmos runs ``dbt ls``, it also passes your ``select`` and ``exclude`` arguments to the command. This means that Cosmos will only generate a manifest for the models you want to run.

To use this:

.. code-block:: python

    DbtDag(
        render_config=RenderConfig(
            load_mode=LoadMode.DBT_LS,
        ),
        # ...,
    )
``custom``
----------

If the above methods fail, Cosmos will default to using its own dbt parser. This parser is not as robust as dbt's, so it's recommended that you use one of the above methods if possible.

The following are known limitations of the custom parser:

- it does not read from the ``dbt_project.yml`` file
- it does not parse Python files or models

To use this:

.. code-block:: python

    DbtDag(
        render_config=RenderConfig(
            load_mode=LoadMode.CUSTOM,
        ),
        # ...,
    )
4 changes: 4 additions & 0 deletions docs/configuration/profile-config.rst
@@ -0,0 +1,4 @@
Profile Config
================

Cosmos has multiple methods for supplying profiles. For more information, click on the Profiles tab on the top navbar.
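As a quick sketch based on the README example in this PR, a ``ProfileConfig`` built from an Airflow connection looks roughly like this (the import paths are assumptions):

.. code-block:: python

    from cosmos import ProfileConfig
    from cosmos.profiles import PostgresUserPasswordProfileMapping  # assumed import path

    profile_config = ProfileConfig(
        profile_name="default",
        target_name="dev",
        profile_mapping=PostgresUserPasswordProfileMapping(
            conn_id="airflow_db",
            profile_args={"schema": "public"},
        ),
    )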