Update docs structure and add getting started guides #374

Merged 25 commits on Jul 26, 2023
Changes from 8 commits
Commits
25 commits
cc99b1e
Update docs structure and add getting started guides
jlaneve Jul 18, 2023
44e1fb5
🎨 [pre-commit.ci] Auto format from pre-commit.com hooks
pre-commit-ci[bot] Jul 18, 2023
e132f97
get build working
jlaneve Jul 18, 2023
bc0a4a2
add requirements.txt file
jlaneve Jul 18, 2023
810336b
🎨 [pre-commit.ci] Auto format from pre-commit.com hooks
pre-commit-ci[bot] Jul 18, 2023
02428b8
revert python version
jlaneve Jul 18, 2023
826f404
fix broken links
jlaneve Jul 18, 2023
3850088
more docs updates
jlaneve Jul 19, 2023
b2a6da9
address some PR feedback
jlaneve Jul 22, 2023
1efb38a
Merge branch 'main' into docs-updated
harels Jul 26, 2023
7b6e05a
update astro docs
harels Jul 26, 2023
ba28a83
add dynamically generated profile pages
jlaneve Jul 26, 2023
30a2d4f
🎨 [pre-commit.ci] Auto format from pre-commit.com hooks
pre-commit-ci[bot] Jul 26, 2023
6b89b88
add __init__
jlaneve Jul 26, 2023
1bbb61c
ignore docs/profiles
jlaneve Jul 26, 2023
920e078
try to fix build issues
jlaneve Jul 26, 2023
dd98386
add airflow to docs requirements
jlaneve Jul 26, 2023
2489f5e
use relative paths
jlaneve Jul 26, 2023
f4cee7f
make dir if it doesnt exist
jlaneve Jul 26, 2023
c207940
fix ruff error
jlaneve Jul 26, 2023
1900cfe
add noqa
jlaneve Jul 26, 2023
8ce8cd3
update docs after config changes
jlaneve Jul 26, 2023
8d9fd30
🎨 [pre-commit.ci] Auto format from pre-commit.com hooks
pre-commit-ci[bot] Jul 26, 2023
8f2ee87
update readmes
jlaneve Jul 26, 2023
ef59a61
make pre commit happy
jlaneve Jul 26, 2023
2 changes: 1 addition & 1 deletion README.rst
@@ -45,7 +45,7 @@ You can render an Airflow Task Group using the ``DbtTaskGroup`` class. Here's an

from airflow import DAG
from airflow.operators.empty import EmptyOperator
from cosmos.providers.dbt.task_group import DbtTaskGroup
from cosmos import DbtTaskGroup


with DAG(
3 changes: 1 addition & 2 deletions cosmos/hooks/subprocess.py
@@ -40,8 +40,7 @@ def run_command(
:param env: Optional dict containing environment variables to be made available to the shell
environment in which ``command`` will be executed. If omitted, ``os.environ`` will be used.
Note, that in case you have Sentry configured, original variables from the environment
will also be passed to the subprocess with ``SUBPROCESS_`` prefix. See
:doc:`/administration-and-deployment/logging-monitoring/errors` for details.
will also be passed to the subprocess with ``SUBPROCESS_`` prefix.
:param output_encoding: encoding to use for decoding stdout
:param cwd: Working directory to run the command in.
If None (default), the command is run in a temporary directory.
17 changes: 2 additions & 15 deletions docs/conf.py
@@ -20,11 +20,9 @@
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration

extensions = [
"autoapi.extension",
# "autoapi.extension",
"sphinx.ext.autodoc",
"sphinx.ext.autosummary",
"sphinx.ext.autosectionlabel",
"sphinx_tabs.tabs",
]

add_module_names = False
@@ -48,16 +46,5 @@
"image_light": "cosmos-icon.svg",
"image_dark": "cosmos-icon.svg",
},
"footer_items": ["copyright"],
"footer_start": ["copyright"],
}


def skip_logger_objects(app, what, name, obj, skip, options):
if "logger" in name:
skip = True

return skip


def setup(sphinx):
sphinx.connect("autoapi-skip-member", skip_logger_objects)
8 changes: 8 additions & 0 deletions docs/configuration/compiled-sql.rst
@@ -0,0 +1,8 @@
.. _compiled-sql:

Compiled SQL
====================

When using the local execution mode, Cosmos will store the compiled SQL for each model in the ``compiled_sql`` field of the task's ``template_fields``. This allows you to view the compiled SQL in the Airflow UI.

If you'd like to disable this feature, you can set ``should_store_compiled_sql=False`` on the local operator (or via the ``operator_args`` parameter on the DAG/Task Group).
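
For example, a minimal sketch of turning this off through ``operator_args`` (the remaining ``DbtDag`` arguments are elided here):

.. code-block:: python

    from cosmos import DbtDag

    jaffle_shop = DbtDag(
        # ...
        operator_args={
            # skip storing compiled SQL on the task's template_fields
            "should_store_compiled_sql": False,
        },
    )
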
10 changes: 6 additions & 4 deletions docs/dbt/docs.rst → docs/configuration/generating-docs.rst
@@ -1,7 +1,9 @@
.. _generating-docs:

Generating Docs
================
===============

dbt allows you to generate static documentation on your models, tables, and more. You can read more about it in the `official documentation <https://docs.getdbt.com/docs/building-a-dbt-project/documentation>`_. For an example of what the docs look like with the ``jaffle_shop`` project, check out `this site <http://cosmos-docs.s3-website-us-east-1.amazonaws.com/>`_.
dbt allows you to generate static documentation on your models, tables, and more. You can read more about it in the `official dbt documentation <https://docs.getdbt.com/docs/building-a-dbt-project/documentation>`_. For an example of what the docs look like with the ``jaffle_shop`` project, check out `this site <http://cosmos-docs.s3-website-us-east-1.amazonaws.com/>`_.

Many users choose to generate and serve these docs on a static website. This is a great way to share your data models with your team and other stakeholders.

@@ -20,7 +22,7 @@ Examples
Upload to S3
~~~~~~~~~~~~~~~~~~~~~~~

S3 supports serving static files directly from a bucket. To learn more (and to set it up), check out the `official documentation <https://docs.aws.amazon.com/AmazonS3/latest/dev/WebsiteHosting.html>`_.
S3 supports serving static files directly from a bucket. To learn more (and to set it up), check out the `official S3 documentation <https://docs.aws.amazon.com/AmazonS3/latest/dev/WebsiteHosting.html>`_.

You can use the :class:`~cosmos.operators.DbtDocsS3Operator` to generate and upload docs to a S3 bucket. The following code snippet shows how to do this with the default jaffle_shop project:

@@ -39,7 +41,7 @@ You can use the :class:`~cosmos.operators.DbtDocsS3Operator`
)

Upload to Azure Blob Storage
~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Azure Blob Storage supports serving static files directly from a container. To learn more (and to set it up), check out the `official documentation <https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blob-static-website>`_.

17 changes: 17 additions & 0 deletions docs/configuration/index.rst
@@ -0,0 +1,17 @@
.. _configuration:

Configuration
=============

Cosmos offers a number of configuration options to customize its behavior. For more info, check out the links on the left or the table of contents below.

.. toctree::
:caption: Contents:

Parsing Methods <parsing-methods>
Configuring Lineage <lineage>
Generating Docs <generating-docs>
Scheduling <scheduling>
Testing Behavior <testing-behavior>
Selecting & Excluding <selecting-excluding>
Compiled SQL <compiled-sql>
81 changes: 81 additions & 0 deletions docs/configuration/lineage.rst
@@ -0,0 +1,81 @@
.. _lineage:

Configuring Lineage
===================

Cosmos uses the `dbt-ol <https://openlineage.io/blog/dbt-with-marquez/>`_ wrapper to emit lineage events to OpenLineage. Follow the instructions below to ensure Cosmos is configured properly to do this.

With a Virtual Environment
--------------------------

1. Add steps to your ``Dockerfile`` that create the virtual environment and wrap the dbt executable

.. code-block:: Docker

FROM quay.io/astronomer/astro-runtime:7.2.0

# install python virtualenv to run dbt
WORKDIR /usr/local/airflow
COPY dbt-requirements.txt ./
RUN python -m venv dbt_venv && source dbt_venv/bin/activate && \
pip install --no-cache-dir -r dbt-requirements.txt && deactivate

# wrap the executable from the venv so that dbt-ol can access it
RUN echo -e '#!/bin/bash' > /usr/bin/dbt && \
echo -e 'source /usr/local/airflow/dbt_venv/bin/activate && dbt "$@"' >> /usr/bin/dbt

# ensure all users have access to the executable
RUN chmod -R 777 /usr/bin/dbt

2. Create a ``dbt-requirements.txt`` file with the following contents. If you're using a data warehouse
other than Redshift, replace ``dbt-redshift`` with the adapter for your warehouse (e.g. ``dbt-bigquery``,
``dbt-snowflake``, etc.)

.. code-block:: text

dbt-redshift
openlineage-dbt

3. Add the following to your ``requirements.txt`` file

.. code-block:: text

astronomer-cosmos

4. When instantiating a Cosmos object, set the ``dbt_executable_path`` parameter to point at the installed
``dbt-ol`` executable

.. code-block:: python

jaffle_shop = DbtTaskGroup(
...,
dbt_args={
"dbt_executable_path": "/usr/local/airflow/dbt_venv/bin/dbt-ol",
},
)


With the Base Cosmos Python Package
-----------------------------------

If you're using the base Cosmos Python package, then you'll need to install the dbt-ol wrapper
using the ``[dbt-openlineage]`` extra.

1. Add the following to your ``requirements.txt`` file

.. code-block:: text

astronomer-cosmos[dbt-openlineage]


2. When instantiating a Cosmos object, set the ``dbt_executable_path`` parameter to point at the installed
``dbt-ol`` executable

.. code-block:: python

jaffle_shop = DbtTaskGroup(
...,
dbt_args={
"dbt_executable_path": "/usr/local/bin/dbt-ol",
},
)
57 changes: 57 additions & 0 deletions docs/configuration/parsing-methods.rst
@@ -0,0 +1,57 @@
.. _parsing-methods:

Parsing Methods
===============

Cosmos offers several options to parse your dbt project:

- ``automatic``. Tries to find a user-supplied ``manifest.json`` file. If it can't find one, it will run ``dbt ls`` to generate one. If that fails, it will use Cosmos' dbt parser.
- ``dbt_manifest``. Parses a user-supplied ``manifest.json`` file. This can be generated manually with dbt commands or via a CI/CD process.
- ``dbt_ls``. Parses a dbt project directory using the ``dbt ls`` command.
- ``custom``. Uses Cosmos' custom dbt parser, which extracts dependencies from your dbt's model code.


``automatic``
-------------

When you don't supply an argument to the ``load_mode`` parameter (or you supply the value ``"automatic"``), Cosmos will attempt the other methods in order:

1. Use a pre-existing ``manifest.json`` file (``dbt_manifest``)
2. Try to generate a ``manifest.json`` file from your dbt project (``dbt_ls``)
3. Use Cosmos' dbt parser (``custom``)

``dbt_manifest``
----------------

If you already have a ``manifest.json`` file created by dbt, Cosmos will parse the manifest to generate your DAG.

You can supply a ``manifest_path`` parameter on the DbtDag / DbtTaskGroup with a path to a ``manifest.json`` file. For example:

.. code-block:: python

DbtDag(
    manifest_path="/path/to/manifest.json",
    # ...
)

``dbt_ls``
----------

.. note::

This only works for the ``local`` and ``virtualenv`` execution modes.

If you don't have a ``manifest.json`` file, Cosmos will attempt to generate one from your dbt project. It does this by running ``dbt ls`` and parsing the output.

When Cosmos runs ``dbt ls``, it also passes your ``select`` and ``exclude`` arguments to the command. This means that Cosmos will only generate a manifest for the models you want to run.
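
A minimal sketch of forcing this mode, assuming ``load_mode`` is passed straight to the DAG as the wording above suggests (the ``select`` filter is optional and only included to illustrate the pass-through):

.. code-block:: python

    from cosmos import DbtDag

    jaffle_shop = DbtDag(
        # ...
        load_mode="dbt_ls",
        select={"configs": ["tags:daily"]},
    )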


``custom``
----------

If the above methods fail, Cosmos will default to using its own dbt parser. This parser is not as robust as dbt's, so it's recommended that you use one of the above methods if possible.

The following are known limitations of the custom parser:

- it does not read from the ``dbt_project.yml`` file
- it does not parse Python files or models
2 changes: 2 additions & 0 deletions docs/dbt/scheduling.rst → docs/configuration/scheduling.rst
@@ -1,3 +1,5 @@
.. _scheduling:

Scheduling
================

43 changes: 43 additions & 0 deletions docs/configuration/selecting-excluding.rst
@@ -0,0 +1,43 @@
.. _selecting-excluding:

Selecting & Excluding
=======================

Cosmos allows you to filter by configs (e.g. ``materialized``, ``tags``) using the ``select`` and ``exclude`` parameters. If a model contains any of the configs in the ``select``, it gets included as part of the DAG/Task Group. Similarly, if a model contains any of the configs in the ``exclude``, it gets excluded from the DAG/Task Group.

The ``select`` and ``exclude`` parameters are dictionaries with the following keys:

- ``configs``: a list of configs to filter by. The configs are in the format ``key:value``. For example, ``tags:daily`` or ``materialized:table``.
- ``paths``: a list of paths to filter by. The paths are in the format ``path/to/dir``. For example, ``analytics`` or ``analytics/tables``.

.. note::
Cosmos currently reads from (1) config calls in the model code and (2) .yml files in the models directory for tags. It does not read from the dbt_project.yml file.

Examples:

.. code-block:: python

from cosmos import DbtDag

jaffle_shop = DbtDag(
# ...
select={"configs": ["tags:daily"]},
)

.. code-block:: python

from cosmos import DbtDag

jaffle_shop = DbtDag(
# ...
select={"configs": ["schema:prod"]},
)

.. code-block:: python

from cosmos import DbtDag

jaffle_shop = DbtDag(
# ...
select={"paths": ["analytics/tables"]},
)
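
The ``exclude`` parameter takes the same shape as ``select``; a hedged sketch (``materialized:view`` is an illustrative value, not one prescribed by this page):

.. code-block:: python

    from cosmos import DbtDag

    jaffle_shop = DbtDag(
        # ...
        exclude={"configs": ["materialized:view"]},
    )
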
@@ -1,10 +1,10 @@
Configuration
================
.. _testing-behavior:

Cosmos offers a few different configuration options for how your dbt project is run and structured. This page describes the available options and how to configure them.
Testing Behavior
================

Testing
----------------------
Testing Configuration
---------------------

By default, Cosmos will add a test after each model. This can be overridden using the ``test_behavior`` field. The options are:

@@ -24,8 +24,9 @@ Example:
)
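
A hedged sketch of overriding the default (the ``after_all`` value is an assumption, not taken from the text above):

.. code-block:: python

    from cosmos import DbtDag

    jaffle_shop = DbtDag(
        # ...
        # run all tests once after every model has completed, instead of after each model
        test_behavior="after_all",
    )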


Warn Notification
----------------------
Warning Behavior
----------------

.. note::

As of now, this feature is only available for the default execution mode ``local``
@@ -85,53 +86,3 @@ When at least one WARN message is present, the function passed to ``on_warning_callback``
If warnings that are not associated with tests occur (e.g. freshness warnings), they will still trigger the
``on_warning_callback`` method above. However, these warnings will not be included in the ``test_names`` and
``test_results`` context variables, which are specific to test-related warnings.
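
A hedged sketch of such a callback, assuming (as the wording above implies) that it receives the Airflow context dict with ``test_names`` and ``test_results`` added to it:

.. code-block:: python

    def warn_on_test_warnings(context):
        # only test-related warnings populate these two keys
        test_names = context.get("test_names", [])
        test_results = context.get("test_results", [])
        for name, result in zip(test_names, test_results):
            print(f"dbt test {name} warned: {result}")

The function is then handed to Cosmos through the ``on_warning_callback`` parameter described above.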

Selecting and Excluding
----------------------

Cosmos allows you to filter by configs (e.g. ``materialized``, ``tags``) using the ``select`` and ``exclude`` parameters. If a model contains any of the configs in the ``select``, it gets included as part of the DAG/Task Group. Similarly, if a model contains any of the configs in the ``exclude``, it gets excluded from the DAG/Task Group.

The ``select`` and ``exclude`` parameters are dictionaries with the following keys:

- ``configs``: a list of configs to filter by. The configs are in the format ``key:value``. For example, ``tags:daily`` or ``materialized:table``.
- ``paths``: a list of paths to filter by. The paths are in the format ``path/to/dir``. For example, ``analytics`` or ``analytics/tables``.

.. note::
Cosmos currently reads from (1) config calls in the model code and (2) .yml files in the models directory for tags. It does not read from the dbt_project.yml file.

Examples:

.. code-block:: python

from cosmos import DbtDag

jaffle_shop = DbtDag(
# ...
select={"configs": ["tags:daily"]},
)

.. code-block:: python

from cosmos import DbtDag

jaffle_shop = DbtDag(
# ...
select={"configs": ["schema:prod"]},
)

.. code-block:: python

from cosmos import DbtDag

jaffle_shop = DbtDag(
# ...
select={"paths": ["analytics/tables"]},
)


Viewing Compiled SQL
----------------------

When using the local execution mode, Cosmos will store the compiled SQL for each model in the ``compiled_sql`` field of the task's ``template_fields``. This allows you to view the compiled SQL in the Airflow UI.

If you'd like to disable this feature, you can set ``should_store_compiled_sql=False`` on the local operator (or via the ``operator_args`` parameter on the DAG/Task Group).