[AIRFLOW-XXX] Add section on task lifecycle & correct casing in docs (#…
BasPH authored and ashb committed Feb 11, 2019
1 parent 3edc91c commit d8f460e
Showing 4 changed files with 40 additions and 16 deletions.
52 changes: 38 additions & 14 deletions docs/concepts.rst
@@ -18,7 +18,7 @@
Concepts
########

-The Airflow Platform is a tool for describing, executing, and monitoring
+The Airflow platform is a tool for describing, executing, and monitoring
workflows.

Core Ideas
@@ -251,10 +251,34 @@ Task Instances
==============

A task instance represents a specific run of a task and is characterized as the
-combination of a dag, a task, and a point in time. Task instances also have an
+combination of a DAG, a task, and a point in time. Task instances also have an
indicative state, which could be "running", "success", "failed", "skipped", "up
for retry", etc.

Task Lifecycle
==============

A task goes through several stages from start to completion. In the Airflow UI
(graph and tree views), each stage is indicated by a distinct color:

.. image:: img/task_lifecycle.png

The happy flow consists of the following stages (see the sketch after this list):

1. no status (scheduler created empty task instance)
2. queued (scheduler placed a task to run on the queue)
3. running (worker picked up a task and is now running it)
4. success (task completed)
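
These stages map to constants in ``airflow.utils.state`` (a minimal sketch,
assuming the 1.10-era module layout):

.. code:: python

    from airflow.utils.state import State

    # The happy-flow stages listed above, in order.
    happy_flow = [State.NONE, State.QUEUED, State.RUNNING, State.SUCCESS]
    print(happy_flow)  # [None, 'queued', 'running', 'success']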

There is also a visual difference between scheduled and manually triggered
DAGs/tasks:

.. image:: img/task_manual_vs_scheduled.png

The DAGs/tasks with a black border are scheduled runs, whereas the non-bordered
DAGs/tasks are manually triggered, e.g. by ``airflow trigger_dag``.

Workflows
=========

@@ -657,7 +681,7 @@
It is possible, through use of trigger rules, to mix tasks that should run
in the typical date/time dependent mode and those using the
``LatestOnlyOperator``.

-For example, consider the following dag:
+For example, consider the following DAG:

.. code:: python
@@ -690,7 +714,7 @@ For example, consider the following dag:
                          trigger_rule=TriggerRule.ALL_DONE)
    task4.set_upstream([task1, task2])
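
Most of the example is elided in this diff; a hedged reconstruction of what
such a DAG could look like (DAG id, schedule, and task names are illustrative,
not necessarily the verbatim source):

.. code:: python

    import datetime as dt

    from airflow.models import DAG
    from airflow.operators.dummy_operator import DummyOperator
    from airflow.operators.latest_only_operator import LatestOnlyOperator
    from airflow.utils.trigger_rule import TriggerRule

    dag = DAG(
        dag_id='latest_only_with_trigger',
        schedule_interval=dt.timedelta(hours=4),
        start_date=dt.datetime(2019, 1, 1),
    )

    latest_only = LatestOnlyOperator(task_id='latest_only', dag=dag)
    task1 = DummyOperator(task_id='task1', dag=dag)
    task1.set_upstream(latest_only)
    task2 = DummyOperator(task_id='task2', dag=dag)
    task3 = DummyOperator(task_id='task3', dag=dag)
    task3.set_upstream([task1, task2])
    task4 = DummyOperator(task_id='task4', dag=dag,
                          trigger_rule=TriggerRule.ALL_DONE)
    task4.set_upstream([task1, task2])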
-In the case of this dag, the ``latest_only`` task will show up as skipped
+In the case of this DAG, the ``latest_only`` task will show up as skipped
for all runs except the latest run. ``task1`` is directly downstream of
``latest_only`` and will also skip for all runs except the latest.
``task2`` is entirely independent of ``latest_only`` and will run in all
@@ -729,7 +753,7 @@
state.
Cluster Policy
==============

-Your local airflow settings file can define a ``policy`` function that
+Your local Airflow settings file can define a ``policy`` function that
has the ability to mutate task attributes based on other task or DAG
attributes. It receives a single argument as a reference to task objects,
and is expected to alter its attributes.
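
For instance (a minimal sketch; the commit's own example is elided below, and
the operator name and limits here are illustrative), a policy could route
sensor tasks to a dedicated queue and cap long timeouts:

.. code:: python

    from datetime import timedelta

    def policy(task):
        # Route all Hive partition sensors to a dedicated queue.
        if task.__class__.__name__ == 'HivePartitionSensor':
            task.queue = 'sensor_queue'
        # Cap any sensor timeout at 48 hours.
        if task.timeout > timedelta(hours=48):
            task.timeout = timedelta(hours=48)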
@@ -752,8 +776,8 @@
may look like inside your ``airflow_settings.py``:
Documentation & Notes
=====================

-It's possible to add documentation or notes to your dags & task objects that
-become visible in the web interface ("Graph View" for dags, "Task Details" for
+It's possible to add documentation or notes to your DAGs & task objects that
+become visible in the web interface ("Graph View" for DAGs, "Task Details" for
tasks). There are a set of special task attributes that get rendered as rich
content if defined:

@@ -767,7 +791,7 @@
doc_md     markdown
doc_rst    reStructuredText
========== ================

-Please note that for dags, doc_md is the only attribute interpreted.
+Please note that for DAGs, doc_md is the only attribute interpreted.
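
As an illustration (a minimal sketch, not part of this diff, assuming an
existing ``dag`` object):

.. code:: python

    from airflow.operators.bash_operator import BashOperator

    t = BashOperator(task_id='print_date', bash_command='date', dag=dag)

    # Rendered as rich Markdown under "Task Details" in the web UI.
    t.doc_md = """
    ### Purpose
    Prints the current date for the daily run.
    """

    # For DAGs, only doc_md is interpreted (shown in "Graph View"), e.g.:
    dag.doc_md = __doc__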

This is especially useful if your tasks are built dynamically from
configuration files, it allows you to expose the configuration that led
@@ -821,14 +845,14 @@
You can use Jinja templating with every parameter that is marked as "templated"
in the documentation. Template substitution occurs just before the pre_execute
function of your operator is called.
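
As an illustration (a minimal sketch, assuming an existing ``dag`` object),
``bash_command`` is a templated field, so Jinja expressions inside it are
rendered before execution:

.. code:: python

    from airflow.operators.bash_operator import BashOperator

    # {{ ds }} is substituted with the execution date just before the
    # operator's pre_execute hook runs.
    templated = BashOperator(
        task_id='print_exec_date',
        bash_command='echo {{ ds }}',
        dag=dag,
    )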

-Packaged dags
+Packaged DAGs
'''''''''''''
-While often you will specify dags in a single ``.py`` file it might sometimes
-be required to combine dag and its dependencies. For example, you might want
-to combine several dags together to version them together or you might want
+While often you will specify DAGs in a single ``.py`` file it might sometimes
+be required to combine a DAG and its dependencies. For example, you might want
+to combine several DAGs together to version them together or you might want
to manage them together or you might need an extra module that is not available
-by default on the system you are running airflow on. To allow this you can create
-a zip file that contains the dag(s) in the root of the zip file and have the extra
+by default on the system you are running Airflow on. To allow this you can create
+a zip file that contains the DAG(s) in the root of the zip file and have the extra
modules unpacked in directories.

For instance you can create a zip file that looks like this:
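
The listing itself is elided in this diff; as a hedged Python sketch (file
names are illustrative), such an archive, with DAG files at the zip root and
extra modules in package directories, could be built like this:

.. code:: python

    import zipfile

    # DAG files must sit at the root of the zip; supporting modules live in
    # unpacked package directories alongside them.
    with zipfile.ZipFile('my_dags.zip', 'w') as zf:
        zf.write('my_dag1.py')
        zf.write('my_dag2.py')
        zf.write('package1/__init__.py')
        zf.write('package1/functions.py')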
Binary file added docs/img/task_lifecycle.png
Binary file added docs/img/task_manual_vs_scheduled.png
4 changes: 2 additions & 2 deletions docs/index.rst
@@ -25,8 +25,8 @@ Apache Airflow Documentation
Airflow is a platform to programmatically author, schedule and monitor
workflows.

-Use airflow to author workflows as directed acyclic graphs (DAGs) of tasks.
-The airflow scheduler executes your tasks on an array of workers while
+Use Airflow to author workflows as Directed Acyclic Graphs (DAGs) of tasks.
+The Airflow scheduler executes your tasks on an array of workers while
following the specified dependencies. Rich command line utilities make
performing complex surgeries on DAGs a snap. The rich user interface
makes it easy to visualize pipelines running in production,