Add Missing Statsd Metrics in Docs #13708

Merged
merged 1 commit into from
Jan 16, 2021
67 changes: 41 additions & 26 deletions docs/apache-airflow/logging-monitoring/metrics.rst
@@ -77,32 +77,40 @@ See :doc:`../modules_management` for details on how Python and Airflow manage mo
Counters
--------

-======================================= ================================================================
-Name Description
-======================================= ================================================================
-``<job_name>_start`` Number of started ``<job_name>`` job, ex. ``SchedulerJob``, ``LocalTaskJob``
-``<job_name>_end`` Number of ended ``<job_name>`` job, ex. ``SchedulerJob``, ``LocalTaskJob``
-``operator_failures_<operator_name>`` Operator ``<operator_name>`` failures
-``operator_successes_<operator_name>`` Operator ``<operator_name>`` successes
-``ti_failures`` Overall task instances failures
-``ti_successes`` Overall task instances successes
-``zombies_killed`` Zombie tasks killed
-``scheduler_heartbeat`` Scheduler heartbeats
-``dag_processing.processes`` Number of currently running DAG parsing processes
-``scheduler.tasks.killed_externally`` Number of tasks killed externally
-``scheduler.tasks.running`` Number of tasks running in executor
-``scheduler.tasks.starving`` Number of tasks that cannot be scheduled because of no open slot in pool
-``scheduler.orphaned_tasks.cleared`` Number of Orphaned tasks cleared by the Scheduler
-``scheduler.orphaned_tasks.adopted`` Number of Orphaned tasks adopted by the Scheduler
-``scheduler.critical_section_busy`` Count of times a scheduler process tried to get a lock on the critical
-section (needed to send tasks to the executor) and found it locked by
-another process.
-``sla_email_notification_failure`` Number of failed SLA miss email notification attempts
-``ti.start.<dagid>.<taskid>`` Number of started task in a given dag. Similar to <job_name>_start but for task
-``ti.finish.<dagid>.<taskid>.<state>`` Number of completed task in a given dag. Similar to <job_name>_end but for task
-``dag.callback_exceptions`` Number of exceptions raised from DAG callbacks. When this happens, it means DAG callback is not working.
-``celery.task_timeout_error`` Number of ``AirflowTaskTimeout`` errors raised when publishing Task to Celery Broker.
-======================================= ================================================================
+=========================================== ================================================================
+Name Description
+=========================================== ================================================================
+``<job_name>_start`` Number of started ``<job_name>`` job, ex. ``SchedulerJob``, ``LocalTaskJob``
+``<job_name>_end`` Number of ended ``<job_name>`` job, ex. ``SchedulerJob``, ``LocalTaskJob``
+``<job_name>_heartbeat_failure`` Number of failed Heartbeats for a ``<job_name>`` job, ex. ``SchedulerJob``,
+``LocalTaskJob``
+``operator_failures_<operator_name>`` Operator ``<operator_name>`` failures
+``operator_successes_<operator_name>`` Operator ``<operator_name>`` successes
+``ti_failures`` Overall task instances failures
+``ti_successes`` Overall task instances successes
+``previously_succeeded`` Number of previously succeeded task instances
+``zombies_killed`` Zombie tasks killed
+``scheduler_heartbeat`` Scheduler heartbeats
+``dag_processing.processes`` Number of currently running DAG parsing processes
+``dag_processing.manager_stalls`` Number of stalled ``DagFileProcessorManager``
+``dag_file_refresh_error`` Number of failures loading any DAG files
+``scheduler.tasks.killed_externally`` Number of tasks killed externally
+``scheduler.orphaned_tasks.cleared`` Number of Orphaned tasks cleared by the Scheduler
+``scheduler.orphaned_tasks.adopted`` Number of Orphaned tasks adopted by the Scheduler
+``scheduler.critical_section_busy`` Count of times a scheduler process tried to get a lock on the critical
+section (needed to send tasks to the executor) and found it locked by
+another process.
+``sla_email_notification_failure`` Number of failed SLA miss email notification attempts
+``ti.start.<dagid>.<taskid>`` Number of started task in a given dag. Similar to <job_name>_start but for task
+``ti.finish.<dagid>.<taskid>.<state>`` Number of completed task in a given dag. Similar to <job_name>_end but for task
+``dag.callback_exceptions`` Number of exceptions raised from DAG callbacks. When this happens, it
+means DAG callback is not working.
+``celery.task_timeout_error`` Number of ``AirflowTaskTimeout`` errors raised when publishing Task to Celery Broker.
+``task_removed_from_dag.<dagid>`` Number of tasks removed for a given dag (i.e. task no longer exists in DAG)
+``task_restored_to_dag.<dagid>`` Number of tasks restored for a given dag (i.e. task instance which was
+previously in REMOVED state in the DB is added to DAG file)
+``task_instance_created-<operator_name>`` Number of tasks instances created for a given Operator
+=========================================== ================================================================

Gauges
------
@@ -116,6 +124,12 @@ Name Description
``dag_processing.last_runtime.<dag_file>`` Seconds spent processing ``<dag_file>`` (in most recent iteration)
``dag_processing.last_run.seconds_ago.<dag_file>`` Seconds since ``<dag_file>`` was last processed
``dag_processing.processor_timeouts`` Number of file processors that have been killed due to taking too long
+``scheduler.tasks.without_dagrun`` Number of tasks without DagRuns or with DagRuns not in Running state
+``scheduler.tasks.running`` Number of tasks running in executor
+``scheduler.tasks.starving`` Number of tasks that cannot be scheduled because of no open slot in pool
+``scheduler.tasks.executable`` Number of tasks that are ready for execution (set to queued)
+with respect to pool limits, dag concurrency, executor state,
+and priority.
``executor.open_slots`` Number of open slots on executor
``executor.queued_tasks`` Number of queued tasks on executor
``executor.running_tasks`` Number of running tasks on executor
@@ -146,4 +160,5 @@ Name Description
``scheduler.critical_section_duration`` Milliseconds spent in the critical section of scheduler loop --
only a single scheduler can enter this loop at a time
``dagrun.<dag_id>.first_task_scheduling_delay`` Milliseconds elapsed between first task start_date and dagrun expected start
+``collect_db_dags`` Milliseconds taken for fetching all Serialized Dags from DB
=================================================== ========================================================================
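
For readers wiring these names into a dashboard, the following is a minimal sketch of how a templated name from the tables expands and what the resulting StatsD datagram looks like for a counter, a gauge, and a timer. It is illustrative only: `expand` and `statsd_line` are hypothetical helpers and not part of Airflow, the `example_dag` and `extract` identifiers are made up, and the `airflow` prefix assumes the default `statsd_prefix` setting.

```python
# Hypothetical helpers for illustration; these are not Airflow APIs.

def expand(template: str, **fields: str) -> str:
    """Fill the ``<placeholder>`` slots of a metric-name template."""
    name = template
    for key, value in fields.items():
        name = name.replace(f"<{key}>", value)
    return name


def statsd_line(name: str, value: int, kind: str, prefix: str = "airflow") -> str:
    """Render one metric in the plain StatsD line format ``name:value|type``."""
    # StatsD type codes: "c" for counters, "g" for gauges, "ms" for timers.
    type_code = {"counter": "c", "gauge": "g", "timer": "ms"}[kind]
    return f"{prefix}.{name}:{value}|{type_code}"


# Counter with placeholders (made-up DAG and task ids).
finish_name = expand(
    "ti.finish.<dagid>.<taskid>.<state>",
    dagid="example_dag",
    taskid="extract",
    state="success",
)
counter = statsd_line(finish_name, 1, "counter")

# Gauge: reports a current value rather than an increment.
gauge = statsd_line("scheduler.tasks.starving", 3, "gauge")

# Timer: a duration in milliseconds.
timer = statsd_line("collect_db_dags", 125, "timer")

print(counter)
print(gauge)
print(timer)
```

The type matters when building alerts: a counter such as `ti.finish.<dagid>.<taskid>.<state>` is aggregated as a rate or sum, while `scheduler.tasks.running` and `scheduler.tasks.starving`, which this change lists under Gauges instead of Counters, should be read as point-in-time values.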