Unsafe code: label «run_id» of «airflow_dag_run_duration» has unlimited cardinality #59

Closed
victorkashirin opened this issue Aug 6, 2019 · 3 comments · Fixed by #60


@victorkashirin

The label and metric naming guide states the following about labels:

CAUTION: Remember that every unique combination of key-value label pairs represents a new time series, which can dramatically increase the amount of data stored. Do not use labels to store dimensions with high cardinality (many different label values), such as user IDs, email addresses, or other unbounded sets of values.

Since a typical run_id includes the date of the run, this label generates a new time series for every individual dag run, which might eventually blow up Prometheus.
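
To make the concern concrete, here is a minimal sketch in plain Python (no Prometheus client involved) that counts distinct label sets, since each distinct set is one time series. The DAG name, the daily schedule, and the `scheduled__<date>` run_id format are assumptions for illustration only.

```python
from datetime import date, timedelta


def series_count(samples, label_names):
    """Count distinct label-value combinations; each one is a separate time series."""
    return len({tuple(sample[name] for name in label_names) for sample in samples})


# Hypothetical data: a single DAG that runs once per day for a year.
start = date(2019, 1, 1)
samples = [
    {"dag_id": "example_dag", "run_id": f"scheduled__{start + timedelta(days=i)}"}
    for i in range(365)
]

# With run_id as a label, the series count grows with every new run.
with_run_id = series_count(samples, ["dag_id", "run_id"])
# Without it, the count is bounded by the number of DAGs.
without_run_id = series_count(samples, ["dag_id"])

print(with_run_id, without_run_id)
```

Under these assumptions the labelled variant adds one series per run indefinitely, while the dag_id-only variant stays at one series per DAG.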

@elephantum
Contributor

@victorkashirin This is a good observation. Do you have a solution in mind?

The use case for this metric was to monitor and alert on stuck dagruns.

Maybe we do not need run_id as a dimension, but then this metric becomes ambiguous if several dagruns are executing simultaneously.

Thoughts?

@victorkashirin
Author

A stuck dag run will require investigation anyway, and you can retrieve the currently running dagruns and their durations in seconds via Ad Hoc Query (/admin/queryview/) with something like this (for Postgres):

select 
  dr.run_id, 
  ROUND(EXTRACT(EPOCH FROM (now() at time zone 'utc' - dr.start_date::timestamp))) as duration
from dag_run dr
where dag_id = '<dag_id>' and state = 'running';

Here <dag_id> will be known from the alert, so it's just a matter of running this query to find which dagrun is hanging.
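
The same lookup can be sketched as a small Python helper. This is only an illustration: it uses an in-memory SQLite table as a stand-in for the Airflow metadata database, the table is reduced to four columns, and the rows, the `running_dagruns` function name, and the fixed `now` value are all hypothetical. The duration is computed in Python rather than with the Postgres-specific `EXTRACT(EPOCH ...)` expression.

```python
import sqlite3
from datetime import datetime, timezone


def running_dagruns(conn, dag_id, now):
    """Return (run_id, duration_seconds) for every running dagrun of dag_id."""
    rows = conn.execute(
        "SELECT run_id, start_date FROM dag_run "
        "WHERE dag_id = ? AND state = 'running'",
        (dag_id,),
    ).fetchall()
    return [
        (run_id, round((now - datetime.fromisoformat(start_date)).total_seconds()))
        for run_id, start_date in rows
    ]


# In-memory stand-in for the metadata database (simplified, hypothetical rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dag_run (dag_id TEXT, run_id TEXT, state TEXT, start_date TEXT)"
)
conn.executemany(
    "INSERT INTO dag_run VALUES (?, ?, ?, ?)",
    [
        ("etl", "scheduled__2019-08-05", "running", "2019-08-05T00:00:00+00:00"),
        ("etl", "scheduled__2019-08-04", "success", "2019-08-04T00:00:00+00:00"),
        ("report", "manual__2019-08-06", "running", "2019-08-06T00:00:00+00:00"),
    ],
)

# A fixed "now" keeps the example reproducible; real code would use the current UTC time.
now = datetime(2019, 8, 6, 12, 0, tzinfo=timezone.utc)
stuck = running_dagruns(conn, "etl", now)
print(stuck)
```

Only the dag_id from the alert is needed as input; the run_id comes back from the database instead of from a metric label.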

@elephantum
Contributor

I agree. We will change the exported metrics so that they no longer include run_id.
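
One possible shape for a metric without run_id, sketched in plain Python: collapse the running dagruns of each DAG into a single value, here the longest running duration, so an alert on a stuck run can still fire. Using the maximum as the aggregate, the `longest_running_duration` name, and the sample rows are assumptions; the change that closed this issue may have done something different.

```python
from collections import defaultdict


def longest_running_duration(running_runs):
    """Map dag_id -> longest duration (seconds) among its currently running dagruns."""
    longest = defaultdict(float)
    for dag_id, _run_id, duration in running_runs:
        longest[dag_id] = max(longest[dag_id], duration)
    return dict(longest)


# Hypothetical (dag_id, run_id, duration_seconds) rows for running dagruns.
running_runs = [
    ("etl", "scheduled__2019-08-05", 90000.0),
    ("etl", "scheduled__2019-08-06", 3600.0),
    ("report", "manual__2019-08-06", 120.0),
]

durations = longest_running_duration(running_runs)
print(durations)
```

This keeps one series per dag_id and also addresses the ambiguity raised earlier about several dagruns executing simultaneously, at the cost of hiding which run is the slow one.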
