Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AIRFLOW-5274] dag loading duration metric name too long #5890

Merged
merged 1 commit into from
Aug 26, 2019

Conversation

feng-tao
Copy link
Member

@feng-tao feng-tao commented Aug 23, 2019

Make sure you have checked all steps below.

Jira

  • My PR addresses the following Airflow Jira issues and references them in the PR title. For example, "[AIRFLOW-XXX] My Airflow PR"
    • https://issues.apache.org/jira/browse/AIRFLOW-XXX
    • In case you are fixing a typo in the documentation you can prepend your commit with [AIRFLOW-XXX], code changes always need a Jira issue.
    • In case you are proposing a fundamental code change, you need to create an Airflow Improvement Proposal (AIP).
    • In case you are adding a dependency, check if the license complies with the ASF 3rd Party License Policy.

Description

  • Here are some details about my PR, including screenshots of any UI changes:
    Make dag loading duration metric name not too long when the dag file has multiple dags.

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain docstrings that explain what it does
    • If you implement backwards incompatible changes, please leave a note in the Updating.md so we can assign it to a appropriate release

Code Quality

  • Passes flake8

@feng-tao
Copy link
Member Author

cc @milton0825

elif file_stat.dag_num > 1:
# if the file has multiple dags, use dag.loading-duration.file for metric
Stats.timing('dag.loading-duration.{}'.
format(file_stat.file),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to replace slashes, or some other special chars?

Copy link
Member

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add a note to UPDATING.md about the changed metric name

Stats.timing('dag.loading-duration.{}'.
format(dag_names),
format(dag_ids[0]),
file_stat.duration)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for consistency we should always use the filename - otherwise any monitoring dashboards of load time will be "mixed"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sense, will do.

@feng-tao
Copy link
Member Author

@kaxil @ashb @milton0825 PR updated.

format(dag_names),
file_stat.duration)
Stats.timing('dag.loading-duration.{}'.
format(file_stat.file),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Is file_stat.file just the file name or it is the file path that contains /?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I double check and the file name is /subdag/dagname.py something. Update the pr.

@feng-tao feng-tao merged commit 45176c8 into apache:master Aug 26, 2019
@feng-tao feng-tao deleted the airflow_5274 branch August 27, 2019 03:57
format(dag_names),
file_stat.duration)
# file_stat.file similar format: /subdir/dag_name.py
filename = file_stat.file.split('/')[-1].replace('.py', '')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's possible to have the same file name in different folders. please can we join the split, instead of just taking the filename? eg.

filename = '.'.join(file_stat.file.split('/')).replace('.py', '')

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be a bad practice to have the same file name for different DAGs. If you think this would be an issue on your side, feel free to raise a pr.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides if we join all the subdir, it will make those stats unreadable if the dir for the dag file is very deep. Personally I am not favor of this. Not sure how other committer thinks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i was imagining a file structure like the for example. it doesn't seem too unreasonable a structure?

dag_dir
   +->dag_module_1
   |       +->dag_definition.py
   |       |->callbacks.py
   |       |->functions.py
   |       |->config.json
   +->dag_module_2
           +->dag_definition.py
           |->callbacks.py
           |->functions.py
           |->config.json

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but yea, we dont use that dir structure, so im happy with the merge :)

Jerryguo pushed a commit to Jerryguo/airflow that referenced this pull request Sep 2, 2019
lcunibertti pushed a commit to winclap/airflow that referenced this pull request Apr 28, 2020
kaxil pushed a commit that referenced this pull request Oct 8, 2020
kaxil pushed a commit that referenced this pull request Oct 11, 2020
RaviTezu pushed a commit to RaviTezu/airflow that referenced this pull request Oct 25, 2020
kaxil pushed a commit that referenced this pull request Nov 12, 2020
@potiuk potiuk added the type:bug-fix Changelog: Bug Fixes label Nov 14, 2020
@potiuk potiuk added this to the Airflow 1.10.13 milestone Nov 14, 2020
potiuk pushed a commit that referenced this pull request Nov 16, 2020
kaxil pushed a commit that referenced this pull request Nov 18, 2020
cfei18 pushed a commit to cfei18/incubator-airflow that referenced this pull request Mar 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type:bug-fix Changelog: Bug Fixes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants