[AIRFLOW-6095] Filter dags returned by task_stats #6684

Merged
merged 1 commit into from
Dec 4, 2019

Conversation

robinedwards

@robinedwards robinedwards commented Nov 28, 2019

Jira

  • My PR addresses the following Airflow Jira issues and references them in the PR title. For example, "[AIRFLOW-XXX] My Airflow PR"
    • https://issues.apache.org/jira/browse/AIRFLOW-6095
    • In case you are fixing a typo in the documentation you can prepend your commit with [AIRFLOW-XXX], code changes always need a Jira issue.
    • In case you are proposing a fundamental code change, you need to create an Airflow Improvement Proposal (AIP).
    • In case you are adding a dependency, check if the license complies with the ASF 3rd Party License Policy.

Description

  • Here are some details about my PR, including screenshots of any UI changes:

Add a dag_ids parameter to the task_stats endpoint so that results can be filtered to the set of dag_ids
present on the page. This is intended to speed up the response time
and reduce the size of the payload when running a large number of DAGs. I've seen this endpoint take up to 10 seconds and return an 8 MB (uncompressed) payload for around 1500 DAGs.
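
The idea of restricting the response to the DAGs shown on the page can be sketched in a few lines. This is only an illustration, not the code merged in airflow/www/views.py: the function name `filter_task_stats` and the shape of the `stats` mapping are assumptions made for the example.

```python
from typing import Dict, Iterable, List, Optional


def filter_task_stats(
    stats: Dict[str, List[dict]],
    dag_ids: Optional[Iterable[str]] = None,
) -> Dict[str, List[dict]]:
    """Keep only the entries for the requested DAG ids.

    `stats` is a hypothetical payload keyed by dag_id.
    Passing dag_ids=None means "no filter" and returns everything.
    """
    if dag_ids is None:
        return dict(stats)
    wanted = set(dag_ids)  # set lookup keeps the filter cheap for long pages
    return {dag_id: rows for dag_id, rows in stats.items() if dag_id in wanted}
```

With around 1500 DAGs and a page that shows only a fraction of them, a filter of this kind shrinks the payload roughly in proportion to the page size.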

Note: I have another branch against 1-10-test, should this be approved.

Tests

  • My PR adds the following unit tests:

See test_views.py

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.

    • All the public functions and the classes in the PR contain docstrings that explain what they do
    • If you implement backwards incompatible changes, please leave a note in the Updating.md so we can assign it to an appropriate release
  • No documentation needed

@robinedwards robinedwards force-pushed the airflow-6095-master branch 2 times, most recently from d918bef to 60bd3aa Compare November 28, 2019 13:33

codecov-io commented Nov 28, 2019

Codecov Report

Merging #6684 into master will decrease coverage by <.01%.
The diff coverage is 95%.


@@            Coverage Diff             @@
##           master    #6684      +/-   ##
==========================================
- Coverage   83.88%   83.88%   -0.01%     
==========================================
  Files         668      668              
  Lines       37684    37694      +10     
==========================================
+ Hits        31612    31620       +8     
- Misses       6072     6074       +2

| Impacted Files | Coverage Δ |
|---|---|
| airflow/www/views.py | 76.77% <95%> (+0.16%) ⬆️ |
| airflow/utils/dag_processing.py | 52.13% <0%> (-0.39%) ⬇️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 0a0d097...cbb8982.

@@ -403,7 +405,7 @@ <h2>DAGs</h2>
 container: "body",
 });
 });
-d3.json("{{ url_for('Airflow.task_stats') }}", function(error, json) {
+d3.json("{{ url_for('Airflow.task_stats') }}?dag_ids=" + all_dags_ids.join(','), function(error, json) {
Member

Is `,` allowed in DAG ids? 😁

Also, this parameter needs to be URI-escaped or, better yet, if d3.json supports it: d3.json("{{ url_for('Airflow.task_stats') }}", {"dag_ids": all_dags_ids.join(',') }, function(error, json) {

Contributor Author

Passing GET parameters as you suggested above isn't supported by d3.json, so instead I've encoded each dag_id. I thought about passing the dag_id= parameter multiple times, but that results in a rather long GET header (which I believe is often capped at 8 KB?). This could potentially overflow for users with 100 DAGs per page.
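
A minimal sketch of the "encode each dag_id, then join with commas" approach, written in Python with the standard library only. The helper names `encode_dag_ids` and `decode_dag_ids` are hypothetical and are not taken from the PR; the point is only that a comma inside an id becomes `%2C` and so cannot be confused with a separator.

```python
from urllib.parse import quote, unquote


def encode_dag_ids(dag_ids):
    """Percent-encode every id, then join with literal commas.

    safe="" makes quote() encode commas and slashes inside an id too.
    """
    return ",".join(quote(dag_id, safe="") for dag_id in dag_ids)


def decode_dag_ids(raw_value):
    """Split the raw (still encoded) value on commas, then decode each part."""
    if not raw_value:
        return []
    return [unquote(part) for part in raw_value.split(",")]


ids = ["etl,daily", "plain_dag", "a b"]
encoded = encode_dag_ids(ids)
print(encoded)                  # etl%2Cdaily,plain_dag,a%20b
print(decode_dag_ids(encoded))  # ['etl,daily', 'plain_dag', 'a b']
```

One caveat: the decoder above works on the raw, still-encoded value. Most web frameworks percent-decode query values before handing them to the view, and at that point an encoded comma is no longer distinguishable from a separator.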

Member

I think we can use request.args.getlist("dag_ids") and send each value in a separate parameter, e.g. url-view?dag_ids=AAA&dag_ids=AAA. The parameters would each be encoded and therefore independent.
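
The repeated-parameter style can be sketched with the Python standard library. This is an illustration under assumptions, not the PR's code: `build_query` and `read_dag_ids` are hypothetical helpers, and `parse_qs` stands in for what a Flask view would obtain from `request.args.getlist("dag_ids")`.

```python
from urllib.parse import parse_qs, urlencode


def build_query(dag_ids):
    """Emit one dag_ids=<value> pair per id; doseq=True expands the list."""
    return urlencode({"dag_ids": list(dag_ids)}, doseq=True)


def read_dag_ids(query_string):
    """Collect every dag_ids value; a missing parameter yields an empty list."""
    return parse_qs(query_string).get("dag_ids", [])


query = build_query(["etl,daily", "a b"])
print(query)                # dag_ids=etl%2Cdaily&dag_ids=a+b
print(read_dag_ids(query))  # ['etl,daily', 'a b']
```

Because each value is encoded on its own, a comma inside an id needs no special handling. The cost is the longer URL mentioned earlier in the thread, since the parameter name is repeated once per DAG.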

Contributor Author

Hey, sorry, this got lost in my inbox. I still have to provide a PR for the /blocked endpoint. Happy to provide an additional PR switching the rest of the endpoints to use getlist() if you guys prefer it?

airflow/www/views.py (outdated, resolved)
airflow/www/views.py (outdated, resolved)
tests/www/test_views.py (outdated, resolved)
Member

@ashb ashb left a comment

See comments

@robinedwards robinedwards force-pushed the airflow-6095-master branch 3 times, most recently from cbb8982 to df0c127 Compare December 4, 2019 01:13
Add dag_ids parameter to task_stats so can filter by a set of dag_ids
present on the dags view. This is intended to speed up the response time
and reduce the size of the payload when running a large number of dags.
@ashb ashb merged commit 527c0dd into apache:master Dec 4, 2019
Member

ashb commented Dec 4, 2019

/cc @KevinYang21 You might find this useful.

ashb pushed a commit that referenced this pull request Dec 5, 2019
Add dag_ids parameter to task_stats so can filter by a set of dag_ids
present on the dags view. This is intended to speed up the response time
and reduce the size of the payload when running a large number of dags.

(Merged on to release branch by PR#6730)

(cherry picked from commit 2b11f55)
kaxil pushed a commit that referenced this pull request Dec 12, 2019
Add dag_ids parameter to task_stats so can filter by a set of dag_ids
present on the dags view. This is intended to speed up the response time
and reduce the size of the payload when running a large number of dags.

(Merged on to release branch by PR#6730)

(cherry picked from commit 2b11f55)
galuszkak pushed a commit to FlyrInc/apache-airflow that referenced this pull request Mar 5, 2020
Add dag_ids parameter to task_stats so can filter by a set of dag_ids
present on the dags view. This is intended to speed up the response time
and reduce the size of the payload when running a large number of dags.
4 participants