[AIRFLOW-6871] optimize tree view for large DAGs #7492

Merged
merged 1 commit into from
Feb 28, 2020

Conversation

houqp
Member

@houqp houqp commented Feb 21, 2020

This change reduces page size by more than 10X and reduces page load time by 3-5X. As a result, the tree view can now load large DAGs that were causing 5XX errors before.

Another example: one of our DAGs' tree view had a page size of 200MB and took 1 minute to load. With this patch, it now loads within 24s with a page size of 17MB.

List of optimizations applied to the view handler:

  • only serialize used task instance attributes to JSON instead of the
    whole ORM object
  • encode task instance attributes as array instead of dict
  • encode datetime as Unix timestamp instead of ISO format string
  • push task instance attribute construction into client side JS
  • remove redundant task instance attributes
  • simplify reduce_nodes() logic, remove unnecessary if statements
  • serialize JSON as string to be used with JSON.parse on the client side
    to speed up browser JS parse time
  • remove spaces in serialized JSON string to reduce payload size
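
To illustrate three of the encoding changes listed above (array instead of dict, Unix timestamp instead of ISO string, no spaces in the serialized JSON), here is a minimal sketch. It is not the code from this patch: the `task_instance` record, its field names, and the two helper functions are hypothetical and exist only to compare the two encodings.

```python
import json
from datetime import datetime, timezone

# Hypothetical task instance record; the field names are illustrative only
# and are not taken from the actual view handler.
task_instance = {
    "state": "success",
    "try_number": 1,
    "start_date": datetime(2020, 2, 21, 21, 54, 3, 123456, tzinfo=timezone.utc),
    "duration": 12.5,
}


def encode_as_dict(ti):
    """Verbose encoding: keyed dict with an ISO 8601 datetime string."""
    return {
        "state": ti["state"],
        "try_number": ti["try_number"],
        "start_date": ti["start_date"].isoformat(),
        "duration": ti["duration"],
    }


def encode_as_array(ti):
    """Compact encoding: positional array, Unix timestamp truncated to whole seconds."""
    return [
        ti["state"],
        ti["try_number"],
        int(ti["start_date"].timestamp()),
        ti["duration"],
    ]


# Default separators are (", ", ": "), which add a space after every comma and colon.
verbose = json.dumps(encode_as_dict(task_instance))
# Compact separators drop those spaces.
compact = json.dumps(encode_as_array(task_instance), separators=(",", ":"))

print(len(verbose), verbose)
print(len(compact), compact)
```

With the array form, the client has to know which position holds which attribute, which is why attribute construction moves into client side JS in this approach.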

Issue link: AIRFLOW-6871

Make sure to mark the boxes below before creating PR: [x]

  • Description above provides context of the change
  • Commit message/PR title starts with [AIRFLOW-NNNN]. AIRFLOW-NNNN = JIRA ID*
  • Unit tests coverage for changes (not needed for documentation changes)
  • Commits follow "How to write a good git commit message"
  • Relevant documentation is updated including usage instructions.
  • I will engage committers as explained in Contribution Workflow Example.

* For document-only changes commit message can start with [AIRFLOW-XXXX].


In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.

@boring-cyborg boring-cyborg bot added the area:webserver Webserver related Issues label Feb 21, 2020
@houqp houqp requested a review from ashb February 21, 2020 21:54
@houqp houqp force-pushed the tree_optimized branch 6 times, most recently from 83548cb to 41c65e1 Compare February 22, 2020 04:51
@codecov-io

codecov-io commented Feb 22, 2020

Codecov Report

Merging #7492 into master will increase coverage by 0.08%.
The diff coverage is 84.21%.

@@            Coverage Diff             @@
##           master    #7492      +/-   ##
==========================================
+ Coverage   86.76%   86.85%   +0.08%     
==========================================
  Files         896      896              
  Lines       42649    42663      +14     
==========================================
+ Hits        37005    37055      +50     
+ Misses       5644     5608      -36
Impacted Files                                          Coverage Δ
airflow/www/views.py                                    76.19% <84.21%> (-0.05%) ⬇️
airflow/jobs/scheduler_job.py                           90.07% <0%> (+0.43%) ⬆️
airflow/utils/sqlalchemy.py                             84.93% <0%> (+1.36%) ⬆️
airflow/hooks/dbapi_hook.py                             91.73% <0%> (+1.65%) ⬆️
airflow/providers/postgres/hooks/postgres.py            94.36% <0%> (+16.9%) ⬆️
...roviders/google/cloud/operators/postgres_to_gcs.py   85.29% <0%> (+32.35%) ⬆️
airflow/providers/postgres/operators/postgres.py        100% <0%> (+50%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 3111406...adf0c6d.

airflow/www/templates/airflow/tree.html (outdated, resolved)
airflow/www/templates/airflow/tree.html (outdated, resolved)
airflow/www/views.py (outdated, resolved)
@ashb
Member

ashb commented Feb 22, 2020

Can you expand on "serialize JSON as string to be used with JSON.parse on the client side
to speed up browser JS parse time"? I don't see how doing more can make it faster.

@houqp
Member Author

houqp commented Feb 23, 2020

@ashb a quick analysis can be found at: https://developpaper.com/json-parse-is-faster-than-object-literal-syntax/.

This optimization only makes sense for very large JSON payloads, which is the case for tree.html. For the large DAGs that we have, it reduces page load time by a couple hundred milliseconds.
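
To make the idea concrete, here is a minimal sketch of how a server could emit the payload as a string for `JSON.parse` rather than as a JavaScript object literal. It does not reproduce the template code from this patch: the `tree` payload and the generated `var data = ...` lines are hypothetical, and a real template would additionally need HTML-safe escaping before embedding the result in a `<script>` tag.

```python
import json

# Hypothetical tree payload; the real view builds a much larger structure.
tree = {
    "name": "[DAG]",
    "children": [
        {"name": "task_a", "instances": [["success", 1, 1582322043]]},
    ],
}

# First pass: compact JSON text for the payload.
payload = json.dumps(tree, separators=(",", ":"))

# Second pass: encode that text as a JSON string. With the default
# ensure_ascii=True, the result is also a valid JavaScript string literal.
js_literal = json.dumps(payload)

# Object literal form: the browser parses the payload with the full JS grammar.
as_object_literal = "var data = " + payload + ";"

# JSON.parse form: the browser scans one string literal, then runs the
# simpler JSON parser over its contents.
as_json_parse = "var data = JSON.parse(" + js_literal + ");"

print(as_object_literal)
print(as_json_parse)
```

The double encoding makes the page slightly larger because of the escaped quotes, so this trade only pays off when the payload is large enough for parse time to dominate.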

@houqp houqp requested a review from ashb February 23, 2020 23:15
@houqp houqp force-pushed the tree_optimized branch 3 times, most recently from ebcd4e5 to 2697fda Compare February 24, 2020 00:48
airflow/www/views.py (outdated, resolved)
airflow/www/views.py (outdated, resolved)
@houqp
Member Author

houqp commented Feb 24, 2020

@ashb all builds are passing now, ready for another round of review.

if task.depends_on_past:
    node['depends_on_past'] = task.depends_on_past
if task.start_date:
    # round to seconds to reduce payload size
Member

How much does it reduce it by? Is stripping all ms noticeable? (Could we perhaps limit to 3 or 6 sig. fig.?)

Member Author

Not much for task nodes, since we don't have too many of them; that's why I didn't add the rounding here in the first place. It did make a big difference for task instance nodes since we have lots of them. IIRC, probably around a 10-20% size reduction.

I can change it to round to 3 sig. fig. everywhere to see what the performance implication would be.

Member Author

@ashb rounding to 3 sig. fig. increases the overall payload size by 15%. The question now is: do we care about millisecond accuracy for task start/end time enough to take this 15% performance hit?

So far, I found second granularity has been good enough for us, but I might be missing other use-cases.
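
For anyone who wants to reproduce this kind of size comparison, here is a rough sketch of how it could be measured. It runs on synthetic timestamps, not on real DAG data, so the ratio it prints is not the 15% figure quoted above. The `payload_size` helper is hypothetical, and "3 sig. fig." is interpreted here as three decimal places (millisecond precision), which is an assumption.

```python
import json
import random

random.seed(0)  # deterministic synthetic data

# Synthetic Unix timestamps with sub-second precision, spread over one day.
timestamps = [1582322043 + random.random() * 86400 for _ in range(10000)]


def payload_size(values, ndigits):
    """Return the length of the compact JSON array after rounding each value.

    ndigits == 0 rounds to whole seconds and emits integers;
    ndigits > 0 keeps that many decimal places.
    """
    if ndigits == 0:
        rounded = [int(round(v)) for v in values]
    else:
        rounded = [round(v, ndigits) for v in values]
    return len(json.dumps(rounded, separators=(",", ":")))


size_seconds = payload_size(timestamps, 0)
size_millis = payload_size(timestamps, 3)

print("seconds:", size_seconds)
print("milliseconds:", size_millis)
print("ratio:", size_millis / size_seconds)
```

In a real tree view payload the timestamps are only one part of each task instance entry, so the overall increase depends on how much of the payload they account for.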

Member

Probably not needed.

Do you think it is worth making it a config option?

Member Author

This code path has a very hot function call loop that's very sensitive to if statements. For the large DAG that we have, adding one extra if statement increases the response time by more than 400ms. That's why "simplify reduce_nodes() logic, remove unnecessary if statements" is in the optimization list :)

That, and based on the understanding that we are rewriting Airflow web into a proper SPA, I think it's best not to introduce a config for this change. I would prefer that we give rounding to seconds a try and come back to add more sig. fig. or a config later on if any real use-case comes up. It's better not to engineer solutions when we don't have a good use-case in mind.

If you are really concerned about the precision, we can perhaps change it to round to 1 sig. fig. for a 7% performance hit. I can't really think of a case where knowing 0.01 second difference of runtime is important.

Member

Good reasoning. I'll copy some/most of this into the commit message for future-proofing.

Member

Leaving this unresolved as a reminder to myself.

airflow/www/views.py (outdated, resolved)
@houqp houqp force-pushed the tree_optimized branch 4 times, most recently from 916dc8e to cb933e3 Compare February 28, 2020 07:33
This change reduces page size by more than 10X and
reduces page load time by 3-5X. As a result, the
tree view can now load large DAGs that were causing
5XX errors without the patch.

List of optimizations applied to the view handler:
* only serialize used task instance attributes to JSON instead of the
  whole ORM object
* encode task instance attributes as array instead of dict
* encode datetime as Unix timestamp instead of ISO format string
* push task instance attribute construction into client side JS
* remove redundant task instance attributes
* simplify reduce_nodes() logic, remove unnecessary if statements
* serialize JSON as string to be used with JSON.parse on the client side
  to speed up browser JS parse time
* remove spaces in serialized JSON string to reduce payload size
@houqp
Member Author

houqp commented Feb 28, 2020

@ashb updated if statement and CI is passing now :)

@ashb ashb merged commit c1c2d6a into apache:master Feb 28, 2020
@houqp houqp deleted the tree_optimized branch February 28, 2020 23:01
galuszkak pushed a commit to FlyrInc/apache-airflow that referenced this pull request Mar 5, 2020
kaxil added a commit that referenced this pull request Apr 1, 2020
Co-Authored-By: QP Hou <[email protected]>

(cherry-picked from c1c2d6a)
kaxil added a commit that referenced this pull request Apr 2, 2020
Co-Authored-By: QP Hou <[email protected]>

(cherry-picked from c1c2d6a)
Labels
area:performance area:webserver Webserver related Issues

5 participants