Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AIRFLOW-6728] Change various DAG info methods to POST #7364

Merged
merged 4 commits into from
Feb 7, 2020

Conversation

madison-ookla
Copy link
Contributor

@madison-ookla madison-ookla commented Feb 5, 2020

This PR changes the following DAG info methods to POST:

  • /dag_stats
  • /task_stats
  • /last_dagruns
  • /blocked

The reason for this is described in the JIRA ticket:

It appears that the URL uses a dag_ids parameter that is a comma separated list in the GET request. Since DAGs can have a dag_id length of 250 characters and the UI will show up to 100 DAGs per page, this means that the URL parameters alone could be 25,000 characters (not including domain, etc.). It seems that it might be best to switch this request over to POST to accommodate the potentially large request body.

Here is a small DAG file I used to generate DAGs with excessively large names:

EXPAND FOR CODE
from datetime import datetime

from airflow import DAG

from airflow.operators.dummy_operator import DummyOperator


def create_dag(dag_id,
               schedule,
               default_args):


    dag = DAG(dag_id,
              schedule_interval=schedule,
              default_args=default_args)

    with dag:
        t1 = DummyOperator(task_id='dummy_task')

    return dag


# build a dag for each number in range(100)
for n in range(0, 100):
    dag_id = 'long_name_dag__' + f"{n}".zfill(235)

    default_args = {'owner': 'airflow',
                    'start_date': datetime(2018, 1, 1)
                    }

    schedule = None

    globals()[dag_id] = create_dag(dag_id,
                                  schedule,
                                  default_args)

I was able to verify that the page loads correctly with the new changes. (BTW Breeze was extremely helpful for this 😀)


Issue link: AIRFLOW-6728

Make sure to mark the boxes below before creating PR: [x]

  • Description above provides context of the change
  • Commit message/PR title starts with [AIRFLOW-NNNN]. AIRFLOW-NNNN = JIRA ID*
  • Unit tests coverage for changes (not needed for documentation changes)
  • Commits follow "How to write a good git commit message"
  • Relevant documentation is updated including usage instructions.
  • I will engage committers as explained in Contribution Workflow Example.

* For document-only changes commit message can start with [AIRFLOW-XXXX].


In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.

@boring-cyborg boring-cyborg bot added the area:webserver Webserver related Issues label Feb 5, 2020
@madison-ookla
Copy link
Contributor Author

The build failure looks like a timeout unrelated to the changes of this PR

@codecov-io
Copy link

codecov-io commented Feb 5, 2020

Codecov Report

Merging #7364 into master will increase coverage by <.01%.
The diff coverage is 60%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #7364      +/-   ##
==========================================
+ Coverage   86.35%   86.35%   +<.01%     
==========================================
  Files         871      871              
  Lines       40627    40636       +9     
==========================================
+ Hits        35082    35091       +9     
  Misses       5545     5545
Impacted Files Coverage Δ
airflow/www/views.py 76.22% <60%> (+0.14%) ⬆️
airflow/www/app.py 94.24% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2bfd7f2...8826960. Read the comment docs.

@robinedwards
Copy link
Contributor

Have you actually encountered issues with the request body size? POST feels a bit weird here not very RESTy

@ashb ashb requested a review from mik-laj February 5, 2020 17:11
@madison-ookla
Copy link
Contributor Author

Have you actually encountered issues with the request body size? POST feels a bit weird here not very RESTy

I brought this issue forward because the Recent Tasks column in our production instance wasn't displaying any information. After investigation I found that it was because of the request URL size. Since the limit is 4096, to query 100 DAGs those DAGs only need to be 41 characters long to reach that limit.

@ashb
Copy link
Member

ashb commented Feb 5, 2020

Rather than a JSON body more "resty" would be to have this be a "form post" -- i.e. application/x-www-form-urlencoded body.

@ashb
Copy link
Member

ashb commented Feb 5, 2020

@mik-laj has a simiarl PR (but it keeps it as a GET which doesn't fix this problem)

@madison-ookla
Copy link
Contributor Author

Rather than a JSON body more "resty" would be to have this be a "form post" -- i.e. application/x-www-form-urlencoded body.

I can change the request to use a form post instead of JSON

@mik-laj
Copy link
Member

mik-laj commented Feb 5, 2020

Here is my PR: https://github.com/apache/airflow/pulls/mik-laj
#6855
But this PR is better.

@robinedwards
Copy link
Contributor

I guess another option is doing multiple GET requests...

@mik-laj
Copy link
Member

mik-laj commented Feb 5, 2020

I would like to point out that this endpoint belongs to the internal API so it didn't have to be pretty. This should work well. Only public APIs must be beautiful, standardized and backward compatible.

@robinedwards
Copy link
Contributor

Fair point :-) Cheers for fixing @madison-ookla and apologies for any breakage my original change caused.

@madison-ookla
Copy link
Contributor Author

Thanks all for the feedback :) I would prefer to keep it as JSON just because it makes processing processing the request body a bit easier (rather than having to split a comma separated string). That said I don't do a lot of web-dev work so that practice may be commonplace.

@madison-ookla
Copy link
Contributor Author

Although it looks like I could do request.form.getlist('dag_ids') inside the actual request handlers for a regular form POST, which is straightforward. I'd just have to add the form encoding onto all of the requests. I could go either way - let me know if JSON POST or form POST is preferred.

@robinedwards
Copy link
Contributor

robinedwards commented Feb 5, 2020

There is the FormData API for doing the encoding if your interested https://developer.mozilla.org/en-US/docs/Web/API/FormData/Using_FormData_Objects

@ashb
Copy link
Member

ashb commented Feb 5, 2020

There's also a postAsForm helper method already in Airflow

function postAsForm(url, parameters) {
var form = $("<form></form>");
form.attr("method", "POST");
form.attr("action", url);
$.each(parameters || {}, function(key, value) {
var field = $('<input></input>');
field.attr("type", "hidden");
field.attr("name", key);
field.attr("value", value);
form.append(field);
});
var field = $('<input></input>');
field.attr("type", "hidden");
field.attr("name", "csrf_token");
field.attr("value", csrfToken);
form.append(field);
// The form needs to be a part of the document in order for us to be able
// to submit it.
$(document.body).append(form);
form.submit();
}
if that's of use.

@ashb
Copy link
Member

ashb commented Feb 5, 2020

And yes, using request.form.getlist() is the way to go over json

@madison-ookla
Copy link
Contributor Author

Gotcha, thanks @ashb. I'll move this over to a form POST then.

@ashb
Copy link
Member

ashb commented Feb 5, 2020

That helper method won't accept multiple fields of the same name just yet, so you might have to extend that or use something else

@madison-ookla
Copy link
Contributor Author

That helper method won't accept multiple fields of the same name just yet, so you might have to extend that or use something else

Since the request is done with d3.json I'm not sure I'd be able to use that anyway

@madison-ookla
Copy link
Contributor Author

I've converted the endpoints from JSON to form POST. It was easier than I thought!

.post("dag_ids=" + (encoded_dag_ids.join(',')), dagStatsHandler);
d3.json("{{ url_for('Airflow.task_stats') }}")
.header("X-CSRFToken", "{{ csrf_token() }}")
.post("dag_ids=" + (encoded_dag_ids.join(',')), taskStatsHandler);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this right? I would expect that this would post the encoded as a single parameter..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably don't have to handle the encoding and building of forms ourselves:

var p = new URLSearchParams();
p.append("dag_id", "not+encoded.here");
p.append("dag_id", "second dag/id");
d3.json("/a").post(p)

And that sends this request:

POST /a HTTP/1.1
Host: 0.0.0.0:8080
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:72.0) Gecko/20100101 Firefox/72.0
Accept: application/json,*/*
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Content-Type: application/x-www-form-urlencoded;charset=UTF-8
Content-Length: 48
Origin: http://0.0.0.0:8080
Connection: keep-alive
Referer: http://0.0.0.0:8080/home

dag_id=not%2Bencoded.here&dag_id=second+dag%2Fid

Could you make this change @madison-ookla ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, can do

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! I've tested this locally and it works 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ready for you now @ashb @mik-laj 😁

Copy link
Member

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, looking good (and quite a short change in the end!)

@ashb ashb merged commit b738c9e into apache:master Feb 7, 2020
@madison-ookla
Copy link
Contributor Author

madison-ookla commented Feb 7, 2020

Nice, looking good (and quite a short change in the end!)

Thanks! Any chance this can be backported to the v1.10.x branch?

Edit: Nevermind, I see it's now part of the 1.10.10 fix version on Jira 🙂

@madison-ookla madison-ookla deleted the bugfix/ui-dag-stats branch February 7, 2020 16:30
galuszkak pushed a commit to FlyrInc/apache-airflow that referenced this pull request Mar 5, 2020
If the number of dags was large and/or the length of the DAG ids were too large this would exceed the maximum possible query string limit.

To work around that we have made these endpoints always make POST requests
kaxil added a commit to astronomer/airflow that referenced this pull request Mar 19, 2020
@ashb
Copy link
Member

ashb commented Apr 9, 2020

@madison-ookla And included in 1.10.10 which was just released :)

@madison-ookla
Copy link
Contributor Author

madison-ookla commented Apr 9, 2020

@madison-ookla And included in 1.10.10 which was just released :)

Thanks, very excited about this release! 😁

@ashb ashb mentioned this pull request Apr 30, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:webserver Webserver related Issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants