You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
For SimpleHttpOperator, if you set deferrable=True (or set in airflow.cfg [operators]default_deferrable=True), then how your data parameter is handled in POST requests changes.
I noticed that when we tried using an internal REST service; when the user-agent was python-requests, we got a 201 response; when the user-agent was aiohttp, we got a 400 response.
Reading the code in:
airflow/providers/http/hooks/http.py
the issue is in the API differences of how HttpHook uses requests.Request to post data, versus how HttpAsyncHook uses aiohttp session.post to post data.
With requests, it's using the data parameter, which is defined as:
data – the body to attach to the request. If a dictionary or list of tuples [(key, value)] is provided, form-encoding will take place.
With aiohttp, it's using the json parameter for POST requests, which is defined as:
Any json compatible python object (optional). json and data parameters could not be used at the same time.
The application we were using was sending json-encoded bytes as the data parameter to the operator. So:
in deferrable=False mode, it sends the json-encoded bytes
in deferrable=True mode, the json-encode bytes will be wrapped inside a json string
This would be why we would get a 400 Bad Request response from the REST server, the client sent json-encoded-string of json data, not json data.
If the application instead gave a Python dict as the SimpleHttpOperator data parameter:
in deferrable=False mode, it'd be form-encoded
in deferrable=True mode, it'd be json-encoded
I haven't setup an environment where I can test all of this, but this is what I see from reading the code.
What you think should happen instead
I don't know how SimpleHttpOperator can be fixed now without breaking the API of SimpleHttpOperator and it's hooks.
If people tested their SimpleHttpOperator code on Airflow 2.6.3, and got it working with deferrable set to True, going back now and fixing the Async API to match the Sync API would break their tested code.
How to reproduce
You can test both versions of the POST data APIs by using SimpleHttpOperator with both options of deferrable=False and deferrable=True.
…equests and aiohttp params (apache#37291)
Transparently pass `data` and `json` HttpOperator arguments to `requests.Request`
and `aiohttp.ClientSession.post` (and other ClientSession HTTP methods).
This is a breaking change.
gazev
added a commit
to gazev/airflow
that referenced
this issue
Mar 26, 2024
…equests and aiohttp params (apache#37291)
Transparently pass `data` and `json` HttpOperator arguments to `requests.Request`
and `aiohttp.ClientSession.post` (and other ClientSession HTTP methods).
This is a breaking change.
gazev
added a commit
to gazev/airflow
that referenced
this issue
Mar 26, 2024
…equests and aiohttp params (apache#37291)
Transparently pass `data` and `json` HttpOperator arguments to `requests.Request`
and `aiohttp.ClientSession.post` (and other ClientSession HTTP methods).
This is a breaking change.
Apache Airflow Provider(s)
http
Versions of Apache Airflow Providers
apache-airflow-providers-http 4.5.2
Apache Airflow version
2.6.3
Operating System
"Ubuntu 20.04.6 LTS"
Deployment
Google Cloud Composer
Deployment details
No response
What happened
For SimpleHttpOperator, if you set deferrable=True (or set in airflow.cfg
[operators]default_deferrable=True
), then how yourdata
parameter is handled inPOST
requests changes.I noticed that when we tried using an internal REST service; when the user-agent was python-requests, we got a 201 response; when the user-agent was aiohttp, we got a 400 response.
Reading the code in:
airflow/providers/http/hooks/http.py
the issue is in the API differences of how HttpHook uses requests.Request to post data, versus how HttpAsyncHook uses aiohttp session.post to post data.
With requests, it's using the
data
parameter, which is defined as:With aiohttp, it's using the
json
parameter for POST requests, which is defined as:The application we were using was sending json-encoded bytes as the data parameter to the operator. So:
This would be why we would get a 400 Bad Request response from the REST server, the client sent json-encoded-string of json data, not json data.
If the application instead gave a Python dict as the SimpleHttpOperator data parameter:
I haven't setup an environment where I can test all of this, but this is what I see from reading the code.
What you think should happen instead
I don't know how SimpleHttpOperator can be fixed now without breaking the API of SimpleHttpOperator and it's hooks.
If people tested their SimpleHttpOperator code on Airflow 2.6.3, and got it working with deferrable set to True, going back now and fixing the Async API to match the Sync API would break their tested code.
How to reproduce
You can test both versions of the POST data APIs by using SimpleHttpOperator with both options of deferrable=False and deferrable=True.
Anything else
No response
Are you willing to submit PR?
Code of Conduct
The text was updated successfully, but these errors were encountered: