Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SimpleHttpOperator changes POST data API if you change deferrable option #37291

Closed
1 of 2 tasks
dlesco opened this issue Feb 9, 2024 · 0 comments · Fixed by #37293
Closed
1 of 2 tasks

SimpleHttpOperator changes POST data API if you change deferrable option #37291

dlesco opened this issue Feb 9, 2024 · 0 comments · Fixed by #37293

Comments

@dlesco
Copy link

dlesco commented Feb 9, 2024

Apache Airflow Provider(s)

http

Versions of Apache Airflow Providers

apache-airflow-providers-http 4.5.2

Apache Airflow version

2.6.3

Operating System

"Ubuntu 20.04.6 LTS"

Deployment

Google Cloud Composer

Deployment details

No response

What happened

For SimpleHttpOperator, if you set deferrable=True (or set in airflow.cfg [operators]default_deferrable=True), then how your data parameter is handled in POST requests changes.

I noticed that when we tried using an internal REST service; when the user-agent was python-requests, we got a 201 response; when the user-agent was aiohttp, we got a 400 response.

Reading the code in:
airflow/providers/http/hooks/http.py

the issue is in the API differences of how HttpHook uses requests.Request to post data, versus how HttpAsyncHook uses aiohttp session.post to post data.

With requests, it's using the data parameter, which is defined as:

data – the body to attach to the request. If a dictionary or list of tuples [(key, value)] is provided, form-encoding will take place.

With aiohttp, it's using the json parameter for POST requests, which is defined as:

Any json compatible python object (optional). json and data parameters could not be used at the same time.

The application we were using was sending json-encoded bytes as the data parameter to the operator. So:

  • in deferrable=False mode, it sends the json-encoded bytes
  • in deferrable=True mode, the json-encode bytes will be wrapped inside a json string

This would be why we would get a 400 Bad Request response from the REST server, the client sent json-encoded-string of json data, not json data.

If the application instead gave a Python dict as the SimpleHttpOperator data parameter:

  • in deferrable=False mode, it'd be form-encoded
  • in deferrable=True mode, it'd be json-encoded

I haven't setup an environment where I can test all of this, but this is what I see from reading the code.

What you think should happen instead

I don't know how SimpleHttpOperator can be fixed now without breaking the API of SimpleHttpOperator and it's hooks.

If people tested their SimpleHttpOperator code on Airflow 2.6.3, and got it working with deferrable set to True, going back now and fixing the Async API to match the Sync API would break their tested code.

How to reproduce

You can test both versions of the POST data APIs by using SimpleHttpOperator with both options of deferrable=False and deferrable=True.

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@dlesco dlesco added area:providers kind:bug This is a clearly a bug needs-triage label for new issues that we didn't triage yet labels Feb 9, 2024
@hussein-awala hussein-awala added provider:http and removed needs-triage label for new issues that we didn't triage yet labels Feb 9, 2024
@hussein-awala hussein-awala self-assigned this Feb 9, 2024
gazev added a commit to gazev/airflow that referenced this issue Mar 26, 2024
…equests and aiohttp params (apache#37291)

Transparently pass `data` and `json` HttpOperator arguments to `requests.Request`
and `aiohttp.ClientSession.post` (and other ClientSession HTTP methods).

This is a breaking  change.
gazev added a commit to gazev/airflow that referenced this issue Mar 26, 2024
…equests and aiohttp params (apache#37291)

Transparently pass `data` and `json` HttpOperator arguments to `requests.Request`
and `aiohttp.ClientSession.post` (and other ClientSession HTTP methods).

This is a breaking  change.
gazev added a commit to gazev/airflow that referenced this issue Mar 26, 2024
…equests and aiohttp params (apache#37291)

Transparently pass `data` and `json` HttpOperator arguments to `requests.Request`
and `aiohttp.ClientSession.post` (and other ClientSession HTTP methods).

This is a breaking  change.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants