http/wsgi.py: do not unquote path before injecting into environ #1211
Gunicorn (and other servers like Werkzeug) follows PEP 3333, which implicitly requires PATH_INFO and friends to be unquoted. If you don't unquote PATH_INFO, you are going to get a broken PATH_INFO. See https://www.python.org/dev/peps/pep-3333/#url-reconstruction for the URL reconstruction algorithm:
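For reference, here is a minimal sketch adapted from the URL reconstruction section of PEP 3333. The function name `reconstruct_url` and the sample environ values are assumptions made for illustration. The point is that the algorithm calls `quote()` on PATH_INFO, so a PATH_INFO that is still percent-encoded would get encoded a second time.

```python
from urllib.parse import quote


def reconstruct_url(environ):
    """Rebuild the request URL from a WSGI environ, following PEP 3333."""
    url = environ["wsgi.url_scheme"] + "://"

    if environ.get("HTTP_HOST"):
        url += environ["HTTP_HOST"]
    else:
        url += environ["SERVER_NAME"]
        # Only append the port when it is not the default for the scheme.
        if environ["wsgi.url_scheme"] == "https":
            if environ["SERVER_PORT"] != "443":
                url += ":" + environ["SERVER_PORT"]
        else:
            if environ["SERVER_PORT"] != "80":
                url += ":" + environ["SERVER_PORT"]

    # PATH_INFO is assumed to be already unquoted here, so it is re-quoted.
    url += quote(environ.get("SCRIPT_NAME", ""))
    url += quote(environ.get("PATH_INFO", ""))
    if environ.get("QUERY_STRING"):
        url += "?" + environ["QUERY_STRING"]
    return url


# Hypothetical environ for illustration only.
base = {
    "wsgi.url_scheme": "http",
    "HTTP_HOST": "example.com",
    "SCRIPT_NAME": "",
    "QUERY_STRING": "x=1",
}
unquoted = reconstruct_url({**base, "PATH_INFO": "/a b"})
still_quoted = reconstruct_url({**base, "PATH_INFO": "/a%20b"})
```

With an unquoted PATH_INFO the original URL is recovered; with a still-quoted one the percent sign itself is escaped and the URL changes.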
|
@berkerpeksag yes, the problem is the "implicitly". It's not explicit, so it leaves some room for interpretation. Obviously, certain applications need to be able to distinguish escaped from non-escaped entities in the original request path. That's why mod_wsgi introduced the configuration variable Now, what would you propose? I'll try to summarize:
Heavily related discussions: It seems like double-encoding is the most reliable solution to that for applications. |
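To illustrate the double-encoding idea, here is a small sketch using only `urllib.parse`. The path `/items/` and the segment value are made up for the example, and the single `unquote()` call stands in for what a PEP 3333 server does before filling PATH_INFO; it is not Gunicorn's actual code.

```python
from urllib.parse import quote, unquote

# A single path segment whose value contains a literal slash.
segment = "a/b"

# Client side: encode the segment twice, escaping "/" as well.
once = quote(segment, safe="")    # "a%2Fb"
twice = quote(once, safe="")      # "a%252Fb"
raw_path = "/items/" + twice

# Server side (stand-in): the path is unquoted once for PATH_INFO.
path_info = unquote(raw_path)     # "/items/a%2Fb"

# Application side: split on real slashes, then decode each segment.
segments = [unquote(part) for part in path_info.split("/") if part]
```

After the server's single decoding pass the escaped slash is still escaped, so the application can split on real slashes first and decode afterwards.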
Applications that need the raw URI can get it from the environ variable RAW_URI in Gunicorn. This is how it has been done for a while. I would stick with that approach so we keep complying strictly with the WSGI spec. IMO we should have the same behaviour in all workers. Thoughts? |
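A minimal sketch of an application reading RAW_URI might look like the following. The application itself, the fallback to PATH_INFO, and the fake environ in the usage lines are assumptions for illustration; RAW_URI is a Gunicorn-specific key, so other servers may not provide it.

```python
from urllib.parse import urlsplit


def app(environ, start_response):
    """WSGI app that echoes the undecoded request path."""
    raw_uri = environ.get("RAW_URI")
    if raw_uri is not None:
        # urlsplit separates path and query without percent-decoding.
        raw_path = urlsplit(raw_uri).path
    else:
        # Fallback for servers without RAW_URI: already-decoded path.
        raw_path = environ.get("PATH_INFO", "")

    body = raw_path.encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


# Usage with a hand-built environ (no server involved).
captured = {}

def fake_start_response(status, headers):
    captured["status"] = status

result = app(
    {"RAW_URI": "/files/a%2Fb?x=1", "PATH_INFO": "/files/a/b"},
    fake_start_response,
)
```

Here PATH_INFO stays spec-compliant while the application opts in to the raw form only where it needs it.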
@benoitc thanks for the pointer towards
So, that PR can be closed, I guess. I'm not exactly sure what you mean by
Is that not the case as of now? Just FYI, in our application, we are now relying on clients/API consumers to replace slashes that are supposed to be transparent to the URL template engine of our WSGI application with a |
I think when @benoitc says that every worker should behave the same, he's saying that the aio worker should do the same as the others. Does that require aio-libs/aiohttp#343 to happen, or is gunicorn unquoting before aiohttp sees the request? |
Closing per #1211 (comment) |
This is a follow-up to #930, where we decided that WSGI environ['PATH_INFO'] should not be an unquoted (percent-decoded) path, but the undecoded (original) path. This was then fixed in aiohttp's WSGI implementation, see aio-libs/aiohttp#177, but it looks like we missed taking care of Gunicorn's WSGI implementation.

In the current aiohttp implementation we find (https://github.com/KeepSafe/aiohttp/blob/master/aiohttp/wsgi.py):
However, in gunicorn/http/wsgi.py we currently still find:

This leads to a class of problems where escaped and non-escaped parts are no longer easily distinguishable at the framework level, as vividly illustrated by the following issue: https://code.djangoproject.com/ticket/15718
There, the behavior is partially controllable in Apache's mod_wsgi by setting AllowEncodedSlashes On. There, the issue appeared with Django+mod_wsgi. In our case, the issue appeared with Falcon+Gunicorn, but factually it's exactly the same problem.
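The loss of information can be shown with `urllib.parse.unquote` alone. The two request paths below are invented for illustration, and `unquote()` stands in for the decoding step a server applies before setting PATH_INFO.

```python
from urllib.parse import unquote

# Two different requests: the first has ONE segment "a/b" under /files,
# the second has TWO segments "a" and "b".
escaped_slash = "/files/a%2Fb"
real_slash = "/files/a/b"

# After percent-decoding, both collapse to the same string.
decoded_escaped = unquote(escaped_slash)
decoded_real = unquote(real_slash)
indistinguishable = decoded_escaped == decoded_real
```

Once both requests decode to the same PATH_INFO, a URL router in the framework has no way to tell which slash was part of the data.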
I think the most important insight is from aio-libs/aiohttp#177
So, I went ahead and simply removed the unquoting operation (exactly as done in aiohttp's wsgi.py). I ran tests against Python 3.4, and nothing broke.
This change affects three worker types (async, gthread, sync):
I think we can just merge this, mainly motivated by the fact that aiohttp uses the same method.
Still, it would be nice to now add a test that breaks with the old behavior, and passes with the new one. However, I am not familiar enough with the gunicorn test structure.
And of course the question is whether applications out there in the world rely on that behavior ... :/
In our application we have to be able to distinguish real slashes from escaped slashes at the framework level. That is, we have to use a custom branch of Gunicorn right now; otherwise that information is lost once requests enter the framework level.
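As a rough sketch of what distinguishing the two at framework level can mean, the helper below splits an undecoded path on real slashes first and only then percent-decodes each segment. The function name `split_raw_path` and the sample paths are hypothetical; this is not code from Falcon or Gunicorn.

```python
from urllib.parse import unquote


def split_raw_path(raw_path):
    """Split an undecoded path into decoded segments.

    Splitting happens before decoding, so an escaped slash (%2F)
    stays inside its segment instead of creating a new one.
    """
    return [unquote(seg) for seg in raw_path.split("/") if seg]


one_segment = split_raw_path("/files/a%2Fb")
two_segments = split_raw_path("/files/a/b")
```

This only works when the application receives the path before it has been unquoted, which is the behaviour this change is after.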