Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[wptserve] Rework header encoding/decoding in Request #13246

Merged
merged 3 commits into from
Oct 9, 2018

Conversation

Hexcles
Copy link
Member

@Hexcles Hexcles commented Sep 27, 2018

This PR relands the reverted #13248 from @Ms2ger , but with some tweaks on how to encode/decode the headers. See individual commits for details.

Fixes #13204 .

tools/wptserve/wptserve/ranges.py Outdated Show resolved Hide resolved
tools/wptserve/wptserve/request.py Outdated Show resolved Hide resolved
@@ -308,7 +308,7 @@ def POST(self):
self.raw_input.seek(0)
fs = cgi.FieldStorage(fp=self.raw_input,
environ={"REQUEST_METHOD": self.method},
headers=self.headers,
headers=self.raw_headers,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the behaviour change here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because cgi.FieldStorage needs to parse the headers and expect the values to be string not bytes. Since self.readers is read-only, I figure it would be safe to pass in the original raw_headers here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.raw_headers is the headers as bytes, no? What type does cgi.FieldStorage expect? str on both Py2/3, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.raw_headers is BaseHTTPRequestHandler.headers; and both its keys and values are str on both Py2/3, which is why we need to _maybe_encode them in RequestHeaders.

tools/wptserve/wptserve/request.py Outdated Show resolved Hide resolved
@Hexcles
Copy link
Member Author

Hexcles commented Sep 28, 2018

A few clarifications from the discussions on IRC regarding whether we should encode the header values back to bytes (i.e. headers = [maybeencode(items[header])]): https://w3.logbot.info/testing/20180928#c444317

First of all, regardless of the approach (whether to encode and which encoding to use), this WILL NOT affect existing tests (namely custom handlers) at all. Currently, WPT only really works with Python 2, and in Python 2 headers (both names and values) have str type, which is a "binary" type, so maybeencode is a no-op.

People reasonably want wptserve itself to work in Python 3, and this is where the change has effects. I would hope Python 3 still gives us headers in binary type (i.e. bytes), as Python 2, but Python 3 decides to decode the headers in iso-8859-1 perhaps for better compatibility. I think the sane choice here is to return bytes, which requires re-encoding, as HTTP RFC doesn't really say what encoding headers must use and encoding non-ASCII characters in iso-8859-1 is broken. The downside is that potentially more places that manipulate headers need to be changed (e.g. literal operands of string operations need to be bytes).

Now, since many custom handlers don't work in Python 3 in the first place, I'm inclined to to maybeencode the header values (i.e. return bytes in Python 3).

@Hexcles
Copy link
Member Author

Hexcles commented Sep 28, 2018

@gsnedders This PR is ready for review again. I addressed all your comments but one (the cgi.FieldStorage one; please see my reply there).

@Hexcles Hexcles requested a review from annevk September 28, 2018 20:00
@annevk
Copy link
Member

annevk commented Sep 29, 2018

and encoding non-ASCII characters in iso-8859-1 is broken

It's not? Using bytes is perhaps purer in a way, but if you make them strings that is the way to go. Anything else would lead to information loss.

@Hexcles
Copy link
Member Author

Hexcles commented Sep 29, 2018 via email

@Ms2ger
Copy link
Contributor

Ms2ger commented Oct 1, 2018

I suspect that supporting only binary_type in this API would be the most straightforward approach.

@Hexcles
Copy link
Member Author

Hexcles commented Oct 1, 2018

@Ms2ger you mean keys should also be binary type? Yeah I can do that. It might be break a few more things (in Python 3 only) but it's probably more consistent.

@gsnedders
Copy link
Member

@Ms2ger @Hexcles I have no strong preference; in principle header names are strictly ASCII only, though I don't think impls enforce that.

@Hexcles
Copy link
Member Author

Hexcles commented Oct 2, 2018

@Ms2ger @gsnedders there's one complication here. Technically, a byte string doesn't have the lower method. Yet we usually wish header names to be case-insensitive and we implement it by always converting the keys to lowercase both during storage and lookup.

@gsnedders
Copy link
Member

@Hexcles Yes it does? bytes.lower does ASCII lowercasing on both Py2/3.

@Hexcles
Copy link
Member Author

Hexcles commented Oct 2, 2018

@gsnedders duh my bad. Yeah let me update the PR.

@Hexcles Hexcles force-pushed the do-not-decode-header-values branch from 67d7b78 to a69a669 Compare October 2, 2018 17:24
@Hexcles
Copy link
Member Author

Hexcles commented Oct 2, 2018

@gsnedders @Ms2ger PTAL again

tools/wptserve/tests/functional/base.py Outdated Show resolved Hide resolved
tools/wptserve/tests/functional/test_pipes.py Outdated Show resolved Hide resolved
tools/wptserve/wptserve/pipes.py Show resolved Hide resolved
@@ -308,7 +308,7 @@ def POST(self):
self.raw_input.seek(0)
fs = cgi.FieldStorage(fp=self.raw_input,
environ={"REQUEST_METHOD": self.method},
headers=self.headers,
headers=self.raw_headers,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.raw_headers is the headers as bytes, no? What type does cgi.FieldStorage expect? str on both Py2/3, no?

tools/wptserve/wptserve/request.py Outdated Show resolved Hide resolved
tools/wptserve/tests/functional/test_pipes.py Show resolved Hide resolved
tools/wptserve/wptserve/pipes.py Outdated Show resolved Hide resolved
@Ms2ger Ms2ger force-pushed the do-not-decode-header-values branch from f10c76e to ce24036 Compare October 4, 2018 08:33
@andypaicu
Copy link
Contributor

Just as an fyi guys I'm also waiting on this re-land as I suspect the revert broke the "wpt / content-security-policy / generic / only-valid-whitespaces-are-allowed.html" test.

There hasn't been any movement on this for a week and I just want to make sure that this will eventually get merged.

https://wpt.fyi/results/content-security-policy/generic/only-valid-whitespaces-are-allowed.html?label=experimental

@Ms2ger Ms2ger force-pushed the do-not-decode-header-values branch from ce24036 to 26d33f0 Compare October 9, 2018 10:06
@gsnedders
Copy link
Member

@Hexcles @Ms2ger do one of you want to rewrite history to something saner?

@Ms2ger
Copy link
Contributor

Ms2ger commented Oct 9, 2018

If @Hexcles hasn't done it by tomorrow, I'll squash them all together.

The test sends a request to wptserve with non-ASCII characters in a
header and sets up a simple handler to return the value of that header.
The server shouldn't crash in either Python 2 or 3, and the response
should not be garbled. The server crashes in Python 2 (#13204).
This change changes the encoding/decoding of headers in Request and
username/password in Authentication: now all keys and values have binary
type (because of an implementation detail in the Python 3 standard
library, we actually need to re-encode the headers back to bytes in
Python 3).

Documentation and comments are also improved to clarify the encoding
situation. Also add test cases for non-ASCII characters in auth headers.
@Hexcles Hexcles force-pushed the do-not-decode-header-values branch from 26d33f0 to 887214f Compare October 9, 2018 19:50
@Hexcles
Copy link
Member Author

Hexcles commented Oct 9, 2018

Thanks for the review, @gsnedders . (The comment threads do look quite confusing...)

I just cleaned up the history and will merge the rebase-merge PR once Travis passes.

@Hexcles Hexcles merged commit 8b09ece into master Oct 9, 2018
@Hexcles Hexcles deleted the do-not-decode-header-values branch October 9, 2018 21:25
aarongable pushed a commit to chromium/chromium that referenced this pull request Oct 15, 2018
The most important two changes to roll:

* [wptserve] Rework header encoding/decoding in Request
  (web-platform-tests/wpt#13246): needed to fix
  tests with non-ASCII cookies
* Fix handling of about:blank in lint
  (web-platform-tests/wpt#13489): needed to
  unblock the importer

Bug: 895239
Change-Id: I10b308c19bdb20d2a1ec5af8dece42d7ada7e25b
Reviewed-on: https://chromium-review.googlesource.com/c/1279195
Commit-Queue: Robert Ma <[email protected]>
Reviewed-by: Philip Jägenstedt <[email protected]>
Cr-Commit-Position: refs/heads/master@{#599617}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants