-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
URL: percent-encoding test framework basics #26317
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excited to see further work on data-driven tests!!
url/resources/percent-encoding.py
Outdated
value = request.GET.first(b"value") | ||
encoding = request.GET.first(b"encoding") | ||
|
||
output_value = numeric_references(unquote(value).decode(b"utf-8")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I worry a bit about all the steps of smuggling the value through URL parameters. If a browser doesn't do percent-encoding correctly, then probably it will fail to get the correct output_value
, messing up the test.
I guess the assumption is that the browser does utf-8 percent encoding correctly?
If you wanted to avoid that assumption, you could use another encoding scheme, e.g. base64 encoding or maybe JSON.stringify
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that might be a good improvement here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't seem like the output of JSON.stringify
is always ASCII so maybe I'll leave this as-is for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
base64 with atob/btoa and Python's equivalent seems like it'd work nicely?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does that work given the non-ASCII input?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, right, it's indeed a bit more complicated. https://github.com/jsdom/whatwg-url/blob/d5f65c2bef5d88f873203d33289ddfe13f2eee1f/live-viewer/live-viewer.js#L135-L150 is what I was thinking, but perhaps it's not worth it, especially given that you'd have to implement it in both Python and JS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made this work, but it took longer than expected due to some weirdness from WPT.
I filed https://bugs.webkit.org/show_bug.cgi?id=218380 and https://bugzilla.mozilla.org/show_bug.cgi?id=1674369 while playing around a bit more. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting that every browser seems to fail some of these... but different ones.
url/resources/percent-encoding.py
Outdated
value = request.GET.first(b"value") | ||
encoding = request.GET.first(b"encoding") | ||
|
||
output_value = numeric_references(unquote(value).decode(b"utf-8")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
base64 with atob/btoa and Python's equivalent seems like it'd work nicely?
Since the ISO-2022-JP encoder is stateful, percent-encoding needs to hold onto an instance of the encoder and manually perform error handling. This also requires the input to be the full string rather than individual code points as otherwise the callers of percent-encoding would need to be aware of this too. (As UTF-8 encoding cannot fail this problem does not affect those endpoints.) Builds on this Encoding PR: whatwg/encoding#238. Tests: web-platform-tests/wpt#26158 and web-platform-tests/wpt#26317. Fixes #557.
Since the ISO-2022-JP encoder is stateful, percent-encoding needs to hold onto an instance of the encoder and manually perform error handling. This also requires the input to be the full string rather than individual code points as otherwise the callers of percent-encoding would need to be aware of this too. (As UTF-8 encoding cannot fail this problem does not affect those endpoints.) Builds on this Encoding PR: whatwg/encoding#238. Tests: web-platform-tests/wpt#26158 and web-platform-tests/wpt#26317. Fixes #557.
This test case should probably be added as well, since the bug that makes Firefox fail on "−" with Shift_JIS is more general than that:
|
"windows-1252": "%86" | ||
} | ||
}, | ||
"This uses a trailing A to prevent the URL parser from trimming the C0 control.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like in to-ascii.json we had a "comment" field inside each test case, which we could use for single-test case comments. Might be worth reusing here. (whatwg-url creates the test case description by appending the comment, if it exists.)
Still LGTM. I incorporated these into whatwg-url, although as discussed previously whatwg-url doesn't do non-UTF-8 encodings, so it's not as helpful as one might like. Still, it gives a basic sanity check, I guess. jsdom/whatwg-url#174 |
Refs: web-platform-tests/wpt#26317 PR-URL: #36032 Reviewed-By: Rich Trott <[email protected]> Reviewed-By: Michaël Zasso <[email protected]> Reviewed-By: Joyee Cheung <[email protected]>
Refs: web-platform-tests/wpt#26317 PR-URL: #36032 Reviewed-By: Rich Trott <[email protected]> Reviewed-By: Michaël Zasso <[email protected]> Reviewed-By: Joyee Cheung <[email protected]>
@domenic @rmisev @achristensen07 thoughts? (There are many ways to expand this to also cover form submission and such, but I thought I'd start out simple to ensure we agree on a common ground.)