Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

URL: percent-encoding test framework basics #26317

Merged
merged 6 commits into from
Nov 3, 2020
Merged

URL: percent-encoding test framework basics #26317

merged 6 commits into from
Nov 3, 2020

Conversation

annevk
Copy link
Member

@annevk annevk commented Oct 28, 2020

@domenic @rmisev @achristensen07 thoughts? (There are many ways to expand this to also cover form submission and such, but I thought I'd start out simple to ensure we agree on a common ground.)

Copy link
Member

@domenic domenic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excited to see further work on data-driven tests!!

url/resources/percent-encoding.json Show resolved Hide resolved
url/percent-encoding.window.js Outdated Show resolved Hide resolved
value = request.GET.first(b"value")
encoding = request.GET.first(b"encoding")

output_value = numeric_references(unquote(value).decode(b"utf-8"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I worry a bit about all the steps of smuggling the value through URL parameters. If a browser doesn't do percent-encoding correctly, then probably it will fail to get the correct output_value, messing up the test.

I guess the assumption is that the browser does utf-8 percent encoding correctly?

If you wanted to avoid that assumption, you could use another encoding scheme, e.g. base64 encoding or maybe JSON.stringify.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that might be a good improvement here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't seem like the output of JSON.stringify is always ASCII so maybe I'll leave this as-is for now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

base64 with atob/btoa and Python's equivalent seems like it'd work nicely?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does that work given the non-ASCII input?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right, it's indeed a bit more complicated. https://github.com/jsdom/whatwg-url/blob/d5f65c2bef5d88f873203d33289ddfe13f2eee1f/live-viewer/live-viewer.js#L135-L150 is what I was thinking, but perhaps it's not worth it, especially given that you'd have to implement it in both Python and JS.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made this work, but it took longer than expected due to some weirdness from WPT.

@annevk
Copy link
Member Author

annevk commented Oct 30, 2020

@annevk annevk requested a review from domenic October 30, 2020 11:30
@wpt-pr-bot wpt-pr-bot temporarily deployed to wpt-preview-26317 October 30, 2020 11:32 Inactive
Copy link
Member

@domenic domenic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting that every browser seems to fail some of these... but different ones.

value = request.GET.first(b"value")
encoding = request.GET.first(b"encoding")

output_value = numeric_references(unquote(value).decode(b"utf-8"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

base64 with atob/btoa and Python's equivalent seems like it'd work nicely?

annevk added a commit to whatwg/url that referenced this pull request Nov 2, 2020
Since the ISO-2022-JP encoder is stateful, percent-encoding needs to hold onto an instance of the encoder and manually perform error handling. This also requires the input to be the full string rather than individual code points as otherwise the callers of percent-encoding would need to be aware of this too. (As UTF-8 encoding cannot fail this problem does not affect those endpoints.)

Builds on this Encoding PR: whatwg/encoding#238.

Tests: web-platform-tests/wpt#26158 and web-platform-tests/wpt#26317.

Fixes #557.
annevk added a commit to whatwg/url that referenced this pull request Nov 2, 2020
Since the ISO-2022-JP encoder is stateful, percent-encoding needs to hold onto an instance of the encoder and manually perform error handling. This also requires the input to be the full string rather than individual code points as otherwise the callers of percent-encoding would need to be aware of this too. (As UTF-8 encoding cannot fail this problem does not affect those endpoints.)

Builds on this Encoding PR: whatwg/encoding#238.

Tests: web-platform-tests/wpt#26158 and web-platform-tests/wpt#26317.

Fixes #557.
@wpt-pr-bot wpt-pr-bot temporarily deployed to wpt-preview-26317 November 3, 2020 10:56 Inactive
@andreubotella
Copy link
Member

This test case should probably be added as well, since the bug that makes Firefox fail on "−" with Shift_JIS is more general than that:

{
  "input": "á|",
  "output": {
    "windows-1252": "%E1|",
    "utf-8": "%C3%A1|"
  }
}

@wpt-pr-bot wpt-pr-bot temporarily deployed to wpt-preview-26317 November 3, 2020 11:20 Inactive
"windows-1252": "%86"
}
},
"This uses a trailing A to prevent the URL parser from trimming the C0 control.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like in to-ascii.json we had a "comment" field inside each test case, which we could use for single-test case comments. Might be worth reusing here. (whatwg-url creates the test case description by appending the comment, if it exists.)

@domenic
Copy link
Member

domenic commented Nov 3, 2020

Still LGTM. I incorporated these into whatwg-url, although as discussed previously whatwg-url doesn't do non-UTF-8 encodings, so it's not as helpful as one might like. Still, it gives a basic sanity check, I guess. jsdom/whatwg-url#174

@wpt-pr-bot wpt-pr-bot temporarily deployed to wpt-preview-26317 November 3, 2020 15:20 Inactive
@annevk annevk merged commit 09d8830 into master Nov 3, 2020
@annevk annevk deleted the annevk/pe branch November 3, 2020 16:00
watilde added a commit to watilde/node that referenced this pull request Nov 8, 2020
watilde added a commit to nodejs/node that referenced this pull request Nov 19, 2020
Refs: web-platform-tests/wpt#26317

PR-URL: #36032
Reviewed-By: Rich Trott <[email protected]>
Reviewed-By: Michaël Zasso <[email protected]>
Reviewed-By: Joyee Cheung <[email protected]>
codebytere pushed a commit to nodejs/node that referenced this pull request Nov 22, 2020
Refs: web-platform-tests/wpt#26317

PR-URL: #36032
Reviewed-By: Rich Trott <[email protected]>
Reviewed-By: Michaël Zasso <[email protected]>
Reviewed-By: Joyee Cheung <[email protected]>
domenic added a commit to jsdom/whatwg-url that referenced this pull request Feb 23, 2021
domenic added a commit to jsdom/whatwg-url that referenced this pull request Feb 23, 2021
domenic added a commit to jsdom/whatwg-url that referenced this pull request Feb 23, 2021
domenic added a commit to jsdom/whatwg-url that referenced this pull request Feb 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants