fix(tools.uri) normalization decodes as much as possible #8140

bungle · 2021-12-02T11:06:18Z

We decided to let the normalize function decode URL-encoded strings as much as possible.

PLEASE REFER TO: #8140 (comment)

Issues resolved

Outdated discussion:

This is an alternative to PR #8139: here we actually fix the normalization function so that it does not do excessive percent-decoding.

When we added kong.tools.uri.normalize, the function percent-decoded everything except the reserved characters.

That means we percent-decode more than just the so-called Unreserved Characters: ALPHA (%41–%5A and %61–%7A), DIGIT (%30–%39), hyphen (%2D), period (%2E), underscore (%5F), and tilde (%7E).

Alternative Implementation: See #8139

flrgh · 2021-12-04T01:00:18Z

Hmm. RFCs are always a chore to interpret. It seems like the original mistake was that normalization was implemented with the assumption that unreserved chars == (all chars - reserved chars), which, as you know, turns out to be incorrect (according to the RFC).

IMO instead of over-decoding and re-encoding we should just fix the logic to be more selective in our decoding, according to the chars in section 2.3:

unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"

I put a draft/poc of this in another branch (fix/uri-normalize-unreserved 6313dd2). Maybe take a look and tell me what you think?
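For illustration, a minimal sketch of what "more selective" decoding could look like. This is not the draft from the branch mentioned above and not Kong's Lua code; the helper name `decode_unreserved` is hypothetical, and uppercasing the hex digits of the escapes that are kept is an added assumption.

```python
import re
import string

# RFC 3986 section 2.3: unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = frozenset(string.ascii_letters + string.digits + "-._~")

# A percent-escape is "%" followed by exactly two hex digits.
PCT = re.compile(r"%([0-9A-Fa-f]{2})")


def decode_unreserved(path: str) -> str:
    """Decode only the escapes that map to unreserved characters.

    Hypothetical helper for illustration. Every other escape is kept,
    with its hex digits uppercased (an assumption of this sketch).
    """
    def repl(match: re.Match) -> str:
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char
        return "%" + match.group(1).upper()

    return PCT.sub(repl, path)


print(decode_unreserved("/a%2E%5f%7e%2f%20"))  # /a._~%2F%20
```

Here `%2f` and `%20` stay encoded because "/" and space are not in the unreserved set.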

bungle · 2021-12-07T16:29:17Z

> Hmm. RFCs are always a chore to interpret. It seems like the original mistake was that normalization was implemented with the assumption that unreserved chars == (all chars - reserved chars), which, as you know, turns out to be incorrect (according to the RFC).
>
> IMO instead of over-decoding and re-encoding we should just fix the logic to be more selective in our decoding, according to the chars in section 2.3:
>
> unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
>
> I put a draft/poc of this in another branch (fix/uri-normalize-unreserved 6313dd2). Maybe take a look and tell me what you think?

@flrgh yes, we discussed this with @dndx. The thing is, I am not sure which is better: more selective percent-decoding or not. I am not sure whether ngx.var.request_uri can ever contain characters that are already percent-decoded. Do you know? If it can, those need to be normalized back to percent-encoded form. The same question applies to route.paths.

Also, as this is kong.tools.uri.normalize, it should support normalizing this: "/ä/a/%2e./a%2E%5f%99%af" should produce "/%C3%A4/a._%99%AF", which is what the decode, process dots, encode approach I took here does (it is difficult to do in one pass). The performance of this PR should be the same as before, i.e. it does not do more work than before. We have already seen usages in other PRs, like yours: https://github.com/Kong/kong/pull/8129/files (if this is going to be the tool for normalization, it needs to do the correct thing).

Also, the escape function includes % among the characters it does not escape, which is rather questionable:
https://github.com/Kong/kong/blob/master/kong/tools/uri.lua#L38

kikito · 2021-12-09T17:00:58Z

I have spoken with @bungle and for now we're removing this from the 2.7 milestone

bungle · 2022-01-18T13:11:41Z

If someone wants to work on this or try different approaches, here is a list of things that need to be taken into account:

definitions:
reserved chars: ! * ' ( ) ; : @ & = + $ , / ? % # [ ]
unreserved: - _ . ~
alphanumeric: a b c d e f g h i j k l m n o p q r s t u v w x y z A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 0 1 2 3 4 5 6 7 8 9
other: * (meaning any char that is not mentioned above)

dot processing:
/. AND /./
/.. AND /../
/%2e AND /%2e/
/.%2e AND /.%2e/
/.%2E AND /.%2E/
/%2e%2E AND /%2e%2E/
etc.
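The dot cases listed above can be sketched as a segment-wise pass that treats `%2e` and `%2E` like a literal dot. This is an illustrative Python sketch, not the Lua code in kong/tools/uri.lua; the function name `remove_dot_segments` is hypothetical and the path is assumed to be absolute (starting with "/").

```python
import re

# A literal dot or its percent-encoded form (%2e / %2E).
_DOT = r"(?:\.|%2[eE])"
SINGLE_DOT = re.compile(_DOT)
DOUBLE_DOT = re.compile(_DOT + _DOT)


def remove_dot_segments(path: str) -> str:
    """Resolve "." and ".." segments, including percent-encoded dots.

    Hypothetical helper; assumes an absolute path that starts with "/".
    """
    parts = path.split("/")[1:]  # drop the empty piece before the leading "/"
    kept = []
    for index, segment in enumerate(parts):
        is_last = index == len(parts) - 1
        if SINGLE_DOT.fullmatch(segment):
            # "/a/." keeps its trailing slash: "/a/"
            if is_last:
                kept.append("")
        elif DOUBLE_DOT.fullmatch(segment):
            # Going above the root is ignored rather than treated as an error.
            if kept:
                kept.pop()
            if is_last:
                kept.append("")
        else:
            kept.append(segment)
    return "/" + "/".join(kept)


print(remove_dot_segments("/a/b/%2e%2E/c"))  # /a/c
print(remove_dot_segments("/a/%2e/b"))       # /a/b
```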

merge slashes (this is generally thought to be a good thing to do, but it might change semantics in some cases):
//
/a//b
etc.
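As a small illustration of slash merging, assuming it is applied to the path only (not to a query string) and that percent-encoded slashes such as `%2F` are left alone. The name `merge_slashes` is hypothetical.

```python
import re


def merge_slashes(path: str) -> str:
    """Collapse every run of two or more literal slashes into one."""
    return re.sub(r"/{2,}", "/", path)


print(merge_slashes("/a//b"))      # /a/b
print(merge_slashes("/a%2F%2Fb"))  # /a%2F%2Fb (encoded slashes are untouched)
```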

given this:
/ä/a/%2e./a%2E//a/%2e/./a/../a/%2e%2E/%5f%99%af%2f%2F

should normalize (I think) to this:
/%C3%A4/a./a/_%99%AF%2F%2F

What happens there?
ä -> being percent encoded to %C3%A4
%af, %2f -> being uppercased to %AF, %2F
// -> slashes being merged
%5f -> being percent decoded to _
/../ -> dot segment being processed
/%2e/ -> dot segment being processed
/a%2E/ -> dot being percent decoded to /a./

In general:

unreserved and alphanumeric should be percent decoded
reserved should be kept as is (no percent decoding, no percent encoding) BUT percent encoded forms should be turned to uppercase
rest should be percent encoded

Then you need to implement dot processing.
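Putting the rules from this comment together, a multi-pass sketch could look like the following. It is written in Python purely for illustration and is neither the Lua implementation in kong/tools/uri.lua nor necessarily the behavior that was eventually merged. All function names are hypothetical, the reserved set is copied from the definitions in this comment, and the path is assumed to be absolute.

```python
import re
import string

UNRESERVED = frozenset(string.ascii_letters + string.digits + "-._~")
# Reserved set as listed in the definitions of this comment.
RESERVED = frozenset("!*'();:@&=+$,/?%#[]")

# Either a percent-escape (group 1 = hex digits) or any single character.
TOKEN = re.compile(r"%([0-9A-Fa-f]{2})|(.)", re.DOTALL)


def normalize_chars(path: str) -> str:
    out = []
    for match in TOKEN.finditer(path):
        hex_pair, raw = match.group(1), match.group(2)
        if hex_pair is not None:
            char = chr(int(hex_pair, 16))
            # Decode unreserved/alphanumeric; keep the rest, uppercased.
            out.append(char if char in UNRESERVED else "%" + hex_pair.upper())
        elif raw in UNRESERVED or raw in RESERVED:
            out.append(raw)  # kept as is
        else:
            # "Other" characters are percent-encoded as UTF-8 bytes.
            out.append("".join(f"%{byte:02X}" for byte in raw.encode("utf-8")))
    return "".join(out)


def remove_dot_segments(path: str) -> str:
    parts = path.split("/")[1:]  # assumes a leading "/"
    kept = []
    for index, segment in enumerate(parts):
        is_last = index == len(parts) - 1
        if segment == ".":
            if is_last:
                kept.append("")
        elif segment == "..":
            if kept:
                kept.pop()
            if is_last:
                kept.append("")
        else:
            kept.append(segment)
    return "/" + "/".join(kept)


def normalize(path: str) -> str:
    path = normalize_chars(path)        # %2e becomes ".", %2f stays %2F
    path = re.sub(r"/{2,}", "/", path)  # merge literal slashes
    return remove_dot_segments(path)    # dots are literal after pass one


print(normalize("/ä/a/%2e./a%2E//a/%2e/./a/../a/%2e%2E/%5f%99%af%2f%2F"))
# /%C3%A4/a./a/_%99%AF%2F%2F
```

Doing the character pass first means the dot pass only has to look for literal "." and "..", and an encoded slash can never be mistaken for a segment separator.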

Current issues with kong.tools.uri.normalize (as a generic tool):

it over percent decodes (basically it percent decodes everything)
it does not percent encode

kong/tools/uri.lua

That is not a very reliable or sound way to support percent-encoding in regexes. We chose to tell users that there is a normalized (standard) form to match against, so there is no ambiguity. #8140 (comment) fix CT-344

bungle

I looked at it, and it looks good to me. @flrgh, do you agree?

kong/tools/uri.lua

flrgh

Code-wise, I reviewed and left a couple nitpicks. The table.new thing should probably be fixed, but the other comment on chars_to_decode is mostly just a readability gripe, so not a blocker if anyone disagrees with me about it.

Behavior-wise, there's been a whole bunch of activity on the path handling/normalization discussion since I last participated, so as long as everyone else here is in agreement about this change, it gets my 👍.

This is an alternative to PR #8139 where we actually fix the normalization function to not do excessive percent-decoding on normalization.

We decided to decode "others" like the unreserved ones. Therefore we have a better interface for regex.

bungle added the pr/discussion label Dec 2, 2021

bungle requested a review from dndx December 2, 2021 11:06

bungle mentioned this pull request Dec 2, 2021

feat(pdk) add kong.request.get_normalized_path #8129

Closed

bungle force-pushed the fix/uri-normalize branch from 0433fe2 to 6769e6c Compare December 2, 2021 11:08

bungle requested a review from javierguerragiraldez December 2, 2021 11:08

bungle mentioned this pull request Dec 2, 2021

fix(router) properly escape uri captures #8139

Closed

bungle force-pushed the fix/uri-normalize branch 3 times, most recently from 3fc077f to a9c33a4 Compare December 2, 2021 15:42

bungle added core/router pr/please review and removed pr/discussion labels Dec 2, 2021

bungle added this to the 2.7 milestone Dec 3, 2021

bungle force-pushed the fix/uri-normalize branch from a9c33a4 to dbe121f Compare December 7, 2021 17:31

kikito removed this from the 2.7 milestone Dec 9, 2021

bungle force-pushed the fix/uri-normalize branch from dbe121f to 2ccc284 Compare December 20, 2021 11:22

bungle force-pushed the fix/uri-normalize branch from 2ccc284 to e1f9ac7 Compare January 3, 2022 13:03

bungle force-pushed the fix/uri-normalize branch from e1f9ac7 to 2e5533a Compare January 18, 2022 13:27

dndx force-pushed the fix/uri-normalize branch from 2e5533a to e1acee1 Compare January 24, 2022 07:39

github-actions bot added core/proxy and removed core/router labels Jan 24, 2022

dndx force-pushed the fix/uri-normalize branch 2 times, most recently from da9e1c0 to adc751d Compare January 24, 2022 07:57

dndx requested a review from flrgh January 24, 2022 08:04

kikito previously approved these changes Jan 25, 2022

kikito mentioned this pull request Jan 26, 2022

IP normalization issues from #3679 #4311

Closed

StarlightIbuki removed the pr/do not merge label Jul 6, 2022

dndx reviewed Jul 6, 2022

StarlightIbuki requested a review from dndx July 7, 2022 02:49

StarlightIbuki self-requested a review July 12, 2022 03:31

chronolaw approved these changes Jul 14, 2022

StarlightIbuki force-pushed the fix/uri-normalize branch 2 times, most recently from e63b391 to e352d77 Compare July 18, 2022 08:34

bungle commented Jul 19, 2022

StarlightIbuki changed the title ~~fix(tools.uri) normalization function excessive percent decoding~~ fix(tools.uri) normalization decodes as much as possible Jul 19, 2022

StarlightIbuki approved these changes Jul 19, 2022

flrgh reviewed Jul 21, 2022

flrgh reviewed Jul 21, 2022

flrgh reviewed Jul 21, 2022

bungle and others added 11 commits July 22, 2022 10:37

fix(tools.uri) normalization function excessive percent decoding

ef76b1e

This is an alternative to PR #8139 where we actually fix the normalization function to not do excessive percent-decoding on normalization.

do not decode non-unreserved characters

3744780

add tests for %20

547b3f3

fix(utils) escape should return only

358fd27

fix(core) handling of others for normalization

f88a7d2

We decided to decode "others" like the unreserved ones. Therefore we have a better interface for regex.

restore the name of the tests

3bb6fcf

style

4d1be6d

apply suggestions

decee96

adapt to new api

ce57c70

fix test

d1e58a7

apply suggestions

0134e72

StarlightIbuki force-pushed the fix/uri-normalize branch from e352d77 to 0134e72 Compare July 22, 2022 02:53

fffonion merged commit b7c082e into master Jul 26, 2022

fffonion deleted the fix/uri-normalize branch July 26, 2022 03:58

kikito mentioned this pull request Feb 9, 2023

feat(router) add decode_uri_captures parameter #8064

Closed
