fix(tools.uri) normalization decodes as much as possible #8140
Hmm. RFCs are always a chore to interpret. It seems like the original mistake was that normalization was implemented with the assumption that IMO instead of over-decoding and re-encoding we should just fix the logic to be more selective in our decoding, according to the chars in section 2.3:
I put a draft/poc of this in another branch (fix/uri-normalize-unreserved 6313dd2). Maybe take a look and tell me what you think?
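For readers following along, here is a minimal Python sketch of what "more selective" decoding could look like: only percent-escapes that map to the unreserved characters of RFC 3986 section 2.3 are decoded, and every other escape is kept with its hex digits uppercased. This is not the code from the draft branch; the function name `decode_unreserved` is hypothetical and used only for illustration.

```python
import re

# Unreserved characters per RFC 3986 section 2.3:
# ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

_PCT = re.compile(r"%([0-9A-Fa-f]{2})")


def decode_unreserved(path: str) -> str:
    """Hypothetical helper: decode only escapes of unreserved characters."""

    def repl(match):
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            # Safe to decode: the decoded form is equivalent per the RFC.
            return char
        # Keep the escape, but normalize the hex digits to uppercase.
        return "%" + match.group(1).upper()

    return _PCT.sub(repl, path)


print(decode_unreserved("/%7Euser/a%2fb%20c"))  # /~user/a%2Fb%20c
```

With this approach an encoded slash (`%2f`) or space (`%20`) stays encoded, so nothing ever needs to be re-encoded afterwards.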
@flrgh yes, we discussed this with @dndx. The thing is that I am not sure which one is better: more selective percent-decoding or not. I am not sure whether there can ever be things that are already percent-decoded in ngx.var.request_uri. Do you? If there could be, then those need to be normalized back to percent-encoded form. A similar question applies to route.paths. Also as this is Also the escaping includes
I have spoken with @bungle and for now we're removing this from the 2.7 milestone
If someone wants to work on this or try different approaches, here is a list of things that need to be taken into account: definitions: dot processing: merge slashes (this is generally thought to be a good thing to do, but it might change semantics in some cases): given this: should normalize (I think) to this: What happens there? In general:
Then you need to implement dot processing. Current issue with
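As a rough illustration of the two steps named above (dot processing and merging slashes), here is a small Python sketch. It assumes the input is an absolute path starting with "/", and the function names are hypothetical; it is not Kong's implementation. The dot handling aims to match the results of `remove_dot_segments` in RFC 3986 section 5.2.4 for such paths.

```python
import re


def merge_slashes(path: str) -> str:
    """Collapse runs of consecutive slashes into a single slash."""
    return re.sub(r"/{2,}", "/", path)


def remove_dot_segments(path: str) -> str:
    """Resolve "." and ".." segments. Assumes an absolute path."""
    segments = path.split("/")
    out = []
    for index, segment in enumerate(segments):
        is_last = index == len(segments) - 1
        if segment == ".":
            if is_last:
                out.append("")  # "/a/." becomes "/a/"
        elif segment == "..":
            if len(out) > 1:  # never pop the leading root segment
                out.pop()
            if is_last:
                out.append("")  # "/a/b/.." becomes "/a/"
        else:
            out.append(segment)
    return "/".join(out)


def normalize_path(path: str) -> str:
    # Assumption: slashes are merged first, then dots are resolved.
    return remove_dot_segments(merge_slashes(path))


print(normalize_path("/a//b/../c"))  # /a/c
```

The order of the two steps matters: without merging first, `remove_dot_segments("/a//b/../c")` gives `/a//c`, because the empty segment between the two slashes counts as a segment of its own. That is one way merging slashes can change semantics.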
It's not a very reliable and sound way to support percent-encoding in regex. We choose to tell users that we have a normalized (standard) form to match with so there's no ambiguity. #8140 (comment) fix CT-344
I looked at it, and it looks good to me. @flrgh, do you agree?
Code-wise, I reviewed and left a couple nitpicks. The table.new thing should probably be fixed, but the other comment on chars_to_decode is mostly just a readability gripe, so not a blocker if anyone disagrees with me about it.
Behavior-wise, there's been a whole bunch of activity on the path handling/normalization discussion since I last participated, so as long as everyone else here is in agreement about this change, it gets my 👍.
This is alternative to PR #8139 where we actually fix the normalization function to not do excessive percent-decoding on normalization.
We decided to decode "others" like the unreserved ones. This gives us a better interface for regex.
We decided to let the normalize function decode URL-encoded strings as much as possible. PLEASE REFER TO: #8140 (comment)
Issues resolved
Fix #7913, FTI-2904
Outdated discussion:
This is alternative to PR #8139 where we actually fix the normalization function to not do excessive percent-decoding on normalization.
When we added normalization in kong.tools.uri.normalize, that function percent-decoded everything except the reserved characters.
That means we percent-decode more than just the so-called Unreserved Characters: ALPHA (%41–%5A and %61–%7A), DIGIT (%30–%39), hyphen (%2D), period (%2E), underscore (%5F), and tilde (%7E).
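To make the difference concrete, here is a hedged Python sketch of the "decode everything except reserved characters" behavior described above. It is an illustration only, not the actual kong.tools.uri.normalize code; the name `decode_except_reserved` is hypothetical, and the reserved set is taken from RFC 3986 section 2.2 (gen-delims plus sub-delims).

```python
import re

# Reserved characters per RFC 3986 section 2.2 (gen-delims + sub-delims).
RESERVED = frozenset(":/?#[]@!$&'()*+,;=")

_PCT = re.compile(rb"%([0-9A-Fa-f]{2})")


def decode_except_reserved(path: str) -> str:
    """Hypothetical helper: decode every escape unless it is a reserved char."""

    def repl(match):
        value = int(match.group(1), 16)
        if chr(value) in RESERVED:
            # Keep reserved characters escaped, with uppercase hex digits.
            return b"%" + match.group(1).upper()
        return bytes([value])

    # Work on bytes so multi-byte UTF-8 sequences decode as one character.
    # Assumption: invalid UTF-8 is replaced rather than raising an error.
    raw = _PCT.sub(repl, path.encode("utf-8"))
    return raw.decode("utf-8", errors="replace")


print(decode_except_reserved("/a%20b%2fc%7E"))  # /a b%2Fc~
```

Note that `%20` is decoded to a literal space even though a space is not an unreserved character. That is the "more than just the unreserved ranges" decoding this paragraph refers to.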
Alternative Implementation: See #8139