Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kong proxy match order #4121

Closed
zhangxing406 opened this issue Dec 21, 2018 · 15 comments
Closed

kong proxy match order #4121

zhangxing406 opened this issue Dec 21, 2018 · 15 comments

Comments

@zhangxing406
Copy link

NOTE: GitHub issues are reserved for bug reports only. For anything else,
please join the conversation in Kong Nation https://discuss.konghq.com/.

Please read the CONTRIBUTING.md guidelines to learn on which channels you can
seek for help and ask general questions:

https://github.com/Kong/kong/blob/master/CONTRIBUTING.md#where-to-seek-for-help

Summary

SUMMARY_GOES_HERE

i configured some routes,but the true matching rule is different from the docs tells. i think docs says the matching rule is:

  1. full uri match
  2. prefix match
  3. regex match
    for example, two rules:
    a: {path:'/api/account'}
    b: {path:'/api/.*'}
    when requesting '/api/account/getAll',it matched b.

that is to say the truth is:

  1. full uri match
  2. regex match
  3. prefix match

i use the default kong config and db type is postgres. hope for your answer.

Additional Details & Logs

@RobSerafini
Copy link
Contributor

The matching can definitely be complex and counter intuitive and will depend on the exact examples you are using. Please review the following discussions and documentation and reply with the exact routes and requests you are using.

https://discuss.konghq.com/t/a-few-words-on-regex-uris-usage-and-common-pitfalls/506

and

https://docs.konghq.com/1.0.x/proxy/#matching-priorities

Regards,
Rob

@bungle bungle added the task/bug label Jan 3, 2019
@bungle
Copy link
Member

bungle commented Jan 3, 2019

I think it is a bug. I will send a PR soon.

@thibaultcha
Copy link
Member

This is not a bug, but a known missing feature. The feature would be the equivalent of the nginx location directive's ^~ modifier, which means "don't evaluate this regex if a previous prefix was already matched. See ngx_http_core_module's location directive docs.

The order of evaluation is:

  1. full paths
  2. prefix paths
  3. regex paths

But in the case of multiple matches, the priority order is:

  1. full paths
  2. regex paths
  3. prefix paths

Of course, the actual order of evaluation is determined by the number of matching properties. So a Route with a host and a path will always be matched, even if it has a prefix path while another route has a plain matching path (but no matching host, that is).

@bungle
Copy link
Member

bungle commented Jan 4, 2019

According to this:
https://docs.konghq.com/1.0.x/proxy/#evaluation-order

Prefix paths are always evaluated before regex paths.

So I guess we need to update the docs? In a meanwhile I corrected this and some other issues in path matching in #4153 (though, now I am not sure should we change it a bit?)

@bungle
Copy link
Member

bungle commented Jan 4, 2019

@thibaultcha, I don't fully understand the:

But in the case of multiple matches, the priority order is

You mean if single route has multiple paths, then we pick regex first to get the captures or something totally different?

@bungle
Copy link
Member

bungle commented Jan 4, 2019

Just a note, this very same issue has previously been discussed in #3111.

@bungle
Copy link
Member

bungle commented Jan 4, 2019

@thibaultcha you described it as:

  1. match plain prefixes by descending order of length
  2. match regexes (NGINX does so in the order they are defined in the file, we do so in creation order in the database)
  3. if a regex match was found in step 2, override the match from step 1

How is it different from:

  1. match regexes
  2. match prefixes

@thibaultcha
Copy link
Member

thibaultcha commented Jan 4, 2019

Prefix paths are always evaluated before regex paths.

This piece is correct. But regex paths are also evaluated after prefix paths and take precedence if there is a match (over the previous prefix match). That is what the ^~ modifier prevents (as discussed in #4153 - thanks for digging there btw). Yes, the docs could be clarified, definitely (as stated by my year old comment in this issue).

You mean if single route has multiple paths, then we pick regex first to get the captures or something totally different?

Given a Route with a prefix path, and another Route with a regex path (and both match the incoming request path), the latter will override the former, even though the former was evaluated first, and was a match. This is what the documentation describes, the order of evaluation is correctly mentioned, but we fail to mention that a regex match will override the prefix match. This is identical to nginx's behavior except that they have the ^~ modifier to avoid this, which we haven't implemented yet, and fell down in the priority list.

How is it different from:

If the order was what you wrote (regexes then prefixes), then it would never be possible to override a regex with a prefix match (i.e. shadow the regex). But with the current evaluation order (as implemented and correctly documented), it is possible to do so, given we implement the ^~ modifier. In practice for a user, the evaluation order appears to be the one you described, but in the implementation, it isn't. And given we implement the modifier, it will become obvious to the user as well. In short, the evaluation order is justified by the existence of the ^~ modifier, which, since it doesn't exist yet (in Kong), makes the current evaluation order seem incorrect. But it isn't.

Now on to the next steps, and implement the ^~ modifier itself: the only difficulty I see is to translate what is an nginx directive modifier to a Kong concept: is it a Route attribute? Then is it shared by all regex paths of a Route? Is it a prefix that we somehow inject before a regex path? Does it look unintuitive, or cumbersome to specify? Etc... While nginx is configured via a text file, we configure Kong via a dynamic RESTful API, and we must port such concepts intelligently.

@hishamhm
Copy link
Contributor

hishamhm commented Jan 4, 2019

@thibaultcha Thank you for the full explanation on the history of how the present status came about.

But I'll play Devil's (or User's?) Advocate a bit. Given that the difference between evaluation order and matching precedence is fully opaque to a Kong user (as it should be; only the second one matters), the users are fully justified in thinking that the "evaluation order" the documentation mentions are in fact the precedence rules (especially since the precedence rules themselves are not distinguished from that), and that either the docs or code are wrong. I'm sure that if I show our proxy docs to 10 people and ask them "given route A with /foo and route B with /f.*, which one will /foobar match?", all of them will answer "A, because prefixes are evaluated first".

If the code is not changing to match their expectations, then the docs should change.

As for the prefix-not-regex functionality, I think the Nginx evaluation order has IMO a weird design that I think we shouldn't expose or replicate (I didn't dig the full history of past Nginx releases, but the ^~ thing looks very tacked-on-after-the-fact). I think the great design questions you raised in your last paragraph on how to present this cleanly in Kong also reflect how awkward the semantics of Nginx are.

In light of all this, moving forward, I believe we should just generalize the regex_priority field into a general priority field — I think this idea has already been floated around. Ideally, it should have some very simple tie-breaking rule when priority is not set explicitly, such as "regexes matched in some stable order first, then prefixes matched longest to shortest next" (which would match our current behavior). This way, a matching rule (prefix or regex) with higher priority would stop evaluation of any rules with lower priority.

Does it look unintuitive, or cumbersome to specify?

I think the explicit priority approach would be the most intuitive, but updating priorities could get cumbersome indeed if somone is ordering each route individually with 1, 2, 3... In practice though, I could see someone using priorities in groups (e.g. setting a bunch of prefix-paths all with priority 100, then some regexes with priority 200, then a fallback / path with priority 999).

This sounds flexible enough. The Nginx docs contain this example to illustrate their semantics:

Let’s illustrate the above by an example:

 location = / {
     [ configuration A ]
 }

 location / {
     [ configuration B ]
 }

 location /documents/ {
     [ configuration C ]
 }

 location ^~ /images/ {
     [ configuration D ]
 }

 location ~* \.(gif|jpg|jpeg)$ {
     [ configuration E ]
 }

The “/” request will match configuration A, the “/index.html” request will match configuration B, the “/documents/document.html” request will match configuration C, the “/images/1.gif” request will match configuration D, and the “/documents/1.jpg” request will match configuration E.

in our case, that would be (modulo regex case-insensitivity)

[ { "paths": ["/$"],                 "priority": 100,  "service": A },
  { "paths": ["/images/"],           "priority": 100,  "service": D },
  { "paths": ["\\.(gif|jpg|jpeg)$"], "priority": 200,  "service": E },
  { "paths": ["/documents/"],        "priority": 200,  "service": C },
  { "paths": ["/"],                  "priority": 200, "service": B } ]

Presentation is central for understandability, so ideally we should list routes sorted by matching priority (including the tie-breaking logic). The example above could use 5 different priority numbers, I just used 2 to illustrate the tie-breaking.

(Side note: we could even optimize any <non-regex-chars>$ regex to translate into a exact-string-match like nginx's = modifier when we build the router, without adding an extra user-visible concept.)

@thibaultcha
Copy link
Member

thibaultcha commented Jan 4, 2019

If the code is not changing to match their expectations, then the docs should change.

Yes, I said so in my above comment: the docs could be clarified around this use-case, definitely. The code should also change, is what I was hoping was transpiring from my above comments as well: the router is still incomplete in that regard (whichever tie-breaker attribute we end up implementing).

In light of all this, moving forward, I believe we should just generalize the regex_priority field into a general priority field — I think this idea has already been floated around.

Indeed. In December, I advocated for this field to be updated to priority prior to the 1.0 release. The bigger rationale for doing so is the recent addition of the sources and destinations matching attributes, which also need a tie-breaking attribute. Therefore, regex_priority isn't only about regexes anymore. With the release of 1.0 behind us, we have no choice but to carry backwards-compatibility code with us for a long time now.

I like the idea of relying solely on the priority attribute for tie-breaking. We have the luxury of having this option, given Kong's configuration interface, vs. that of nginx's configuration files 🎉 However, doing so now may introduce breaking changes in existing routers setup in the wild. First of all, our extensive test suite may catch some of these regressions (and we'll then now to maybe not move forward). If it doesn't, I'd still consider this change risky in a 1.0 world.

@hishamhm
Copy link
Contributor

hishamhm commented Jan 7, 2019

I like the idea of relying solely on the priority attribute for tie-breaking

We always need some criteria for tie-breaking (even if it's created_at) if we want to have a stable order (in the sense of sorting algorithms).

My risk-mitigation suggestion is to have the current behavior (regexes first, prefixes long-to-short next) to be the tie-breaker for priority so we can introduce it in Kong 1.1 with a migration and no breakage. regex_priority can be deprecated and removed in 2.0.)

@thibaultcha
Copy link
Member

By “solely”, I meant without ^~, and not by changing the evaluation order of the Route’s properties.

@javierguerragiraldez
Copy link
Contributor

closing for lack of activity (and probably outdated)

@thibaultcha
Copy link
Member

Don't think this is outdated, but certainly still a missing feature. Shall we reopen but rename it maybe? There's valuable explanations in here.

@reetik-raj
Copy link

Do we have any plans of implementing this feature, or do we continue with kong overriding a prefix match with a regex if both of them match?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants