Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

header: New HeaderMatcher and StringMatcher type - Contains #12623

Merged
merged 7 commits into from
Aug 21, 2020
Merged

header: New HeaderMatcher and StringMatcher type - Contains #12623

merged 7 commits into from
Aug 21, 2020

Conversation

shivanshu21
Copy link

Fixes: 12590

Signed-off-by: Shivanshu Goswami [email protected]

For an explanation of how to fill out the fields, please see the relevant section
in PULL_REQUESTS.md

Commit Message: header: New HeaderMatcher type - Contains
Additional Description: For matching values in the header that might be somewhere in the middle of the header, the present option is to use Regex in the form .Search-Pattern.. This can cause catastrophic backtracking as described in #7728
As a solution, I have introduced another header match type called contains which is based on absl::StrContains().

Risk Level: Low
Testing: Unit tests are included and manual testing was performed.
Docs Changes:
Release Notes:
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Deprecated:]

@repokitteh-read-only
Copy link

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to (api/envoy[\w/]*/(v1alpha\d?|v1|v2alpha\d?|v2))|(api/envoy/type/(matcher/)?\w+.proto).
CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/envoy/.
CC @envoyproxy/api-watchers: FYI only for changes made to api/envoy/.

🐱

Caused by: #12623 was opened by shivanshu21.

see: more, trace.

@shivanshu21
Copy link
Author

Tests passing in local

shigoswami@LM-SJL-11016036 ~/gitcode/envoy: /private/vbazel test -c opt //test/common/http:header_utility_test
INFO: Analyzed target //test/common/http:header_utility_test (0 packages loaded, 0 targets configured).
INFO: Found 1 test target...
Target //test/common/http:header_utility_test up-to-date:
  bazel-bin/test/common/http/header_utility_test
INFO: Elapsed time: 14.771s, Critical Path: 13.52s
INFO: 3 processes: 3 darwin-sandbox.
INFO: Build completed successfully, 4 total actions
//test/common/http:header_utility_test                                   PASSED in 0.1s

Executed 1 out of 1 test: 1 test passes.
INFO: Build completed successfully, 4 total actions
shigoswami@LM-SJL-11016036 ~/gitcode/envoy: 

Comment on lines 1571 to 1579

// If specified, header match will be performed based on whether the header value contains
// the given value or not.
// Note: empty contains match is not allowed, please use present_match instead.
//
// Examples:
//
// * The value *abcd* matches the value *xyzabcdpqr*, but not for *xyzbcdpqr*.
string contains_match = 12 [(validate.rules).string = {min_bytes: 1}];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

v2 is deprecated and freezed, no change is allowed in v2.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed v2 api changes.

@@ -1568,6 +1568,15 @@ message HeaderMatcher {
//
// * The suffix *abcd* matches the value *xyzabcd*, but not for *xyzbcd*.
string suffix_match = 10 [(validate.rules).string = {min_bytes: 1}];

// If specified, header match will be performed based on whether the header value contains
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we just want to add support general globs, rather than special casing this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@htuch This is being added for issue opened here: #12590

This is no more a special case than suffix-match or present-match. @htuch and @mattklein123 Later today I will add the same matcher for StringMatcher as well. Please let me know if we want to proceed this way.

Is there another PR or design doc available for wildcard based matches? Regex matching with std::regex can cause catastrophic backtracking and occasional crashes. Is there such a possibility with glob/wildcard matching too? Substring matches are O(n) as they are based on KMP.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@htuch I did a little more digging. I see that the Xray tracer extension has a wildcard match:
https://github.com/envoyproxy/envoy/pull/8526/files#diff-620892d553c2be71cf26761d0c9db408R26

I am still breaking it down but right now it looks like it supports ? and * which is good enough for our case.

Do you want just these two wildcards or a full set? If you're okay with this I can refactor the code a little to reuse the same matcher.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we would want a syntax more like #7763. Maybe we should go with a simple contains for now.

@htuch
Copy link
Member

htuch commented Aug 14, 2020

@shivanshu21 one other question, do you get catastophic backtracing with RE2 for this case? Do you know what the performance penalty of using RE2 vs. a strcontains operation is here?

@shivanshu21
Copy link
Author

@shivanshu21 one other question, do you get catastophic backtracing with RE2 for this case? Do you know what the performance penalty of using RE2 vs. a strcontains operation is here?

@htuch We haven't tried with Safe Regex as the definition says this:

(type.matcher.v3.RegexMatcher) If specified, this regex string is a regular expression rule which implies the entire request header value must match the regex. The rule will not match if only a subsequence of the request header value matches the regex.

The rule will not match if only a subsequence of the header matches the regex. So looks like strContains() case won't be handled here.

We have however tried comparing deprecated regex to RE2 and RE2 outperforms significantly.

@shivanshu21
Copy link
Author

Matchers test passing in local

shigoswami@LM-SJL-11016036 ~/gitcode/envoy: bazel test -c opt //test/common/common:matchers_test
INFO: Analyzed target //test/common/common:matchers_test (0 packages loaded, 0 targets configured).
INFO: Found 1 test target...
Target //test/common/common:matchers_test up-to-date:
  bazel-bin/test/common/common/matchers_test
INFO: Elapsed time: 12.590s, Critical Path: 11.52s
INFO: 3 processes: 3 darwin-sandbox.
INFO: Build completed successfully, 4 total actions
//test/common/common:matchers_test                                       PASSED in 0.1s

Executed 1 out of 1 test: 1 test passes.
INFO: Build completed successfully, 4 total actions

@shivanshu21 shivanshu21 changed the title [WIP] header: New HeaderMatcher type - Contains header: New HeaderMatcher type - Contains Aug 14, 2020
@shivanshu21 shivanshu21 changed the title header: New HeaderMatcher type - Contains header: New HeaderMatcher and StringMatcher type - Contains Aug 14, 2020
@htuch
Copy link
Member

htuch commented Aug 18, 2020

RE2 should work, as long as you glob at the start and end, e.g. .*<string>.*.

@htuch
Copy link
Member

htuch commented Aug 18, 2020

LGTM, but just want to check with other @envoyproxy/api-shepherds that this makes sense. I think that RE2 is not the right answer for performance here, but do we want to essentially open up arbitrary string predicates as first class API elements?

@shivanshu21
Copy link
Author

Thanks! I will wait for API shepherds to respond before merging.

@shivanshu21
Copy link
Author

@lizan PTAL

@shivanshu21
Copy link
Author

@envoyproxy/api-shepherds , @lizan This PR has been awaiting review. Please take a look.

Copy link
Member

@lizan lizan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

API LGTM

@@ -100,6 +100,10 @@ bool StringMatcherImpl::match(const absl::string_view value) const {
case envoy::type::matcher::v3::StringMatcher::MatchPatternCase::kSuffix:
return matcher_.ignore_case() ? absl::EndsWithIgnoreCase(value, matcher_.suffix())
: absl::EndsWith(value, matcher_.suffix());
case envoy::type::matcher::v3::StringMatcher::MatchPatternCase::kContains:
return matcher_.ignore_case() ? absl::StrContains(absl::AsciiStrToLower(value),
absl::AsciiStrToLower(matcher_.contains()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we cache the lowered matcher_.contains somewhere for this case?

Shivanshu Goswami added 5 commits August 19, 2020 16:57
Fixes: 12590

Signed-off-by: Shivanshu Goswami <[email protected]>
Signed-off-by: Shivanshu Goswami <[email protected]>
Signed-off-by: Shivanshu Goswami <[email protected]>
Signed-off-by: Shivanshu Goswami <[email protected]>
@shivanshu21
Copy link
Author

Matchers and Header utility tests passing

Target //test/common/common:matchers_test up-to-date:
  bazel-bin/test/common/common/matchers_test
INFO: Elapsed time: 346.251s, Critical Path: 70.60s
INFO: 359 processes: 359 darwin-sandbox.
INFO: Build completed successfully, 385 total actions
//test/common/common:matchers_test                                       PASSED in 0.2s

Executed 1 out of 1 test: 1 test passes.
INFO: Build completed successfully, 385 total actions
shigoswami@LM-SJL-11016036 ~/gitcode/envoy:
shigoswami@LM-SJL-11016036 ~/gitcode/envoy: bazel test -c opt //test/common/http:header_utility_test
INFO: Analyzed target //test/common/http:header_utility_test (1 packages loaded, 2 targets configured).
INFO: Found 1 test target...
Target //test/common/http:header_utility_test up-to-date:
  bazel-bin/test/common/http/header_utility_test
INFO: Elapsed time: 17.793s, Critical Path: 17.06s
INFO: 3 processes: 3 darwin-sandbox.
INFO: Build completed successfully, 4 total actions
//test/common/http:header_utility_test                                   PASSED in 0.1s

Executed 1 out of 1 test: 1 test passes.
INFO: Build completed successfully, 4 total actions

@lizan I have added the lowercase string to the class so that it is only computed once. PTAL.

@shivanshu21 shivanshu21 requested a review from lizan August 20, 2020 00:37
@shivanshu21
Copy link
Author

/retest

@repokitteh-read-only
Copy link

Retrying Azure Pipelines, to retry CircleCI checks, use /retest-circle.
Cannot retry non-completed check: envoy-presubmit (Linux-x64 compile_time_options), please wait.

🐱

Caused by: a #12623 (comment) was created by @shivanshu21.

see: more, trace.

Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but can you also add release notes? Thanks.

@@ -13,6 +13,7 @@ Minor Behavior Changes

* compressor: always insert `Vary` headers for compressible resources even if it's decided not to compress a response due to incompatible `Accept-Encoding` value. The `Vary` header needs to be inserted to let a caching proxy in front of Envoy know that the requested resource still can be served with compression applied.
* decompressor: headers-only requests were incorrectly not advertising accept-encoding when configured to do so. This is now fixed.
* http: added a new header matcher type called ``contains`` which matches if the value of the header has the substring mentioned in contains matcher.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually, we link the added field here. In this case, probably:

* matcher: added :ref:`contains <envoy_v3_api_field_type.matcher.v3.StringMatcher.contains>` to ...

Copy link
Member

@dio dio Aug 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And, it seems like you have another one for route as well.

Signed-off-by: Shivanshu Goswami <[email protected]>
@shivanshu21
Copy link
Author

@mattklein123 How do we remove the v2 api blocker labels? Merging is blocked because of that.

@lizan and Htuch have both LGTMed and Lizan has LGTMed for API changes as well.
There are No v2 api changes in this PR.

I have addressed all the comments including adding release notes with proper links.

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with small comment, thanks!

/wait

@@ -16,7 +16,7 @@ option (udpa.annotations.file_status).package_version_status = FROZEN;
// [#protodoc-title: String matcher]

// Specifies the way to match a string.
// [#next-free-field: 7]
// [#next-free-field: 8]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still modifying a v2 API. Please revert this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh! I thought all the v2 apis are in api/envoy/api/v2 folder. Reverting.

@shivanshu21
Copy link
Author

@mattklein123

Addressed the comment. Please take a look and suggest ways to unblock the merge.

@shivanshu21
Copy link
Author

Thanks Lizan! The API shepherds hold is removed.

But owing to this:
Screen Shot 2020-08-21 at 12 13 35 PM

I cannot merge till @mattklein123 clicks this:
Screen Shot 2020-08-21 at 12 13 56 PM

@htuch htuch merged commit e322daa into envoyproxy:master Aug 21, 2020
@shivanshu21 shivanshu21 deleted the contains branch August 21, 2020 20:13
lavignes added a commit to lavignes/envoy that referenced this pull request Aug 24, 2020
* envoy/master: (90 commits)
  cleanup: use structured binding (envoyproxy#12791)
  docs: fix header name for retries in gRPC services (envoyproxy#12790)
  docs: clarify meaning of HeaderValueOption.append (envoyproxy#12792)
  doc: clarify handling of duplicate xDS resource names (envoyproxy#12756)
  Dependencies: build updates. (envoyproxy#12786)
  Ratelimit: Add optional descriptor key to generic_key action (envoyproxy#12734)
  test: refactor header inclusion to speed up building (for test/mocks/upstream:upstream_mocks)  (envoyproxy#12407)
  docs: Fix omitted word (envoyproxy#12782)
  ci: avoid uploading dwp as separate artifact (envoyproxy#12777)
  doc: Fix small typos (envoyproxy#12769)
  fix cache factory category (envoyproxy#12765)
  docs: fix typo v1.15.0.rst (envoyproxy#12680)
  Add clang-cl RBE toolchain for Windows (envoyproxy#12776)
  fuzz: add router fuzz proto (envoyproxy#12727)
  header: New HeaderMatcher and StringMatcher type - Contains (envoyproxy#12623)
  tcp_proxy: use dynamicMetadata() from StreamInfo for load balancing (envoyproxy#12595)
  network: add io handle recv function for http inspector (envoyproxy#12736)
  jwt_authn: supports jwt payload without "iss" field (envoyproxy#12744)
  Add support for nested JSON format in json logging mode (envoyproxy#12602)
  http: fixing a fuzz flake by setting details on connection teardown (envoyproxy#12737)
  ...

Signed-off-by: Scott LaVigne <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants