Optimize queries using regex matchers for set lookups #602

naivewong · 2019-05-14T04:24:54Z

fixes prometheus/prometheus#2651

Signed-off-by: naivewong <[email protected]>

brian-brazil · 2019-05-14T08:35:47Z

You need to check for and exclude the wrapper we put around the regex the user specifies.

naivewong · 2019-05-14T08:44:12Z

Oh yes, I forgot to exclude the wrapper. But sorry, what did you mean by "check for"?

naivewong · 2019-05-14T08:47:58Z

Do you mean that we exclude the wrapper only if we find it?

brian-brazil · 2019-05-14T08:53:41Z

If the wrapper is missing, treat it as a regex.
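A minimal sketch of the check being discussed, assuming the wrapper has the `^(?:` ... `)$` form seen in the benchmark patterns later in this thread. The name `unwrap` and the constants are hypothetical and are not taken from the PR:

```go
package main

import (
	"fmt"
	"strings"
)

// Assumed wrapper, based on patterns such as "^(?:1|2|3)$" in this thread.
const (
	wrapPrefix = "^(?:"
	wrapSuffix = ")$"
)

// unwrap returns the user-supplied part of pattern and true when the
// wrapper is present. When the wrapper is missing it returns false, and
// the caller should fall back to treating the pattern as a plain regex.
func unwrap(pattern string) (string, bool) {
	if len(pattern) < len(wrapPrefix)+len(wrapSuffix) ||
		!strings.HasPrefix(pattern, wrapPrefix) ||
		!strings.HasSuffix(pattern, wrapSuffix) {
		return "", false
	}
	return pattern[len(wrapPrefix) : len(pattern)-len(wrapSuffix)], true
}

func main() {
	for _, p := range []string{"^(?:1|2|3)$", "1|2|3"} {
		inner, ok := unwrap(p)
		fmt.Printf("%q -> inner=%q wrapped=%v\n", p, inner, ok)
	}
}
```

Checking for the wrapper before slicing avoids cutting characters off a pattern that was never wrapped.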

Signed-off-by: naivewong <[email protected]>

brian-brazil · 2019-05-14T11:16:54Z

That looks about right. Can you add it to the postings benchmark too?

I'd be interested to know if there's a point where it's better to use the regex rather than the set.

gouthamve · 2019-05-14T12:20:21Z

> I'd be interested to know if there's a point where it's better to use the regex rather than the set.

I don't think that'll ever be the case: with a regex, we first get all the matching label values (the set) and then do what the set matcher does anyway. With the set matcher, we avoid a step.

brian-brazil · 2019-05-14T12:33:58Z

At some point, us doing parsing of the strings might be slower than the regex. Would be good to check one way or the other.

bboreham · 2019-05-14T12:58:15Z

What's the motivation for only allowing escaping of special characters? What if I want to match \t or \n in a label value?

codesome · 2019-05-14T13:05:42Z

Adding to @bboreham's comment, isn't it enough to just include any character that is escaped, rather than checking for a special character or '\\'?

Signed-off-by: naivewong <[email protected]>

naivewong · 2019-05-15T03:38:00Z

Benchmark

BenchmarkSetMatcher/SetMatch,nSeries=15,pattern="^(?:1|2|3)$"-8         	  300000	      4579 ns/op	    3425 B/op	      59 allocs/op
BenchmarkSetMatcher/RegexMatch,nSeries=15,pattern="1|2|3"-8             	  200000	      6349 ns/op	    2836 B/op	      64 allocs/op
BenchmarkSetMatcher/SetMatch,nSeries=15,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-8         	  200000	      8249 ns/op	    5443 B/op	      96 allocs/op
BenchmarkSetMatcher/RegexMatch,nSeries=15,pattern="1|2|3|4|5|6|7|8|9|10"-8             	  200000	      8844 ns/op	    4208 B/op	      81 allocs/op
BenchmarkSetMatcher/SetMatch,nSeries=200,pattern="^(?:1|2|3)$"-8                       	  300000	      5011 ns/op	    3425 B/op	      59 allocs/op
BenchmarkSetMatcher/RegexMatch,nSeries=200,pattern="1|2|3"-8                           	   20000	     70441 ns/op	   31928 B/op	     545 allocs/op
BenchmarkSetMatcher/SetMatch,nSeries=200,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-8        	  200000	      9755 ns/op	    5546 B/op	      96 allocs/op
BenchmarkSetMatcher/RegexMatch,nSeries=200,pattern="1|2|3|4|5|6|7|8|9|10"-8            	   20000	     83923 ns/op	   35351 B/op	     644 allocs/op

codesome

Can you also add a test case for >1 blocks and get the benchmark results? Your benchmark already allows for this, and one such test case should be enough.

querier_test.go

Signed-off-by: naivewong <[email protected]>

naivewong · 2019-05-15T08:05:32Z

benchmark                                                                   old ns/op     new ns/op     delta
BenchmarkSetMatcher/nSeries=15,pattern="^(?:1|2|3)$"-8                      6512          4461          -31.50%
BenchmarkSetMatcher/nSeries=15,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-8       10420         8136          -21.92%
BenchmarkSetMatcher/nSeries=1000,pattern="^(?:1|2|3)$"-8                    346047        234610        -32.20%
BenchmarkSetMatcher/nSeries=1000,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-8     429347        273327        -36.34%

benchmark                                                                   old allocs     new allocs     delta
BenchmarkSetMatcher/nSeries=15,pattern="^(?:1|2|3)$"-8                      72             59             -18.06%
BenchmarkSetMatcher/nSeries=15,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-8       95             96             +1.05%
BenchmarkSetMatcher/nSeries=1000,pattern="^(?:1|2|3)$"-8                    2290           1330           -41.92%
BenchmarkSetMatcher/nSeries=1000,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-8     2655           1975           -25.61%

benchmark                                                                   old bytes     new bytes     delta
BenchmarkSetMatcher/nSeries=15,pattern="^(?:1|2|3)$"-8                      3657          3425          -6.34%
BenchmarkSetMatcher/nSeries=15,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-8       5478          5445          -0.60%
BenchmarkSetMatcher/nSeries=1000,pattern="^(?:1|2|3)$"-8                    105927        90080         -14.96%
BenchmarkSetMatcher/nSeries=1000,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-8     130824        118979        -9.05%

Signed-off-by: naivewong <[email protected]>

brian-brazil · 2019-05-15T08:16:13Z

What about 100k to 1M series? It's the bigger data sizes where the wins are; what you've tested is already fast.

gouthamve · 2019-05-15T08:18:31Z

Also a case where there is only one value for the label, but 15 options in the set matcher. Maybe that will tell us if the set matcher is always faster or not.

Signed-off-by: naivewong <[email protected]>

naivewong · 2019-05-15T12:27:01Z

benchmark                                                                               old ns/op     new ns/op     delta
BenchmarkSetMatcher/nSeries=1,nBlocks=1,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-8          5937          8446          +42.26%
BenchmarkSetMatcher/nSeries=15,nBlocks=1,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-8         12308         8428          -31.52%
BenchmarkSetMatcher/nSeries=15,nBlocks=1,pattern="^(?:1|2|3)$"-8                        8016          4653          -41.95%
BenchmarkSetMatcher/nSeries=1000,nBlocks=20,pattern="^(?:1|2|3)$"-8                     522961        207214        -60.38%
BenchmarkSetMatcher/nSeries=1000,nBlocks=20,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-8      651767        303631        -53.41%
BenchmarkSetMatcher/nSeries=100000,nBlocks=1,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-8     27351         13267         -51.49%
BenchmarkSetMatcher/nSeries=500000,nBlocks=1,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-8     42813         31066         -27.44%

benchmark                                                                               old allocs     new allocs     delta
BenchmarkSetMatcher/nSeries=1,nBlocks=1,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-8          50             96             +92.00%
BenchmarkSetMatcher/nSeries=15,nBlocks=1,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-8         95             96             +1.05%
BenchmarkSetMatcher/nSeries=15,nBlocks=1,pattern="^(?:1|2|3)$"-8                        72             59             -18.06%
BenchmarkSetMatcher/nSeries=1000,nBlocks=20,pattern="^(?:1|2|3)$"-8                     3290           1330           -59.57%
BenchmarkSetMatcher/nSeries=1000,nBlocks=20,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-8      3655           1975           -45.96%
BenchmarkSetMatcher/nSeries=100000,nBlocks=1,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-8     180            96             -46.67%
BenchmarkSetMatcher/nSeries=500000,nBlocks=1,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-8     180            96             -46.67%

benchmark                                                                               old bytes     new bytes     delta
BenchmarkSetMatcher/nSeries=1,nBlocks=1,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-8          3527          5209          +47.69%
BenchmarkSetMatcher/nSeries=15,nBlocks=1,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-8         5476          5436          -0.73%
BenchmarkSetMatcher/nSeries=15,nBlocks=1,pattern="^(?:1|2|3)$"-8                        3657          3425          -6.34%
BenchmarkSetMatcher/nSeries=1000,nBlocks=20,pattern="^(?:1|2|3)$"-8                     121933        90080         -26.12%
BenchmarkSetMatcher/nSeries=1000,nBlocks=20,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-8      146805        118994        -18.94%
BenchmarkSetMatcher/nSeries=100000,nBlocks=1,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-8     6744          5353          -20.63%
BenchmarkSetMatcher/nSeries=500000,nBlocks=1,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-8     6745          5352          -20.65%

gouthamve · 2019-05-15T12:43:16Z

This looks good so far!

Can you also add a nBlocks=10 version for BenchmarkSetMatcher/nSeries=500000,nBlocks=1,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-8?

Signed-off-by: naivewong <[email protected]>

naivewong · 2019-05-15T13:26:38Z

I'm unable to show you the results, because the previous run was already at the limit of my MacBook Pro.

krasi-georgiev · 2019-05-15T13:31:22Z

Pinged you on IRC to give you access to a big machine where you can run these tests quickly.

querier_test.go

Signed-off-by: naivewong <[email protected]>

naivewong · 2019-05-16T05:49:09Z

benchmark                                                                                                     old ns/op      new ns/op     delta
BenchmarkSetMatcher/nSeries=1,nBlocks=1,cardinality=100,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-48               7499           11422         +52.31%
BenchmarkSetMatcher/nSeries=15,nBlocks=1,cardinality=100,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-48              15715          12522         -20.32%
BenchmarkSetMatcher/nSeries=15,nBlocks=1,cardinality=100,pattern="^(?:1|2|3)$"-48                             9715           6789          -30.12%
BenchmarkSetMatcher/nSeries=1000,nBlocks=20,cardinality=100,pattern="^(?:1|2|3)$"-48                          731441         369395        -49.50%
BenchmarkSetMatcher/nSeries=1000,nBlocks=20,cardinality=100,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-48           951066         466389        -50.96%
BenchmarkSetMatcher/nSeries=100000,nBlocks=1,cardinality=100000,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-48       24648679       16168         -99.93%
BenchmarkSetMatcher/nSeries=500000,nBlocks=1,cardinality=500000,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-48       127604669      12485         -99.99%
BenchmarkSetMatcher/nSeries=500000,nBlocks=10,cardinality=500000,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-48      1222647955     206051        -99.98%
BenchmarkSetMatcher/nSeries=1000000,nBlocks=1,cardinality=1000000,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-48     260974613      15219         -99.99%

benchmark                                                                                                     old allocs     new allocs     delta
BenchmarkSetMatcher/nSeries=1,nBlocks=1,cardinality=100,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-48               50             96             +92.00%
BenchmarkSetMatcher/nSeries=15,nBlocks=1,cardinality=100,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-48              95             96             +1.05%
BenchmarkSetMatcher/nSeries=15,nBlocks=1,cardinality=100,pattern="^(?:1|2|3)$"-48                             72             59             -18.06%
BenchmarkSetMatcher/nSeries=1000,nBlocks=20,cardinality=100,pattern="^(?:1|2|3)$"-48                          3290           1330           -59.57%
BenchmarkSetMatcher/nSeries=1000,nBlocks=20,cardinality=100,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-48           3655           1975           -45.96%
BenchmarkSetMatcher/nSeries=100000,nBlocks=1,cardinality=100000,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-48       100080         96             -99.90%
BenchmarkSetMatcher/nSeries=500000,nBlocks=1,cardinality=500000,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-48       500080         96             -99.98%
BenchmarkSetMatcher/nSeries=500000,nBlocks=10,cardinality=500000,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-48      5000765        919            -99.98%
BenchmarkSetMatcher/nSeries=1000000,nBlocks=1,cardinality=1000000,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-48     1000080        96             -99.99%

benchmark                                                                                                     old bytes     new bytes     delta
BenchmarkSetMatcher/nSeries=1,nBlocks=1,cardinality=100,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-48               3542          5227          +47.57%
BenchmarkSetMatcher/nSeries=15,nBlocks=1,cardinality=100,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-48              5486          5438          -0.87%
BenchmarkSetMatcher/nSeries=15,nBlocks=1,cardinality=100,pattern="^(?:1|2|3)$"-48                             3663          3430          -6.36%
BenchmarkSetMatcher/nSeries=1000,nBlocks=20,cardinality=100,pattern="^(?:1|2|3)$"-48                          122103        90211         -26.12%
BenchmarkSetMatcher/nSeries=1000,nBlocks=20,cardinality=100,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-48           146976        119136        -18.94%
BenchmarkSetMatcher/nSeries=100000,nBlocks=1,cardinality=100000,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-48       1605359       5353          -99.67%
BenchmarkSetMatcher/nSeries=500000,nBlocks=1,cardinality=500000,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-48       8006583       5352          -99.93%
BenchmarkSetMatcher/nSeries=500000,nBlocks=10,cardinality=500000,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-48      80087445      54573         -99.93%
BenchmarkSetMatcher/nSeries=1000000,nBlocks=1,cardinality=1000000,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-48     16007303      5352          -99.97%

querier_test.go

brian-brazil · 2019-05-16T08:53:36Z

Those results look more like it.

codesome · 2019-05-16T08:55:08Z

Gains at higher cardinality are impressive!

Signed-off-by: naivewong <[email protected]>

querier.go

Signed-off-by: naivewong <[email protected]>

brian-brazil · 2019-05-16T14:58:42Z

Just had a thought: we should have some unit tests to cover the case where one of the values in the regex is the empty string.

Signed-off-by: naivewong <[email protected]>

bboreham · 2019-05-17T06:56:51Z

querier.go

+	for i := 4; i < len(pattern)-2; i++ {
+		if escaped {
+			switch {
+			case isRegexMetaCharacter(pattern[i]):


Why is this check necessary?
Why not just allow every escaped character? (Or perhaps disallow \0... since that’s complicated).

naivewong · 2019-05-17

@brian-brazil Should I include the cases above?

brian-brazil · 2019-05-17

Sure, though check what Go does for escaping characters that don't need to be escaped.

bboreham · 2019-05-17

I'm sorry, I was thinking about the wrong language.
The set of escapes we care about here is documented at https://github.com/google/re2/wiki/Syntax and is painfully complicated.
There are lots of escapes we don't want to accept, like \g, \p, \x, so you do need something like what you have.

naivewong · 2019-05-17

I have rechecked: the special characters you meant, like \n and \a, are actually already handled in the else {} part of findSetMatches. What I detect here are the escaped regexp special characters such as \\. and \\+; that is, after I find \\, I check whether the next character is special.
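The escape handling debated above can be sketched as follows. This is an illustrative, conservative version and not the merged `findSetMatches`: the names `setValues` and `isMeta` are hypothetical, the metacharacter list is assumed to be the one escaped by Go's `regexp.QuoteMeta`, and the input is assumed to have the `^(?:` ... `)$` wrapper already removed. Any escape other than an escaped metacharacter makes it give up, so the caller would fall back to a normal regex match:

```go
package main

import (
	"fmt"
	"strings"
)

// isMeta reports whether c is a regex metacharacter. The list is an
// assumption: the characters that regexp.QuoteMeta escapes.
func isMeta(c byte) bool {
	return strings.IndexByte(`\.+*?()|[]{}^$`, c) >= 0
}

// setValues splits inner on unescaped '|' and returns the literal values.
// It returns nil when inner is not a plain set of literals.
func setValues(inner string) []string {
	var values []string
	var cur []byte
	escaped := false
	for i := 0; i < len(inner); i++ {
		c := inner[i]
		switch {
		case escaped:
			// Only an escaped metacharacter is a literal here; escapes
			// such as \d, \p or \x are rejected.
			if !isMeta(c) {
				return nil
			}
			cur = append(cur, c)
			escaped = false
		case c == '\\':
			escaped = true
		case c == '|':
			values = append(values, string(cur))
			cur = cur[:0]
		case isMeta(c):
			// An unescaped metacharacter means this is a real regex.
			return nil
		default:
			cur = append(cur, c)
		}
	}
	if escaped {
		return nil // trailing backslash
	}
	return append(values, string(cur))
}

func main() {
	for _, s := range []string{`foo|bar`, `a\.b|c`, `foo.*`, `a\db`, `x|`} {
		fmt.Printf("%q -> %q\n", s, setValues(s))
	}
}
```

Note that an empty alternative, as in `x|`, yields an empty string value, which is the case the unit tests requested earlier in the thread are meant to cover.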

Signed-off-by: naivewong <[email protected]>

codesome

LGTM

block_test.go

querier.go

brian-brazil · 2019-05-27T09:29:27Z

👍

gouthamve

Looks good! Good work!

naivewong added 4 commits May 13, 2019 21:53

Original version of the set optimization

d249d1d

Signed-off-by: naivewong <[email protected]>

simple set matcher

32080de

Signed-off-by: naivewong <[email protected]>

simple set matcher

3f60bf8

Signed-off-by: naivewong <[email protected]>

update

02dfa44

Signed-off-by: naivewong <[email protected]>

krasi-georgiev requested review from gouthamve and codesome May 14, 2019 09:24

update

842e2a4

Signed-off-by: naivewong <[email protected]>

naivewong added 3 commits May 15, 2019 11:09

add benchmark

1e3cf63

Signed-off-by: naivewong <[email protected]>

update

4a66e34

Signed-off-by: naivewong <[email protected]>

update

87c3186

Signed-off-by: naivewong <[email protected]>

codesome reviewed May 15, 2019

update benchmark

017871f

Signed-off-by: naivewong <[email protected]>

update

b81e86b

Signed-off-by: naivewong <[email protected]>

update benchmark

0f88eed

Signed-off-by: naivewong <[email protected]>

update benchmark

81cb028

Signed-off-by: naivewong <[email protected]>

brian-brazil reviewed May 15, 2019

naivewong added 2 commits May 16, 2019 11:40

update benchmark

954df56

Signed-off-by: naivewong <[email protected]>

update

c929d7a

Signed-off-by: naivewong <[email protected]>

krasi-georgiev reviewed May 16, 2019

update

959158e

Signed-off-by: naivewong <[email protected]>

juliusv reviewed May 16, 2019

update

591ae7c

Signed-off-by: naivewong <[email protected]>

update

2749e56

Signed-off-by: naivewong <[email protected]>

bboreham reviewed May 17, 2019

naivewong added 2 commits May 17, 2019 22:25

use genSeries from prometheus-junkyard#467

d30f3a2

Signed-off-by: naivewong <[email protected]>

update

6049a19

Signed-off-by: naivewong <[email protected]>

codesome reviewed May 27, 2019

gouthamve approved these changes May 27, 2019

codesome merged commit 13c80a5 into prometheus-junkyard:master May 27, 2019

codesome mentioned this pull request Jan 2, 2020

Fix findSetMatch for regex query prometheus/prometheus#6540

Merged

cstyan mentioned this pull request Jan 16, 2020

Optimize for regexp matchers in ingesters cortexproject/cortex#1993

Closed

roidelapluie mentioned this pull request Feb 8, 2020

[ENHANCEMENT] remote storage: When the matcher does not contain regular expressions, convert the value of type from RE to EQ prometheus/prometheus#6789

Closed

pstibrany mentioned this pull request Apr 16, 2020

store: Added regex-set optimization to ExpandedPostings thanos-io/thanos#2450

Merged
