-
Notifications
You must be signed in to change notification settings - Fork 179
Optimize queries using regex matchers for set lookups #602
Conversation
Signed-off-by: naivewong <[email protected]>
Signed-off-by: naivewong <[email protected]>
Signed-off-by: naivewong <[email protected]>
Signed-off-by: naivewong <[email protected]>
You need to check for and exclude the wrapper we put around the regex the user specifies. |
Oh yes, I forgot to exclude the wrapper. But sorry what did you mean by "check for"? |
Is that we exclude the wrapper only if we find it? |
If the wrapper is missing, treat it as a regex. |
Signed-off-by: naivewong <[email protected]>
That looks about right. Can you add it to the postings benchmark too? I'd be interested to know if there's a point where it's better to use the regex rather than the set. |
I don't think that'll ever be the case, as with regex, first we'll get all the matching label values (the set) and doing what the set matcher does anyways. With set, we'll be avoiding a step. |
At some point, us doing parsing of the strings might be slower than the regex. Would be good to check one way or the other. |
What's the motivation for only allowing escaping of special characters? What if I want to match |
Adding to @bboreham's comment, isn't it enough to just include any character that is escaped rather than checking for special character or |
Signed-off-by: naivewong <[email protected]>
Signed-off-by: naivewong <[email protected]>
Signed-off-by: naivewong <[email protected]>
Benchmark
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also add test case for >1 blocks and get the benchmark results, which your benchmark already allows? One such test case should be enough.
Signed-off-by: naivewong <[email protected]>
|
Signed-off-by: naivewong <[email protected]>
What about 100k to 1M series? It's the bigger data sizes where the wins are, what you've tested are already fast. |
Also a case where there is only one value for the label, but 15 options in the set matcher. Maybe that will tell us if the set matcher is always faster or not. |
Signed-off-by: naivewong <[email protected]>
|
This looks good so far! Can you also add a |
Signed-off-by: naivewong <[email protected]>
Unable to show you the results because the previous one was already the limit of my MBP. |
pinged you on irc to give you access to a big machine where you can run these test quickly. |
Signed-off-by: naivewong <[email protected]>
Signed-off-by: naivewong <[email protected]>
|
Those results look more like it. |
Gains at higher cardinality are impressive! |
Signed-off-by: naivewong <[email protected]>
Signed-off-by: naivewong <[email protected]>
Just had a thought there that we should have some unittests to cover when one of the values in the regex is the empty string. |
Signed-off-by: naivewong <[email protected]>
for i := 4; i < len(pattern)-2; i++ { | ||
if escaped { | ||
switch { | ||
case isRegexMetaCharacter(pattern[i]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this check necessary?
Why not just allow every escaped character? (Or perhaps disallow \0...
since that’s complicated).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@brian-brazil Should I include the cases above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, though check what Go does for escaping characters that don't need to be escaped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry, I was thinking about the wrong language.
The set of escapes we care about here is documented at https://github.com/google/re2/wiki/Syntax and is painfully complicated.
There are lots of escapes we don't want to accept, like \g
, \p
, \x
, so you do need something like what you have.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have rechecked, the special characters you meant like \n
, \a
are actually already included in the else {}
part of findSetMatches
. What I detect here are the special characters like \\.
, \\+
in regexp, which means after I find \\
, I determine if the next char is special.
Signed-off-by: naivewong <[email protected]>
Signed-off-by: naivewong <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Good work!
fixes prometheus/prometheus#2651