feat(hogql): bunch of improvements (HogVM part 1) #16274
Merged
Problem
The HogQL local evaluation PR (aka HogVM) is getting out of control, so I'll split it into at least three parts. This is part one: it does a bit of cleanup and adds better regex support.
Changes
- Rename `BinaryOperation` to `ArithmeticOperation`
- Rename `LtE` to `LtEq`, and `LE` to `LT_EQ`
- Instead of `not(in(...))` in the printer, use shorthands like `notIn()` when possible.
- Regex operators `~`, `=~` and `!~`.
- Add the case-insensitive regex operators `~*`, `=~*` and `!~*`. We need these as separate operators for local evaluation. ClickHouse uses `re2` for regex matching, whereas Python and JS use `pcre`. Adding our own `re2` bindings to the HogVM implementations is possible (~200kb of WASM?), but likely not worth it when we want to keep the bundle size down, e.g. for client-side feature flags. Much of the syntax is shared between the two, but a big difference is how you pass flags: `re2` uses `(?i)` inside the regex, whereas `pcre` uses a separate modifiers list (like `/a/i`). Introducing these operators seemed to be the only way to reliably do case-insensitive regex matching both locally and in ClickHouse.
- Remove the `cohort(1)` function, and instead only support the `id in cohort 1` syntax. We're effectively moving from `in(id, cohort(1))` to `inCohort(id, 1)`. Nothing really changes for users; this makes cohort matching easier to implement in HogVM and unlocks further optimisations.

How did you test this code?
Added or updated tests where relevant
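
A minimal sketch of the `not(in(...))` to `notIn()` printer shorthand from the Changes list, assuming a toy AST where a call is just a name plus arguments. The `Call` class, `print_expr` and the shorthand table are hypothetical and not the HogQL printer's actual code; the table only lists a few ClickHouse negated functions (`notIn`, `notLike`, `notILike`, `notEquals`) as examples.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Call:
    """Hypothetical toy AST node: a function call with positional arguments."""
    name: str
    args: List["Expr"]


# A leaf is any already-printed SQL fragment (field name, literal, ...).
Expr = Union[Call, str]

# Hypothetical table: negated call name -> ClickHouse shorthand function.
NEGATED_SHORTHANDS = {
    "in": "notIn",
    "like": "notLike",
    "ilike": "notILike",
    "equals": "notEquals",
}


def print_call(name: str, args: List[Expr]) -> str:
    return f"{name}({', '.join(print_expr(a) for a in args)})"


def print_expr(expr: Expr) -> str:
    if isinstance(expr, str):
        return expr
    # Collapse not(f(...)) into notF(...) when a shorthand exists.
    if expr.name == "not" and len(expr.args) == 1:
        inner = expr.args[0]
        if isinstance(inner, Call) and inner.name in NEGATED_SHORTHANDS:
            return print_call(NEGATED_SHORTHANDS[inner.name], inner.args)
    # Otherwise print the call unchanged.
    return print_call(expr.name, expr.args)


print(print_expr(Call("not", [Call("in", ["event", "tuple('a', 'b')"])])))
# notIn(event, tuple('a', 'b'))
```

Calls without a shorthand (e.g. `not(greater(a, 1))`) fall through and are printed as written.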
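
To illustrate why the case-insensitive regex operators need separate handling, here is a sketch of one comparison printed for ClickHouse versus evaluated locally in Python. The operator table and both function names are hypothetical. It assumes the `pattern_sql` argument is an already-escaped SQL expression, and that local evaluation should follow ClickHouse's `match()` "matches anywhere in the string" behaviour, hence `re.search`.

```python
import re

# Hypothetical operator table: op -> (negated, case_insensitive).
REGEX_OPS = {
    "=~": (False, False),
    "!~": (True, False),
    "=~*": (False, True),
    "!~*": (True, True),
}


def to_clickhouse(op: str, left_sql: str, pattern_sql: str) -> str:
    """Print a regex comparison as ClickHouse SQL.

    ClickHouse's match() uses re2, so case-insensitivity is requested
    inline by prefixing the pattern with (?i).
    Assumes pattern_sql is an already-escaped SQL expression.
    """
    negated, case_insensitive = REGEX_OPS[op]
    if case_insensitive:
        pattern_sql = f"concat('(?i)', {pattern_sql})"
    sql = f"match({left_sql}, {pattern_sql})"
    return f"not({sql})" if negated else sql


def evaluate_locally(op: str, value: str, pattern: str) -> bool:
    """Evaluate the same comparison in Python.

    The case-insensitive flag is passed separately from the pattern,
    and re.search mirrors match()'s "matches anywhere" behaviour.
    """
    negated, case_insensitive = REGEX_OPS[op]
    flags = re.IGNORECASE if case_insensitive else 0
    matched = re.search(pattern, value, flags) is not None
    return not matched if negated else matched


print(to_clickhouse("=~*", "properties.name", "'hello'"))
# match(properties.name, concat('(?i)', 'hello'))
print(evaluate_locally("=~*", "Hello World", "hello"))  # True
print(evaluate_locally("=~", "Hello World", "hello"))  # False
```

The flag ends up in two different places: inside the pattern for `re2`, and as a separate argument locally. A dedicated operator lets each backend put it where its engine expects it.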
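
The cohort change can be pictured as an AST rewrite plus a trivial local check. The `Call` class, both functions and the `memberships` mapping (cohort id to person ids) are hypothetical. In HogQL the `id in cohort 1` syntax is parsed directly, so this rewrite only illustrates the `in(id, cohort(1))` to `inCohort(id, 1)` mapping and why a plain cohort id argument is easy to evaluate locally.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set


@dataclass
class Call:
    """Hypothetical toy AST node: a function call with positional arguments."""
    name: str
    args: List[Any] = field(default_factory=list)


def rewrite_cohort_in(expr: Any) -> Any:
    """Rewrite in(x, cohort(N)) to inCohort(x, N), recursing into arguments."""
    if not isinstance(expr, Call):
        return expr
    args = [rewrite_cohort_in(a) for a in expr.args]
    if expr.name == "in" and len(args) == 2:
        right = args[1]
        if isinstance(right, Call) and right.name == "cohort" and len(right.args) == 1:
            return Call("inCohort", [args[0], right.args[0]])
    return Call(expr.name, args)


def in_cohort(person_id: str, cohort_id: int, memberships: Dict[int, Set[str]]) -> bool:
    """Local check: with the cohort id as a plain argument, membership is a lookup."""
    return person_id in memberships.get(cohort_id, set())


print(rewrite_cohort_in(Call("in", ["id", Call("cohort", [1])])))
# Call(name='inCohort', args=['id', 1])
print(in_cohort("person-a", 1, {1: {"person-a"}}))  # True
```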