feat: split detected fields queries #12491
pkg/storage/detected/fields.go
// this is an estimation, as the true cardinality could be greater
// than either of the seen values, but will never be less
if ok {
    curCard, newCard := f.Cardinality, field.Cardinality
If you merge multiple time ranges, why isn't the cardinality additive?
Because you don't know whether you've seen different values in each time range. For example:
time range 1 (cardinality 3)
flavor=sweet
flavor=sour
flavor=sweet
flavor=spicy
time range 2 (cardinality 2)
flavor=sweet
flavor=sweet
flavor=bland
Since we're discarding the actual values, there's no way to be 100% accurate here. The safest bet is to take the highest cardinality seen, because the true cardinality can never be lower than that. In this example the true cardinality is actually 4, even though we'll report it as 3, but adding them to get 5 would be incorrect. Also imagine both time ranges saw the same 3 values: adding them would produce 6, which would be even further from the truth.
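To make the arithmetic above concrete, here is a small Go program that counts the flavor example exactly and compares the two ways of combining per-range counts (maximum and sum) against the true cardinality of the union. It is a sketch for illustration only: `distinct` and `cardinalities` are hypothetical helpers, not code from pkg/storage/detected/fields.go.

```go
package main

import "fmt"

// distinct returns the exact number of distinct values in vals.
func distinct(vals []string) int {
	seen := make(map[string]struct{}, len(vals))
	for _, v := range vals {
		seen[v] = struct{}{}
	}
	return len(seen)
}

// cardinalities (hypothetical helper) compares combining two per-range
// counts by max and by sum with the true cardinality of the union.
func cardinalities(r1, r2 []string) (maxCard, sumCard, trueCard int) {
	c1, c2 := distinct(r1), distinct(r2)
	maxCard = c1 // lower bound: the union has at least as many values as either range
	if c2 > maxCard {
		maxCard = c2
	}
	sumCard = c1 + c2 // upper bound: double-counts values seen in both ranges
	union := append(append([]string{}, r1...), r2...)
	trueCard = distinct(union)
	return
}

func main() {
	r1 := []string{"sweet", "sour", "sweet", "spicy"} // time range 1
	r2 := []string{"sweet", "sweet", "bland"}         // time range 2
	fmt.Println(cardinalities(r1, r2))                // 3 5 4
}
```

The true value always lies between the two: the maximum never overshoots, while the sum overshoots by exactly the number of values that appear in both ranges.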
Sounds like you might want to run the sketch in the frontend, maybe. Happy to start like this, though.
But if you can stream 100k logs back to the frontend and build the sketch there, it will be more accurate. Maybe the querier could do the parsing of fields to absorb most of the CPU work.
Cyril also suggested (in a separate conversation) looking into some ways to merge the sketches in the frontend, so I'm going to investigate that a bit.
I've changed this code to merge sketches and get more accurate cardinality estimates.
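Merging sketches helps because cardinality sketches such as HyperLogLog are mergeable: combining two sketches yields the same state as sketching the union of their inputs, so values seen in several time ranges are not double-counted. The thread does not show which sketch the change uses, so the following is a toy HyperLogLog written from scratch for illustration. The type `hll`, its methods, the FNV-1a hash, and the 16-register size are all assumptions, not Loki's code or any library's API.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math"
	"math/bits"
)

// Toy sketch for illustration only; not Loki's implementation.
// p is the precision; the sketch has m = 2^p registers.
const (
	p = 4
	m = 1 << p
)

// hll stores, per register, the largest rank (1 + leading zeros)
// seen among the hashes routed to that register.
type hll struct {
	reg [m]uint8
}

func hash64(v string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(v))
	return h.Sum64()
}

// Insert routes v to a register using the top p hash bits and records
// the rank of the remaining 64-p bits.
func (s *hll) Insert(v string) {
	x := hash64(v)
	idx := x >> (64 - p)
	rank := uint8(bits.LeadingZeros64(x<<p)) + 1
	if rank > 64-p+1 {
		rank = 64 - p + 1 // all remaining bits were zero
	}
	if rank > s.reg[idx] {
		s.reg[idx] = rank
	}
}

// Merge folds other into s with a register-wise maximum. The result is
// identical to a sketch built over the union of both inputs.
func (s *hll) Merge(other *hll) {
	for i, r := range other.reg {
		if r > s.reg[i] {
			s.reg[i] = r
		}
	}
}

// Estimate returns the approximate number of distinct values inserted.
func (s *hll) Estimate() float64 {
	sum, zeros := 0.0, 0
	for _, r := range s.reg {
		sum += math.Pow(2, -float64(r))
		if r == 0 {
			zeros++
		}
	}
	e := 0.673 * m * m / sum // 0.673 is the HyperLogLog bias constant for m = 16
	if e <= 2.5*m && zeros > 0 {
		e = m * math.Log(float64(m)/float64(zeros)) // linear counting for small sets
	}
	return e
}

func main() {
	var a, b, union hll
	for _, v := range []string{"sweet", "sour", "sweet", "spicy"} { // time range 1
		a.Insert(v)
		union.Insert(v)
	}
	for _, v := range []string{"sweet", "sweet", "bland"} { // time range 2
		b.Insert(v)
		union.Insert(v)
	}
	a.Merge(&b)
	fmt.Println(a.reg == union.reg) // true
	fmt.Printf("approximate distinct flavors: %.1f\n", a.Estimate())
}
```

With only 16 registers the estimate is coarse; real sketches use many more registers, trading memory for accuracy. The design point that matters here is that `Merge` needs only the registers, not the raw values, so each split can return a compact sketch instead of every value it saw.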
LGTM.
Really think we should look at merging sketches for better results in the future.
LGTM
What this PR does / why we need it:
This PR splits detected fields queries using the existing split-by-time logic used for log filter queries.
Which issue(s) this PR fixes:
Re #12339