-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(blooms): Fully deduplicate chunks from FilterChunkRef responses #12807
Conversation
2a084f3
to
3d4c66c
Compare
3281995
to
95b011b
Compare
Signed-off-by: Christian Haudum <[email protected]>
for the case when chunks have same from/through time, but different checksums Signed-off-by: Christian Haudum <[email protected]>
Signed-off-by: Christian Haudum <[email protected]>
Signed-off-by: Christian Haudum <[email protected]>
Signed-off-by: Christian Haudum <[email protected]>
Signed-off-by: Christian Haudum <[email protected]>
75517c9
to
628f1c2
Compare
This PR must be merged before a backport PR will be created. |
1 similar comment
This PR must be merged before a backport PR will be created. |
pkg/bloomgateway/bloomgateway.go
Outdated
@@ -383,6 +383,8 @@ func (g *Gateway) consumeTask(ctx context.Context, task Task, tasksCh chan<- Tas | |||
case <-ctx.Done(): | |||
// do nothing | |||
default: | |||
// chunks may not be sorted | |||
sort.Slice(res.Removals, func(i, j int) bool { return res.Removals[i].Less(res.Removals[j]) }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: We tend to use sort very often and I'm worried we end up sorting already sorted slices (quicksort can degrade to O(n^2) for sorted inputs). Should we check if the list is sorted before sorting it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point.
afaik, pdqsort
which is used by sort.Slice()
prevents the O(n^2)
worst case scenario from quicksort and can do O(n log n)
instead (which is still bad).
pkg/bloomgateway/client.go
Outdated
if a.Less(b) { | ||
result = append(result, a) | ||
i++ | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nite: more idiomatic to add continue
inside the if than having an else.
Signed-off-by: Christian Haudum <[email protected]>
Signed-off-by: Christian Haudum <[email protected]>
Signed-off-by: Christian Haudum <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving but leaving a nit here
// chunks may not be sorted | ||
sort.Slice(res.Removals, func(i, j int) bool { return res.Removals[i].Less(res.Removals[j]) }) | ||
// chunks may not always be sorted | ||
if !slices.IsSortedFunc(res.Removals, func(a, b v1.ChunkRef) int { return a.Cmp(b) }) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we end up using the same cmp funtion all over the places. Would be nice to have another function next to func (r *ChunkRef) Cmp(other ChunkRef) int
that does this:
func CmpChunkRefs(a, b ChunkRef) int { return a.cmp(b) }
So we can use IsSortedFunc and SortFunc as:
if !slices.IsSortedFunc(a.Refs, CmpChunkRefs) {
slices.SortFunc(a.Refs, CmpChunkRefs)
}
if !slices.IsSortedFunc(b.Refs, CmpChunkRefs) {
slices.SortFunc(b.Refs, CmpChunkRefs)
}
…12807) This PR aims for full de-duplication of chunks and series from filter requests from the index gateway to the bloom gateway. Whenever we merge/de-duplicate slices, the inputs need to be sorted. It appears that the Removals (chunks) from the v1.Output are not guaranteed to be sorted. When comparing ShortRefs, both From, Through, and Checksum need to be used. Signed-off-by: Christian Haudum <[email protected]> (cherry picked from commit a0f358f)
…ef responses (#12835) Backport a0f358f from #12807 Co-authored-by: Christian Haudum <[email protected]>
…ef responses (#12835) Backport a0f358f from #12807 Co-authored-by: Christian Haudum <[email protected]>
…12807) This PR aims for full de-duplication of chunks and series from filter requests from the index gateway to the bloom gateway. Whenever we merge/de-duplicate slices, the inputs need to be sorted. It appears that the Removals (chunks) from the v1.Output are not guaranteed to be sorted. When comparing ShortRefs, both From, Through, and Checksum need to be used. Signed-off-by: Christian Haudum <[email protected]>
What this PR does / why we need it:
This PR aims for full de-duplication of chunks and series from filter requests from the index gateway to the bloom gateway.
Whenever we merge/de-duplicate slices, the inputs need to be sorted. It appears that the
Removals
(chunks) from thev1.Output
are not guaranteed to be sorted.When comparing ShortRefs, both
From
,Through
, andChecksum
need to be used.Checklist
CONTRIBUTING.md
guide (required)docs/sources/setup/upgrade/_index.md
production/helm/loki/Chart.yaml
and updateproduction/helm/loki/CHANGELOG.md
andproduction/helm/loki/README.md
. Example PRdeprecated-config.yaml
anddeleted-config.yaml
files respectively in thetools/deprecated-config-checker
directory. Example PR