Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

avoid unneeded copy on touching but non-overlaping byte ranges #4219

Merged
merged 3 commits into from
Dec 4, 2023

Conversation

trinity-1686a
Copy link
Contributor

Description

when a ByteRangeCache has to store bytes for the range 0..100_000 and the range 100_000..100_100, it currently merged both range, issuing a memcpy which takes up to a few milliseconds. In practice this is almost never useful, as future get are unlikely to make a request that cross boundaries. This PR removes that merging on range that touch but do not overlap. When fetching from cache, if the cache realize it can only server a query by merging multiple ranges. This impact mostly the warmup of fastfields. On a sample find_trace_ids request, done on a single searcher over ~1b spans, it lowered query time from ~5.8s to ~4.1s

How was this PR tested?

added unit test, manually checked there is a performance improvement

.map(|(k, _v)| k);

let end_key = CacheKey::from_borrowed(tag.borrow(), byte_range.end - 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my understanding is that with this, we remove (as opposed to merge) any data that is overlapping (not just is included in) with the slice we are trying to put.

Is my understanding correct?
If so can we comment this more clearly?

possibly isolate in a different method:
fn remove_overlapping_slices(&mut self, ....)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand your query.
In this function, the only change is to what we consider for merge.
Before, if someone would put [10..15), we would search for something where either starting or ending bound (or both) is in [10..=15]. This includes [5..10) and [15..20).
After, we only consider things in [11..=14], that is, things with which we share at least one byte. If we share something, the merge happens as before, if the new rule consider we don't overlap, the range is added to the cache without merging anything

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes: correction.
we merge any data that is overlapping (not just touches) with the slice we are trying to put.

could we comment what we are doing, or even better isolate it into a method?

@trinity-1686a trinity-1686a enabled auto-merge (squash) December 4, 2023 16:39
@trinity-1686a trinity-1686a merged commit 187e2b7 into main Dec 4, 2023
4 checks passed
@trinity-1686a trinity-1686a deleted the trinity--avoid-unneeded-copy branch December 4, 2023 16:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants