fix(query engine): Include lines with ts equal to end timestamp of the query range when executing range aggregations #13448
Should this be noted as a breaking change, and/or require a call out in the upgrading guide?
This is not a breaking change, but a fix to include chunks that were otherwise dismissed because the overlap check failed.
So that the range is start inclusive and end is exclusive.
Signed-off-by: Christian Haudum <[email protected]>
c778d52 to 6cb62f2
LGTM 👍
@@ -115,7 +115,7 @@ func (i *MultiIndex) forMatchingIndices(ctx context.Context, from, through model
 	queryBounds := newBounds(from, through)

 	return i.iter.For(ctx, i.maxParallel, func(ctx context.Context, idx Index) error {
-		if Overlap(queryBounds, idx) {
+		if Overlap(idx, queryBounds) {
q: wondering if any other places of Overlap usage need this swap of arguments as well?
I checked, but all other places already use the Overlap function with the correct argument order.
LGTM
What this PR does / Why we need it
Background
When performing range vector aggregations, such as count_over_time({env="dev"}[1h]), the query range is divided into multiple steps at which the aggregation operation (e.g. counting the log lines) is evaluated.
Each step starts at current step - step interval and ends at current step, as depicted in the following chart. The select range for the logs is extended by the step interval into the past, in order to select logs for calculating the first step.
However, the select range for logs is start inclusive and end exclusive (written as [start, end)), but the evaluation of the steps for the range aggregation is start exclusive and end inclusive (written as (start, end]).
This leads to the problem that the very first timestamp at the beginning of the select range and the very last timestamp at the end of the select range are not included in the range aggregation. The "missing" last timestamp is not a problem, because a) in an instant query it is not supposed to be included anyway because of the [start, end) inclusivity of the query range and b) in a range query the last point of the previous step will be part of the next step evaluation.
Issue
The missing first timestamp, however, becomes a problem when executing an instant query in which log timestamps fall exactly on the start of the query range. This can happen when the query is split in the query frontend into multiple smaller time ranges, e.g. 1h, 30m, ...
Since the sub queries are executed independently on the queriers, all logs whose timestamp is exactly a multiple of the split interval, e.g. 00:00, 01:00, 02:00, ... for a 1h interval, are dismissed and therefore missing in the query result over the full time range of the original query.
Fix
To avoid losing logs whose timestamp is a multiple of the split interval in instant queries, we need to adjust the query range for logs to also include the end timestamp (written as [start, end]). This is done by adding a "leap nanosecond" to the end timestamp of the log select range. This ensures that the end timestamp, which is included in the step evaluation, is also included in the log selection.
Checklist
CONTRIBUTING.md guide (required)
feat PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
docs/sources/setup/upgrade/_index.md
production/helm/loki/Chart.yaml and update production/helm/loki/CHANGELOG.md and production/helm/loki/README.md. Example PR
deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR