-
Notifications
You must be signed in to change notification settings - Fork 9.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUGFIX: TSDB: panic in query during truncation with OOO head #14831
Conversation
Regression test for prometheus#14822 Signed-off-by: György Krajcsovits <[email protected]>
a113fe7
to
d102c99
Compare
Ref: #14831 Signed-off-by: György Krajcsovits <[email protected]>
c754ab5
to
247ed00
Compare
Attempted fix Signed-off-by: György Krajcsovits <[email protected]>
Signed-off-by: Bryan Boreham <[email protected]>
356e519
to
8a39690
Compare
Signed-off-by: György Krajcsovits <[email protected]>
8a39690
to
03e1ac1
Compare
tsdb/ooo_head_read.go
Outdated
@@ -513,7 +513,7 @@ type HeadAndOOOQuerier struct { | |||
head *Head | |||
index IndexReader | |||
chunkr ChunkReader | |||
querier storage.Querier | |||
querier storage.Querier // This might be nil if head was truncated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally I advise to state what the thing means, not when you expect it to apply.
So "If nil, do not read from in-order head"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clarified
Signed-off-by: György Krajcsovits <[email protected]>
cc PTAL @colega |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! I can see how you are preventing new panics and how data is still coming from the blocks in the test. Nice work!
Followup to prometheus#14831 Signed-off-by: György Krajcsovits <[email protected]>
Followup to prometheus#14831 Signed-off-by: György Krajcsovits <[email protected]>
Added regression test for #14822. Doesn't cause segfault before #14354
The segfault was due to a race condition between query start and compaction.
When compaction starts, in-order queries may overlap with the TSDB head, but also might fall into the truncated time of the head. In such case, the head querier
headQuerier
is nil in db.go hereprometheus/tsdb/db.go
Line 2062 in 1e01e97
and
prometheus/tsdb/db.go
Line 2137 in 1e01e97
That pointer is not used for selecting samples, but is referenced in
Close()
which causes the segfault.The fix essentially restores the original function where we did not rely on the headQuerier in creating the OOO head querier:
prometheus/tsdb/db.go
Line 2157 in 9849418