BUGFIX: TSDB: panic in query during truncation with OOO head #14831

Merged (5 commits) on Sep 5, 2024

krajorama (Member) commented Sep 5, 2024

Adds a regression test for #14822. The segfault does not occur before #14354.

The segfault was caused by a race condition between query start and compaction.
When compaction starts, an in-order query may overlap with the TSDB head but also fall into the time range of the head that is being truncated. In that case, the head querier headQuerier is set to nil in db.go here:

headQuerier = nil

and
headQuerier = nil

That pointer is not used for selecting samples, but it is referenced in Close(), which causes the segfault.

The fix essentially restores the original function where we did not rely on the headQuerier in creating the OOO head querier:

rh := NewOOORangeHead(db.head, mint, maxt, db.lastGarbageCollectedMmapRef)
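
The failure mode described above can be illustrated with a minimal, self-contained sketch. This is not the Prometheus code: `closer`, `fakeQuerier`, `headAndOOOQuerier`, `unsafeClose`, and `panics` are hypothetical names invented for the example, and the nil guard in `Close` is only one plausible way to tolerate a nil querier field. It shows that calling a method through a nil interface value panics, while a guarded `Close` does not.

```go
package main

import (
	"errors"
	"fmt"
)

// closer is a hypothetical stand-in for the part of a querier used here.
type closer interface {
	Close() error
}

// fakeQuerier is a hypothetical in-order head querier.
type fakeQuerier struct{ closed bool }

func (q *fakeQuerier) Close() error {
	if q.closed {
		return errors.New("already closed")
	}
	q.closed = true
	return nil
}

// headAndOOOQuerier is a simplified model, not the real Prometheus type.
type headAndOOOQuerier struct {
	querier closer // If nil, there is no in-order head querier to close.
}

// unsafeClose calls Close unconditionally and panics when querier is nil.
func (q *headAndOOOQuerier) unsafeClose() error {
	return q.querier.Close()
}

// Close tolerates a nil querier by returning early.
func (q *headAndOOOQuerier) Close() error {
	if q.querier == nil {
		return nil
	}
	return q.querier.Close()
}

// panics reports whether f panics.
func panics(f func()) (p bool) {
	defer func() {
		if recover() != nil {
			p = true
		}
	}()
	f()
	return false
}

func main() {
	// Models the case where the head querier was dropped after truncation.
	truncated := &headAndOOOQuerier{querier: nil}
	fmt.Println("unguarded Close panics:", panics(func() { _ = truncated.unsafeClose() }))
	fmt.Println("guarded Close error:", truncated.Close())
}
```

Under these assumptions, the unguarded call panics with a nil pointer dereference and the guarded call returns a nil error.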

krajorama added a commit that referenced this pull request Sep 5, 2024
Ref: #14831

Signed-off-by: György Krajcsovits <[email protected]>
krajorama and others added 2 commits September 5, 2024 11:24
Attempted fix

Signed-off-by: György Krajcsovits <[email protected]>
Signed-off-by: Bryan Boreham <[email protected]>
krajorama force-pushed the fix-panic-in-ooo-query branch 2 times, most recently from 356e519 to 8a39690, on September 5, 2024 10:57
krajorama marked this pull request as ready for review on September 5, 2024 11:11
@@ -513,7 +513,7 @@ type HeadAndOOOQuerier struct {
 	head    *Head
 	index   IndexReader
 	chunkr  ChunkReader
-	querier storage.Querier
+	querier storage.Querier // This might be nil if head was truncated.
Review comment (Member):
Generally I advise stating what the thing means, not when you expect it to apply.
So: "If nil, do not read from in-order head"?

Member Author:
clarified

Signed-off-by: György Krajcsovits <[email protected]>
krajorama (Member Author) commented:
cc PTAL @colega

jesusvazquez (Member) left a comment:
LGTM! I can see how you are preventing new panics and how data is still coming from the blocks in the test. Nice work!

bboreham merged commit 536d9f9 into prometheus:main on Sep 5, 2024
26 checks passed
krajorama added a commit to krajorama/prometheus that referenced this pull request Sep 9, 2024
Followup to prometheus#14831

Signed-off-by: György Krajcsovits <[email protected]>
bboreham pushed a commit to bboreham/prometheus that referenced this pull request Oct 22, 2024
Followup to prometheus#14831

Signed-off-by: György Krajcsovits <[email protected]>