-
Notifications
You must be signed in to change notification settings - Fork 178
Keep series that are still in WAL in checkpoints #577
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just one area where I think a comment would be useful.
Your commit message is already descriptive of the issue, but ideally we don't have to git blame
or git log
the file to see why the change was made.
@brian-brazil that is probably the root cause for #21 . some people report that it is appearing after a crash but others that it also appears without a crash so it seems they are hitting this bug. |
I think instead of this we should just drop all irrelevant samples as per this PR |
I believe this will fix #21. There should be no irrelevant samples. |
87eb904
to
1c7070a
Compare
I think this should be the correct fix for this. Instead of keeping series we should just delete the irrelevant samples. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
@@ -75,6 +75,9 @@ type Head struct { | |||
symbols map[string]struct{} | |||
values map[string]stringset // label names to possible values | |||
|
|||
deletedMtx sync.Mutex | |||
deleted map[uint64]int // Deleted series, and what WAL segment they must be kept until. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
must be kept until what?
deleted map[uint64]int // Deleted series, and what WAL segment they must be kept until. | |
deleted map[uint64]int // Deleted series, and what WAL segment they must be kept until the last segment has samples referencing these. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your suggestion doesn't make sense. This is a data structure, not an algorithm.
_, err := app.Add(labels.Labels{{"a", "b"}}, int64(i), 0) | ||
testutil.Ok(t, err) | ||
testutil.Ok(t, app.Commit()) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets extract all this in a separate func like in my other PR
https://github.com/prometheus/tsdb/pull/467/files#diff-7ae027963734feb4c8722aa99f033363R165
createHead(t, genSeries(....))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That'd clash with the existing name, plus that function doesn't have the WAL handling I need.
head_test.go
Outdated
// Confirm there's been a checkpoint. | ||
cdir, _, err := LastCheckpoint(dir) | ||
testutil.Ok(t, err) | ||
// Read in checkpoint and WAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Read in checkpoint and WAL | |
// Read in checkpoint and WAL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
head.go
Outdated
@@ -570,6 +580,17 @@ func (h *Head) Truncate(mint int64) (err error) { | |||
// that supersedes them. | |||
level.Error(h.logger).Log("msg", "truncating segments failed", "err", err) | |||
} | |||
|
|||
// The checkpoint is written and segments before it truncated, so we no |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// The checkpoint is written and segments before it truncated, so we no | |
// The checkpoint is written and segments before it is truncated, so we no |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -501,6 +501,57 @@ func TestDeleteUntilCurMax(t *testing.T) { | |||
testutil.Ok(t, err) | |||
testutil.Equals(t, []tsdbutil.Sample{sample{11, 1}}, ressmpls) | |||
} | |||
|
|||
func TestDeletedSamplesAndSeriesStillInWALAfterCheckpoint(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe extend the test a bit like:
After the checkpoint add more samples for existing series. Read the wal from scratch and ensure that there are no samples with unknown series references.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not quite sure what you're getting at. There are no existing series after we've read the checkpoint, as we've deleted them.
If all the samples are deleted for a series, we should still keep the series in the WAL as anything else reading the WAL will still care about it in order to understand the samples. Signed-off-by: Brian Brazil <[email protected]>
1c7070a
to
9a071eb
Compare
LGTM |
If all the samples are deleted for a series,
we should still keep the series in the WAL as
anything else reading the WAL will still care
about it in order to understand the samples.
fixes: #21
Signed-off-by: Brian Brazil [email protected]