Thanos-Store empty blocks in local storage #1610

Closed

KarstenSiemer opened this issue Oct 7, 2019 · 7 comments

Comments

@KarstenSiemer

Hi there!
I am using the quay.io/thanos/thanos:v0.7.0 container and I am experiencing problems with the store component.
The store is missing metadata for some of its blocks in its local storage, even though the metadata exists in the S3 bucket.
Store log:

 level=warn ts=2019-10-07T07:15:12.145791006Z caller=bucket.go:325 msg="error parsing block range" block=01DPJ5368THP909JKH2DW72JJM err="read meta.json: open /thanos-store-data/01DPJ5368THP909JKH2DW72JJM/meta.json: no such file or directory"

S3 bucket ls:

2019-10-07 03:47 536864293   s3://de.y6b.system.prometheus/01DPJ5368THP909JKH2DW72JJM/chunks/000001
2019-10-07 03:48 536857676   s3://de.y6b.system.prometheus/01DPJ5368THP909JKH2DW72JJM/chunks/000002
2019-10-07 03:48 536860520   s3://de.y6b.system.prometheus/01DPJ5368THP909JKH2DW72JJM/chunks/000003
2019-10-07 03:48 536864881   s3://de.y6b.system.prometheus/01DPJ5368THP909JKH2DW72JJM/chunks/000004
2019-10-07 03:48 536863844   s3://de.y6b.system.prometheus/01DPJ5368THP909JKH2DW72JJM/chunks/000005
2019-10-07 03:48 536865851   s3://de.y6b.system.prometheus/01DPJ5368THP909JKH2DW72JJM/chunks/000006
2019-10-07 03:49 536771867   s3://de.y6b.system.prometheus/01DPJ5368THP909JKH2DW72JJM/chunks/000007
2019-10-07 03:49 536685857   s3://de.y6b.system.prometheus/01DPJ5368THP909JKH2DW72JJM/chunks/000008
2019-10-07 03:49 536868988   s3://de.y6b.system.prometheus/01DPJ5368THP909JKH2DW72JJM/chunks/000009
2019-10-07 03:49 536868010   s3://de.y6b.system.prometheus/01DPJ5368THP909JKH2DW72JJM/chunks/000010
2019-10-07 03:49 536868033   s3://de.y6b.system.prometheus/01DPJ5368THP909JKH2DW72JJM/chunks/000011
2019-10-07 03:50 536869076   s3://de.y6b.system.prometheus/01DPJ5368THP909JKH2DW72JJM/chunks/000012
2019-10-07 03:50 536870302   s3://de.y6b.system.prometheus/01DPJ5368THP909JKH2DW72JJM/chunks/000013
2019-10-07 03:50 516198435   s3://de.y6b.system.prometheus/01DPJ5368THP909JKH2DW72JJM/chunks/000014
2019-10-07 03:50 519912306   s3://de.y6b.system.prometheus/01DPJ5368THP909JKH2DW72JJM/index
2019-10-07 03:50  13922915   s3://de.y6b.system.prometheus/01DPJ5368THP909JKH2DW72JJM/index.cache.json
2019-10-07 03:50      1997   s3://de.y6b.system.prometheus/01DPJ5368THP909JKH2DW72JJM/meta.json

The meta.json and index files are actually missing when I look into the store component's data directory for that block. In the querier's web UI the store looks healthy and also reports the correct min and max time ranges. When I restart the store, it comes back up healthy, the metadata for the previously faulty blocks is present again, and the data is queryable.
But eventually the data goes missing again and holes show up in the graphs produced by the queriers.
A restart always fixes that. This only recently started after updating to version v0.7.0.
What might be important to note here is that I run a daily bucket verify job on the bucket while the compactor is still running.
But the bucket verify is always configured without the repair flag.
Restarting a store and then running the verifier does not cause holes.
I cannot manually reproduce the problem; it only eventually happens after some time. I'd be very thankful for any help.
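
For reference, this is roughly how I cross-check such a block (the block ULID and paths are taken from the output above; the S3 client is just an example and bucket.yaml stands in for my actual objstore config):

 # block contents in the bucket (complete, meta.json present)
 aws s3 ls s3://de.y6b.system.prometheus/01DPJ5368THP909JKH2DW72JJM/

 # same block in the store's local data directory (meta.json and index missing)
 ls /thanos-store-data/01DPJ5368THP909JKH2DW72JJM/

 # daily verification run, deliberately without --repair
 thanos bucket verify --objstore.config-file=bucket.yaml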

@bwplotka
Member

bwplotka commented Oct 7, 2019

Thanks for this. Do you have a persistent volume? It really looks like the issue we fixed recently, which will be released soon: https://github.com/thanos-io/thanos/blob/master/CHANGELOG.md#fixed

Can you try running master? E.g. master-2019-10-06-bb1ac398
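
Something along these lines should work (the deployment and container names below are only placeholders for your setup):

 docker pull quay.io/thanos/thanos:master-2019-10-06-bb1ac398

 # or on Kubernetes, e.g.:
 kubectl set image deployment/thanos-store thanos=quay.io/thanos/thanos:master-2019-10-06-bb1ac398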

@bwplotka
Member

bwplotka commented Oct 7, 2019

Next release is this week (:

@FUSAKLA
Member

FUSAKLA commented Oct 7, 2019

Hi, I wonder if this is related to this issue: #1504?

It's interesting that it gets fixed after a restart. Do you have persistent storage on that store? In my case it persisted after restart, so I added a check to erase malformed blocks. It got merged after 0.7.0 was released IIRC; could you try a recent master?

@FUSAKLA
Member

FUSAKLA commented Oct 7, 2019

Hah, Bartek was faster :)

I still wonder how those malformed blocks come to exist in the first place.

@bwplotka
Member

bwplotka commented Oct 7, 2019

It's quite straightforward. Check #1505 (review) for an explanation.

@KarstenSiemer
Author

Thanks for the quick response!
I do not use a persistent volume; the data is stored in an emptyDir.
Should I rather add a persistent volume to the store? I figured that it is unnecessary, since I have persistence inside S3. I have roughly 4 TB of metrics in total in my S3 bucket. Keeping data inside the store after a restart didn't seem like a good use of resources, since pods rarely restart in my cluster.
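
For context, my store runs roughly like this, with --data-dir pointing at the emptyDir mount (the path comes from the log above; the objstore config path is a placeholder):

 thanos store \
   --data-dir=/thanos-store-data \
   --objstore.config-file=/etc/thanos/bucket.yaml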
I will try master and come back to you guys if it happens again.
Thanks so much 👍

@KarstenSiemer
Author

Just for readers who have run into this problem: since upgrading to version v0.8.1, I have not experienced this problem again.
