Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compact: no-compact-mark.json ignored #5603

Closed
jimethn opened this issue Aug 17, 2022 · 8 comments
Closed

compact: no-compact-mark.json ignored #5603

jimethn opened this issue Aug 17, 2022 · 8 comments

Comments

@jimethn
Copy link

jimethn commented Aug 17, 2022

Thanos, Prometheus and Golang version used:

quay.io/thanos/thanos:v0.25.1

Object Storage Provider:

S3

What happened:

level=error ts=2022-08-17T07:27:40.951712117Z caller=compact.go:495 msg="critical error detected; halting" err="compaction: group 0@12040338854421827242: compact blocks
[/var/thanos/compact/compact/0@12040338854421827242/01FANBC0GJAFTEP9DNJERHCRV6
/var/thanos/compact/compact/0@12040338854421827242/01FAV4Q3VHNZ1NYMKF55T5JQ9Y]:
populate block: chunk iter: cannot populate chunk 385: checksum mismatch expected:6b128655, actual:1c69f469"

What you expected to happen:

I've marked both blocks to skip compaction, so I'd expect compact to not try to compact them.

/ $ thanos tools bucket mark --id=01FAV4Q3VHNZ1NYMKF55T5JQ9Y --id=01FANBC0GJAFTEP9DNJERHCRV6 --marker=no-compact-mark.json --details="corrupt block" --objstore.config-file=/tmp/conf
level=info ts=2022-08-17T13:12:26.360635881Z caller=factory.go:49 msg="loading bucket configuration"
level=warn ts=2022-08-17T13:12:26.510952339Z caller=block.go:378 msg="requested to mark for no compaction, but file already exists; this should not happen; investigate" err="file 01FAV4Q3VHNZ1NYMKF55T5JQ9Y/no-compact-mark.json already exists in bucket"
level=warn ts=2022-08-17T13:12:26.528072708Z caller=block.go:378 msg="requested to mark for no compaction, but file already exists; this should not happen; investigate" err="file 01FANBC0GJAFTEP9DNJERHCRV6/no-compact-mark.json already exists in bucket"
level=info ts=2022-08-17T13:12:26.528113319Z caller=tools_bucket.go:1062 msg="marking done" marker=no-compact-mark.json IDs=01FAV4Q3VHNZ1NYMKF55T5JQ9Y,01FANBC0GJAFTEP9DNJERHCRV6
level=info ts=2022-08-17T13:12:26.528158321Z caller=main.go:161 msg=exiting

How to reproduce it (as minimally and precisely as possible):

I guess to reproduce you'd have to create a corrupt block. In my case prometheus did this for me. ;) If you wanted to do it, I guess you could just pick a block and change some random characters, that would probably cause a checksum mismatch.

Step two would be to turn on compact and confirm it halts on the block.

Step 3 would be to add the no-compact-mark and bounce compact.

Full logs to relevant components:

Anything else we need to know:

I found a couple similar issues:

However, both of those specifically relate to no-compact being ignored when index size exceeds 64GiB. In my case the index size is nominal, but no-compact is still being ignored.

@stale
Copy link

stale bot commented Nov 13, 2022

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

@xBazilio
Copy link
Contributor

How to reproduce the issue

  1. generate some blocks
docker run --rm -i quay.io/thanos/thanosbench:v0.3.0-rc.0 block plan -p continuous-30d-tiny --max-time=2022-03-10T00:00:00Z --labels="prometheus_replica=\"a\"" | docker run --rm -i -v /__path_to__/block_gen:/data/block_gen quay.io/thanos/thanosbench:v0.3.0-rc.0 block gen --output.dir=/data/block_gen/
  1. take 1 block and copy it to local store dir as two blocks with names 01FVVQHRGM37XXYS1VV8R7QT42 and 01FZGJK6GYSC29QY2243EF9QJ4. replace names in meta.json too.
  2. create objstore.config.file with type: FILESYSTEM, pointing to local store dir
  3. mark blocks for no compaction
thanos tools bucket mark --marker=no-compact-mark.json --objstore.config-file=thanos_bucket.yaml --details="overlapping" --id=01FVVQHRGM37XXYS1VV8R7QT42

thanos tools bucket mark --marker=no-compact-mark.json --objstore.config-file=thanos_bucket.yaml --details="overlapping" --id=01FZGJK6GYSC29QY2243EF9QJ4
  1. run thanos compact
thanos compact --log.level=debug --data-dir=data --compact.concurrency=1 --retention.resolution-raw=12d --retention.resolution-5m=60d -w --delete-delay=6h --downsample.concurrency=1 --compact.skip-block-with-out-of-order-chunks --objstore.config-file=thanos_bucket.yaml

Expected result:
thanos compact has nothing to do

Actual result:

level=error ts=2023-03-20T16:35:22.712232349Z caller=compact.go:487 msg="critical error detected; halting" err="compaction: group 0@2611777887291397997: pre compaction overlap check: overlaps found while gathering blocks. [mint: 1647734415001, maxt: 1647741600002, range: 1h59m45s, blocks: 2]: <ulid: 01FVVQHRGM37XXYS1VV8R7QT42, mint: 1647734415001, maxt: 1647741600002, range: 1h59m45s>, <ulid: 01FZGJK6GYSC29QY2243EF9QJ4, mint: 1647734415001, maxt: 1647741600002, range: 1h59m45s>"

Same error occurs with deletion-mark.json

@yeya24 can you, please, confirm the expected result? or was there different logic in mind?

I can dig deeper and propose a fix, so that thanos compact will ignore marked blocks.

@xBazilio
Copy link
Contributor

@yeya24 I tried to propose a fix in #6229 but it was incorrect.
I think marked blocks should be excluded earlier, may be during groupping, so that compact.Group.metasByMinTime already doesn't have any metas from marked blocks. What do you think?

@xBazilio
Copy link
Contributor

As I undertood from #3409 blocks with no-compact-mark.json should not be compacted. It is so, but the check is in the planner.
Maybe it is worth to exclude such block with GatherNoCompactionMarkFilter, rename the filter to IgnoreNoCompactionMarkFilter the same as IgnoreDeletionMarkFilter, and delete blocks metas from metas during filtering.
Though after that planner will need to be refactored either. because it wouldn't have to filter out marked blocks again.

@GiedriusS
Copy link
Member

I can't repro this locally with neither the OP's instructions:

I guess to reproduce you'd have to create a corrupt block. In my case prometheus did this for me. ;) If you wanted to do it, I guess you could just pick a block and change some random characters, that would probably cause a checksum mismatch.

Step two would be to turn on compact and confirm it halts on the block.

Step 3 would be to add the no-compact-mark and bounce compact.

Nor these ones. I believe that this might be the actual fix.

@xBazilio
Copy link
Contributor

xBazilio commented Apr 27, 2023

@GiedriusS I've just reproduced the bug on main branch with your last commit "compact: atomically replace no compact marked map (#6319)"

the steps are still actual.

so the bug still exists.

@yeya24
Copy link
Contributor

yeya24 commented Sep 16, 2024

I wonder if this is still an issue. @GiedriusS @xBazilio Do we know if we can resolve this issue or it is still a problem

@xBazilio
Copy link
Contributor

I can confirm that this is NOT an issue any more.
Just checked to be sure.
v0.36.1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants