store: "invalid size" when decoding postings. #6545
Comments
@yeya24 maybe you've seen this on the Cortex side?
Was able to repro this consistently, let me try to fix it.
Added repro here with prod data #6575, not sure how this data got generated in the first place. In one instance …
@ryan-suther do you use Redis caching in prod perhaps?
@GiedriusS No, we do not. We use a normal PVC. Our …
Somehow fixing the following data race also fixed this problem for me: #6575.
Still reproducible, unfortunately, trying to fix it. Breaks long-term querying with index cache :'/
Snappy works at the byte level and can cut two different chunks in the middle of a varint. Thus, if there's an error from the Decbuf, fill up the buffer and try reading the varint again. Added repro test. Closes thanos-io#6545. Signed-off-by: Giedrius Statkevičius <[email protected]>
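For context, here is a minimal standalone sketch (not the actual store gateway code) of the failure mode that commit message describes: `binary.Uvarint` cannot decode a varint that is cut off at a chunk boundary, so the reader has to fill the buffer with the next chunk and retry. `readUvarint` and `next` are hypothetical names used only for illustration.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// readUvarint reads one uvarint from buf, refilling buf from next() whenever
// the varint is cut off at a chunk boundary (binary.Uvarint returns n == 0).
func readUvarint(buf *[]byte, next func() ([]byte, bool)) (uint64, error) {
	for {
		v, n := binary.Uvarint(*buf)
		if n > 0 {
			*buf = (*buf)[n:] // consume the decoded bytes
			return v, nil
		}
		if n < 0 {
			return 0, fmt.Errorf("uvarint overflows 64 bits")
		}
		// n == 0: the buffer ends in the middle of a varint; fetch more bytes.
		chunk, ok := next()
		if !ok {
			return 0, fmt.Errorf("unexpected end of stream")
		}
		*buf = append(*buf, chunk...)
	}
}

func main() {
	// Encode a value whose varint spans two bytes, then split it across two
	// chunks, as a framed compressor can do.
	var enc [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(enc[:], 300) // 300 encodes as 0xAC 0x02
	chunks := [][]byte{enc[:1], enc[1:n]}

	i := 0
	next := func() ([]byte, bool) {
		if i >= len(chunks) {
			return nil, false
		}
		c := chunks[i]
		i++
		return c, true
	}

	var buf []byte
	v, err := readUvarint(&buf, next)
	fmt.Println(v, err) // prints: 300 <nil>
}
```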
Will finally be fixed in #6602
@ylazaar IIUC it is a different issue, as the error is from the compactor, not the store gateway. Would you mind creating a separate issue?
Thanos, Prometheus and Golang version used:
Thanos: main@dc337b23
Prometheus: v0.44.1
Golang: 1.20.2
thanos, version 0.32.0-dev (branch: HEAD, revision: dc337b2)
build user: rsuther@
build date: 20230720-18:24:06
go version: go1.20.2
platform: darwin/amd64
tags: netgo
Object Storage Provider:
Ceph (S3)
What happened:
We were on v0.30.2 and upgraded to v0.31.0, and experienced high memory usage while rolling out new deployments. We then upgraded to the commit above for the high-memory-usage fix and began experiencing the error below.
Multiple refreshes of the Grafana dashboard produced different block IDs.
What you expected to happen:
Thanos to operate normally.
How to reproduce it (as minimally and precisely as possible):
We are not able to reliably reproduce the error. I was able to download one of the reported blocks and run some e2e tests using the original query from the dashboard, and was not able to reproduce the issue. I did, however, get good results from the query.
In the test, I used a filesystem bucket with the reported block and an in-memory index cache. The test did the following (a rough sketch of the flow follows the list):
1. Start the store and run InitialSync().
2. Submit the query via a store.Series(...) call.
3. Run store.Sync(...).
4. Run store.Series(...) again with the same query.

Resulting series looked good, no warnings.
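For reference, here is a rough sketch of submitting the same Series query twice. It drives the Store gRPC API against an already-running store gateway rather than the in-process BucketStore (with a filesystem bucket and in-memory index cache) that the actual test used; the address, time range, and matcher are placeholders.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"

	"github.com/thanos-io/thanos/pkg/store/storepb"
)

// querySeries submits one Series request and drains the response stream.
func querySeries(ctx context.Context, c storepb.StoreClient) error {
	stream, err := c.Series(ctx, &storepb.SeriesRequest{
		MinTime: time.Now().Add(-24 * time.Hour).UnixMilli(),
		MaxTime: time.Now().UnixMilli(),
		Matchers: []storepb.LabelMatcher{
			// Placeholder matcher; the real test used the original dashboard query.
			{Type: storepb.LabelMatcher_EQ, Name: "__name__", Value: "up"},
		},
	})
	if err != nil {
		return err
	}
	for {
		resp, err := stream.Recv()
		if err == io.EOF {
			return nil
		}
		if err != nil {
			// An "invalid size" decoding error from the store would surface here
			// or as a warning, depending on how the store reports it.
			return err
		}
		if w := resp.GetWarning(); w != "" {
			fmt.Println("warning:", w)
		}
	}
}

func main() {
	conn, err := grpc.Dial("localhost:10901", grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	client := storepb.NewStoreClient(conn)
	ctx := context.Background()

	// Run the same query twice, mirroring the two Series calls in the test.
	for i := 0; i < 2; i++ {
		if err := querySeries(ctx, client); err != nil {
			log.Fatalf("Series call %d failed: %v", i+1, err)
		}
	}
	fmt.Println("both Series calls completed without error")
}
```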
Full logs to relevant components:
The error above is not logged. All other logging for the block(s) appeared normal.
Anything else we need to know: