
Failed to query trace in s3 storage, and index.json.gz has not been updated for a long time. #3369

Closed
aaashen opened this issue Feb 6, 2024 · 8 comments
Labels
stale Used for stale issues / PRs

Comments

@aaashen

aaashen commented Feb 6, 2024

To Reproduce
Steps to reproduce the behavior:

  1. Start Tempo using the helm chart (Tempo 2.3.0)
  2. Perform Operations (Read/Write/Others)

Expected behavior

Environment:

  • Infrastructure: Kubernetes 1.20
  • Deployment tool: helm

Additional Context
It looks like the v2.3.0 image includes the polling-improvements commit (#2652).
By running "./tempo-cli list blocks single-tenant -c tempo.yaml" and printing each blockId it returned, we found that it appeared to be stuck in an endless loop, and there were duplicate blockIds in the log.
After rolling back the image to v2.3.0-rc, everything went back to normal.
Is it possible there is a bug in the listBlocks method, or is something wrong with our environment or configuration?
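A quick way to confirm how stale the per-tenant index object is (a sketch, assuming the AWS CLI against an S3-compatible backend; <bucket> is a placeholder for your Tempo bucket):

# Shows size and last-modified time of the tenant index (the tenant in this issue is "single-tenant").
aws s3 ls s3://<bucket>/single-tenant/index.json.gz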

@joe-elliott
Member

It's hard to say. Let's start by reviewing your compactor logs. The compactor is the component responsible for updating the tenant index and it may contain some clues about why your index is so out of date.

cc @zalegrala
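
A sketch of pulling just the relevant compactor lines on Kubernetes (the namespace and label selector are placeholders for your own deployment):

# Filter recent compactor logs for tenant-index activity and errors.
kubectl logs -n <namespace> -l app.kubernetes.io/component=compactor --since=24h | grep -iE 'tenant index|error'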

@zalegrala
Contributor

It looks like #3224, which contains an important fix for the PR linked above, hasn't been released yet.

@aaashen aaashen closed this as completed Feb 7, 2024
@aaashen
Author

aaashen commented Feb 7, 2024

> It's hard to say. Let's start by reviewing your compactor logs. The compactor is the component responsible for updating the tenant index and it may contain some clues about why your index is so out of date.
>
> cc @zalegrala

Hi Joe, I am sorry that we did not save the complete logs. Here is a portion of the compactor logs.

level=info ts=2024-02-02T12:46:15.522406823Z caller=tempodb.go:428 msg="polling enabled" interval=5m0s concurrency=50
level=debug ts=2024-02-02T12:46:15.586398159Z caller=s3.go:273 msg="listing blocks" keypath=/ found=1 IsTruncated=true NextMarker=single-tenant/0078d61a-93a5-4290-a036-c37e2aad89b7/data.parquet
level=debug ts=2024-02-02T12:46:15.630589307Z caller=s3.go:273 msg="listing blocks" keypath=/ found=0 IsTruncated=false NextMarker
level=debug ts=2024-02-02T12:46:15.630975992Z caller=compactor.go:195 msg="checking hash" hash=build-tenant-index-0-single-tenant
level=debug ts=2024-02-02T12:46:15.631127599Z caller=compactor.go:214 msg="checking addresses" owning_addr=10.178.13.70:0 this_addr=10.178.13.70:0

There were no other error logs or logs like 'listing blocks complete'.

@aaashen
Author

aaashen commented Feb 7, 2024

> It looks like #3224, which contains an important fix for the PR linked above, hasn't been released yet.

Hi zalegrala, thanks for the reply. I checked #3224; it fixes the poller waitgroup handling in the pollUnknown method. But it seems that cmd-list-blocks does not call pollUnknown?

@aaashen aaashen reopened this Feb 7, 2024
@zalegrala
Contributor

Checking a little closer, it looks like the polling change is also unreleased. Are you overriding the image in your helm values? (compare: v2.3.1...main)

The fix above, as you mention, will help the compactor but not the ListBlocks() call. I tested tempo-cli locally to check for duplicates, but didn't see any in the output of the command below. This is from a main build.

./bin/linux/tempo-cli-amd64 list blocks ops -c tempo.yaml | awk '/| / {print $2 }' | sort | uniq -d
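
(The sort | uniq -d step prints only values that occur more than once, so any output from this pipeline means duplicate block IDs in the listing.)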

The tempo-cli that you are using is also from the v2.3.0 release, correct? A quick way to know if the polling change is in place is to include s3.list_blocks_concurrency in your config. This was introduced with the polling change.
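
For reference, a rough sketch of where that option would sit in tempo.yaml (the surrounding keys follow the usual Tempo storage block; the value shown is only illustrative):

storage:
  trace:
    backend: s3
    s3:
      # existing bucket/endpoint settings stay here
      # introduced with the polling change; a build without that change will not recognize this field
      list_blocks_concurrency: 3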

@mi5guided

Yup, this was a bummer. I used an image from Docker Hub that was a little bit after the 2.3.0 release and it had this issue. Since I was new to Tempo, I thought I was screwing up the configuration of the s3 backend. I finally pieced together that the index.json.gz file was missing and that the compactor was responsible for creating/updating that file. I deployed 2.3.0-rc0 from Docker Hub and things work great!

@zalegrala
Contributor

We keep release branches, so not all commits to main get released right away with the immediate next release. If you want to run only released images, stick to the tagged versions but drop the leading v, so just :2.3.0 for the image tag. I just tested grafana/tempo:2.3.0, for example. As usual, we encourage you to read the release notes when updating.
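
A rough sketch of pinning that tag in helm values (the exact keys depend on which chart you use, so treat these as placeholders for your chart's image settings):

# values.yaml
tempo:
  repository: grafana/tempo
  tag: "2.3.0"   # tagged release, no leading "v"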

The upcoming 2.4.0 release will have the polling change and the fix. We've been running it with good results for the last few months.

Was this the same issue as originally reported? Was the image you used from main? Please correct me if I misunderstood.

@github-actions bot

This issue has been automatically marked as stale because it has not had any activity in the past 60 days.
The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed after 15 days if there is no new activity.
Please apply the keepalive label to exempt this issue.

@github-actions github-actions bot added the stale Used for stale issues / PRs label Apr 24, 2024
@github-actions github-actions bot closed this as not planned May 9, 2024