
Tempo/Vulture: GetObject S3 API operation - costs. #3700

Closed
lukasmrtvy opened this issue May 22, 2024 · 2 comments · Fixed by #3874

Comments

@lukasmrtvy

lukasmrtvy commented May 22, 2024

Hey,
This is probably not a bug, but a finding: Vulture is quite expensive in terms of S3 costs.

Adding a screenshot from AWS Cost Explorer (May-19: Vulture was deployed; May-21: Vulture was undeployed). The difference is a whopping ~$97 per day. This was tested in an environment with 1-2 people who were not actively querying Tempo.
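For a rough sense of scale, the ~$97/day delta can be converted into an implied request count. This is a back-of-envelope sketch that assumes the whole delta is S3 GET charges at $0.0004 per 1,000 requests (an assumed us-east-1 S3 Standard list price; other regions, other request types, and storage charges all shift the number), so treat the result as an upper bound on GET volume, not a measurement.

```go
package main

import "fmt"

// getPricePer1000 is an ASSUMED S3 Standard GET price in USD per 1,000
// requests (us-east-1 list price); check your own region before relying on it.
const getPricePer1000 = 0.0004

// impliedGETsPerDay estimates how many GET requests a daily cost delta
// corresponds to, assuming the entire delta is GET request charges.
func impliedGETsPerDay(dailyCostUSD float64) float64 {
	return dailyCostUSD / getPricePer1000 * 1000
}

func main() {
	gets := impliedGETsPerDay(97)
	fmt.Printf("~%.0f GETs/day, ~%.0f GETs/s\n", gets, gets/86400)
}
```

Under these assumptions it works out to roughly 242.5 million GETs per day, or about 2,800 per second.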

[Screenshot: AWS Cost Explorer, daily costs]

At the same time, I find Vulture useful as a consistency-checking tool, but honestly, I am not sure it's worth the price. Thoughts?

Tempo (distributed): Helm chart 1.9.9
Vulture: Helm chart 0.4.1

Thanks

EDIT:
The May-21 StandardStorage cost is probably not correct yet (it should end up at ~$20); cost data takes some time to propagate in AWS. Still, it's a huge difference.

Updated:
[Screenshot: AWS Cost Explorer, updated daily costs]

@joe-elliott
Member

I honestly have no idea how much Vulture costs per day. Two options that may help reduce spend:

1. Increase the time between calls using these parameters:

https://github.com/grafana/tempo/blob/main/cmd/tempo-vulture/main.go#L71-L72

2. Add a bloom/footer cache to reduce GETs on trace-by-ID lookups.

@bmteller
Contributor

bmteller commented Jun 6, 2024

We noticed a similar issue with some of our internal tooling that does trace ID lookups. If you know the timestamp at which the trace occurred, you can significantly reduce the number of blocks Tempo has to process by specifying a start and end window around that timestamp. We have ~300 blocks, so a lookup without a window means ~300 requests to S3 just for the bloom filter checks. Specifying a window of +/- 20 minutes around the trace reduced the number of blocks checked to 1. The window can straddle two blocks, in which case 2 blocks are checked.

Those blocks were also in the past, so I suspect that for something like Vulture, using a window would mean no blocks are checked at all, because the trace would not have been committed to any block yet.

https://github.com/grafana/tempo/blob/main/cmd/tempo-vulture/main.go#L469
