Improve Polling Performance at high block counts #2521
Comments
One possible solution is to find a way with each backend to list the meta and compacted meta files directly. If this is possible, then step 3 above would look like:
Backend support:
Another possible solution: when a compactor rebuilds an index, it already has the blocklist in memory from 5 minutes ago (the default polling interval), so it can eliminate work by not re-reading the meta for blocks it already has. Get the new list of block IDs (1 backend List call) and compare it with the previous list to detect new and deleted blocks. When a block is compacted (meta.json -> meta.compacted.json), this wouldn't show up as a delta in the list of block IDs, but we can solve that by recording the obsoleted block IDs in the meta.json for the new block. Then, when the new block is detected and we read its meta, we know which old blocks are now compacted and can update accordingly. Review of steps:
This approach doesn't require any new behavior from the backend and maintains current compatibility. There may be some more edge cases to think about, but overall it is promising.
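The diffing idea in this comment could be sketched roughly as follows. This is only an illustration under stated assumptions, not Tempo's poller: `BlockMeta`, its `obsoletes` field, `diff_poll`, and the `read_meta` callback are all hypothetical names, and edge cases (for example, a block that is created and compacted between two polls) are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class BlockMeta:
    """Hypothetical, trimmed-down stand-in for a block's meta.json."""
    block_id: str
    # Assumed new field: IDs of the source blocks this block replaced.
    obsoletes: list = field(default_factory=list)


def diff_poll(prev_live, prev_compacted, listed_ids, read_meta):
    """Build the next blocklist from the previous poll plus one List call.

    prev_live      -- dict of block_id -> BlockMeta from the last poll
    prev_compacted -- set of block IDs already known to be compacted
    listed_ids     -- block IDs returned by the backend List call
    read_meta      -- callable fetching one BlockMeta; used for new IDs only
    Returns (live, compacted).
    """
    listed = set(listed_ids)
    # Anything missing from the listing has been deleted, so drop it.
    live = {bid: meta for bid, meta in prev_live.items() if bid in listed}
    compacted = prev_compacted & listed
    # Only IDs we have never seen before need a GET.
    new_ids = listed - set(live) - compacted
    for bid in sorted(new_ids):
        meta = read_meta(bid)
        live[bid] = meta
        # The new block tells us which older blocks it replaced.
        compacted.update(o for o in meta.obsoletes if o in listed)
    for bid in compacted:
        live.pop(bid, None)
    return live, compacted
```

With this shape, a poll over an unchanged tenant issues one List call and no GETs; the cost scales with the number of new blocks rather than the total block count.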
I like the idea of reducing load on the API calls by reusing the data from the last poll. Currently the poller doesn't have the details of the last poll, but passing the last blocklist in to the poller seems like it would work to perform the comparison of new blocks to old blocks. I'm not yet clear on where to make adjustments to include a new field. I'll continue to dig into the compaction combining logic to see where we might make a change.
The previous blocklist can't be trusted b/c any given meta.json may have been moved to compacted. Also, any meta.compacted.json may have been deleted. This is why I was looking for a way to get this information from the list operation instead of individual GETs.
I'd prefer looking at using the list operation first. It requires less state management and fewer code changes if it works.
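To make the list-based idea concrete, here is a minimal sketch of deriving block state from a single listing. It assumes the backend can return full object keys of the form `<tenant>/<guid>/meta.json`; the function name `classify_listing` is hypothetical, and treating a block that has both files as compacted is an assumption, not something stated in this thread.

```python
def classify_listing(tenant, keys):
    """Split one tenant's object keys into live and compacted block IDs.

    tenant -- tenant ID used as the key prefix
    keys   -- iterable of object keys from one backend List call (assumed shape)
    Returns (live_ids, compacted_ids) as sets.
    """
    live, compacted = set(), set()
    prefix = tenant + "/"
    for key in keys:
        if not key.startswith(prefix):
            continue  # belongs to another tenant
        parts = key[len(prefix):].split("/")
        if len(parts) != 2:
            continue  # e.g. <tenant>/index.json.gz
        block_id, name = parts
        if name == "meta.json":
            live.add(block_id)
        elif name == "meta.compacted.json":
            compacted.add(block_id)
    # Assumption: if both files are visible, the block counts as compacted.
    live -= compacted
    return live, compacted
```

No state from the previous poll is needed to tell live blocks from compacted ones here, which is the appeal of this approach; the trade-off is that the List call itself has to return more keys.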
I think the modifications to the List() operation make sense, or perhaps a ListWildcard, but I would prefer to amend the existing interface. I think I'm good to proceed on this with the suggestions and would like to get some thoughts down today so I can hit the ground running next week. I think Azure might also support wildcards. I wanted to explore Marty's idea on the issue, but after reviewing the Compact() methods in the encoders, I realize there is a lot of complexity in there, and changes would need to be made to each encoder in order to know which source IDs were used to create the new block.
I've got #2652 for my work in progress. I've got this deployed in an environment with about 125k blocks and a couple thousand tenants, with concurrency turned off and compaction disabled. It looks like the blocklist poll duration actually goes up, since the List call is now doing more work. We still save on the Get calls, but overall duration seems worse. I'll revert to I suspect that this may still be acceptable due to the reduction in Get calls for the meta files.
I've been making progress on #2652 over the last couple of weeks. Joe and I had a chat last week about paralleling the effort and so I've made some adjustments to include that, but in the process broke a bunch of tests that I've been chasing down since. I'm now back to the point where I have an image running for testing, and just working on the last couple of items to get the e2e tests to pass. I believe this is nearly ready for another review. I'm expecting one point of contention in the review that perhaps we can discuss as a team, which is the inclusion of a
We've reached some disagreement about how to proceed. I'm working up a document to put in front of the team to gather feedback and ideally an agreement about how to move forward.
Hey @zalegrala, any updates on this issue? It's related to a PIR action, so I just want to check if anything has moved forward.
Also, a quick question: I'm recording the priority of issues created from PIRs. What would you have this at: high, medium, or low?
@zalegrala did you reach a consensus on this since our last ping? Did you manage to put together the doc you mentioned?
@amurray2306 Apologies, I've not done a great job of keeping this up to date. I'm also perpetually behind on GitHub notifications. This is my quarterly deliverable and I expect it to ship soon. I'm not sure how this slots in to your efforts, so let me know what I can help with. As for the doc, we opted for a meeting instead, where we discussed the approaches and came to some agreement about how to proceed. Since then, the image has been tested and run in our ops environment with great success. I believe we are close to finalizing the PR here, with the last few iterations of review mostly about test coverage and style.
Awesome news, congrats :D. Thanks for the context, I'll update my notes.
The PR for this work has been merged and will roll out to our environments in the next couple weeks. |
As Tempo approaches 1M+ blocks, the TCO and performance of polling become a real problem. Polling operates using the following steps.
Queriers and query-frontends:
<tenant>/index.json.gz for each tenant

Compactors:
<tenant>/index.json.gz for the tenant
<tenant>/*
<tenant>/<guid>/meta.json. If this 404s, get <tenant>/<guid>/meta.compacted.json
<tenant>/index.json.gz
The cost in time and API calls of getting all of the meta.json and meta.compacted.json files adds up considerably. Given that meta.json and meta.compacted.json are immutable, we shouldn't need to GET them individually on every poll, and we need to find a way to reduce these calls.