
Improve Polling Performance at high block counts #2521

Closed
joe-elliott opened this issue May 31, 2023 · 14 comments · Fixed by #2652
@joe-elliott (Member) commented May 31, 2023

As Tempo approaches 1M+ blocks, the TCO and performance impact of polling become significant. Polling operates using the following steps:

Queriers and query-frontends:

  1. List all tenants
  2. Get <tenant>/index.json.gz for each tenant

Compactors:

  1. List all tenants
  2. If I am not the tenant index builder for this tenant
    • Get <tenant>/index.json.gz for the tenant
  3. If I am the tenant index builder for this tenant
    • List all blocks beneath the tenant: <tenant>/*
    • For each block get <tenant>/<guid>/meta.json. If this 404s get <tenant>/<guid>/meta.compacted.json
    • Use the above information to build a new tenant index
    • Put <tenant>/index.json.gz

The cost in time and API calls of getting all of the meta.jsons and meta.compacted.jsons adds up considerably. Given that meta.json and meta.compacted.json are immutable, we should not need to GET them individually; we need to find a way to reduce these calls.
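The linear GET cost is easy to see in a sketch. The following is a minimal, hypothetical model of the index build above; none of these names are Tempo's actual code. Each live block costs one GET, and each compacted block costs two (the 404 plus the fallback).

```go
package main

import "fmt"

// object simulates a backend object store keyed by path.
type object map[string][]byte

// getMeta simulates GET <tenant>/<guid>/meta.json with a fallback to
// meta.compacted.json when the first GET 404s — up to two calls per block.
func getMeta(store object, tenant, guid string, calls *int) []byte {
	*calls++
	if b, ok := store[tenant+"/"+guid+"/meta.json"]; ok {
		return b
	}
	*calls++
	return store[tenant+"/"+guid+"/meta.compacted.json"]
}

// buildIndex fetches every block's meta individually; the call count grows
// linearly with the number of blocks, which is the cost this issue targets.
func buildIndex(store object, tenant string, guids []string) int {
	calls := 0
	for _, g := range guids {
		getMeta(store, tenant, g, &calls)
	}
	return calls
}

func main() {
	store := object{
		"t1/a/meta.json":           []byte("{}"),
		"t1/b/meta.compacted.json": []byte("{}"),
	}
	fmt.Println(buildIndex(store, "t1", []string{"a", "b"})) // 3
}
```

At 1M+ blocks this becomes millions of GETs per polling cycle per index builder, which is where the time and API cost comes from.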

@joe-elliott (Member, issue author) commented:

One possible solution is to find a way, with each backend, to list the metas and compacted metas directly. If this is possible, then step 3 above would look like:

  1. If I am the tenant index builder for this tenant
    • List all meta.jsons for this tenant: <tenant>/*/meta.json
    • List all compacted meta.jsons for this tenant: <tenant>/*/meta.compacted.json
    • Compare the above lists to the current list in <tenant>/index.json.gz
    • Update and PUT the new <tenant>/index.json.gz based on the above lists. This will require GETing only new metas and compacted metas.
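The comparison step above boils down to a set difference. A minimal sketch, with hypothetical names, of selecting which metas still need a GET:

```go
package main

import "fmt"

// newIDs returns the IDs present in the fresh wildcard listing but absent
// from the current index — the only blocks whose metas still need a GET.
func newIDs(listed, indexed []string) []string {
	known := make(map[string]struct{}, len(indexed))
	for _, id := range indexed {
		known[id] = struct{}{}
	}
	var out []string
	for _, id := range listed {
		if _, ok := known[id]; !ok {
			out = append(out, id)
		}
	}
	return out
}

func main() {
	listed := []string{"a", "b", "c"} // from the <tenant>/*/meta.json listing
	indexed := []string{"a"}          // from the current index.json.gz
	fmt.Println(newIDs(listed, indexed)) // only "b" and "c" need a GET
}
```

With two list calls (metas and compacted metas) plus this diff, the number of GETs drops from O(all blocks) to O(blocks that changed since the last index).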

Backend support:

  • Local storage can fall back to the old behavior. It does not need to support scaling to 1M+ blocks.
  • GCS has direct support
  • I believe the s3 list call already returns these objects, and we are filtering down to blocks using CommonPrefixes. This may simply require a change to our own list operation to add "wildcard" support.
  • Something similar may be possible with Azure; we seem to have similar functionality in its list call. If we have to fall back to the old behavior for Azure, I would be OK with that. This change is more critical for s3/gcs for us.

@mdisibio (Contributor) commented Jun 6, 2023

Another possible solution: when a compactor rebuilds an index, it already has the blocklist in memory from five minutes ago (the default polling interval), so we can eliminate work by not re-reading the metas for blocks it already has. Get the new list of block IDs (one backend List call) and compare it with the previous list to detect new and deleted blocks. When a block is compacted (meta.json -> meta.compacted.json), this wouldn't show up as a delta in the list of block IDs, but we can solve that by recording the obsoleted block IDs in the meta.json for the new block. Then, when the new block is detected and we read its meta, we know which old blocks are now compacted and can update accordingly.

Review of steps:

  1. On first poll, resort to the original behavior (full scan)
  2. On subsequent polls: list block IDs for the tenant
  3. Compare with the existing list in memory. A new entry means a new block; a missing entry means a deleted block.
  4. Read metas for new blocks.
  5. The meta of a new block contains a new field, compactedBlocks, listing the block IDs that were used to create it. Update those entries from "live" to "compacted".
  6. If an error occurs, fall back to the original behavior (full scan)

This approach doesn't require any new behavior from the backend and maintains current compatibility. There may be some more edge cases to think about, but overall it is promising.
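Steps 3 and 5 above can be sketched as follows. The blockMeta shape and the compactedBlocks field are the hypothetical ones proposed in this comment, not an existing Tempo structure:

```go
package main

import "fmt"

// blockMeta carries the proposed compactedBlocks field: the IDs of the
// source blocks that were merged to create this block.
type blockMeta struct {
	ID              string
	CompactedBlocks []string
}

// applyDelta implements step 5: when a new block's meta is read, mark its
// source blocks as compacted in the in-memory blocklist. The map value is
// true for live blocks and false for compacted ones.
func applyDelta(blocklist map[string]bool, m blockMeta) {
	blocklist[m.ID] = true
	for _, src := range m.CompactedBlocks {
		if _, ok := blocklist[src]; ok {
			blocklist[src] = false
		}
	}
}

func main() {
	blocklist := map[string]bool{"a": true, "b": true}
	applyDelta(blocklist, blockMeta{ID: "c", CompactedBlocks: []string{"a", "b"}})
	fmt.Println(blocklist["a"], blocklist["b"], blocklist["c"]) // false false true
}
```

The key property is that reading one new meta ("c") reclassifies its sources ("a", "b") without any extra backend calls.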

@zalegrala zalegrala self-assigned this Jun 29, 2023
@zalegrala (Contributor) commented Jun 30, 2023

I like the idea of reducing load on the API calls by reusing the data from the last poll. Currently the poller doesn't have the details of the last poll, but passing in the last blocklist to the poller seems like it would work to perform the comparison of new blocks to old blocks.

I'm not yet clear on where to make adjustments to include a new field compactedBlocks on the blockMeta; I'd like to see where in the code we might slot that in. In the encoding interfaces I see where new blocks are created, but I don't see where multiple blocks are combined into one during compaction, which is where we would learn the source block list. If I understand the suggestion correctly, during compaction we should be able to take the list of block IDs we are combining into one block and record it on the newly compacted block's meta. Then, when we list all the new blocks, we'll need to read the meta (and the compacted meta) to identify which blocks in the previous polling list can be removed from the blocklist we are about to write. Does that sound workable?

I'll continue to dig on the compaction for the combining logic to see where we might make a change.
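The recording side of the suggestion might look something like this minimal sketch (all names are hypothetical, not Tempo's compaction code): compaction stamps the source block IDs onto the new block's meta at the moment it merges them.

```go
package main

import "fmt"

// blockMeta carries the proposed compactedBlocks field.
type blockMeta struct {
	ID              string
	CompactedBlocks []string
}

// compact models merging several source blocks into one new block: the new
// meta records its sources so the next poll can reconcile the blocklist
// without re-reading every meta.
func compact(newID string, sources []blockMeta) blockMeta {
	out := blockMeta{ID: newID}
	for _, s := range sources {
		out.CompactedBlocks = append(out.CompactedBlocks, s.ID)
	}
	return out
}

func main() {
	m := compact("c", []blockMeta{{ID: "a"}, {ID: "b"}})
	fmt.Println(m.CompactedBlocks) // [a b]
}
```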

@joe-elliott (Member, issue author) commented:

> but passing in the last blocklist to the poller seems like it would work to perform the comparison of new blocks to old blocks.

The previous blocklist can't be trusted because any given meta.json may have been moved to compacted. Also, any meta.compacted.json may have been deleted. This is why I was looking for a way to get this information from the list operation instead of individual GETs.

> I'm not yet clear on where to make adjustments to include a new field compactedBlocks on the blockMeta.

I'd prefer looking at using the list operation first. It requires less state management and code changes if it works.

@zalegrala (Contributor) commented:

I think the modifications to the List() operation make sense, or perhaps a ListWildcard, though I would prefer to amend the existing interface. I think I'm good to proceed with the suggestions and would like to get some thoughts down today so I can hit the ground running next week. I think Azure might also support wildcard listing.

I wanted to explore Marty's idea on the issue, but after reviewing the Compact() methods in the encoders, I realize there is a lot of complexity in there, and changes would need to be made to each encoder in order to know which source IDs were used to create the new block.
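A sketch of amending the existing list operation with a suffix filter rather than adding a separate ListWildcard. The signature and the in-memory backend here are assumptions for illustration, not Tempo's actual RawReader:

```go
package main

import (
	"fmt"
	"strings"
)

// lister is a stand-in for the backend list interface.
type lister interface {
	// List returns object names under prefix; when suffix is non-empty,
	// only names ending in suffix (e.g. "meta.json") are returned, letting
	// backends with server-side filtering push the work down.
	List(prefix, suffix string) ([]string, error)
}

// memBackend is an in-memory fake that filters client-side, mirroring the
// fallback behavior a backend without wildcard support would use.
type memBackend struct{ objects []string }

func (m *memBackend) List(prefix, suffix string) ([]string, error) {
	var out []string
	for _, o := range m.objects {
		if strings.HasPrefix(o, prefix) && strings.HasSuffix(o, suffix) {
			out = append(out, o)
		}
	}
	return out, nil
}

func main() {
	var b lister = &memBackend{objects: []string{
		"t1/a/meta.json",
		"t1/b/meta.compacted.json",
		"t1/b/data",
	}}
	metas, _ := b.List("t1/", "meta.json")
	fmt.Println(metas) // [t1/a/meta.json]
}
```

Note that "meta.compacted.json" does not match the "meta.json" suffix, so the two index inputs stay cleanly separated into two list calls, as in the proposed step 3.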

@zalegrala (Contributor) commented Jul 14, 2023

I've got #2652 for my work in progress.

I've got this deployed in an environment with about 125k blocks and a couple thousand tenants, with concurrency turned off and compaction disabled. It looks like the blocklist poll duration actually goes up, since the list call is now doing more work. We still save on the GET calls, but overall duration seems worse. I'll revert to latest in this environment and observe it for a few hours to confirm.

I suspect this may still be acceptable due to the reduction in GET calls for the meta files.

@zalegrala (Contributor) commented:

I've been making progress on #2652 over the last couple of weeks. Joe and I chatted last week about parallelizing the effort, so I've made some adjustments to include that, but in the process broke a bunch of tests that I've been chasing down since. I'm now back to the point where I have an image running for testing and am just working on the last couple of items to get the e2e tests to pass. I believe this is nearly ready for another review.

I'm expecting one point of contention in the review, which perhaps we can discuss as a team: the inclusion of a HasFeature() method on the RawReader. Koenraad mentioned this might also be useful for checking which backends support versioning, so maybe it won't be an issue, but I would like to get some thoughts on this.
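A minimal sketch of what such a capability probe could look like; the feature names and types are hypothetical, not the actual PR's interface. Callers pick the suffix-listing fast path only when the backend reports support, and fall back to a full scan otherwise:

```go
package main

import "fmt"

// feature enumerates optional backend capabilities.
type feature int

const (
	featureListSuffix feature = iota // server-side suffix/wildcard listing
	featureVersioning                // object versioning (Koenraad's case)
)

// reader is a stand-in for a raw backend reader with a capability probe.
type reader interface {
	HasFeature(f feature) bool
}

// gcsLike models a backend that can list by suffix; localLike cannot.
type gcsLike struct{}

func (gcsLike) HasFeature(f feature) bool { return f == featureListSuffix }

type localLike struct{}

func (localLike) HasFeature(feature) bool { return false }

// pollPath picks the polling strategy based on backend capabilities.
func pollPath(r reader) string {
	if r.HasFeature(featureListSuffix) {
		return "list-metas"
	}
	return "full-scan"
}

func main() {
	fmt.Println(pollPath(gcsLike{}), pollPath(localLike{})) // list-metas full-scan
}
```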

@zalegrala (Contributor) commented:

We've reached some disagreement about how to proceed. I'm working up a document to put in front of the team to gather feedback and ideally an agreement about how to move forward.

@amurray2306 commented:

Hey @zalegrala, any updates on this issue? It's related to a PIR action, so I just want to check whether anything has moved forward.

@amurray2306 commented:

Also, a quick question: I'm recording the priority of issues created from PIRs. Where would you put this one: high, medium, or low?

@mapno mapno removed the postmortem label Oct 6, 2023
@amurray2306 commented:

@zalegrala, did you reach a consensus on this since our last ping? Did you manage to put together the doc you mentioned?

@zalegrala (Contributor) commented Oct 26, 2023

@amurray2306 Apologies, I've not done a great job of keeping this up to date; I'm also perpetually behind on GitHub notifications. This is my quarterly deliverable and I expect it to ship soon. I'm not sure how it slots into your efforts, so let me know what I can help with.

At the mention of a doc, we opted for a meeting instead where we discussed the approaches and came to some agreement about how to proceed. Since then, the image has been tested and run in our ops environment with great success. I believe we are close to finalizing the PR here, with the last few iterations of review mostly about test coverage and style.

@amurray2306 commented:

Awesome news, congrats :D Thanks for the context; I'll update my notes.

@zalegrala (Contributor) commented:

The PR for this work has been merged and will roll out to our environments in the next couple weeks.
