Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle prefixes when listing blocks from S3 and GCS #3466

Merged
merged 14 commits into from
Mar 14, 2024

Conversation

bpfoster
Copy link
Contributor

@bpfoster bpfoster commented Mar 5, 2024

What this PR does: Adjusts the logic to list blocks from S3 and GCS to consider object path prefix.
You can see that tempodb/backend/s3/s3.go would ignore any results whose key, upon splitting by /, did not have 3 elements. When a prefix is configured, e.g. a/b/c/, the returned key will instead be something like a/b/c/<tenantID>/<blockID>/meta. Stip the prefix off this path before pulling out those 3 useful bits from the path.

Which issue(s) this PR fixes:
Fixes #3465

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@CLAassistant
Copy link

CLAassistant commented Mar 5, 2024

CLA assistant check
All committers have signed the CLA.

@joe-elliott
Copy link
Member

joe-elliott commented Mar 5, 2024

ok, this is undoubtedly related to the polling improvements we made in 2.4.0. thx for looking at this. it is likely broken in azure/gcs as well.

@zalegrala can you review?

@bpfoster bpfoster changed the title Handle prefixes when listing blocks from S3 Handle prefixes when listing blocks from S3 and GCS Mar 6, 2024
@bpfoster
Copy link
Contributor Author

bpfoster commented Mar 6, 2024

Yes it looks like you were right that it was introduced in 2.4.0. Running 2.3.1 looks like the compactor is cleaning up as expected.

Though I don't have Azure or GCS accounts to do integration testing, I have added to this PR unit tests covering these cases for both that are similar to S3. As you expected, the issue is present in the GCS implementation (and I have added a fix for it to this PR), but looks like the Azure implementations already are correctly stripping the prefix.

Copy link
Contributor

@zalegrala zalegrala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR, and for fixing all the backends. It might be nice to have an e2e test for this, but it digging in there is a little more work. If you're willing that would be great, but if its too much, I won't push. This is already a good step.

Part of the testing for the work that you are fixing was integrated into a test here to specifically test the poller with the various backends. It might be simple enough to update the config to include a prefix. We'll want to test with and without a prefix.

https://github.com/grafana/tempo/blob/main/integration/poller/poller_test.go#L45

Alternatively we have the main e2e tests that could be adjusted to include prefix testing.

https://github.com/grafana/tempo/blob/main/integration/e2e/e2e_test.go#L42

Probably duplicating the above with a config change would work.

CHANGELOG.md Show resolved Hide resolved
tempodb/backend/azure/azure_test.go Outdated Show resolved Hide resolved
@bpfoster
Copy link
Contributor Author

Thanks @zalegrala . I'll address your comments and then take a look at the integrations tests and see if I can make something happen there.

@bpfoster
Copy link
Contributor Author

@zalegrala - please see the new commits when you have a chance.

  • Addressed your comment with the unit test regarding checking block IDs
  • Updated both the poller_test and e2e_test to exercise configuration with and without prefixes. Feel free to choose both/either/neither :) There are also other ways we could handle the test permutations, so let me know if you're not a fan of how I incorporated those changes into the tests.

Both the integrations tests confirm that S3 and GCS backends fail when configured with a prefix in v2.4.0 1.
Both confirm that Azure is in fact unaffected.
Both confirm that S3 and GCS are fixed in this PR.

I did slightly refactor my changes to the S3 and GCS backends to be more in line with the Azure backends. This has the added benefit over the prior iteration of working when the configured prefix does not end in a /. This is covered here in prefix test cases in both integration tests.

Given that Azure has been verified in the integration tests, I believe we can resolve the discussion on CHANGELOG.md?

Footnotes

  1. I don't actually have a hard failure in the e2e test against v2.4.0. I gave up waiting on the test to complete after 30 minutes (stuck on the tempo.WaitSumMetricsWithOptions() line). I believe that the failure to complete, along with logs always showing metas=0, is a symptom of the issue - no blocks ever returned.

@zalegrala
Copy link
Contributor

Thank you for the updates, I've approved the CI. I think the only concern I have is the one you mention, which is the amount of time it takes for the e2e tests. The change to the poller_test looks good and I appreciate all the permutations, but perhaps we reduce the permutations on the e2e test as a matter of pragmatism. The e2e tests take much longer to run generally.

I appreciate you confirming the Azure code with this test also. Yes, I believe you are correct about the CHANGELOG. I checked the code of both azure versions and it looks like this case is handled, so just the two you are fixing were missed.

I see a couple failures, but I'll check out the code and run locally to take a look. Let's see what the rest of the CI checks look like.

@bpfoster
Copy link
Contributor Author

bpfoster commented Mar 14, 2024

I addressed the format error, fixed the failing e2e tests, and removed 2/4 of the prefix test case permutations from the e2e test.

It looks like all these are passing the CI checks now. The lint job is failing in the last run, but this looks more like a CI job issue, I don't think any changes here would cause the errors being reported.

A local run of golangci-lint run --new-from-rev=origin/main reports 1 error but it is from existing code that just had indentation changed.

integration/e2e/e2e_test.go:113:50: string `/api/echo` has 5 occurrences, make it a constant (goconst)
				assertEcho(t, "http://"+tempo.Endpoint(3200)+"/api/echo")

Copy link
Contributor

@zalegrala zalegrala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. Thank you for the contribution.

@zalegrala zalegrala merged commit 8e6e7fe into grafana:main Mar 14, 2024
14 checks passed
@bpfoster bpfoster deleted the compactor-s3-prefix branch March 15, 2024 11:49
@joe-elliott joe-elliott added type/bug Something isn't working backport release-v2.4 labels Mar 19, 2024
Copy link
Contributor

The backport to release-v2.4 failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new branch
git switch --create backport-3466-to-release-v2.4 origin/release-v2.4
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x 8e6e7fe86f79d1e02972d0d01b35174d25237f81

When the conflicts are resolved, stage and commit the changes:

git add . && git cherry-pick --continue

If you have the GitHub CLI installed:

# Push the branch to GitHub:
git push --set-upstream origin backport-3466-to-release-v2.4
# Create the PR body template
PR_BODY=$(gh pr view 3466 --json body --template 'Backport 8e6e7fe86f79d1e02972d0d01b35174d25237f81 from #3466{{ "\n\n---\n\n" }}{{ index . "body" }}')
# Create the PR on GitHub
echo "${PR_BODY}" | gh pr create --title "[release-v2.4] Handle prefixes when listing blocks from S3 and GCS" --body-file - --label "type/bug" --label "backport" --base release-v2.4 --milestone release-v2.4 --web

Or, if you don't have the GitHub CLI installed (we recommend you install it!):

# Push the branch to GitHub:
git push --set-upstream origin backport-3466-to-release-v2.4

# Create a pull request where the `base` branch is `release-v2.4` and the `compare`/`head` branch is `backport-3466-to-release-v2.4`.

# Remove the local backport branch
git switch main
git branch -D backport-3466-to-release-v2.4

joe-elliott pushed a commit that referenced this pull request Mar 19, 2024
* Handle prefixes when listing blocks from S3

fixes #3465

* Handle prefixes when listing blocks from GCS

* Add test for prefixes when listing blocks from Azure

* Update unit tests to check for actual block IDs instead of just length of the slices

Cleanup unit tests

* Further refine S3/GCS backend for ListBlocks

Brings logic more in line with Azure object parsing.
Also has the benefit of handling prefixes without a trailing slash.

* Update poller integration test to exercise prefixes

* Update e2e test to exercise prefixes

* Fix format check error

* Fix failing e2e tests

* Remove unnecessary prefix permutations from e2e test

* Remove unnecessary test config file copy

* Ignore lint

---------

Co-authored-by: Zach Leslie <[email protected]>
(cherry picked from commit 8e6e7fe)
joe-elliott added a commit that referenced this pull request Mar 19, 2024
* Handle prefixes when listing blocks from S3

fixes #3465

* Handle prefixes when listing blocks from GCS

* Add test for prefixes when listing blocks from Azure

* Update unit tests to check for actual block IDs instead of just length of the slices

Cleanup unit tests

* Further refine S3/GCS backend for ListBlocks

Brings logic more in line with Azure object parsing.
Also has the benefit of handling prefixes without a trailing slash.

* Update poller integration test to exercise prefixes

* Update e2e test to exercise prefixes

* Fix format check error

* Fix failing e2e tests

* Remove unnecessary prefix permutations from e2e test

* Remove unnecessary test config file copy

* Ignore lint

---------

Co-authored-by: Zach Leslie <[email protected]>
(cherry picked from commit 8e6e7fe)

Co-authored-by: Ben Foster <[email protected]>
joe-elliott pushed a commit to joe-elliott/tempo that referenced this pull request Mar 19, 2024
* Handle prefixes when listing blocks from S3

fixes grafana#3465

* Handle prefixes when listing blocks from GCS

* Add test for prefixes when listing blocks from Azure

* Update unit tests to check for actual block IDs instead of just length of the slices

Cleanup unit tests

* Further refine S3/GCS backend for ListBlocks

Brings logic more in line with Azure object parsing.
Also has the benefit of handling prefixes without a trailing slash.

* Update poller integration test to exercise prefixes

* Update e2e test to exercise prefixes

* Fix format check error

* Fix failing e2e tests

* Remove unnecessary prefix permutations from e2e test

* Remove unnecessary test config file copy

* Ignore lint

---------

Co-authored-by: Zach Leslie <[email protected]>
joe-elliott added a commit that referenced this pull request Mar 20, 2024
* log request

Signed-off-by: Joe Elliott <[email protected]>

* move stuff a bit

Signed-off-by: Joe Elliott <[email protected]>

* oh my. e2e tests pass

Signed-off-by: Joe Elliott <[email protected]>

* add handlers

Signed-off-by: Joe Elliott <[email protected]>

* streaming tags

Signed-off-by: Joe Elliott <[email protected]>

* add cli support

Signed-off-by: Joe Elliott <[email protected]>

* improve logging

Signed-off-by: Joe Elliott <[email protected]>

* fix

Signed-off-by: Joe Elliott <[email protected]>

* docs

Signed-off-by: Joe Elliott <[email protected]>

* pipe overrides

Signed-off-by: Joe Elliott <[email protected]>

* cleanup

Signed-off-by: Joe Elliott <[email protected]>

* cleanup

Signed-off-by: Joe Elliott <[email protected]>

* support limits

Signed-off-by: Joe Elliott <[email protected]>

* docs

Signed-off-by: Joe Elliott <[email protected]>

* e2e tests and caching

Signed-off-by: Joe Elliott <[email protected]>

* key prefixes

Signed-off-by: Joe Elliott <[email protected]>

* cache keys

Signed-off-by: Joe Elliott <[email protected]>

* Fixed distinct collection in combiners

Signed-off-by: Joe Elliott <[email protected]>

* fixed combiner bugs and revived tests

Signed-off-by: Joe Elliott <[email protected]>

* restored all tests

Signed-off-by: Joe Elliott <[email protected]>

* lint

Signed-off-by: Joe Elliott <[email protected]>

* made search handler utilities generic

Signed-off-by: Joe Elliott <[email protected]>

* Added handler tests for tags

Signed-off-by: Joe Elliott <[email protected]>

* add diff support

Signed-off-by: Joe Elliott <[email protected]>

* lint

Signed-off-by: Joe Elliott <[email protected]>

* add distinct value collector tests

Signed-off-by: Joe Elliott <[email protected]>

* fix integration tests

Signed-off-by: Joe Elliott <[email protected]>

* diff tests

Signed-off-by: Joe Elliott <[email protected]>

* swapped query for the more robust ExtractMatchers(query)

Signed-off-by: Joe Elliott <[email protected]>

* tests

Signed-off-by: Joe Elliott <[email protected]>

* moved e2e tests to a more sensible place

Signed-off-by: Joe Elliott <[email protected]>

* fix non-deterministic  test

Signed-off-by: Joe Elliott <[email protected]>

* changelog

Signed-off-by: Joe Elliott <[email protected]>

* fix tests for 429 handling

Signed-off-by: Joe Elliott <[email protected]>

* Update docs/sources/tempo/operations/tempo_cli.md

Co-authored-by: Mario <[email protected]>

* review

Signed-off-by: Joe Elliott <[email protected]>

* Update docs/sources/tempo/api_docs/_index.md

Co-authored-by: Kim Nylander <[email protected]>

* Correctly cancel GRPC context beneath the HTTP server (#3443)

* cancel context

Signed-off-by: Joe Elliott <[email protected]>

* update dskit

Signed-off-by: Joe Elliott <[email protected]>

* focused timeouts

Signed-off-by: Joe Elliott <[email protected]>

* docs

Signed-off-by: Joe Elliott <[email protected]>

* lint N docs

Signed-off-by: Joe Elliott <[email protected]>

* more lint

Signed-off-by: Joe Elliott <[email protected]>

* make update-mod

Signed-off-by: Joe Elliott <[email protected]>

---------

Signed-off-by: Joe Elliott <[email protected]>

* Bump anchore/sbom-action from 0.15.8 to 0.15.9 (#3476)

Bumps [anchore/sbom-action](https://github.com/anchore/sbom-action) from 0.15.8 to 0.15.9.
- [Release notes](https://github.com/anchore/sbom-action/releases)
- [Commits](anchore/sbom-action@v0.15.8...v0.15.9)

---
updated-dependencies:
- dependency-name: anchore/sbom-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Doc update (#3482)

* doc: remove reference to previously purged script

* doc: correct label for docs updates

* [TraceQL Metrics] Use new per-tenant max_metrics_duration and fix duration check (#3484)

* Use new per-tenant max_metrics_duration, and fix duration timestamp handling

* Update docs and defaults

* Handle prefixes when listing blocks from S3 and GCS (#3466)

* Handle prefixes when listing blocks from S3

fixes #3465

* Handle prefixes when listing blocks from GCS

* Add test for prefixes when listing blocks from Azure

* Update unit tests to check for actual block IDs instead of just length of the slices

Cleanup unit tests

* Further refine S3/GCS backend for ListBlocks

Brings logic more in line with Azure object parsing.
Also has the benefit of handling prefixes without a trailing slash.

* Update poller integration test to exercise prefixes

* Update e2e test to exercise prefixes

* Fix format check error

* Fix failing e2e tests

* Remove unnecessary prefix permutations from e2e test

* Remove unnecessary test config file copy

* Ignore lint

---------

Co-authored-by: Zach Leslie <[email protected]>

* Update doc-validator.yml (#3483)

Updates the doc-validator to the latest version. Note that this changes the reference format to use the full URL (https://....) instead of /docs/blah

* [DOC] Document Tempo Operator Monolithic mode (#3474)

* [DOC] Document Tempo Operator Monolithic mode

Signed-off-by: Andreas Gerstmayr <[email protected]>

* clarify supported storages

Signed-off-by: Andreas Gerstmayr <[email protected]>

* fix case of title

Signed-off-by: Andreas Gerstmayr <[email protected]>

* Apply suggestions from code review

* Apply suggestions from code review

---------

Signed-off-by: Andreas Gerstmayr <[email protected]>
Co-authored-by: Kim Nylander <[email protected]>

* [DOC] document Grafana data source setup using Grafana and Tempo operators (#3473)

* [docs] document Grafana data source setup using Grafana and Tempo operators

* move the Grafana data source setup page to the operator folder (this
  page is only relevant for the operator)
* document Grafana data source setup using Grafana and Tempo operators

Signed-off-by: Andreas Gerstmayr <[email protected]>

* Apply suggestions from code review

---------

Signed-off-by: Andreas Gerstmayr <[email protected]>
Co-authored-by: Kim Nylander <[email protected]>

* [DOC] fix typo in setup/operator/monolithic.md (#3496)

Signed-off-by: Andreas Gerstmayr <[email protected]>

* Bump github.com/prometheus/client_golang from 1.18.0 to 1.19.0 (#3455)

* Bump github.com/prometheus/client_golang from 1.18.0 to 1.19.0

Bumps [github.com/prometheus/client_golang](https://github.com/prometheus/client_golang) from 1.18.0 to 1.19.0.
- [Release notes](https://github.com/prometheus/client_golang/releases)
- [Changelog](https://github.com/prometheus/client_golang/blob/main/CHANGELOG.md)
- [Commits](prometheus/client_golang@v1.18.0...v1.19.0)

---
updated-dependencies:
- dependency-name: github.com/prometheus/client_golang
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

* Update serverless gomod

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: grafanabot <[email protected]>

* Add support for dashes, quotes and spaces in attribute names (#3458)

* Add support for dashes, quotes and spaces in attribute names

* chlog

* [TraceQL Metrics] Step align query_range time range (#3490)

* Step align query_range time range

* Time range error: improve message and fix format for prom format.

* oops remove printlns

* lint

* changelog

* 2.4.1 changelog (#3503)

Signed-off-by: Joe Elliott <[email protected]>

* [DOC] Add 2.4.1 release notes (#3504)

* fix tests due to interface changing

Signed-off-by: Joe Elliott <[email protected]>

* Pass context

Signed-off-by: Joe Elliott <[email protected]>

---------

Signed-off-by: Joe Elliott <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Andreas Gerstmayr <[email protected]>
Co-authored-by: Mario <[email protected]>
Co-authored-by: Kim Nylander <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Matt Simonsen <[email protected]>
Co-authored-by: Martin Disibio <[email protected]>
Co-authored-by: Ben Foster <[email protected]>
Co-authored-by: Zach Leslie <[email protected]>
Co-authored-by: Andreas Gerstmayr <[email protected]>
Co-authored-by: grafanabot <[email protected]>
@bpfoster bpfoster mentioned this pull request Apr 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Retention deletion does not work on S3 or GCS when a prefix is configured
4 participants