Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(blooms): Remove compression of .tar archived bloom blocks #14159

Merged
merged 8 commits into from
Sep 20, 2024

Conversation

chaudum
Copy link
Contributor

@chaudum chaudum commented Sep 18, 2024

What this PR does / why we need it:

Decompression is a CPU intensive task, especially un-gzipping. The gain of compressing a tar archive of storage optimized binary blocks is neglectable (question: is it?).

In this example, the block of ~170MiB is ~3.3MiB bigger when not compressed, which is a ratio of ~2%

$ ls -las 2bc017f79711e12a-2bffc5dcc0e8e964_1726004114913-1726106283939-bc42f529.tar   
177048 -rw-rw-r-- 1 christian christian 181293056 Sep 18 13:49 2bc017f79711e12a-2bffc5dcc0e8e964_1726004114913-1726106283939-bc42f529.tar

$ ls -las 2bc017f79711e12a-2bffc5dcc0e8e964_1726004114913-1726106283939-bc42f529.tar.gz
173732 -rw-rw-r-- 1 christian christian 177897689 Sep 18 13:49 2bc017f79711e12a-2bffc5dcc0e8e964_1726004114913-1726106283939-bc42f529.tar.gz

$ qalc '(181293056 - 177897689) / 1024/ 1024'
((181293056177897689) / 1024) / 10243.238074303

$ qalc '181293056 / 177897689'
181293056 / 1778976891.019086066

Breaking change

⚠️ This PR introduces a breaking change of yet unreleased code, because the key (suffix) of blocks on object storage change.

This is less of an issue, because there has not been a release of the new structured metatada blooms yet. Anyone using a Loki version from main after commit a2fbaa8 is affected.

Special notes for your reviewer:

CPU profile from a time period where blocks have been downloaded and extracted.

screenshot_20240918_134030

Further discussion:

Adding the correct file type extension as suffix to the key in object storage makes any change to compression a breaking change, unless the GetBlock() call tries multiple different keys with different suffixes. That could be a rather hacky option to keep backwards compatibility, but it also introduces more complexity in various areas whenever the Addr() of a BlockRef needs to be resolved.

Another option would be to additionally store the compression algorithm into the BlockRef struct.

Update

After some consideration, we decided to store the encoding of the bloom block in the BlockRef. This means, that the changes in this PR do not break compatibility with existing blocks compressed with gzip, although new blocks will not be compressed any more.
However, the PR adds support for different compression algorithms, such as gzip, snappy, lz4, flate, and zstd. Compression is not configurable yet.

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • Title matches the required conventional commits format, see here
    • Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such, feat PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • For Helm chart changes bump the Helm chart version in production/helm/loki/Chart.yaml and update production/helm/loki/CHANGELOG.md and production/helm/loki/README.md. Example PR
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

@chaudum chaudum requested a review from a team as a code owner September 18, 2024 11:56
@chaudum chaudum force-pushed the chaudum/remove-bloom-block-compression branch from e860e37 to af91bae Compare September 18, 2024 12:11
@chaudum chaudum marked this pull request as draft September 18, 2024 13:30
@chaudum chaudum force-pushed the chaudum/remove-bloom-block-compression branch 2 times, most recently from cffee48 to ddc2c71 Compare September 18, 2024 18:39
@chaudum chaudum force-pushed the chaudum/remove-bloom-block-compression branch from ddc2c71 to bcbe6c7 Compare September 18, 2024 18:52
@pull-request-size pull-request-size bot added size/L and removed size/M labels Sep 18, 2024
@chaudum chaudum force-pushed the chaudum/remove-bloom-block-compression branch from bcbe6c7 to e5cad45 Compare September 19, 2024 18:37
@chaudum chaudum changed the title perf(blooms)!: Remove compression of .tar archived bloom blocks perf(blooms): Remove compression of .tar archived bloom blocks Sep 19, 2024
@chaudum chaudum marked this pull request as ready for review September 19, 2024 18:40
@chaudum chaudum marked this pull request as draft September 19, 2024 18:47
Split up the TarGz() function into a Tar() and a TarGz() function where
the latter uses the former.

Same change for the UnTarGz().

Signed-off-by: Christian Haudum <[email protected]>
Decompression is a CPU intensive task, especially un-gzipping. The gain
of compressing a tar archive of storage optimized binary blocks is
rather neglectable.

Signed-off-by: Christian Haudum <[email protected]>
Signed-off-by: Christian Haudum <[email protected]>
Signed-off-by: Christian Haudum <[email protected]>
@chaudum chaudum force-pushed the chaudum/remove-bloom-block-compression branch from 88091b8 to b591293 Compare September 20, 2024 09:40
Signed-off-by: Christian Haudum <[email protected]>
@chaudum chaudum marked this pull request as ready for review September 20, 2024 11:42
Copy link
Member

@rfratto rfratto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just optional nits

@@ -58,7 +56,7 @@ func (e Encoding) String() string {
case EncZstd:
return "zstd"
default:
return "unknown"
panic(fmt.Sprintf("invalid encoding: %d, supported: %s", e, SupportedEncoding()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Panic on a call to String makes me a bit nervous since String can be implicitly called by different standard library functions. Normally in cases like this I would return fmt.Sprintf("Encoding(%d)", e) so you can see the value.

But if you think the risk is low, it's fine to keep it like this 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reverted my change

EndTimestamp: md.Series.ThroughTs,
Checksum: md.Checksum,
},
func newRefFrom(tenant, table string, md v1.BlockMetadata) Ref {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: newRefFrom is only called once, and it's to immediately pass the ref into newBlockRefWithEncoding; should the functions be combined?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did the separation of concerns and composition of the functions intentionally. No strong opinions, though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I don't feel strongly about it either, Let's keep what you did IMO

Signed-off-by: Christian Haudum <[email protected]>
@chaudum chaudum merged commit cdf084f into main Sep 20, 2024
62 checks passed
@chaudum chaudum deleted the chaudum/remove-bloom-block-compression branch September 20, 2024 13:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants