perf(blooms): Remove compression of .tar archived bloom blocks #14159
Split up the TarGz() function into a Tar() and a TarGz() function where the latter uses the former. Same change for the UnTarGz(). Signed-off-by: Christian Haudum <[email protected]>
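A minimal sketch of the split described in this commit message, assuming a simplified in-memory file set (`map[string][]byte`) instead of the block reader the real functions operate on. The function names mirror the commit message, except `UnTar`, which is assumed as the counterpart of `Tar`; the signatures and the `roundTrip` helper are hypothetical.

```go
package main

import (
	"archive/tar"
	"bytes"
	"compress/gzip"
	"io"
)

// Tar writes files (name -> content) to dst as an uncompressed tar archive.
func Tar(dst io.Writer, files map[string][]byte) error {
	tw := tar.NewWriter(dst)
	for name, data := range files {
		hdr := &tar.Header{
			Typeflag: tar.TypeReg,
			Name:     name,
			Mode:     0o644,
			Size:     int64(len(data)),
		}
		if err := tw.WriteHeader(hdr); err != nil {
			return err
		}
		if _, err := tw.Write(data); err != nil {
			return err
		}
	}
	return tw.Close()
}

// TarGz reuses Tar and only adds the gzip layer around dst.
func TarGz(dst io.Writer, files map[string][]byte) error {
	gz := gzip.NewWriter(dst)
	if err := Tar(gz, files); err != nil {
		_ = gz.Close()
		return err
	}
	return gz.Close()
}

// UnTar reads every entry of an uncompressed tar archive into memory.
func UnTar(src io.Reader) (map[string][]byte, error) {
	out := make(map[string][]byte)
	tr := tar.NewReader(src)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		data, err := io.ReadAll(tr)
		if err != nil {
			return nil, err
		}
		out[hdr.Name] = data
	}
}

// UnTarGz strips the gzip layer and delegates to UnTar.
func UnTarGz(src io.Reader) (map[string][]byte, error) {
	gz, err := gzip.NewReader(src)
	if err != nil {
		return nil, err
	}
	defer gz.Close()
	return UnTar(gz)
}

// roundTrip is a hypothetical helper that archives and extracts files in
// memory, with or without the gzip layer.
func roundTrip(files map[string][]byte, compress bool) (map[string][]byte, error) {
	var buf bytes.Buffer
	if compress {
		if err := TarGz(&buf, files); err != nil {
			return nil, err
		}
		return UnTarGz(&buf)
	}
	if err := Tar(&buf, files); err != nil {
		return nil, err
	}
	return UnTar(&buf)
}
```

With this layering, dropping compression amounts to calling `Tar`/`UnTar` directly, while the gzip variants stay available for reading older archives.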
Decompression is a CPU-intensive task, especially un-gzipping. The gain of compressing a tar archive of storage-optimized binary blocks is rather negligible. Signed-off-by: Christian Haudum <[email protected]>
LGTM, just optional nits
pkg/compression/encoding.go
@@ -58,7 +56,7 @@ func (e Encoding) String() string {
 	case EncZstd:
 		return "zstd"
 	default:
-		return "unknown"
+		panic(fmt.Sprintf("invalid encoding: %d, supported: %s", e, SupportedEncoding()))
nit: Panic on a call to String makes me a bit nervous since String can be implicitly called by different standard library functions. Normally in cases like this I would return fmt.Sprintf("Encoding(%d)", e) so you can see the value.
But if you think the risk is low, it's fine to keep it like this 👍
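For illustration, the non-panicking variant suggested above could look roughly like this. Only `EncZstd` and its `"zstd"` string appear in the diff; the other constants and the underlying `byte` type are assumptions made to keep the sketch self-contained.

```go
package main

import "fmt"

// Encoding is a stand-in for the compression encoding type from the diff.
type Encoding byte

// Only EncZstd is taken from the diff; the other values are hypothetical.
const (
	EncNone Encoding = iota
	EncGZIP
	EncZstd
)

// String never panics: unknown values are rendered with their numeric
// value, so implicit calls (for example via fmt or a logger) stay safe.
func (e Encoding) String() string {
	switch e {
	case EncNone:
		return "none"
	case EncGZIP:
		return "gzip"
	case EncZstd:
		return "zstd"
	default:
		// %d formats the underlying integer and does not call String again.
		return fmt.Sprintf("Encoding(%d)", e)
	}
}
```

Validation of unsupported values can then live in an explicit parse or constructor function that returns an error, rather than in `String`.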
I reverted my change
EndTimestamp: md.Series.ThroughTs,
Checksum: md.Checksum,
},
func newRefFrom(tenant, table string, md v1.BlockMetadata) Ref { |
nit: newRefFrom is only called once, and it's to immediately pass the ref into newBlockRefWithEncoding; should the functions be combined?
I did the separation of concerns and composition of the functions intentionally. No strong opinions, though.
Yeah, I don't feel strongly about it either. Let's keep what you did, IMO.
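The composition discussed in this thread can be sketched as follows. The types are simplified stand-ins, not the real `v1.BlockMetadata`, `Ref`, or `BlockRef`; the field set and the string-typed encoding are assumptions chosen for illustration.

```go
package main

// BlockMetadata is a simplified, hypothetical stand-in for v1.BlockMetadata.
type BlockMetadata struct {
	FromTs, ThroughTs int64
	Checksum          uint32
}

// Ref carries the identity of a block (hypothetical field set).
type Ref struct {
	TenantID, TableName          string
	StartTimestamp, EndTimestamp int64
	Checksum                     uint32
}

// BlockRef adds the archive encoding on top of the plain Ref.
type BlockRef struct {
	Ref
	Encoding string
}

// newRefFrom only maps metadata to a Ref.
func newRefFrom(tenant, table string, md BlockMetadata) Ref {
	return Ref{
		TenantID:       tenant,
		TableName:      table,
		StartTimestamp: md.FromTs,
		EndTimestamp:   md.ThroughTs,
		Checksum:       md.Checksum,
	}
}

// newBlockRefWithEncoding only attaches the encoding to an existing Ref.
func newBlockRefWithEncoding(ref Ref, enc string) BlockRef {
	return BlockRef{Ref: ref, Encoding: enc}
}
```

A caller composes the two, for example `newBlockRefWithEncoding(newRefFrom(tenant, table, md), "none")`, which keeps each function responsible for a single concern.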
What this PR does / why we need it:
Decompression is a CPU-intensive task, especially un-gzipping. The gain of compressing a tar archive of storage-optimized binary blocks is negligible (open question: is it?).
In this example, the block of ~170MiB is ~3.3MiB bigger when not compressed, which is an increase of ~2%.
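To check the ratio on other data, a small helper along these lines could be used. It is a sketch, not part of this PR: the function names are hypothetical, and the pseudo-random input is only an assumed approximation of dense, storage-optimized binary data.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"math/rand"
)

// gzipSavingsPercent reports how much smaller data becomes after gzip,
// in percent of the original size. Negative values mean gzip made it larger.
func gzipSavingsPercent(data []byte) (float64, error) {
	if len(data) == 0 {
		return 0, nil
	}
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return 0, err
	}
	if err := zw.Close(); err != nil {
		return 0, err
	}
	return 100 * (1 - float64(buf.Len())/float64(len(data))), nil
}

// pseudoRandomBytes returns n deterministic pseudo-random bytes. This is an
// assumed stand-in for high-entropy block contents.
func pseudoRandomBytes(n int, seed int64) []byte {
	r := rand.New(rand.NewSource(seed))
	b := make([]byte, n)
	for i := range b {
		b[i] = byte(r.Intn(256))
	}
	return b
}
```

For the numbers quoted above, 3.3 MiB out of roughly 170 MiB works out to about 1.9%. High-entropy input such as `pseudoRandomBytes(1<<20, 1)` yields savings near zero, whereas a buffer of zeros compresses almost completely.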
Breaking change
This is less of an issue because there has not been a release of the new structured metadata blooms yet. Anyone using a Loki version from main after commit a2fbaa8 is affected.
Special notes for your reviewer:
CPU profile from a time period where blocks have been downloaded and extracted.
Further discussion:
Adding the correct file type extension as suffix to the key in object storage makes any change to compression a breaking change, unless the GetBlock() call tries multiple different keys with different suffixes. That could be a rather hacky option to keep backwards compatibility, but it also introduces more complexity in various areas whenever the Addr() of a BlockRef needs to be resolved.
Another option would be to additionally store the compression algorithm in the BlockRef struct.
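The multi-key lookup mentioned as the "hacky" option could be sketched like this. The `ObjectStore` interface, the in-memory store, and the suffix list are all hypothetical; the real `GetBlock()` has a different signature.

```go
package main

import "errors"

// ErrNotFound is returned by the hypothetical store for missing keys.
var ErrNotFound = errors.New("object not found")

// ObjectStore is a hypothetical minimal object storage client.
type ObjectStore interface {
	Get(key string) ([]byte, error)
}

// memStore is an in-memory ObjectStore used only for illustration.
type memStore map[string][]byte

func (m memStore) Get(key string) ([]byte, error) {
	if b, ok := m[key]; ok {
		return b, nil
	}
	return nil, ErrNotFound
}

// candidateSuffixes lists the key suffixes to probe, newest format first.
var candidateSuffixes = []string{".tar", ".tar.gz"}

// GetBlock probes each suffix in turn and returns the object together with
// the suffix that matched, so the caller knows how to decode it.
func GetBlock(store ObjectStore, baseKey string) ([]byte, string, error) {
	for _, suffix := range candidateSuffixes {
		data, err := store.Get(baseKey + suffix)
		if err == nil {
			return data, suffix, nil
		}
		if !errors.Is(err, ErrNotFound) {
			return nil, "", err // real failure, do not mask it
		}
	}
	return nil, "", ErrNotFound
}
```

Each miss costs an extra round trip to object storage, which is part of why this approach adds complexity compared to recording the encoding alongside the reference.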
Update
After some consideration, we decided to store the encoding of the bloom block in the BlockRef. This means that the changes in this PR do not break compatibility with existing blocks compressed with gzip, although new blocks will no longer be compressed.
However, the PR adds support for different compression algorithms, such as gzip, snappy, lz4, flate, and zstd. Compression is not configurable yet.
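As a rough illustration of the approach in the update, the encoding stored on the reference can determine the object key suffix, and the suffix of an existing key can be mapped back to an encoding. The struct fields, key layout, encoding names, and extensions below are assumptions for the sketch, not the actual Loki definitions, and only a subset of the listed algorithms is shown.

```go
package main

import (
	"fmt"
	"strings"
)

// extensions maps a hypothetical encoding name to the suffix appended after
// ".tar". Only a subset of the algorithms named in the PR is listed.
var extensions = map[string]string{
	"none": "",
	"gzip": ".gz",
	"zstd": ".zst",
}

// BlockRef is a simplified stand-in that records the archive encoding.
type BlockRef struct {
	Tenant, Table, Name string
	Encoding            string
}

// Addr builds the object key. The key layout is hypothetical.
func (r BlockRef) Addr() (string, error) {
	ext, ok := extensions[r.Encoding]
	if !ok {
		return "", fmt.Errorf("unsupported encoding %q", r.Encoding)
	}
	return fmt.Sprintf("bloom/%s/%s/blocks/%s.tar%s", r.Table, r.Tenant, r.Name, ext), nil
}

// EncodingFromKey derives the encoding from an existing key, so blocks
// written with gzip before this change can still be resolved.
func EncodingFromKey(key string) string {
	for enc, ext := range extensions {
		if ext != "" && strings.HasSuffix(key, ".tar"+ext) {
			return enc
		}
	}
	return "none"
}
```

Because the encoding travels with the reference, resolving the address needs a single key rather than a probe over several suffixes.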
Checklist
CONTRIBUTING.md guide (required)
feat PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
docs/sources/setup/upgrade/_index.md
production/helm/loki/Chart.yaml and update production/helm/loki/CHANGELOG.md and production/helm/loki/README.md. Example PR
deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR