Reintroduce compression for binary doc_values #112416
Conversation
Hi @dnhatn, I've created a changelog YAML for you.
Pinging @elastic/es-storage-engine (Team:StorageEngine)
meta.writeByte(binaryDVCompressionMode.code);
switch (binaryDVCompressionMode) {
    case NO_COMPRESS -> doAddUncompressedBinary(field, valuesProducer);
    case COMPRESSED_WITH_LZ4 -> doAddCompressedBinary(field, valuesProducer);
Does it make sense to compress with zstd?
I see the numbers in the description. It seems like zstd offers a substantial improvement over lz4, as usual; I wonder how much risk that would bring here, though.
Yes, I'm not sure why we're keeping zstd behind the feature flag. If we're okay with it, I can switch to zstd.
zstd usage in stored fields is behind a feature flag, and more specifically for get-by-id performance in the best-speed scenario. Hopefully we can remove the feature flag soon, after we have done a few more experiments with different settings for best-speed mode.
I think in the case of binary doc values we should use zstd instead of lz4?
Thanks @martijnvg. I will switch this to zstd.
It would be worth checking how it affects queries/aggs that need binary doc values, e.g. maybe the geoshape track?
this.addresses = addresses;
this.compressedData = compressedData;
// pre-allocate a byte array large enough for the biggest uncompressed block needed.
this.uncompressedBlock = new byte[biggestUncompressedBlockSize];
We need to be careful here. I saw (when this was introduced) that this array could get unwieldy in size, and then we can have big issues with humongous allocations. This is actually pretty dangerous.
iirc as part of #105301 we tried to add compression to binary doc values, but then the same concern was raised as this one and we went with the approach that didn't do compression. Just in order to allow tsdb codecs to be used for all doc value fields.
Let's initialize the size to something like min(16kB, biggestUncompressedBlockSize) and dynamically resize on read? This would still help small values, which would never have to resize the array in practice.
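The suggestion above could be sketched as follows. This is a hypothetical illustration, not code from the PR; the class name `BlockBuffer` and the 1.5x growth factor are assumptions (Lucene's `ArrayUtil.grow` uses a similar oversizing strategy):

```java
// Sketch: start the buffer at min(16 kB, biggestUncompressedBlockSize)
// and grow it lazily on read instead of pre-allocating the largest block.
final class BlockBuffer {
    private static final int INITIAL_SIZE = 16 * 1024; // 16 kB cap on up-front allocation

    private byte[] uncompressedBlock;

    BlockBuffer(int biggestUncompressedBlockSize) {
        // never allocate more than 16 kB eagerly
        this.uncompressedBlock = new byte[Math.min(INITIAL_SIZE, biggestUncompressedBlockSize)];
    }

    /** Returns a buffer of at least the requested length, growing lazily. */
    byte[] ensureCapacity(int uncompressedLength) {
        if (uncompressedBlock.length < uncompressedLength) {
            // oversize by ~1.5x so a run of slightly larger blocks does not
            // trigger a fresh allocation on every read
            int grown = uncompressedBlock.length + (uncompressedBlock.length >> 1);
            uncompressedBlock = new byte[Math.max(uncompressedLength, grown)];
        }
        return uncompressedBlock;
    }

    int capacity() {
        return uncompressedBlock.length;
    }
}
```

With this scheme, indices whose binary values are small never pay for the largest block up front, while a rare oversized block only costs one reallocation.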
This might result in OOMs I guess....
My point here is that we always add the same number of doc values per block, regardless of the size of the binary doc values, so a block can get pretty big. I think we should limit the block size, which means allowing a varying number of doc values per block.
In general I am against the change as it is.
We currently add a fixed number of docs per block regardless of the size of the binary doc values, which can lead to very big blocks. We need to make sure those blocks have a size limit, or this becomes very dangerous.
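The size-limit idea raised here could look roughly like the sketch below: flush a block when either the doc-count limit or a byte-size cap is reached, so a few huge binary values cannot produce a humongous block. This is an assumption-laden illustration, not the PR's code; the class name `BlockLimiter` and both thresholds are hypothetical (the real writer would compress and write the buffered bytes in `flush`):

```java
// Sketch: cap blocks by BOTH doc count and byte size, so the number of
// doc values per block varies instead of the block size.
final class BlockLimiter {
    static final int MAX_DOCS_PER_BLOCK = 32;      // assumed doc-count limit
    static final int MAX_BLOCK_BYTES = 64 * 1024;  // assumed size cap

    private int docsInBlock = 0;
    private int bytesInBlock = 0;
    private int blocksFlushed = 0;

    /** Account for one value; returns true when this value closed the block. */
    boolean addValue(int valueLength) {
        docsInBlock++;
        bytesInBlock += valueLength;
        if (docsInBlock >= MAX_DOCS_PER_BLOCK || bytesInBlock >= MAX_BLOCK_BYTES) {
            flush();
            return true;
        }
        return false;
    }

    private void flush() {
        // a real writer would LZ4/zstd-compress the buffered bytes here
        blocksFlushed++;
        docsInBlock = 0;
        bytesInBlock = 0;
    }

    int blocksFlushed() {
        return blocksFlushed;
    }
}
```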
public ES87TSDBDocValuesFormat() {
    this(BinaryDVCompressionMode.NO_COMPRESS);
Should this be changed to COMPRESSED_WITH_LZ4? Otherwise compression doesn't get used outside tests?
Good catch :)
Thanks, @iverase. I copied this from LUCENE-9211. That's a good point; I'll introduce a chunk size limit for it.
Thanks @dnhatn, that will remove my concern here.
It seems to me that the NO_COMPRESS option is more about backward compatibility than about enabling users to disable compression on their binary doc values. If so, I wonder if we should fork a new format, e.g. ES816TSDBDocValuesFormat?
final IndexOutput tempBinaryOffsets;

CompressedBinaryBlockWriter() throws IOException {
    tempBinaryOffsets = EndiannessReverserUtil.createTempOutput(
We don't need to care about endianness here, do we?
@jpountz I started with forking the format, but it was too much code, so I reverted and applied the diff to the current codec. I will update the PR with a forked codec :).
The keyword doc values field gets an extra binary doc values field that encodes the order in which array values were specified at index time. This also captures duplicate values. The order is stored as an offset-to-ordinal array that gets vint-encoded into the binary doc values field. The additional storage required for this will likely be minimized with elastic#112416 (zstd compression for binary doc values).

For example, given the string array ["c", "b", "a", "c"] for a keyword field: the sorted set doc values are ["a", "b", "c"] with ordinals 0, 1, and 2, so the offset array will be [2, 1, 0, 2].

Limitations:
* Only supported for the keyword field mapper.
* Multi-level leaf arrays are flattened. For example: [[b], [c]] -> [b, c].
* Empty arrays ([]) are not recorded.
* Arrays are always synthesized as one type. In the case of a keyword field, [1, 2] gets synthesized as ["1", "2"].

These limitations can be addressed, but some require more complexity and/or additional storage.
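The vint encoding of the offset array described above can be illustrated with a small sketch. This is hypothetical demo code, not from the linked change; the length prefix and the `OffsetArrayDemo` class are assumptions, though the vint format itself (7 payload bits per byte, high bit as continuation marker) matches Lucene's `DataOutput.writeVInt`:

```java
import java.io.ByteArrayOutputStream;

// Sketch: vint-encode the offset-to-ordinal array for ["c", "b", "a", "c"],
// whose sorted-set ordinals are a=0, b=1, c=2, giving offsets [2, 1, 0, 2].
final class OffsetArrayDemo {
    // Lucene-style vint: low 7 bits per byte, high bit set means "more bytes follow".
    static void writeVInt(ByteArrayOutputStream out, int i) {
        while ((i & ~0x7F) != 0) {
            out.write((i & 0x7F) | 0x80);
            i >>>= 7;
        }
        out.write(i);
    }

    // Assumed layout: a length prefix followed by one vint per ordinal.
    static byte[] encode(int[] ordinals) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVInt(out, ordinals.length);
        for (int ord : ordinals) {
            writeVInt(out, ord);
        }
        return out.toByteArray();
    }
}
```

Since every ordinal here fits in 7 bits, `encode(new int[] {2, 1, 0, 2})` produces five single-byte vints: the length 4 followed by 2, 1, 0, 2.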
This change reintroduces the compression for binary doc_values from LUCENE-9211 for TSDB and logs indices.
I ran a quick test comparing lz4 and zstd, and zstd could save approximately 25% more storage.
Should we consider using zstd instead of lz4 for compression here?
Relates #78266