Reintroduce compression for binary doc_values #112416

Open · wants to merge 9 commits into main from compress-binary-dv
Conversation

@dnhatn (Member) commented Sep 1, 2024

This change reintroduces compression for binary doc_values from LUCENE-9211 for TSDB and logs indices.

I ran a quick test comparing LZ4 and zstd; zstd could save approximately 25% more storage:

------------------------------------
| uncompressed |   LZ4   |  zstd   |
------------------------------------
|   355.3 MB   | 27.4 MB | 21.7 MB |
------------------------------------

Should we consider using zstd instead of lz4 for compression here?

Relates #78266
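For context, here is a minimal, self-contained sketch of the LUCENE-9211 block-compression idea this PR brings back (illustrative names only, not this PR's actual code): several binary values are concatenated into one block, the whole block is LZ4-compressed, and a reader decompresses the block and slices individual values back out by offset.

```java
import org.apache.lucene.store.ByteArrayDataInput;
import org.apache.lucene.store.ByteBuffersDataOutput;
import org.apache.lucene.util.compress.LZ4;

import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class BinaryBlockSketch {
    public static void main(String[] args) throws IOException {
        byte[][] values = {
            "alpha".getBytes(StandardCharsets.UTF_8),
            "beta".getBytes(StandardCharsets.UTF_8),
            "gamma".getBytes(StandardCharsets.UTF_8)
        };

        // Write side: concatenate the values of one block and remember each value's offset.
        ByteBuffersDataOutput block = new ByteBuffersDataOutput();
        int[] offsets = new int[values.length + 1];
        for (int i = 0; i < values.length; i++) {
            block.writeBytes(values[i], 0, values[i].length);
            offsets[i + 1] = offsets[i] + values[i].length;
        }
        byte[] uncompressed = block.toArrayCopy();

        // Compress the whole block with LZ4 (the discussion below weighs zstd as an alternative).
        ByteBuffersDataOutput compressed = new ByteBuffersDataOutput();
        LZ4.compress(uncompressed, 0, uncompressed.length, compressed, new LZ4.FastCompressionHashTable());
        byte[] compressedBytes = compressed.toArrayCopy();

        // Read side: decompress the block into a reusable buffer, then slice out value #1 ("beta").
        // (A real block holds many values; a tiny demo block like this one won't actually shrink.)
        byte[] restored = new byte[uncompressed.length];
        LZ4.decompress(new ByteArrayDataInput(compressedBytes), uncompressed.length, restored, 0);
        String beta = new String(restored, offsets[1], offsets[2] - offsets[1], StandardCharsets.UTF_8);
        System.out.println(beta + " (block: " + uncompressed.length + " -> " + compressedBytes.length + " bytes)");
    }
}
```

The review below focuses on whether zstd should replace LZ4 in this scheme and on how large an uncompressed block is allowed to get.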

@dnhatn force-pushed the compress-binary-dv branch 2 times, most recently from 6d7ee62 to ccc63be on September 1, 2024 at 06:05
@dnhatn changed the title from Codec to Reintroduce compression for binary doc_values on Sep 1, 2024
@elasticsearchmachine (Collaborator)

Hi @dnhatn, I've created a changelog YAML for you.

@dnhatn marked this pull request as ready for review on September 1, 2024 at 07:46
@elasticsearchmachine (Collaborator)

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@dnhatn added the :StorageEngine/TSDB (You know, for Metrics) label and removed the :StorageEngine/Codec label on Sep 2, 2024
meta.writeByte(binaryDVCompressionMode.code);
switch (binaryDVCompressionMode) {
case NO_COMPRESS -> doAddUncompressedBinary(field, valuesProducer);
case COMPRESSED_WITH_LZ4 -> doAddCompressedBinary(field, valuesProducer);
Contributor

Does it make sense to compress with zstd?

Contributor

I see the numbers in the description. It seems like zstd offers a substantial improvement over lz4 as usual; I wonder how much risk that would bring here, though.

Member Author (@dnhatn)

Yes, I'm not sure why we're keeping zstd behind the feature flag. If we're okay with it, I can switch to zstd.

Member (@martijnvg, Sep 2, 2024)

zstd usage in stored fields is behind a feature flag, mainly because of get-by-id performance in the best-speed scenario. Hopefully we can remove the feature flag soon, after we have done a few more experiments with different settings for best-speed mode.

I think in the case of binary doc values we should use zstd instead of lz4?

Member Author (@dnhatn)

Thanks @martijnvg. I will switch this to zstd.

Contributor

It would be worth checking how it affects queries/aggs that need binary doc values, e.g. maybe the geoshape track?

this.addresses = addresses;
this.compressedData = compressedData;
// pre-allocate a byte array large enough for the biggest uncompressed block needed.
this.uncompressedBlock = new byte[biggestUncompressedBlockSize];
Contributor

We need to be careful here. I have seen (when this was introduced) that this array can get extremely big, and then we run into serious issues with humongous allocations. This is actually pretty dangerous.

Member

IIRC, as part of #105301 we tried to add compression to binary doc values, but the same concern was raised as here, and we went with the approach that doesn't do compression, just in order to allow tsdb codecs to be used for all doc value fields.

Contributor

Let's initialize the size to something like min(16kB, biggestUncompressedBlockSize) and dynamically resize on read? This will still help small values by never having to resize the array in practice?
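A minimal sketch of that suggestion (class and field names are illustrative, not the PR's code), using Lucene's ArrayUtil to grow the buffer lazily on read:

```java
import org.apache.lucene.util.ArrayUtil;

class UncompressedBlockBuffer {
    private static final int INITIAL_SIZE = 16 * 1024; // 16kB, per the suggestion above

    private byte[] uncompressedBlock;

    UncompressedBlockBuffer(int biggestUncompressedBlockSize) {
        // Start small: most blocks never need more than this.
        uncompressedBlock = new byte[Math.min(INITIAL_SIZE, biggestUncompressedBlockSize)];
    }

    /** Returns a buffer with at least {@code uncompressedBlockLength} bytes, growing lazily. */
    byte[] bufferFor(int uncompressedBlockLength) {
        if (uncompressedBlock.length < uncompressedBlockLength) {
            // ArrayUtil.grow over-allocates slightly to amortize repeated resizes.
            uncompressedBlock = ArrayUtil.grow(uncompressedBlock, uncompressedBlockLength);
        }
        return uncompressedBlock;
    }
}
```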

Contributor

This might result in OOMs, I guess.

Contributor (@iverase, Sep 11, 2024)

My point here is that we are always adding the same number of doc values per block, regardless of the size of the binary doc values, so a block can get pretty big. I think we should limit the block size so we can have a different number of doc values per block.
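A minimal sketch of such a size cap (thresholds and names are illustrative, not the PR's actual values): a block is flushed either when it reaches a fixed number of documents or as soon as its buffered bytes exceed a limit, so a few very large values cannot blow up the uncompressed block.

```java
import java.io.IOException;

import org.apache.lucene.store.ByteBuffersDataOutput;

class SizeCappedBlockWriter {
    private static final int MAX_DOCS_PER_BLOCK = 32;             // illustrative
    private static final int MAX_BLOCK_SIZE_IN_BYTES = 64 * 1024; // illustrative cap

    private final ByteBuffersDataOutput block = new ByteBuffersDataOutput();
    private int docsInBlock = 0;

    void addValue(byte[] value) throws IOException {
        block.writeBytes(value, 0, value.length);
        docsInBlock++;
        // Flush on whichever limit is hit first: doc count or accumulated bytes.
        if (docsInBlock >= MAX_DOCS_PER_BLOCK || block.size() >= MAX_BLOCK_SIZE_IN_BYTES) {
            flushBlock();
        }
    }

    private void flushBlock() {
        // Compress the buffered bytes (LZ4/zstd), write them plus the per-value offsets,
        // and record the block's address and first doc id (all omitted in this sketch).
        block.reset();
        docsInBlock = 0;
    }
}
```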

@iverase (Contributor) left a review:

In general I am against the change as it is.

We currently add a fixed number of docs per block regardless of the size of the binary doc values, which can lead to very big blocks. We need to make sure those blocks have a size limit, or this becomes very dangerous.

public ES87TSDBDocValuesFormat() {
this(BinaryDVCompressionMode.NO_COMPRESS);
Member (@martijnvg, Sep 2, 2024)

Should this be changed to COMPRESSED_WITH_LZ4? Otherwise compression doesn't get used outside tests?

Member Author (@dnhatn)

Good catch :)
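In code, the suggested fix would be a one-line change to the default constructor (sketch only; the constructor and enum constant are taken from the diff above):

```java
public ES87TSDBDocValuesFormat() {
    // Default to compressed binary doc values so the new code path is exercised outside tests.
    this(BinaryDVCompressionMode.COMPRESSED_WITH_LZ4);
}
```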

@dnhatn (Member Author) commented Sep 3, 2024

We currently add a fixed number of docs per block regardless of the size of the binary doc values, which can lead to very big blocks. We need to make sure those blocks have a size limit, or this becomes very dangerous.

Thanks, @iverase. I copied this from LUCENE-9211. That's a good point; I'll introduce a chunk size limit for it.

@iverase (Contributor) commented Sep 3, 2024

I'll introduce a chunk size limit for it.

Thanks @dnhatn, that will remove my concern here.

@jpountz (Contributor) left a review:

It seems to me that the NO_COMPRESS option is more about backward compatibility than about enabling users to disable compression on their binary doc values. If so, I wonder if we should fork a new format, e.g. ES816TSDBDocValuesFormat?


final IndexOutput tempBinaryOffsets;

CompressedBinaryBlockWriter() throws IOException {
tempBinaryOffsets = EndiannessReverserUtil.createTempOutput(
Contributor

We don't need to care about endianness here, do we?


@dnhatn (Member Author) commented Sep 3, 2024

If so, I wonder if we should fork a new format, e.g. ES816TSDBDocValuesFormat?

@jpountz I started by forking the format, but it was too much code, so I reverted and applied the diff to the current codec. I will update the PR to fork the codec :).

@elasticsearchmachine (Collaborator)

Hi @dnhatn, I've created a changelog YAML for you.

martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request on Sep 29, 2024:

The keyword doc values field gets an extra binary doc values field that encodes the order in which array values were specified at index time. This also captures duplicate values.

This is stored as an offset-to-ordinal array that gets vint-encoded into the binary doc values field (see the encoding sketch below). The additional storage required for this will likely be minimized by elastic#112416 (zstd compression for binary doc values).

For example, given the following string array for a keyword field: ["c", "b", "a", "c"].
The sorted set doc values are ["a", "b", "c"] with ordinals 0, 1 and 2. The offset array will be: [2, 1, 0, 2]

Limitations:
* only supported for the keyword field mapper.
* multi-level leaf arrays are flattened. For example: [[b], [c]] -> [b, c]
* empty arrays ([]) are not recorded
* arrays are always synthesized as one type. In the case of a keyword field, [1, 2] gets synthesized as ["1", "2"].

These limitations can be addressed, but some require more complexity and/or additional storage.
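As an illustration of the encoding described in that commit message, here is a minimal, self-contained sketch (not the referenced commit's actual code) that vint-encodes the offset-to-ordinal array for the ["c", "b", "a", "c"] example and decodes it back:

```java
import org.apache.lucene.store.ByteArrayDataInput;
import org.apache.lucene.store.ByteBuffersDataOutput;

import java.io.IOException;
import java.util.Arrays;

public class OffsetToOrdSketch {
    public static void main(String[] args) throws IOException {
        // For ["c", "b", "a", "c"] the sorted set holds ["a", "b", "c"] with ordinals 0, 1, 2,
        // so the per-document offset array is [2, 1, 0, 2].
        int[] offsetToOrd = {2, 1, 0, 2};

        // Encode: a leading count followed by one vint per array position.
        ByteBuffersDataOutput out = new ByteBuffersDataOutput();
        out.writeVInt(offsetToOrd.length);
        for (int ord : offsetToOrd) {
            out.writeVInt(ord);
        }
        byte[] encoded = out.toArrayCopy(); // this is what would live in the binary doc values field

        // Decode: read the count, then the ordinals in the original index-time order.
        ByteArrayDataInput in = new ByteArrayDataInput(encoded);
        int[] decoded = new int[in.readVInt()];
        for (int i = 0; i < decoded.length; i++) {
            decoded[i] = in.readVInt();
        }
        System.out.println(Arrays.toString(decoded) + " from " + encoded.length + " bytes");
    }
}
```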
martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request on Sep 30, 2024 (same commit message as above).