[FEA][JNI] Consider defaulting parquet dictionary encoding policy to ALWAYS #15580

abellina · 2024-04-22T21:17:23Z

PR #15570 sets the cuDF dictionary encoding policy for Parquet to ADAPTIVE.

This works around an issue in nvcomp ZSTD (#15501): pages that are too large are being created, which causes ZSTD to skip compressing the larger dictionary pages.

For now, we can pick ALWAYS to retain the current behavior for Spark. We should evaluate the impact of this setting on the different compression codecs, especially ZSTD, for both reading and writing.
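To make the trade-off concrete, here is a minimal, hypothetical Java sketch of how the three policies could decide whether a column chunk gets a dictionary. The policy names NEVER, ADAPTIVE and ALWAYS mirror libcudf's `cudf::io::dictionary_policy`; everything else (`useDictionary`, `codecMaxPageBytes`, and the 1 MiB limit) is an assumption for illustration and is not cuDF's actual decision logic.

```java
// Hypothetical, simplified model of the cuDF Parquet dictionary policies.
// Not the real cuDF implementation; the size check is an assumption.
public class DictionaryPolicySketch {

    // Names mirror cudf::io::dictionary_policy in libcudf.
    enum DictionaryPolicy { NEVER, ADAPTIVE, ALWAYS }

    /**
     * Decide whether a column chunk should be dictionary encoded.
     *
     * @param policy            the configured policy
     * @param dictPageBytes     estimated size of the dictionary page
     * @param codecMaxPageBytes largest page the compressor will handle (hypothetical)
     */
    static boolean useDictionary(DictionaryPolicy policy, long dictPageBytes, long codecMaxPageBytes) {
        switch (policy) {
            case NEVER:
                return false;
            case ALWAYS:
                // Keep the dictionary even if its page is too large to be compressed.
                return true;
            case ADAPTIVE:
            default:
                // Drop the dictionary when its page would exceed the codec's limit,
                // so the data pages can still be compressed.
                return dictPageBytes <= codecMaxPageBytes;
        }
    }

    public static void main(String[] args) {
        long limit = 1L << 20;   // hypothetical 1 MiB codec page limit
        long bigDict = 4L << 20; // a 4 MiB dictionary page
        System.out.println("ALWAYS:   " + useDictionary(DictionaryPolicy.ALWAYS, bigDict, limit));
        System.out.println("ADAPTIVE: " + useDictionary(DictionaryPolicy.ADAPTIVE, bigDict, limit));
    }
}
```

Under these assumptions, ALWAYS keeps dictionary encoding for large dictionaries (the behavior Spark relies on today), while ADAPTIVE gives up the dictionary so the codec can compress the pages.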

abellina added the feature request label Apr 22, 2024
