Specify compression per column instead of globally #1594

ozgrakkurt · 2023-11-23T03:14:06Z

Maybe with an API similar to how we pass encodings into RowGroupIterator.

This would allow different compression configurations for different columns. It would be very useful when a table has a sizeable column of random binary data, such as hashes, which compresses poorly. Likewise, for columns that already use RLE or dictionary encoding, there may be little point in compressing and decompressing them.
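A minimal sketch of what such a per-column option could look like, assuming compression settings are supplied one per column in schema order, the same way encodings are passed to RowGroupIterator. The `Compression` enum and the `resolve_compressions` helper are hypothetical illustrations, not arrow2's actual types or API: columns without an explicit override fall back to a global default, so the existing single-codec behaviour stays the common case.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for a codec setting; NOT arrow2's real compression type.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Compression {
    Uncompressed,
    Snappy,
    Zstd,
}

/// Hypothetical helper: builds one compression setting per column, in schema
/// order, mirroring how encodings are supplied one entry per field.
/// Columns with no explicit override fall back to `default`.
fn resolve_compressions(
    column_names: &[&str],
    default: Compression,
    overrides: &HashMap<&str, Compression>,
) -> Vec<Compression> {
    column_names
        .iter()
        .map(|name| overrides.get(name).copied().unwrap_or(default))
        .collect()
}

fn main() {
    // Illustrative column names only.
    let columns = ["block_number", "tx_hash", "status"];

    // Hashes are effectively random bytes, so compressing them costs CPU
    // for little size reduction: disable compression for that column only.
    let mut overrides = HashMap::new();
    overrides.insert("tx_hash", Compression::Uncompressed);

    let per_column = resolve_compressions(&columns, Compression::Zstd, &overrides);
    for (name, codec) in columns.iter().zip(&per_column) {
        println!("{name}: {codec:?}");
    }
}
```

Keeping a global default plus sparse overrides means callers that do not care about per-column tuning would not need to change anything.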

This would give a significant performance boost for my use case: when I look at timings for querying Parquet, between 1/4 and 1/2 of the time is spent decompressing.

I would like to work on this if I can get some guidance on how the public API should be modified.
