Adds support for ZSTD encoding in schema-ddl #262

miike · 2017-08-24T23:30:53Z

This PR adds support for ZSTD encoding (#237)

The first commit changes the default encoding for VARCHAR (and TIMESTAMP) columns to prefer ZSTD over LZO. It doesn't yet override the suggested encodings for other data types, though this would likely be useful. I'm not sure whether we'd like to test this before going down that route (e.g., how well ZSTD performs on INTs that are uniformly distributed, normally distributed, etc.).
The second commit changes the encoding of the self and parent columns from RUNLENGTH to ZSTD. For columns that mostly contain a single value (such as schema_vendor) this makes little difference to space on disk, but for other columns, such as schema_version, it can make a significant difference. In a small sample of ~10 million rows, if two schemas are in use simultaneously and we assume that the schema used is independent and identically distributed (over time), this column takes approximately 2/3 of the size on disk with ZSTD compared to RUNLENGTH.
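
To illustrate why RUNLENGTH suits a near-constant column but not a mixed one, here is a minimal Python sketch that counts runs of consecutive equal values. It does not measure ZSTD or Redshift's actual on-disk size; the function names, the sample values (`com.acme`, `1-0-0`, `1-0-1`) and the row count are all hypothetical, and the only assumption carried over from above is that two versions arrive independently with equal probability.

```python
import random
from itertools import groupby


def count_runs(values):
    """Count maximal runs of consecutive equal values.

    Run-length encoding stores one (value, count) token per run,
    so fewer runs means a smaller encoded column.
    """
    return sum(1 for _ in groupby(values))


def simulate_column(n_rows, choices, seed=0):
    """Draw n_rows values independently and uniformly from choices (i.i.d.)."""
    rng = random.Random(seed)
    return [rng.choice(choices) for _ in range(n_rows)]


n_rows = 100_000  # hypothetical sample size, far smaller than the ~10M above

# A column holding a single value, like schema_vendor: one run in total.
constant = ["com.acme"] * n_rows

# A column mixing two versions i.i.d., like schema_version in the scenario above.
mixed = simulate_column(n_rows, ["1-0-0", "1-0-1"])

print("runs in constant column:", count_runs(constant))
print("runs per row in mixed column:", count_runs(mixed) / n_rows)
```

With two equally likely values, a new run starts on about every second row, so run-length encoding has to store roughly one token per two rows rather than one token for the whole column.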

alexanderdean · 2017-08-25T11:07:53Z

Thanks @miike! To @chuwy for review...

snowplowcla · 2017-08-25T11:17:26Z

@miike has signed the Software Grant and Corporate Contributor License Agreement

chuwy

Looks great!

miike · 2017-09-06T11:15:08Z

Thanks @chuwy!

oguzhanunlu · 2018-01-25T20:12:48Z

Cherry-picked to #309, closing.

miike added 2 commits August 25, 2017 08:55

Modify default varchar encoding from LZO to ZSTD

c8c67ac

Modify self and parent columns to use ZSTD

4c6a4ad

BenFradet requested a review from chuwy August 25, 2017 09:11

alexanderdean requested review from chuwy and removed request for chuwy August 25, 2017 11:07

snowplowcla added the cla:yes label Aug 25, 2017

chuwy approved these changes Sep 6, 2017


chuwy added this to the Release 8 Stamp TBC milestone Dec 15, 2017

oguzhanunlu removed this from the Release 8 Basel Dove milestone Dec 18, 2017

oguzhanunlu closed this Jan 25, 2018
