feat: implement ALP-RD compression #947

Merged
merged 14 commits into from
Oct 2, 2024
a10y
Contributor

@a10y a10y commented Sep 29, 2024

Fixes #10: Add ALP-RD compression.

Currently our only floating point compression algorithm is standard ALP, which targets floats/doubles that are originally decimal, and thus have some natural integer they can round to when you undo the exponent.

For science/math datasets, there are a lot of "real doubles", i.e. floating point numbers that use most/all of their available precision. These do not compress with standard ALP. The ALP paper authors had a solution for this called "ALP for 'Real' Doubles" / ALP-RD, which is implemented in this PR.

Basics

The key insight of ALP-RD is that even for dense floating point numbers, values within a column often share their front bits (exponent + first few bits of mantissa). We try to find the best cut point within the leftmost 16 bits.
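
To make the cut point concrete, here is a minimal sketch of splitting an f64 into left and right parts and joining them back. The names `split_f64` and `join_f64` are hypothetical and are not taken from this PR; the only assumption is that the cut stays within the leftmost 16 bits, so the left part fits in a u16.

```rust
/// Split the bit pattern of `value` at `right_bit_width`: the low
/// `right_bit_width` bits are the right part, the remaining high bits
/// are the left part. Assumes 48 <= right_bit_width < 64 so that the
/// left part fits in a u16 (hypothetical helper, not the PR's API).
fn split_f64(value: f64, right_bit_width: u32) -> (u16, u64) {
    assert!((48..64).contains(&right_bit_width));
    let bits = value.to_bits();
    let left = (bits >> right_bit_width) as u16;
    let right = bits & ((1u64 << right_bit_width) - 1);
    (left, right)
}

/// Inverse of `split_f64`: reassemble the original bit pattern.
fn join_f64(left: u16, right: u64, right_bit_width: u32) -> f64 {
    f64::from_bits(((left as u64) << right_bit_width) | right)
}
```

With a cut at 52 bits the left part is just the sign and exponent, so every value in [0.5, 1.0) shares the left part 1022.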

There are generally a small number of unique values for the leftmost bits, so you can create a dictionary of fixed size (here we use the choice of 8 from the C++ implementation), whose codes naturally bit-pack down to 3 bits. If you compress perfectly without exceptions, you can store 49 bits/value, i.e. ~23% compression. In practice the amount varies. In the comments below you can see a test with the POI dataset referenced in the ALP paper, where we replicate their results of 55 and 56 bits/value for latitude and longitude respectively.
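
A rough sketch of the dictionary step, assuming the dictionary is simply the most frequent left parts (the PR text does not spell out the selection rule, so treat that as an assumption). `build_dict` and `code_bit_width` are hypothetical names used only for illustration.

```rust
use std::collections::HashMap;

/// Pick up to `max_dict_size` of the most frequent left parts.
/// Ties are broken by value so the result is deterministic.
fn build_dict(left_parts: &[u16], max_dict_size: usize) -> Vec<u16> {
    let mut counts: HashMap<u16, usize> = HashMap::new();
    for &left in left_parts {
        *counts.entry(left).or_insert(0) += 1;
    }
    let mut by_count: Vec<(u16, usize)> = counts.into_iter().collect();
    by_count.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
    by_count
        .into_iter()
        .take(max_dict_size)
        .map(|(value, _)| value)
        .collect()
}

/// Number of bits needed for a code into a dictionary of `dict_len`
/// entries, i.e. ceil(log2(dict_len)), with 0 bits for 0 or 1 entries.
fn code_bit_width(dict_len: usize) -> u32 {
    if dict_len <= 1 {
        0
    } else {
        usize::BITS - (dict_len - 1).leading_zeros()
    }
}
```

A dictionary of 8 entries gives 3-bit codes, which matches the bit_width of 3 reported for left_parts in the output further down.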

List of changes

  • Reorganized the vortex-alp crate. I created two top-level modules, alp and alp_rd, and moved the previous implementation into the alp module
  • Added new ALPRDArray in the alp_rd module. It supports both f32 and f64, and all major compute functions are implemented, save for MaybeCompareFn and the Accessors. I will file an issue to implement these in a follow-up if that's alright, since this PR is already quite large
  • Added corresponding ALPRDCompressor and wired the CompressorRef everywhere I could find ALPCompressor
  • New benchmark for RD compression in the existing ALP benchmarks suite

@a10y
Contributor Author

a10y commented Sep 30, 2024

Some Q's:

  1. For ALP-RD, we store a small dictionary (<= 16 bytes) for the left-parts. Should that be stored as a child or as metadata on the array?
  2. ALP-RD from the paper prescribes specific compression algorithms for each of its subcomponents (fused dict+FL for left-parts, FL bit-pack for right-parts). Should we implement those, or should we just let it cascade in the compressor?

Separately just an observation, but ALP from the paper recommends having one pair of exponents per vector, rather than for the entire array like we do now.

EDIT: answers

  1. Store in metadata
  2. Can bit-pack the left/right side explicitly

@a10y a10y force-pushed the aduffy/alp-rd branch 2 times, most recently from e293a50 to c51ecc4 Compare October 1, 2024 02:08
// dict-encode the left-parts, keeping track of exceptions
for (idx, left) in left_parts.iter_mut().enumerate() {
    // TODO: revisit if we need to change the branch order for perf.
    if let Some(code) = self.codes.iter().position(|v| *v == *left) {
Contributor Author

I had originally used HashMap for this, like the C++ code does, but it turns out that doing linear search on a small fixed-size array is considerably faster (~5x) than doing hashmap lookups
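
For readers comparing the two approaches, a minimal sketch of both lookups follows. The function names are hypothetical, the dictionary is assumed to hold 8 distinct entries, and unmatched values are recorded as (position, value) exceptions with a placeholder code of 0; none of this is claimed to be the PR's actual code, and no timing is implied by the sketch itself.

```rust
use std::collections::HashMap;

/// Linear scan over the fixed-size dictionary; the code is the index.
fn find_code_linear(dict: &[u16; 8], left: u16) -> Option<u8> {
    dict.iter().position(|&d| d == left).map(|i| i as u8)
}

/// HashMap-based alternative: value -> code.
fn build_code_map(dict: &[u16; 8]) -> HashMap<u16, u8> {
    dict.iter().enumerate().map(|(i, &d)| (d, i as u8)).collect()
}

/// Encode left parts using the linear scan. Values missing from the
/// dictionary get placeholder code 0 and are recorded as exceptions.
fn encode_left_parts(left_parts: &[u16], dict: &[u16; 8]) -> (Vec<u8>, Vec<(usize, u16)>) {
    let mut codes = Vec::with_capacity(left_parts.len());
    let mut exceptions = Vec::new();
    for (idx, &left) in left_parts.iter().enumerate() {
        match find_code_linear(dict, left) {
            Some(code) => codes.push(code),
            None => {
                codes.push(0);
                exceptions.push((idx, left));
            }
        }
    }
    (codes, exceptions)
}
```

Both lookups return the same code for any value in the dictionary; the linear scan touches at most 16 contiguous bytes and needs no hashing, which is a plausible reason for the speedup reported above.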

@a10y
Contributor Author

a10y commented Oct 1, 2024

I implemented a small test using the POI Kaggle dataset referenced in the paper, and I was able to replicate the compression ratio results.

/Users/aduffy/.cargo/bin/cargo run --color=always --bin compress_poi --manifest-path /Volumes/Code/vortex/bench-vortex/Cargo.toml
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.16s
     Running `target/debug/compress_poi`
reading with schema: Struct(StructDType { names: ["name", "latitude_radian", "longitude_radian", "num_links", "links", "num_categories", "categories"], dtypes: [Utf8(Nullable), Primitive(F64, Nullable), Primitive(F64, Nullable), Primitive(I64, Nullable), Utf8(Nullable), Primitive(I64, Nullable), Utf8(Nullable)] }, NonNullable)
raw POI dataset
root: vortex.struct(0x04)({latitude_radian=f64?, longitude_radian=f64?}, len=424205) nbytes=6.79 MB (100.00%)
  metadata: StructMetadata { length: 424205, validity: NonNullable }
  "latitude_radian": vortex.primitive(0x03)(f64?, len=424205) nbytes=3.39 MB (50.00%)
    metadata: PrimitiveMetadata { validity: AllValid }
    buffer: 3.39 MB
  "longitude_radian": vortex.primitive(0x03)(f64?, len=424205) nbytes=3.39 MB (50.00%)
    metadata: PrimitiveMetadata { validity: AllValid }
    buffer: 3.39 MB

Compressed POI data
root: vortex.struct(0x04)({latitude_radian=f64?, longitude_radian=f64?}, len=424205) nbytes=5.95 MB (100.00%)
  metadata: StructMetadata { length: 424205, validity: NonNullable }
  "latitude_radian": vortex.alprd(0x1e)(f64?, len=424205) nbytes=2.95 MB (49.66%)
    metadata: ALPRDMetadata { is_f32: false, right_bit_width: 52, dict_len: 8, dict: [1022, 1021, 3069, 1020, 1023, 3070, 3068, 3071], left_parts_dtype: Primitive(U16, Nullable), has_exceptions: true }
    left_parts: fastlanes.bitpacked(0x15)(u16?, len=424205) nbytes=159.08 kB (2.68%)
      metadata: BitPackedMetadata { validity: AllValid, bit_width: 3, offset: 0, length: 424205, has_patches: false }
      buffer: 159.36 kB
    right_parts: fastlanes.bitpacked(0x15)(u64?, len=424205) nbytes=2.76 MB (46.38%)
      metadata: BitPackedMetadata { validity: AllValid, bit_width: 52, offset: 0, length: 424205, has_patches: false }
      buffer: 2.76 MB
    left_parts_exceptions: vortex.sparse(0x08)(u16?, len=424205) nbytes=36.42 kB (0.61%)
      metadata: SparseMetadata { indices_dtype: Primitive(U64, NonNullable), indices_offset: 0, indices_len: 8325, len: 424205, fill_value: Scalar { dtype: Primitive(U16, Nullable), value: Null } }
      indices: fastlanes.bitpacked(0x15)(u64, len=8325) nbytes=19.77 kB (0.33%)
        metadata: BitPackedMetadata { validity: NonNullable, bit_width: 19, offset: 0, length: 8325, has_patches: false }
        buffer: 21.89 kB
      values: vortex.primitive(0x03)(u16?, len=8325) nbytes=16.65 kB (0.28%)
        metadata: PrimitiveMetadata { validity: AllValid }
        buffer: 16.65 kB
  "longitude_radian": vortex.alprd(0x1e)(f64?, len=424205) nbytes=2.99 MB (50.34%)
    metadata: ALPRDMetadata { is_f32: false, right_bit_width: 53, dict_len: 8, dict: [1535, 510, 511, 512, 1536, 509, 1533, 1534], left_parts_dtype: Primitive(U16, Nullable), has_exceptions: true }
    left_parts: fastlanes.bitpacked(0x15)(u16?, len=424205) nbytes=159.08 kB (2.68%)
      metadata: BitPackedMetadata { validity: AllValid, bit_width: 3, offset: 0, length: 424205, has_patches: false }
      buffer: 159.36 kB
    right_parts: fastlanes.bitpacked(0x15)(u64?, len=424205) nbytes=2.81 MB (47.27%)
      metadata: BitPackedMetadata { validity: AllValid, bit_width: 53, offset: 0, length: 424205, has_patches: false }
      buffer: 2.82 MB
    left_parts_exceptions: vortex.sparse(0x08)(u16?, len=424205) nbytes=23.37 kB (0.39%)
      metadata: SparseMetadata { indices_dtype: Primitive(U64, NonNullable), indices_offset: 0, indices_len: 5342, len: 424205, fill_value: Scalar { dtype: Primitive(U16, Nullable), value: Null } }
      indices: fastlanes.bitpacked(0x15)(u64, len=5342) nbytes=12.69 kB (0.21%)
        metadata: BitPackedMetadata { validity: NonNullable, bit_width: 19, offset: 0, length: 5342, has_patches: false }
        buffer: 14.59 kB
      values: vortex.primitive(0x03)(u16?, len=5342) nbytes=10.68 kB (0.18%)
        metadata: PrimitiveMetadata { validity: AllValid }
        buffer: 10.68 kB

We can see that our bits-per-value are roughly 55 for latitude_radian and 56 for longitude_radian:

  • latitude: 55.6 bits per value
  • longitude: 56.4 bits per value

This nets us an overall size reduction of ~12.5%.
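
The figures above can be re-derived from the nbytes values in the output. This is a small sketch with hypothetical helper names, assuming the printed MB are decimal megabytes (10^6 bytes), which is consistent with 424205 values of 8 bytes each printing as 3.39 MB.

```rust
/// Average number of bits stored per value for a column of `len`
/// values occupying `nbytes` bytes.
fn bits_per_value(nbytes: f64, len: usize) -> f64 {
    nbytes * 8.0 / len as f64
}

/// Size reduction in percent going from `raw_bytes` to `compressed_bytes`.
fn size_reduction_pct(raw_bytes: f64, compressed_bytes: f64) -> f64 {
    (1.0 - compressed_bytes / raw_bytes) * 100.0
}
```

Plugging in 2.95 MB and 2.99 MB for the two columns gives about 55.6 and 56.4 bits per value, and 6.79 MB down to 5.95 MB is a reduction of roughly 12.4%.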

@a10y a10y marked this pull request as ready for review October 1, 2024 19:49
@a10y a10y changed the title WIP: ALP-RD feat: implement ALP-RD compression Oct 1, 2024
@@ -89,7 +89,7 @@ pub async fn rewrite_parquet_as_vortex<W: VortexWrite>(
     Ok(())
 }

-pub fn read_parquet_to_vortex(parquet_path: &Path) -> VortexResult<ChunkedArray> {
+pub fn read_parquet_to_vortex<P: AsRef<Path>>(parquet_path: P) -> VortexResult<ChunkedArray> {
Contributor Author

ergonomics
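
As a small illustration of why the generic signature is more ergonomic: a function taking `P: AsRef<Path>` accepts `&str`, `String`, `&Path` and `PathBuf` without the caller converting first. `file_name_of` below is a hypothetical stand-in, not a function from this repository.

```rust
use std::path::Path;

/// Accepts anything that can be viewed as a Path and returns the final
/// path component, if there is one.
fn file_name_of<P: AsRef<Path>>(path: P) -> Option<String> {
    path.as_ref()
        .file_name()
        .map(|name| name.to_string_lossy().into_owned())
}
```

With the old `&Path` signature, a caller holding a string literal would have had to write `Path::new("...")` at every call site.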

@@ -19,7 +26,14 @@ impl Display for Exponents {
}
}

pub trait ALPFloat: Float + Display + 'static {
mod private {
Contributor Author

in theory this was previously extensible, but we probably want to constrain it

Member

poor f16 not considered in the paper. Anyway this is the right thing to do

Contributor Author

ah i too forgot about f16. I suppose that we probably want special compressors for things like bf16

Member

Yes, this is just a comment for future readers. The paper only talked about the common float types
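
For future readers, constraining a trait so that downstream crates cannot implement it is commonly done with the sealed-trait pattern, which is what a `mod private` block like the one in this diff usually signals. A minimal sketch with hypothetical names (`RealFloat`, `bit_width`), not the actual trait from this PR:

```rust
mod private {
    /// Public trait inside a private module: other crates can see the
    /// bound but cannot name the trait, so they cannot implement it.
    pub trait Sealed {}
    impl Sealed for f32 {}
    impl Sealed for f64 {}
}

/// Only f32 and f64 can implement this trait, because implementing it
/// requires implementing `private::Sealed` first.
pub trait RealFloat: private::Sealed + Copy {
    const BITS: u32;
}

impl RealFloat for f32 {
    const BITS: u32 = 32;
}

impl RealFloat for f64 {
    const BITS: u32 = 64;
}

/// Generic code can rely on the closed set of implementors.
pub fn bit_width<T: RealFloat>(_value: T) -> u32 {
    T::BITS
}
```

Supporting a type such as f16 later would then mean adding one `Sealed` impl and one trait impl inside the crate.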

encodings/alp/src/alp_rd/mod.rs Outdated
encodings/alp/src/alp_rd/mod.rs Outdated
}
}

// Only applies for F64.
Contributor Author

@a10y a10y Oct 1, 2024

old comment. replace with real doc comment

@robert3005
Member

One note: don't bother with accessors; we decided that we likely need to change them.

@@ -17,6 +17,8 @@ readme = { workspace = true }
workspace = true

[dependencies]
fastlanes = { workspace = true }
Contributor Author

Unused

Member

@robert3005 robert3005 left a comment

some small changes

bench-vortex/src/lib.rs Outdated
vortex-dtype/src/dtype.rs Outdated
vortex-sampling-compressor/src/compressors/alp_rd.rs Outdated
encodings/alp/src/alp_rd/compute/slice.rs Outdated
encodings/alp/src/alp_rd/compute/filter.rs Outdated
encodings/alp/src/alp_rd/array.rs Outdated
@a10y a10y enabled auto-merge (squash) October 2, 2024 15:58
@a10y a10y merged commit 389e6a4 into develop Oct 2, 2024
5 checks passed
@a10y a10y deleted the aduffy/alp-rd branch October 2, 2024 16:07
@lwwmanning
Member

🥳

Development

Successfully merging this pull request may close these issues.

Encoding: ALPRD
3 participants