Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update aircompressor to 3.0 #22532

Merged
merged 2 commits into from
Sep 5, 2024
Merged

Update aircompressor to 3.0 #22532

merged 2 commits into from
Sep 5, 2024

Conversation

wendigo
Copy link
Contributor

@wendigo wendigo commented Jun 27, 2024

Updates aircompressor to 3.0.

Changes include:

  • switched to v3 APIs across our own code
  • fixed some tests (better compression which changed the size of the columns)

@cla-bot cla-bot bot added the cla-signed label Jun 27, 2024
@github-actions github-actions bot added hudi Hudi connector iceberg Iceberg connector hive Hive connector labels Jun 27, 2024
@wendigo wendigo changed the title Try native (de)compression Nothing to see here Jun 27, 2024
@wendigo wendigo force-pushed the serafin/test-aircompressor branch from d216566 to 4c77115 Compare June 27, 2024 20:40
@starburstdata-automation
Copy link

starburstdata-automation commented Jun 27, 2024

Started benchmark workflow for this PR with test type = iceberg/sf1000_parquet_part.

Building Trino finished with status: success
Benchmark finished with status: success
Status message: NO Regression found.
Benchmark Comparison Report

@starburstdata-automation
Copy link

starburstdata-automation commented Jun 27, 2024

Started benchmark workflow for this PR with test type = iceberg/sf1000_parquet_unpart.

Building Trino finished with status: success
Benchmark finished with status: success
Status message: NO Regression found.
Benchmark Comparison Report

@wendigo wendigo force-pushed the serafin/test-aircompressor branch 2 times, most recently from 057b5e9 to 83598e9 Compare June 27, 2024 22:38
@wendigo wendigo changed the title Nothing to see here Test native (de)compression [Snappy, Zstd, Lz4] Jun 27, 2024
Copy link

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions github-actions bot added the stale label Jul 19, 2024
@wendigo wendigo added the stale-ignore Use this label on PRs that should be ignored by the stale bot so they are not flagged or closed. label Jul 19, 2024
@wendigo wendigo force-pushed the serafin/test-aircompressor branch from 83598e9 to b4ce9a2 Compare July 19, 2024 19:20
@wendigo wendigo changed the title Test native (de)compression [Snappy, Zstd, Lz4] Test aircompressor 2.0-SNAPSHOT Jul 19, 2024
@wendigo wendigo force-pushed the serafin/test-aircompressor branch from 30849fc to 512ed7b Compare July 19, 2024 20:19
@wendigo wendigo changed the title Test aircompressor 2.0-SNAPSHOT Update aircompressor to 2.0 (requires a release) Jul 19, 2024
@wendigo wendigo force-pushed the serafin/test-aircompressor branch from 512ed7b to f989c5b Compare July 19, 2024 20:48
@wendigo wendigo force-pushed the serafin/test-aircompressor branch from f989c5b to a290c80 Compare July 19, 2024 21:40
@wendigo wendigo changed the title Update aircompressor to 2.0 (requires a release) Update aircompressor to 2.0 Aug 12, 2024
@wendigo wendigo requested a review from dain August 12, 2024 08:11
@wendigo
Copy link
Contributor Author

wendigo commented Aug 12, 2024

Updated to released version of aircompressor

@starburstdata-automation
Copy link

starburstdata-automation commented Aug 12, 2024

Started benchmark workflow for this PR with test type = iceberg/sf1000_parquet_part.

Building Trino finished with status: success
Benchmark finished with status: success
Status message: NO Regression found.
Benchmark Comparison Report

@sopel39
Copy link
Member

sopel39 commented Aug 12, 2024

@wendigo does it use native decompressors?

Could you share the numbers for both Intel and ARM?

@wendigo
Copy link
Contributor Author

wendigo commented Aug 12, 2024

@sopel39 it does

@wendigo
Copy link
Contributor Author

wendigo commented Aug 12, 2024

@sopel39 you mean micro (JMH) or macro benchmarks (benchto)?

@wendigo
Copy link
Contributor Author

wendigo commented Aug 12, 2024

Some JMH results were posted in the aircompressor PR:

  compress    airlift_lz4             calgary/book2                  333,498   375.5MB/s ±    29.4MB/s ( 7.82%) (N = 3, α = 99.9%)
  compress    airlift_lz4_native      calgary/book2                  333,498   461.9MB/s ±    47.4MB/s (10.27%) (N = 3, α = 99.9%)
  compress    airlift_snappy          calgary/book2                  334,111   357.4MB/s ±    34.5MB/s ( 9.64%) (N = 3, α = 99.9%)
  compress    airlift_snappy_native   calgary/book2                  334,941   529.0MB/s ±   139.7MB/s (26.41%) (N = 3, α = 99.9%)
  compress    airlift_zstd            calgary/book2                  205,814   149.4MB/s ±    49.8MB/s (33.30%) (N = 3, α = 99.9%)
  compress    airlift_zstd_native     calgary/book2                  203,941   236.8MB/s ±    63.9MB/s (26.98%) (N = 3, α = 99.9%)
  decompress  airlift_lz4             calgary/book2                  333,498  2713.4MB/s ±   616.6MB/s (22.73%) (N = 3, α = 99.9%)
  decompress  airlift_lz4_native      calgary/book2                  333,498  3553.0MB/s ±   959.0MB/s (26.99%) (N = 3, α = 99.9%)
  decompress  airlift_snappy          calgary/book2                  334,111   735.0MB/s ±    26.7MB/s ( 3.64%) (N = 3, α = 99.9%)
  decompress  airlift_snappy_native   calgary/book2                  334,941  2225.0MB/s ±   105.1MB/s ( 4.72%) (N = 3, α = 99.9%)
  decompress  airlift_zstd            calgary/book2                  205,814   817.0MB/s ±    16.6MB/s ( 2.04%) (N = 3, α = 99.9%)
  decompress  airlift_zstd_native     calgary/book2                  203,941  1115.3MB/s ±   169.5MB/s (15.19%) (N = 3, α = 99.9%)

@sopel39
Copy link
Member

sopel39 commented Aug 12, 2024

@wendigo

Are these JMH results for ARM or Intel? We should test both TBH in case there is regression in one arch.

benchmarks (benchto)?

Bechto TPC-DS would be awesome! Arm/Intel

@wendigo
Copy link
Contributor Author

wendigo commented Aug 12, 2024

@sopel39 this is ARM, I'm not sure whether we ran these for Intel but I don't expect regression there as well

@wendigo
Copy link
Contributor Author

wendigo commented Aug 12, 2024

@sopel39 last time I've ran benchto benchmarks on Iceberg/SF1000 it showed ~10% performance improvement on read and ~10% on write.

@wendigo wendigo force-pushed the serafin/test-aircompressor branch 2 times, most recently from 23d5efe to b7da15d Compare August 12, 2024 12:54
@starburstdata-automation
Copy link

starburstdata-automation commented Aug 12, 2024

Started benchmark workflow for this PR with test type = iceberg/sf1000_parquet_part.

Building Trino finished with status: success
Benchmark finished with status: success
Status message: NO Regression found.
Benchmark Comparison Report

@wendigo wendigo force-pushed the serafin/test-aircompressor branch 2 times, most recently from 5729bf5 to f0433df Compare August 12, 2024 14:39
@wendigo
Copy link
Contributor Author

wendigo commented Aug 12, 2024

All tests are passing. This is ready to be reviewed @dain @electrum

@wendigo wendigo changed the title Update aircompressor to 2.0 Update aircompressor to 3.0 Sep 4, 2024
@wendigo
Copy link
Contributor Author

wendigo commented Sep 4, 2024

@dain I've updated this PR to use v3. Please review

@wendigo wendigo merged commit 66979f6 into master Sep 5, 2024
101 of 103 checks passed
@wendigo wendigo deleted the serafin/test-aircompressor branch September 5, 2024 12:00
@github-actions github-actions bot added this to the 456 milestone Sep 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed hive Hive connector hudi Hudi connector iceberg Iceberg connector stale stale-ignore Use this label on PRs that should be ignored by the stale bot so they are not flagged or closed.
Development

Successfully merging this pull request may close these issues.

6 participants