ORC-1109: Use zstd instead of none in the default compress option #1035

Merged on Jan 31, 2022

@williamhyun williamhyun commented Jan 31, 2022

What changes were proposed in this pull request?

This PR aims to use `zstd` instead of `none` in the default compress option.

Why are the changes needed?

This will reduce the hardware requirements for running the benchmark.

$ du -h * | sort -nr
112G	github
 67G	sales
 21G	taxi
$ du -h */*none | sort -nr
663M	taxi/parquet.none
 28G	github/json.none
 22G	sales/json.none
 18G	github/avro.none
 14G	github/parquet.none
 12G	github/orc.none
 10G	taxi/json.none
4.9G	sales/avro.none
4.2G	sales/parquet.none
2.9G	sales/orc.none
2.0G	taxi/avro.none
1.2G	taxi/orc.none
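
A side note on reading the listing above: `sort -nr` compares only the leading number and ignores the unit suffix, which is why `663M` sorts above `28G`. The following is a minimal sketch of a suffix-aware ordering, not part of the PR. It assumes a `sort` that supports `-h` (GNU coreutils does; other platforms may differ) and uses a few rows copied from the listing, with the whitespace normalized to single spaces.

```shell
# Rows copied from the `du -h */*none` listing above (whitespace normalized).
# Assumption: the local sort supports -h (human-numeric sort), as GNU sort does.
sorted=$(printf '%s\n' \
  '663M taxi/parquet.none' \
  '28G github/json.none' \
  '4.9G sales/avro.none' \
  '1.2G taxi/orc.none' | sort -hr)

# Largest entry first, with M/G suffixes taken into account.
echo "$sorted"
```

Under the same assumption, the original command could be written as `du -h */*none | sort -hr`.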

How was this patch tested?

Manually generated the benchmark data.

@github-actions github-actions bot added the JAVA label Jan 31, 2022

@dongjoon-hyun dongjoon-hyun left a comment

+1, I agree with you that none's portion of the generated data size is too big, @williamhyun.

Since we still generate none if the option is given explicitly, it looks reasonable to me.

Merged to main/1.7.

@dongjoon-hyun dongjoon-hyun merged commit bcfa71e into apache:main Jan 31, 2022
@dongjoon-hyun dongjoon-hyun added this to the 1.7.3 milestone Jan 31, 2022
dongjoon-hyun pushed a commit that referenced this pull request Jan 31, 2022
…#1035)

(cherry picked from commit bcfa71e)
Signed-off-by: Dongjoon Hyun <[email protected]>

dongjoon-hyun commented Jan 31, 2022

I tested on the master branch again and noticed that the generated data folder is now almost 50% of its previous total size. It's a huge reduction. Thanks.

$ du -h * | sort -nr
 52G	github
 41G	sales
8.1G	taxi
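
The "almost 50%" figure can be checked against the two `du -h *` listings in this thread. This is a rough sketch of the arithmetic only: it takes the `G` values exactly as `du -h` printed them, so the result inherits their rounding.

```shell
# Per-dataset totals in G, copied from the `du -h *` listings in this thread.
before=$(awk 'BEGIN { print 112 + 67 + 21 }')   # github + sales + taxi, before the change
after=$(awk 'BEGIN { print 52 + 41 + 8.1 }')    # github + sales + taxi, after the change

# Size after the change as a percentage of the size before, rounded to an integer.
ratio=$(awk -v b="$before" -v a="$after" 'BEGIN { printf "%.0f", 100 * a / b }')

echo "before=${before}G after=${after}G ratio=${ratio}%"
```

With these inputs the totals are 200G before and 101.1G after, so the generated data shrinks to roughly half of its former size.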

cxzl25 pushed a commit to cxzl25/orc that referenced this pull request Jan 11, 2024
…apache#1035)
