Format File Sizes Human-Readable in the CLI #2702
I like the proposed new formatting.
I think the human-readable format is generally better so this is great! My only concern is that sometimes the full-precision ints are useful in development for spotting very small changes in compressed size when using Is there any easy way we can make it toggle back to raw bytes with
@senhuang42, good point. I was thinking about whether The problem is that I'm not sure how I would best accomplish that. It wouldn't be enough to cancel the scaling and suffix. The value is currently stored as float, but even if it were a double, it would still lose the ability to carry full precision for files over 2^53 bytes in size. So to maintain full precision over the range of file sizes it would have to be transported as a You'd have to go back to preparing the value in a separate buffer. Which we could do, I guess. @Cyan4973, do you think that's worth doing?
Actually I think it's not so hard to return to working with our own buffer. I'll play around with that.
Yes, I believe that full-precision sizes for benchmarking / measurement purposes are a good use case. Also:
This represents 8 PB. It feels like an acceptable limitation.
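To make the precision point concrete, here is a minimal C sketch (not code from the PR; the helper names are made up for illustration) that checks whether a byte count survives a round trip through `float` or `double`. A `double` carries a 53-bit significand, so 2^53 + 1 is the first size it cannot represent exactly, and 2^53 bytes works out to 8 PiB.

```c
#include <stdint.h>

/* Hypothetical helpers, for illustration only.
 * Both assume size < 2^63 so the conversion back to an integer is defined. */

/* Returns 1 if `size` is unchanged after being stored in a double. */
static int survives_double(uint64_t size) {
    volatile double d = (double)size;   /* volatile: force a real 64-bit store */
    return (uint64_t)d == size;
}

/* Returns 1 if `size` is unchanged after being stored in a float. */
static int survives_float(uint64_t size) {
    volatile float f = (float)size;     /* float has only a 24-bit significand */
    return (uint64_t)f == size;
}

/* Whole pebibytes (2^50 bytes) contained in `size`. */
static uint64_t size_in_pib(uint64_t size) {
    return size >> 50;
}
```

Under these assumptions `survives_double` starts failing at `(1ULL << 53) + 1`, while `survives_float` already fails at `(1ULL << 24) + 1`, i.e. just above 16 MiB.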
OK, I've made a few changes (refer to the commit messages). Let me know if you think these make sense. I chose to bump full-precision display to require double-verbose because that way single-verbose still gets you a human-readable display of each compression when processing multiple files.
I agree, that seems a good choice.
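A rough C sketch of the behaviour agreed on here might look like the following. It is an assumption-laden illustration, not the PR's code: `show_size` and `verbose_flags` (a count of how many times a verbose flag was given) are hypothetical names, and the human-readable branch is deliberately simplified to a single unit.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: picks the display form from the verbosity count. */
static void show_size(uint64_t size, int verbose_flags, char *buf, size_t cap) {
    if (verbose_flags >= 2 || size < 1024) {
        /* Double-verbose (or a tiny file): print the exact byte count
         * straight from the integer, so no precision is lost. */
        snprintf(buf, cap, "%llu", (unsigned long long)size);
    } else {
        /* Otherwise print a scaled value. Only K is handled here
         * to keep the sketch short. */
        snprintf(buf, cap, "%.2fK", (double)size / 1024.0);
    }
}
```

The point of the design is that the exact value is formatted from the integer itself, so the human-readable path can keep using floating point without costing the full-precision path anything.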
bike shedding: megabyte
This produces the following formatting:

Size | `zstd` | `ls -lh`
---------- | ------ | --------
1 | 1 | 1
12 | 12 | 12
123 | 123 | 123
1234 | 1.21K | 1.3K
12345 | 12.1K | 13K
123456 | 121K | 121K
1234567 | 1.18M | 1.2M
12345678 | 11.8M | 12M
123456789 | 118M | 118M
1234567890 | 1.15G | 1.2G
999 | 999 | 999
1000 | 1000 | 1000
1001 | 1001 | 1001
1023 | 1023 | 1023
1024 | 1.000K | 1.0K
1025 | 1.00K | 1.1K
999999 | 977K | 977K
1000000 | 977K | 977K
1000001 | 977K | 977K
1023999 | 1000K | 1000K
1024000 | 1000K | 1000K
1024001 | 1000K | 1001K
1048575 | 1024K | 1.0M
1048576 | 1.000M | 1.0M
1048577 | 1.00M | 1.1M

This was produced with the following invocation:

```
for N in 1 12 123 1234 12345 123456 1234567 12345678 123456789 1234567890 999 1000 1001 1023 1024 1025 999999 1000000 1000001 1023999 1024000 1024001 1048575 1048576 1048577; do
  head -c $N /dev/urandom > r$N
done
./zstd -i1 -b1 -S r1 r12 r123 r1234 r12345 r123456 r1234567 r12345678 r123456789 r1234567890 r999 r1000 r1001 r1023 r1024 r1025 r999999 r1000000 r1000001 r1023999 r1024000 r1024001 r1048575 r1048576 r1048577
```
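For readers who want to see how a table like this can arise, here is a small C sketch that reproduces the `zstd` column. The function name `format_size` and the precision thresholds are assumptions inferred from the rows above, not copied from the zstd source: the value is divided by 1024 until it drops below 1024, then printed with fewer decimals as it grows.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: writes a human-readable form of `size` into `buf`.
 * Thresholds are inferred from the table, not taken from zstd itself. */
static void format_size(uint64_t size, char *buf, size_t cap) {
    static const char suffixes[] = {'K', 'M', 'G', 'T'};
    double value = (double)size;
    int unit = -1;
    int precision;

    if (size < 1024) {
        /* Small sizes are printed as exact byte counts. */
        snprintf(buf, cap, "%llu", (unsigned long long)size);
        return;
    }
    /* Scale by 1024 until the value fits the current unit. */
    while (value >= 1024.0 && unit < 3) {
        value /= 1024.0;
        unit++;
    }
    if (value >= 100.0)     precision = 0;  /* 121K, 1024K */
    else if (value >= 10.0) precision = 1;  /* 12.1K       */
    else if (value > 1.0)   precision = 2;  /* 1.21K       */
    else                    precision = 3;  /* exactly 1: 1.000K */

    /* printf rounds to nearest, which is why 1048575 shows as 1024K. */
    snprintf(buf, cap, "%.*f%c", precision, value, suffixes[unit]);
}
```

Note that rounding happens after the unit is chosen, which explains the `1000K` and `1024K` rows: the value stays just under the next unit and is only rounded up when printed.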
Suggested by @aqrit, a little more verbose, but hopefully addresses a real ambiguity.
Force-pushed from dbaab7b to 87e94e3.
The new changes include the
As well as in-progress compression in various ways:
Thanks @felixhandte! Looks like a great improvement!
Let me start with: Thanks for working on this and getting it merged so quickly. Small nitpick though:
We have conflicting units? MiB for the before/after but MB/s for the rate? I would expect those to be the same.
The speeds are also in
Actually, speeds are indeed in
If that is indeed the case, then why don't we change the new output to also be in MB instead of MiB? Since we're introducing the new human string format, no one is expecting it to be in any particular format, so let's make that MB. That way both units are the same.
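For reference, the gap between the two unit families is easy to see in a few lines of C. This is only an illustrative sketch with hypothetical helper names, not code from zstd: the same byte count gives a different number depending on whether it is divided by 2^20 (MiB) or 10^6 (MB).

```c
#include <stdint.h>

/* Hypothetical helpers, for illustration only. */

/* Binary unit: 1 MiB = 1024 * 1024 = 1048576 bytes. */
static double bytes_to_mib(uint64_t bytes) {
    return (double)bytes / (1024.0 * 1024.0);
}

/* Decimal unit: 1 MB = 1000 * 1000 = 1000000 bytes. */
static double bytes_to_mb(uint64_t bytes) {
    return (double)bytes / 1e6;
}
```

For 1234567890 bytes these give roughly 1177 MiB versus roughly 1235 MB, a difference of nearly 5%, so mixing the two in one line of output can make sizes and rates look inconsistent.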
This PR extends @scottchiefbaker's #2696. It switches zstd's CLI output to printing human-readable representations of file sizes, rather than full-precision integers.
This table shows how this PR formats various sizes in comparison to `ls -lh`. There are some differences, but in general I prefer this formatting over `ls`'s, since this provides a more consistent 3-4 digits of precision and rounds to nearest rather than always rounding up.
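The rounding difference can be sketched with integer arithmetic. The two helpers below are hypothetical and only model the behaviours described here (always rounding up versus rounding to nearest), expressed in tenths of a KiB; they are not taken from either tool's source.

```c
#include <stdint.h>

/* Hypothetical helpers: size expressed in tenths of a KiB. */

/* Always round up, as described for `ls -lh`: 1234 bytes -> 13 (1.3K). */
static uint64_t tenths_kib_round_up(uint64_t size) {
    return (size * 10 + 1023) / 1024;
}

/* Round to nearest, as this PR does: 1234 bytes -> 12 (1.2K at one decimal). */
static uint64_t tenths_kib_round_nearest(uint64_t size) {
    return (size * 10 + 512) / 1024;
}
```

With these, 1025 bytes comes out as 11 tenths (1.1K) when rounding up but 10 tenths (1.0K) when rounding to nearest, which is the kind of overstatement the comparison is pointing at.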
Repro Instructions: