Skip to content
This repository has been archived by the owner on Jan 1, 2022. It is now read-only.

Binary size could be smaller #107

Open
epage opened this issue Dec 6, 2021 · 30 comments
Open

Binary size could be smaller #107

epage opened this issue Dec 6, 2021 · 30 comments

Comments

@epage
Copy link
Owner

epage commented Dec 6, 2021

Issue by smklein
Friday Oct 19, 2018 at 21:59 GMT
Originally opened as clap-rs/clap#1365


Rust Version

rustc 1.31.0-nightly (e7f5d4805 2018-10-18)

Affected Version of clap

2.32.0

Bug or Feature Request Summary

I compared the binary size of a "hello world" application, with and without clap, and came up with the following data:

Using a Cargo.toml:

  [profile.release]                                                                                                                                                                                                
  opt-level = 3                                                                                                                                                                                                    
  lto = true                                                                                                                                                                                                       
  panic ='abort'      

And a main.rs (of either this, or a version which only calls println!("Hello world"):

#![feature(alloc_system)]
extern crate alloc_system;
extern crate clap;                                                                                                                                                                                                                   
use clap::{App, Arg};

fn main() {
    let matches = App::new("Hello")                                                                                                                                                                              
                    .arg(Arg::with_name("foo")                                                                                                                                                                   
                         .short("f")                                                                                                                                                                             
                         .takes_value(true))                                                                                                                                                                     
                    .get_matches();                                                                                                                                                                              
    let foo = matches.value_of("foo").unwrap();                                                                                                                                                  
    println!("Hello, world: {}", foo);                                                                                                                                                                           }

I ran the following:

$ cargo rustc --release -- -C link-args=-static && strip target/release/hello

The "Non-clap" version of hello world resulted in the following output from bloaty:

     VM SIZE                      FILE SIZE
 --------------                --------------
  78.3%   283Ki .text            283Ki  78.3%
   8.2%  29.8Ki .rodata         29.8Ki   8.2%
   6.0%  21.7Ki .eh_frame       21.7Ki   6.0%
   3.7%  13.6Ki .data.rel.ro    13.7Ki   3.8%
   1.1%  4.16Ki .bss                 0   0.0%
   0.2%     568 [ELF Headers]   2.37Ki   0.7%
   0.6%  2.16Ki .dynsym         2.16Ki   0.6%
   0.0%       0 [Unmapped]      1.91Ki   0.5%
   0.4%  1.36Ki .dynstr         1.36Ki   0.4%
   0.3%  1.17Ki .rela.dyn       1.17Ki   0.3%
   0.2%     712 .got               712   0.2%
   0.2%     672 .rela.plt          672   0.2%
   0.2%     576 .dynamic           576   0.2%
   0.1%     534 [12 Others]        525   0.1%
   0.1%     464 .plt               464   0.1%
   0.1%     416 .gnu.hash          416   0.1%
   0.1%     320 .gnu.version_r     320   0.1%
   0.1%     272 .data              272   0.1%
   0.0%       0 .shstrtab          250   0.1%
   0.0%     184 .gnu.version       184   0.0%
   0.0%     152 .plt.got           152   0.0%
 100.0%   362Ki TOTAL            362Ki 100.0%

Where the clap version resulted in the following:

     VM SIZE                      FILE SIZE
 --------------                --------------
  82.6%   620Ki .text            620Ki  82.7%
   7.0%  52.6Ki .rodata         52.6Ki   7.0%
   5.4%  40.3Ki .eh_frame       40.3Ki   5.4%
   3.2%  23.8Ki .data.rel.ro    23.9Ki   3.2%
   0.6%  4.29Ki .bss                 0   0.0%
   0.1%     568 [ELF Headers]   2.37Ki   0.3%
   0.3%  2.27Ki .dynsym         2.27Ki   0.3%
   0.2%  1.41Ki .rela.dyn       1.41Ki   0.2%
   0.2%  1.38Ki .dynstr         1.38Ki   0.2%
   0.0%       0 [Unmapped]      1.12Ki   0.1%
   0.1%     760 .got               760   0.1%
   0.1%     576 .dynamic           576   0.1%
   0.1%     576 .rela.plt          576   0.1%
   0.1%     542 [12 Others]        501   0.1%
   0.1%     476 .gnu.hash          476   0.1%
   0.1%     400 .plt               400   0.1%
   0.0%     320 .gnu.version_r     320   0.0%
   0.0%     272 .data              272   0.0%
   0.0%       0 .shstrtab          250   0.0%
   0.0%     194 .gnu.version       194   0.0%
   0.0%     184 .plt.got           184   0.0%
 100.0%   751Ki TOTAL            750Ki 100.0%

(Diff for visibility)

     VM SIZE                    FILE SIZE
 --------------              --------------
  +119%  +336Ki .text         +336Ki  +119%
   +77% +22.8Ki .rodata      +22.8Ki   +77%
   +85% +18.6Ki .eh_frame    +18.6Ki   +85%
   +75% +10.2Ki .data.rel.ro +10.2Ki   +75%
   +20%    +240 .rela.dyn       +240   +20%
  +3.0%    +128 .bss               0  [ = ]
  +5.4%    +120 .dynsym         +120  +5.4%
   +14%     +60 .gnu.hash        +60   +14%
  +6.7%     +48 .got             +48  +6.7%
   +21%     +32 .plt.got         +32   +21%
   +24%     +32 .tbss              0  [ = ]
  +1.9%     +26 .dynstr          +26  +1.9%
  +5.4%     +10 .gnu.version     +10  +5.4%
 -28.9%     -24 [LOAD [RX]]      -24 -28.9%
 -13.8%     -64 .plt             -64 -13.8%
 -14.3%     -96 .rela.plt        -96 -14.3%
  [ = ]       0 [Unmapped]      -808 -41.2%
  +107%  +388Ki TOTAL         +388Ki  +107%

The amount of .text being generated here doubles the size of the binary.

Clap is really useful, and I'd love if I could use it on more constrained environments, where binary size might be expensive.

Chatting with @kbknapp , it seems that there are some changes to clap-rs underway to allow more flexible usage of the crate, potentially disabling some features at compile-time to shrink the binary, but I figured I'd file this bug to track.

Are there currently other ways of toggling clap features, to compare binary sizes for this evaluation?

@epage
Copy link
Owner Author

epage commented Dec 6, 2021

Comment by kbknapp
Saturday May 02, 2020 at 01:19 GMT


Re-energizing this topic. Not necessarily for action prior to 3.0 release, but something to keep in our mind and begin to tackle as available.

My comment from #1891

Along those same lines but also a separate issue entirely, at some point I would like to have the discussion about aggressive cargo feature use. Ideally, all non-essential sections of clap could be disabled (but enabled by default) to allow small binary sizes when desired. I've had quite a few people / companies approach me about clap's binary size being their only issue. In fact, I believe that is the number one factor that led Google to work on argh, was binary size is their top concern (for Fuschia) and don't need a lot of the "extra" features that clap provides.

@CreepySkeleton's comment from that same issue:

On the other hand, people from this anti-dependency camp likely see clap as bloat-full and having too much allegedly unneeded deps irrespective of term_size. Clap already pulls 7 crates in --no-default-features mode, 22 in default-features mode, and 27 in all-features mode; one extra crate - very tiny and very useful btw - won't change their opinion. It's kind of all-or-nothing situation, 8 deps (including transitive) kind of count as "as small as possible". I really doubt any of them would become our user if we feature-gate one or two tiny crates.

That's not to mention that clap itself takes as much time to build as (almost) all of the deps combined, and the compilation time is a much more common complaint among users (in my view). Along with code bloat, which would only decrease without this feature.

Regarding modularization of clap.

I've been thinking about it too, for some time, and I've come to conclusion that, while it's doable - to some extent, it would require full-scale refactoring. And by full-scale I mean "rewrite the parser and the validator entirely" which is quite an endeavor.

Finally, @Dylan-DPC's comment from that issue:

Given how popular clap is, the number of dependencies is definitely a huge concern. I've been planing of tackling a lighter clap in some form or the other post the 3.0 release. That discussion is beyond the scope of this issue.

@epage
Copy link
Owner Author

epage commented Dec 6, 2021

Comment by kbknapp
Saturday May 02, 2020 at 01:42 GMT


Just so my thoughts are known, I don't believe we need to remove all dependencies and features for the smallest possible size. What I do think is that we should know which dependencies have a low "weight to functionality" ratio.

"Weight" can assessed in two manners, code size added to clap, and compile time.

"Functionality" should be self evident.

I am definitely not in the camp of "make sure there are as few deps as possible." However, I am in the camp of I don't want to take on a dep that doesn't provide intrinsic value, or is a maintenance burden.

The harder crates to evaluate will be ones that don't add a lot to clap functionally, but at the same time don't add much if any compile time or code size either. I would propose for those style crates we look at either pulling the code internal, or dropping the crate.

I'm most concerned with the --no-default-features case, that case should be as small as possible, with as few high ratio deps as possible. I'm not concerned with the default features, or even --all-features cases, because at that point the consumer as chosen to go a heavier weight route, while there are lighter weight options.

This would be a great first issue, as it's not a lot of code changing, and doesn't require too much in depth knowledge of clap. At first, it will primarily be measuring the default crates, or seeing how widespread their use inside clap is.


Taking a quick look at our default crates I see:

Crate (+ transitive deps) Functionality Notes
bitflags Underpin of AppSettings
unicode-width Heavily used in Help Messages If we made auto generated help optional, this could become an optional feature
textwrap Used for Help Messages same as unicode-width
indexmap (autocfg) used in ArgMatcher/ArgMatches Perhaps we could use vec_map instead
os_str_bytes used in parsing clap fundamentally messes with OsStrs or bytes heavily...so this is probably well worth the dep
vec_map Used throughout Arg and usage parser Used to be an optional feature. Has perf gains over BTreeMap...maybe those gains need to be evaluated again to see if they're worth it

@epage
Copy link
Owner Author

epage commented Dec 6, 2021

Comment by kbknapp
Saturday May 02, 2020 at 02:03 GMT


Doing a --no-default-feautres --features std build on a simple:

App::new("foo").get_matches();

Gives us this output via cargo bloat in release mode:

cargo bloat --crates --release
[..]
    Analyzing target/release/fake

 File  .text     Size Crate
 9.3%  56.5% 285.5KiB clap
 4.8%  29.3% 148.1KiB std
 1.9%  11.7%  59.2KiB [Unknown]
 0.1%   0.5%   2.6KiB indexmap
 0.1%   0.3%   1.6KiB textwrap
 0.0%   0.1%     307B fake
 0.0%   0.0%      41B os_str_bytes
16.5% 100.0% 505.5KiB .text section size, the file size is 3.0MiB

Note: numbers above are a result of guesswork. They are not 100% correct and never will be.

Discounting clap and std, it looks like textwrap and indexmap need a close eye. If either has better alternatives, or is simple enough to not use/pull internal I'd say those may be on the chopping block.

Also, if we did something drastic like make auto help messages optional, that alone would cut a huge portion of the code along with size. Additionally and oddly enough, that may the easiest thing to modularize first.

"But wait!" I can hear you say. "Auto help generation is like THE thing clap is used for, right?" Sure. It's one of the things, but I'd argue not the most important at all. In many of the contexts where code size matters the most, help generation is one the things they don't care about because it's super easy to just write a help string manually for a small CLI.

@epage
Copy link
Owner Author

epage commented Dec 6, 2021

Comment by CreepySkeleton
Saturday May 02, 2020 at 02:07 GMT


Before we more any further, would you please test it in release mode 😄? Nobody cares about debug binaries, release the ones of concern here.

@epage
Copy link
Owner Author

epage commented Dec 6, 2021

Comment by kbknapp
Saturday May 02, 2020 at 02:08 GMT


Literally just edited the comment with that info 😆

@epage
Copy link
Owner Author

epage commented Dec 6, 2021

Comment by pksunkara
Saturday May 02, 2020 at 06:27 GMT


Did you enable lto? I think we should assume with that.

@epage
Copy link
Owner Author

epage commented Dec 6, 2021

Comment by CreepySkeleton
Saturday May 02, 2020 at 13:47 GMT


main.rs

use clap::*;

fn main() {
    App::new("app").get_matches();
}

With lto enabled (release build, no default features)

CreepySkeleton@builder:~/workspace/probe$ cargo bloat --crates --release
    [...]

 File  .text     Size Crate
19.4%  60.3% 277.7KiB clap
 9.0%  27.9% 128.5KiB std
 3.0%   9.2%  42.3KiB [Unknown]
 0.3%   1.0%   4.7KiB probe
 0.2%   0.6%   2.5KiB indexmap
 0.0%   0.1%     394B textwrap
32.2% 100.0% 460.3KiB .text section size, the file size is 1.4MiB

What interesting here is that the size of the main.rs crate increased from 307B to 4.7KiB. Basically, some code from std has been inlined and this is the biggest difference.

The overall difference is less than 10Kb (clap and its deps only) and it will thin as more features are actually used. All the further runs will be performed without LTO.

See also https://stackoverflow.com/a/52297790. TL;DR: LTO is not about reducing code size. z opt level is about it.

Another interesting read: https://internals.rust-lang.org/t/rust-staticlibs-and-optimizing-for-size/5746. No TL;DR here, go read, it's worthy.


Kevin, just to make sure we're on the same page: I wasn't counting you as a no-deps guy, I was just thinking aloud "how would we sell clap to those guys" and I'm pretty sure the answer is - we wouldn't.

If their wariness comes from concern that the deps affect binary size, we can righteously tell them that all the deps put together (in --no-default-features mode) take less than 5KiB, the next rustc release will likely change much more of that. The same can be said about compilation time: ~2sec difference, essentially a rounding error.

At the very least, if 5KiB do mater to you, you're working on embedded, and clap will likely never suit you. Take a look at getopts/argh or write it yourself.

I agree regarding help messages. It also drags error messages along.

What not clear to me is your logic about "either pulling the code internal, or dropping the crate". It sounds like "Let's copy-paste code from third party crates (we can't just drop them) in order to decrease our maintenance burden". But from that point on, we will be the ones who maintain this copy-pasted code. A contradiction. Could you please elaborate on that?

@epage
Copy link
Owner Author

epage commented Dec 6, 2021

Comment by pksunkara
Saturday May 02, 2020 at 15:27 GMT


At the very least, if 5KiB do mater to you, you're working on embedded, and clap will likely never suit you. Take a look at getopts/argh or write it yourself.

I don't think we should be saying that. We should be using cargo feature flags to make sure all the fields can use clap.

@epage
Copy link
Owner Author

epage commented Dec 6, 2021

Comment by TeXitoi
Saturday May 02, 2020 at 15:43 GMT


To make it work on embedded, you'll need do be alloc free (usable wIthout, String, Vec and such kind of collection).

@epage
Copy link
Owner Author

epage commented Dec 6, 2021

Comment by Dylan-DPC
Saturday May 02, 2020 at 15:57 GMT


There's a post 3.0 plan for that already which is why added the no_std feature

@epage
Copy link
Owner Author

epage commented Dec 6, 2021

Comment by CreepySkeleton
Saturday May 02, 2020 at 16:44 GMT


To clarify my statement about embedded:

  • On something like Rabspery PI clap will fly marvelously; you won't even need no-std there. It's close to a real PC after all.
  • If you have a chip with megabytes of ROM on board, you may be able to squeeze clap in. no_std, alloc, and accurate fine tuning (see modularization) will be required.
  • 128 KiB is all you've got? Too bad! Take a look at getopts (and you probably won't be able to afford even that).

To summarize all of that, you either have an extra hundred of KiBs at your disposal (and this is why I think we shouldn't optimize every single KiB) or clap is not for you.

And a bit of personal opinion: as someone with a (very limited) experience with embedded, I can say that allocations are almost tabu in there. Given that clap relies on Vec and stuff very heavily... I won't even try it if I need a CLI parser for embed. I'll go grab getopts.

@epage
Copy link
Owner Author

epage commented Dec 6, 2021

Comment by TeXitoi
Saturday May 02, 2020 at 17:04 GMT


Even getopts requires Vec and String. I personally don't see why I would need clap in embedded, that's a very niche usecase.

@epage
Copy link
Owner Author

epage commented Dec 6, 2021

Comment by TeXitoi
Saturday May 02, 2020 at 17:05 GMT


(embedded = bare metal on microcontroller)

@epage
Copy link
Owner Author

epage commented Dec 6, 2021

Comment by kbknapp
Saturday May 02, 2020 at 23:36 GMT


A lot to reply to here :) I don't think we need to be concerned with embedded (bare metal) right now. Maybe one day, but there are many things we can do that will improve clap before we worry about embedded.

I'm more concerned with embedded like, i.e. resource constrained devices, like small arm based (Pi, SBC, phone, etc.). I spoke with some Google employees who were looking at clap for that exact use case (presumably Fuschia), and binary size was their concern. Binary size isn't everyone's concern. To be clear, I'm not trying to supplant argh, but I do think we should be cognizant of that use case while building clap.

We're in a little bit of a catch-22 scenario. Many people assume argument parsing is such simple task that the deps should be ultra small and no transitive deps. That's understandable why one would think that. However, as all of us know, it turns out argument parsing is somewhat of a hidden beast. There are edge cases EVERYWHERE, and a near endless amount of features to implement.

It turns out argument parsing is complex.

Having said that, the issue we're discussing conflates a few concerns that we should probably be more explicit about:

  • Binary size of clap
  • Compile times of clap
  • Adoption and Perception of clap

Both binary size and compile times are affected by:

  • Deps
  • clap features and code structure
  • Rust and LLVM compile time features (LTO, dead code elimination, monomorphization, etc.)

While adoption and perception can be heavily influenced by binary size, compile times, and the dep graph as it's own construct.

So long as we're doing our due diligence when it comes to crates we take on as dependencies to ensure they're worth the price we pay in terms of binary size and compile times, we're good. Alternatively, if they negatively affect those areas, we should do our best to ensure they're optional.

Meanwhile, we should be thinking about the dep graph as it's own construct because right, wrong, or indifferent people do care about how many crates they pull in. Unfortunately, everyone's reason for caring can be different, for some it's binary size, for some it's compile times, for others it's the perception of risk (what happens if a crates has a security flaw, or goes away over night?).

We're addressing the first two concerns already, but the third (perception of risk) is primarily addressed by either using ultra popular, well supported crates, or crates related to/owned by the parent (which is usually signalled by the name, i.e. clap_* for us).


"how would we sell clap to those guys" and I'm pretty sure the answer is - we wouldn't.

You're correct in that part of the answer is, "we don't." But I think we mean different things by this. I mean we "We don't try to sell them on clap, we just do the best we can to care about that scenario."

We can do things like making clap modular enough, or structured in a way to turn off features the end user doesn't care about. This has an added benefit of potentially being able to make most of our deps optional.

And where they can't be option, oh well. At least we're trying. clap isn't going to fit every situation and that's fine. But we should at least do what we can to be an option for even those resource constrained situations.


If their wariness comes from concern that the deps affect binary size, we can righteously tell them that all the deps put together (in --no-default-features mode) take less than 5KiB, the next rustc release will likely change much more of that. The same can be said about compilation time: ~2sec difference, essentially a rounding error.

I totally agree with you. However, the problem is many times we don't get the chance to explain ourselves. They see the dep graph front-and-center and many times simply make assumptions from there. That's the part about the "no dep" crowd I'm not a fan of. I can't count the number of reddit comments, or blog posts I've seen that say, "Can you believe [program X] pulled in 170 crates?! That's absurd!!!" with no backing evidence as to why that's absurd. Were those 170 crates not required? How much compile time did they add, etc.? If the compile times were the EXACT same, but [program X] just copy/pasted the code from those crates, you probably wouldn't hear a peep. In fact, you'd probably hear how amazing it was that [program X] has no dependencies!

...I'm not a fan of that. But it's the world we live in.


What not clear to me is your logic about "either pulling the code internal, or dropping the crate". It sounds like "Let's copy-paste code from third party crates (we can't just drop them) in order to decrease our maintenance burden"

I should have been more clear 😜 Decreasing our maint burden and decreasing our dep graph are separate concerns that get intertwined. Also taking the code internal is only in the case where we're using a dep for a small well defined subset of what the crate offers, and we aren't concerned with maint burden part, just the raw number of deps part. This should only be a last resort for us.

This is one of those things that's counter intuitive; copying and pasting a subset of a crate into clap directly, (or even into a new sub crate like clap_foo may actually not make any difference in compile times or binary size. But it will still make many people "feel" better about using clap.

Is it dumb. Yep. But again, this should be a last resort for us and not something I'm actually looking at doing seriously. The only crate that comes to mind for this type of thing is term_size which is technically already a clap owned crate, but again the name doesn't signal that so when people compile clap they just assume it's some rando crate we pulled in as a dep. Am I suggesting re-naming term_size to clap_foo? Nope. Not unless we were in drastic territory and have exhausted all other options. We're nowhere near that! 😄

@epage
Copy link
Owner Author

epage commented Dec 6, 2021

Comment by pksunkara
Tuesday May 12, 2020 at 13:56 GMT


There are some good arguments regarding inclusion of deps in this reddit thread today, https://www.reddit.com/r/rust/comments/gi7v2v/is_it_wrong_of_me_to_think_that_rust_crates_have/

@epage
Copy link
Owner Author

epage commented Dec 6, 2021

Comment by BurntSushi
Tuesday May 12, 2020 at 15:18 GMT


@kbknapp invited me to participate in this issue, but I have to say, from reading the comments, it seems like there's an undercurrent of counter-productive "us vs them." If I'm coming from (your perspective) the "anti-dependency" camp and you already believe that you can't sell clap to me

Kevin, just to make sure we're on the same page: I wasn't counting you as a no-deps guy, I was just thinking aloud "how would we sell clap to those guys" and I'm pretty sure the answer is - we wouldn't.

then is it really worth my time to participate here if I'm treated as a lost cause in the first place?

reddit and low effort comments are a pit, and people love to make sweeping generalizations that makes the world look more extreme than it is. In reality, I think most people exercise good judgment and try to straddle this line as best as they can. Sometimes I'm on the "anti-dependency" side of the fence and take steps to reduce my dependency tree. But other time, I'm on the "yeah dependencies are great" side and try to resist the urge to remove dependencies.

I think @kbknapp has the right attitude here. The struggle is the important part. Doing due diligence on dependencies before bringing them in, and really weighing their costs and benefits, is what's important. And to me, that's what I see myself doing when I notice that a new dependency is about to come into my tree. What I see is a prior state in which I could opt out, but now I can't, and I don't know why. The code that was deleted looked pretty small, its commit history suggests it was very lightly edited and its performance was more than good enough for me. Moreover, when I first dropped the vec-map dependency from my use of Clap, I noticed some really crazy compile time improvements. I suspect there was a bug somewhere or some weird case, but still, the flexibility to drop dependencies was really nice.

I'm not familiar with Clap internals, and as someone who has written more than one argument parsing library, I am definitely someone who appreciates their complexity. I never take a good argument parser for granted. They are super hard to build because they need to balance so many competing concerns. IMO, Clap does a great job of this already.

So with that said, I'll give some opinions on Clap 3's required dependencies:

  • bitflags - I would personally just hand write any bit flags logic that I needed. In my view, this is a crate that optimizes for writing code, which I generally think isn't the right thing to be doing in widely used fundamental crates. So there's a bit of a philosophical opinion involved here.
  • textwrap - I would make this optional but enabled by default. I write all of my help strings and documentation wrapped to 79 columns anyway, which is probably good enough.
  • unicode-width - I would make this optional but enabled by default. If my help text is all ASCII, then an assumption that each byte is one column is reasonable.
  • indexmap - I don't have enough context to know what to do here, but it looks like it was recently added. I assume this is justified somehow through a major performance win, but otherwise, I would drop this.
  • os_str_bytes - This is definitely better than using unsafe code. If I knew more about why Clap needed this, then perhaps there is a way to get rid of this too. But yeah, if Clap needs to be mucking around with OsStrs, then the lack of an OsStr string API probably makes this really hard to the point where you pretty much have to do what os_str_bytes is doing. (I still wish std would just expose its internal WTF-8 representation, but there are good arguments against doing that.)
  • vec_map - I don't know the performance story here, but keeping it optional looked easy? But maybe there are some issues I don't know about.

@epage
Copy link
Owner Author

epage commented Dec 6, 2021

Comment by kbknapp
Tuesday May 12, 2020 at 15:25 GMT


Thank you for the detailed response! Before I can finish reading the comment and add any thoughts, let me address something real quick as a moderation note:

from reading the comments, it seems like there's an undercurrent of counter-productive "us vs them." If I'm coming from (your perspective) the "anti-dependency" camp and you already believe that you can't sell clap to me

I want to be very clear with everyone reading these messages (especially including the clap team) that this is not what I want to portray or the tone I want to encourage. Internet communications via text are hard, especially when we start considering native languages and such. (Edit: to be even more clear. I don't think @BurntSushi is portraying this tone. He's reading and noticing this tone from previous comments we made. And I should say thank you for pointing this out! 😄 )

I just want to be clear that the clap team is on the same page of, there is no "us vs them" and all opinions are welcome! We have differences of priorities and how to achieve those priorities, but all opinions are valid within this discussion. Ultimately the clap team will have to make a decision on which priority and implementation strategy to go with, but through discussions and issues like these my goal is to take all opinions in, consider them carefully, and make a decision that is best for clap and all consumers (current and future).


...now back to reading 😄

@epage
Copy link
Owner Author

epage commented Dec 6, 2021

Comment by BurntSushi
Tuesday May 12, 2020 at 15:35 GMT


Also, one other note: IIRC, cargo bloat has an option to list binary size by function. That might also be useful to look at for decreasing binary size. Sometimes you can get big wins by tweaking the inlining strategy used, for example. Or using trait objects is sometimes possible, although I've found it difficult to do in practice in many cases because of object safety rules.

@epage
Copy link
Owner Author

epage commented Dec 6, 2021

Comment by pksunkara
Tuesday May 12, 2020 at 15:44 GMT


I just want to be clear that the clap team is on the same page of, there is no "us vs them" and all opinions are welcome!

I strongly agree with this.

Unfortunately, I only started using Rust this year (maybe Dec last year), so I missed all the deps related discussions and am still trying to catch up on understanding the pros and cons related to rust ecosystem. That is why I was asking you the recent question on deps in termcolor @BurntSushi.

In Javascript before, while I do try to reduce our usage of deps, I still try to prefer commonly used packages and using them as deps instead of writing all of them myself which was the convention all across the ecosystem.

If we take the case of vec_map, my question is why do we want to make it optional? Shouldn't it be either including it completely or removing it completely? I think this is basic question we want to answer first.

@epage
Copy link
Owner Author

epage commented Dec 6, 2021

Comment by kbknapp
Tuesday May 12, 2020 at 16:28 GMT


If we take the case of vec_map, my question is why do we want to make it optional? Shouldn't it be either including it completely or removing it completely? I think this is basic question we want to answer first.

My understanding is there are several concerns that can also overlap with each other, and not everyone has the same concerns. As mentioned above, some of those include:

  • Binary Size
  • Compile Time
  • Transitive Deps (which repeats points 1-2 for each dep)

Adding a dep can have negative impacts on points 1 and 2, or both. Some people care about point 1, some people about point 2, and some about both. So if we consider either point a deps "cost" we have to weigh that cost against it's benefits (usually performance, or lower maintenance burden for us directly).

This is what I was trying to say earlier, we have to weigh any cost (including the subjective ones) against the objective benefit.

The reason for making it optional and not a binary include or not include is because people have various requirements. Some may not care about the binary size, or increased compile time if the feature provided is worth the cost, but since some do care we need a way to allow our consumers the same choices we're making at the library level.

For vec_map which is a self contained crate and no transitive deps, I would want to look at exactly what perf gains we get, compared with binary size/compile times. Add to that, since we pulled in IndexMap, perhaps we can use that instead and remove one or the other entirely? I haven't looked lately, so that may not be feasible.


To address the comments per dep from @BurntSushi :

  • bitflags - I hadn't really considered just pulling in the logic of bitflags to be honest. ...but now that you say it I'm sure it's actually simple subset since we don't particularly care about the macros, etc. just the functionality. Would probably worth a look at least.
  • textwrap - the hard part with this one is this dep helps us wrap at word boundaries intelligently. But perhaps making it optional for those that will manually wrap their help strings anyways.
  • unicode-width - I also like the idea of providing a, "I only use ASCII args/help strings (values excluded), so turn off all extra deps related to handling Unicode logic" flag
  • indexmap - Might be able to combine with vec_map, or relook at our exact use case. I need to familiarize myself with the code again.
  • os_str_bytes - This is probably staying for the time being. Clap is pretty much written with idea we use OsStr(ing) internally everywhere, and only assume/use UTF-8 at the consumer demarcation point. I've toyed with the idea of bstr since all we really care about is bytes, but not done enough research to see what overlap, if any, bstr and os_str_bytes have or which would be the best fit.
  • vec_map - Addressed in other comments. I just want to make sure we're getting a good cost:benefit ratio otherwise we may drop it entirely.

@epage
Copy link
Owner Author

epage commented Dec 6, 2021

Comment by BurntSushi
Tuesday May 12, 2020 at 17:49 GMT


I've toyed with the idea of bstr since all we really care about is bytes, but not done enough research to see what overlap, if any, bstr and os_str_bytes have or which would be the best fit.

You probably want to stick with os_str_bytes. os_str_bytes re-implements WTF-8 and gives you 100% correctness on both Unix and Windows. The only help bstr gives you with OS strings are utility functions for converting between &OsStr and &[u8], which is zero cost on Unix but does lossy decoding on Windows, so it sacrifices some correctness. (bstr does not re-implement WTF-8.)

I don't know how hard it would be to avoid doing the equivalent of os_str_bytes while maintaining correctness. It could be impossible. I could imagine some possibilities, like lossily decoding each argument for parsing only, which would work since the "significant" characters in the arg parser I believe are all ASCII. But then making sure you hand back the original OsString to the caller. But this is complex and could potentially be a performance regression. Unless std decides to change its tune and expose an OS string's encoding, os_str_bytes is probably the right answer.

@epage
Copy link
Owner Author

epage commented Dec 6, 2021

Comment by pksunkara
Tuesday May 12, 2020 at 17:51 GMT


I think using indexmap in place of vec_map would make the performance worse. And we need indexmap for ordered map IIRC.

There was a PR in the v2 commit history which removed bitflags but then you reverted it back @kbknapp. Not sure what happened but it was around 2 yrs ago.

@epage
Copy link
Owner Author

epage commented Dec 6, 2021

Comment by kbknapp
Tuesday May 12, 2020 at 17:57 GMT


I think using indexmap in place of vec_map would make the performance worse. And we need indexmap for ordered map IIRC.

That's entirely possible. I haven't done any investigating on the matter yet.

There was a PR in the v2 commit history which removed bitflags but then you reverted it back @kbknapp. Not sure what happened but it was around 2 yrs ago.

That was switching bitflags for a HashSet which had a significant negative performance impact which was the reasons for the revert.

@epage
Copy link
Owner Author

epage commented Dec 6, 2021

Comment by CreepySkeleton
Wednesday May 13, 2020 at 07:50 GMT


Also, one other note: IIRC, cargo bloat has an option to list binary size by function. That might also be useful to look at for decreasing binary size. Sometimes you can get big wins by tweaking the inlining strategy used, for example. Or using trait objects is sometimes possible, although I've found it difficult to do in practice in many cases because of object safety rules.

I actually looked at the list long time ago and found that

  • there are about 5 big (10KiB - 33KiB) functions, they hold ~90KiB of clap. Maybe we can shrink them a bit, but I wouldn't expect any breakthrough here.
  • there are a few dozens of pretty small but non-negligible functions (500B - 5KiB) that are responsible for the most of the bloat. This is where I'd expect we can get a win.
  • clap doesn't use generics internally at all. All (or most of) the generics are placed on public builder's functions and end up inlined.

  • textwrap - I would make this optional but enabled by default. I write all of my help strings and documentation wrapped to 79 columns anyway, which is probably good enough.
  • unicode-width - I would make this optional but enabled by default. If my help text is all ASCII, then an assumption that each byte is one column is reasonable.

Agreed.

  • bitflags - I would personally just hand write any bit flags logic that I needed. In my view, this is a crate that optimizes for writing code, which I generally think isn't the right thing to be doing in widely used fundamental crates. So there's a bit of a philosophical opinion involved here.

Counterpoints:

  • If you're suggesting to use a set of raw u64 constants, like

    const REQUIRED:     u64 = 1;
    const MULTIPLE_OCC: u64 = 1 << 1;
    const EMPTY_VALS:   u64 = 1 << 2 | TAKES_VAL;
    const GLOBAL:       u64 = 1 << 3;
    const HIDDEN:       u64 = 1 << 4;
    const TAKES_VAL:    u64 = 1 << 5;
    const USE_DELIM:    u64 = 1 << 6;

    Then there's a very significant downside to it: you can accidentally OR it with something that is not a bitflag, especially if names are similar and the types happen to match.

    bitflags prevents this by creating a separate type for it that is incompatible with everything else. Of course, you could create a separate type yourself, by hand, but then again, you'll be left to implementing OR, And, Not... by hand - and you better make sure you aren't making more errors when you'll be copy-pasting the impls!

  • This crate is de-facto standard for bitflags in rust world, judging by download count.

I don't really have very serious objections to importing the macro into clap codbase, but I'd rather avoid it nonetheless ("reinventing the wheel" argument).

As an argument for importing the macro, I'd say that bitflags looks mostly unmaintained at this point (four months without updates and a couple of unanswered PRs).

  • indexmap - I don't have enough context to know what to do here, but it looks like it was recently added. I assume this is justified somehow through a major performance win, but otherwise, I would drop this.

This is justified by the need of a specific data structure, and IndexMap satisfies it. We need:

  • A key-value-like representation with O(1) lookup. Probably even O(log(n)) would be fine, number of arguments (n) rarely exceeds a hundred or so. Both HashMap and BTreeMap could be used here.
  • Iteration order must match the insertion order. Neither HashMap nor BtreeMap fit.

Yes, we could use a HashMap coupled with a Vec to trace the insertions, but why reimplement the wheel?

  • os_str_bytes - This is definitely better than using unsafe code. If I knew more about why Clap needed this, then perhaps there is a way to get rid of this too. But yeah, if Clap needs to be mucking around with OsStrs, then the lack of an OsStr string API probably makes this really hard to the point where you pretty much have to do what os_str_bytes is doing. (I still wish std would just expose its internal WTF-8 representation, but there are good arguments against doing that.)

Well, either std decides to expose the wtf-8 representation - which is very unlikely - or we do what os_str_bytes does simply because we need these methods. And I'd rather use one, well-audited implementation (os_str_bytes appears to be one) instead of spawning our own unicorn impl with lots of unsafe.

This is core functionality. It breaks, we break. It shines, we shine, too.

bstr is great, but not quite what we need.

  • vec_map - I don't know the performance story here, but keeping it optional looked easy? But maybe there are some issues I don't know about.
  • vec_map - Addressed in other comments. I just want to make sure we're getting a good cost:benefit ratio otherwise we may drop it entirely.

90% sure we can simply replace it with BTreeMap or even plain Vec. I'll be looking into it.

@epage
Copy link
Owner Author

epage commented Dec 6, 2021

Comment by bb010g
Friday Jun 12, 2020 at 07:47 GMT


@kbknapp

"But wait!" I can hear you say. "Auto help generation is like THE thing clap is used for, right?" Sure. It's one of the things, but I'd argue not the most important at all. In many of the contexts where code size matters the most, help generation is one the things they don't care about because it's super easy to just write a help string manually for a small CLI.

Would also exposing static help generation via a build script and/or macro be feasible?

@epage
Copy link
Owner Author

epage commented Dec 6, 2021

Comment by ssokolow
Friday Jun 12, 2020 at 10:57 GMT


"But wait!" I can hear you say. "Auto help generation is like THE thing clap is used for, right?" Sure. It's one of the things, but I'd argue not the most important at all. In many of the contexts where code size matters the most, help generation is one the things they don't care about because it's super easy to just write a help string manually for a small CLI.

I'd disagree with that perception. For writing "strongly-typed scripts" that are meant to be tiny, I'd be willing to sacrifice functionality if the gains are proportionate to what's lost, and there do even exist argument parsers designed to be tiny that do --help generation in about 1/20th the size of clap. (pico-args has a comparison chart and, of them, gumdrop and argh will do --help generation.)

...the deal-breaker is that, last I checked, none of the tiny ones that do --help generation support taking OsString arguments, and that's non-negotiable. (I actually have some mojibake'd filenames on my drives right now.)

That said, I do consider --help generation non-negotiable, because I'm big on DRY in my code and, if the parser didn't have --help generation, I'd have to write a procedural macro or some other form of compile-time metaprogramming wheel-reinvention to ensure the --help and the parser can't drift out of sync.

@epage
Copy link
Owner Author

epage commented Dec 6, 2021

Comment by therealprof
Thursday Jul 02, 2020 at 13:28 GMT


Just randomly stumbled across this ticket and I do find one argument rather weird in this discussion which is "performance": How is that even a topic worth making size tradeoffs for?

In the end, a command line parser (typically) runs once per application invocation while bloat is evident even when the application is not used at all.

For this kind of crate size should always have a way higher priority than runtime performance.

@epage
Copy link
Owner Author

epage commented Dec 6, 2021

Comment by BurntSushi
Thursday Jul 02, 2020 at 13:32 GMT


@therealprof No, performance matters, a lot. This is a problem docopt had (and probably still has), where in certain configurations, its performance could be quite bad because of how its matching algorithm worked.

The main reason why performance matters for CLI parsers is when a lot of arguments are passed to the command, or when the command is repeatedly executed. This can happen pretty easily with things like xargs.

@epage
Copy link
Owner Author

epage commented Dec 6, 2021

Comment by therealprof
Thursday Jul 02, 2020 at 13:48 GMT


The main reason why performance matters for CLI parsers is when a lot of arguments are passed to the command, or when the command is repeatedly executed. This can happen pretty easily with things like xargs.

Sure, I wasn't implying that abysmal performance was acceptable. But you don't need superfancy order-preserving hashmaps to get perfectly fine CLI parsing speeds for 99+% of all applications, a regular Vec would do just fine. And for the rest we can always have a separate opt-in performance option which uses highly optimized external crates to cater for crazy fast 2 MB CLI arguments parsing needs.

@epage
Copy link
Owner Author

epage commented Dec 6, 2021

Comment by BurntSushi
Thursday Jul 02, 2020 at 13:55 GMT


I see, I didn't realize you were talking at that level of detail. I think at that point, you need a benchmark. I do share your skepticism on that specific point, but would definitely want to subject it to testing.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

1 participant