Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Binary size could be smaller #1365

Open
smklein opened this issue Oct 19, 2018 · 42 comments
Open

Binary size could be smaller #1365

smklein opened this issue Oct 19, 2018 · 42 comments
Labels
A-meta Area: administrative question or tracking issue C-enhancement Category: Raise on the bar on expectations S-waiting-on-mentor Status: Needs elaboration on the details before doing a 'Call for participation'
Milestone

Comments

@smklein
Copy link

smklein commented Oct 19, 2018

Rust Version

rustc 1.31.0-nightly (e7f5d4805 2018-10-18)

Affected Version of clap

2.32.0

Bug or Feature Request Summary

I compared the binary size of a "hello world" application, with and without clap, and came up with the following data:

Using a Cargo.toml:

  [profile.release]                                                                                                                                                                                                
  opt-level = 3                                                                                                                                                                                                    
  lto = true                                                                                                                                                                                                       
  panic ='abort'      

And a main.rs (of either this, or a version which only calls println!("Hello world"):

#![feature(alloc_system)]
extern crate alloc_system;
extern crate clap;                                                                                                                                                                                                                   
use clap::{App, Arg};

fn main() {
    let matches = App::new("Hello")                                                                                                                                                                              
                    .arg(Arg::with_name("foo")                                                                                                                                                                   
                         .short("f")                                                                                                                                                                             
                         .takes_value(true))                                                                                                                                                                     
                    .get_matches();                                                                                                                                                                              
    let foo = matches.value_of("foo").unwrap();                                                                                                                                                  
    println!("Hello, world: {}", foo);                                                                                                                                                                           }

I ran the following:

$ cargo rustc --release -- -C link-args=-static && strip target/release/hello

The "Non-clap" version of hello world resulted in the following output from bloaty:

     VM SIZE                      FILE SIZE
 --------------                --------------
  78.3%   283Ki .text            283Ki  78.3%
   8.2%  29.8Ki .rodata         29.8Ki   8.2%
   6.0%  21.7Ki .eh_frame       21.7Ki   6.0%
   3.7%  13.6Ki .data.rel.ro    13.7Ki   3.8%
   1.1%  4.16Ki .bss                 0   0.0%
   0.2%     568 [ELF Headers]   2.37Ki   0.7%
   0.6%  2.16Ki .dynsym         2.16Ki   0.6%
   0.0%       0 [Unmapped]      1.91Ki   0.5%
   0.4%  1.36Ki .dynstr         1.36Ki   0.4%
   0.3%  1.17Ki .rela.dyn       1.17Ki   0.3%
   0.2%     712 .got               712   0.2%
   0.2%     672 .rela.plt          672   0.2%
   0.2%     576 .dynamic           576   0.2%
   0.1%     534 [12 Others]        525   0.1%
   0.1%     464 .plt               464   0.1%
   0.1%     416 .gnu.hash          416   0.1%
   0.1%     320 .gnu.version_r     320   0.1%
   0.1%     272 .data              272   0.1%
   0.0%       0 .shstrtab          250   0.1%
   0.0%     184 .gnu.version       184   0.0%
   0.0%     152 .plt.got           152   0.0%
 100.0%   362Ki TOTAL            362Ki 100.0%

Where the clap version resulted in the following:

     VM SIZE                      FILE SIZE
 --------------                --------------
  82.6%   620Ki .text            620Ki  82.7%
   7.0%  52.6Ki .rodata         52.6Ki   7.0%
   5.4%  40.3Ki .eh_frame       40.3Ki   5.4%
   3.2%  23.8Ki .data.rel.ro    23.9Ki   3.2%
   0.6%  4.29Ki .bss                 0   0.0%
   0.1%     568 [ELF Headers]   2.37Ki   0.3%
   0.3%  2.27Ki .dynsym         2.27Ki   0.3%
   0.2%  1.41Ki .rela.dyn       1.41Ki   0.2%
   0.2%  1.38Ki .dynstr         1.38Ki   0.2%
   0.0%       0 [Unmapped]      1.12Ki   0.1%
   0.1%     760 .got               760   0.1%
   0.1%     576 .dynamic           576   0.1%
   0.1%     576 .rela.plt          576   0.1%
   0.1%     542 [12 Others]        501   0.1%
   0.1%     476 .gnu.hash          476   0.1%
   0.1%     400 .plt               400   0.1%
   0.0%     320 .gnu.version_r     320   0.0%
   0.0%     272 .data              272   0.0%
   0.0%       0 .shstrtab          250   0.0%
   0.0%     194 .gnu.version       194   0.0%
   0.0%     184 .plt.got           184   0.0%
 100.0%   751Ki TOTAL            750Ki 100.0%

(Diff for visibility)

     VM SIZE                    FILE SIZE
 --------------              --------------
  +119%  +336Ki .text         +336Ki  +119%
   +77% +22.8Ki .rodata      +22.8Ki   +77%
   +85% +18.6Ki .eh_frame    +18.6Ki   +85%
   +75% +10.2Ki .data.rel.ro +10.2Ki   +75%
   +20%    +240 .rela.dyn       +240   +20%
  +3.0%    +128 .bss               0  [ = ]
  +5.4%    +120 .dynsym         +120  +5.4%
   +14%     +60 .gnu.hash        +60   +14%
  +6.7%     +48 .got             +48  +6.7%
   +21%     +32 .plt.got         +32   +21%
   +24%     +32 .tbss              0  [ = ]
  +1.9%     +26 .dynstr          +26  +1.9%
  +5.4%     +10 .gnu.version     +10  +5.4%
 -28.9%     -24 [LOAD [RX]]      -24 -28.9%
 -13.8%     -64 .plt             -64 -13.8%
 -14.3%     -96 .rela.plt        -96 -14.3%
  [ = ]       0 [Unmapped]      -808 -41.2%
  +107%  +388Ki TOTAL         +388Ki  +107%

The amount of .text being generated here doubles the size of the binary.

Clap is really useful, and I'd love if I could use it on more constrained environments, where binary size might be expensive.

Chatting with @kbknapp , it seems that there are some changes to clap-rs underway to allow more flexible usage of the crate, potentially disabling some features at compile-time to shrink the binary, but I figured I'd file this bug to track.

Are there currently other ways of toggling clap features, to compare binary sizes for this evaluation?

@CreepySkeleton CreepySkeleton added C: other S-waiting-on-decision Status: Waiting on a go/no-go before implementing labels Feb 1, 2020
@pksunkara pksunkara added this to the 3.1 milestone Apr 9, 2020
@kbknapp
Copy link
Member

kbknapp commented May 2, 2020

Re-energizing this topic. Not necessarily for action prior to 3.0 release, but something to keep in our mind and begin to tackle as available.

My comment from #1891

Along those same lines but also a separate issue entirely, at some point I would like to have the discussion about aggressive cargo feature use. Ideally, all non-essential sections of clap could be disabled (but enabled by default) to allow small binary sizes when desired. I've had quite a few people / companies approach me about clap's binary size being their only issue. In fact, I believe that is the number one factor that led Google to work on argh, was binary size is their top concern (for Fuschia) and don't need a lot of the "extra" features that clap provides.

@CreepySkeleton's comment from that same issue:

On the other hand, people from this anti-dependency camp likely see clap as bloat-full and having too much allegedly unneeded deps irrespective of term_size. Clap already pulls 7 crates in --no-default-features mode, 22 in default-features mode, and 27 in all-features mode; one extra crate - very tiny and very useful btw - won't change their opinion. It's kind of all-or-nothing situation, 8 deps (including transitive) kind of count as "as small as possible". I really doubt any of them would become our user if we feature-gate one or two tiny crates.

That's not to mention that clap itself takes as much time to build as (almost) all of the deps combined, and the compilation time is a much more common complaint among users (in my view). Along with code bloat, which would only decrease without this feature.

Regarding modularization of clap.

I've been thinking about it too, for some time, and I've come to conclusion that, while it's doable - to some extent, it would require full-scale refactoring. And by full-scale I mean "rewrite the parser and the validator entirely" which is quite an endeavor.

Finally, @Dylan-DPC's comment from that issue:

Given how popular clap is, the number of dependencies is definitely a huge concern. I've been planing of tackling a lighter clap in some form or the other post the 3.0 release. That discussion is beyond the scope of this issue.

@kbknapp
Copy link
Member

kbknapp commented May 2, 2020

Just so my thoughts are known, I don't believe we need to remove all dependencies and features for the smallest possible size. What I do think is that we should know which dependencies have a low "weight to functionality" ratio.

"Weight" can assessed in two manners, code size added to clap, and compile time.

"Functionality" should be self evident.

I am definitely not in the camp of "make sure there are as few deps as possible." However, I am in the camp of I don't want to take on a dep that doesn't provide intrinsic value, or is a maintenance burden.

The harder crates to evaluate will be ones that don't add a lot to clap functionally, but at the same time don't add much if any compile time or code size either. I would propose for those style crates we look at either pulling the code internal, or dropping the crate.

I'm most concerned with the --no-default-features case, that case should be as small as possible, with as few high ratio deps as possible. I'm not concerned with the default features, or even --all-features cases, because at that point the consumer as chosen to go a heavier weight route, while there are lighter weight options.

This would be a great first issue, as it's not a lot of code changing, and doesn't require too much in depth knowledge of clap. At first, it will primarily be measuring the default crates, or seeing how widespread their use inside clap is.


Taking a quick look at our default crates I see:

Crate (+ transitive deps) Functionality Notes
bitflags Underpin of AppSettings
unicode-width Heavily used in Help Messages If we made auto generated help optional, this could become an optional feature
textwrap Used for Help Messages same as unicode-width
indexmap (autocfg) used in ArgMatcher/ArgMatches Perhaps we could use vec_map instead
os_str_bytes used in parsing clap fundamentally messes with OsStrs or bytes heavily...so this is probably well worth the dep
vec_map Used throughout Arg and usage parser Used to be an optional feature. Has perf gains over BTreeMap...maybe those gains need to be evaluated again to see if they're worth it

@kbknapp
Copy link
Member

kbknapp commented May 2, 2020

Doing a --no-default-feautres --features std build on a simple:

App::new("foo").get_matches();

Gives us this output via cargo bloat in release mode:

cargo bloat --crates --release
[..]
    Analyzing target/release/fake

 File  .text     Size Crate
 9.3%  56.5% 285.5KiB clap
 4.8%  29.3% 148.1KiB std
 1.9%  11.7%  59.2KiB [Unknown]
 0.1%   0.5%   2.6KiB indexmap
 0.1%   0.3%   1.6KiB textwrap
 0.0%   0.1%     307B fake
 0.0%   0.0%      41B os_str_bytes
16.5% 100.0% 505.5KiB .text section size, the file size is 3.0MiB

Note: numbers above are a result of guesswork. They are not 100% correct and never will be.

Discounting clap and std, it looks like textwrap and indexmap need a close eye. If either has better alternatives, or is simple enough to not use/pull internal I'd say those may be on the chopping block.

Also, if we did something drastic like make auto help messages optional, that alone would cut a huge portion of the code along with size. Additionally and oddly enough, that may the easiest thing to modularize first.

"But wait!" I can hear you say. "Auto help generation is like THE thing clap is used for, right?" Sure. It's one of the things, but I'd argue not the most important at all. In many of the contexts where code size matters the most, help generation is one the things they don't care about because it's super easy to just write a help string manually for a small CLI.

@CreepySkeleton
Copy link
Contributor

Before we more any further, would you please test it in release mode 😄? Nobody cares about debug binaries, release the ones of concern here.

@kbknapp
Copy link
Member

kbknapp commented May 2, 2020

Literally just edited the comment with that info 😆

@pksunkara
Copy link
Member

Did you enable lto? I think we should assume with that.

@CreepySkeleton
Copy link
Contributor

main.rs

use clap::*;

fn main() {
    App::new("app").get_matches();
}

With lto enabled (release build, no default features)

CreepySkeleton@builder:~/workspace/probe$ cargo bloat --crates --release
    [...]

 File  .text     Size Crate
19.4%  60.3% 277.7KiB clap
 9.0%  27.9% 128.5KiB std
 3.0%   9.2%  42.3KiB [Unknown]
 0.3%   1.0%   4.7KiB probe
 0.2%   0.6%   2.5KiB indexmap
 0.0%   0.1%     394B textwrap
32.2% 100.0% 460.3KiB .text section size, the file size is 1.4MiB

What interesting here is that the size of the main.rs crate increased from 307B to 4.7KiB. Basically, some code from std has been inlined and this is the biggest difference.

The overall difference is less than 10Kb (clap and its deps only) and it will thin as more features are actually used. All the further runs will be performed without LTO.

See also https://stackoverflow.com/a/52297790. TL;DR: LTO is not about reducing code size. z opt level is about it.

Another interesting read: https://internals.rust-lang.org/t/rust-staticlibs-and-optimizing-for-size/5746. No TL;DR here, go read, it's worthy.


Kevin, just to make sure we're on the same page: I wasn't counting you as a no-deps guy, I was just thinking aloud "how would we sell clap to those guys" and I'm pretty sure the answer is - we wouldn't.

If their wariness comes from concern that the deps affect binary size, we can righteously tell them that all the deps put together (in --no-default-features mode) take less than 5KiB, the next rustc release will likely change much more of that. The same can be said about compilation time: ~2sec difference, essentially a rounding error.

At the very least, if 5KiB do mater to you, you're working on embedded, and clap will likely never suit you. Take a look at getopts/argh or write it yourself.

I agree regarding help messages. It also drags error messages along.

What not clear to me is your logic about "either pulling the code internal, or dropping the crate". It sounds like "Let's copy-paste code from third party crates (we can't just drop them) in order to decrease our maintenance burden". But from that point on, we will be the ones who maintain this copy-pasted code. A contradiction. Could you please elaborate on that?

@pksunkara
Copy link
Member

At the very least, if 5KiB do mater to you, you're working on embedded, and clap will likely never suit you. Take a look at getopts/argh or write it yourself.

I don't think we should be saying that. We should be using cargo feature flags to make sure all the fields can use clap.

@TeXitoi
Copy link
Contributor

TeXitoi commented May 2, 2020

To make it work on embedded, you'll need do be alloc free (usable wIthout, String, Vec and such kind of collection).

@Dylan-DPC-zz
Copy link

There's a post 3.0 plan for that already which is why added the no_std feature

@CreepySkeleton
Copy link
Contributor

CreepySkeleton commented May 2, 2020

To clarify my statement about embedded:

  • On something like Rabspery PI clap will fly marvelously; you won't even need no-std there. It's close to a real PC after all.
  • If you have a chip with megabytes of ROM on board, you may be able to squeeze clap in. no_std, alloc, and accurate fine tuning (see modularization) will be required.
  • 128 KiB is all you've got? Too bad! Take a look at getopts (and you probably won't be able to afford even that).

To summarize all of that, you either have an extra hundred of KiBs at your disposal (and this is why I think we shouldn't optimize every single KiB) or clap is not for you.

And a bit of personal opinion: as someone with a (very limited) experience with embedded, I can say that allocations are almost tabu in there. Given that clap relies on Vec and stuff very heavily... I won't even try it if I need a CLI parser for embed. I'll go grab getopts.

@TeXitoi
Copy link
Contributor

TeXitoi commented May 2, 2020

Even getopts requires Vec and String. I personally don't see why I would need clap in embedded, that's a very niche usecase.

@TeXitoi
Copy link
Contributor

TeXitoi commented May 2, 2020

(embedded = bare metal on microcontroller)

@kbknapp
Copy link
Member

kbknapp commented May 2, 2020

A lot to reply to here :) I don't think we need to be concerned with embedded (bare metal) right now. Maybe one day, but there are many things we can do that will improve clap before we worry about embedded.

I'm more concerned with embedded like, i.e. resource constrained devices, like small arm based (Pi, SBC, phone, etc.). I spoke with some Google employees who were looking at clap for that exact use case (presumably Fuschia), and binary size was their concern. Binary size isn't everyone's concern. To be clear, I'm not trying to supplant argh, but I do think we should be cognizant of that use case while building clap.

We're in a little bit of a catch-22 scenario. Many people assume argument parsing is such simple task that the deps should be ultra small and no transitive deps. That's understandable why one would think that. However, as all of us know, it turns out argument parsing is somewhat of a hidden beast. There are edge cases EVERYWHERE, and a near endless amount of features to implement.

It turns out argument parsing is complex.

Having said that, the issue we're discussing conflates a few concerns that we should probably be more explicit about:

  • Binary size of clap
  • Compile times of clap
  • Adoption and Perception of clap

Both binary size and compile times are affected by:

  • Deps
  • clap features and code structure
  • Rust and LLVM compile time features (LTO, dead code elimination, monomorphization, etc.)

While adoption and perception can be heavily influenced by binary size, compile times, and the dep graph as it's own construct.

So long as we're doing our due diligence when it comes to crates we take on as dependencies to ensure they're worth the price we pay in terms of binary size and compile times, we're good. Alternatively, if they negatively affect those areas, we should do our best to ensure they're optional.

Meanwhile, we should be thinking about the dep graph as it's own construct because right, wrong, or indifferent people do care about how many crates they pull in. Unfortunately, everyone's reason for caring can be different, for some it's binary size, for some it's compile times, for others it's the perception of risk (what happens if a crates has a security flaw, or goes away over night?).

We're addressing the first two concerns already, but the third (perception of risk) is primarily addressed by either using ultra popular, well supported crates, or crates related to/owned by the parent (which is usually signalled by the name, i.e. clap_* for us).


"how would we sell clap to those guys" and I'm pretty sure the answer is - we wouldn't.

You're correct in that part of the answer is, "we don't." But I think we mean different things by this. I mean we "We don't try to sell them on clap, we just do the best we can to care about that scenario."

We can do things like making clap modular enough, or structured in a way to turn off features the end user doesn't care about. This has an added benefit of potentially being able to make most of our deps optional.

And where they can't be option, oh well. At least we're trying. clap isn't going to fit every situation and that's fine. But we should at least do what we can to be an option for even those resource constrained situations.


If their wariness comes from concern that the deps affect binary size, we can righteously tell them that all the deps put together (in --no-default-features mode) take less than 5KiB, the next rustc release will likely change much more of that. The same can be said about compilation time: ~2sec difference, essentially a rounding error.

I totally agree with you. However, the problem is many times we don't get the chance to explain ourselves. They see the dep graph front-and-center and many times simply make assumptions from there. That's the part about the "no dep" crowd I'm not a fan of. I can't count the number of reddit comments, or blog posts I've seen that say, "Can you believe [program X] pulled in 170 crates?! That's absurd!!!" with no backing evidence as to why that's absurd. Were those 170 crates not required? How much compile time did they add, etc.? If the compile times were the EXACT same, but [program X] just copy/pasted the code from those crates, you probably wouldn't hear a peep. In fact, you'd probably hear how amazing it was that [program X] has no dependencies!

...I'm not a fan of that. But it's the world we live in.


What not clear to me is your logic about "either pulling the code internal, or dropping the crate". It sounds like "Let's copy-paste code from third party crates (we can't just drop them) in order to decrease our maintenance burden"

I should have been more clear 😜 Decreasing our maint burden and decreasing our dep graph are separate concerns that get intertwined. Also taking the code internal is only in the case where we're using a dep for a small well defined subset of what the crate offers, and we aren't concerned with maint burden part, just the raw number of deps part. This should only be a last resort for us.

This is one of those things that's counter intuitive; copying and pasting a subset of a crate into clap directly, (or even into a new sub crate like clap_foo may actually not make any difference in compile times or binary size. But it will still make many people "feel" better about using clap.

Is it dumb. Yep. But again, this should be a last resort for us and not something I'm actually looking at doing seriously. The only crate that comes to mind for this type of thing is term_size which is technically already a clap owned crate, but again the name doesn't signal that so when people compile clap they just assume it's some rando crate we pulled in as a dep. Am I suggesting re-naming term_size to clap_foo? Nope. Not unless we were in drastic territory and have exhausted all other options. We're nowhere near that! 😄

@pksunkara
Copy link
Member

There are some good arguments regarding inclusion of deps in this reddit thread today, https://www.reddit.com/r/rust/comments/gi7v2v/is_it_wrong_of_me_to_think_that_rust_crates_have/

@BurntSushi
Copy link
Contributor

@kbknapp invited me to participate in this issue, but I have to say, from reading the comments, it seems like there's an undercurrent of counter-productive "us vs them." If I'm coming from (your perspective) the "anti-dependency" camp and you already believe that you can't sell clap to me

Kevin, just to make sure we're on the same page: I wasn't counting you as a no-deps guy, I was just thinking aloud "how would we sell clap to those guys" and I'm pretty sure the answer is - we wouldn't.

then is it really worth my time to participate here if I'm treated as a lost cause in the first place?

reddit and low effort comments are a pit, and people love to make sweeping generalizations that makes the world look more extreme than it is. In reality, I think most people exercise good judgment and try to straddle this line as best as they can. Sometimes I'm on the "anti-dependency" side of the fence and take steps to reduce my dependency tree. But other time, I'm on the "yeah dependencies are great" side and try to resist the urge to remove dependencies.

I think @kbknapp has the right attitude here. The struggle is the important part. Doing due diligence on dependencies before bringing them in, and really weighing their costs and benefits, is what's important. And to me, that's what I see myself doing when I notice that a new dependency is about to come into my tree. What I see is a prior state in which I could opt out, but now I can't, and I don't know why. The code that was deleted looked pretty small, its commit history suggests it was very lightly edited and its performance was more than good enough for me. Moreover, when I first dropped the vec-map dependency from my use of Clap, I noticed some really crazy compile time improvements. I suspect there was a bug somewhere or some weird case, but still, the flexibility to drop dependencies was really nice.

I'm not familiar with Clap internals, and as someone who has written more than one argument parsing library, I am definitely someone who appreciates their complexity. I never take a good argument parser for granted. They are super hard to build because they need to balance so many competing concerns. IMO, Clap does a great job of this already.

So with that said, I'll give some opinions on Clap 3's required dependencies:

  • bitflags - I would personally just hand write any bit flags logic that I needed. In my view, this is a crate that optimizes for writing code, which I generally think isn't the right thing to be doing in widely used fundamental crates. So there's a bit of a philosophical opinion involved here.
  • textwrap - I would make this optional but enabled by default. I write all of my help strings and documentation wrapped to 79 columns anyway, which is probably good enough.
  • unicode-width - I would make this optional but enabled by default. If my help text is all ASCII, then an assumption that each byte is one column is reasonable.
  • indexmap - I don't have enough context to know what to do here, but it looks like it was recently added. I assume this is justified somehow through a major performance win, but otherwise, I would drop this.
  • os_str_bytes - This is definitely better than using unsafe code. If I knew more about why Clap needed this, then perhaps there is a way to get rid of this too. But yeah, if Clap needs to be mucking around with OsStrs, then the lack of an OsStr string API probably makes this really hard to the point where you pretty much have to do what os_str_bytes is doing. (I still wish std would just expose its internal WTF-8 representation, but there are good arguments against doing that.)
  • vec_map - I don't know the performance story here, but keeping it optional looked easy? But maybe there are some issues I don't know about.

@kbknapp
Copy link
Member

kbknapp commented May 12, 2020

Thank you for the detailed response! Before I can finish reading the comment and add any thoughts, let me address something real quick as a moderation note:

from reading the comments, it seems like there's an undercurrent of counter-productive "us vs them." If I'm coming from (your perspective) the "anti-dependency" camp and you already believe that you can't sell clap to me

I want to be very clear with everyone reading these messages (especially including the clap team) that this is not what I want to portray or the tone I want to encourage. Internet communications via text are hard, especially when we start considering native languages and such. (Edit: to be even more clear. I don't think @BurntSushi is portraying this tone. He's reading and noticing this tone from previous comments we made. And I should say thank you for pointing this out! 😄 )

I just want to be clear that the clap team is on the same page of, there is no "us vs them" and all opinions are welcome! We have differences of priorities and how to achieve those priorities, but all opinions are valid within this discussion. Ultimately the clap team will have to make a decision on which priority and implementation strategy to go with, but through discussions and issues like these my goal is to take all opinions in, consider them carefully, and make a decision that is best for clap and all consumers (current and future).


...now back to reading 😄

@BurntSushi
Copy link
Contributor

BurntSushi commented May 12, 2020

Also, one other note: IIRC, cargo bloat has an option to list binary size by function. That might also be useful to look at for decreasing binary size. Sometimes you can get big wins by tweaking the inlining strategy used, for example. Or using trait objects is sometimes possible, although I've found it difficult to do in practice in many cases because of object safety rules.

@pksunkara
Copy link
Member

pksunkara commented May 12, 2020

I just want to be clear that the clap team is on the same page of, there is no "us vs them" and all opinions are welcome!

I strongly agree with this.

Unfortunately, I only started using Rust this year (maybe Dec last year), so I missed all the deps related discussions and am still trying to catch up on understanding the pros and cons related to rust ecosystem. That is why I was asking you the recent question on deps in termcolor @BurntSushi.

In Javascript before, while I do try to reduce our usage of deps, I still try to prefer commonly used packages and using them as deps instead of writing all of them myself which was the convention all across the ecosystem.

If we take the case of vec_map, my question is why do we want to make it optional? Shouldn't it be either including it completely or removing it completely? I think this is basic question we want to answer first.

@kbknapp
Copy link
Member

kbknapp commented May 12, 2020

If we take the case of vec_map, my question is why do we want to make it optional? Shouldn't it be either including it completely or removing it completely? I think this is basic question we want to answer first.

My understanding is there are several concerns that can also overlap with each other, and not everyone has the same concerns. As mentioned above, some of those include:

  • Binary Size
  • Compile Time
  • Transitive Deps (which repeats points 1-2 for each dep)

Adding a dep can have negative impacts on points 1 and 2, or both. Some people care about point 1, some people about point 2, and some about both. So if we consider either point a deps "cost" we have to weigh that cost against it's benefits (usually performance, or lower maintenance burden for us directly).

This is what I was trying to say earlier, we have to weigh any cost (including the subjective ones) against the objective benefit.

The reason for making it optional and not a binary include or not include is because people have various requirements. Some may not care about the binary size, or increased compile time if the feature provided is worth the cost, but since some do care we need a way to allow our consumers the same choices we're making at the library level.

For vec_map which is a self contained crate and no transitive deps, I would want to look at exactly what perf gains we get, compared with binary size/compile times. Add to that, since we pulled in IndexMap, perhaps we can use that instead and remove one or the other entirely? I haven't looked lately, so that may not be feasible.


To address the comments per dep from @BurntSushi :

  • bitflags - I hadn't really considered just pulling in the logic of bitflags to be honest. ...but now that you say it I'm sure it's actually simple subset since we don't particularly care about the macros, etc. just the functionality. Would probably worth a look at least.
  • textwrap - the hard part with this one is this dep helps us wrap at word boundaries intelligently. But perhaps making it optional for those that will manually wrap their help strings anyways.
  • unicode-width - I also like the idea of providing a, "I only use ASCII args/help strings (values excluded), so turn off all extra deps related to handling Unicode logic" flag
  • indexmap - Might be able to combine with vec_map, or relook at our exact use case. I need to familiarize myself with the code again.
  • os_str_bytes - This is probably staying for the time being. Clap is pretty much written with idea we use OsStr(ing) internally everywhere, and only assume/use UTF-8 at the consumer demarcation point. I've toyed with the idea of bstr since all we really care about is bytes, but not done enough research to see what overlap, if any, bstr and os_str_bytes have or which would be the best fit.
  • vec_map - Addressed in other comments. I just want to make sure we're getting a good cost:benefit ratio otherwise we may drop it entirely.

@mgeisler
Copy link
Contributor

mgeisler commented Feb 6, 2022

If these are raw binary sizes then I'm assuming they include debug information which explains why they are so large. Running size on the binaries will show the size of the different sections, including .text and the various debug sections.

Nope, it's the size of the binary compiled with --release and after stripping. Furthermore, I use a Cargo profile which I believe is optimized for letting the compiler inline as much as possible:

[profile.release]
lto = true
codegen-units = 1

This is from https://github.com/johnthagen/min-sized-rust. I did not use opt-level = "z" since I wanted to measure the size of a typical program optimized for speed instead of size (when testing with opt-level = "z" I actually didn't see any improvements in the delta).

That said, I think this is further showing the importance of a modularization effort. The cost for wrapping is being paid by everyone but is only needed for automatic, runtime help generation.

Yeah, though for me, having a --help that looks good on any terminal is an important part of a program. Unless you target an embedded platform with strict size limitations, I think a well-functioning command line program should be able to handle --help nicely.

As an example, cargo --help looks weird on my current terminal. This is the output (slightly abbreviated):

% cargo --help
Rust's package manager

USAGE:
    cargo [+toolchain] [OPTIONS] [SUBCOMMAND]

OPTIONS:
    -V, --version               Print version info and exit
        --list                  List installed commands
        --explain <CODE>        Run `rustc --explain CODE`
    -v, --verbose               Use verbose output (-vv very verbose/build.rs ou
tput)
    -q, --quiet                 Do not print cargo log messages
        --config <KEY=VALUE>    Override a configuration value (unstable)
    -Z <FLAG>                   Unstable (nightly-only) flags to Cargo, see 'car
go -Z help' for
                                details
    -h, --help                  Print help information
[...]

My best guess is that it's outputting a string which is hard-wrapped in the source code.

However, I might care more about well-wrapped text than most since I've been playing with textwrap for ~5 years now 😄

@epage
Copy link
Member

epage commented Apr 29, 2022

btw we recently dropped the dependency on memchr in 83f1b16

@epage epage modified the milestones: 3.x, 4.x Jun 8, 2022
epage added a commit to epage/clap that referenced this issue Aug 17, 2022
With being able to accept owned types now, `clap_derive` no longer needs
`once_cell` to make dynamic data static.

This helps towards clap-rs#1365, clap-rs#2037
epage added a commit to epage/clap that referenced this issue Aug 19, 2022
With being able to accept owned types now, `clap_derive` no longer needs
`once_cell` to make dynamic data static.

This helps towards clap-rs#1365, clap-rs#2037
Calder-Ty pushed a commit to Calder-Ty/clap that referenced this issue Aug 24, 2022
With being able to accept owned types now, `clap_derive` no longer needs
`once_cell` to make dynamic data static.

This helps towards clap-rs#1365, clap-rs#2037
epage added a commit to epage/clap that referenced this issue Aug 24, 2022
For now, there isn't much a custom implementation can do.

Going from `Rich` to `Null` drops about 6 KiB from the binary

This is a part of clap-rs#1365 and clap-rs#1384
epage added a commit to epage/clap that referenced this issue Aug 24, 2022
For now, there isn't much a custom implementation can do.

Going from `Rich` to `Null` drops about 6 KiB from the binary

This is a part of clap-rs#1365 and clap-rs#1384
epage added a commit to epage/clap that referenced this issue Aug 25, 2022
The immediate benefit is binary size but this also makes us more
flexible on the implementation, like allowing wrapping of `StyledStr`.

This removed 12 KiB from `.text`

This helps towards clap-rs#1365 and probably clap-rs#2037
@epage epage pinned this issue Sep 15, 2022
yujincheng08 added a commit to LSPosed/Metagisk that referenced this issue Jun 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-meta Area: administrative question or tracking issue C-enhancement Category: Raise on the bar on expectations S-waiting-on-mentor Status: Needs elaboration on the details before doing a 'Call for participation'
Projects
None yet
Development

No branches or pull requests