Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify the takes_value API (range-based takes_values) #2688

Closed
epage opened this issue Aug 13, 2021 Discussed in #2676 · 16 comments
Closed

Simplify the takes_value API (range-based takes_values) #2688

epage opened this issue Aug 13, 2021 Discussed in #2676 · 16 comments
Assignees
Labels
A-builder Area: Builder API C-enhancement Category: Raise on the bar on expectations E-hard Call for participation: Experience needed to fix: Hard / a lot M-breaking-change Meta: Implementing or merging this will introduce a breaking change.
Milestone

Comments

@epage
Copy link
Member

epage commented Aug 13, 2021

Discussed in #2676

Originally posted by epage August 10, 2021
Currently, we have

  • takes_value (accept a value per occurrence)
  • multiple_values (accept multiple values per occurrence)
  • min_values (minimum number of values across all occurrences)
  • max_values (maximum number of values across all occurrences)
  • number_of_values (average number of values per occurrence)

We then have to clarify how these are related to each other, for example:

NOTE: This does not implicitly set Arg::multiple(true). This is because -o val -o val is multiple occurrences but a single value and -o val1 val2 is a single occurrence with multiple values. For positional arguments this does set Arg::multiple(true) because there is no way to determine the difference between multiple occurrences and multiple values.

(taken from docs.rs, since updated to say multiple_occurrences).

I had already been brainstorming on this idea when kbknapp mentioned

I think the crux of the problem is that we previously didn’t do a good job of articulating the difference between multiple_values, multiple_occurrences, and max_values_per_occurrence.

I think we can simplify this problem down to

  • takes_values(num: impl clap::IntoRange) with
    • impl IntoRange for usize
    • impl IntoRange for ...each range type
    • The range is for values per occurrence
  • Remove all other previously mentioned functions as either
    • Redundant
    • Easily confused with takes_values(range) but works differently and the value of native support is relatively low

So you can do take_values(0), take_values(2), take_values(1..), take_values(0..=1)

  • Really, the motivation is to support all the various range types. Taking in a number is just a convenience while we are at it. I considered taking in a bool, but I think that is making it too magical and isn't as needed.
  • This was named based on Remove magic arg type in `Args` #2674 going through and takes_values(1) being the default.

While I was already talking with kbknapp, I brought this up and is thoughts were:

That is an interesting idea! It feels like it might incur some cognitive load to understand, although once you see it it makes total sense. I'd love to find a way to play with this idea without throwing it front and center and in the API all of a sudden.

Notes

For the question from kbknapp

For example, which makes more sense for a positional argument; multiple_values or multiple_occurrences? I think you and I would say multiple_values, but I can totally see how someone may try multiple_occurrences. This was papered over by having just multiple previously and things just working. However, we’ve seen that strategy didn’t work in the long run as it created more ambiguity than it solved.

I would say occurrences, actually. This would leave room for the two features to compose, like multiple_occurrences(true).takes_values(2). Not a common pattern but I think it speaks to an API when the features compose together in a natural way which this allows.

RE Delimiters

I would posit that common cases for multiple values are

  • 0..=1
  • 2
  • 0.. or 1.. with a delimiter

We should raise awareness of require_delimiter in the docs for take_values.

Benefits

  • Clarifies our messaging on these settings plus occurrences since the focus is on fixed number of values or delimiters
  • Reduces ambiguity around how the flags interact with occurrences and values
  • Clear, concise way of specifying all related parameters rather twiddling with each one individual and there being confusion on how they interact
  • Uses higher level Rust types rather than a bool and numbers just hanging around (which ties into the Rust API guidelines)
  • I suspect this will also help us towards a more maintainable value/occurrence/inferring behavior.
@pksunkara pksunkara added this to the 4.0 milestone Aug 14, 2021
@epage epage self-assigned this Oct 19, 2021
@epage epage added the M-breaking-change Meta: Implementing or merging this will introduce a breaking change. label Oct 19, 2021
@epage epage added A-builder Area: Builder API A-derive Area: #[derive]` macro API C-enhancement Category: Raise on the bar on expectations E-hard Call for participation: Experience needed to fix: Hard / a lot and removed C: args A-derive Area: #[derive]` macro API labels Dec 8, 2021
@ldm0 ldm0 self-assigned this May 2, 2022
@ldm0
Copy link
Member

ldm0 commented May 4, 2022

I think this could be a non-breaking change by not changing existing api and behaviour.

Wanna add two methods: takes_values, takes_values_total

  • Without multiple occurrences:

takes_values(X) == takes_values_total(X)
takes_values(1..) == .min_values(1)
takes_values(..4) == .max_values(3)
takes_values(1..4) == .min_values(1).max_values(3)
takes_values(1..=3) == .min_values(1).max_values(3)
takes_values(2) == number of values strictly equals to 2

  • with multiple occurrences:

takes_values_total(2..) == total number of values doesn't smaller than 2
takes_values_total(..4) == total number of values doesn't exceed 3
takes_values_total(..=3) == total number of values doesn't exceed 3
takes_values_total(4) == total number of values equals to 4
takes_values(X) == takes_values_total(X) but for each occurrence

min_values max_values's internal representation can be easily migrated(to share the same internal representation with takes_values_total).

But number_of_values cannot be migrated due to it's strange internal representation.

Currently number_of_values's behaviour with multiple occurrences on is pretty strange: when number_of_values(2), clap accepts --flag 1 --flag 2 3 4, since 4 % 2 == 0.

The reason is that before the refactor for implementing grouped_values_of, there is no way to check number of values in each value group. After takes_values is introduced, we can use takes_values(2) to accept --flag 1 2 --flag 3 4 and reject --flag 1 --flag 2 3 4, which looks much better.

@ldm0
Copy link
Member

ldm0 commented May 4, 2022

After these apis are introduced, deriving Vec<[T; N]> and [[T; N1]; N2] could be implemented.

In clap 4, I think number_of_values should be removed due to it's strange behaviour.

@epage
Copy link
Member Author

epage commented May 5, 2022

There are some points that I think need some clarification

  • Your listing says "Without multiple occurrences", do you mean "Independent of multiple occurrences"?
  • I appreciate showing what the new API maps to in the old API but I think it'd also help if you were explicit on what the intent is
  • Could you explain what is wrong with number_of_values that you feel the new API is needed and why number_of_values should be deprecated?
  • You mention deprecating number_of_values, would you also recommend deprecating min_values and max_values?
  • How much of a use case is there for takes_values_total? I'm used to limiting the number of entries for tuple-like arguments where its tied to the occurrence. Should we instead shift the emphasis to user-side validation or more generic validation mechanisms?
    • As part of our effort to reduce complexity, binary size, and build times, we are wanting to emphasize creating building blocks for users and make things easy in the common cases rather than in every case.

@ldm0
Copy link
Member

ldm0 commented May 5, 2022

Your listing says "Without multiple occurrences", do you mean "Independent of multiple occurrences"?

without multiple occurrences == arg.multiple_occurrence(false)

Could you explain what is wrong with number_of_values that you feel the new API is needed and why number_of_values should be deprecated?

check sentences after "But number_of_values cannot be migrated due to it's strange internal representation."

You mention deprecating number_of_values, would you also recommend deprecating min_values and max_values?

Yes, they can be deprecated, but it's not a must. Since they becomes shortcuts of takes_values_total after #3678

How much of a use case is there for takes_values_total?

Definitly needed. This has little binary size & build times influence and it's makes the logic completes. min_values and max_values are just shortcuts of them.

@epage

@epage
Copy link
Member Author

epage commented May 5, 2022

There is a lot of being elided from the descriptions being given. Instead of copying your past notes, could you re-write this addressing the following:

  • What is the current behavior of min_values and max_values and what problems does it have?
  • What is the current behavior of number_of_values and what problems does it have?
  • What is the proposed behavior of the new functions?
  • How are the new functions related to the old functions?
  • How do the new functions solve problems with existing functions?
  • How do the new functions solve any other use cases, like the derive case?
  • Why should we not solve the problem with other existing or planned features?
    • We are not just targeting binary size and compile times but providing an API surface where people can find the features they need.

I just want to re-emphasize that communicating our thoughts and recording them down is important in a collaborative open source project. We need to get each other on the same page and help people in the future understand the motivation for why things were done.

@ldm0
Copy link
Member

ldm0 commented May 6, 2022

  1. min_values(N) == takes_value_total(N..) max_values(N) == takes_value_total(1..=N), and the only problem is that their functionaliry are duplicated. They can be removed, but I have no bias.
  2. Currently number_of_values's behaviour with multiple occurrences on is pretty strange: when number_of_values(2), clap accepts --flag 1 --flag 2 3 4, since 4 % 2 == 0. I dislike it.
  3. I think this is clear:
  • without multiple occurrences:
    takes_values(X) == takes_values_total(X)
    takes_values(1..) == .min_values(1)
    takes_values(..4) == .max_values(3)
    takes_values(1..4) == .min_values(1).max_values(3)
    takes_values(1..=3) == .min_values(1).max_values(3)
    takes_values(2) == number of values strictly equals to 2
  • with multiple occurrences:
    takes_values_total(2..) == total number of values doesn't smaller than 2
    takes_values_total(..4) == total number of values doesn't exceed 3
    takes_values_total(..=3) == total number of values doesn't exceed 3
    takes_values_total(4) == total number of values equals to 4
    takes_values(X) == takes_values_total(X) but for each occurrence
  1. Stated above。
  2. Currently there is no way to accept --flag 1 2 --flag 3 4 and reject --flag 1 --flag 2 3 4, the new api makes it possible. And also, the new api is not for problem solving, it's for simplicity.
  3. It makes deriving Vec<[T; N]> and [[T; N1]; N2] possible.
  4. The api is simple enough and expressive enough to be added. And people can easily understand them(compared to number_of_values)

@epage
Copy link
Member Author

epage commented May 6, 2022

Please take the time to write up a clear description rather than repeating what has previously been said. If it feels like it'll take a long time to do so, consider how long having unclear explanations has already taken.

Think in terms of a random clap user wanting to look at this proposal to understand what is happening. How well can they understand it?

Currently number_of_values's behaviour with multiple occurrences on is pretty strange: when number_of_values(2), clap accepts --flag 1 --flag 2 3 4, since 4 % 2 == 0. I dislike it.

This is repeating what was said before. This is an example which help to clarify descriptions but aren't sufficient to describe behavior on their own as it leaves the user to infer what is happening which leaves room for misunderstanding

I think this is clear:

The fact that this is now the third time I'm asking for a re-explanation I think demonstrates that this is not an effective way of communicating things

  • Its describing behavior conditioned on what multiple_occurrences is being set to but doesn't describe how the functions behavior with the opposite setting
  • Its describing the API in terms of other parts of the API when I'm looking for a description of intent where it stands alone so there is no ambiguity.
  • If takes_values(1..) == .min_values(1) and min_values(N) == takes_value_total(N..), then doesn't that mean that takes_values(1..) == takes_value_total(1..)?

It makes deriving Vec<[T; N]> and [[T; N1]; N2] possible.

I can't see how the proposed helps with this because as described, it doesn't seem to handle this.

  • See my earlier comment "If takes_values(1..) == .min_values(1) and min_values(N) == takes_value_total(N..), then doesn't that mean that takes_values(1..) == takes_value_total(1..)?"

@epage
Copy link
Member Author

epage commented May 6, 2022

How do these new functions interact with multiple_values and takes_value?

Working on some other Issues reminded me that its not just these flags but how some of them compose together

epage pushed a commit to epage/clap that referenced this issue May 6, 2022
This is split out of clap-rs#2688 as several changes I'm working on need it.
@tmccombs
Copy link
Contributor

Are the issues with the current options resolved by using an action of ArgAction::Append?

For example, if I had:

Arg::new("test")
    .short('t')
    .action(ArgAction::Append)
    .min_values(2)
    .max_values(3)

would that mean that --t 1 2 -t 3 4 -t 3 5 6' is valid, and result in a value of vec![vec!["1", "2"], vec!["3", "4"], vec!["3", "5", "5"]]and -t 1 -t 2 3` is invalid?

The documentation isn't very clear on this.

@tmccombs
Copy link
Contributor

tmccombs commented Jun 15, 2022

Looking through the source code, it looks like it probably doesn't. So I'll try and answer the questions @epage listed in a way that doesn't depend on the new API proposed.

  • What is the current behavior of min_values and max_values and what problems does it have?

min_values and max_values set limits on the total number of values, rather than on the number of values in a single occurrence of the option. So if an option is supplied multiple times, these limits apply to the sum of the number of values in all such occurences. Clap doesn't currently have a way to express limits on the number of values supplied to a single occurrence when multiple_occurences is true, or the Append action is used. This is problematic in a number of scenarios such as:

  • If an option can be supplied than once, and each occurence must have at least one (and possibly more) value supplied. Should generally be used with a delimiter, terminator, etc.
  • If an option can be optionally supplied with a value for each occurence
  • If an option can have at most N values (for some value of n) per occurence

See also #3542

  • What is the current behavior of number_of_values and what problems does it have?

Currently, if number_of_values is supplied, and multiple_occurences is true, or the Append action is used, then rather than requiring each occurrence to have that number of values, it requires that the total number of values is a multiple of that number.

For example, if you supply a number_of_values of 2 for an option that can occur multiple times, then -t 1 -t 3 4, -t 1 2 3 4, -t -t 1 -t 2 3 4, and -t 1 -t 2 -t 3 -t 4 are all considered valid, which is almost certainly not what you want.

I suspect this behavior wasn't intentional, and could probably be considered a bug. However, even if it is fixed, it is somewhat confusing that number_of_values is per-occurrence, while min_values and max_values are total.

  • What is the proposed behavior of the new functions?

From what I understand:

takes_values_total takes a possibly open range that specifies the minium and maximum number of values combined across all occurrences. If the number of values in total is outside of the specified range, the arguments are invalid. This is equivalent to a combination of the currentmin_values and max_values.

takes_values specifies a (possibly open) range of the number of values per occurrence. that is each occurrence of the option must have a number of values that lies within the range. If the number of values given to any one occurrence of the option falls outside the given range, it should be considered invalid.

  • How are the new functions related to the old functions?

I'm not sure there is much to add here. The old functions could be deprecated and implemented in terms of the new ones. And removed in a backwards incompatible release. Or possibly they are considered valuable enough to keep. It's also possible the in a backwards incompatible release, the behavior of these functions is changed to be per-occurrence instead of total. Alternatively, new min_per_occurence, max_per_occurence, and number_of_values_per_occurence (bikeshed the names) could be added.

  • How do the new functions solve problems with existing functions?

takes_values_total does not solve the problems mentioned above, but might be a slightly nicer API.

takes_values provides an api to constrain the number of arguments on a per-occurrence basis when multiple occurrences of the same option are allowed.

  • How do the new functions solve any other use cases, like the derive case?

I'm not sure.

  • Why should we not solve the problem with other existing or planned features?

I don't really care how the problem is solved, as long as there is a way to have constraints on the number of values supplied per-occurrence. I mentioned some other possibilities above. Perhaps another way to do it would be to have a different Action variant, which would switch into a mode where min_values, max_values, and number_of_values applies to each occurrence rather than the total (or maybe change it so the existing Append action behaves that way in v4?)

I would like to mention that the current behavior of min_values bit me, and it took me quite a while to figure out what was going on. At the very least the documentation for min_values and max_values should be updated to explain that it is the total number of values, not the values for a single occurrence.

@epage
Copy link
Member Author

epage commented Jun 15, 2022

I don't really care how the problem is solved, as long as there is a way to have constraints on the number of values supplied per-occurrence

I've updated my original proposal to clarify that takes_values(range) is intended to be per occurrence.

And while I appreciate the thorough investigation into ldm0's work, I think the more important aspect is to understand their intent and working through that to come to agreement before they move forward with their implementation.

@mzagrabe
Copy link

mzagrabe commented Jul 7, 2022

@epage What do you think of having a min_occurrences() function in similar fashion to min_values()? This could be used for ArgAction::Append options.

@epage
Copy link
Member Author

epage commented Jul 11, 2022

@mzagrabe what is the motivation / use-case for having a baked in min_occurrences? How does that compare to just using either required or #3008?

epage added a commit to epage/clap that referenced this issue Jul 25, 2022
For num_vals and friends, this only implements hacks until clap-rs#2688

Fixes clap-rs#3021
epage added a commit that referenced this issue Jul 28, 2022
fix: Misc clean up in prep for #2688
@epage
Copy link
Member Author

epage commented Aug 3, 2022

In master (v4) I've started work on the original proposal in this issue.

Current status:

  • Shift focus from takes_value / multiple_values to action in docs
  • Modify number_of_values to take a range
  • Change number_of_values to be per occurrence
  • Change number_of_values to be num_args (ie how many argv items are consumed)
  • Update arg! and derive to use number_of_values for Option<Option<T>>
  • Remove min_values and max_values from the API
  • Remove takes_value and multiple_values from API
  • Resolve how we use takes_value and multiple_values internally
  • Re-evaluate use_value_delimiter and require_value_delimiter with the new num_args behavior
  • Re-evaluate where num_args shows up in rustdoc output
  • Update tutorial to not primarily use arg! to better teach actions, drawing attention away from num_args

Somewhat related

  • Cleaned up rendering of arg values
  • Show ... for ArgAction::Count for args in usage
  • Correctly show optional values in usage for positionals and args

@epage
Copy link
Member Author

epage commented Aug 4, 2022

While not quite done, we've dropped about 2k from .text through this work

This was done by running cargo bloat --release --example cargo-example --features cargo against 11883b6 and #4028

epage added a commit to epage/clap that referenced this issue Aug 4, 2022
TakesValue helps because it provides a way for a lot of settings to say
a value is needed without specifying how many.  Multiple values didn't
have enough call sites to make this worthwhile.

This is a part of clap-rs#2688
@epage
Copy link
Member Author

epage commented Aug 9, 2022

Forgot to call out that #4050 took care of the tutorial

@epage epage closed this as completed Aug 9, 2022
edmondsfastly added a commit to fastly/dnstap-utils that referenced this issue Oct 27, 2022
See the clap CHANGELOG:
https://github.com/clap-rs/clap/blob/v4.0.18/CHANGELOG.md.

`parse(from_os_str)` for a PathBuf just becomes `value_parser`:

  - For `#[clap(parse(from_os_str)]` for `PathBuf`, replace it with
    `#[clap(value_parser)]` (as mentioned earlier this will call
    `value_parser!(PathBuf)` which will auto-select the right `ValueParser`
    automatically).

For `min_values = 2`, this becomes `num_args(2..)`. This is a new
interface that replaces a number of old ones:

    **`Arg::num_args(range)`**

    Clap has had several ways for controlling how many values will be
    captured without always being clear on how they interacted,
    including
    - `Arg::multiple_values(true)`
    - `Arg::number_of_values(4)`
    - `Arg::min_values(2)`
    - `Arg::max_values(20)`
    - `Arg::takes_value(true)`

    These have now all been collapsed into `Arg::num_args` which accepts
    both single values and ranges of values.  `num_args` controls how
    many raw arguments on the command line will be captured as values
    per occurrence and independent of value delimiters.

    See [Issue 2688](clap-rs/clap#2688) for
    more background.

`validator` has been removed, see
https://epage.github.io/blog/2022/06/clap-32-last-call-before-40/. We
were previously using `validator = ...` to set a function to validate
that the value of the DSCP parameter was in the allowed range (0-63).
This no longer requires an entire function and we can just write
`value_parser = clap::value_parser!(u8).range(0..63)`. However, we lose
the ability to detect Windows usage (where we don't support setting the
DSCP value) at argument parsing time.

For setting the verbosity level based on the number of `-v`'s:

  - For `#[clap(parse(from_occurrences))]` replaced with `#[clap(action
    = ArgAction::Count)]` though the field's type must be `u8` (#3794)
tkmcmaster added a commit to FamilySearch/pewpew that referenced this issue Jan 27, 2023
- Cleaned up the start-at to only parse once
- Fixed the multiple --include. Clap no longer allows multiple occurences, it only allows multiple passed on one occurence. See clap-rs/clap#2688 and https://github.com/clap-rs/clap/blob/master/CHANGELOG.md#400---2022-09-28
- This does introduce a bug that if you specify the config file immediately after --include(s) it will think it's part of the --include. The user must either pass another option after -i or put the config file before the -i
tkmcmaster added a commit to FamilySearch/pewpew that referenced this issue Jan 30, 2023
* Fixed clippy warnings

* Initial update of dependencies

* Updated clap and base64 dependencies

* Fixed multiple includes on try

- Cleaned up the start-at to only parse once
- Fixed the multiple --include. Clap no longer allows multiple occurences, it only allows multiple passed on one occurence. See clap-rs/clap#2688 and https://github.com/clap-rs/clap/blob/master/CHANGELOG.md#400---2022-09-28
- This does introduce a bug that if you specify the config file immediately after --include(s) it will think it's part of the --include. The user must either pass another option after -i or put the config file before the -i

* Updated base64 dependency in config and hdr-histogram-wasm

* Updated additional rust dependencies

* Initial update of Node.js dependencies

* Updated Chart.js dependency

* Updated actions to latest versions
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-builder Area: Builder API C-enhancement Category: Raise on the bar on expectations E-hard Call for participation: Experience needed to fix: Hard / a lot M-breaking-change Meta: Implementing or merging this will introduce a breaking change.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants