Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

0.2 Syntax regarding .at() and .get() got a little less usable #338

Closed
winks opened this issue Feb 12, 2017 · 6 comments · Fixed by #343
Closed

0.2 Syntax regarding .at() and .get() got a little less usable #338

winks opened this issue Feb 12, 2017 · 6 comments · Fixed by #343
Labels

Comments

@winks
Copy link

winks commented Feb 12, 2017

Hi,
not a huge problem but I just stumbled over this while bumping "regex = 0.1" to "0.2:

I was using a lot of this (IRC parsing code):

    let caps = re_priv.captures(line).unwrap();
    let sender = caps.at(1).unwrap()
    let msg = caps.at(2).unwrap_or("");

Now it looks like this (thanks #rust for the help)

    let caps = re_priv.captures(line).unwrap();
    let sender = caps.get(1).unwrap().as_str();
    let msg = caps.get(2).as_ref().map(regex::Match::as_str).unwrap_or("");
    // alternative:
    let msg = caps.get(2).map_or("", |m| m.as_str());

So my reasoning is now this:

  • In this use case I don't care about much error handling
  • this is inherently text, so no match/empty string can make sense
  • both of the two 0.2 versions are either verbose or not easily guessable (at least to a beginner)

So I don't really want to complain, but I think the syntax (for this use case) got less usable and more verbose, but I don't know if it's worth fixing or if .at() had any other problems.

@BurntSushi
Copy link
Member

I'm not sure what to do about this. What do you think the alternative is? Adding more methods doesn't seem like a good idea, since that increases the number of choices a newcomer is faced with, which can be daunting.

Did you look at the documentation while writing code? Could you say something about why that didn't help you? In particular, take a look at the examples for captures, which shows yet another alternative:

let msg = caps[2];

In this use case I don't care about much error handling

The standard answer here is "this is what unwrap is for." But in this case, you can access captures by index. e.g., let caps = re.captures(line).unwrap(); println!("{}", caps[2]).

this is inherently text, so no match/empty string can make sense

But not always. The API must provide a way to differentiate between "no match" and "empty string match."

both of the two 0.2 versions are either verbose or not easily guessable (at least to a beginner)

The key change was the addition of the Match data type, which encapsulates both the positions and the text of any particular match. While it introduced a new data type (an increase in complexity) it also decreased the number of different accessor methods (a decrease in complexity).

@BurntSushi
Copy link
Member

BurntSushi commented Feb 12, 2017

OK, after reading IRC and reflecting more, I see that you considered the &caps[2] syntax, but didn't want it because it panics if there's no match. In that case, you need to handle the "no match" case explicitly, so doing the explicit unwrap seems fine to me. I personally think either of these are pretty palatable:

let msg = caps.get(2).map(|m| m.as_str()).unwrap_or("");
let msg = caps.get(2).map_or("", |m| m.as_str());

Doing the eta reduction in this case tends to make it a bit more verbose because of the as_ref call.

@winks
Copy link
Author

winks commented Feb 12, 2017

Thanks for the quick answer.

I hadn't thought about "no match" vs "empty string match", correct.

Maybe I'm biased as I hadn't known map_or and didn't think of using map. I'm just saying I think the unwrap_or("") as a proxy of "I don't care much about (exact) error handling" - i.e. I don't want unwrap to panic, but I know what I'm doing and "empty string" is fine - but I'm probably oversimplifying and this is a special use case that just sounds normal to me :)

I think adding this map_or example to the docs as a "hey, I just need the match or an empty string and don't want to futz around with Option(Match) in this case, just like the old .at(x).unwrap_or("") would be 100% fine to me.

TLDR: the use in 0.1 seemed very convenient while I think in 0.2 this is less nice of a shorthand. Nothing more.

@BurntSushi
Copy link
Member

Improving the docs is something I can get on board with.

In general, I think I'm hesitant to add new methods that conveniences over a simple combinator or two, and would prefer to solve those issues for newcomers with better docs. However, I'd be willing to bend on that if there was compelling evidence that it was a big stumbling block for folks.

@BurntSushi BurntSushi added the doc label Feb 12, 2017
@BurntSushi
Copy link
Member

I have a doc fix incoming for this.

@winks I think it's worth pointing out that in some number of cases, using indexing, e.g., &caps[3], might very well be sufficient. In particular, if you have a Captures, then you know you have a match, and if you know capture group 3 had to have matched if the entire regex matched, then &caps[3] will never panic. The only reason why you'd need to drop down to caps.get(3) is if capture group 3 was in an optional path of the regex and therefore may not have necessarily participated in the match.

BurntSushi added a commit to BurntSushi/regex that referenced this issue Feb 18, 2017
The example shows how to get the matched text of any capture group while
defaulting to the empty string if that particular group didn't participate
in the match.

Fixes rust-lang#338
@BurntSushi BurntSushi mentioned this issue Feb 18, 2017
@winks
Copy link
Author

winks commented Feb 18, 2017

Those comments look awesome, I can't wish for more :) 👍

For completeness, this was my main example (sending !quit or !quit some_text to an irc bot):

    let event_cmd_quit = format!("!quit(\\s+)?(.*)?");
    let re_cmd_quit = Regex::new(&event_cmd_quit[..]).unwrap();
    let caps_cmd = re_cmd_quit.captures(msg).unwrap();
    let quit_msg = caps_cmd.get(2).map_or("", |m| m.as_str());

it has two possible matches in the end, but just the non-captured part is fine on its own. Seeing it again I should've probably used '(\s+(.+))?' - but I guess with "optional capture group at the end" it's always the same edge case that &caps[x] is not the best choice.

Anyway, this can be closed from my point of view - and thanks for your time.

bors added a commit that referenced this issue Feb 18, 2017
Fixes

This PR contains a series of commits that fixes several minor bugs.

Fixes #321, Fixes #334, Fixes #326, Fixes #333, Fixes #338
@bors bors closed this as completed in #343 Feb 18, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants