(re)design regex API #88

Closed
StefanKarpinski opened this issue Jun 26, 2011 · 9 comments
Labels: breaking (This change will break code)

Comments

@StefanKarpinski

The current approach of returning a RegexMatch object that sort of pretends to be a String is a bit fishy. We probably need to spend a good bit more time figuring out how to make application of regexes nicely usable.

@ghost assigned StefanKarpinski Jun 26, 2011
@JeffBezanson

There ought to be plenty of prior art here. I'm sure you know what to do from Perl and Ruby.

@StefanKarpinski

Honestly, neither of these languages does a really clean job: there are generally lots of dynamic variables with funny names involved. That's how Perl's always done it, and Ruby inherited that bit of nastiness from Perl. Not sure we want to go that route. The "standard" way to pass the captures of a regex is to set the variables $1, $2, .... Oh, and also $`, $&, and $'. There are more. It's kind of a mess. Does it work? Sure. Is it pretty? No.

@JeffBezanson

Oh right, that's rather awful. What about Python?

@StefanKarpinski

You can read about Python's support here. IMO, we're already beating Python because we can have special strings that generate regex objects at compile time and have convenient escape behavior. Strange as it may seem, I think we're in uncharted territory here. There's a reason why Larry Wall set out to give the whole business a makeover in Perl 6.
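
To make the escape-behavior point concrete, here is a small sketch in present-day Julia syntax. That syntax postdates this thread, so treat it as an assumption about where things ended up rather than what existed in 2011. The `r"..."` literal is a string macro, so the `Regex` object is built when the macro is expanded and backslashes need no doubling, unlike a `Regex` constructed from an ordinary string at run time.

```julia
# Regex literal: handled by the @r_str string macro, backslashes are taken as written.
re_literal = r"\d+\.\d+"

# Same pattern through the runtime constructor: every backslash must be doubled.
re_runtime = Regex("\\d+\\.\\d+")

# Both objects hold the same pattern text.
same_pattern = re_literal.pattern == re_runtime.pattern

# Both find a decimal number inside a string.
found = occursin(re_literal, "pi is about 3.14") &&
        occursin(re_runtime, "pi is about 3.14")
```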

@StefanKarpinski

So I had some interesting thoughts about this today. Currently the RegexMatch type is declared like this:

type RegexMatch
  match::Union((),String)
  captures::Tuple
  offset::Index
end

Now that .match field is a dead giveaway that something is rotten in Denmark: this Union((),String) type looks just like the kind of nonsense that the recent constructor revamp was designed to do away with. Except here it exists for a completely different reason: because the match() function returns a RegexMatch object whether the pattern matches or not. And that's the rub. The .match field has to both indicate whether there was a match and contain the match if there was one. But if there wasn't a match, why the hell do you need the RegexMatch object? You don't.

So a much better API would be something like this:

match(re::Regex, str::String, cb::RegexMatch-->Any)

where a RegexMatch object is only created if the pattern matches and the callback is only applied to that RegexMatch object when the pattern matches, so the .match field will never be (). Of course, then you'd also want to be able to specify an alternative action to take when the pattern doesn't match:

match(re::Regex, str::String, ifcb::RegexMatch-->Any, elsecb::None-->Any)
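
A rough sketch of that two-callback shape, written in present-day Julia (an assumption, since the syntax above is made up). The helper name `match_with` and the two callbacks are hypothetical, and the sketch relies on today's `Base.match` returning `nothing` on failure, which is not how `match` behaved when this was written.

```julia
# Hypothetical helper (not a Base function): apply `ifcb` to the match object
# when the pattern matches, otherwise call `elsecb` with no arguments.
function match_with(re::Regex, str::AbstractString, ifcb, elsecb)
    m = match(re, str)                      # `nothing` when the pattern fails
    return m === nothing ? elsecb() : ifcb(m)
end

# Illustrative callbacks; `m.match` is always a real substring in `on_match`.
on_match(m) = "matched: " * m.match
on_miss() = "no match"

hit  = match_with(r"^a+b+a+$", "abba", on_match, on_miss)
miss = match_with(r"^a+b+a+$", "abc",  on_match, on_miss)
```

The match object only ever reaches `ifcb`, so the callback never has to check for an empty `.match` field.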

This is starting to look very much like an if-else expression, however. Or something that could be done with a macro. I'm just sketching with made up syntax now, but you'd want to write something like this:

match r"^a+b+a+$" "abba" do (m)
  # do something with m here
else
  # do something else (m doesn't exist)
end

This is horrible and muddled syntax, but you get what I'm driving at here. Why can't we do this with just an if-else? Two main issues: you can't conditionally bind something easily, and you can't return a value and have it act as a boolean. You'd want something like this:

if m = match(r"^a+b+a+$", "abba")
  # ^ m acts like a boolean here
  # but as a match object in here
else
  # m doesn't exist at all here
end
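
For comparison, a sketch of the closest equivalent in present-day Julia, where `match` returns `nothing` when the pattern fails (behavior that postdates this thread). The binding and the test are two separate steps, and `m` still exists in the else branch, so this is not quite the conditional binding sketched above. The capture groups in the pattern and the function name `describe` are added here for illustration only.

```julia
function describe(str::AbstractString)
    m = match(r"^(a+)(b+)(a+)$", str)       # RegexMatch or nothing
    if m !== nothing
        # m is a RegexMatch in this branch; captures holds the three groups
        return "groups: " * join(m.captures, ", ")
    else
        # m is bound here too, but only to `nothing`
        return "no match"
    end
end

matched_text = describe("abba")
missed_text  = describe("abc")
```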

@StefanKarpinski

Actually, this does remind me a bit of a pattern matching case statement:

case "abba"
when r"^a+b+a+$" => m
  # do something with m
else
  # do something without m
end

@JeffBezanson

We could have such a pattern-matching case statement that calls a standard match method. match needs to return an object of this type:

type MatchResult{T}
  matched::Bool
  match::T
end

If matched is false, the other field is undefined. Then the case syntax can take care of checking this and introducing a binding for the value of the match field when it is valid. Or we could use something more like a Maybe type.
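
A minimal sketch of that `MatchResult{T}` idea in present-day Julia syntax (an assumption; the declaration above uses the syntax of the time). The constructors, `try_match`, and `run_case` are hypothetical names, and `run_case` only imitates by hand what the proposed `case` syntax would do: check the flag, then bind the payload when it is valid.

```julia
struct MatchResult{T}
    matched::Bool
    match::T                                   # left undefined when matched == false
    MatchResult{T}() where {T} = new{T}(false) # failed match: no payload
    MatchResult{T}(m) where {T} = new{T}(true, m)
end

# Hypothetical adapter around today's Base.match, which returns `nothing` on failure.
function try_match(re::Regex, str::AbstractString)
    m = match(re, str)
    return m === nothing ? MatchResult{RegexMatch}() : MatchResult{RegexMatch}(m)
end

# Hand-written stand-in for the `case` statement: test the flag, then use the payload.
function run_case(str::AbstractString)
    r = try_match(r"^a+b+a+$", str)
    return r.matched ? "matched " * r.match.match : "no match"
end

ok_case   = run_case("abba")
fail_case = run_case("xyz")
```

The Maybe-style alternative is roughly what `Union{Nothing, T}` gives you: the absence of a value is encoded in the type rather than in a separate `matched` flag.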

@ViralBShah

Is this more like a 2.0 issue?

@StefanKarpinski

This can actually be done via the do-block syntax proposed in #441, which is not breaking. Thus this ceases to be a breaking feature.
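
A sketch of how that could look, assuming the do-block form Julia eventually shipped (which may differ in detail from what #441 proposed at the time). `withmatch` is a hypothetical helper, not a Base function. The block runs only when the pattern matches, so `m` is always a real match object inside it.

```julia
# Hypothetical helper: call `f` on the match object only if there is one.
# Taking `f` first is what lets callers use do-block syntax.
function withmatch(f, re::Regex, str::AbstractString)
    m = match(re, str)                      # `nothing` when the pattern fails
    return m === nothing ? nothing : f(m)
end

# The block body becomes `f`; here it counts the leading a's via the first capture.
result = withmatch(r"^(a+)b+a+$", "aabba") do m
    length(m.captures[1])
end

# No match: the block is never run and the helper returns `nothing`.
skipped = withmatch(r"^a+b+a+$", "xyz") do m
    error("never reached")
end
```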
