Regex: enable support for partial matches #33688
base: master
Alternative to #33646 which I hadn't seen earlier unfortunately.
The bit I'm not sure about is the
Indeed,
Now, that's probably not something we're ever going to support, so maybe that's a good thing? At least we know that Perl won't introduce the Ruby on the other hand only has these modifiers according to the Regexp docs:
Anyway, it seems safest to only allow the option initially and we can consider the
EDIT: ouch, I hadn't seen your second answer, I kept this comment opened for a while before posting!
Oh I didn't know! So for Perl5 it looks like the "p" modifier meant something, but has been ignored since version 5.20 (cf. https://perldoc.perl.org/perlre.html for example). So I'm not attached to the Side note: actually, in Perl6, (it looks like) "matching adverbs" can be used only when matching, not when defining a regex independently of a matching context. This raises the question in Julia of whether a matching option, like "partial match", should really be part of the
    missing
else
    rc >= 0
end
I'm confused about why you're returning `missing` here. Is it as a way to indicate that there was a partial match?
If you want to do that, maybe it's better just to return `rc` instead and let the caller check the return value? It seems like a weird pun to use `missing` to mean "partial match".
Or maybe we define an enum for the return values or something.
> Is it as a way to indicate that there was a partial match?
Here it's an implementation detail, but it directly gives the property that `occursin(rx, str)` can return `missing` for a partial match (cf. the OP). A partial match can be interpreted as "the end of the string is not available yet; can I already know whether I have a match?" In this context, `missing` means "I don't know yet", and this allows using three-valued logic, which makes sense to me here. In this function it is indeed only an implementation detail, which I don't mind changing, but I'm also interested in discussing the API for `occursin`.
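For readers unfamiliar with the three-valued logic referred to here, the following is a minimal sketch of how a `missing` result would behave in Julia. The `three_valued` helper and its `:full`/`:partial`/`:none` symbols are hypothetical and only stand in for the outcome of a match; they are not part of this PR.

```julia
# Hypothetical helper: map a match outcome to a three-valued answer.
function three_valued(outcome::Symbol)
    outcome === :full && return true
    outcome === :partial && return missing  # "don't know yet"
    return false
end

# Base's logical operators follow three-valued (Kleene) logic on `missing`.
a = missing | true    # true: the unknown operand cannot change the result
b = missing & false   # false: same reasoning
c = missing & true    # missing: the result depends on the unknown operand

# A caller who wants partial matches to count as matches can collapse the
# unknown case explicitly.
d = coalesce(three_valued(:partial), true)  # true

# Note: using `missing` directly as an `if` condition throws a TypeError,
# so callers have to handle the third state explicitly.
```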
I see. In my original PR, #33646, I simply returned `true` if the match is a partial match and you explicitly requested partial matching, OR if it's a complete match.
I guess the downside is that you lose the information about which type of match you got.
I like the change you made in this PR to propagate whether or not it was a partial match by adding a `partial` field to the `RegexMatch`. I just don't love the pun on three-valued logic. In my view, the match isn't missing at all -- it's partially matched, which is what you were asking it to check for. I would just change this to either return `rc` directly, and check it down below, or even better would be to define an enum here, and use that below.
Could you make that change?
Sorry for such a delay! I made the change now, by using an Enum (actually constants, as `@enum` doesn't work in "pcre.jl"). I think I agree that a partial match in `occursin` should return `true`. Maybe we can later consider adding an option to get the information, e.g. `occursin(rx, str, partial=missing)` to mean: if the match is partial, return `missing`.
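As an illustration of the "constants instead of `@enum`" approach, here is a minimal sketch. All names (`MATCH_NONE`, `MATCH_PARTIAL`, `MATCH_FULL`, `classify`, `matched`) are hypothetical and are not the ones used in "pcre.jl". The two negative codes are the values PCRE2 documents for `PCRE2_ERROR_NOMATCH` and `PCRE2_ERROR_PARTIAL`.

```julia
# Hypothetical plain constants standing in for an enum.
const MATCH_NONE    = 0
const MATCH_PARTIAL = 1
const MATCH_FULL    = 2

# PCRE2 return codes for "no match" and "partial match".
const ERROR_NOMATCH = -1
const ERROR_PARTIAL = -2

# Translate a raw PCRE2 return code into one of the three outcomes.
function classify(rc::Integer)
    rc >= 0 && return MATCH_FULL
    rc == ERROR_NOMATCH && return MATCH_NONE
    rc == ERROR_PARTIAL && return MATCH_PARTIAL
    error("PCRE error code $rc")
end

# occursin-style answer: a partial match counts as `true`.
matched(rc::Integer) = classify(rc) != MATCH_NONE
```

With this shape the caller keeps the distinction between full and partial matches, while a boolean wrapper can still treat both as a match.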
I agree with @StefanKarpinski about the above. It's unfortunate, but I don't think we can use `p` to mean partial, because of weird conflicting flag names. We can always add it back later, as he suggests, but it would be nice to have this PR merged! :)
I've left another follow-up comment on your use of `missing`. Sorry it took me so long to reply! :)
b3708eb to f6bfae3
I trimmed down this PR to the minimum, i.e. it doesn't add any new official API, and I followed @NHDaly's suggestion to not use Introducing a "match flag" in the I also didn't add the possibility to specify partial matches in the So only the undocumented
f6bfae3 to 2d1bbf8
Whatever happened to this PR? Should we close or can it be necro-ed?
String `str` is a partial match for regex `r` when `str` is a potential prefix of a match for `r`. See for example https://www.pcre.org/current/doc/html/pcre2partial.html for a motivating example. Another example where I may need it is for #33617.

PCRE supports that, so it's easy to expose this functionality; only the API has to be decided.

There are two options for partial matches: `PCRE_PARTIAL_SOFT` and `PCRE_PARTIAL_HARD`. With the soft option, a full match is preferred over a partial one; the hard option is the opposite.

The proposed API here is:
- `Regex` constructor: add the `p` flag to mean `PCRE_PARTIAL_SOFT`, e.g. `r"abc"p`
- `match`: add a `partial` field to mean that the match was partial
- `eachmatch`: seems to work automatically (only the last element of the iterator can be partial), but we may forbid the presence of a partial flag
- `findnext`: still needs to be updated, but I guess the choice of supporting partial matches should match that for `eachmatch`
- `occursin`/`startswith`: with the current implementation, `occursin(r, str)` returns `missing` if `str` is a partial match (implementation detail leak). I don't find it so bad: if we consider that the string `str` is "incomplete" when a partial flag is passed, and there is a partial match, we can only conclude that the information needed to say whether there is a match (the yet unavailable end of the string) is missing. Should this return `true` instead? (Then we can't discriminate between a full and a partial match.)
- `endswith` fails with the "partial" options (incompatible options).
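
To make the definition of a partial match concrete without relying on any API proposed here, the following is a minimal sketch for the special case of a literal pattern. The function `literal_partial` and its return symbols are hypothetical. It mimics the "soft" behaviour described above, where a full match is preferred over a partial one, and it is only an analogy for what PCRE does with general regexes.

```julia
# Hypothetical helper: literal-string analogue of a soft partial match.
# Returns :full if `needle` occurs in `str`, :partial if `str` ends with a
# non-empty proper prefix of `needle` (so more input could complete the
# match), and :none otherwise.
function literal_partial(needle::AbstractString, str::AbstractString)
    occursin(needle, str) && return :full
    # Try the longest candidate prefix first; the range is empty when
    # either string is too short to allow a partial match.
    for n in min(length(needle) - 1, length(str)):-1:1
        endswith(str, first(needle, n)) && return :partial
    end
    return :none
end

literal_partial("abc", "xabcx")  # :full
literal_partial("abc", "xxab")   # :partial, "ab" could still become "abc"
literal_partial("abc", "xyz")    # :none
```

The three outcomes correspond to the three states discussed in the review thread: a full match, a partial match at the end of the available input, and no match.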