More NotWordBoundaryAscii woes #264

Closed
SeanRBurton opened this issue Jul 11, 2016 · 3 comments

Comments

@SeanRBurton
Contributor

I'm sorry to say that another version of issue #241 has reared its ugly head.

Running `Group { e: NotWordBoundaryAscii, i: Some(1), name: None }` against `"\u{28f3e}"` the same way as in that issue gives the match `None` with the DFA but `Some((0, 0))` with the NFA.
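To see why the NFA's answer is the right one, here is a byte-level sketch of what an ASCII-only `\B` means. The helper names (`is_ascii_word`, `not_word_boundary_ascii`) are hypothetical and are not the crate's internals; the sketch assumes the usual convention that the edges of the haystack count as non-word positions.

```rust
/// ASCII word byte: [0-9A-Za-z_]. Non-ASCII bytes are never word bytes.
fn is_ascii_word(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// `(?-u)\B` at byte offset `at`: true when the bytes on either side are
/// both word bytes or both non-word bytes. Haystack edges count as non-word.
fn not_word_boundary_ascii(haystack: &[u8], at: usize) -> bool {
    let before = at > 0 && is_ascii_word(haystack[at - 1]);
    let after = at < haystack.len() && is_ascii_word(haystack[at]);
    before == after
}

fn main() {
    let s = "\u{28f3e}"; // one codepoint, encoded as four non-ASCII bytes
    let bytes = s.as_bytes();
    // Check every byte offset, including the one just past the end.
    for at in 0..=bytes.len() {
        println!("offset {}: \\B matches = {}", at, not_word_boundary_ascii(bytes, at));
    }
}
```

Every offset from 0 through 4 reports `true`: no position has a word byte on either side, so the leftmost match is the empty match `(0, 0)` that the NFA reports.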

@BurntSushi
Member

BurntSushi commented Aug 5, 2016

A smallerish repro:

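```rust
extern crate regex;

fn main() {
    // `(?-u)\B` is an ASCII-only "not a word boundary" assertion.
    let re0 = regex::Regex::new(r"(?-u)\B").unwrap();
    // The same assertion wrapped in a capture group.
    let re1 = regex::Regex::new(r"(?-u)(\B)").unwrap();
    let s = "\u{28f3e}";
    assert!(re0.is_match(s));
    assert!(re1.is_match(s));
    assert!(re0.find(s).is_some());
    assert!(re1.find(s).is_some());
    assert!(re0.captures(s).is_some());
    // Capture group + `captures` forces the NFA to run after the DFA.
    assert!(re1.captures(s).is_some());
}
```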

@BurntSushi
Member

The above actually fails on the last assert. My guess is that something about the DFA -> NFA interaction is bunk: the presence of the capture group, together with the call to `captures`, is what forces the exec engine to run the NFA after the DFA.

@BurntSushi
Member

Indeed. The code was trying to be cute about setting the end of the haystack: it unconditionally added 1 to the end of the match location, when it actually needs to add the number of bytes in the UTF-8 encoding of the next codepoint.
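A minimal sketch of the difference, assuming the end offset is extended so the NFA can look one codepoint past the match. The function names (`utf8_len`, `extend_end_naive`, `extend_end_fixed`) are hypothetical; this is not the crate's actual code.

```rust
/// Length of the UTF-8 sequence that starts with `lead`.
/// Assumption for this sketch: continuation or invalid lead bytes count as 1.
fn utf8_len(lead: u8) -> usize {
    match lead {
        0x00..=0x7F => 1,
        0xC0..=0xDF => 2,
        0xE0..=0xEF => 3,
        0xF0..=0xF7 => 4,
        _ => 1,
    }
}

/// Buggy behavior: always extends the end by one byte (clamped to the haystack).
fn extend_end_naive(haystack: &[u8], end: usize) -> usize {
    (end + 1).min(haystack.len())
}

/// Fixed behavior: extends the end by the encoded length of the next codepoint.
fn extend_end_fixed(haystack: &[u8], end: usize) -> usize {
    match haystack.get(end) {
        Some(&lead) => (end + utf8_len(lead)).min(haystack.len()),
        None => end, // already at the end of the haystack
    }
}

fn main() {
    let s = "\u{28f3e}"; // one codepoint, four bytes in UTF-8
    let bytes = s.as_bytes();
    let end = 0; // end of the (0, 0) match reported for `\B`

    let naive = extend_end_naive(bytes, end);
    let fixed = extend_end_fixed(bytes, end);

    // Adding 1 lands in the middle of the codepoint...
    println!("naive: {} (char boundary: {})", naive, s.is_char_boundary(naive));
    // ...while adding the encoded length lands on the next boundary.
    println!("fixed: {} (char boundary: {})", fixed, s.is_char_boundary(fixed));
}
```

With `end = 0`, the naive version stops at byte 1, inside the four-byte encoding of U+28F3E; the fixed version stops at byte 4, the next codepoint boundary.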


2 participants