nfa vs dfa mismatch #345

Closed
BurntSushi opened this issue Feb 20, 2017 · 3 comments

@BurntSushi
Member

BurntSushi commented Feb 20, 2017

This program does not pass all asserts:

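extern crate regex;

use regex::internal::ExecBuilder;

// "=", the invalid UTF-8 byte 0x86, "=", "y"
pub const HAYSTACK_BYTES: &'static [u8] = b"\x3D\x86\x3D\x79";

fn main() {
    let pattern = r"=\b";
    let haystack = &HAYSTACK_BYTES;
    // Default engine selection (for this pattern it may pick the lazy DFA).
    let re0 = ExecBuilder::new(pattern).build().unwrap().into_byte_regex();
    // `.nfa()` forces an NFA-based engine regardless of optimizations.
    let re1 = ExecBuilder::new(pattern).nfa().build().unwrap().into_byte_regex();
    // Both engines should agree on whether there is a match...
    assert_eq!(re0.is_match(haystack), re1.is_match(haystack));
    // ...and on the exact match spans.
    assert_eq!(re0.find_iter(haystack).collect::<Vec<_>>(),
               re1.find_iter(haystack).collect::<Vec<_>>());
}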

Found by @lukaslueg in #321

@BurntSushi
Member Author

Looks like this is probably a discrepancy in how word boundaries and invalid UTF-8 are handled in the matching engines. Notably, the pattern =(?-u:\b) also fails, but the is_match results are inverted.
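
To decide which engine is right for =(?-u:\b), a brute-force reference can help. The sketch below is not the regex crate's implementation; it assumes the usual ASCII word-boundary definition (a boundary is where exactly one neighbouring byte is in [0-9A-Za-z_], with non-ASCII bytes such as 0x86 counted as non-word), and the helper names are hypothetical.

```rust
// Hypothetical reference oracle for `=(?-u:\b)` on a byte haystack.
// Assumption: ASCII word bytes are [0-9A-Za-z_]; every other byte,
// including invalid UTF-8 like 0x86, is a non-word byte.

/// True if `b` is an ASCII word byte.
fn is_ascii_word(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// True if offset `at` (0..=haystack.len()) is an ASCII word boundary,
/// i.e. exactly one side of `at` is a word byte.
fn is_ascii_word_boundary(haystack: &[u8], at: usize) -> bool {
    let before = at > 0 && is_ascii_word(haystack[at - 1]);
    let after = at < haystack.len() && is_ascii_word(haystack[at]);
    before != after
}

/// Spans of `=` immediately followed by an ASCII word boundary.
/// Every match is one byte long, so these are also the non-overlapping matches.
fn find_eq_then_boundary(haystack: &[u8]) -> Vec<(usize, usize)> {
    (0..haystack.len())
        .filter(|&i| haystack[i] == b'=' && is_ascii_word_boundary(haystack, i + 1))
        .map(|i| (i, i + 1))
        .collect()
}

fn main() {
    let haystack: &[u8] = b"\x3D\x86\x3D\x79"; // "=", 0x86, "=", "y"
    let boundaries: Vec<usize> = (0..=haystack.len())
        .filter(|&at| is_ascii_word_boundary(haystack, at))
        .collect();
    println!("boundaries: {:?}", boundaries); // boundaries: [3, 4]
    println!("matches: {:?}", find_eq_then_boundary(haystack)); // matches: [(2, 3)]
}
```

Under these assumptions the = at offset 0 is followed by 0x86 (non-word on both sides, so no boundary), and the only match is the = at offset 2, span 2..3. A correct engine should therefore report is_match as true for this haystack.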

@SeanRBurton
Contributor

SeanRBurton commented Dec 10, 2017

A similar example:

extern crate regex;

use regex::internal::ExecBuilder;

fn main() {
    // Same pattern built twice: default engine selection vs. forced NFA.
    let re0 = ExecBuilder::new("$\\B(?m:^)").only_utf8(false).build().unwrap();
    let re1 = ExecBuilder::new("$\\B(?m:^)").only_utf8(false).nfa().build().unwrap();
    let re0 = re0.into_regex();
    let re1 = re1.into_regex();
    assert_eq!(re0.is_match("\n"), re1.is_match("\n"));
}
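
Because every piece of $\B(?m:^) is a zero-width assertion, the expected answer can be worked out by evaluating each assertion at every offset. The sketch below is not the regex crate's code; the function names are hypothetical, and the semantics in its comments are assumed (in particular that (?m:^) also matches right after a trailing \n at the end of the input).

```rust
// Hypothetical brute-force evaluation of `$\B(?m:^)`, assuming:
//   `$`      (multi-line off): only at the very end of the haystack
//   `\B`     not a word boundary (ASCII word bytes suffice for this input)
//   `(?m:^)` at the start of the haystack, or immediately after b'\n'

fn is_word(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

fn end_of_text(h: &[u8], at: usize) -> bool {
    at == h.len()
}

fn not_word_boundary(h: &[u8], at: usize) -> bool {
    let before = at > 0 && is_word(h[at - 1]);
    let after = at < h.len() && is_word(h[at]);
    before == after
}

fn start_of_line(h: &[u8], at: usize) -> bool {
    at == 0 || h[at - 1] == b'\n'
}

/// Offsets where all three assertions hold. The pattern consumes no input,
/// so each offset `at` corresponds to an empty match `at..at`.
fn find_empty_matches(h: &[u8]) -> Vec<usize> {
    (0..=h.len())
        .filter(|&at| end_of_text(h, at) && not_word_boundary(h, at) && start_of_line(h, at))
        .collect()
}

fn main() {
    println!("{:?}", find_empty_matches(b"\n")); // [1]
}
```

Under those assumptions the only hit is offset 1, the end of the input, so both engines should report a match for "\n".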

@BurntSushi
Member Author

This appears to no longer be an issue. I suspect it was fixed by #561.
