ExecBuilder::new() differs from ExecBuilder::new().nfa().build() #268

Closed
lukaslueg opened this issue Jul 18, 2016 · 6 comments

Comments

@lukaslueg
Contributor

AFL found this example, which causes the assertion in main() to fail. This is probably a duplicate of the other reports; quite a lot of execution paths in regex-0.1.73 show this behavior.

extern crate regex;

use regex::internal::ExecBuilder;

fn main() {
    let res = r"\x20|^d";
    let re0 = ExecBuilder::new(&res).build().unwrap().into_byte_regex();
    let re1 = ExecBuilder::new(&res).nfa().build().unwrap().into_byte_regex();
    let s = " d".as_bytes();
    let m0 = (re0.is_match(s),
              re0.find_iter(s).collect::<Vec<(usize, usize)>>(),
              re0.split(s).collect::<Vec<_>>(),
              re0.captures_iter(s).map(|c| format!("{:?}", c)).collect::<Vec<_>>());
    let m1 = (re1.is_match(s),
              re1.find_iter(s).collect::<Vec<(usize, usize)>>(),
              re1.split(s).collect::<Vec<_>>(),
              re1.captures_iter(s).map(|c| format!("{:?}", c)).collect::<Vec<_>>());
    assert_eq!(m0, m1);
    println!("Done");
}
@lukaslueg
Contributor Author

Ping.

Another example is \x00*^$, for which is_match() returns true or false depending on the execution engine.

@BurntSushi
Member

@lukaslueg Thanks! I'll be issuing a PR with a fix shortly. I couldn't reproduce any problem with the \x00*^$ regex though. Could you please provide a haystack that you used?

@BurntSushi
Member

The specific problem with r"\x20|^d" is that the literal detector falsely claimed that we could search for \x20 or d at any point. In fact, we cannot: we can only search for d at the beginning. We could make the literal detector (a lot) smarter, but for now, I fixed this by disabling literal optimizations when the regex is partially anchored, that is, when the regex has at least one matchable sub-expression that must match at the beginning of the haystack.
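
To make the failure mode concrete, here is a minimal toy sketch of the difference between a literal scan that ignores anchors and one that respects them. It is not the regex crate's implementation: `Branch`, `naive_literal_scan`, `anchor_aware_scan`, and `is_partially_anchored` are hypothetical names invented for illustration, and the model only handles alternations of single-byte literals.

```rust
/// One alternation branch of a toy pattern: a single literal byte that may
/// be required to sit at the start of the haystack (`^`).
/// Hypothetical type for illustration; not part of the regex crate.
struct Branch {
    literal: u8,
    anchored_start: bool,
}

/// Flawed prefilter: treats every branch literal as searchable at any offset.
fn naive_literal_scan(branches: &[Branch], haystack: &[u8]) -> Vec<(usize, usize)> {
    haystack
        .iter()
        .enumerate()
        .filter(|&(_, &b)| branches.iter().any(|br| br.literal == b))
        .map(|(i, _)| (i, i + 1))
        .collect()
}

/// Anchor-aware scan: an anchored branch may only match at offset 0.
fn anchor_aware_scan(branches: &[Branch], haystack: &[u8]) -> Vec<(usize, usize)> {
    haystack
        .iter()
        .enumerate()
        .filter(|&(i, &b)| {
            branches
                .iter()
                .any(|br| br.literal == b && (!br.anchored_start || i == 0))
        })
        .map(|(i, _)| (i, i + 1))
        .collect()
}

/// True when at least one branch must match at the start of the haystack.
/// In this toy model, that is the condition under which the literal scan
/// would be switched off.
fn is_partially_anchored(branches: &[Branch]) -> bool {
    branches.iter().any(|br| br.anchored_start)
}

fn main() {
    // Toy encoding of r"\x20|^d".
    let branches = [
        Branch { literal: b' ', anchored_start: false },
        Branch { literal: b'd', anchored_start: true },
    ];
    let haystack = b" d";
    println!("naive:        {:?}", naive_literal_scan(&branches, haystack));
    println!("anchor-aware: {:?}", anchor_aware_scan(&branches, haystack));
    println!("partially anchored: {}", is_partially_anchored(&branches));
}
```

On the haystack " d", the naive scan reports a second match for d at offset 1, while the anchor-aware scan reports only the space at offset 0. That is the kind of disagreement the original assertion catches.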

@lukaslueg
Contributor Author

More examples, gathered from a3422ff:

byte pattern "\xcc?^"

thread '<main>' panicked at 'assertion failed: `(left == right)` (left: `(true, [(0, 0)], [[], [141, 35, 59, 26, 164, 115, 51, 5, 102, 111, 111, 98, 97, 114, 88, 92, 15, 48, 116, 228, 155, 164]], ["Captures({0: Some(\"\")})"])`, right: `(true, [(0, 0), (1, 1)], [[], [141], [35, 59, 26, 164, 115, 51, 5, 102, 111, 111, 98, 97, 114, 88, 92, 15, 48, 116, 228, 155, 164]], ["Captures({0: Some(\"\")})", "Captures({0: Some(\"\")})"])`)', src/main.rs:57
byte pattern "^\xf7|4\xff\d\x8a\x8a\x8a\x8a\x8a\x8a\x8a\x8a\x8a\x8a\x8a\x8a\x8a##########[] d\x8a\x8a\x8a\x8a\x8a\x8a\x8a\x8a\x8a\x8a\x8a\x8a\x8a##########[] #####\x80\S7|$"

thread '<main>' panicked at 'assertion failed: `(left == right)` (left: `(false, [], [[141, 35, 59, 26, 164, 115, 51, 5, 102, 111, 111, 98, 97, 114, 88, 92, 15, 48, 116, 228, 155, 164]], [])`, right: `(true, [(0, 0), (4, 4), (22, 22)], [[], [141, 35, 59, 26], [164, 115, 51, 5, 102, 111, 111, 98, 97, 114, 88, 92, 15, 48, 116, 228, 155, 164]], ["Captures({0: Some(\"\")})", "Captures({0: Some(\"\")})", "Captures({0: Some(\"\")})"])`)', src/main.rs:57
byte pattern "^|ddp\xff\xffdddddlQd@\x80"

thread '<main>' panicked at 'assertion failed: `(left == right)` (left: `(true, [(0, 0)], [[], [141, 35, 59, 26, 164, 115, 51, 5, 102, 111, 111, 98, 97, 114, 88, 92, 15, 48, 116, 228, 155, 164]], ["Captures({0: Some(\"\")})"])`, right: `(true, [(0, 0), (1, 1)], [[], [141], [35, 59, 26, 164, 115, 51, 5, 102, 111, 111, 98, 97, 114, 88, 92, 15, 48, 116, 228, 155, 164]], ["Captures({0: Some(\"\")})", "Captures({0: Some(\"\")})"])`)', src/main.rs:57

The haystack was "jSM7GqRzMwVmb29iYXJYXA8wdOSbpA==" (base64 encoded)
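
For anyone reproducing these, the haystack has to be turned back into raw bytes first. Below is a minimal sketch of a standard-alphabet base64 decoder that uses only the standard library; `decode_base64` is a helper written for this example, not something provided by the regex crate, and it makes no attempt to validate padding strictly.

```rust
/// Decode standard-alphabet base64 (with optional trailing `=` padding).
/// Returns None if the input contains a character outside the alphabet.
/// Illustrative helper only; padding is stripped, not strictly validated.
fn decode_base64(input: &str) -> Option<Vec<u8>> {
    const ALPHABET: &[u8; 64] =
        b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    let mut out = Vec::new();
    let mut buffer: u32 = 0; // pending bits not yet emitted as a byte
    let mut bits: u32 = 0; // number of valid bits currently in `buffer`
    for &c in input.trim_end_matches('=').as_bytes() {
        let value = ALPHABET.iter().position(|&a| a == c)? as u32;
        buffer = (buffer << 6) | value;
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((buffer >> bits) as u8);
            buffer &= (1 << bits) - 1; // keep only the leftover low bits
        }
    }
    Some(out)
}

fn main() {
    let haystack = decode_base64("jSM7GqRzMwVmb29iYXJYXA8wdOSbpA==")
        .expect("input should be valid base64");
    println!("{:?}", haystack);
}
```

The decoded bytes can then be passed as the haystack `s` in the reproduction program from the first comment, with `res` set to one of the byte patterns above.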

@lukaslueg
Contributor Author

Ping

@BurntSushi
Member

@lukaslueg I created a distinct ticket for those examples: #277. Thanks!
