Test Regex Scraped from crates.io #468
That would be interesting! One thing worth pointing out is that a quickcheck-generated string is very unlikely to produce a match. Instead, it would only test agreement on non-matches, which still seems worthwhile!
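A rough estimate makes this concrete. Assume, purely for illustration, that each input character is drawn uniformly and independently from an alphabet of size $k$ (quickcheck's real `String` generator is not uniform, so treat this as a ballpark). For a string $s$ of length $n$ and a fixed literal of length $m$, a union bound over the $n-m+1$ possible start positions gives:

$$
\Pr[\text{literal occurs in } s] \;\le\; \sum_{i=0}^{n-m} \Pr\big[s_{i..i+m} = \text{literal}\big] \;=\; \frac{n-m+1}{k^{m}}
$$

With $k = 26$, $m = 3$ and $n = 20$, that is $18 / 17576 \approx 0.001$, so roughly one input in a thousand would even reach the match path for a three-character literal.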
Hmm, good point. One thing I remember reading in a note Russ Cox wrote about the testing approach for RE2 is that he wrote some code to construct a random matching string from a regex, so it might be worthwhile to give that a crack. Between the two sources of random input, we would probably do a pretty good job of testing both the positive and negative cases.
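The idea of building a string that is guaranteed to match can be sketched as a recursive walk over a regex AST. This is not RE2's code or `regex-syntax`'s types: `Node`, `Rng` and `generate` below are hypothetical, the AST covers only literals, classes, concatenation, alternation and bounded repetition, and a tiny xorshift PRNG stands in for a real RNG crate so the sketch has no dependencies.

```rust
// Hypothetical sketch: `Node`, `Rng` and `generate` are illustrative only and
// are not part of the `regex` or `regex-syntax` crates.

/// Tiny xorshift64 PRNG so the sketch needs no external crates.
/// The state must be non-zero.
struct Rng(u64);

impl Rng {
    fn next(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }

    /// Integer in `lo..=hi` (modulo bias is acceptable for a sketch).
    fn range(&mut self, lo: u32, hi: u32) -> u32 {
        lo + (self.next() % u64::from(hi - lo + 1)) as u32
    }
}

/// A heavily simplified regex AST.
enum Node {
    Literal(char),
    /// Character class as inclusive ranges, e.g. `[0-9a-f]`.
    Class(Vec<(char, char)>),
    Concat(Vec<Node>),
    Alternate(Vec<Node>),
    /// `node{min,max}`; unbounded `*` / `+` would need an artificial cap.
    Repeat(Box<Node>, u32, u32),
}

/// Append a random string matched by `node` to `out`.
fn generate(node: &Node, rng: &mut Rng, out: &mut String) {
    match node {
        Node::Literal(c) => out.push(*c),
        Node::Class(ranges) => {
            // Pick a range, then a code point inside it; retry on surrogates.
            let i = rng.range(0, ranges.len() as u32 - 1) as usize;
            let (lo, hi) = ranges[i];
            loop {
                if let Some(c) = char::from_u32(rng.range(lo as u32, hi as u32)) {
                    out.push(c);
                    break;
                }
            }
        }
        Node::Concat(nodes) => {
            for n in nodes {
                generate(n, rng, out);
            }
        }
        Node::Alternate(nodes) => {
            let i = rng.range(0, nodes.len() as u32 - 1) as usize;
            generate(&nodes[i], rng, out);
        }
        Node::Repeat(inner, min, max) => {
            for _ in 0..rng.range(*min, *max) {
                generate(inner, rng, out);
            }
        }
    }
}

fn main() {
    // Roughly the regex `a(b|c)[0-9]{1,3}`.
    let re = Node::Concat(vec![
        Node::Literal('a'),
        Node::Alternate(vec![Node::Literal('b'), Node::Literal('c')]),
        Node::Repeat(Box::new(Node::Class(vec![('0', '9')])), 1, 3),
    ]);
    let mut rng = Rng(0x2545_F491_4F6C_DD1D);
    for _ in 0..5 {
        let mut s = String::new();
        generate(&re, &mut rng, &mut s);
        println!("{}", s);
    }
}
```

Every generated string could then be fed to each backend, with the expectation that all of them report a match.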
Yup, that's another good avenue to try!
After a first pass at this, just using quickcheck to generate random input, I've turned up the following failing test cases.

```rust
extern crate regex;

// `\b` on a Unicode string: bounded backtracker vs. default engine, via `find_iter`.
#[test]
fn word_boundary_backtracking_default_mismatch() {
    use regex::internal::ExecBuilder;
    let backtrack_re = ExecBuilder::new(r"\b")
        .bounded_backtracking()
        .build()
        .map(|exec| exec.into_regex())
        .map_err(|err| format!("{}", err))
        .unwrap();
    let default_re = ExecBuilder::new(r"\b")
        .build()
        .map(|exec| exec.into_regex())
        .map_err(|err| format!("{}", err))
        .unwrap();
    let input = "䅅\\u{a0}";
    let fi1 = backtrack_re.find_iter(input);
    let fi2 = default_re.find_iter(input);
    for (m1, m2) in fi1.zip(fi2) {
        assert_eq!(m1, m2);
    }
}

// `^S` on bytes: bounded backtracker vs. default engine, via `split`.
#[test]
fn uppercut_s_backtracking_bytes_default_bytes_mismatch() {
    use regex::internal::ExecBuilder;
    let backtrack_bytes_re = ExecBuilder::new("^S")
        .bounded_backtracking()
        .only_utf8(false)
        .build()
        .map(|exec| exec.into_byte_regex())
        .map_err(|err| format!("{}", err))
        .unwrap();
    let default_bytes_re = ExecBuilder::new("^S")
        .only_utf8(false)
        .build()
        .map(|exec| exec.into_byte_regex())
        .map_err(|err| format!("{}", err))
        .unwrap();
    let input = vec![83, 83];
    let s1 = backtrack_bytes_re.split(&input);
    let s2 = default_bytes_re.split(&input);
    for (chunk1, chunk2) in s1.zip(s2) {
        assert_eq!(chunk1, chunk2);
    }
}

// `^(?u:\*)` with `bytes(true)`: bounded backtracker vs. default engine, via `split`.
#[test]
fn unicode_lit_star_backtracking_utf8bytes_default_utf8bytes_mismatch() {
    use regex::internal::ExecBuilder;
    let backtrack_bytes_re = ExecBuilder::new(r"^(?u:\*)")
        .bounded_backtracking()
        .bytes(true)
        .build()
        .map(|exec| exec.into_regex())
        .map_err(|err| format!("{}", err))
        .unwrap();
    let default_bytes_re = ExecBuilder::new(r"^(?u:\*)")
        .bytes(true)
        .build()
        .map(|exec| exec.into_regex())
        .map_err(|err| format!("{}", err))
        .unwrap();
    let input = "**";
    let s1 = backtrack_bytes_re.split(input);
    let s2 = default_bytes_re.split(input);
    for (chunk1, chunk2) in s1.zip(s2) {
        assert_eq!(chunk1, chunk2);
    }
}
```

The last two look like they are probably dups. My work on this currently lives here, and I think it is basically ready for a PR except for a few config issues.
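One caveat with these tests: they compare results with `zip`, which stops at the shorter iterator, so a backend that yields one extra match or split chunk would pass unnoticed. A hypothetical helper (`assert_same_output` is not part of the `regex` crate) that collects both sides before comparing closes that gap:

```rust
use std::fmt::Debug;

/// Hypothetical helper: compare two iterators element by element *and* by
/// length. Plain `zip` stops at the shorter side, so an extra item from one
/// engine would slip through.
fn assert_same_output<T, A, B>(a: A, b: B)
where
    T: PartialEq + Debug,
    A: IntoIterator<Item = T>,
    B: IntoIterator<Item = T>,
{
    let a: Vec<T> = a.into_iter().collect();
    let b: Vec<T> = b.into_iter().collect();
    assert_eq!(a, b);
}

fn main() {
    // `zip` reports no mismatch here even though the lengths differ.
    let left = vec![1, 2, 3];
    let right = vec![1, 2];
    assert!(left.iter().zip(right.iter()).all(|(x, y)| x == y));

    // The helper only passes when both sides agree completely.
    assert_same_output(vec![1, 2, 3], vec![1, 2, 3]);
    println!("ok");
}
```

In the tests above it would be called as, for example, `assert_same_output(backtrack_re.find_iter(input), default_re.find_iter(input))`, since `regex::Match` implements `PartialEq` and `Debug`.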
It may be worth looking at https://crates.io/crates/regex_generate when I get around to generating matching strings from a regex.
I'm going to say that this is closed by the work done a while back in #472. If there's something else we should do, please feel free to file a new issue!
As part of evaluating my master's thesis, I scraped crates.io for regexes and ran them through my compiler to see how many could be optimized. I don't think it would be too hard to clean up my scraping script a bit and then write a test that executes each regex on a quickcheck-generated string with all (3? 5? 6? depending on how you count them) backends. I'm not sure when I'll do this, but I wanted to leave a note here so that I don't forget.
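The proposed test is a differential harness: run every scraped pattern on every input with every backend and flag any backend that disagrees with a reference. The sketch below shows only that structure. So it runs without the `regex` crate, the "patterns" are plain non-empty literals and the two backends (`std_literal`, `naive_literal`) are hypothetical substring searchers; in the real test each backend would be the same regex compiled with a different engine configuration, and the inputs would come from quickcheck.

```rust
/// A backend maps (pattern, haystack) to non-overlapping match spans.
type Backend = fn(&str, &str) -> Vec<(usize, usize)>;

/// Reference backend built on the standard library.
fn std_literal(pat: &str, hay: &str) -> Vec<(usize, usize)> {
    hay.match_indices(pat).map(|(i, m)| (i, i + m.len())).collect()
}

/// Naive byte-by-byte scan, standing in for an alternative engine.
/// Assumes a non-empty pattern (empty patterns return no matches here).
fn naive_literal(pat: &str, hay: &str) -> Vec<(usize, usize)> {
    let (p, h) = (pat.as_bytes(), hay.as_bytes());
    let mut out = Vec::new();
    let mut i = 0;
    while !p.is_empty() && i + p.len() <= h.len() {
        if &h[i..i + p.len()] == p {
            out.push((i, i + p.len()));
            i += p.len(); // leftmost, non-overlapping, like `match_indices`
        } else {
            i += 1;
        }
    }
    out
}

/// Run every pattern on every input with every backend and collect
/// (pattern, input, backend index) triples that disagree with backend 0.
fn differential(
    patterns: &[&str],
    inputs: &[&str],
    backends: &[Backend],
) -> Vec<(String, String, usize)> {
    let mut failures = Vec::new();
    for &pat in patterns {
        for &hay in inputs {
            let expected = backends[0](pat, hay);
            for (idx, backend) in backends.iter().enumerate().skip(1) {
                if backend(pat, hay) != expected {
                    failures.push((pat.to_string(), hay.to_string(), idx));
                }
            }
        }
    }
    failures
}

fn main() {
    // Made-up stand-ins for scraped patterns and generated inputs.
    let patterns = ["ab", "S", "䅅"];
    let inputs = ["abab", "SS", "xa䅅b", ""];
    let backends: [Backend; 2] = [std_literal, naive_literal];
    let failures = differential(&patterns, &inputs, &backends);
    println!("{} disagreements", failures.len());
}
```

Reporting the pattern and input for each disagreement, rather than asserting on the first one, makes it easy to turn every hit into a standalone regression test like the ones earlier in this thread.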