Skip to content

Commit

Permalink
Auto merge of #27474 - bluss:twoway-reverse, r=brson
Browse files Browse the repository at this point in the history
StrSearcher: Implement the complete reverse case for the two way algorithm

Fix quadratic behavior in StrSearcher in reverse search with periodic
needles.

This commit adds the missing pieces for the "short period" case in
reverse search. The short case will show up when the needle is literally
periodic, for example "abababab".

Two way uses a "critical factorization" of the needle: x = u v.

Searching matches v first, if mismatch at character k, skip k forward.
Matching u, if mismatch, skip period(x) forward.

To avoid O(mn) behavior after mismatch in u, memorize the already
matched prefix.

The short period case requires that |u| < period(x).

For the reverse search we need to compute a different critical
factorization x = u' v' where |v'| < period(x), because we are searching
for the reversed needle. A short v' also benefits the algorithm in
general.

The reverse critical factorization is computed quickly by using the same
maximal suffix algorithm, but terminating as soon as we have a location
with local period equal to period(x).

This adds extra fields crit_pos_back and memory_back for the reverse
case. The new overhead for TwoWaySearcher::new is low, and additionally
I think the "short period" case is uncommon in many applications of
string search.

The maximal_suffix methods were updated in documentation and the
algorithms updated to not use !0 and wrapping add, variable left is now
1 larger, offset 1 smaller.

Use periodicity when computing byteset: in the periodic case, just
iterate over one period instead of the whole needle.

Example before (rfind) after (twoway_rfind) benchmark shows the removal
of quadratic behavior.

needle: "ab" * 100, haystack: ("bb" + "ab" * 100) * 100

```
test periodic::rfind           ... bench:   1,926,595 ns/iter (+/- 11,390) = 10 MB/s
test periodic::twoway_rfind    ... bench:      51,740 ns/iter (+/- 66) = 386 MB/s
```
  • Loading branch information
bors committed Aug 18, 2015
2 parents e35fd74 + 01e8812 commit de67d62
Show file tree
Hide file tree
Showing 2 changed files with 213 additions and 64 deletions.
20 changes: 20 additions & 0 deletions src/libcollectionstest/str.rs
Original file line number Diff line number Diff line change
Expand Up @@ -85,6 +85,26 @@ fn test_find_str() {
assert_eq!(data[43..86].find("ย中"), Some(67 - 43));
assert_eq!(data[43..86].find("iệt"), Some(77 - 43));
assert_eq!(data[43..86].find("Nam"), Some(83 - 43));

// find every substring -- assert that it finds it, or an earlier occurence.
let string = "Việt Namacbaabcaabaaba";
for (i, ci) in string.char_indices() {
let ip = i + ci.len_utf8();
for j in string[ip..].char_indices()
.map(|(i, _)| i)
.chain(Some(string.len() - ip))
{
let pat = &string[i..ip + j];
assert!(match string.find(pat) {
None => false,
Some(x) => x <= i,
});
assert!(match string.rfind(pat) {
None => false,
Some(x) => x >= i,
});
}
}
}

fn s(x: &str) -> String { x.to_string() }
Expand Down
Loading

0 comments on commit de67d62

Please sign in to comment.