Major refactoring and performance improvements. #91

BurntSushi · 2015-06-15T04:18:00Z

Overview of changes:

* Instruction set has been redesigned to be smaller, mostly by
  collapsing empty-width matches into one instruction type.
  In addition to moving instruction-matching out of the matching
  engine, this makes matching engine code much simpler.
* Rewrote input handling to use an inline representation of
  `Option<char>` and clearer position handling with the `Input` trait.
* Added a new bounded backtracking matching engine that is invoked for
  small regexes/inputs. It's about twice as fast as the full NFA
  matching engine.
* Implemented caching for both the NFA and backtracking engines.
  This avoids costly allocations on subsequent uses of the regex.
* Overhauled prefix handling at both discovery and matching.
  Namely, sets of prefix literals can now be extracted from regexes.
  Depending on what the prefixes look like, an Aho-Corasick DFA
  is built from them.
  (This adds a dependency on the `aho-corasick` crate.)
* When appropriate, use `memchr` to jump around in the input when
  there is a single common byte prefix.
  (This adds a dependency on the `memchr` crate.)
* Bring the `regex!` macro up to date. Unfortunately, it still
  implements the full NFA matching engine and doesn't yet have
  access to the new prefix DFA handling. Thus, its performance
  has gotten *worse* than the dynamic implementation in most
  cases. The docs have been updated to reflect this change.
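
To illustrate the single-byte prefix item above, here is a minimal sketch, assuming a regex whose matches must all begin with one known byte. It is not the code from this PR. `find_byte` is a hypothetical stand-in written with plain std iterators so the example has no dependencies (the `memchr` crate exposes a function of the same shape, `memchr(needle, haystack) -> Option<usize>`), and `candidate_starts` and the sample input are invented for illustration.

```rust
/// Hypothetical stand-in for `memchr`: offset of the first `needle` in
/// `haystack`, or `None` if the byte does not occur.
fn find_byte(needle: u8, haystack: &[u8]) -> Option<usize> {
    haystack.iter().position(|&b| b == needle)
}

/// Collects every offset at which a match could begin, given that all
/// matches start with the byte `prefix`. A real engine would run its
/// matcher at each offset instead of collecting them.
fn candidate_starts(prefix: u8, input: &[u8]) -> Vec<usize> {
    let mut starts = Vec::new();
    let mut pos = 0;
    while pos < input.len() {
        match find_byte(prefix, &input[pos..]) {
            // No further occurrence of the prefix byte: nothing left to try.
            None => break,
            Some(offset) => {
                starts.push(pos + offset);
                // Resume the scan just past this candidate.
                pos += offset + 1;
            }
        }
    }
    starts
}

fn main() {
    // Only offsets 4 and 7 hold the prefix byte, so only they are visited.
    println!("{:?}", candidate_starts(b'a', b"xxxxaxxaxx"));
}
```

The point of the design is that positions which cannot start a match are skipped by a fast byte search rather than being fed to the matching engine one at a time.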

Surprisingly, all of this required exactly one new application of
`unsafe`, which is isolated in the `memchr` crate. (Aho-Corasick has no
`unsafe` either!)

There should be *no* breaking changes in this commit. The only public
facing change is the addition of a method to the `Replacer` trait, but
it comes with a default implementation so that existing implementors
won't break. (Its purpose is to serve as a hint as to whether or not
replacement strings need to be expanded. This is crucial to speeding
up simple replacements.)
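
A sketch of how such a hint can be added without breaking implementors. The trait below is a simplified, hypothetical stand-in and not the crate's actual `Replacer` definition: the method names `replace` and `no_expand`, their signatures, and the `Upper` and `Literal` types are assumptions made for illustration.

```rust
/// Simplified, hypothetical stand-in for a replacer trait.
trait Replacer {
    /// Builds the replacement for one match (here a match is just its text).
    fn replace(&mut self, matched: &str) -> String;

    /// Hint: `Some(s)` when the replacement is always the fixed string `s`.
    /// The default `None` keeps existing implementors compiling unchanged.
    fn no_expand(&mut self) -> Option<String> {
        None
    }
}

/// An implementor written before the hint existed: it only defines `replace`.
struct Upper;

impl Replacer for Upper {
    fn replace(&mut self, matched: &str) -> String {
        matched.to_uppercase()
    }
}

/// A fixed replacement string that opts in to the fast path.
struct Literal(&'static str);

impl Replacer for Literal {
    fn replace(&mut self, _matched: &str) -> String {
        self.0.to_string()
    }

    fn no_expand(&mut self) -> Option<String> {
        Some(self.0.to_string())
    }
}

/// Replaces every occurrence of `needle`, asking for the hint once up front.
fn replace_all<R: Replacer>(text: &str, needle: &str, mut rep: R) -> String {
    if let Some(fixed) = rep.no_expand() {
        // Fast path: the replacement never varies, so skip per-match calls.
        return text.replace(needle, &fixed);
    }
    let mut out = String::new();
    let mut last = 0;
    for (start, m) in text.match_indices(needle) {
        out.push_str(&text[last..start]);
        out.push_str(&rep.replace(m));
        last = start + m.len();
    }
    out.push_str(&text[last..]);
    out
}

fn main() {
    println!("{}", replace_all("a-b-a", "a", Upper));
    println!("{}", replace_all("a-b-a", "a", Literal("z")));
}
```

`Upper` compiles without mentioning the new method, which is the compatibility property described above, while `Literal` lets the caller skip per-match work.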

Closes #21.

rust-highfive · 2015-06-15T04:18:13Z

r? @huonw

(rust_highfive has picked a reviewer for you, use r? to override)

BurntSushi · 2015-06-15T04:18:38Z

cc @WillEngler @alexcrichton

BurntSushi · 2015-06-15T04:19:06Z

Before:

test bench::anchored_literal_long_match      ... bench:         373 ns/iter (+/- 5)
test bench::anchored_literal_long_non_match  ... bench:         202 ns/iter (+/- 12)
test bench::anchored_literal_short_match     ... bench:         380 ns/iter (+/- 135)
test bench::anchored_literal_short_non_match ... bench:         211 ns/iter (+/- 1)
test bench::easy0_1K                         ... bench:       2,723 ns/iter (+/- 101) = 376 MB/s
test bench::easy0_32                         ... bench:         255 ns/iter (+/- 2) = 125 MB/s
test bench::easy0_32K                        ... bench:      81,845 ns/iter (+/- 598) = 400 MB/s
test bench::easy1_1K                         ... bench:       3,872 ns/iter (+/- 783) = 264 MB/s
test bench::easy1_32                         ... bench:         287 ns/iter (+/- 143) = 111 MB/s
test bench::easy1_32K                        ... bench:     115,340 ns/iter (+/- 4,717) = 284 MB/s
test bench::hard_1K                          ... bench:      52,484 ns/iter (+/- 472) = 19 MB/s
test bench::hard_32                          ... bench:       1,923 ns/iter (+/- 49) = 16 MB/s
test bench::hard_32K                         ... bench:   1,710,214 ns/iter (+/- 9,733) = 19 MB/s
test bench::literal                          ... bench:         337 ns/iter (+/- 13)
test bench::match_class                      ... bench:       2,141 ns/iter (+/- 7)
test bench::match_class_in_range             ... bench:       2,301 ns/iter (+/- 7)
test bench::medium_1K                        ... bench:      31,696 ns/iter (+/- 961) = 32 MB/s
test bench::medium_32                        ... bench:       1,155 ns/iter (+/- 71) = 27 MB/s
test bench::medium_32K                       ... bench:   1,016,101 ns/iter (+/- 12,090) = 32 MB/s
test bench::no_exponential                   ... bench:     262,801 ns/iter (+/- 1,332)
test bench::not_literal                      ... bench:       1,729 ns/iter (+/- 3)
test bench::one_pass_long_prefix             ... bench:         779 ns/iter (+/- 4)
test bench::one_pass_long_prefix_not         ... bench:         779 ns/iter (+/- 6)
test bench::one_pass_short_a                 ... bench:       1,943 ns/iter (+/- 10)
test bench::one_pass_short_a_not             ... bench:       2,545 ns/iter (+/- 9)
test bench::one_pass_short_b                 ... bench:       1,364 ns/iter (+/- 4)
test bench::one_pass_short_b_not             ... bench:       2,029 ns/iter (+/- 22)
test bench::replace_all                      ... bench:       3,185 ns/iter (+/- 12)

After:

test bench::anchored_literal_long_match      ... bench:         206 ns/iter (+/- 7)
test bench::anchored_literal_long_non_match  ... bench:          97 ns/iter (+/- 1)
test bench::anchored_literal_short_match     ... bench:         193 ns/iter (+/- 1)
test bench::anchored_literal_short_non_match ... bench:          86 ns/iter (+/- 0)
test bench::easy0_1K                         ... bench:         356 ns/iter (+/- 136) = 2876 MB/s
test bench::easy0_1MB                        ... bench:     352,434 ns/iter (+/- 7,874) = 2974 MB/s
test bench::easy0_32                         ... bench:          72 ns/iter (+/- 21) = 444 MB/s
test bench::easy0_32K                        ... bench:      11,053 ns/iter (+/- 1,388) = 2964 MB/s
test bench::easy1_1K                         ... bench:         331 ns/iter (+/- 162) = 3093 MB/s
test bench::easy1_1MB                        ... bench:     353,723 ns/iter (+/- 6,836) = 2964 MB/s
test bench::easy1_32                         ... bench:          73 ns/iter (+/- 20) = 438 MB/s
test bench::easy1_32K                        ... bench:      10,297 ns/iter (+/- 1,137) = 3182 MB/s
test bench::hard_1K                          ... bench:      34,951 ns/iter (+/- 171) = 29 MB/s
test bench::hard_1MB                         ... bench:  63,323,613 ns/iter (+/- 279,582) = 15 MB/s
test bench::hard_32                          ... bench:       1,131 ns/iter (+/- 13) = 28 MB/s
test bench::hard_32K                         ... bench:   1,099,921 ns/iter (+/- 1,338) = 29 MB/s
test bench::literal                          ... bench:          16 ns/iter (+/- 0)
test bench::match_class                      ... bench:         188 ns/iter (+/- 0)
test bench::match_class_in_range             ... bench:         188 ns/iter (+/- 0)
test bench::match_class_unicode              ... bench:       1,940 ns/iter (+/- 10)
test bench::medium_1K                        ... bench:       5,262 ns/iter (+/- 256) = 194 MB/s
test bench::medium_1MB                       ... bench:   5,295,539 ns/iter (+/- 9,808) = 197 MB/s
test bench::medium_32                        ... bench:         217 ns/iter (+/- 19) = 147 MB/s
test bench::medium_32K                       ... bench:     169,169 ns/iter (+/- 1,606) = 193 MB/s
test bench::no_exponential                   ... bench:     293,739 ns/iter (+/- 1,632)
test bench::not_literal                      ... bench:       1,371 ns/iter (+/- 136)
test bench::one_pass_long_prefix             ... bench:         337 ns/iter (+/- 6)
test bench::one_pass_long_prefix_not         ... bench:         341 ns/iter (+/- 6)
test bench::one_pass_short_a                 ... bench:       1,399 ns/iter (+/- 16)
test bench::one_pass_short_a_not             ... bench:       1,229 ns/iter (+/- 13)
test bench::one_pass_short_b                 ... bench:         844 ns/iter (+/- 24)
test bench::one_pass_short_b_not             ... bench:         849 ns/iter (+/- 45)
test bench::replace_all                      ... bench:         579 ns/iter (+/- 3)

WillEngler · 2015-06-15T15:09:05Z

My thoughts, focusing on the main library (i.e. not the regex! macro):

* Way to go on bringing down the benchmarks! The Aho-Corasick algorithm is new to me.
* In terms of readability, the Input trait is a big help for me. Same goes for simplifying the instruction set.

Sorry I don't have any constructive criticism to offer. Well done!

alexcrichton · 2015-06-15T19:54:15Z

Wow! This is super impressive, nice work @BurntSushi! I think that you're definitely the most familiar with the design of this library, so you're probably the best person to review it as well :)

From a high-level perspective, all I'd have to offer are:

* I vaguely remember that memchr isn't available on Windows, but I'm not sure if this is actually a thing. I just set up appveyor Windows CI for this crate, though, so that may help in finding out!
* Adding two dependencies should be fine for now, so I'm not too worried about that.
* In terms of not modifying the public interface much, that sounds good to me!

I'm fine merging this whenever you're ready, although I'd just wait to get confirmation that it works ok on Windows (I can help with any automation issues, but I think a force-push of this branch will trigger the status checks).

BurntSushi · 2015-06-16T01:20:14Z

@alexcrichton Fantastic. I know virtually nothing about Windows, but I was hoping this meant it had memchr. If not, no biggie, and I'll adjust the memchr crate to provide a fallback implementation.

If tests pass, then I think I'm ready to merge. But I'm tired and it is not wise to merge such large changes while tired, so I will wait for tomorrow. :-)

BurntSushi · 2015-06-16T01:22:15Z

@WillEngler Any comments are good! I'm glad you like Input. It was virtually impossible to understand input handling before. If you come across things that would make a DFA easier to implement, please don't be shy!

RE Aho-Corasick: It's possible that this will be obsoleted by a really good DFA implementation. :-)

BurntSushi · 2015-06-16T01:26:01Z

@alexcrichton Looks like the appveyor stuff failed. I'm not sure how to debug it (and it doesn't look related to memchr). I'll see about firing up a Windows VM tomorrow. Thanks!

alexcrichton · 2015-06-16T17:27:08Z

Oh I think it may have just needed a rebase, but looks like it's all green now!

BurntSushi · 2015-06-16T20:09:53Z

Merged and published on crates.io in regex 0.1.34 and regex_macros 0.1.20 (no changes to regex-syntax).

tafia · 2015-06-24T08:01:25Z

congrats!
it's impressive indeed.

rust-highfive assigned huonw Jun 15, 2015

WillEngler mentioned this pull request Jun 15, 2015

implement a DFA matcher #66

Closed

BurntSushi force-pushed the nfa-compile-refactor branch from 32b86e9 to c86c025 Compare June 16, 2015 01:18

BurntSushi added a commit that referenced this pull request Jun 16, 2015

Merge pull request #91 from rust-lang/nfa-compile-refactor

f09a287

Major refactoring and performance improvements.

BurntSushi merged commit f09a287 into master Jun 16, 2015

BurntSushi deleted the nfa-compile-refactor branch June 16, 2015 20:04

mkpankov mentioned this pull request Jun 24, 2015

do simple literal prefix scanning in regex! #95

Closed

BurntSushi unassigned huonw Sep 21, 2015
