
Major refactoring and performance improvements. #91

Merged
BurntSushi merged 1 commit into master from nfa-compile-refactor on Jun 16, 2015


BurntSushi
Member

Overview of changes:

  • Instruction set has been redesigned to be smaller, mostly by
    collapsing empty-width matches (such as ^, $ and \b) into one
    instruction type. Together with moving instruction matching out
    of the matching engines, this makes the engine code much simpler.
  • Rewrote input handling to use an inline representation of
    Option<char> and clearer position handling with the Input trait.
  • Added a new bounded backtracking matching engine that is invoked for
    small regexes/inputs. It's about twice as fast as the full NFA
    matching engine.
  • Implemented caching for both the NFA and backtracking engines.
    This avoids costly allocations on subsequent uses of the regex.
  • Overhauled prefix handling, both when prefixes are discovered and
    when they are matched. Namely, sets of prefix literals can now be
    extracted from regexes. Depending on what the prefixes look like,
    an Aho-Corasick DFA is built from them.
    (This adds a dependency on the aho-corasick crate.)
  • When appropriate, memchr is now used to jump around in the input
    when there is a single common byte prefix.
    (This adds a dependency on the memchr crate.)
  • Brought the regex! macro up to date. Unfortunately, it still
    implements only the full NFA matching engine and doesn't yet have
    access to the new prefix DFA handling. Thus, in most cases its
    performance is now worse than that of the dynamic implementation.
    The docs have been updated to reflect this change.
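A minimal sketch of the prefix idea from the last few bullets, under loud assumptions: the prefix literals are taken as already extracted (e.g. foo|bar gives {"foo", "bar"}), a plain byte scan stands in for memchr, and a naive multi-literal scan stands in for the Aho-Corasick automaton. Prefixes, next_candidate and find_with_prefixes are hypothetical names, not the crate's API. The point is only that the slower engine runs at candidate positions rather than at every position.

```rust
/// Hypothetical summary of the prefix literals extracted from a regex.
enum Prefixes {
    /// No usable prefix: every position is a candidate.
    Empty,
    /// Every match starts with this one byte: a memchr-style scan suffices.
    Byte(u8),
    /// A set of literals. The real change builds an Aho-Corasick DFA here;
    /// this sketch just tries each literal at each position.
    Literals(Vec<Vec<u8>>),
}

impl Prefixes {
    /// First position at or after `start` where a match could begin.
    fn next_candidate(&self, haystack: &[u8], start: usize) -> Option<usize> {
        if start > haystack.len() {
            return None;
        }
        let rest = &haystack[start..];
        match self {
            Prefixes::Empty => Some(start),
            Prefixes::Byte(b) => rest.iter().position(|x| x == b).map(|i| start + i),
            Prefixes::Literals(lits) => (0..rest.len())
                .find(|&i| lits.iter().any(|lit| rest[i..].starts_with(lit)))
                .map(|i| start + i),
        }
    }
}

/// Jumps between candidates, running the (stand-in) full matcher only there.
fn find_with_prefixes<F>(prefixes: &Prefixes, haystack: &[u8], full_match_at: F) -> Option<usize>
where
    F: Fn(&[u8], usize) -> bool,
{
    let mut pos = 0;
    while let Some(cand) = prefixes.next_candidate(haystack, pos) {
        if full_match_at(haystack, cand) {
            return Some(cand);
        }
        pos = cand + 1;
    }
    None
}

fn main() {
    let hay = b"xx barn foo food";
    // Stand-in for the full engine: only "food" counts as a real match.
    let is_food = |h: &[u8], i: usize| h[i..].starts_with(b"food");
    for p in [
        Prefixes::Empty,
        Prefixes::Byte(b'f'),
        Prefixes::Literals(vec![b"foo".to_vec(), b"bar".to_vec()]),
    ] {
        println!("{:?}", find_with_prefixes(&p, hay, is_food));
    }
}
```

All three strategies find the same match; they differ only in how many positions the full matcher has to look at.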
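The bounded backtracking and caching bullets can be illustrated together with a toy. This is an assumption-laden sketch, not the crate's engine: a four-instruction program (Char, Split, Jmp, Match, hypothetical names) is run by a backtracker that marks every (instruction, position) pair it visits so no pair is explored twice. That bitset has insts × (input length + 1) entries, which suggests why such an engine only makes sense for small regexes and inputs. The job stack and bitset live in a Cache that is reused across searches, so later searches avoid fresh allocations.

```rust
/// A toy instruction set (hypothetical, far smaller than the real one).
#[derive(Clone, Copy)]
enum Inst {
    Char(u8),            // consume one byte equal to this one
    Split(usize, usize), // try the first branch, then the second
    Jmp(usize),          // unconditional jump
    Match,               // success
}

/// Scratch space kept between searches so allocations can be reused.
#[derive(Default)]
struct Cache {
    jobs: Vec<(usize, usize)>, // (instruction index, input position)
    visited: Vec<bool>,        // one flag per (instruction, position) pair
}

/// Anchored search: true if `prog` matches at the start of `input`.
fn backtrack(prog: &[Inst], input: &[u8], cache: &mut Cache) -> bool {
    let width = input.len() + 1;
    cache.jobs.clear();
    cache.visited.clear();
    cache.visited.resize(prog.len() * width, false); // reuses capacity if large enough
    cache.jobs.push((0, 0));
    while let Some((mut pc, mut at)) = cache.jobs.pop() {
        loop {
            let key = pc * width + at;
            if cache.visited[key] {
                break; // this (pc, at) pair has already been explored
            }
            cache.visited[key] = true;
            match prog[pc] {
                Inst::Match => return true,
                Inst::Char(c) => {
                    if at < input.len() && input[at] == c {
                        pc += 1;
                        at += 1;
                    } else {
                        break;
                    }
                }
                Inst::Jmp(target) => pc = target,
                Inst::Split(first, second) => {
                    cache.jobs.push((second, at)); // revisited on backtrack
                    pc = first;
                }
            }
        }
    }
    false
}

fn main() {
    // Hand-compiled program for the anchored regex a+b.
    let prog = [Inst::Char(b'a'), Inst::Split(0, 2), Inst::Char(b'b'), Inst::Match];
    let mut cache = Cache::default(); // allocated once, reused below
    let hays: [&[u8]; 4] = [b"aaab", b"ab", b"b", b"aa"];
    for &hay in hays.iter() {
        println!("{} -> {}", String::from_utf8_lossy(hay), backtrack(&prog, hay, &mut cache));
    }
}
```

Because each pair is visited at most once, the work is bounded by the size of the bitset rather than growing exponentially.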

Surprisingly, all of this required exactly one new application of
unsafe, which is isolated in the memchr crate. (Aho-Corasick has no
unsafe either!)

There should be no breaking changes in this commit. The only public-facing
change is the addition of a method to the Replacer trait, but
it comes with a default implementation so that existing implementors
won't break. (Its purpose is to serve as a hint as to whether or not
replacement strings need to be expanded. This is crucial to speeding
up simple replacements.)
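The compatibility argument rests on Rust's default trait methods: adding a method with a default body does not break existing implementors. In the hypothetical sketch below, the trait and the method names replace_caps and no_expansion are illustrative, not necessarily the crate's exact API, and the matched &str stands in for capture groups. A replacer that needs no expansion overrides the hint, so the caller can copy one fixed string; an implementor written before the hint existed still compiles and takes the general path.

```rust
use std::borrow::Cow;

/// Hypothetical replacer trait with an added hint method.
trait Replacer {
    /// Produce the replacement for one match (here: from the matched text).
    fn replace_caps(&mut self, matched: &str) -> Cow<'_, str>;

    /// Added method with a default body, so existing implementors keep
    /// compiling. Some(s) means "use s verbatim, no expansion needed".
    fn no_expansion(&mut self) -> Option<Cow<'_, str>> {
        None
    }
}

/// A replacement string with no $-references: it can opt into the fast path.
struct Literal(String);

impl Replacer for Literal {
    fn replace_caps(&mut self, _matched: &str) -> Cow<'_, str> {
        Cow::Borrowed(self.0.as_str())
    }
    fn no_expansion(&mut self) -> Option<Cow<'_, str>> {
        Some(Cow::Borrowed(self.0.as_str()))
    }
}

/// An implementor written before the hint existed: it relies on the default.
struct Shout;

impl Replacer for Shout {
    fn replace_caps(&mut self, matched: &str) -> Cow<'_, str> {
        Cow::Owned(matched.to_uppercase())
    }
}

/// Replaces every occurrence of `needle`, asking for the hint only once.
fn replace_all<R: Replacer>(haystack: &str, needle: &str, mut rep: R) -> String {
    let fixed: Option<String> = rep.no_expansion().map(|c| c.into_owned());
    let mut out = String::with_capacity(haystack.len());
    let mut last = 0;
    for (start, m) in haystack.match_indices(needle) {
        out.push_str(&haystack[last..start]);
        match &fixed {
            Some(f) => out.push_str(f),                     // fast path: copy verbatim
            None => out.push_str(&rep.replace_caps(m)),     // general path, per match
        }
        last = start + m.len();
    }
    out.push_str(&haystack[last..]);
    out
}

fn main() {
    println!("{}", replace_all("a cat, a hat", "at", Literal("og".to_string())));
    println!("{}", replace_all("a cat, a hat", "at", Shout));
}
```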

Closes #21.

@rust-highfive

r? @huonw

(rust_highfive has picked a reviewer for you, use r? to override)

@BurntSushi
Member Author

cc @WillEngler @alexcrichton

@BurntSushi
Member Author

Before:

test bench::anchored_literal_long_match      ... bench:         373 ns/iter (+/- 5)
test bench::anchored_literal_long_non_match  ... bench:         202 ns/iter (+/- 12)
test bench::anchored_literal_short_match     ... bench:         380 ns/iter (+/- 135)
test bench::anchored_literal_short_non_match ... bench:         211 ns/iter (+/- 1)
test bench::easy0_1K                         ... bench:       2,723 ns/iter (+/- 101) = 376 MB/s
test bench::easy0_32                         ... bench:         255 ns/iter (+/- 2) = 125 MB/s
test bench::easy0_32K                        ... bench:      81,845 ns/iter (+/- 598) = 400 MB/s
test bench::easy1_1K                         ... bench:       3,872 ns/iter (+/- 783) = 264 MB/s
test bench::easy1_32                         ... bench:         287 ns/iter (+/- 143) = 111 MB/s
test bench::easy1_32K                        ... bench:     115,340 ns/iter (+/- 4,717) = 284 MB/s
test bench::hard_1K                          ... bench:      52,484 ns/iter (+/- 472) = 19 MB/s
test bench::hard_32                          ... bench:       1,923 ns/iter (+/- 49) = 16 MB/s
test bench::hard_32K                         ... bench:   1,710,214 ns/iter (+/- 9,733) = 19 MB/s
test bench::literal                          ... bench:         337 ns/iter (+/- 13)
test bench::match_class                      ... bench:       2,141 ns/iter (+/- 7)
test bench::match_class_in_range             ... bench:       2,301 ns/iter (+/- 7)
test bench::medium_1K                        ... bench:      31,696 ns/iter (+/- 961) = 32 MB/s
test bench::medium_32                        ... bench:       1,155 ns/iter (+/- 71) = 27 MB/s
test bench::medium_32K                       ... bench:   1,016,101 ns/iter (+/- 12,090) = 32 MB/s
test bench::no_exponential                   ... bench:     262,801 ns/iter (+/- 1,332)
test bench::not_literal                      ... bench:       1,729 ns/iter (+/- 3)
test bench::one_pass_long_prefix             ... bench:         779 ns/iter (+/- 4)
test bench::one_pass_long_prefix_not         ... bench:         779 ns/iter (+/- 6)
test bench::one_pass_short_a                 ... bench:       1,943 ns/iter (+/- 10)
test bench::one_pass_short_a_not             ... bench:       2,545 ns/iter (+/- 9)
test bench::one_pass_short_b                 ... bench:       1,364 ns/iter (+/- 4)
test bench::one_pass_short_b_not             ... bench:       2,029 ns/iter (+/- 22)
test bench::replace_all                      ... bench:       3,185 ns/iter (+/- 12)

After:

test bench::anchored_literal_long_match      ... bench:         206 ns/iter (+/- 7)
test bench::anchored_literal_long_non_match  ... bench:          97 ns/iter (+/- 1)
test bench::anchored_literal_short_match     ... bench:         193 ns/iter (+/- 1)
test bench::anchored_literal_short_non_match ... bench:          86 ns/iter (+/- 0)
test bench::easy0_1K                         ... bench:         356 ns/iter (+/- 136) = 2876 MB/s
test bench::easy0_1MB                        ... bench:     352,434 ns/iter (+/- 7,874) = 2974 MB/s
test bench::easy0_32                         ... bench:          72 ns/iter (+/- 21) = 444 MB/s
test bench::easy0_32K                        ... bench:      11,053 ns/iter (+/- 1,388) = 2964 MB/s
test bench::easy1_1K                         ... bench:         331 ns/iter (+/- 162) = 3093 MB/s
test bench::easy1_1MB                        ... bench:     353,723 ns/iter (+/- 6,836) = 2964 MB/s
test bench::easy1_32                         ... bench:          73 ns/iter (+/- 20) = 438 MB/s
test bench::easy1_32K                        ... bench:      10,297 ns/iter (+/- 1,137) = 3182 MB/s
test bench::hard_1K                          ... bench:      34,951 ns/iter (+/- 171) = 29 MB/s
test bench::hard_1MB                         ... bench:  63,323,613 ns/iter (+/- 279,582) = 15 MB/s
test bench::hard_32                          ... bench:       1,131 ns/iter (+/- 13) = 28 MB/s
test bench::hard_32K                         ... bench:   1,099,921 ns/iter (+/- 1,338) = 29 MB/s
test bench::literal                          ... bench:          16 ns/iter (+/- 0)
test bench::match_class                      ... bench:         188 ns/iter (+/- 0)
test bench::match_class_in_range             ... bench:         188 ns/iter (+/- 0)
test bench::match_class_unicode              ... bench:       1,940 ns/iter (+/- 10)
test bench::medium_1K                        ... bench:       5,262 ns/iter (+/- 256) = 194 MB/s
test bench::medium_1MB                       ... bench:   5,295,539 ns/iter (+/- 9,808) = 197 MB/s
test bench::medium_32                        ... bench:         217 ns/iter (+/- 19) = 147 MB/s
test bench::medium_32K                       ... bench:     169,169 ns/iter (+/- 1,606) = 193 MB/s
test bench::no_exponential                   ... bench:     293,739 ns/iter (+/- 1,632)
test bench::not_literal                      ... bench:       1,371 ns/iter (+/- 136)
test bench::one_pass_long_prefix             ... bench:         337 ns/iter (+/- 6)
test bench::one_pass_long_prefix_not         ... bench:         341 ns/iter (+/- 6)
test bench::one_pass_short_a                 ... bench:       1,399 ns/iter (+/- 16)
test bench::one_pass_short_a_not             ... bench:       1,229 ns/iter (+/- 13)
test bench::one_pass_short_b                 ... bench:         844 ns/iter (+/- 24)
test bench::one_pass_short_b_not             ... bench:         849 ns/iter (+/- 45)
test bench::replace_all                      ... bench:         579 ns/iter (+/- 3)
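To compare the two runs, the ratio before/after gives a per-benchmark speedup. The MB/s column for the 1K and 32K rows appears consistent with bytes × 1000 / (ns per iteration), with 1K = 1,024 bytes, 32K = 32,768 bytes and 1 MB = 10^6 bytes. The small sketch below uses only numbers copied from the tables; its ratio for no_exponential comes out below 1, i.e. that benchmark got slightly slower.

```rust
/// (name, ns/iter before, ns/iter after), copied from the tables above.
const RESULTS: [(&str, u64, u64); 6] = [
    ("literal", 337, 16),
    ("easy0_32K", 81_845, 11_053),
    ("medium_32K", 1_016_101, 169_169),
    ("hard_32K", 1_710_214, 1_099_921),
    ("replace_all", 3_185, 579),
    ("no_exponential", 262_801, 293_739),
];

/// Speedup factor: values above 1.0 mean the new code is faster.
fn speedup(before_ns: u64, after_ns: u64) -> f64 {
    before_ns as f64 / after_ns as f64
}

/// Throughput in MB/s (1 MB = 10^6 bytes), truncated to an integer.
fn mb_per_sec(bytes: u64, ns_per_iter: u64) -> u64 {
    bytes * 1000 / ns_per_iter
}

fn main() {
    for (name, before, after) in RESULTS.iter() {
        println!("{:<16} {:>6.2}x", name, speedup(*before, *after));
    }
    // easy0_32K after the change: 32,768 bytes in 11,053 ns.
    println!("easy0_32K: {} MB/s", mb_per_sec(32 * 1024, 11_053));
}
```

This prints speedups of roughly 21x for literal, 7.4x for easy0_32K, 6x for medium_32K, 1.55x for hard_32K, 5.5x for replace_all and 0.89x for no_exponential.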

@WillEngler

My thoughts, focusing on the main library (i.e. not the regex! macro):

  1. Way to go on bringing down the benchmarks! The Aho-Corasick algorithm is new to me.
  2. In terms of readability, the Input trait is a big help for me. Same goes for simplifying the instruction set.

Sorry I don't have any constructive criticism to offer. Well done!

@alexcrichton
Member

Wow! This is super impressive, nice work @BurntSushi! I think that you're definitely the most familiar with the design of this library, so you're probably the best person to review it as well :)

From a high-level perspective, all I'd have to offer are:

  • I vaguely remember that memchr isn't available on Windows, but I'm not sure if this is actually a thing. I just set up appveyor Windows CI for this crate, though, so that may help in finding out!
  • Adding two dependencies should be fine for now, so I'm not too worried about that.
  • In terms of not modifying the public interface much, that sounds good to me!

I'm fine merging this whenever you're ready, although I'd just wait to get confirmation that it works ok on Windows (I can help with any automation issues, but I think a force-push of this branch will trigger the status checks).

@BurntSushi
Member Author

@alexcrichton Fantastic. I know virtually nothing about Windows, but I was hoping this meant it had memchr. If not, no biggie, and I'll adjust the memchr crate to provide a fallback implementation.
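A fallback of that kind could be as simple as a byte loop. The hypothetical sketch below (memchr_fallback is a made-up name, and this is not the memchr crate's code) goes one step further without any unsafe: it checks eight bytes per step using the classic "does this word contain a zero byte?" bit trick, then pinpoints the byte with a plain scan.

```rust
/// 0x01 repeated in every byte of a 64-bit word.
const LO: u64 = 0x0101_0101_0101_0101;
/// 0x80 (the high bit) repeated in every byte of a 64-bit word.
const HI: u64 = 0x8080_8080_8080_8080;

/// True if at least one byte of `x` is zero (classic bit-twiddling test).
fn has_zero_byte(x: u64) -> bool {
    (x.wrapping_sub(LO) & !x & HI) != 0
}

/// Safe fallback: index of the first `needle` in `haystack`, if any.
fn memchr_fallback(needle: u8, haystack: &[u8]) -> Option<usize> {
    let pattern = LO * u64::from(needle); // needle copied into all 8 bytes
    let mut chunks = haystack.chunks_exact(8);
    let mut offset = 0;
    for chunk in &mut chunks {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(chunk);
        // XOR turns every byte equal to `needle` into a zero byte.
        if has_zero_byte(u64::from_ne_bytes(buf) ^ pattern) {
            if let Some(i) = chunk.iter().position(|&b| b == needle) {
                return Some(offset + i);
            }
        }
        offset += 8;
    }
    // Fewer than 8 bytes left: scan them one at a time.
    chunks
        .remainder()
        .iter()
        .position(|&b| b == needle)
        .map(|i| offset + i)
}

fn main() {
    let hay = b"the quick brown fox jumps";
    println!("{:?}", memchr_fallback(b'j', hay));
}
```

Byte order does not matter here, since the word is only tested for whether any byte is zero; the exact index always comes from the bytewise scan.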

If tests pass, then I think I'm ready to merge. But I'm tired and it is not wise to merge such large changes while tired, so I will wait for tomorrow. :-)

@BurntSushi
Member Author

@WillEngler Any comments are good! I'm glad you like Input. It was virtually impossible to understand input handling before. If you come across things that would make a DFA easier to implement, please don't be shy!

RE Aho-Corasick: It's possible that this will be obsoleted by a really good DFA implementation. :-)

@BurntSushi
Member Author

@alexcrichton Looks like the appveyor stuff failed. I'm not sure how to debug it (and it doesn't look related to memchr). I'll see about firing up a Windows VM tomorrow. Thanks!

@alexcrichton
Member

Oh I think it may have just needed a rebase, but looks like it's all green now!

BurntSushi added a commit that referenced this pull request on Jun 16, 2015: "Major refactoring and performance improvements."
BurntSushi merged commit f09a287 into master on Jun 16, 2015.
BurntSushi deleted the nfa-compile-refactor branch on June 16, 2015 at 20:04.
@BurntSushi
Member Author

Merged and published on crates.io in regex 0.1.34 and regex_macros 0.1.20 (no changes to regex-syntax).

@tafia

tafia commented Jun 24, 2015

congrats!
it's impressive indeed.


Successfully merging this pull request may close these issues.

optimize literal alternations
6 participants