Major refactoring and performance improvements. #91

Merged 1 commit on Jun 16, 2015

Commits on Jun 16, 2015

  1. Major refactoring and performance improvements.

    Overview of changes:
    
    * The instruction set has been redesigned to be smaller, mostly by
      collapsing empty-width matches into one instruction type.
      Together with moving instruction matching out of the matching
      engine, this makes the matching engine code much simpler.
    * Rewrote input handling to use an inline representation of
      `Option<char>` and clearer position handling with the `Input` trait.
    * Added a new bounded backtracking matching engine that is invoked for
      small regexes/inputs. It's about twice as fast as the full NFA
      matching engine.
    * Implemented caching for both the NFA and backtracking engines.
      This avoids costly allocations on subsequent uses of the regex.
    * Overhauled prefix handling at both discovery and matching.
      Namely, sets of prefix literals can now be extracted from regexes.
      Depending on what the prefixes look like, an Aho-Corasick DFA
      is built from them.
      (This adds a dependency on the `aho-corasick` crate.)
    * Where there is a single common byte prefix, `memchr` is now used
      to jump around in the input.
      (This adds a dependency on the `memchr` crate.)
    * Brought the `regex!` macro up to date. Unfortunately, it still
      implements the full NFA matching engine and doesn't yet have
      access to the new prefix DFA handling. Thus, its performance
      is now *worse* than that of the dynamic implementation in most
      cases. The docs have been updated to reflect this change.
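
The `memchr` bullet above can be pictured with a minimal sketch. It assumes every match must begin with one known byte, so the engine only needs to be tried at offsets where that byte occurs. `find_byte` is a plain standard-library stand-in for `memchr::memchr`, and `candidate_starts` is a hypothetical helper for illustration, not code from this commit.

```rust
/// Stand-in for `memchr::memchr`: index of the first `needle` byte in
/// `haystack`, or `None`. The real crate is much faster; this only
/// mirrors its observable behavior using the standard library.
fn find_byte(needle: u8, haystack: &[u8]) -> Option<usize> {
    haystack.iter().position(|&b| b == needle)
}

/// Hypothetical helper: collects every offset at which a match could
/// begin, assuming all matches start with the byte `first`. A real
/// engine would run the matcher at each offset instead of collecting.
fn candidate_starts(first: u8, input: &[u8]) -> Vec<usize> {
    let mut starts = Vec::new();
    let mut at = 0;
    // `at` never exceeds `input.len()`, so the slice below is always valid.
    while let Some(i) = find_byte(first, &input[at..]) {
        starts.push(at + i);
        // Skip past this candidate and keep scanning.
        at += i + 1;
    }
    starts
}
```

Calling `candidate_starts(b'f', b"a foo, a far foe")` visits only the three positions holding `f`, rather than all sixteen bytes of the input.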
    
    Surprisingly, all of this required exactly one new application of
    `unsafe`, which is isolated in the `memchr` crate. (Aho-Corasick has no
    `unsafe` either!)
    
    There should be *no* breaking changes in this commit. The only
    public-facing change is the addition of a method to the `Replacer` trait, but
    it comes with a default implementation so that existing implementors
    won't break. (Its purpose is to serve as a hint as to whether or not
    replacement strings need to be expanded. This is crucial to speeding
    up simple replacements.)
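
As a rough illustration of why a default implementation keeps existing implementors compiling, here is a minimal sketch. The trait `SketchReplacer`, its method names, and `replace_all` are hypothetical and heavily simplified; they are not the crate's actual signatures, and literal substring search stands in for regex matching.

```rust
use std::borrow::Cow;

/// Hypothetical, simplified stand-in for a replacer trait.
trait SketchReplacer {
    /// Builds the replacement text for one match.
    fn replacement(&mut self, matched: &str) -> String;

    /// Hint: `Some(text)` means the replacement is a fixed string that
    /// needs no expansion. The default returns `None`, so implementors
    /// written before this method existed still compile unchanged.
    fn no_expand(&mut self) -> Option<Cow<'_, str>> {
        None
    }
}

/// A fixed replacement string; it opts in to the fast path.
struct Literal(&'static str);

impl SketchReplacer for Literal {
    fn replacement(&mut self, _matched: &str) -> String {
        self.0.to_string()
    }
    fn no_expand(&mut self) -> Option<Cow<'_, str>> {
        Some(Cow::Borrowed(self.0))
    }
}

/// An "older" implementor that relies on the default hint.
struct Upper;

impl SketchReplacer for Upper {
    fn replacement(&mut self, matched: &str) -> String {
        matched.to_uppercase()
    }
}

/// Replaces every occurrence of the literal `needle` in `input`.
fn replace_all<R: SketchReplacer>(input: &str, needle: &str, mut rep: R) -> String {
    // Fast path: the replacer promises a fixed string, so skip per-match work.
    if let Some(fixed) = rep.no_expand() {
        return input.replace(needle, &fixed);
    }
    // Slow path: ask the replacer for each match in turn.
    let mut out = String::new();
    let mut last = 0;
    for (start, m) in input.match_indices(needle) {
        out.push_str(&input[last..start]);
        out.push_str(&rep.replacement(m));
        last = start + m.len();
    }
    out.push_str(&input[last..]);
    out
}
```

Here `replace_all("a-b-c", "-", Literal("+"))` takes the fast path, while `replace_all("a-b-c", "b", Upper)` falls back to the per-match path without `Upper` ever mentioning the new method.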
    
    Closes #21.
    BurntSushi committed Jun 16, 2015
    Commit c86c025