fix a few bugs #270

Merged 4 commits on Aug 5, 2016

Commits on Aug 4, 2016

  1. Reset the DFA cache based on better approximation.

    Typically, when a DFA blows up in size, it happens for two reasons:
    
      1. It accumulates many states.
      2. Each state accumulates more and more NFA states.
    
    Our previous approximation of the DFA's size accounted for (1) but used
    a constant for (2). As a result, the approximate and actual sizes of the
    DFA could differ by megabytes.
    
    Since computing the actual size from scratch is expensive, we instead
    maintain it as a running sum that is updated as states are added.
    
    The end result is that we respect the memory limit set by the caller
    more strictly.
    BurntSushi committed Aug 4, 2016 (commit a1809fb)
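
The running-sum idea from the commit above can be sketched as follows. This is a minimal illustration under stated assumptions, not the crate's actual code: `State`, `Cache`, `approx_size`, and `add_state` are hypothetical names, and the per-state size formula is a guess at what "accounting for NFA states" might look like.

```rust
use std::mem::size_of;

/// Hypothetical DFA state: the set of NFA state ids it represents.
struct State {
    nfa_states: Vec<u32>,
}

impl State {
    /// Approximate memory use of this state in bytes. Unlike a constant,
    /// this grows with the number of NFA states held by the DFA state.
    fn approx_size(&self) -> usize {
        size_of::<State>() + self.nfa_states.len() * size_of::<u32>()
    }
}

/// Hypothetical cache that tracks its approximate size incrementally.
struct Cache {
    states: Vec<State>,
    approx_bytes: usize, // running sum, updated on every insertion
    limit_bytes: usize,  // memory budget set by the caller
    resets: usize,       // how often the cache was wiped
}

impl Cache {
    fn new(limit_bytes: usize) -> Cache {
        Cache { states: Vec::new(), approx_bytes: 0, limit_bytes, resets: 0 }
    }

    /// Adds a state, wiping the cache first if the budget would be exceeded.
    fn add_state(&mut self, state: State) {
        let size = state.approx_size();
        if self.approx_bytes + size > self.limit_bytes {
            self.states.clear();
            self.approx_bytes = 0;
            self.resets += 1;
        }
        self.approx_bytes += size;
        self.states.push(state);
    }
}

fn main() {
    let mut cache = Cache::new(1024);
    // States that hold more and more NFA states, as in reason (2).
    for n in 0..50 {
        cache.add_state(State { nfa_states: vec![0; n] });
    }
    println!(
        "states={} approx_bytes={} resets={}",
        cache.states.len(),
        cache.approx_bytes,
        cache.resets
    );
}
```

Because the sum is updated on insertion, checking the budget costs one addition and one comparison per new state, with no need to walk the whole cache.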

Commits on Aug 5, 2016

  1. Disable literal optimizations for partially anchored regexes.

    The specific problem here is that our literal search doesn't know about
    anchors, so it searches for every literal detected in a regex anywhere in
    the haystack. In a regex like `a|^b`, the literal `b` should only be
    searched for at the beginning of the haystack and nowhere else.
    
    The right way to fix this is probably to make the literal detector smarter,
    but the literal detector is already too complex. Instead, this commit
    detects whether a regex is partially anchored (that is, when the regex has
    at least one matchable sub-expression that is anchored), and if so,
    disables the literal engine.
    
    Note that this doesn't disable all literal optimizations, only the one
    that bypasses the regex engines entirely. Both the DFA and the NFA still
    use literal prefixes to search. Namely, if a prefix search finds a literal
    that must be anchored at a position where the anchor doesn't hold, the
    regex engine rules it out as a false positive.
    
    Fixes rust-lang#268.
    BurntSushi committed Aug 5, 2016 (commit 225f8e1)
  2. Adjust the end of the haystack after a DFA match.

    If the caller asks for captures, and the DFA runs, and there's a match,
    and there are actually captures in the regex, then the haystack sent to
    the NFA is shortened to correspond to only the match plus some room at the
    end for matching zero-width assertions. This "room at the end" needs to be
    big enough to fit at least one UTF-8 encoded Unicode codepoint.
    
    Fixes rust-lang#264.
    BurntSushi committed Aug 5, 2016 (commit 1882b2c)
  3. Don't build regex-debug on Rust 1.3.

    Docopt uses lazy_static! 2.x, but lazy_static required a new minimum
    Rust version in 2.1.
    BurntSushi committed Aug 5, 2016 (commit 16931b0)
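
The partially anchored case from commit 225f8e1 (`a|^b`) can be illustrated with a toy model. This is a sketch under stated assumptions, not the regex crate's implementation: `Alt`, `find_naive`, `find_checked`, and `search` are hypothetical names, the "regex" is just a list of literal alternatives, and `find_checked` merely stands in for a full regex engine that understands anchors.

```rust
/// One alternative of a hypothetical literal-only regex such as `a|^b`.
struct Alt {
    literal: &'static str,
    anchored_start: bool, // true for `^b`
}

/// Anchor-unaware literal search: reports the leftmost occurrence of any
/// literal. This mirrors the bug described in the commit message.
fn find_naive(alts: &[Alt], haystack: &str) -> Option<usize> {
    alts.iter().filter_map(|a| haystack.find(a.literal)).min()
}

/// Stand-in for a full regex engine: an anchored literal is accepted only
/// at offset 0, while an unanchored literal may match anywhere.
fn find_checked(alts: &[Alt], haystack: &str) -> Option<usize> {
    alts.iter()
        .filter_map(|a| {
            if a.anchored_start {
                if haystack.starts_with(a.literal) { Some(0) } else { None }
            } else {
                haystack.find(a.literal)
            }
        })
        .min()
}

/// The shape of the fix: if any alternative is anchored, skip the
/// literal-only engine and use the anchor-aware one instead.
fn search(alts: &[Alt], haystack: &str) -> Option<usize> {
    if alts.iter().any(|a| a.anchored_start) {
        find_checked(alts, haystack)
    } else {
        find_naive(alts, haystack)
    }
}

fn main() {
    // Models `a|^b`.
    let alts = [
        Alt { literal: "a", anchored_start: false },
        Alt { literal: "b", anchored_start: true },
    ];
    for h in ["xb", "xa", "bx"].iter() {
        println!(
            "{:?}: naive={:?} fixed={:?}",
            h,
            find_naive(&alts, h),
            search(&alts, h)
        );
    }
}
```

On the haystack `xb`, the naive search reports a match at offset 1 even though `^b` cannot match there, while `search` correctly reports no match.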
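
The "room at the end" from commit 1882b2c can be sketched as simple offset arithmetic. This is an illustration under stated assumptions: `shortened_end` and `first_char` are hypothetical helpers, and the room of 4 bytes is chosen here because that is the longest UTF-8 encoding of a single codepoint. The commit message does not state the exact value the crate uses.

```rust
/// Longest UTF-8 encoding of a single Unicode codepoint, in bytes.
const MAX_UTF8_LEN: usize = 4;

/// End offset of the shortened haystack handed to the capture engine:
/// the match end plus `room` bytes, clamped to the haystack length.
fn shortened_end(haystack_len: usize, match_end: usize, room: usize) -> usize {
    std::cmp::min(haystack_len, match_end.saturating_add(room))
}

/// Decodes the first complete codepoint of `bytes`, if there is one.
/// A zero-width assertion such as a word boundary needs this lookahead.
fn first_char(bytes: &[u8]) -> Option<char> {
    for n in 1..=bytes.len().min(MAX_UTF8_LEN) {
        if let Ok(s) = std::str::from_utf8(&bytes[..n]) {
            return s.chars().next();
        }
    }
    None
}

fn main() {
    // "ab" is the match; a 4-byte codepoint follows it.
    let haystack = "ab\u{1F600}cd".as_bytes();
    let match_end = 2;

    // Too little room: the next codepoint is cut in the middle.
    let too_short = shortened_end(haystack.len(), match_end, 1);
    println!("{:?}", first_char(&haystack[match_end..too_short]));

    // Enough room for any single codepoint.
    let enough = shortened_end(haystack.len(), match_end, MAX_UTF8_LEN);
    println!("{:?}", first_char(&haystack[match_end..enough]));
}
```

With only one byte of room, the lookahead sees a truncated codepoint and cannot decode it. With four bytes, the codepoint after the match is always complete whenever it exists in the original haystack.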