add multi-regex matching #156
I managed to get something half-working. Even though I wound up with something less powerful than what I wanted, I do think being able to tell which regexes match some text in a single pass is useful. At least one important use case I can think of is a URL router, which could contain many regexes but usually has only one URL to search. It would be very hard to build something like that out-of-crate and keep it fast, which to me suggests it has a place in this library. If a caller needs more detailed info about the match (like captures), then they can re-run only the matched regexes on the search text, one at a time.
Regex sets permit matching multiple (possibly overlapping) regular expressions in a single scan of the search text. This adds a few new types, with `RegexSet` being the primary one. All matching engines support regex sets, including the lazy DFA. This commit also refactors a lot of the code around handling captures into a central `Search`, which now also includes a set of matches that is used by regex sets to determine which regex has matched. We also merged the `Program` and `Insts` types, which were split up when adding the lazy DFA, but the code seemed more complicated because of it. Closes #156.
Lately I've been thinking a lot about providing a "multi regex" similar to RE2's "regex set" functionality. The problem they solve is, "I have multiple regexes that I want to run over some large search text once, and I want to see every match." The poor man's way of doing this is to combine them in a single regex of alternations, e.g., `re1|re2|re3|...`. There are two problems with that, though.
We can start relatively simple by providing an API that answers three questions, exposed as `is_match`, `find`, and `find_iter`.
Adding capture groups to this API seems possible, but is tricky, so I suggest doing that after an initial implementation is done.