Regex::captures forces an allocation on every call #219
Comments
Well, this doesn't quite work.

    fn new_captures<'r, 't>(&'r self, text: &'t str) -> Captures { ... }

The reason being that all [...]

Another alternative is to move ownership of [...]

    impl Regex {
        // Returns empty storage for captures for use with read_captures.
        fn new_captures(&self) -> Captures<'static> { ... }

        // Sets capture locations from searching over text.
        // If the match failed, the captures returned will report a failed match.
        fn read_captures<'a, 't>(&self, caps: Captures<'a>, text: &'t str) -> Captures<'t> { ... }

        fn read_captures_iter<'r, 't>(&self, text: &'t str) -> ReadCapturesIter<'r, 't> { ... }
    }

    struct ReadCapturesIter<'r, 't> { ... }

    impl<'r, 't> ReadCapturesIter<'r, 't> {
        // Sets capture locations from searching over text.
        // If the match failed, the captures returned will report a failed match.
        fn captures<'a>(&mut self, caps: Captures<'a>) -> Captures<'t> { ... }
    }

And we'll also need:

    impl<'t> Captures<'t> {
        // Returns true if and only if the most recent search using these captures
        // returned a match.
        fn is_match(&self) -> bool { ... }
    }

If we do go this route, I think it at least answers the question of whether this should replace the existing API or not (it shouldn't, because it's clunky). I don't think I've seen this kind of API used much elsewhere, which also concerns me.

I guess the lifetime pattern here is a bit weird. Basically, we want to enable the caller to reuse a particular allocation, but it's being reused inside a larger structure that offers more conveniences, and those conveniences require borrowing the searched text. And the searched text can vary while the allocation is reused.

Another alternative is to create a new captures type ( [...]

Instead of defining a new type, we could do something like exposing the underlying representation, e.g., [...]
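To make the lifetime pattern in this comment concrete, here is a minimal std-only sketch of offset storage that borrows nothing from the searched text, so one allocation can be reused across different haystacks. Everything in it is hypothetical and for illustration only: `Locations`, `KeyValue`, `new_locations` and `read_captures` are invented names, and the toy matcher (first `key=value` pair of ASCII alphanumerics) stands in for a real regex; it is not the crate's API.

```rust
/// Hypothetical reusable storage: one (start, end) byte-offset pair per group.
/// It stores offsets only, so it holds no borrow of the searched text.
#[derive(Debug, Default)]
struct Locations {
    slots: Vec<Option<(usize, usize)>>,
}

impl Locations {
    /// Offsets of group `i` from the most recent search, if it participated.
    fn get(&self, i: usize) -> Option<(usize, usize)> {
        self.slots.get(i).copied().flatten()
    }
}

/// Toy stand-in for a compiled regex: finds the first `key=value` where both
/// sides are non-empty runs of ASCII alphanumerics (an assumption of this sketch).
struct KeyValue;

impl KeyValue {
    const GROUPS: usize = 3; // whole match, key, value

    /// Allocates storage of the right size once.
    fn new_locations(&self) -> Locations {
        Locations { slots: vec![None; Self::GROUPS] }
    }

    /// Overwrites `locs` in place and reports whether a match was found.
    fn read_captures(&self, locs: &mut Locations, text: &str) -> bool {
        // clear + resize keeps the existing capacity, so no new allocation.
        locs.slots.clear();
        locs.slots.resize(Self::GROUPS, None);
        let bytes = text.as_bytes();
        for (eq, _) in text.match_indices('=') {
            let start = bytes[..eq]
                .iter()
                .rposition(|b| !b.is_ascii_alphanumeric())
                .map_or(0, |p| p + 1);
            let end = bytes[eq + 1..]
                .iter()
                .position(|b| !b.is_ascii_alphanumeric())
                .map_or(bytes.len(), |p| eq + 1 + p);
            if start < eq && eq + 1 < end {
                locs.slots[0] = Some((start, end));
                locs.slots[1] = Some((start, eq));
                locs.slots[2] = Some((eq + 1, end));
                return true;
            }
        }
        false
    }
}

fn main() {
    let re = KeyValue;
    let mut locs = re.new_locations();
    // The same storage is reused while the searched text varies.
    for line in ["a=1", "no match", "x y=zz"] {
        if re.read_captures(&mut locs, line) {
            let (s, e) = locs.get(2).unwrap();
            println!("value = {}", &line[s..e]);
        }
    }
}
```

Because the storage carries no text lifetime, the caller slices the haystack itself. That is the convenience the comment says is lost relative to a `Captures` value that borrows the text.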
This commit exposes two new areas of API surface:

1. A new `captures_read` method which provides a way to access the offsets of submatches while amortizing the allocation of the space required to store those offsets. Callers should still of course prefer to use the higher level `captures` method, but if performance dictates, this lower level API may be useful.
2. New "at" variants of `shortest_match`/`is_match`/`find`/`captures`/`captures_read` that permit controlling where the start of a search begins within a slice. This is typically useful for controlling the match semantics of look-around operators such as `^` and `$`, and is necessary for implementing non-overlapping iterators.

Fixes rust-lang#219
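As a rough illustration of why an "at" variant differs from searching a subslice, the std-only sketch below uses a toy matcher that finds a word only when it is not preceded by an ASCII alphanumeric byte (loosely like a leading word boundary). `WordStart`, `find_at` and `find_all` are hypothetical names invented for this example, not the crate's implementation. The point is that `find_at` can still look at the byte before `start`, while a subslice hides that context.

```rust
/// Toy stand-in for a pattern with a look-behind condition: `word` matches
/// only at offset 0 or right after a non-alphanumeric ASCII byte.
struct WordStart<'p> {
    word: &'p str,
}

impl<'p> WordStart<'p> {
    /// Searches from byte offset `start`, but evaluates the boundary
    /// condition against the full haystack. Offsets are relative to `text`.
    fn find_at(&self, text: &str, start: usize) -> Option<(usize, usize)> {
        let bytes = text.as_bytes();
        let mut from = start;
        // `get` returns None for out-of-range or non-boundary offsets.
        while let Some(rel) = text.get(from..)?.find(self.word) {
            let s = from + rel;
            if s == 0 || !bytes[s - 1].is_ascii_alphanumeric() {
                return Some((s, s + self.word.len()));
            }
            // Rejected candidate: step past its first character and retry.
            from = s + text[s..].chars().next().map_or(1, char::len_utf8);
        }
        None
    }

    /// Non-overlapping matches, built by restarting each search at the
    /// end of the previous match.
    fn find_all(&self, text: &str) -> Vec<(usize, usize)> {
        let mut out = Vec::new();
        let mut start = 0;
        while let Some((s, e)) = self.find_at(text, start) {
            out.push((s, e));
            if e == s {
                break; // empty word: stop rather than loop forever
            }
            start = e;
        }
        out
    }
}

fn main() {
    let re = WordStart { word: "cat" };
    let text = "concat cat";
    // Starting at offset 3 inside the full haystack: the 'n' before "cat"
    // is visible, so that occurrence is rejected.
    println!("{:?}", re.find_at(text, 3));
    // Searching the subslice instead: the context is lost and the same
    // bytes now look like the start of the text.
    println!("{:?}", re.find_at(&text[3..], 0));
    println!("{:?}", re.find_all("cat concat cat"));
}
```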
Every time one calls `captures`, a new allocation for storing the location of captures is created. This allocation has size proportional to the number of captures in the regex.

This is also true for `captures_iter`, where every iteration results in a new allocation. An iterator could reuse the allocation in theory, but ownership of the captures is transferred to the caller. Even if we could reuse the capture locations, we couldn't give the caller a mutable borrow, since that immediately puts us in the "streaming iterator" conundrum.

The most sensible API I can think of is to:

1. [...] `Captures` values from a given `Regex` such that it has the right size.
2. [...] `Captures` to a call to `captures`, which lets the caller control the allocation.

It's not quite clear how to apply this to `captures_iter` while still implementing `Iterator`. I suspect we should probably borrow from the `io::BufReader::read_line` style methods, e.g., [...]

And I think this would work well.
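For reference, this is what the `read_line` convention looks like in the standard library itself. The caller owns one `String`, clears it, and lends it mutably on each call, so the loop reuses a single buffer instead of receiving a fresh allocation per line. The helper name `line_lengths` is made up for this sketch. How a captures analogue would be shaped is an assumption and is not shown here.

```rust
use std::io::{self, BufRead, Cursor};

/// Reads every line through one reusable buffer and records each line's
/// length without its trailing newline.
fn line_lengths<R: BufRead>(mut reader: R) -> io::Result<Vec<usize>> {
    let mut line = String::new(); // allocated once, reused for every line
    let mut lengths = Vec::new();
    loop {
        line.clear(); // read_line appends, so reset the buffer first
        if reader.read_line(&mut line)? == 0 {
            break; // 0 bytes read means end of input
        }
        lengths.push(line.trim_end_matches('\n').len());
    }
    Ok(lengths)
}

fn main() -> io::Result<()> {
    let input = Cursor::new("alpha\nbeta\ngamma\n");
    println!("{:?}", line_lengths(input)?);
    Ok(())
}
```

Note that this loop is not an `Iterator`: each step overwrites the buffer the previous step filled, which is the "streaming iterator" limitation mentioned above.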
Main questions: