possible memory leak in use of RegexSet #875
Comments
Can you provide a repro? Here's one:

    use regex::RegexSet;

    const ITERS: usize = 1_000;

    fn main() {
        let haystack = "Шерлок Холмс ".repeat(1_000);
        let re = RegexSet::new(&[
            r"\pL{5}",
            r"\w{6}",
            r"\w{6}\s+\w{5}",
        ]).unwrap();
        let mut count = 0;
        for _ in 0..ITERS {
            if let Some(_) = re.matches(&haystack).into_iter().next() {
                count += 1;
            }
        }
        println!("{:?}", count);
    }

Running the program as-is:
Modifying the program to have
So an order of magnitude more searches, but the memory usage remains invariant. Here's another variant, where we compile the

    use regex::RegexSet;

    const ITERS: usize = 100;

    fn main() {
        let haystack = "Шерлок Холмс ".repeat(1_000);
        let mut count = 0;
        for _ in 0..ITERS {
            let re = RegexSet::new(&[
                r"\pL{5}",
                r"\w{6}",
                r"\w{6}\s+\w{5}",
            ]).unwrap();
            if let Some(_) = re.matches(&haystack).into_iter().next() {
                count += 1;
            }
        }
        println!("{:?}", count);
    }

Running that gives:
Setting
In this case also, memory usage is invariant despite an order of magnitude more

This is generally what I would expect. With that said...

I would prefer not. It's really an internal implementation detail. I would be

Technically that's what

In general, I'd rather limits in the API be a last resort. I would want to

That's the money shot. Technically, the answer to this is "yes." There is no

The "cache" exists at two levels:
The guts of that second piece are implemented here: Lines 166 to 277 in db1efc6
Conceptually, and ignoring optimizations, it's pretty simple:
State doesn't care which thread it gets sent to. It's all interchangeable.

As I hinted at above, this is the part where cache size is unbounded. There is

Why? Because I'm waiting for a real world use case.

I'd also like to make clear one specific point: the expectation is that

Another experiment you could do, which could potentially also be a work-around,

So... bottom line, I'd like to see this fixed, and the most important thing

If I've got your usage completely wrong and you aren't even using multiple

Also, any use of Regex above is interchangeable with RegexSet. They both use
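To make the idea of interchangeable, pooled scratch state concrete, here is a minimal sketch using only the standard library. It is an illustration under stated assumptions, not the regex crate's actual code: the names `Pool`, `PoolGuard`, `get` and `idle` are hypothetical, and the real implementation referenced above contains optimizations this sketch ignores. In this sketch, the pool only grows when a caller asks for a value while every existing value is already borrowed, so running more searches one after another does not grow it.

```rust
use std::ops::Deref;
use std::sync::Mutex;

/// Hypothetical, simplified pool of reusable scratch values.
/// Illustration only; not the regex crate's implementation.
struct Pool<T> {
    stack: Mutex<Vec<T>>,
    create: fn() -> T,
}

/// Borrowed value that is pushed back into the pool when dropped.
struct PoolGuard<'a, T> {
    pool: &'a Pool<T>,
    value: Option<T>,
}

impl<T> Pool<T> {
    fn new(create: fn() -> T) -> Pool<T> {
        Pool { stack: Mutex::new(Vec::new()), create }
    }

    /// Take an idle value, or create a fresh one if none is idle.
    fn get(&self) -> PoolGuard<'_, T> {
        let popped = self.stack.lock().unwrap().pop();
        let value = popped.unwrap_or_else(|| (self.create)());
        PoolGuard { pool: self, value: Some(value) }
    }

    /// Number of idle values currently held by the pool.
    fn idle(&self) -> usize {
        self.stack.lock().unwrap().len()
    }
}

impl<'a, T> Deref for PoolGuard<'a, T> {
    type Target = T;
    fn deref(&self) -> &T {
        self.value.as_ref().unwrap()
    }
}

impl<'a, T> Drop for PoolGuard<'a, T> {
    fn drop(&mut self) {
        // Any thread may return the value; pooled values are interchangeable.
        if let Some(value) = self.value.take() {
            self.pool.stack.lock().unwrap().push(value);
        }
    }
}

fn main() {
    let pool: Pool<Vec<u8>> = Pool::new(|| Vec::with_capacity(1024));

    // Sequential use: many borrows, but one scratch value is created and reused.
    for _ in 0..1_000 {
        let scratch = pool.get();
        let _ = scratch.capacity();
    }
    println!("idle after sequential use: {}", pool.idle());

    // Concurrent use: the pool can grow, but by at most one value per
    // borrower that is active at the same time.
    std::thread::scope(|s| {
        for _ in 0..4 {
            s.spawn(|| {
                let _scratch = pool.get();
            });
        }
    });
    println!("idle after 4 threads: {}", pool.idle());
}
```

Nothing in this sketch ever shrinks the pool, which is the sense in which such a cache is unbounded: its size follows the peak number of simultaneous borrowers rather than the number of searches performed.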
Hi! I haven't been able to reproduce this, so I think something else must have been going on. Thanks for the thorough explanation.
Aye. If you do wind up finding a repro, I would be happy to re-open this, dig into it and figure out a solution.
Discussed in #874
Originally posted by adamchalmers June 22, 2022
Hi! I'm trying to find the source of some memory leaks in a gRPC server my team is building. One of the endpoints runs some RegexSets over some inputs.
I'm using dhat to measure memory usage, and I notice running
RegexSet::matches(field).into_iter().next()
adds around 2.4kb of memory leak per run. I know the RegexSets have an internal cache where they store programs. I was wondering if this cache grows unboundedly? Is it possible to put a limit on the cache size, or to describe the cache behaviour in the crate's docs?