possible memory leak in use of RegexSet #875

Closed
BurntSushi opened this issue Jun 22, 2022 Discussed in #874 · 4 comments
@BurntSushi
Member

Discussed in #874

Originally posted by adamchalmers June 22, 2022
Hi! I'm trying to find the source of some memory leaks in a gRPC server my team is building. One of the endpoints runs some RegexSets over some inputs.

I'm using dhat to measure memory usage, and I notice that running RegexSet::matches(field).into_iter().next() leaks around 2.4KB of memory per run. I know the RegexSets have an internal cache where they store programs. I was wondering if this cache grows unboundedly? Is it possible to put a limit on the cache size, or to describe the cache behaviour in the crate's docs?

@BurntSushi
Member Author

cc @adamchalmers

@BurntSushi
Member Author

Can you provide a repro? Here's my own attempt at one:

use regex::RegexSet;

const ITERS: usize = 1_000;

fn main() {
    let haystack = "Шерлок Холмс ".repeat(1_000);
    let re = RegexSet::new(&[
        r"\pL{5}",
        r"\w{6}",
        r"\w{6}\s+\w{5}",
    ]).unwrap();

    let mut count = 0;
    for _ in 0..ITERS {
        if let Some(_) = re.matches(&haystack).into_iter().next() {
            count += 1;
        }
    }
    println!("{:?}", count);
}

Running the program as-is:

$ cargo build --release
$ time ./target/release/d874
1000

real    1.039
user    1.032
sys     0.007
maxmem  9 MB
faults  0

Modifying the program to have ITERS = 10_000 gives:

$ time ./target/release/d874
10000

real    10.270
user    10.261
sys     0.007
maxmem  9 MB
faults  0

So an order of magnitude more searches, but the memory usage remains invariant.

Here's another variant, where we compile the RegexSet in the loop:

use regex::RegexSet;

const ITERS: usize = 100;

fn main() {
    let haystack = "Шерлок Холмс ".repeat(1_000);

    let mut count = 0;
    for _ in 0..ITERS {
        let re = RegexSet::new(&[
            r"\pL{5}",
            r"\w{6}",
            r"\w{6}\s+\w{5}",
        ]).unwrap();
        if let Some(_) = re.matches(&haystack).into_iter().next() {
            count += 1;
        }
    }
    println!("{:?}", count);
}

Running that gives:

$ time ./target/release/d874
100

real    0.446
user    0.439
sys     0.007
maxmem  10 MB
faults  0

Setting ITERS = 1_000 gives:

$ time ./target/release/d874
1000

real    4.217
user    4.205
sys     0.010
maxmem  10 MB
faults  0

In this case also, memory usage is invariant despite an order of magnitude more
compilations/searches.

This is generally what I would expect. With that said...

> describe the cache behaviour in the crate's docs?

I would prefer not. It's really an internal implementation detail. I would be
open to, perhaps, including something in PERFORMANCE.md though.

> Is it possible to put a limit on the cache size,

Technically that's what RegexSetBuilder::dfa_size_limit is for. However,
there may be other relevant limits, such as the total number of such caches
built.

In general, I'd rather limits in the API be a last resort. I would want to
understand the leak first, and whether there is a way to robustly fix it in
general.

> I was wondering if this cache grows unboundedly?

That's the money shot. Technically, the answer to this is "yes." There is no
technical reason why the cache must grow without bound.

The "cache" exists at two levels:

  • Each regex engine gets its own cache: regex-engine-specific state that is
    mutated during a search.
  • Since a Regex can be used from multiple threads simultaneously, there are
    two ways to give mutable scratch space to each regex engine. Either the
    mutable space itself becomes thread safe (not impossible, but likely a
    non-starter without sacrificing search time), or the regex maintains
    multiple scratch spaces per regex engine so that they are entirely
    independent. The current implementation takes the latter approach and will
    for the foreseeable future.

The guts of that second piece are implemented here:

regex/src/pool.rs

Lines 166 to 277 in db1efc6

/// A guard that is returned when a caller requests a value from the pool.
///
/// The purpose of the guard is to use RAII to automatically put the value back
/// in the pool once it's dropped.
#[derive(Debug)]
pub struct PoolGuard<'a, T: Send> {
    /// The pool that this guard is attached to.
    pool: &'a Pool<T>,
    /// This is None when the guard represents the special "owned" value. In
    /// which case, the value is retrieved from 'pool.owner_val'.
    value: Option<Box<T>>,
}

impl<T: Send> Pool<T> {
    /// Create a new pool. The given closure is used to create values in the
    /// pool when necessary.
    pub fn new(create: CreateFn<T>) -> Pool<T> {
        let owner = AtomicUsize::new(0);
        let owner_val = create();
        Pool { stack: Mutex::new(vec![]), create, owner, owner_val }
    }

    /// Get a value from the pool. The caller is guaranteed to have exclusive
    /// access to the given value.
    ///
    /// Note that there is no guarantee provided about which value in the
    /// pool is returned. That is, calling get, dropping the guard (causing
    /// the value to go back into the pool) and then calling get again is NOT
    /// guaranteed to return the same value received in the first get call.
    #[cfg_attr(feature = "perf-inline", inline(always))]
    pub fn get(&self) -> PoolGuard<'_, T> {
        // Our fast path checks if the caller is the thread that "owns" this
        // pool. Or stated differently, whether it is the first thread that
        // tried to extract a value from the pool. If it is, then we can return
        // a T to the caller without going through a mutex.
        //
        // SAFETY: We must guarantee that only one thread gets access to this
        // value. Since a thread is uniquely identified by the THREAD_ID thread
        // local, it follows that if the caller's thread ID is equal to the
        // owner's, then only one thread may receive this value.
        let caller = THREAD_ID.with(|id| *id);
        let owner = self.owner.load(Ordering::Relaxed);
        if caller == owner {
            return self.guard_owned();
        }
        self.get_slow(caller, owner)
    }

    /// This is the "slow" version that goes through a mutex to pop an
    /// allocated value off a stack to return to the caller. (Or, if the stack
    /// is empty, a new value is created.)
    ///
    /// If the pool has no owner, then this will set the owner.
    #[cold]
    fn get_slow(&self, caller: usize, owner: usize) -> PoolGuard<'_, T> {
        use std::sync::atomic::Ordering::Relaxed;

        if owner == 0 {
            // The sentinel 0 value means this pool is not yet owned. We
            // try to atomically set the owner. If we do, then this thread
            // becomes the owner and we can return a guard that represents
            // the special T for the owner.
            let res = self.owner.compare_exchange(0, caller, Relaxed, Relaxed);
            if res.is_ok() {
                return self.guard_owned();
            }
        }
        let mut stack = self.stack.lock().unwrap();
        let value = match stack.pop() {
            None => Box::new((self.create)()),
            Some(value) => value,
        };
        self.guard_stack(value)
    }

    /// Puts a value back into the pool. Callers don't need to call this. Once
    /// the guard that's returned by 'get' is dropped, it is put back into the
    /// pool automatically.
    fn put(&self, value: Box<T>) {
        let mut stack = self.stack.lock().unwrap();
        stack.push(value);
    }

    /// Create a guard that represents the special owned T.
    fn guard_owned(&self) -> PoolGuard<'_, T> {
        PoolGuard { pool: self, value: None }
    }

    /// Create a guard that contains a value from the pool's stack.
    fn guard_stack(&self, value: Box<T>) -> PoolGuard<'_, T> {
        PoolGuard { pool: self, value: Some(value) }
    }
}

impl<'a, T: Send> PoolGuard<'a, T> {
    /// Return the underlying value.
    pub fn value(&self) -> &T {
        match self.value {
            None => &self.pool.owner_val,
            Some(ref v) => &**v,
        }
    }
}

impl<'a, T: Send> Drop for PoolGuard<'a, T> {
    #[cfg_attr(feature = "perf-inline", inline(always))]
    fn drop(&mut self) {
        if let Some(value) = self.value.take() {
            self.pool.put(value);
        }
    }
}

Conceptually, and ignoring optimizations, it's pretty simple:

  • When a search is executed, it requests cache state from the pool.
  • If cache state exists, it's popped from its internal storage and given to
    the caller in a guard. When the guard is dropped, the state gets pushed back
    into the internal storage.
  • If cache state doesn't exist, then new state is created and returned inside
    a guard. As above, once the guard is dropped, the state is pushed back into
    the internal storage.

State doesn't care which thread it gets sent to. It's all interchangeable.

As I hinted at above, this is the part where cache size is unbounded. There is
no step in the current implementation that compacts the internal storage of the
pool.

Why? Because I'm waiting for a real world use case. Your use case might be the
one. But maybe not. I really need a repro.

I'd also like to make one specific point clear: the expectation is that the
cache grows without bound, but only in relation to the number of regex searches
executing simultaneously for any single regex. So for example, if you have 100
threads running but only 10 of them are executing a search for the same regex
simultaneously, then the internal pool should only have 10 copies of the cache
state. (Now, maybe there is a bug and that isn't happening.)

Another experiment you could do, which could potentially also serve as a
work-around, is to create a new regex for each thread that executes searches.
That is, instead of creating one regex and sharing it across multiple threads,
you create one per thread. This can be done by cloning the regex. Internally,
Regex uses reference counting for its immutable state, but each cloned regex
gets its own pool. So if your leak is coming from the pool and can be traced to
the lack of compaction in the pool (perhaps because threads are spinning up and
down), then in theory this would eliminate the problem: once a thread is spun
down, the Regex it owns is freed, and so is its pool.
This advice is already in PERFORMANCE.md.
It might also improve search time, since the "owning" thread of a Regex has a
special fast path with the pool.

So... bottom line, I'd like to see this fixed, and the most important thing
that can be done to get it fixed is probably to give me a minimal reproduction
of the issue.

If I've got your usage completely wrong and you aren't even using multiple
threads, then that would be pretty bewildering. :-)

Also, any use of Regex above is interchangeable with RegexSet. They both use
identical machinery internally.

@adamchalmers

Hi! I haven't been able to reproduce this, so I think something else must have been going on. Thanks for the thorough explanation.

@BurntSushi
Member Author

Aye. If you do wind up finding a repro, I would be happy to re-open this, dig into it and figure out a solution.
