possible memory leak in use of RegexSet #875

Closed
BurntSushi opened this issue Jun 22, 2022 Discussed in #874 · 4 comments
@BurntSushi
Member

Discussed in #874

Originally posted by adamchalmers June 22, 2022
Hi! I'm trying to find the source of some memory leaks in a gRPC server my team is building. One of the endpoints runs some RegexSets over some inputs.

I'm using dhat to measure memory usage, and I notice that running RegexSet::matches(field).into_iter().next() leaks around 2.4KB of memory per run. I know the RegexSets have an internal cache where they store programs. I was wondering if this cache grows unboundedly? Is it possible to put a limit on the cache size, or to describe the cache behaviour in the crate's docs?

@BurntSushi
Member Author

cc @adamchalmers

@BurntSushi
Member Author

Can you provide a repro? Here's my own attempt at one:

use regex::RegexSet;

const ITERS: usize = 1_000;

fn main() {
    let haystack = "Шерлок Холмс ".repeat(1_000);
    let re = RegexSet::new(&[
        r"\pL{5}",
        r"\w{6}",
        r"\w{6}\s+\w{5}",
    ]).unwrap();

    let mut count = 0;
    for _ in 0..ITERS {
        if let Some(_) = re.matches(&haystack).into_iter().next() {
            count += 1;
        }
    }
    println!("{:?}", count);
}

Running the program as-is:

$ cargo build --release
$ time ./target/release/d874
1000

real    1.039
user    1.032
sys     0.007
maxmem  9 MB
faults  0

Modifying the program to have ITERS = 10_000 gives:

$ time ./target/release/d874
10000

real    10.270
user    10.261
sys     0.007
maxmem  9 MB
faults  0

So an order of magnitude more searches, but the memory usage remains invariant.

Here's another variant, where we compile the RegexSet in the loop:

use regex::RegexSet;

const ITERS: usize = 100;

fn main() {
    let haystack = "Шерлок Холмс ".repeat(1_000);

    let mut count = 0;
    for _ in 0..ITERS {
        let re = RegexSet::new(&[
            r"\pL{5}",
            r"\w{6}",
            r"\w{6}\s+\w{5}",
        ]).unwrap();
        if let Some(_) = re.matches(&haystack).into_iter().next() {
            count += 1;
        }
    }
    println!("{:?}", count);
}

Running that gives:

$ time ./target/release/d874
100

real    0.446
user    0.439
sys     0.007
maxmem  10 MB
faults  0

Setting ITERS = 1_000 gives:

$ time ./target/release/d874
1000

real    4.217
user    4.205
sys     0.010
maxmem  10 MB
faults  0

In this case also, memory usage is invariant despite an order of magnitude more
compilations/searches.

This is generally what I would expect. With that said...

> describe the cache behaviour in the crate's docs?

I would prefer not. It's really an internal implementation detail. I would be
open to, perhaps, including something in PERFORMANCE.md though.

> Is it possible to put a limit on the cache size,

Technically that's what RegexSetBuilder::dfa_size_limit is for. However,
there may be other relevant limits, such as the total number of such caches
built.

In general, I'd rather limits in the API be a last resort. I would want to
understand the leak first, and whether there is a way to robustly fix it in
general.

> I was wondering if this cache grows unboundedly?

That's the money shot. Technically, the answer to this is "yes." There is no
technical reason why the cache must grow without bound.

The "cache" exists at two levels:

  • Each regex engine gets its own cache: regex-engine-specific state that is
    mutated during a search.
  • Since a Regex can be used from multiple threads simultaneously, there are
    two ways to give mutable scratch space to each regex engine. Either the
    mutable space itself becomes thread safe (not impossible, but likely a
    non-starter without sacrificing search time), or the regex maintains
    multiple scratch spaces per regex engine so that they are entirely
    independent. The current implementation takes the latter approach and will
    for the foreseeable future.

The guts of that second piece are implemented here:

regex/src/pool.rs

Lines 166 to 277 in db1efc6

/// A guard that is returned when a caller requests a value from the pool.
///
/// The purpose of the guard is to use RAII to automatically put the value back
/// in the pool once it's dropped.
#[derive(Debug)]
pub struct PoolGuard<'a, T: Send> {
    /// The pool that this guard is attached to.
    pool: &'a Pool<T>,
    /// This is None when the guard represents the special "owned" value. In
    /// which case, the value is retrieved from 'pool.owner_val'.
    value: Option<Box<T>>,
}

impl<T: Send> Pool<T> {
    /// Create a new pool. The given closure is used to create values in the
    /// pool when necessary.
    pub fn new(create: CreateFn<T>) -> Pool<T> {
        let owner = AtomicUsize::new(0);
        let owner_val = create();
        Pool { stack: Mutex::new(vec![]), create, owner, owner_val }
    }

    /// Get a value from the pool. The caller is guaranteed to have exclusive
    /// access to the given value.
    ///
    /// Note that there is no guarantee provided about which value in the
    /// pool is returned. That is, calling get, dropping the guard (causing
    /// the value to go back into the pool) and then calling get again is NOT
    /// guaranteed to return the same value received in the first get call.
    #[cfg_attr(feature = "perf-inline", inline(always))]
    pub fn get(&self) -> PoolGuard<'_, T> {
        // Our fast path checks if the caller is the thread that "owns" this
        // pool. Or stated differently, whether it is the first thread that
        // tried to extract a value from the pool. If it is, then we can return
        // a T to the caller without going through a mutex.
        //
        // SAFETY: We must guarantee that only one thread gets access to this
        // value. Since a thread is uniquely identified by the THREAD_ID thread
        // local, it follows that if the caller's thread ID is equal to the
        // owner's, then only one thread may receive this value.
        let caller = THREAD_ID.with(|id| *id);
        let owner = self.owner.load(Ordering::Relaxed);
        if caller == owner {
            return self.guard_owned();
        }
        self.get_slow(caller, owner)
    }

    /// This is the "slow" version that goes through a mutex to pop an
    /// allocated value off a stack to return to the caller. (Or, if the stack
    /// is empty, a new value is created.)
    ///
    /// If the pool has no owner, then this will set the owner.
    #[cold]
    fn get_slow(&self, caller: usize, owner: usize) -> PoolGuard<'_, T> {
        use std::sync::atomic::Ordering::Relaxed;

        if owner == 0 {
            // The sentinel 0 value means this pool is not yet owned. We
            // try to atomically set the owner. If we do, then this thread
            // becomes the owner and we can return a guard that represents
            // the special T for the owner.
            let res = self.owner.compare_exchange(0, caller, Relaxed, Relaxed);
            if res.is_ok() {
                return self.guard_owned();
            }
        }
        let mut stack = self.stack.lock().unwrap();
        let value = match stack.pop() {
            None => Box::new((self.create)()),
            Some(value) => value,
        };
        self.guard_stack(value)
    }

    /// Puts a value back into the pool. Callers don't need to call this. Once
    /// the guard that's returned by 'get' is dropped, it is put back into the
    /// pool automatically.
    fn put(&self, value: Box<T>) {
        let mut stack = self.stack.lock().unwrap();
        stack.push(value);
    }

    /// Create a guard that represents the special owned T.
    fn guard_owned(&self) -> PoolGuard<'_, T> {
        PoolGuard { pool: self, value: None }
    }

    /// Create a guard that contains a value from the pool's stack.
    fn guard_stack(&self, value: Box<T>) -> PoolGuard<'_, T> {
        PoolGuard { pool: self, value: Some(value) }
    }
}

impl<'a, T: Send> PoolGuard<'a, T> {
    /// Return the underlying value.
    pub fn value(&self) -> &T {
        match self.value {
            None => &self.pool.owner_val,
            Some(ref v) => &**v,
        }
    }
}

impl<'a, T: Send> Drop for PoolGuard<'a, T> {
    #[cfg_attr(feature = "perf-inline", inline(always))]
    fn drop(&mut self) {
        if let Some(value) = self.value.take() {
            self.pool.put(value);
        }
    }
}

Conceptually, and ignoring optimizations, it's pretty simple:

  • When a search is executed, it requests cache state from the pool.
  • If cache state exists, it's popped from its internal storage and given to
    the caller in a guard. When the guard is dropped, the state gets pushed back
    into the internal storage.
  • If cache state doesn't exist, then new state is created and returned inside
    a guard. As above, once the guard is dropped, the state is pushed back into
    the internal storage.

State doesn't care which thread it gets sent to. It's all interchangeable.

As I hinted at above, this is the part where cache size is unbounded. There is
no step in the current implementation that compacts the internal storage of the
pool.

Why? Because I'm waiting for a real world use case. Your use case might be the
one. But maybe not. I really need a repro.

I'd also like to make one specific point clear: the expectation is that the
cache grows without bound, but only in relation to the number of regex searches
executing simultaneously for any single regex. So for example, if you have 100
threads running but only 10 of them are executing a search for the same regex
simultaneously, then the internal pool should only have 10 copies of the cache
state. (Now, maybe there is a bug and that isn't happening.)

Another experiment you could do, which could potentially also serve as a
work-around, is to create a new regex for each thread that executes searches.
That is, instead of creating one regex and sharing it across multiple threads,
you create one per thread. This can be done by cloning the regex. Internally,
Regex uses reference counting for its immutable state, but each cloned regex
gets its own pool. So if your leak is coming from the pool and can be traced to
the lack of compaction in the pool (perhaps because threads are spinning up and
down), then in theory this would eliminate the problem: once a thread is spun
down, the Regex it owns is freed, and so is its pool.
This advice is already in PERFORMANCE.md.
It might also improve search time, since the "owning" thread of a Regex has a
special fast path with the pool.

So... bottom line, I'd like to see this fixed, and the most important thing
that can be done to get it fixed is probably to give me a minimal reproduction
of the issue.

If I've got your usage completely wrong and you aren't even using multiple
threads, then that would be pretty bewildering. :-)

Also, any use of Regex above is interchangeable with RegexSet. They both use
identical machinery internally.

@adamchalmers

Hi! I haven't been able to reproduce this, so I think something else must have been going on. Thanks for the thorough explanation.

@BurntSushi
Member Author

Aye. If you do wind up finding a repro, I would be happy to re-open this, dig into it and figure out a solution.
