Change PickleCache from simple LRU to a more modern policy (like windowed LFU/SLRU) #45

jamadden · 2016-09-24T14:33:09Z

Over in RelStorage we've been having a discussion based on the observation that strict LRU isn't necessarily a good cache policy for varying workloads.

Basically, if most queries/transactions/requests use a certain set of objects, those objects shouldn't be evicted from the cache just because an outlier request comes in that scans a different BTree. LRU is prone to that problem. More adaptive approaches aren't.

@ben-manes pointed us to a policy that can be near optimal for a variety of workloads.

I think that would be a good fit for the PickleCache.

I have a C and CFFI implementation that I'm using in RelStorage. We may or may not be able to directly share the code, I don't know (it'd be cool to split this off as its own library on PyPI and distribute binary wheels just for it so I don't have to do that for RelStorage), but we could at least start with it.

Other notes:

The code as is doesn't implement the probability frequency sketch. This feature cheaply allows the cache to have a "memory" of keys it has evicted so if they show up again soon it knows it made the wrong decision and is more biased towards keeping them. Given the way keys change in RelStorage simulations showed this feature wasn't particularly useful there, but since keys are unchanging OIDS in this cache, it might be more useful here. But that can always be added later.
Garbage collection (incrgc) and tuning become simpler, as the cache always stays at its preferred size, automatically, based on the frequency and LRU-ness of the data. We know exactly which items are good to evict from the cache. The drain_resistance goes away.
Of course, this is complicated by the fact that the pickle cache has the option to be based on estimated byte size in addition to object count. Ideally one of those would go away to be able to take full advantage of the existing code (plus the combination makes it difficult to reason about cache behaviour). My vote is for eliminating byte size at this level; let the lower levels worry about caching in terms of bytes.
Does anybody have some cache traces from an actual ZODB production instance, and info on where they came from? That could be plugged into simulations.

The text was updated successfully, but these errors were encountered:

tseaver · 2016-09-30T14:34:35Z

Seems reasonable to me to experiment. Maybe distributing it separate package would keep the focus tight. We might need to add a hook (an environment variable, maybe?) to let people configure which cache implementation to use, at least for benchmarking purposes.

jamadden · 2016-09-30T15:33:28Z

Distributing it (where "it" is an implementation of the PickleCache) as a separate package might be difficult, at least as far as the C version goes.

The CFFI version could be built and distributed separately, but that's always going to have overhead that a pure C implementation doesn't (although in RelStorage, the CFFI implementation is quite a bit faster than the CFFI implementation currently shipping with persistent).

Distributing a C version could be possible, but because it couldn't use the CPersistentRing struct that's embedded in a persistent object (the struct definition is quite different), we'd lose quite a lot of the benefit of that, and it would complicate memory management 😢 (CPython memory management is something I know relatively little about).

I could probably implement it here and run zodbshootout, but unfortunately that's not a very realistic workload (although RelStorage's cache did show notable improvements)

jimfulton · 2018-09-22T20:56:08Z

Drive by :) comments:

It would be nice to decouple the cache implementation from the persistence semantics. I think this would be pretty straightforward.
We define a C API for a pluggable cache.
Ideally, IMO, we'd wrap some existing C implementation(s) with an adapter to this API and then expose the wrapper/adapter via Capsules. This way, the cache can be configured by passing a module or C API exported by the module.

jamadden mentioned this issue Sep 24, 2016

Distribute the CFFI ring implementation in the wheels? #43

Closed

jamadden mentioned this issue Jan 5, 2017

Add option to *not* resize the object cache zodb/zodbshootout#24

Open

jamadden added a commit that referenced this issue Jul 31, 2018

Build CFFI modules the modern way. Fixes #75 and fixes #45

d7383a5

jamadden mentioned this issue Aug 1, 2018

Can we require CFFI for PURE_PYTHON usage on CPython? #77

Closed

jamadden mentioned this issue Jun 28, 2019

Idea: save/restore active objects on DB close/create zopefoundation/ZODB#163

Open

jamadden mentioned this issue Nov 6, 2019

PyPI wheels for Python 3.8 #117

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Change PickleCache from simple LRU to a more modern policy (like windowed LFU/SLRU) #45

Change PickleCache from simple LRU to a more modern policy (like windowed LFU/SLRU) #45

jamadden commented Sep 24, 2016

tseaver commented Sep 30, 2016

jamadden commented Sep 30, 2016

jimfulton commented Sep 22, 2018

Change PickleCache from simple LRU to a more modern policy (like windowed LFU/SLRU) #45

Change PickleCache from simple LRU to a more modern policy (like windowed LFU/SLRU) #45

Comments

jamadden commented Sep 24, 2016

tseaver commented Sep 30, 2016

jamadden commented Sep 30, 2016

jimfulton commented Sep 22, 2018