Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache persistence to disk #778

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Cache persistence to disk #778

wants to merge 1 commit into from

Conversation

souleb
Copy link
Member

@souleb souleb commented May 30, 2024

follow-up to #766

This PR adds a persistence feature to the cache pkg.

The principal use cases is for image-reflector-automation to cache repository tags and to be able to persist them when the controller stop.

@souleb souleb requested a review from a team as a code owner May 30, 2024 13:17
@makkes
Copy link
Member

makkes commented May 30, 2024

If you change the base here to persist-cache, then it's possible to see only the difference to #766.

@souleb souleb changed the base branch from main to enable-cachin-auth-tokens May 30, 2024 20:42
@souleb souleb force-pushed the persist-cache branch 3 times, most recently from 2ecea0f to c0c8ee5 Compare May 31, 2024 09:24
@souleb souleb force-pushed the enable-cachin-auth-tokens branch 2 times, most recently from c6ec0fb to 6b7b355 Compare June 10, 2024 13:08
@souleb souleb force-pushed the enable-cachin-auth-tokens branch 10 times, most recently from fdefd65 to d838d8a Compare June 14, 2024 14:20
Base automatically changed from enable-cachin-auth-tokens to main June 14, 2024 14:36
@souleb souleb force-pushed the persist-cache branch 2 times, most recently from a237166 to 044b481 Compare June 17, 2024 12:44
cache/cache.go Outdated Show resolved Hide resolved
cache/cache.go Outdated Show resolved Hide resolved
@@ -45,6 +45,8 @@ type Store[T any] interface {
}

// Expirable is an interface for a cache store that supports expiration.
// It extends the Store interface.
// It also provides disk persistence.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear to me what is the association between expirable cache and persistence. The core cache should also be able to have persistence.
The LRU cache may also have persistence.

If we don't want the Store interface to have persistence, a new interface extending the Store interface, say Persistable, can be made. The default cache can implement both Expirable and Persistable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started with a persistable interface. Now, LRU does not need persistence in my opinion because items can be evicted at every moment, there is no way to know what is being persisted. I don't see any value there. Also this is a cache so in most cases we don't need to persist it. For the expiration cache, users can configure the retention period, so there can be use cases where we want to start with a snapshot. That's why I finally decided to add the persist method in the Expirable interface.

But I guess there might be other implementations in the future that may need it. I'll reintroduce persistable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not sure how LRU and expirable cache are different in terms of persistence. An application would benefits in the same way on restart if any reusable cache data is available for the application, regardless of the cache item storage policy. The application may continue to use the same few items which are present in the cache and can avoid fetching them again.

@@ -53,6 +55,8 @@ type Expirable[T any] interface {
GetExpiration(object T) (time.Time, error)
// HasExpired returns true if the object has expired.
HasExpired(object T) (bool, error)
// Persist persists the cache to disk.
Persist() error
Copy link
Contributor

@darkowlzz darkowlzz Jun 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering the idea of seeding or restoring data from a snapshot, it may be better to accept an argument here to allow persistence at a different location. Allowing to have separate restore and save locations. A cache may have a need to take snapshots multiple times with different file names.

For our default cache implementation, both the restore and persist path would be the same, but it would be better to leave space for different flexible implementations where the cache may not have internal knowledge about where to save.
I'm considering all these because we are defining an interface. If didn't had an interface, I wouldn't consider about other implementations.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about this. We want a simple backup/restore mechanism. Users don't have access to the underlying datastore, so they cannot decide on what is being snapshot at a given location.

In the future if we want to enable this, we can add a setter method to change the snapshot location. But I don't see a use case now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, maybe we can simplify it further. If there's no configurability for the user of the interface, an explicit method and interface for invoking persist seems unnecessary. The cache would itself determine what, where and when to persist. For example, when the application is about to stop, the cache will be closed. Persistence can be performed as part of closing the cache. In case of controllers apps, it'll most likely be outside of the controller, in the program main. The main would be aware of the type of cache and can call the public Close() method or the appropriate method for closing the used cache. All the consumers of the cache, the controller/reconcilers, would interact with the cache via the Store and Expirable interface.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's also one of the reason why Persist() was in Expirable. You could just decide whether to store or not what was in the cache (based on the ttl and capacity, there still is a possibility to infere what's in there). Having the cache arbitrarily decide to make a snapshot, means we have to prepare for it (writable fs and capacity), and also that leave no choice. In our case we want to snapshot in IRC but not in SC.

Copy link
Contributor

@darkowlzz darkowlzz Jun 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That can be configured in the cache Options to persist at close or not.
Or based on the Options.snapshotPath value.

Copy link
Contributor

@darkowlzz darkowlzz Jun 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another way to implement this could be to accept persistence policy as a cache option. Policies could be persist at close, or persist at every write, or persist at every x interval.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes it simpler yes, but you want be able to do p, ok := cache.(Persistable) to check what type you have in case you receive a cache as a parameter.

Policies could be persist at close, or persist at every write, or persist at every x interval.
So we would have a single policy to begin. This and the interface are not mutually exclusive.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have an example of how persistence will be used in the controller, it'll be easier to decide if we really need an interface for it.
The way I'm imagining how the controllers will use persistence, I don't see a need for it. As described above, I see the main program to be initializing a specific type of cache and passing that to the reconcilers. Since the main program knows what type of cache it is using and its features, it can control when to persist or configure the cache to persist based on some policy. Persistence becomes an internal feature of the cache. Other components don't need to do anything about persistence.
I don't know of a use case where the reconciler or something else that's unaware of the cache type will need to deal with persistence.

If implemented, the cache will be persisted to disk by calling
`Persist`. It will be loaded when instantiating a cache by calling `New`
if an existing `path` is provided.

Signed-off-by: Soule BA <[email protected]>
// writeToBuf writes the cache to the buffer
// no locks are taken, the caller should ensure that
// the cache is not being modified while this function is called.
func (c *cache[T]) writeToBuf() error {
Copy link
Contributor

@darkowlzz darkowlzz Jun 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since all the cached data are of known type and in Go, can't we just use gob to achieve the same, that is to serialize the cache items and write to a buffer?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To have better control. I am open to change this. My reasoning is that if we ever want to offer different durability level for the cache (mem vs on disk), we will need to deal with the underlying persistence layer on an per-item basis in order to be able to every change. We may never need it, but it's possible with this implementation.

Copy link
Contributor

@darkowlzz darkowlzz Jun 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The gob API deals with serialization at per object level. Looks similar to what the current code does, just the serialization of each cache item comes for free. How the serialized data is organized later and loaded is up to the higher level implementation. Gob just provides a nice encoder and decoder for simple serialization of Go objects.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants