Add docs for configuring storage and isolation for cache policies (#1…
desertaxle authored Sep 19, 2024
1 parent 7de0aaa commit 6b8cc51
Showing 1 changed file with 99 additions and 0 deletions.
99 changes: 99 additions & 0 deletions docs/3.0/develop/task-caching.mdx
@@ -186,6 +186,105 @@ def my_cached_task(x: int):
    return x + 1
```

### Cache storage

By default, cache records are collocated with task results, and the files containing task results also include the metadata used for caching.
Configuring a cache policy with a `key_storage` argument stores cache records separately from task results.

When cache key storage is configured, persisted task results include only the return value of your task, and cache records can be deleted or modified
without affecting your task results.
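
As a rough mental model of that separation, the toy sketch below uses two plain dictionaries in place of the two storage locations. It is not Prefect's storage format or API: `result_storage`, `key_storage`, and `run_cached` are hypothetical names invented for illustration.

```python
# Toy model only: plain dicts stand in for two separate storage locations.
result_storage = {}  # result id -> task return value
key_storage = {}     # cache key -> result id (the "cache record")


def run_cached(cache_key, compute):
    """Return the cached value if a cache record exists, else compute and store it."""
    result_id = key_storage.get(cache_key)
    if result_id is not None and result_id in result_storage:
        return result_storage[result_id]
    value = compute()
    result_id = f"result-{len(result_storage)}"
    result_storage[result_id] = value   # persist the return value
    key_storage[cache_key] = result_id  # persist the cache record separately
    return value


calls = []


def expensive():
    calls.append(1)  # track how often the task body runs
    return 43


first = run_cached("key-a", expensive)
second = run_cached("key-a", expensive)  # cache hit: body does not run again
key_storage.clear()                      # delete the cache records only
third = run_cached("key-a", expensive)   # cache miss: body runs again
```

Clearing `key_storage` forces a recomputation, but the values already written to `result_storage` are left untouched.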

You can configure where cache records are stored by using the `.configure` method with a `key_storage` argument on a cache policy.
The `key_storage` argument accepts either a path to a local directory or a storage block.

For example:

```python
from prefect import task
from prefect.cache_policies import TASK_SOURCE, INPUTS

cache_policy = (TASK_SOURCE + INPUTS).configure(key_storage="/path/to/cache/storage")

@task(cache_policy=cache_policy)
def my_cached_task(x: int):
    return x + 42
```

This task will store cache records in the specified directory.

To store cache records in a remote object store such as S3, pass a storage block instead:

```python
from prefect import task
from prefect.cache_policies import TASK_SOURCE, INPUTS

from prefect_aws import S3Bucket

cache_policy = (TASK_SOURCE + INPUTS).configure(key_storage=S3Bucket.load("my-bucket"))

@task(cache_policy=cache_policy)
def my_cached_task(x: int):
    return x + 42
```

### Cache isolation

Cache isolation controls how concurrent task runs interact with cache records. Prefect supports two isolation levels: `READ_COMMITTED` and `SERIALIZABLE`.

By default, cache records operate with a `READ_COMMITTED` isolation level. This guarantees that reading a cache record will see the latest committed cache value,
but allows multiple executions of the same task to occur simultaneously.

For stricter isolation, you can use the `SERIALIZABLE` isolation level. This level uses a locking mechanism to ensure that only one execution of a task
occurs at a time for a given cache record.
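
The difference between the two levels can be illustrated with a small standard-library sketch. It does not use Prefect and is not how Prefect implements isolation: `run_concurrently`, the in-memory `cache`, and the single `threading.Lock` are assumptions made for this illustration.

```python
import threading
import time


def run_concurrently(serializable: bool, workers: int = 4) -> int:
    """Run `workers` concurrent calls for one cache key.

    Returns the number of times the task body actually executed.
    """
    cache = {}                          # committed cache values
    executions = []                     # one entry per task-body execution
    key_lock = threading.Lock()         # stand-in for a per-key lock
    start = threading.Barrier(workers)  # releases all workers together

    def execute_and_commit():
        executions.append(1)
        time.sleep(0.05)                # simulate work so that calls overlap
        cache["key"] = 42               # "commit" the cache record

    def worker():
        start.wait()
        if serializable:
            # Hold the lock across both the cache check and the execution.
            with key_lock:
                if "key" not in cache:
                    execute_and_commit()
        elif "key" not in cache:
            # No lock: several workers can miss the cache at the same time.
            execute_and_commit()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return len(executions)


serializable_runs = run_concurrently(serializable=True)
read_committed_runs = run_concurrently(serializable=False)
```

With the lock held across the check and the execution, the body runs exactly once. Without it, the count depends on timing and is usually greater than one.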

To configure the isolation level, use the `.configure` method with an `isolation_level` argument on a cache policy. When using `SERIALIZABLE`, you must
also provide a `lock_manager` that implements locking logic for your system.

For example:

```python
from prefect import task
from prefect.cache_policies import TASK_SOURCE, INPUTS
from prefect.transactions import IsolationLevel
from prefect.locking.filesystem import FileSystemLockManager

cache_policy = (INPUTS + TASK_SOURCE).configure(
    isolation_level=IsolationLevel.SERIALIZABLE,
    lock_manager=FileSystemLockManager(lock_files_directory="path/to/lock/files"),
)

@task(cache_policy=cache_policy)
def my_cached_task(x: int):
    return x + 42
```

Whenever this task runs, it creates a lock file in the specified directory to ensure that only one execution of the task occurs at a time for a given cache record.
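
The general idea behind a lock file can be sketched with the standard library alone. This is an illustration of the technique, not the implementation of `FileSystemLockManager`, which may differ in its details: `try_acquire`, `release`, and the `.lock` file naming are hypothetical.

```python
import os
import tempfile


def try_acquire(lock_dir: str, key: str) -> bool:
    """Try to take the lock for `key` by atomically creating a lock file."""
    path = os.path.join(lock_dir, f"{key}.lock")
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so only one
        # caller can create it.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def release(lock_dir: str, key: str) -> None:
    """Release the lock for `key` by removing its lock file."""
    os.remove(os.path.join(lock_dir, f"{key}.lock"))


with tempfile.TemporaryDirectory() as lock_dir:
    got_first = try_acquire(lock_dir, "my-cache-key")   # lock is free
    got_second = try_acquire(lock_dir, "my-cache-key")  # lock is already held
    release(lock_dir, "my-cache-key")
    got_third = try_acquire(lock_dir, "my-cache-key")   # lock is free again
```

Because the lock lives on the local filesystem, this approach only coordinates processes that can see the same directory.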

<Note>
**Locking in a distributed setting**

To manage locks in a distributed setting, you will need to use a storage system for locks that is accessible by all of your execution infrastructure.

We recommend using the `RedisLockManager` provided by `prefect-redis` in conjunction with a shared Redis instance:

```python
from prefect import task
from prefect.cache_policies import TASK_SOURCE, INPUTS
from prefect.transactions import IsolationLevel

from prefect_redis import RedisLockManager

cache_policy = (INPUTS + TASK_SOURCE).configure(
    isolation_level=IsolationLevel.SERIALIZABLE,
    lock_manager=RedisLockManager(host="my-redis-host"),
)

@task(cache_policy=cache_policy)
def my_cached_task(x: int):
    return x + 42
```
</Note>

## Multi-task caching

There are many situations in which multiple tasks need to always run together or not at all.