Skip to content

Commit

Permalink
s/`maybe_changed_since/maybe_changed_after/
Browse files Browse the repository at this point in the history
  • Loading branch information
nikomatsakis committed Nov 13, 2021
1 parent cd265bf commit f837d99
Show file tree
Hide file tree
Showing 17 changed files with 411 additions and 51 deletions.
2 changes: 1 addition & 1 deletion book/src/SUMMARY.md
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,7 @@
- [Database](./plumbing/database.md)
- [The `salsa` crate](./plumbing/salsa_crate.md)
- [Query operations](./plumbing/query_ops.md)
- [Maybe changed since](./plumbing/maybe_changed_since.md)
- [maybe changed after](./plumbing/maybe_changed_after.md)
- [Fetch](./plumbing/fetch.md)
- [Derived queries flowchart](./plumbing/derived_flowchart.md)
- [Cycle handling](./plumbing/cycles.md)
Expand Down
4 changes: 2 additions & 2 deletions book/src/plumbing/derived_flowchart.md
Original file line number Diff line number Diff line change
@@ -1,10 +1,10 @@
# Derived queries flowchart

Derived queries are by far the most complex. This flowchart documents the flow of the [maybe changed since] and [fetch] operations. This flowchart can be edited on [draw.io]:
Derived queries are by far the most complex. This flowchart documents the flow of the [maybe changed after] and [fetch] operations. This flowchart can be edited on [draw.io]:

[draw.io]: https://draw.io
[fetch]: ./fetch.md
[maybe changed since]: ./maybe_changed_since.md
[maybe changed after]: ./maybe_changed_after.md

<!-- The explicit div is there because, otherwise, the flowchart is unreadable when using "dark mode" -->
<div style="background-color:white;">
Expand Down
2 changes: 1 addition & 1 deletion book/src/plumbing/fetch.md
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,7 @@ The logic for derived queries is more complex. We summarize the high-level ideas
* Otherwise, if the memo contains a memoized value, we must check whether [dependencies] have been modified:
* Let R be the revision in which the memo was last verified; we wish to know if any of the dependencies have changed since revision R.
* First, we check the [durability]. For each memo, we track the minimum durability of the memo's dependencies. If the memo has durability D, and there have been no changes to an input with durability D since the last time the memo was verified, then we can consider the memo verified without any further work.
* If the durability check is not sufficient, then we must check the dependencies individually. For this, we iterate over each dependency D and invoke the [maybe changed since](./maybe_changed_since.md) operation to check whether D has changed since the revision R.
* If the durability check is not sufficient, then we must check the dependencies individually. For this, we iterate over each dependency D and invoke the [maybe changed after](./maybe_changed_after.md) operation to check whether D has changed since the revision R.
* If no dependency was modified:
* We can mark the memo as verified and return its memoized value.
* Assuming dependencies have been modified or the memo does not contain a memoized value:
Expand Down
Original file line number Diff line number Diff line change
@@ -1,14 +1,14 @@
# Maybe changed since
# Maybe changed after

```rust,no_run,noplayground
{{#include ../../../src/plumbing.rs:maybe_changed_since}}
{{#include ../../../src/plumbing.rs:maybe_changed_after}}
```

The `maybe_changed_since` operation computes whether a query's value *may have changed* since the given revision.
The `maybe_changed_after` operation computes whether a query's value *may have changed* **after** the given revision. In other words, `Q.maybe_change_since(R)` is true if the value of the query `Q` may have changed in the revisions `(R+1)..R_now`, where `R_now` is the current revision. Note that it doesn't make sense to ask `maybe_changed_after(R_now)`.

## Input queries

Input queries are set explicitly by the user. `maybe_changed_since` can therefore just check when the value was last set and compare.
Input queries are set explicitly by the user. `maybe_changed_after` can therefore just check when the value was last set and compare.

## Interned queries

Expand All @@ -20,7 +20,7 @@ The logic for derived queries is more complex. We summarize the high-level ideas
* Otherwise, we must check whether [dependencies] have been modified:
* Let R be the revision in which the memo was last verified; we wish to know if any of the dependencies have changed since revision R.
* First, we check the [durability]. For each memo, we track the minimum durability of the memo's dependencies. If the memo has durability D, and there have been no changes to an input with durability D since the last time the memo was verified, then we can consider the memo verified without any further work.
* If the durability check is not sufficient, then we must check the dependencies individually. For this, we iterate over each dependency D and invoke the [maybe changed since](./maybe_changed_since.md) operation to check whether D has changed since the revision R.
* If the durability check is not sufficient, then we must check the dependencies individually. For this, we iterate over each dependency D and invoke the [maybe changed after](./maybe_changed_after.md) operation to check whether D has changed since the revision R.
* If no dependency was modified:
* We can mark the memo as verified and use its [changed at] revision to return true or false.
* Assuming dependencies have been modified:
Expand Down
2 changes: 1 addition & 1 deletion book/src/plumbing/query_ops.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@ Each of the query storage struct implements the `QueryStorageOps` trait found in

which defines the basic operations that all queries support. The most important are these two:

* [Maybe changed since](./maybe_changed_since.md): Returns true if the value of the query (for the given key) may have changed since the given revision.
* [maybe changed after](./maybe_changed_after.md): Returns true if the value of the query (for the given key) may have changed since the given revision.
* [Fetch](./fetch.md): Returms the up-to-date value for the given K (or an error in the case of an "unrecovered" cycle).

[`plumbing`]: https://github.com/salsa-rs/salsa/blob/master/src/plumbing.rs
4 changes: 2 additions & 2 deletions book/src/plumbing/terminology/verified.md
Original file line number Diff line number Diff line change
@@ -1,8 +1,8 @@
# Verified

A [memo] is *verified* in a revision R if we have checked that its value is still up-to-date (i.e., if we were to reexecute the [query function], we are guaranteed to get the same result). Each memo tracks the revision in which it was last verified to avoid repeatedly checking whether dependencies have changed during the [fetch] and [maybe changed since] operations.
A [memo] is *verified* in a revision R if we have checked that its value is still up-to-date (i.e., if we were to reexecute the [query function], we are guaranteed to get the same result). Each memo tracks the revision in which it was last verified to avoid repeatedly checking whether dependencies have changed during the [fetch] and [maybe changed after] operations.

[query function]: ./query_function.md
[fetch]: ../fetch.md
[maybe changed since]: ../maybe_changed_since.md
[maybe changed after]: ../maybe_changed_after.md
[memo]: ./memo.md
8 changes: 4 additions & 4 deletions book/src/rfcs/RFC0001-Query-Group-Traits.md
Original file line number Diff line number Diff line change
Expand Up @@ -268,9 +268,9 @@ impl of `QueryGroup`. That involves generating the following things:
but since we cannot, it is entitled `MyGroupGroupKey`
- It is an enum which contains one variant per query with the value being the key:
- `my_query(<MyQueryQuery as salsa::plumbing::Query<DB>>::Key)`
- The group key enum offers a public, inherent method `maybe_changed_since`:
- `fn maybe_changed_since<DB>(db: &DB, db_descriptor: &DB::DatabaseKey, revision: Revision)`
- it is invoked when implementing `maybe_changed_since` for the database key
- The group key enum offers a public, inherent method `maybe_changed_after`:
- `fn maybe_changed_after<DB>(db: &DB, db_descriptor: &DB::DatabaseKey, revision: Revision)`
- it is invoked when implementing `maybe_changed_after` for the database key

### Lowering database storage

Expand Down Expand Up @@ -298,7 +298,7 @@ It generates the following things:
- This contains a `for_each_query` method, which is implemented by invoking, in turn,
the inherent methods defined on each query group storage struct.
- impl of `plumbing::DatabaseKey` for the database key enum
- This contains a method `maybe_changed_since`. We implement this by
- This contains a method `maybe_changed_after`. We implement this by
matching to get a particular group key, and then invoking the
inherent method on the group key struct.

Expand Down
2 changes: 1 addition & 1 deletion book/src/rfcs/RFC0006-Dynamic-Databases.md
Original file line number Diff line number Diff line change
Expand Up @@ -343,7 +343,7 @@ some type for dependencies that is independent of the dtabase type `DB`.
There are a number of methods that can be dispatched through the database
interface on a `DatabaseKeyIndex`. For example, we already mentioned
`fmt_debug`, which emits a debug representation of the key, but there is also
`maybe_changed_since`, which checks whether the value for a given key may have
`maybe_changed_after`, which checks whether the value for a given key may have
changed since the given revision. Each of these methods is a member of the
`DatabaseOps` trait and they are dispatched as follows.

Expand Down
175 changes: 175 additions & 0 deletions book/src/rfcs/RFC0010-Slot-no-more.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,175 @@
# Parallel friendly caching

## Metadata

* Author: nikomatsakis
* Date: 2021-05-29
* Introduced in: (please update once you open your PR)

## Summary

* Rework query storage to be based on concurrent hashmaps instead of slots with read-write locked state.

## Motivation

Two-fold:

* Simpler, cleaner, and hopefully faster algorithm.
* Enables some future developments that are not part of this RFC:
* Derived queries whose keys are known to be integers.
* Fixed point cycles so that salsa and chalk can be deeply integrated.
* Non-synchronized queries that potentially execute on many threads in parallel (required for fixed point cycles, but potentially valuable in their own right).

## User's guide

No user visible changes.

## Reference guide

### Background: Current structure

Before this RFC, the **overall structure** of derived queries is as follows:

* Each derived query has a `DerivedStorage<Q>` (stored in the database) that contains:
* the `slot_map`, a monotonically growing, indexable map from keys (`Q::Key`) to the `Slot<Q>` for the given key
* lru list
* Each `Slot<Q>` has
* r-w locked query-state that can be:
* not-computed
* in-progress with synchronization storage:
* `id` of the runtime computing the value
* `anyone_waiting`: `AtomicBool` set to true if other threads are awaiting result
* a `Memo<Q>`
* A `Memo<Q>` has
* an optional value `Option<Q::Value>`
* dependency information:
* verified-at
* changed-at
* durability
* input set (typically a `Arc<[DatabaseKeyIndex]>`)

[maybe changed after]: ../plumbing/maybe_changed_after.md
[fetch]: ../plumbing/fetch.md

**Fetching the value for a query** currently works as follows:

* Acquire the read lock on the (indexable) `slot_map` and hash key to find the slot.
* If no slot exists, acquire write lock and insert.
* Acquire the slot's internal lock to perform the [fetch] operation.

**Verifying a dependency** uses a scheme introduced in [RFC #6](./RFC0006-Dynamic-Databases.md). Each dependency is represented as a `DatabaseKeyIndex` which contains three indices (group, query, and key). The group and query indices are used to find the query storage via `match` statements and then the next operation depends on the query type:

* Acquire the read lock on the (indexable) `slot_map` and use key index to load the slot. Read lock is released afterwards.
* Acquire the slot's internal lock to perform the [maybe changed after] operation.

### New structure (introduced by this RFC)

The **overall structure** of derived queries after this RFC is as follows:

* Each derived query has a `DerivedStorage<Q>` (stored in the database) that contains:
* a set of concurrent hashmaps:
* `key_map`: maps from `Q::Key` to an internal key index `K`
* `memo_map`: maps from `K` to cached memo `ArcSwap<Memo<Q>>`
* `sync_map`: maps from `K` to a `Sync<Q>` synchronization value
* lru set
* A `Memo<Q>` has
* an *immutable* optional value `Option<Q::Value>`
* dependency information:
* *updatable* verified-at (`AtomicCell<Option<Revision>>`)
* *immutable* changed-at (`Revision`)
* *immutable* durability (`Durability`)
* *immutable* input set (typically a `Arc<[DatabaseKeyIndex]>`)
* information for LRU:
* `DatabaseKeyIndex`
* `lru_index`, an `AtomicUsize`
* A `Sync<Q>` has
* `id` of the runtime computing the value
* `anyone_waiting`: `AtomicBool` set to true if other threads are awaiting result

**Fetching the value for a *derived* query** will work as follows:

1. Find internal index `K` by hashing key, as today.
* Precise operation for this will depend on the concurrent hashmap implementation.
2. Load memo `M: Arc<Memo<Q>>` from `memo_map[K]` (if present):
* If verified is `None`, then another thread has found this memo to be invalid; ignore it.
* Else, let `Rv` be the "last verified revision".
* If `Rv` is the current revision, or last change to an input with durability `M.durability` was before `Rv`:
* Update "last verified revision" and **return** memoized value.
3. Atomically check `sync_map` for an existing `Sync<Q>`:
* If one exists, block on the thread within and return to step 2 after it completes:
* If this results in a cycle, unwind as today.
* If none exists, insert a new entry with current runtime-id.
4. Check dependencies deeply
* Iterate over each dependency `D` and check `db.maybe_changed_after(D, Rv)`.
* If no dependency has changed, update `verified_at` to current revision and **return** memoized value.
* Mark memo as invalid by storing `None` in the verified-at.
5. Construct the new memo:
* Push query onto the local stack and execute the query function:
* If this query is found to be a cycle participant, execute recovery function.
* Backdate result if it is equal to the old memo's value.
* Allocate new memo.
6. Store results:
* Store new memo into `memo_map[K]`.
* Remove query from the `sync_map`.
7. **Return** newly constructed value._

**Verifying a dependency for a *derived* query** will work as follows:

1. Find internal index `K` by hashing key, as today.
* Precise operation for this will depend on the concurrent hashmap implementation.
2. Load memo `M: Arc<Memo<Q>>` from `memo_map[K]` (if present):
* If verified is `None`, then another thread has found this memo to be invalid; ignore it.
* Else, let `Rv` be the "last verified revision".
* If `Rv` is the current revision, **return** true or false depending on whether changed-at from memo.
* If last change to an input with durability `M.durability` was before `Rv`:
* Update `verified_at` to current revision and **return** memoized value.
* Iterate over each dependency `D` and check `db.maybe_changed_after(D, Rv)`.
* If no dependency has changed, update `verified_at` to current revision and **return** memoized value.
* Mark memo as invalid by storing `None` in the verified-at.
3. Atomically check `sync_map` for an existing `Sync<Q>`:
* If one exists, block on the thread within and return to step 2 after it completes:
* If this results in a cycle, unwind as today.
* If none exists, insert a new entry with current runtime-id.
4. Construct the new memo:
* Push query onto the local stack and execute the query function:
* If this query is found to be a cycle participant, execute recovery function.
* Backdate result if it is equal to the old memo's value.
* Allocate new memo.
5. Store results:
* Store new memo into `memo_map[K]`.
* Remove query from the `sync_map`.
6. **Return** true or false depending on whether memo was backdated.

## Frequently asked questions

### Why use `ArcSwap`?

It's a relatively minor implementation detail, but the code in this PR uses `ArcSwap` to store the values in the memo-map. In the case of a cache hit or other transient operations, this allows us to read from the arc while avoiding a full increment of the ref count. It adds a small bit of complexity because we have to be careful to do a full load before any recursive operations, since arc-swap only gives a fixed number of "guards" per thread before falling back to more expensive loads.

### Do we really need `maybe_changed_after` *and* `fetch`?

Yes, we do. "maybe changed after" is very similar to "fetch", but it doesn't require that we have a memoized value. This is important for LRU.

### The LRU map in the code is just a big lock!

That's not a question. But it's true, I simplified the LRU code to just use a mutex. My assumption is that there are relatively few LRU-ified queries, and their values are relatively expensive to compute, so this is ok. If we find it's a bottleneck, though, I believe we could improve it by using a similar "zone scheme" to what we use now. We would add a `lru_index` to the `Memo` so that we can easily check if the memo is in the "green zone" when reading (if so, no updates are needed). The complexity there is that when we produce a replacement memo, we have to install it and swap the index. Thinking about that made my brain hurt a little so I decided to just take the simple option for now.

### How do the synchronized / atomic operations compare after this RFC?

After this RFC, to perform a read, in the best case:

* We do one "dashmap get" to map key to key index.
* We do another "dashmap get" from key index to memo.
* We do an "arcswap load" to get the memo.
* We do an "atomiccell read" to load the current revision or durability information.

dashmap is implemented with a striped set of read-write locks, so this is roughly the same (two read locks) as before this RFC. However:

* We no longer do any atomic ref count increments.
* It is theoretically possible to replace dashmap with something that doesn't use locks.
* The first dashmap get should be removable, if we know that the key is a 32 bit integer.
* I plan to propose this in a future RFC.

### Yeah yeah, show me some benchmarks!

I didn't run any. I'll get on that.
4 changes: 2 additions & 2 deletions components/salsa-macros/src/database_storage.rs
Original file line number Diff line number Diff line change
Expand Up @@ -124,7 +124,7 @@ pub(crate) fn database(args: TokenStream, input: TokenStream) -> TokenStream {
#group_index => {
let storage: &#group_storage =
<Self as salsa::plumbing::HasQueryGroup<#group_path>>::group_storage(self);
storage.maybe_changed_since(self, input, revision)
storage.maybe_changed_after(self, input, revision)
}
});
cycle_recovery_strategy_ops.extend(quote! {
Expand Down Expand Up @@ -165,7 +165,7 @@ pub(crate) fn database(args: TokenStream, input: TokenStream) -> TokenStream {
}
}

fn maybe_changed_since(
fn maybe_changed_after(
&self,
input: salsa::DatabaseKeyIndex,
revision: salsa::Revision
Expand Down
4 changes: 2 additions & 2 deletions components/salsa-macros/src/query_group.rs
Original file line number Diff line number Diff line change
Expand Up @@ -532,7 +532,7 @@ pub(crate) fn query_group(args: TokenStream, input: TokenStream) -> TokenStream
for (Query { fn_name, .. }, query_index) in non_transparent_queries().zip(0_u16..) {
maybe_changed_ops.extend(quote! {
#query_index => {
salsa::plumbing::QueryStorageOps::maybe_changed_since(
salsa::plumbing::QueryStorageOps::maybe_changed_after(
&*self.#fn_name, db, input, revision
)
}
Expand Down Expand Up @@ -590,7 +590,7 @@ pub(crate) fn query_group(args: TokenStream, input: TokenStream) -> TokenStream
}
}

#trait_vis fn maybe_changed_since(
#trait_vis fn maybe_changed_after(
&self,
db: &(#dyn_db + '_),
input: salsa::DatabaseKeyIndex,
Expand Down
Loading

0 comments on commit f837d99

Please sign in to comment.