s/`maybe_changed_since/maybe_changed_after/

salsa-rs · Nov 13, 2021 · f837d99 · f837d99
1 parent cd265bf
commit f837d99
Show file tree

Hide file tree

Showing 17 changed files with 411 additions and 51 deletions.
diff --git a/book/src/SUMMARY.md b/book/src/SUMMARY.md
@@ -22,7 +22,7 @@
     - [Database](./plumbing/database.md)
   - [The `salsa` crate](./plumbing/salsa_crate.md)
     - [Query operations](./plumbing/query_ops.md)
-      - [Maybe changed since](./plumbing/maybe_changed_since.md)
+      - [maybe changed after](./plumbing/maybe_changed_after.md)
       - [Fetch](./plumbing/fetch.md)
     - [Derived queries flowchart](./plumbing/derived_flowchart.md)
     - [Cycle handling](./plumbing/cycles.md)

diff --git a/book/src/plumbing/derived_flowchart.md b/book/src/plumbing/derived_flowchart.md
@@ -1,10 +1,10 @@
 # Derived queries flowchart
 
-Derived queries are by far the most complex. This flowchart documents the flow of the [maybe changed since] and [fetch] operations. This flowchart can be edited on [draw.io]:
+Derived queries are by far the most complex. This flowchart documents the flow of the [maybe changed after] and [fetch] operations. This flowchart can be edited on [draw.io]:
 
 [draw.io]: https://draw.io
 [fetch]: ./fetch.md
-[maybe changed since]: ./maybe_changed_since.md
+[maybe changed after]: ./maybe_changed_after.md
 
 <!-- The explicit div is there because, otherwise, the flowchart is unreadable when using "dark mode" -->
 <div style="background-color:white;">

diff --git a/book/src/plumbing/fetch.md b/book/src/plumbing/fetch.md
@@ -22,7 +22,7 @@ The logic for derived queries is more complex. We summarize the high-level ideas
 * Otherwise, if the memo contains a memoized value, we must check whether [dependencies] have been modified:
     * Let R be the revision in which the memo was last verified; we wish to know if any of the dependencies have changed since revision R.
     * First, we check the [durability]. For each memo, we track the minimum durability of the memo's dependencies. If the memo has durability D, and there have been no changes to an input with durability D since the last time the memo was verified, then we can consider the memo verified without any further work.
-    * If the durability check is not sufficient, then we must check the dependencies individually. For this, we iterate over each dependency D and invoke the [maybe changed since](./maybe_changed_since.md) operation to check whether D has changed since the revision R.
+    * If the durability check is not sufficient, then we must check the dependencies individually. For this, we iterate over each dependency D and invoke the [maybe changed after](./maybe_changed_after.md) operation to check whether D has changed since the revision R.
     * If no dependency was modified:
         * We can mark the memo as verified and return its memoized value.
 * Assuming dependencies have been modified or the memo does not contain a memoized value:

diff --git a/book/src/plumbing/maybe_changed_since.md → book/src/plumbing/maybe_changed_after.md b/book/src/plumbing/maybe_changed_since.md → book/src/plumbing/maybe_changed_after.md
@@ -1,14 +1,14 @@
-# Maybe changed since
+# Maybe changed after
 
 ```rust,no_run,noplayground
-{{#include ../../../src/plumbing.rs:maybe_changed_since}}
+{{#include ../../../src/plumbing.rs:maybe_changed_after}}
 ```
 
-The `maybe_changed_since` operation computes whether a query's value *may have changed* since the given revision.
+The `maybe_changed_after` operation computes whether a query's value *may have changed* **after** the given revision. In other words, `Q.maybe_change_since(R)` is true if the value of the query `Q` may have changed in the revisions `(R+1)..R_now`, where `R_now` is the current revision. Note that it doesn't make sense to ask `maybe_changed_after(R_now)`.
 
 ## Input queries
 
-Input queries are set explicitly by the user. `maybe_changed_since` can therefore just check when the value was last set and compare.
+Input queries are set explicitly by the user. `maybe_changed_after` can therefore just check when the value was last set and compare.
 
 ## Interned queries
 
@@ -20,7 +20,7 @@ The logic for derived queries is more complex. We summarize the high-level ideas
 * Otherwise, we must check whether [dependencies] have been modified:
     * Let R be the revision in which the memo was last verified; we wish to know if any of the dependencies have changed since revision R.
     * First, we check the [durability]. For each memo, we track the minimum durability of the memo's dependencies. If the memo has durability D, and there have been no changes to an input with durability D since the last time the memo was verified, then we can consider the memo verified without any further work.
-    * If the durability check is not sufficient, then we must check the dependencies individually. For this, we iterate over each dependency D and invoke the [maybe changed since](./maybe_changed_since.md) operation to check whether D has changed since the revision R.
+    * If the durability check is not sufficient, then we must check the dependencies individually. For this, we iterate over each dependency D and invoke the [maybe changed after](./maybe_changed_after.md) operation to check whether D has changed since the revision R.
     * If no dependency was modified:
         * We can mark the memo as verified and use its [changed at] revision to return true or false.
 * Assuming dependencies have been modified:

diff --git a/book/src/plumbing/query_ops.md b/book/src/plumbing/query_ops.md
@@ -8,7 +8,7 @@ Each of the query storage struct implements the `QueryStorageOps` trait found in
 
  which defines the basic operations that all queries support. The most important are these two:
 
-* [Maybe changed since](./maybe_changed_since.md): Returns true if the value of the query (for the given key) may have changed since the given revision.
+* [maybe changed after](./maybe_changed_after.md): Returns true if the value of the query (for the given key) may have changed since the given revision.
 * [Fetch](./fetch.md): Returms the up-to-date value for the given K (or an error in the case of an "unrecovered" cycle).
 
 [`plumbing`]: https://github.com/salsa-rs/salsa/blob/master/src/plumbing.rs
diff --git a/book/src/plumbing/terminology/verified.md b/book/src/plumbing/terminology/verified.md
@@ -1,8 +1,8 @@
 # Verified
 
-A [memo] is *verified* in a revision R if we have checked that its value is still up-to-date (i.e., if we were to reexecute the [query function], we are guaranteed to get the same result). Each memo tracks the revision in which it was last verified to avoid repeatedly checking whether dependencies have changed during the [fetch] and [maybe changed since] operations.
+A [memo] is *verified* in a revision R if we have checked that its value is still up-to-date (i.e., if we were to reexecute the [query function], we are guaranteed to get the same result). Each memo tracks the revision in which it was last verified to avoid repeatedly checking whether dependencies have changed during the [fetch] and [maybe changed after] operations.
 
 [query function]: ./query_function.md
 [fetch]: ../fetch.md
-[maybe changed since]: ../maybe_changed_since.md
+[maybe changed after]: ../maybe_changed_after.md
 [memo]: ./memo.md
diff --git a/book/src/rfcs/RFC0001-Query-Group-Traits.md b/book/src/rfcs/RFC0001-Query-Group-Traits.md
@@ -268,9 +268,9 @@ impl of `QueryGroup`.  That involves generating the following things:
     but since we cannot, it is entitled `MyGroupGroupKey`
   - It is an enum which contains one variant per query with the value being the key:
     - `my_query(<MyQueryQuery as salsa::plumbing::Query<DB>>::Key)`
-  - The group key enum offers a public, inherent method `maybe_changed_since`:
-    - `fn maybe_changed_since<DB>(db: &DB, db_descriptor: &DB::DatabaseKey, revision: Revision)`
-    - it is invoked when implementing `maybe_changed_since` for the database key
+  - The group key enum offers a public, inherent method `maybe_changed_after`:
+    - `fn maybe_changed_after<DB>(db: &DB, db_descriptor: &DB::DatabaseKey, revision: Revision)`
+    - it is invoked when implementing `maybe_changed_after` for the database key
 
 ### Lowering database storage
 
@@ -298,7 +298,7 @@ It generates the following things:
   - This contains a `for_each_query` method, which is implemented by invoking, in turn,
     the inherent methods defined on each query group storage struct.
 - impl of `plumbing::DatabaseKey` for the database key enum
-  - This contains a method `maybe_changed_since`. We implement this by
+  - This contains a method `maybe_changed_after`. We implement this by
     matching to get a particular group key, and then invoking the
     inherent method on the group key struct.
 

diff --git a/book/src/rfcs/RFC0006-Dynamic-Databases.md b/book/src/rfcs/RFC0006-Dynamic-Databases.md
@@ -343,7 +343,7 @@ some type for dependencies that is independent of the dtabase type `DB`.
 There are a number of methods that can be dispatched through the database
 interface on a `DatabaseKeyIndex`. For example, we already mentioned
 `fmt_debug`, which emits a debug representation of the key, but there is also
-`maybe_changed_since`, which checks whether the value for a given key may have
+`maybe_changed_after`, which checks whether the value for a given key may have
 changed since the given revision. Each of these methods is a member of the
 `DatabaseOps` trait and they are dispatched as follows.
 

diff --git a/book/src/rfcs/RFC0010-Slot-no-more.md b/book/src/rfcs/RFC0010-Slot-no-more.md
@@ -0,0 +1,175 @@
+# Parallel friendly caching
+
+## Metadata
+
+* Author: nikomatsakis
+* Date: 2021-05-29
+* Introduced in: (please update once you open your PR)
+
+## Summary
+
+* Rework query storage to be based on concurrent hashmaps instead of slots with read-write locked state.
+
+## Motivation
+
+Two-fold:
+
+* Simpler, cleaner, and hopefully faster algorithm.
+* Enables some future developments that are not part of this RFC:
+    * Derived queries whose keys are known to be integers.
+    * Fixed point cycles so that salsa and chalk can be deeply integrated.
+    * Non-synchronized queries that potentially execute on many threads in parallel (required for fixed point cycles, but potentially valuable in their own right).
+
+## User's guide
+
+No user visible changes.
+
+## Reference guide
+
+### Background: Current structure
+
+Before this RFC, the **overall structure** of derived queries is as follows:
+
+* Each derived query has a `DerivedStorage<Q>` (stored in the database) that contains:
+    * the `slot_map`, a monotonically growing, indexable map from keys (`Q::Key`) to the `Slot<Q>` for the given key
+    * lru list
+* Each `Slot<Q>` has
+    * r-w locked query-state that can be:
+        * not-computed
+        * in-progress with synchronization storage:
+            * `id` of the runtime computing the value
+            * `anyone_waiting`: `AtomicBool` set to true if other threads are awaiting result
+        * a `Memo<Q>`
+* A `Memo<Q>` has
+    * an optional value `Option<Q::Value>`
+    * dependency information:
+        * verified-at
+        * changed-at
+        * durability
+        * input set (typically a `Arc<[DatabaseKeyIndex]>`)
+
+[maybe changed after]: ../plumbing/maybe_changed_after.md
+[fetch]: ../plumbing/fetch.md
+
+**Fetching the value for a query** currently works as follows:
+
+* Acquire the read lock on the (indexable) `slot_map` and hash key to find the slot.
+    * If no slot exists, acquire write lock and insert.
+* Acquire the slot's internal lock to perform the [fetch] operation.
+
+**Verifying a dependency** uses a scheme introduced in [RFC #6](./RFC0006-Dynamic-Databases.md). Each dependency is represented as a `DatabaseKeyIndex` which contains three indices (group, query, and key). The group and query indices are used to find the query storage via `match` statements and then the next operation depends on the query type:
+
+* Acquire the read lock on the (indexable) `slot_map` and use key index to load the slot. Read lock is released afterwards.
+* Acquire the slot's internal lock to perform the [maybe changed after] operation.
+
+### New structure (introduced by this RFC)
+
+The **overall structure** of derived queries after this RFC is as follows:
+
+* Each derived query has a `DerivedStorage<Q>` (stored in the database) that contains:
+    * a set of concurrent hashmaps:
+        * `key_map`: maps from `Q::Key` to an internal key index `K`
+        * `memo_map`: maps from `K` to cached memo `ArcSwap<Memo<Q>>`
+        * `sync_map`: maps from `K` to a `Sync<Q>` synchronization value
+    * lru set
+* A `Memo<Q>` has
+    * an *immutable* optional value `Option<Q::Value>`
+    * dependency information:
+        * *updatable* verified-at (`AtomicCell<Option<Revision>>`)
+        * *immutable* changed-at (`Revision`)
+        * *immutable* durability (`Durability`)
+        * *immutable* input set (typically a `Arc<[DatabaseKeyIndex]>`)
+    * information for LRU:
+        * `DatabaseKeyIndex`
+        * `lru_index`, an `AtomicUsize`
+* A `Sync<Q>` has
+    * `id` of the runtime computing the value
+    * `anyone_waiting`: `AtomicBool` set to true if other threads are awaiting result
+
+**Fetching the value for a *derived* query** will work as follows:
+
+1. Find internal index `K` by hashing key, as today.
+    * Precise operation for this will depend on the concurrent hashmap implementation.
+2. Load memo `M: Arc<Memo<Q>>` from `memo_map[K]` (if present):
+    * If verified is `None`, then another thread has found this memo to be invalid; ignore it.
+    * Else, let `Rv` be the "last verified revision".
+    * If `Rv` is the current revision, or last change to an input with durability `M.durability` was before `Rv`:
+        * Update "last verified revision" and **return** memoized value.
+3. Atomically check `sync_map` for an existing `Sync<Q>`:
+    * If one exists, block on the thread within and return to step 2 after it completes:
+        * If this results in a cycle, unwind as today.
+    * If none exists, insert a new entry with current runtime-id.
+4. Check dependencies deeply
+    * Iterate over each dependency `D` and check `db.maybe_changed_after(D, Rv)`.
+        * If no dependency has changed, update `verified_at` to current revision and **return** memoized value.
+    * Mark memo as invalid by storing `None` in the verified-at.
+5. Construct the new memo:
+    * Push query onto the local stack and execute the query function:
+        * If this query is found to be a cycle participant, execute recovery function.
+    * Backdate result if it is equal to the old memo's value.
+    * Allocate new memo.
+6. Store results:
+    * Store new memo into `memo_map[K]`.
+    * Remove query from the `sync_map`.
+7. **Return** newly constructed value._
+
+**Verifying a dependency for a *derived* query** will work as follows:
+
+1. Find internal index `K` by hashing key, as today.
+    * Precise operation for this will depend on the concurrent hashmap implementation.
+2. Load memo `M: Arc<Memo<Q>>` from `memo_map[K]` (if present):
+    * If verified is `None`, then another thread has found this memo to be invalid; ignore it.
+    * Else, let `Rv` be the "last verified revision".
+    * If `Rv` is the current revision, **return** true or false depending on whether changed-at from memo.
+    * If last change to an input with durability `M.durability` was before `Rv`: 
+        * Update `verified_at` to current revision and **return** memoized value.
+    * Iterate over each dependency `D` and check `db.maybe_changed_after(D, Rv)`.
+        * If no dependency has changed, update `verified_at` to current revision and **return** memoized value.
+    * Mark memo as invalid by storing `None` in the verified-at.
+3. Atomically check `sync_map` for an existing `Sync<Q>`:
+    * If one exists, block on the thread within and return to step 2 after it completes:
+        * If this results in a cycle, unwind as today.
+    * If none exists, insert a new entry with current runtime-id.
+4. Construct the new memo:
+    * Push query onto the local stack and execute the query function:
+        * If this query is found to be a cycle participant, execute recovery function.
+    * Backdate result if it is equal to the old memo's value.
+    * Allocate new memo.
+5. Store results:
+    * Store new memo into `memo_map[K]`.
+    * Remove query from the `sync_map`.
+6. **Return** true or false depending on whether memo was backdated.
+
+## Frequently asked questions
+
+### Why use `ArcSwap`?
+
+It's a relatively minor implementation detail, but the code in this PR uses `ArcSwap` to store the values in the memo-map. In the case of a cache hit or other transient operations, this allows us to read from the arc while avoiding a full increment of the ref count. It adds a small bit of complexity because we have to be careful to do a full load before any recursive operations, since arc-swap only gives a fixed number of "guards" per thread before falling back to more expensive loads.
+
+### Do we really need `maybe_changed_after` *and* `fetch`?
+
+Yes, we do. "maybe changed after" is very similar to "fetch", but it doesn't require that we have a memoized value. This is important for LRU.
+
+### The LRU map in the code is just a big lock!
+
+That's not a question. But it's true, I simplified the LRU code to just use a mutex. My assumption is that there are relatively few LRU-ified queries, and their values are relatively expensive to compute, so this is ok. If we find it's a bottleneck, though, I believe we could improve it by using a similar "zone scheme" to what we use now. We would add a `lru_index` to the `Memo` so that we can easily check if the memo is in the "green zone" when reading (if so, no updates are needed). The complexity there is that when we produce a replacement memo, we have to install it and swap the index. Thinking about that made my brain hurt a little so I decided to just take the simple option for now.
+
+### How do the synchronized / atomic operations compare after this RFC?
+
+After this RFC, to perform a read, in the best case:
+
+* We do one "dashmap get" to map key to key index.
+* We do another "dashmap get" from key index to memo.
+* We do an "arcswap load" to get the memo.
+* We do an "atomiccell read" to load the current revision or durability information.
+
+dashmap is implemented with a striped set of read-write locks, so this is roughly the same (two read locks) as before this RFC. However:
+
+* We no longer do any atomic ref count increments.
+* It is theoretically possible to replace dashmap with something that doesn't use locks.
+* The first dashmap get should be removable, if we know that the key is a 32 bit integer.
+    * I plan to propose this in a future RFC.
+
+### Yeah yeah, show me some benchmarks!
+
+I didn't run any. I'll get on that.
diff --git a/components/salsa-macros/src/database_storage.rs b/components/salsa-macros/src/database_storage.rs
@@ -124,7 +124,7 @@ pub(crate) fn database(args: TokenStream, input: TokenStream) -> TokenStream {
             #group_index => {
                 let storage: &#group_storage =
                     <Self as salsa::plumbing::HasQueryGroup<#group_path>>::group_storage(self);
-                storage.maybe_changed_since(self, input, revision)
+                storage.maybe_changed_after(self, input, revision)
             }
         });
         cycle_recovery_strategy_ops.extend(quote! {
@@ -165,7 +165,7 @@ pub(crate) fn database(args: TokenStream, input: TokenStream) -> TokenStream {
                 }
             }
 
-            fn maybe_changed_since(
+            fn maybe_changed_after(
                 &self,
                 input: salsa::DatabaseKeyIndex,
                 revision: salsa::Revision

diff --git a/components/salsa-macros/src/query_group.rs b/components/salsa-macros/src/query_group.rs
@@ -532,7 +532,7 @@ pub(crate) fn query_group(args: TokenStream, input: TokenStream) -> TokenStream
     for (Query { fn_name, .. }, query_index) in non_transparent_queries().zip(0_u16..) {
         maybe_changed_ops.extend(quote! {
             #query_index => {
-                salsa::plumbing::QueryStorageOps::maybe_changed_since(
+                salsa::plumbing::QueryStorageOps::maybe_changed_after(
                     &*self.#fn_name, db, input, revision
                 )
             }
@@ -590,7 +590,7 @@ pub(crate) fn query_group(args: TokenStream, input: TokenStream) -> TokenStream
                 }
             }
 
-            #trait_vis fn maybe_changed_since(
+            #trait_vis fn maybe_changed_after(
                 &self,
                 db: &(#dyn_db + '_),
                 input: salsa::DatabaseKeyIndex,