Merge #285
285: Cleanup and fix cycle handling r=nikomatsakis a=nikomatsakis

This PR is...kind of long. It reshapes the core logic of salsa to fix various bugs in cycle handling and generally simplify how we handle cross-thread coordination.

Best read commit by commit: every commit passes all tests, afaik.

The core bug I was taking aim at was the fact that, when you invoke `maybe_changed_since`, you can sometimes wind up detecting a cycle without having pushed some of the relevant queries onto the stack. This is now fixed.

From a user's POV, ~~nothing changes from this PR~~, there are only minimal changes to the public interface. The biggest one is that recover functions now get a `&salsa::Cycle` which has methods for accessing the participants; the other is that queries that are participating in cycle fallback will use unwinding to avoid executing past the point where the cycle is discovered. Otherwise, things work the same as before:

* If you encounter a cycle and all participant queries are marked with `#[salsa::recover]`, then they will take on the recovery value. (At the moment, they continue executing after the cycle is observed, but their final result is ignored; I plan to change this in a follow-up PR, or maybe some future commit to this PR.)
* If you encounter a cycle and some or all participants are NOT marked with `#[salsa::recover]`, then the code panics. This is treated like any other panic, cancelling all other work.

Along the way, I made... a few... other changes:

* Cross-thread handling is simplified. When we block on another thread, it no longer sends us a final result. Instead, it just gets re-awoken and then it retries the original request. This is helpful when you encounter cycles in `maybe_changed_since` vs `read`, but it's also more compatible with some of the directions I have in mind.
* Cycle detection is simplified and more centrally coordinated. Previously, when a cycle was detected, we would mark all the participants on the current thread, but then we would mark other threads 'lazily'. Now, threads move ownership of their stack into the shared dep graph when they block, so that we can mark all the stack frames at once. This also means less cloning on blocking, so it should be mildly more efficient.
* The code is DRY-er, since `maybe_changed_since` has been re-implemented in terms of the same core building blocks as `read` (`probe` and friends). I originally tried to unify them, but I realized that they behave somewhat differently from one another and both of them make sense. (In particular, we want to be able to free values with the LRU cache while still checking if they are up to date.)

Ah, I realize now that I had planned to write a bunch of docs in the salsa book before I landed this. Well, I'm going to open the PR anyway, as I've let this branch go far too long.

r? `@matklad` 

Co-authored-by: Florian Diebold <[email protected]>
Co-authored-by: Aleksey Kladov <[email protected]>
Co-authored-by: Niko Matsakis <[email protected]>
4 people authored Jan 21, 2022
2 parents 2ae813e + fc020de commit a2e5cd8
Showing 57 changed files with 2,907 additions and 1,220 deletions.
2 changes: 2 additions & 0 deletions Cargo.toml
@@ -25,5 +25,7 @@ env_logger = "0.7"
linked-hash-map = "0.5.2"
rand = "0.7"
rand_distr = "0.2.1"
test-env-log = "0.2.7"
insta = "1.8.0"

[workspace]
48 changes: 43 additions & 5 deletions book/src/SUMMARY.md
@@ -1,16 +1,48 @@
# Summary

- [About salsa](./about_salsa.md)

# How to use Salsa

- [How to use Salsa](./how_to_use.md)
- [How Salsa works](./how_salsa_works.md)
- [Common patterns](./common_patterns.md)
- [Selection](./common_patterns/selection.md)
- [On-demand (Lazy) inputs](./common_patterns/on_demand_inputs.md)
- [Cycle handling](./cycles.md)
- [Recovering via fallback](./cycles/fallback.md)

# How Salsa works internally

- [How Salsa works](./how_salsa_works.md)
- [Videos](./videos.md)
- [Plumbing](./plumbing.md)
- [Diagram](./plumbing/diagram.md)
- [Query groups](./plumbing/query_groups.md)
- [Database](./plumbing/database.md)
- [Generated code](./plumbing/generated_code.md)
- [Diagram](./plumbing/diagram.md)
- [Query groups](./plumbing/query_groups.md)
- [Database](./plumbing/database.md)
- [The `salsa` crate](./plumbing/salsa_crate.md)
- [Query operations](./plumbing/query_ops.md)
- [maybe changed after](./plumbing/maybe_changed_after.md)
- [Fetch](./plumbing/fetch.md)
- [Derived queries flowchart](./plumbing/derived_flowchart.md)
- [Cycle handling](./plumbing/cycles.md)
- [Terminology](./plumbing/terminology.md)
- [Backdate](./plumbing/terminology/backdate.md)
- [Changed at](./plumbing/terminology/changed_at.md)
- [Dependency](./plumbing/terminology/dependency.md)
- [Derived query](./plumbing/terminology/derived_query.md)
- [Durability](./plumbing/terminology/durability.md)
- [Input query](./plumbing/terminology/input_query.md)
- [LRU](./plumbing/terminology/LRU.md)
- [Memo](./plumbing/terminology/memo.md)
- [Query](./plumbing/terminology/query.md)
- [Query function](./plumbing/terminology/query_function.md)
- [Revision](./plumbing/terminology/revision.md)
- [Untracked dependency](./plumbing/terminology/untracked.md)
- [Verified](./plumbing/terminology/verified.md)

# Salsa RFCs

- [RFCs](./rfcs.md)
- [Template](./rfcs/template.md)
- [RFC 0001: Query group traits](./rfcs/RFC0001-Query-Group-Traits.md)
@@ -20,4 +52,10 @@
- [RFC 0005: Durability](./rfcs/RFC0005-Durability.md)
- [RFC 0006: Dynamic database](./rfcs/RFC0006-Dynamic-Databases.md)
- [RFC 0007: Opinionated cancelation](./rfcs/RFC0007-Opinionated-Cancelation.md)
- [RFC 0008: Remove garbage collection](./rfcs/RFC0008-Remove-Garbage-Collection.md)
- [RFC 0008: Remove garbage collection](./rfcs/RFC0008-Remove-Garbage-Collection.md)
- [RFC 0009: Cycle recovery](./rfcs/RFC0009-Cycle-recovery.md)

# Appendices

- [Meta: about the book itself](./meta.md)

3 changes: 3 additions & 0 deletions book/src/cycles.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
# Cycle handling

By default, when Salsa detects a cycle in the computation graph, Salsa will panic with a [`salsa::Cycle`] as the panic value. The [`salsa::Cycle`] structure describes the cycle, which can be useful for diagnosing what went wrong.
22 changes: 22 additions & 0 deletions book/src/cycles/fallback.md
@@ -0,0 +1,22 @@
# Recovering via fallback

Panicking when a cycle occurs is ok for situations where you believe a cycle is impossible. But sometimes cycles can result from illegal user input and cannot be statically prevented. In these cases, you might prefer to gracefully recover from a cycle rather than panicking the entire query. Salsa supports that with the idea of *cycle recovery*.

To use cycle recovery, you annotate potential participants in the cycle with a `#[salsa::recover(my_recover_fn)]` attribute. When a cycle occurs, if any participant P has recovery information, then no panic occurs. Instead, the execution of P is aborted and P will execute the recovery function to generate its result. Participants in the cycle that do not have recovery information continue executing as normal, using this recovery result.

The recovery function has a similar signature to a query function. It is given a reference to your database along with a `salsa::Cycle` describing the cycle that occurred; it returns the result of the query. Example:

```rust
fn my_recover_fn(
    db: &dyn MyDatabase,
    cycle: &salsa::Cycle,
) -> MyResultValue
```

The `db` and `cycle` arguments can be used to prepare a useful error message for your users.

**Important:** Although the recovery function is given a `db` handle, you should be careful to avoid creating a cycle from within recovery or invoking queries that may be participating in the current cycle. Attempting to do so can result in inconsistent results.
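As a rough illustration of the shape such a function can take, here is a self-contained sketch. It does not use the salsa crate: `Cycle`, its `participant_names` accessor, `MyDatabase`, `Db`, and `MyResultValue` are all stand-ins invented for this example, so treat it as a picture of the idea rather than of salsa's actual API. The point it tries to show is that the recovery function only formats a message from the cycle and never calls back into queries.

```rust
/// Stand-in for the user's database trait (hypothetical, not salsa's).
trait MyDatabase {}

struct Db;
impl MyDatabase for Db {}

/// Stand-in for `salsa::Cycle`; the real type comes from salsa.
struct Cycle {
    participants: Vec<String>,
}

impl Cycle {
    /// Hypothetical accessor, named for illustration only.
    fn participant_names(&self) -> &[String] {
        &self.participants
    }
}

/// Hypothetical query result: a computed size or a diagnostic message.
type MyResultValue = Result<u64, String>;

/// Builds a user-facing error from the cycle. It deliberately invokes no
/// queries on `_db`, so it cannot re-enter the cycle it is recovering from.
fn my_recover_fn(_db: &dyn MyDatabase, cycle: &Cycle) -> MyResultValue {
    let chain = cycle.participant_names().join(" -> ");
    Err(format!("cyclic definition: {}", chain))
}

fn main() {
    let cycle = Cycle {
        participants: vec!["type_of(A)".to_string(), "type_of(B)".to_string()],
    };
    println!("{:?}", my_recover_fn(&Db, &cycle));
}
```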

## Figuring out why recovery did not work

If a cycle occurs and *some* of the participant queries have `#[salsa::recover]` annotations and others do not, then the query will be treated as irrecoverable and will simply panic. You can use the `Cycle::unexpected_participants` method to figure out why recovery did not succeed and add the appropriate `#[salsa::recover]` annotations.
4 changes: 4 additions & 0 deletions book/src/derived-query-read.drawio.svg
17 changes: 17 additions & 0 deletions book/src/meta.md
@@ -0,0 +1,17 @@
# Meta: about the book itself

## Linking policy

We try to avoid links that easily become fragile.

**Do:**

* Link to `docs.rs` types to document the public API, but modify the link to use `latest` as the version.
* Link to modules in the source code.
* Create ["named anchors"] and embed source code directly.

["named anchors"]: https://rust-lang.github.io/mdBook/format/mdbook.html?highlight=ANCHOR#including-portions-of-a-file

**Don't:**

* Link to direct lines on github, even within a specific commit, unless you are trying to reference a historical piece of code ("how things were at the time").
11 changes: 0 additions & 11 deletions book/src/plumbing.md
@@ -3,17 +3,6 @@
This chapter documents the code that salsa generates and its "inner workings".
We refer to this as the "plumbing".

This page walks through the ["Hello, World!"] example and explains the code that
it generates. Please take it with a grain of salt: while we make an effort to
keep this documentation up to date, this sort of thing can fall out of date
easily. See the page history below for major updates.

["Hello, World!"]: https://github.com/salsa-rs/salsa/blob/master/examples/hello_world/main.rs

If you'd like to see for yourself, you can set the environment variable
`SALSA_DUMP` to 1 while the procedural macro runs, and it will dump the full
output to stdout. I recommend piping the output through rustfmt.

## History

* 2020-07-05: Updated to take [RFC 6](rfcs/RFC0006-Dynamic-Databases.md) into account.
65 changes: 65 additions & 0 deletions book/src/plumbing/cycles.md
@@ -0,0 +1,65 @@
# Cycles

## Cross-thread blocking

The interface for blocking across threads now works as follows:

* When one thread `T1` wishes to block on a query `Q` being executed by another thread `T2`, it invokes `Runtime::try_block_on`. This will check for cycles. Assuming no cycle is detected, it will block `T1` until `T2` has completed with `Q`. At that point, `T1` reawakens. However, we don't know the result of executing `Q`, so `T1` now has to "retry". Typically, this will result in successfully reading the cached value.
* While `T1` is blocking, the runtime moves its query stack (a `Vec`) into the shared dependency graph data structure. When `T1` reawakens, it recovers ownership of its query stack before returning from `try_block_on`.
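The "block, wake up, retry" pattern described above can be sketched with the standard library alone. This is an assumption-laden toy, not salsa's implementation: `Table`, `Slot`, `read`, and `demo` are hypothetical names, cycle checking is left out, and the query stack is not moved anywhere. It only shows that the waking thread hands over no value and the blocked thread re-probes the table.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Hypothetical memo-table slot.
#[derive(Clone, Debug, PartialEq)]
enum Slot {
    InProgress,
    Done(u32),
}

/// Shared table plus a condvar used to wake blocked readers.
struct Table {
    slots: Mutex<HashMap<&'static str, Slot>>,
    changed: Condvar,
}

/// Reads `key`, blocking while another thread computes it. No value is passed
/// along with the wake-up; after waking, the loop probes the table again.
fn read(table: &Table, key: &'static str) -> u32 {
    let mut slots = table.slots.lock().unwrap();
    loop {
        match slots.get(key).cloned() {
            Some(Slot::Done(value)) => return value,
            Some(Slot::InProgress) => {
                // Releases the lock while waiting, reacquires it on wake-up.
                slots = table.changed.wait(slots).unwrap();
            }
            // A real runtime would claim the slot and execute the query here.
            None => panic!("nobody is computing {}", key),
        }
    }
}

/// T2 completes `Q` while T1 (the calling thread) blocks on it and retries.
fn demo() -> u32 {
    let table = Arc::new(Table {
        slots: Mutex::new(HashMap::from([("Q", Slot::InProgress)])),
        changed: Condvar::new(),
    });
    let writer = {
        let table = Arc::clone(&table);
        thread::spawn(move || {
            table.slots.lock().unwrap().insert("Q", Slot::Done(42));
            table.changed.notify_all();
        })
    };
    let value = read(&table, "Q");
    writer.join().unwrap();
    value
}

fn main() {
    println!("{}", demo());
}
```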

## Cycle detection

When a thread `T1` attempts to execute a query `Q`, it will try to load the value for `Q` from the memoization tables. If it finds an `InProgress` marker, that indicates that `Q` is currently being computed. This indicates a potential cycle. `T1` will then try to block on the query `Q`:

* If `Q` is also being computed by `T1`, then there is a cycle.
* Otherwise, if `Q` is being computed by some other thread `T2`, we have to check whether `T2` is (transitively) blocked on `T1`. If so, there is a cycle.

These two cases are handled internally by the `Runtime::try_block_on` function. Detecting the intra-thread cycle case is easy; to detect cross-thread cycles, the runtime maintains a dependency DAG between threads (identified by `RuntimeId`). Before adding an edge `T1 -> T2` (i.e., `T1` is blocked waiting for `T2`) into the DAG, it checks whether a path exists from `T2` to `T1`. If so, we have a cycle and the edge cannot be added (then the DAG would no longer be acyclic).
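The path check before adding an edge can be sketched as follows. This is a minimal model under stated assumptions: `ThreadId` stands in for `RuntimeId`, `BlockedGraph` and its methods are hypothetical names, and the graph is stored as a simple map because a blocked thread waits on exactly one other thread.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for salsa's `RuntimeId`.
type ThreadId = u32;

/// Maps each blocked thread to the single thread it is waiting for.
#[derive(Default)]
struct BlockedGraph {
    waits_for: HashMap<ThreadId, ThreadId>,
}

impl BlockedGraph {
    /// Returns true if following wait edges from `from` eventually reaches `to`.
    /// Terminates because the stored graph is kept acyclic.
    fn depends_on(&self, from: ThreadId, to: ThreadId) -> bool {
        let mut current = from;
        while let Some(&next) = self.waits_for.get(&current) {
            if next == to {
                return true;
            }
            current = next;
        }
        false
    }

    /// Tries to record "`from` is blocked on `to`". The edge is refused when
    /// it would close a cycle, including the same-thread case.
    fn try_add_edge(&mut self, from: ThreadId, to: ThreadId) -> Result<(), &'static str> {
        if from == to || self.depends_on(to, from) {
            return Err("cycle");
        }
        self.waits_for.insert(from, to);
        Ok(())
    }
}

fn main() {
    let mut graph = BlockedGraph::default();
    // Threads 2 and 3 block first; thread 1 then tries to block on thread 2.
    println!("{:?}", graph.try_add_edge(2, 3));
    println!("{:?}", graph.try_add_edge(3, 1));
    println!("{:?}", graph.try_add_edge(1, 2));
}
```

Because the refused edge is never inserted, the stored graph stays acyclic, which is what lets `depends_on` walk it with a plain loop.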

When a cycle is detected, the current thread `T1` has full access to the query stacks that are participating in the cycle. Consider: naturally, `T1` has access to its own stack. There is also a path `T2 -> ... -> Tn -> T1` of blocked threads. Each of the blocked threads `T2 ..= Tn` will have moved their query stacks into the dependency graph, so those query stacks are available for inspection.

Using the available stacks, we can create a list of cycle participants `Q0 ... Qn` and store that into a `Cycle` struct. If none of the participants `Q0 ... Qn` have cycle recovery enabled, we panic with the `Cycle` struct, which will trigger all the queries on this thread to panic.

## Cycle recovery via fallback

If any of the cycle participants `Q0 ... Qn` has cycle recovery set, we recover from the cycle. To help explain how this works, we will use this example cycle which contains three threads. Beginning with the current query, the cycle participants are `QA3`, `QB2`, `QB3`, `QC2`, `QC3`, and `QA2`.

```
      The cyclic
      edge we have
      failed to add.
            :
   A        :  B         C
            :
   QA1      v  QB1       QC1
┌► QA2    ┌──► QB2   ┌─► QC2
│  QA3 ───┘    QB3 ──┘   QC3 ───┐
│                               │
└───────────────────────────────┘
```

Recovery works in phases:

* **Analyze:** As we enumerate the cycle participants, we collect their collective inputs (all queries invoked so far by any cycle participant) along with the max changed-at and min durability. We then remove the cycle participants themselves from this list of inputs, leaving only the queries external to the cycle.
* **Mark:** For each query Q that is annotated with `#[salsa::recover]`, we mark it and all of its successors on the same thread by setting their `cycle` flag to the `c: Cycle` we constructed earlier; we also reset their inputs to the collective inputs gathered during analysis. If those queries resume execution later, those marks will trigger them to immediately unwind and use cycle recovery, and the inputs will be used as the inputs to the recovery value.
    * Note that we mark *all* the successors of Q on the same thread, whether or not they have recovery set. We'll discuss later how this is important in the case where the active thread (A, here) doesn't have any recovery set.
* **Unblock:** Each blocked thread T that has a recovering query is forcibly reawoken; the outgoing edge from that thread to its successor in the cycle is removed. Its condvar is signalled with a `WaitResult::Cycle(c)`. When the thread reawakens, it will see that and start unwinding with the cycle `c`.
* **Handle the current thread:** Finally, we have to choose how to have the current thread proceed. If the current thread includes any cycle participant with recovery information, then we can begin unwinding. Otherwise, the current thread simply continues as if there had been no cycle, and so the cyclic edge is added to the graph and the current thread blocks. This is possible because some other thread had recovery information and therefore has been awoken.
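The marking rule can be sketched on a single thread's stack. The `Frame` layout, the boolean `cycle_marked` flag (standing in for the stored `Cycle`), and the `mark` and `marked_names` helpers are all assumptions made for this example; resetting inputs and waking threads are left out.

```rust
/// One frame of a thread's query stack (hypothetical layout).
struct Frame {
    name: &'static str,
    has_recovery: bool,
    cycle_marked: bool,
}

impl Frame {
    fn new(name: &'static str, has_recovery: bool) -> Frame {
        Frame { name, has_recovery, cycle_marked: false }
    }
}

/// Marks the first cycle participant that has recovery and every frame above
/// it, whether or not those frames have recovery themselves. Returns true if
/// anything was marked, i.e. this thread has to unwind.
fn mark(stack: &mut [Frame], first_participant: usize) -> bool {
    let recovering = stack[first_participant..]
        .iter()
        .position(|frame| frame.has_recovery);
    match recovering {
        Some(offset) => {
            for frame in &mut stack[first_participant + offset..] {
                frame.cycle_marked = true;
            }
            true
        }
        None => false,
    }
}

/// Names of the frames whose cycle flag is set.
fn marked_names(stack: &[Frame]) -> Vec<&'static str> {
    stack.iter().filter(|f| f.cycle_marked).map(|f| f.name).collect()
}

fn main() {
    // Thread A where only QA2 has recovery; QA1 is not a cycle participant.
    let mut a = vec![
        Frame::new("QA1", false),
        Frame::new("QA2", true),
        Frame::new("QA3", false),
    ];
    let must_unwind = mark(&mut a, 1);
    println!("{} {:?}", must_unwind, marked_names(&a));
}
```

A thread for which `mark` returns false has no recovering participant, so in this model it would stay blocked rather than be reawoken.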

Let's walk through the process with a few examples.

### Example 1: Recovery on the detecting thread

Consider the case where only the query QA2 has recovery set. It and QA3 will be marked with their `cycle` flag set to `c: Cycle`. Threads B and C will not be unblocked, as they do not have any cycle recovery nodes. The current thread (Thread A) will initiate unwinding with the cycle `c` as the value. Unwinding will pass through QA3 and be caught by QA2. QA2 will substitute the recovery value and return normally. QA1 and QC3 will then complete normally and so forth, on up until all queries have completed.

### Example 2: Recovery in two queries on the detecting thread

Consider the case where both QA2 and QA3 have recovery set. It proceeds the same as Example 1 until the current thread initiates unwinding. When QA3 receives the cycle, it stores its recovery value and completes normally. QA2 then adds QA3 as an input dependency: at that point, QA2 observes that it too has the cycle mark set, and so it initiates unwinding. The rest of QA2 therefore never executes. This unwinding is caught by QA2's entry point and it stores the recovery value and returns normally. QA1 and QC3 then continue normally, as they have not had their `cycle` flag set.

### Example 3: Recovery on another thread

Now consider the case where only the query QB2 has recovery set. It and QB3 will be marked with the cycle `c: Cycle` and thread B will be unblocked; the edge `QB3 -> QC2` will be removed from the dependency graph. Thread A will then add an edge `QA3 -> QB2` and block on thread B. At that point, thread A releases the lock on the dependency graph, and so thread B is re-awoken. It observes the `WaitResult::Cycle` and initiates unwinding. Unwinding proceeds through QB3 and into QB2, which recovers. QB1 is then able to execute normally, as is QA3, and execution proceeds from there.

### Example 4: Recovery on all queries

Now consider the case where all the queries have recovery set. In that case, they are all marked with the cycle, and all the cross-thread edges are removed from the graph. Each thread will independently awaken and initiate unwinding. Each query will recover.
14 changes: 14 additions & 0 deletions book/src/plumbing/derived_flowchart.md
@@ -0,0 +1,14 @@
# Derived queries flowchart

Derived queries are by far the most complex. This flowchart documents the flow of the [maybe changed after] and [fetch] operations. This flowchart can be edited on [draw.io]:

[draw.io]: https://draw.io
[fetch]: ./fetch.md
[maybe changed after]: ./maybe_changed_after.md

<!-- The explicit div is there because, otherwise, the flowchart is unreadable when using "dark mode" -->
<div style="background-color:white;">

![Flowchart](../derived-query-read.drawio.svg)

</div>
16 changes: 4 additions & 12 deletions book/src/plumbing/diagram.md
@@ -1,21 +1,13 @@
# Diagram

Based on the hello world example:

```rust,ignore
{{#include ../../../examples/hello_world/main.rs:trait}}
```

```rust,ignore
{{#include ../../../examples/hello_world/main.rs:database}}
```
This diagram shows the items that get generated from the Hello World query group and database struct. You can click on each item to be taken to the explanation of its purpose. The diagram is wide so be sure to scroll over!

```mermaid
graph LR
classDef diagramNode text-align:left;
subgraph query group
HelloWorldTrait["trait HelloWorld: Database + HasQueryGroup(HelloWorldStorage)"]
HelloWorldImpl["impl(DB) HelloWorld for DB<br>where DB: HasQueryGroup(HelloWorldStorage)"]
HelloWorldImpl["impl&lt;DB&gt; HelloWorld for DB<br>where DB: HasQueryGroup(HelloWorldStorage)"]
click HelloWorldImpl "http:query_groups.html#impl-of-the-hello-world-trait" "more info"
HelloWorldStorage["struct HelloWorldStorage"]
click HelloWorldStorage "http:query_groups.html#the-group-struct-and-querygroup-trait" "more info"
@@ -53,6 +45,6 @@ graph LR
class DerivedStorage diagramNode;
end
LengthQueryImpl --> DerivedStorage;
DatabaseStruct --> HelloWorldImpl
HasQueryGroup --> HelloWorldImpl
DatabaseStruct -- "used by" --> HelloWorldImpl
HasQueryGroup -- "used by" --> HelloWorldImpl
```
42 changes: 42 additions & 0 deletions book/src/plumbing/fetch.md
@@ -0,0 +1,42 @@
# Fetch

```rust,no_run,noplayground
{{#include ../../../src/plumbing.rs:fetch}}
```

The `fetch` operation computes the value of a query. It prefers to reuse memoized values when it can.

## Input queries

Input queries simply load the result from the table.

## Interned queries

Interned queries map the input into a hashmap to find an existing integer. If none is present, a new value is created.
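A minimal sketch of the interning idea, assuming string inputs and `u32` ids; `Interner`, `intern`, and `lookup` are hypothetical names, and salsa's interned storage is more involved than this.

```rust
use std::collections::HashMap;

/// Equal values map to the same integer id; the id can be mapped back.
#[derive(Default)]
struct Interner {
    ids: HashMap<String, u32>,
    values: Vec<String>,
}

impl Interner {
    /// Returns the existing id for `value`, or allocates the next one.
    fn intern(&mut self, value: &str) -> u32 {
        if let Some(&id) = self.ids.get(value) {
            return id;
        }
        let id = self.values.len() as u32;
        self.values.push(value.to_string());
        self.ids.insert(value.to_string(), id);
        id
    }

    /// Maps an id back to the value it was created for.
    fn lookup(&self, id: u32) -> &str {
        &self.values[id as usize]
    }
}

fn main() {
    let mut interner = Interner::default();
    let a = interner.intern("foo");
    let b = interner.intern("bar");
    let again = interner.intern("foo");
    println!("{} {} {} {}", a, b, again, interner.lookup(b));
}
```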

## Derived queries

The logic for derived queries is more complex. We summarize the high-level ideas here, but you may find the [flowchart](./derived_flowchart.md) useful to dig deeper. The [terminology](./terminology.md) section may also be useful; in some cases, we link to that section on the first usage of a word.

* If an existing [memo] is found, then we check if the memo was [verified] in the current [revision]. If so, we can directly return the memoized value.
* Otherwise, if the memo contains a memoized value, we must check whether [dependencies] have been modified:
    * Let R be the revision in which the memo was last verified; we wish to know if any of the dependencies have changed since revision R.
    * First, we check the [durability]. For each memo, we track the minimum durability of the memo's dependencies. If the memo has durability D, and there have been no changes to an input with durability D since the last time the memo was verified, then we can consider the memo verified without any further work.
    * If the durability check is not sufficient, then we must check the dependencies individually. For this, we iterate over each dependency D and invoke the [maybe changed after](./maybe_changed_after.md) operation to check whether D has changed since the revision R.
    * If no dependency was modified:
        * We can mark the memo as verified and return its memoized value.
* Assuming dependencies have been modified or the memo does not contain a memoized value:
    * Then we execute the user's query function.
    * Next, we compute the revision in which the memoized value last changed:
        * *Backdate:* If there was a previous memoized value, and the new value is equal to that old value, then we can *backdate* the memo, which means to use the 'changed at' revision from before.
            * Thanks to backdating, it is possible for a dependency of the query to have changed in some revision R1 but for the *output* of the query to have changed in some revision R2 where R2 predates R1.
        * Otherwise, we use the current revision.
    * Construct a memo for the new value and return it.
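The steps above can be condensed into a small sketch. It makes several simplifying assumptions: revisions are plain counters, values are `i64`, and a single `deps_changed_since` callback stands in for both the durability check and the per-dependency "maybe changed after" calls. `Memo` and this `fetch` are hypothetical and much simpler than the real plumbing.

```rust
/// Revisions are modeled as plain counters (an assumption of this sketch).
type Revision = u64;

/// Hypothetical memo for a derived query.
#[derive(Clone, Debug, PartialEq)]
struct Memo {
    value: Option<i64>,
    verified_at: Revision,
    changed_at: Revision,
}

/// Reuses the memo when it is still valid; otherwise re-executes the query
/// function and backdates `changed_at` if the value did not change.
fn fetch(
    memo: Option<Memo>,
    current: Revision,
    deps_changed_since: impl Fn(Revision) -> bool,
    execute: impl Fn() -> i64,
) -> Memo {
    if let Some(old) = &memo {
        let up_to_date =
            old.verified_at == current || !deps_changed_since(old.verified_at);
        if old.value.is_some() && up_to_date {
            // Mark as verified in the current revision and reuse the value.
            return Memo { verified_at: current, ..old.clone() };
        }
    }
    let new_value = execute();
    // Backdate: an equal value keeps the old `changed_at` revision.
    let changed_at = match &memo {
        Some(Memo { value: Some(old_value), changed_at, .. })
            if *old_value == new_value => *changed_at,
        _ => current,
    };
    Memo { value: Some(new_value), verified_at: current, changed_at }
}

fn main() {
    let old = Memo { value: Some(10), verified_at: 1, changed_at: 1 };
    // A dependency changed, but the recomputed value is the same: backdated.
    println!("{:?}", fetch(Some(old), 3, |_| true, || 10));
}
```

In this model, a backdated memo reports `changed_at` from an earlier revision even though one of its dependencies changed later, which is the R2-predates-R1 situation noted in the list.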

[durability]: ./terminology/durability.md
[backdate]: ./terminology/backdate.md
[dependency]: ./terminology/dependency.md
[dependencies]: ./terminology/dependency.md
[memo]: ./terminology/memo.md
[revision]: ./terminology/revision.md
[verified]: ./terminology/verified.md
28 changes: 28 additions & 0 deletions book/src/plumbing/generated_code.md
@@ -0,0 +1,28 @@
# Generated code

This page walks through the ["Hello, World!"] example and explains the code that
it generates. Please take it with a grain of salt: while we make an effort to
keep this documentation up to date, this sort of thing can fall out of date
easily. See the page history below for major updates.

["Hello, World!"]: https://github.com/salsa-rs/salsa/blob/master/examples/hello_world/main.rs

If you'd like to see for yourself, you can set the environment variable
`SALSA_DUMP` to 1 while the procedural macro runs, and it will dump the full
output to stdout. I recommend piping the output through rustfmt.

## Sources

The main parts of the source that we are focused on are as follows.

### Query group

```rust,ignore
{{#include ../../../examples/hello_world/main.rs:trait}}
```

### Database

```rust,ignore
{{#include ../../../examples/hello_world/main.rs:database}}
```