Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Termination algorithm for impl selection #2687

Merged
merged 11 commits into from
Apr 10, 2023
337 changes: 337 additions & 0 deletions proposals/p2687.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,337 @@
# Termination algorithm for impl selection

<!--
Part of the Carbon Language project, under the Apache License v2.0 with LLVM
Exceptions. See /LICENSE for license information.
SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-->

[Pull request](https://github.com/carbon-language/carbon-lang/pull/2687)

<!-- toc -->

## Table of contents

- [Abstract](#abstract)
- [Problem](#problem)
- [Background](#background)
- [Proposal](#proposal)
- [Details](#details)
- [Non-type arguments](#non-type-arguments)
- [Proof of termination](#proof-of-termination)
- [Rationale](#rationale)
- [Alternatives considered](#alternatives-considered)
- [Measure complexity using type tree depth](#measure-complexity-using-type-tree-depth)
- [Consider each type parameter in an `impl` declaration separately](#consider-each-type-parameter-in-an-impl-declaration-separately)
- [Consider types in the interface being implemented as distinct](#consider-types-in-the-interface-being-implemented-as-distinct)
- [Require some count to decrease](#require-some-count-to-decrease)
- [Require non-type values to stay the same](#require-non-type-values-to-stay-the-same)

<!-- tocstop -->

## Abstract

This proposal replaces the termination algorithm for `impl` selection. The
previous algorithm relied on a recursion limit, which is counter to
[our goal for predictability](/docs/design/generics/goals.md#predictability).

The replacement is to terminate if any `impl` lookup performed while considering
an `impl` declaration depends transitively on the same `impl` declaration with a
"strict superset" of the types in the query.

## Problem

Consider this `impl` declaration:

```
interface I;
impl forall [T:! type where Optional(.Self) is I] T as I;
```

A type like `i32` is a valid value of `T`, and so implements `I`, if
`Optional(i32)` implements `I`. This `impl` declaration could also possibly be
used to give an implementation of `Optional(i32)` for interface `I`, but only if
there was an implementation of `Optional(Optional(i32))` for interface `I`. The
job of the termination rule is to report an error instead of being caught in an
infinite loop in this situation.

Ideally, a termination rule would identify the loop in a minimal way. This has a
few benefits, including reducing compile times and making error messages as
short and understandable as possible. One downside of the original recursion
limit rule is its tendency to only detect a problem after the loop had been
repeated many times. This problem is worse if the recursion limit is large.

Another concern with using a recursion limit is that refactorings that would
otherwise be legal can increase the depth of recursion, causing spurious
failures. The workaround for this and other spurious failures is to increase the
recursion limit. This makes the other problems with using a recursion limit
worse.

Note that determining whether a particular set of `impl` declarations terminates
is
[equivalent to the halting problem](https://sdleffler.github.io/RustTypeSystemTuringComplete/)
(content warning: contains many instances of an obscene word as part of a
programming language name), and so is undecidable in general. So any termination
rule will have some false positives or spurious failures, where it reports an
error even though it would in fact complete if allowed to continue running. We
would like a criteria that correctly classifies the examples that arise in
practice.

## Background

The first termination rule was introduced in proposal
[#920: Generic parameterized impls (details 5)](https://github.com/carbon-language/carbon-lang/pull/920),
following Rust and C++. The problems with using a recursion limit were
[recognized at the time that proposal was written](https://github.com/carbon-language/carbon-lang/blob/f282bca20e41e2f8dc05881d9d6b38213d6c6c87/docs/design/generics/details.md#termination-rule),
but no alternative was known.

Alternatives termination rules have since been discussed:

- in open discussion on
[2022-04-13](https://docs.google.com/document/d/1tEt4iM6vfcY0O0DG0uOEMIbaXcZXlNREc2ChNiEtn_w/edit#heading=h.cja3fkwzv9tr),
prompted by a question on
[#1088: Generic details 10: interface-implemented requirements](https://github.com/carbon-language/carbon-lang/pull/1088);
and
- in issue
[#2458: Infinite recursion during impl selection](https://github.com/carbon-language/carbon-lang/issues/2458),
which includes summaries of discussions including those on
[2023-02-07](https://docs.google.com/document/d/1gnJBTfY81fZYvI_QXjwKk1uQHYBNHGqRLI2BS_cYYNQ/edit?resourcekey=0-ql1Q1WvTcDvhycf8LbA9DQ#heading=h.9u2u6078figt).

PR
[#2602: Implement the termination algorithm for impl selection described in #2458](https://github.com/carbon-language/carbon-lang/pull/2602)
implements the termination rule of this proposal in Explorer.

## Proposal

We replace the termination criteria with a rule that the types in the `impl`
query must never get strictly more complicated when considering the same `impl`
declaration again. The way we measure the complexity of a set of types is by
counting how many of each base type appears. A base type is the name of a type
without its parameters. For example, the base types in this query
`Pair(Optional(i32), bool) impls AddWith(Optional(i32))` are:

- `Pair`
- `Optional` twice
- `i32` twice
- `bool`
- `AddWith`

A query is strictly more complicated if at least one count increases, and no
count decreases. So `Optional(Optional(i32))` is strictly more complicated than
`Optional(i32)` but not strictly more complicated than `Optional(bool)`.

This rule, when combined with
[the acyclic rule](/docs/design/generics/details.md#acyclic-rule) that a query
can't repeat exactly, [guarantees termination](#proof-of-termination). This rule
is expected to identify a problematic sequence of `impl` declaration
instantiations in a way that is easier for the user to understand. Consider the
example from before,

```
interface I;
impl forall [T:! type where Optional(.Self) is I] T as I;
```

This `impl` declaration matches the query `i32 impls I` as long as
`Optional(i32) impls I`. That is a strictly more complicated query, though,
zygoloid marked this conversation as resolved.
Show resolved Hide resolved
since it contains all the base types of the starting query (`i32` and `I`), plus
one more (`Optional`). As a result, an error can be given after one step, rather
than after hitting a large recursion limit. And that error can state explicitly
what went wrong: we went from a query with no `Optional` to one with one,
without anything else decreasing.

Note this only triggers a failure when the same `impl` declaration is considered
with the strictly more complicated query. For example, if the declaration is not
considered since there is a more specialized `impl` declaration that is
preferred by the
[type-structure overlap rule](/docs/design/generics/details.md#overlap-rule), as
in:

```
impl forall [T:! type where Optional(.Self) is I] T as I;
impl Optional(bool) as I;

// OK, because we never consider the first `impl`
// declaration when looking for `Optional(bool) impls I`.
let U:! I = bool;

// Error: cycle with `i32 impls I` depending on
// `Optional(i32) impls I`, using the same `impl`
// declaration, as before.
let V:! I = i32;
```

The rule is also robust in the face of refactoring:

- It does not depend on the specifics of how an `impl` declaration is
parameterized, only on the query.
- It does not depend on the length of the chain of queries.
- It does not depend on a measure of type-expression complexity, like depth.

## Details

### Non-type arguments

For non-type arguments we have to expand beyond base types to consider other
kinds of keys. These other keys are in a separate namespace from base types.

- Values with an integral type use the name of the type as the key and the
absolute value as a count. This means integer arguments are considered more
complicated if they increase in absolute value. For example, if the values
`2` and `-3` are used as arguments to parameters with type `i32`, then the
`i32` key will have count `5`.
- Every option of a choice type is its own key, counting how many times a
value using that option occurs. Any parameters to the option are recorded as
separate keys. For example, the `Optional(i32)` value of `.Some(7)` is
recorded as keys `.Some` (with a count of `1`) and `i32` (with a count of
`7`).
- Yet another namespace of keys is used to track counts of variadic arguments,
under the base type. This is to defend against having a variadic type `V`
that takes any number of `i32` arguments, with an infinite set of distinct
instantiations: `V(0)`, `V(0, 0)`, `V(0, 0, 0)`, ...
- A `tuple` key in this namespace is used to track the total number of
components of tuple values. The values of those elements will be tracked
using their own keys.

Non-type argument values not covered by these cases are deleted from the query
entirely for purposes of the termination algorithm. This requires that two
queries that only differ by non-type arguments are considered identical and
therefore are rejected by the acyclic rule. Otherwise, we could construct an
infinite family of non-type argument values that could be used to avoid
termination.

### Proof of termination

Let's call a (finite or infinite) sequence of type expressions "good" if no
later element is strictly more complex than an earlier element, and no type
expression is repeated. We would like to prove that any good sequence of type
expressions with a finite set of keys is finite.

We can restrict to good sequences that don't repeat any multiset of keys, since
there are only a finite number of types with a given multiset of keys. Proof: If
none of the types have a variadic parameter list, then there is at most one type
for every distinct permutation of base types. If some types are variadic, then
we can get a conservative finite upper bound by multiplying the number of
distinct permutations by the number of different possible arity combinations.
The number of arity combinations is finite since, ignoring non-type arguments,
the total arity must equal the number of base types in the type minus 1.

The proof of termination is by induction on the number `N` of distinct keys.

- If `N == 1`, then types map to a multiset of a single key, which can be
represented by the count of the number of times that key appears. That
number must be non-negative and decreasing in the sequence, and so the
length of the sequence is bounded by the value of the first element. So good
sequences with `N == 1` must be finite.
- Assuming that good sequences with `N` distinct keys must be finite, consider
a good sequence with `N+1` distinct keys. Its first element will be
represented by a non-negative integer `(N+1)`-tuple, `(i_0, i_1, ..., i_N)`.
Every element after that will be in at least one of the
`i_0 + i_1 + ... + i_N` hyperplanes (co-dimension 1) given by these
equations:
- `x_0 = 0`, `x_0 = 1`, ..., `x_0 = i_0 - 1` (`i_0` different equations,
each defining a separate hyperplane)
- `x_1 = 0`, `x_1 = 1`, ..., `x_1 = i_1 - 1`
- ...
- `x_N = 0`, `x_N = 1`, ..., `x_N = i_N - 1`
- Any point not in one of those hyperplanes has components all >= the first
element, and so can't be in the sequence if it is good.
- The restriction of the sequence to the subsequence in each of those
hyperplanes is finite, by the induction hypothesis.
- The sequence visits points in this finite union of finite sets without
repetition, and so must be finite.
- Conclusion: Any good sequence with `N+1` distinct keys is finite, completing
the induction.

This bound given by this construction is not completely tight, since there is
overlap between the hyperplanes. It is tight once that overlap is taken into
account, though. We can construct sequences that reach the upper bound by
visiting the points in the union of the hyperplanes in descending order of their
L1-norm (sum of the components).

Note: The text of this argument was derived from comments on
[issue #2458: Infinite recursion during impl selection](https://github.com/carbon-language/carbon-lang/issues/2458).

## Rationale

This proposal advances these [Carbon goals](/docs/project/goals.md):

- [Language tools and ecosystem](/docs/project/goals.md#language-tools-and-ecosystem)
by improving the quality of diagnostics.
- [Software and language evolution](/docs/project/goals.md#software-and-language-evolution)
since we've chosen alternatives that avoid introducing failures as the
result of refactorings, particularly those outside the files changed in the
refactoring.
- [Code that is easy to read, understand, and write](/docs/project/goals.md#code-that-is-easy-to-read-understand-and-write)
by having a relatively simple language rule, that is predictable when
authoring code, and allows independent modules to compose without triggering
errors.

## Alternatives considered

### Measure complexity using type tree depth

We
[considered](https://github.com/carbon-language/carbon-lang/issues/2458#issuecomment-1371412985)
a rule which would ensure termination by forbidding the depth of the type tree
in the query from increasing. This depth could either be measured in the query
or in the values of types used to parameterize `impl` declaration. Either way,
this raises a concern that otherwise safe refactorings might trigger spurious
termination errors. Specifically, refactorings that replace a type, like
`String`, with an alias to a parameterized type, like `BasicString(Char8)`,
could change the tree depths in `impl` declarations in files that were not part
of the refactoring.

### Consider each type parameter in an `impl` declaration separately

Instead of measuring the complexity of the `impl` query as a whole, we
considered measuring the complexity of the argument values of parameters in an
individual `impl` declaration. The advantage of this would be fewer spurious
failures due to the termination rule.

We decided against it because it is a more complex rule and it is sensitive to
the specifics of how `impl` declarations are parameterized. This raises concerns
about refactorings introducing termination rule failures. We did not want to
incorporate this change without evidence that those spurious failures would be a
problem in practice.

### Consider types in the interface being implemented as distinct

Instead of measuring the complexity of the entire `impl` query together, we
could consider keys in the type and interface parts of the query to be in
distinct namespaces. This would reduce the spurious failuers due to the
termination rule, but not as much as the previous alternative. It avoids the
problem of the previous alternative, since it is not sensitive to the specifics
of how `impl` declarations are parameterized.

We are not choosing this alternative now since it is a more complicated rule to
explain. But we would consider this alternative in the future if we find it
beneficial in practice to support the additional cases this rule permits.

### Require some count to decrease

We considered a rule that would forbid repeating the multiset of types. This
would simplify the termination argument. However, we thought it important to
support `impl` declarations that effectively shuffled the terms around into some
canonical form, as in this example:

```
impl forall [T:! I(Optional(.Self))] Optional(T) as I(T);
```

Here, `Optional(bool) impls I(bool)` if `bool impls I(Optional(bool))`. This
rule can only be applied a finite number of times, and is something we imagine
might arise naturally, so it seemed good to support.

### Require non-type values to stay the same

We considered different handling for
[non-type argument values](#non-type-arguments) that did not have an integral or
choice type. The alternative rule required the value to stay constant. This led
to
[a number of edge cases](https://github.com/carbon-language/carbon-lang/pull/2687#discussion_r1151028867)
to consider, like how to identify the same argument when the type constructor
may be called a different number of times. The rule we chose to use instead has
the advantages of being simpler and also accepting more cases. With the current
rule, the value of those arguments may change freely, they just don't create
different type expressions for purposes of detecting termination.