From a87300012d8c24c1a4cc376509b910abdcaabce3 Mon Sep 17 00:00:00 2001 From: Geoffrey Romer Date: Tue, 8 Sep 2020 16:48:40 -0700 Subject: [PATCH 01/28] Initial draft of "Design direction for sum types" --- proposals/new.md | 471 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 471 insertions(+) create mode 100644 proposals/new.md diff --git a/proposals/new.md b/proposals/new.md new file mode 100644 index 0000000000000..17cf7efedf99e --- /dev/null +++ b/proposals/new.md @@ -0,0 +1,471 @@ +# Design direction for sum types + + + +[Pull request](https://github.com/carbon-language/carbon-lang/pull/####) + +## Table of contents + + + +- [Problem](#problem) +- [Background](#background) +- [Proposal](#proposal) +- [Pattern matching](#pattern-matching) +- [Pattern functions](#pattern-functions) + - [Alternatives considered](#alternatives-considered) +- [`StorageArray`](#storagearray) +- [`choice`](#choice) + - [Alternatives considered](#alternatives-considered-1) + - [Separate support for enumerated types](#separate-support-for-enumerated-types) + - [Different spelling for `choice`](#different-spelling-for-choice) + - [Different syntax for `closed`](#different-syntax-for-closed) + + + +## Problem + +Many important programming use cases involve values that are most naturally +represented as having one of several alternative forms (called _alternatives_ +for short). For example, + +- Optional values, which are pervasive in computing, take the form of either + values of some underlying type, or a special "not present" value. +- Functions that cannot throw exceptions often use a return type that can + represent either a successfully computed result, or some description of how + the computation failed. +- Nodes of a parse tree often take different forms depending on the grammar + production that generated them. 
For example, a node of a parse tree for + simple arithmetic expressions might represent either a sum or product + expression with two child nodes representing the operands, or a + parenthesized expression with a single child node representing the contents + of the parentheses. +- Boolean values take the form of either a "true" value or a "false" value. + +What unites these use cases is that the set of alternatives is fixed by the API, +it is possible for user code to determine which alternative is present, and +there is little or nothing you can usefully do with such a value without first +making that determination. + +Carbon needs to support defining and working with types representing such +values. Following Carbon's principles, these types need to be easy to define, +understand, and use, and they need to be safe -- in ordinary usage, the type +system should ensure that user code cannot accidentally access the wrong +alternative. Furthermore, these types need to be efficient enough that type +designers are not tempted to switch to an API that is less safe or less +user-friendly. + +## Background + +The terminology in this space is quite fragmented and inconsistent. This +proposal will use the term _sum types_ to refer to types of the kind described +in the problem statement. Note that "sum type" is not being proposed as a +specific Carbon feature, or even as a precisely defined term of art; it is +merely an informal way for this proposal to refer to its motivating use cases, +in much the same way that a structs proposal might refer to "value types". + +Carbon as currently envisioned is already capable of approximating support for +sum types. In particular, [pattern matching](/docs/design/pattern_matching.html) +gives us a natural way to express querying which alternative is active, and then +performing computations on that active alternative, which as discussed above is +the primary way of interacting with a sum type. 
For example, a value-or-error +type `Result(T, Error)` could be implemented and used like so: + +``` +struct Result(Type:$$ T, Type:$$ Error) { + // 0 if this represents a value, 1 if this represents an error + var Int: discriminator; + var T: value; + var Error: error; + + fn Success(T: value) -> Result(T, Error) { + return (.discriminator = 0, .value = value, .error = Error()); + } + + fn Failure(Error: error) -> Result(T, Error) { + return (.discriminator = 1, .value = T(), .error = error); + } +} + +fn DoTheThing() -> Result(Int, String); + +var Result(Int, String): r = DoTheThing(); +match (r) { + case (.discriminator = 0, .value = Int: value, .error = _) => { + // Do something with `value` + } + case (.discriminator = 1, .value = _, .error = String: error) => { + // Do something with `error` + } + default => { Assert(False); } +} +``` + +However, this code suffers from a number of serious deficiencies: + +- The implementation details of `Result` are not encapsulated. This makes the + `Result` API unsafe: nothing prevents client code from accessing `.value` + even when `.discriminator` is not 0. This also makes the patterns extremely + verbose. +- `.value` and `.error` must both be live throughout the `Result`'s lifetime, + even when they are not meaningful. Consequently, `Success` must populate + `.error` with a default-constructed dummy value (and so it won't work if + `Error` is not default-constructible), and `Failure` must do the same for + `.value`. Furthermore, `Result` is bloated by the fact that the two must + have separately-allocated storage, even though only one at a time actually + stores any data. +- `.discriminator` should never have any value other than 0 or 1, but the + compiler can't enforce that property when `Result`s are created, or exploit + it when `Result`s are used.
So, for example, the `match` must have a + `default` case in order for the compiler and other tools to consider it + exhaustive, even though that default case should never be entered. If Carbon + supports integer types with arbitrary bit-widths, we could use `Int1` in + this specific case, but that won't work when the number of alternatives + isn't a power of 2. Furthermore, the type of `.discriminator` is somewhat + misleading: its values are not numbers, but only symbolic tags. It will + never make sense to perform arithmetic on it, for example. +- The definition of `Result` is largely boilerplate. Conceptually, the only + information needed to specify this type is the names and parameter types of + the two factory functions, plus the fact that every possible value of + `Result` is uniquely described by the name and parameter values of a call to + one of those two functions. Given that information, the compiler could + easily generate the rest of the struct definition. This generated + implementation may not always be as efficient as a hand-coded one could be, + but in a lot of cases that may not matter. + +## Proposal + +To summarize, the previous section identified four deficiencies in Carbon, which +together prevent it from adequately supporting sum types: + +- There is no way for pattern matching to operate through an encapsulation + boundary. +- There is no way to manually control the lifetimes of subobjects, or enable + them to share storage. +- There is no way to define an enumerated type. +- There is no way to define a sum type without micromanaging the details of + its representation. + +I propose supporting sum types by introducing three separate language features +to supply the missing functionality. These features are largely independent and +orthogonal, so their detailed design will be addressed in separate proposals. 
+This proposal merely establishes the overall design direction for sum types, in +the same way that [p0083](p0083.md) established the overall design direction for +the language as a whole. + +To address the lack of encapsulation support in pattern matching, I propose +introducing the concept of a _pattern function_, which is a function that can be +invoked as part of a pattern, even with arguments that contain placeholders. +Pattern functions can only contain the sort of code that could appear directly +in a pattern, but they let us define reusable pattern syntaxes that can do +things like encapsulate hidden implementation details of the object they're +matching. + +To address the lack of manual lifetime control and storage sharing, I propose +introducing a new fundamental type `Storage`, which is the type of a byte of raw +memory, and `create` and `destroy` operations which create and destroy objects +within a span of `Storage`. FIXME: `StorageArray`, which adds init support + +To address the last two deficiencies, I propose introducing `choice` as a +convenient syntax for defining a sum type by specifying the names and parameter +types of any factory functions, and the names of any static constant instances +of the type. These are understood to be exhaustive and mutually exclusive, and +so choice types that have no factory functions behave as enumerated types. A +`choice` type can be marked `closed`, which indicates that client code can +assume no new alternatives will be added in the future. 
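For comparison, the representation techniques behind these features already exist, in a less safe form, in C++. The following C++17 sketch is not Carbon and not part of this proposal; the class layout and member names are illustrative assumptions. It hand-writes a `Result<T, Error>` with an `enum class` discriminator and one raw, suitably aligned byte buffer shared by both alternatives. Placement `new` stands in for `create`, an explicit destructor call stands in for `destroy`, and the private constructors plus checked accessors play the encapsulation role that pattern functions would play in Carbon.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>
#include <utility>

// Illustrative C++17 analogue of Result(T, Error): a closed enum class
// discriminator plus one raw byte buffer shared by both alternatives.
template <typename T, typename Error>
class Result {
 public:
  static Result Success(T value) {
    return Result(ValueTag{}, std::move(value));
  }
  static Result Failure(Error error) {
    return Result(ErrorTag{}, std::move(error));
  }

  // Moving re-creates the active alternative in the new object's buffer.
  Result(Result&& other) : discriminator_(other.discriminator_) {
    if (discriminator_ == Discriminator::kIsValue) {
      ::new (static_cast<void*>(storage_)) T(std::move(*other.value_ptr()));
    } else {
      ::new (static_cast<void*>(storage_)) Error(std::move(*other.error_ptr()));
    }
  }
  Result& operator=(Result&&) = delete;  // Omitted to keep the sketch short.

  // "destroy": only the active alternative is alive, so only it is destroyed.
  ~Result() {
    if (discriminator_ == Discriminator::kIsValue) {
      value_ptr()->~T();
    } else {
      error_ptr()->~Error();
    }
  }

  bool is_success() const { return discriminator_ == Discriminator::kIsValue; }

  // Checked accessors: reading the inactive alternative is a logic error.
  T& value() {
    assert(is_success());
    return *value_ptr();
  }
  Error& error() {
    assert(!is_success());
    return *error_ptr();
  }

 private:
  enum class Discriminator { kIsValue, kIsError };
  struct ValueTag {};
  struct ErrorTag {};

  // "create": construct the chosen alternative directly in the raw buffer.
  Result(ValueTag, T value) : discriminator_(Discriminator::kIsValue) {
    ::new (static_cast<void*>(storage_)) T(std::move(value));
  }
  Result(ErrorTag, Error error) : discriminator_(Discriminator::kIsError) {
    ::new (static_cast<void*>(storage_)) Error(std::move(error));
  }

  T* value_ptr() { return std::launder(reinterpret_cast<T*>(storage_)); }
  Error* error_ptr() {
    return std::launder(reinterpret_cast<Error*>(storage_));
  }

  static constexpr std::size_t kSize =
      sizeof(T) > sizeof(Error) ? sizeof(T) : sizeof(Error);

  Discriminator discriminator_;
  alignas(T) alignas(Error) unsigned char storage_[kSize];
};
```

Nearly all of this class is the kind of boilerplate that the proposed `choice` would have the compiler generate, and the type safety of every client still depends on the author getting each discriminator check right.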
+ +Using these features, the `Result(T, Error)` example can be rewritten as +follows: + +``` +struct Result(Type:$$ T, Type:$$ Error) { + closed choice Discriminator { + var _:$$ IsValue; + var _:$$ IsError; + } + var Discriminator: discriminator; + var Array(Storage, Max(Sizeof(T), Sizeof(Error))): storage; + + pattern Success(T: value) -> Result(T, Error) { + return (.discriminator = Discriminator.IsValue, .storage = value); + } + + pattern Failure(Error: error) -> Result(T, Error) { + return (.discriminator = Discriminator.IsError, .storage = error); + } +} + +fn DoTheThing() -> Result(Int, String); + +var Result(Int, String): r = DoTheThing(); +match (r) { + case .Success(Int: value) => { + // Do something with `value` + } + case .Failure(String: error) => { + // Do something with `error` + } +} +``` + +Or, more concisely, + +``` +closed choice Result(Type:$$ T, Type:$$ Error) { + pattern Success(T: value) -> _; + pattern Failure(Error: error) -> _; +} +``` + +Similarly, `Optional(T)` could be defined as + +``` +closed choice Optional(Type:$$ T) { + pattern Some(T: value) -> _; + var _:$$ None; +} +``` + +## Pattern matching + +This proposal presupposes that pattern matching will be built around the +following basic intuition, which I will call the "substitution principle": + +> A pattern is an expression in which zero or more terms have been replaced with +> variable binding declarations (`_` can be considered an anonymous binding +> declaration). If the pattern matches a given value, evaluating that expression +> by substituting the bound values back into the pattern will yield that value. + +In particular, this implies that any operation that can be used in the body of a +pattern can also be used in an expression. This is not strictly a requirement of +this proposal, but to the extent that Carbon deviates from this principle, this +proposal may fit less well.
In particular, if we want to support user-defined +operations that can appear only in patterns, we would need a customization +mechanism that's separate from pattern functions as described in this proposal, +and we may not want to have two separate customization mechanisms. + +This proposal requires us to impose some sort of evaluation-order constraint on +pattern matching. For example, when matching `r` with the pattern +`.Success(T: value)` in the example above, we must be able to guarantee that we +do not recursively attempt to match the contents of `.storage` with `value` +until after we have successfully matched `.discriminator` with +`Discriminator.IsValue`. Otherwise, we risk trying to access a `T` object that +isn't actually present in `.storage`. + +FIXME: How do we constrain evaluation to accomplish this? We've discussed +requiring that any subpatterns without placeholders be evaluated before any +subpatterns with placeholders, but that seems ad-hoc; it wouldn't protect a +pattern like `(.discriminator = Discriminator.IsValue, .storage = 1)`. + +It's also worth noting some pre-existing requirements for pattern matching, +which will inform the design of pattern functions. These requirements are highly +desirable in any pattern matching system, but they are hard requirements if we +want to treat C++-like overload resolution as a form of pattern matching. + +Carbon pattern matching must not only be able to determine whether a pattern +matches a value, and bind values to its variables, but also be able to select +the _best_ pattern when more than one matches. Patterns are ranked by +specificity, so for example `(Int: x, 1)` is a better match than +`(Int: x, Int: y)` in the cases where they both match. 
This is not a total +order, or even a weak order (intuitively, a total order with "ties"); for +example, there is no ordering between `(Int: x, 1, 1)` and either +`(1, Int: y, Int: z)` or `(1, 1, Int: z)`, but we can't just say that all three +are "tied", because `(1, 1, Int: z)` is more specialized than +`(1, Int: y, Int: z)`. + +Carbon must be able to determine whether a given set of patterns is +_exhaustive_, meaning that for any possible value at least one of them will +match. It must also be able to determine whether a set of patterns is +_ambiguous_, meaning that the set contains two or more patterns that are not +ordered with respect to each other, and that can match the same value. + +## Pattern functions + +Pattern functions have the same declaration syntax as ordinary functions, except +that they are introduced with `pattern` rather than `fn`. However, the function +body is required to be a single `return` statement whose operand expression is a +valid pattern, with the function parameters acting as variable bindings. + +It seems likely that pattern functions will be required to be inline, because +they will probably need to generate very different code depending on the +structure of their arguments, particularly if the arguments contain +placeholders. + +FIXME: anything else to say about pattern functions? + +### Alternatives considered + +The substitution principle implies that pattern matching can be thought of as +the inverse of expression evaluation: given the purported value of an +expression, we are trying to work backwards to determine the values for +variables in the expression. This proposal allows users to define custom +patterns in terms of expressions, by restricting their definitions to a narrow +subset of the language that we know how to invert automatically. As an +alternative, we could allow developers to define custom patterns by directly +specifying the code that should be executed during pattern matching.
This +obliges them to determine the correct inverse computation themselves, but +permits them to express it in terms of the full Carbon language. + +However, this approach creates the risk that user-defined patterns will not +satisfy the substitution principle, either due to bugs or because developers +want to use pattern matching syntax to express logic that has no counterpart in +expression evaluation. Furthermore, it requires the developer to write code for +both the forward and reverse computations, unless we choose to embrace the +possibility of syntaxes that can only occur in patterns, and so don't satisfy +the substitution principle. + +Furthermore, as discussed above, Carbon pattern matching needs to be able to not +only determine whether a pattern matches, but also rank patterns, and determine +whether a set of patterns is exhaustive and/or ambiguous. Defining a custom +pattern in terms of arbitrary forward and reverse functions doesn't give the +compiler enough information to do those things, so the developer would need to +supply additional information in some way, and it's not at all clear how that +would work. + +## `StorageArray` + +This approach to sum types imposes relatively few requirements on the type used +to implement shared storage. I'm proposing an untyped byte buffer here because +it's more general, but union types along the lines of proposal +[0139](https://github.com/carbon-language/carbon-lang/pull/139) would work just +as well. + +However, the one major requirement we do have is a tricky one: it must be +possible to initialize an instance of the shared storage type from an instance +of any of the values that it can represent, and to do so inside a pattern or +pattern function. This proposal uses the obvious assignment-style syntax for +that, so in the above example we use `.storage = value` to initialize `.storage` +to hold the value `value`. 
However, this implies that there is something like an +implicit conversion to `StorageArray(N)` from _any_ type whose size is at most +`N`, and such broad implicit conversions often cause substantial maintenance and +readability problems, at least in C++. + +As an alternative, we could require the conversion to be explicit, making the +syntax something like +`.storage = StorageArray(Max(Sizeof(T), Sizeof(Error)))(value)`. However, this +requires repeating the size parameter, which is logically redundant and +syntactically somewhat annoying (although an alias could help). We could split +the difference and define a factory function whose return type implicitly +converts to `StorageArray(N)` for any sufficiently large `N`, making the example +look like `.storage = MakeStorageArray(value)`, but that only works if Carbon +provides something equivalent to C++'s "guaranteed RVO", because `T` may not be +movable and `StorageArray(N)` definitely won't be (and that guaranteed RVO needs to +work even though the right and left sides of the initialization have different +types). + +Either of those alternatives would also be fairly misleading from a readability +perspective, because they look like function or constructor calls, but they +can't actually _be_ function or constructor calls. Recall that this syntax needs +to be usable inside patterns, so these functions would need to be pattern +functions, which means they would need to consist of a single expression that +initializes the `StorageArray(N)`, and the whole problem we're trying to solve +here is to make it possible to write such an expression. + +At a deeper level, the problem is that the whole notion of allowing multiple +objects to successively share the same storage is inherently procedural (because +it involves changes in state over time), but pattern matching is fundamentally +descriptive, and hence functional.
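To make the implicit-conversion concern concrete, here is a hypothetical C++17 model of `StorageArray(N)`; the template, its constraints, and the alignment choice are my own assumptions, not a proposed API. Its converting constructor is implicit so that `.storage = value`-style initialization works, and that is precisely what lets unrelated types slip in.

```cpp
#include <cstddef>
#include <new>
#include <type_traits>
#include <utility>

// Hypothetical model of StorageArray(N): N raw bytes that can be initialized
// from any object small enough (and not over-aligned) to fit.
template <std::size_t N>
class StorageArray {
 public:
  // Deliberately implicit, to mirror `.storage = value`.
  template <typename U,
            typename = std::enable_if_t<(
                !std::is_same_v<U, StorageArray> && sizeof(U) <= N &&
                alignof(U) <= alignof(std::max_align_t))>>
  StorageArray(U value) {
    ::new (static_cast<void*>(bytes_)) U(std::move(value));
  }

  // The buffer does not record what it holds, so it can neither copy, move,
  // nor destroy its contents; the enclosing type (such as Result) must
  // destroy whatever it created here.
  StorageArray(const StorageArray&) = delete;
  StorageArray& operator=(const StorageArray&) = delete;

 private:
  alignas(std::max_align_t) unsigned char bytes_[N];
};
```

Because `int`, `double`, and even `const char*` all fit in eight bytes, each converts to `StorageArray<8>` silently, which is the maintenance hazard described above. Marking the constructor `explicit` would remove the hazard, but only at the cost of the more verbose explicit spelling discussed above.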
+ +A design in which user-defined patterns can be expressed in terms of procedural +forward and reverse functions would avoid this whole problem, by allowing us to +express storing the alternative in terms of procedural code (such as some +equivalent of C++'s placement `new`) rather than initialization. + +## `choice` + +A `choice` type definition has the same general form as a `struct` definition, +but a `choice` type can only contain declarations of pattern functions and +compile-time constants (in other words, variables declared with `$$`). The +definitions of those members will be provided by the compiler; they cannot be +defined in user code. The return type of the pattern functions, and the type of +the constants, must be the `choice` type being declared, and so to avoid +redundancy we allow that type to be deduced from the compiler-provided +definitions. The examples in this proposal use `_` as the placeholder, but +something like `auto` would work too, if we don't want to model type deduction +as a form of pattern matching. + +Carbon will probably have some mechanism for allowing a struct to have +compiler-generated default implementations of operations such as copy, move, +hashing, and equality comparison, so long as the struct's members support those +operations. Assuming that mechanism exists, `choice` types will support it as +well, with the parameter types of the pattern functions taking the place of the +member types. However, `choice` types cannot be default constructible. A future +proposal for this mechanism will need to consider whether to require an explicit +opt-in to generate these operations. 
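As a loose C++17 analogy (an assumption about the intended semantics, not a specification), `std::variant` already derives copy, move, and `==` from its alternative types, and its `==` compares which alternative is active before comparing values. The structs below are hypothetical stand-ins for the parameter lists of the `Success` and `Failure` factory functions and the `Cancelled` constant.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical alternatives standing in for the choice type's factory
// function parameters (Success, Failure) and singleton constant (Cancelled).
struct Success {
  int value;
};
struct Failure {
  std::string error;
};
struct Cancelled {};

// Each alternative supplies its own equality; std::variant builds on these.
bool operator==(const Success& a, const Success& b) {
  return a.value == b.value;
}
bool operator==(const Failure& a, const Failure& b) {
  return a.error == b.error;
}
bool operator==(const Cancelled&, const Cancelled&) { return true; }

// Copy, move, and == for Result are derived from the alternatives above.
// Values holding different alternatives never compare equal.
using Result = std::variant<Success, Failure, Cancelled>;
```

One difference is worth noting: unlike a `choice` type, this `std::variant` is default-constructible, because its first alternative is.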
+ +The compiler-generated definitions of a choice type's members are unspecified, +except that they will satisfy the following two properties: + +- The alternatives are _mutually exclusive_: an enumerator can only compare + equal to copies of itself, and the result of calling a factory function can + only compare equal to the result of calling the same factory function with + the same arguments. +- The alternatives are _exhaustive_: every possible value of the type is equal + to one of the enumerators, or to the result of invoking one of the factory + functions with some set of arguments. + +Although choice types are always exhaustive as far as the language semantics are +concerned, by default user code will be required to treat them as +non-exhaustive. For example, a `match` statement that operates on a value of a +choice type will be required to have a `default` case, even if it also has +patterns that match all of the declared alternatives. This ensures that choice +types can be extended with new alternatives without breaking any existing code. +Declaring a type with `closed choice` rather than `choice` allows client code to +treat the type as exhaustive. + +### Alternatives considered + +#### Separate support for enumerated types + +This proposal supports enumerated types as a special case of choice types. +However, there may be some benefit to providing special-case support for +enumerated types, similar to C++'s `enum`. In particular: + +- We could more easily avoid the `var _:$$` boilerplate in enumerator + declarations. +- We could allow the developer to specify an underlying type, associate a + specific value of the underlying type with each enumerator, and convert + between the enum and the underlying type, which are fairly common practices + in C++ code.
When using choice types, these practices can be emulated by + defining functions that map between the choice type and the underlying type, + but that requires a substantial amount of error-prone boilerplate. + Furthermore, those functions can't reliably be no-ops at the hardware level, + the way they can be with C++ enums. +- It would allow us to treat choice types as purely syntactic sugar for struct + definitions. We can't quite do that at present because the desugared form + would itself contain a `choice` type, as with the long version of `Result` + above. + +I am omitting that from this proposal for simplicity, since it's purely +additive, and not necessary for the goals of this proposal. + +#### Different spelling for `choice` + +The Rust and Swift counterparts of `choice` are spelled `enum`. I have avoided +this because these types are not really "enumerated types" in the sense that all +values are explicitly enumerated in the code, except in the special case where +there are no factory functions. I chose the spelling `choice` because "choice +type" is one of the only available synonyms for "sum type" that doesn't have any +potentially-misleading associations. + +#### Different syntax for `closed` + +The `closed` syntax should be considered little more than a placeholder. It's +somewhat unconventional for a modifier to come before an introducer, as with +`closed choice`. Reversing the order would fix that problem, but `choice closed` +reads quite awkwardly as English. Making `closed` a keyword would prevent +developers from using `closed` as an identifier, which may be too high a cost +for such a niche use case. We could fix that by making it an attribute rather +than a keyword, but it's not clear that Carbon will have attributes, much less +what the syntax would be. 
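Whatever spelling is chosen, the semantic difference that `closed` makes has a loose C++ analogy, sketched below with a hypothetical `ResultKind` enumeration. A `switch` with a case for every enumerator and no `default` plays the role of matching on a `closed choice`: with the `-Wswitch` warning that GCC and Clang provide, adding an enumerator later flags every such `switch`. Keeping a `default` plays the role of matching on an ordinary `choice`: the code keeps compiling when alternatives are added, but gives up that check.

```cpp
#include <cassert>
#include <string>

// Hypothetical enumeration of the three Result alternatives.
enum class ResultKind { kSuccess, kFailure, kCancelled };

// "Closed" style: every enumerator is handled and there is no default, so
// -Wswitch can flag any enumerator added later but not handled here.
std::string DescribeClosed(ResultKind kind) {
  switch (kind) {
    case ResultKind::kSuccess:
      return "success";
    case ResultKind::kFailure:
      return "failure";
    case ResultKind::kCancelled:
      return "cancelled";
  }
  // Only reachable through an out-of-range value cast into ResultKind.
  assert(false && "out-of-range ResultKind");
  return "unreachable";
}

// "Open" style: a default keeps this compiling when enumerators are added,
// at the cost of the exhaustiveness check.
std::string DescribeOpen(ResultKind kind) {
  switch (kind) {
    case ResultKind::kSuccess:
      return "success";
    case ResultKind::kFailure:
      return "failure";
    default:
      return "something else";
  }
}
```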
+ +It may be surprising that there is no corresponding `open choice` syntax, but +`open` would be meaningless syntactic noise unless we made it mandatory, and +making it mandatory would be poor ergonomics, because developers would be forced +to make an up-front decision between the two, rather than relying on a safe +default. Furthermore, reserving `open` as a keyword seems even more problematic +than reserving `closed`. From 483166f0f1a8e06508a8b46e7967535ccd2170d8 Mon Sep 17 00:00:00 2001 From: Geoffrey Romer Date: Tue, 8 Sep 2020 16:52:07 -0700 Subject: [PATCH 02/28] Move proposal to numbered file --- proposals/README.md | 1 + proposals/{new.md => p0157.md} | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) rename proposals/{new.md => p0157.md} (99%) diff --git a/proposals/README.md b/proposals/README.md index 27d65a25dd951..ca97e24e5cdcf 100644 --- a/proposals/README.md +++ b/proposals/README.md @@ -33,5 +33,6 @@ request: - [Decision](p0074_decision.md) - [0083 - In-progress design overview](p0083.md) - [0120 - Add idiomatic code performance and developer-facing docs to goals](p0120.md) +- [0157 - Design direction for sum types](p0157.md) diff --git a/proposals/new.md b/proposals/p0157.md similarity index 99% rename from proposals/new.md rename to proposals/p0157.md index 17cf7efedf99e..411d86ae353fc 100644 --- a/proposals/new.md +++ b/proposals/p0157.md @@ -6,7 +6,7 @@ Exceptions. See /LICENSE for license information. SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception --> -[Pull request](https://github.com/carbon-language/carbon-lang/pull/####) +[Pull request](https://github.com/carbon-language/carbon-lang/pull/157) ## Table of contents From ab67a8e816712f7a213c9745c3267f91939707a8 Mon Sep 17 00:00:00 2001 From: Geoffrey Romer Date: Fri, 11 Sep 2020 16:34:40 -0700 Subject: [PATCH 03/28] Respond to review comments and other feedback. 
--- proposals/p0157.md | 374 ++++++++++++++++++++++++++++++++++----------- 1 file changed, 287 insertions(+), 87 deletions(-) diff --git a/proposals/p0157.md b/proposals/p0157.md index 411d86ae353fc..f93bf271bc95d 100644 --- a/proposals/p0157.md +++ b/proposals/p0157.md @@ -24,6 +24,11 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception - [Separate support for enumerated types](#separate-support-for-enumerated-types) - [Different spelling for `choice`](#different-spelling-for-choice) - [Different syntax for `closed`](#different-syntax-for-closed) + - [Allowing templated pattern functions](#allowing-templated-pattern-functions) + - [Make `choice` an element of a `struct`](#make-choice-an-element-of-a-struct) + - [More concise syntax for alternatives](#more-concise-syntax-for-alternatives) +- [Alternatives considered](#alternatives-considered-2) + - [Indexing by type](#indexing-by-type) @@ -55,9 +60,16 @@ Carbon needs to support defining and working with types representing such values. Following Carbon's principles, these types need to be easy to define, understand, and use, and they need to be safe -- in ordinary usage, the type system should ensure that user code cannot accidentally access the wrong -alternative. Furthermore, these types need to be efficient enough that type -designers are not tempted to switch to an API that is less safe or less -user-friendly. +alternative. + +Furthermore, it needs to be possible for type owners to customize the +representations of these types. For example, sum types usually need a +"discriminator" field to indicate which alternative is present, but since it +typically has very few possible values, it can often be packed into padding, or +even the low-order bits of a pointer. 
This sort of customization inherently +creates a risk of implementation bugs that break type safety, but it must be +possible for correctly-implemented customizations to avoid changing the API, and +hence avoid weakening the static safety guarantees for users. ## Background @@ -73,11 +85,16 @@ sum types. In particular, [pattern matching](/docs/design/pattern_matching.html) gives us a natural way to express querying which alternative is active, and then performing computations on that active alternative, which as discussed above is the primary way of interacting with a sum type. For example, a value-or-error -type `Result(T, Error)` could be implemented and used like so: +type `Result(T, Error)` could be implemented like so: ``` +// Result(T, Error) holds either a successfully-computed value of type T, +// or metadata about a failure during that computation, or a singleton +// "cancelled" state indicating that the computation successfully complied with +// a request to halt before completion. struct Result(Type:$$ T, Type:$$ Error) { - // 0 if this represents a value, 1 if this represents an error + // 0 if this represents a value, 1 if this represents an error, 2 if this + // represents the cancelled state. 
var Int: discriminator; var T: value; var Error: error; @@ -89,19 +106,48 @@ struct Result(Type:$$ T, Type:$$ Error) { fn Failure(Error: error) -> Result(T, Error) { return (.discriminator = 1, .value = T(), .error = error); } + + var Result(T, Error):$$ Cancelled = + (.discriminator = 2, .value = T(), .error = Error()); } +``` -fn DoTheThing() -> Result(Int, String); +A typical usage might look like: -var Result(Int, String): r = DoTheThing(); -match (r) { - case (.discriminator = 0, .value = Int: value, .error = _) => { - // Do something with `value` +``` +fn ParseAsInt(String: s) -> Result(Int, String) { + var Int: result = 0; + var auto: it = s.begin(); + while (it != s.end()) { + if (*it < '0' || *it > '9') { + return Result(Int, String).Failure("String contains non-digit"); + } + result *= 10; + result += *it - '0'; + ++it; } - case (.discriminator = 1, .value = _, .error = String: error) => { - // Do something with `error` + return Result(Int, String).Success(result); +} + +fn GetIntFromUser() -> Int { + while (True) { + var String: s = UserPrompt("Please enter a number"); + match (ParseAsInt(s)) { + case (.discriminator = 0, .value = Int: value, .error = String: _) => { + return value; + } + case (.discriminator = 1, .value = Int: _, .error = String: error) => { + Display(error); + } + case .Cancelled => { + // We didn't request cancellation, so something is very wrong. + Terminate(); + } + default => { + // Can't happen, because the above cases are exhaustive. + Assert(False); + } + } } - default => { Assert(False); } } ``` @@ -114,20 +160,18 @@ However, this code suffers from a number of serious deficiencies: - `.value` and `.error` must both be live throughout the `Result`'s lifetime, even when they are not meaningful. Consequently, `Success` must populate `.error` with a default-constructed dummy value (and so it won't work if - `Error` is not default-constructible), and `Failure` must do the same for - `.value`.
Furthermore, `Result` is bloated by the fact that the two must - have separately-allocated storage, even though only one at a time actually - stores any data. -- `.discriminator` should never have any value other than 0 or 1, but the + `Error` is not default-constructible), `Failure` must do the same for + `.value`, and `Cancelled` must do the same for both. Furthermore, `Result` + is bloated by the fact that the two must have separately-allocated storage, + even though at most one at a time actually stores any data. +- `.discriminator` should never have any value other than 0, 1, or 2, but the compiler can't enforce that property when `Result`s are created, or exploit it when `Result`s are used. So, for example, the `match` must have a `default` case in order for the compiler and other tools to consider it - exhaustive, even though that default case should never be entered. If Carbon - supports integer types with arbitrary bit-widths, we could use `Int1` in - this specific case, but that won't work when the number of alternatives - isn't a power of 2. Furthermore, the type of `.discriminator` is somewhat - misleading: its values are not numbers, but only symbolic tags. It will - never make sense to perform arithmetic on it, for example. + exhaustive, even though that default case should never be entered. + Furthermore, the type of `.discriminator` is somewhat misleading: its values + are not numbers, but only symbolic tags. It will never make sense to perform + arithmetic on it, for example. - The definition of `Result` is largely boilerplate. Conceptually, the only information needed to specify this type is the names and parameter types of the two factory functions, plus the fact that every possible value of @@ -137,6 +181,14 @@ However, this code suffers from a number of serious deficiencies: implementation may not always be as efficient as a hand-coded one could be, but in a lot of cases that may not matter. 
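For contrast with the hand-written struct, C++17's `std::variant` shows what it looks like when a library, rather than the type author, supplies the discriminator, the shared storage, and the lifetime management. This is a hypothetical C++ port of the revised `Result(Int, String)` usage, not Carbon; the `Overloaded` helper is a common C++17 idiom rather than a standard library facility. `std::visit` fails to compile if the visitor lacks a handler for some alternative, so, like a `match` on a `closed choice`, it needs no `default`.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical C++ counterparts of the Success, Failure, and Cancelled
// alternatives of Result(Int, String).
struct Success { int value; };
struct Failure { std::string error; };
struct Cancelled {};
using Result = std::variant<Success, Failure, Cancelled>;

// Common C++17 idiom for building a visitor out of several lambdas.
template <typename... Fs>
struct Overloaded : Fs... {
  using Fs::operator()...;
};
template <typename... Fs>
Overloaded(Fs...) -> Overloaded<Fs...>;

// Mirrors ParseAsInt: shift the accumulated digits left, add the new digit,
// then advance. Overflow is not checked in this sketch.
Result ParseAsInt(const std::string& s) {
  int result = 0;
  for (char c : s) {
    if (c < '0' || c > '9') return Failure{"String contains non-digit"};
    result = result * 10 + (c - '0');
  }
  return Success{result};
}

// Removing any one of these handlers makes the std::visit call ill-formed,
// so no default case is needed.
std::string Describe(const Result& r) {
  return std::visit(
      Overloaded{
          [](const Success& s) { return std::to_string(s.value); },
          [](const Failure& f) { return "error: " + f.error; },
          [](const Cancelled&) { return std::string("cancelled"); },
      },
      r);
}
```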
+This code arguably exhibits another problem as well: the `return` statements in +`ParseAsInt` are quite verbose, due to the need to explicitly qualify the +function calls with `Result(Int, String)`. In fact, developers might prefer to +avoid having a function call at all, especially in the success case, and instead +rely on implicit conversions to write something like `return result;`. This +proposal will not address that problem, because it appears to be orthogonal to +the design of sum types. + ## Proposal To summarize, the previous section identified four deficiencies in Carbon, which @@ -166,17 +218,17 @@ things like encapsulate hidden implementation details of the object they're matching. To address the lack of manual lifetime control and storage sharing, I propose -introducing a new fundamental type `Storage`, which is the type of a byte of raw -memory, and `create` and `destroy` operations which create and destroy objects -within a span of `Storage`. FIXME: `StorageArray`, which adds init support +introducing a new fundamental type `StorageArray`, which represents a contiguous +region of raw memory. To address the last two deficiencies, I propose introducing `choice` as a -convenient syntax for defining a sum type by specifying the names and parameter -types of any factory functions, and the names of any static constant instances -of the type. These are understood to be exhaustive and mutually exclusive, and -so choice types that have no factory functions behave as enumerated types. A -`choice` type can be marked `closed`, which indicates that client code can -assume no new alternatives will be added in the future. +convenient syntax for defining a sum type by specifying the set of possible +alternatives. The alternatives can be represented by factory functions like +`Success` and `Failure`, or singleton constants like `Cancelled`. In either +case, the definitions are omitted, because they are supplied by the compiler. 
+Choice types whose alternatives are all constants behave as enumerated types. A +choice type can be marked `closed`, which indicates that client code can assume +no new alternatives will be added in the future. Using these features, the `Result(T, Error)` example can be rewritten as follows: @@ -184,11 +236,12 @@ follows: ``` struct Result(Type:$$ T, Type:$$ Error) { closed choice Discriminator { - var _:$$ IsValue; - var _:$$ IsError; + var Self:$$ IsValue; + var Self:$$ IsError; + var Self:$$ IsCancelled; } var Discriminator: discriminator; - Array(Storage, Max(Sizeof(T), Sizeof(E))) storage; + var StorageArray(Max(Sizeof(T), Sizeof(E))): storage; pattern Success(T: value) -> Result(T, Error) { return (.discriminator = Discriminator.IsValue, .storage = value); @@ -197,36 +250,54 @@ struct Result(Type:$$ T, Type:$$ Error) { pattern Failure(Error: error) -> Result(T, Error) { return (.discriminator = Discriminator.IsError, .storage = error); } + + var Self:$$ Cancelled = (.discriminator = Discriminator.IsCancelled); } +``` -fn DoTheThing() -> Result(Int, String); +The code for `ParseAsInt` is unchanged, and the rest of the usage example would +look like: -var Result(Int, String): r = DoTheThing(); -match (r) { - case .Success(T: value) => { - // Do something with `value` - } - case .Error(T: error) => { - // Do something with `error` +``` +fn GetIntFromUser() -> Int { + while(True) { + var String: s = UserPrompt("Please enter a number"); + match (ParseAsInt(s)) { + case .Success(var Int: value) => { + return value; + } + case .Failure(var String: error) => { + Display(error); + } + case .Cancelled => { + // We didn't request cancellation, so something is very wrong. 
+ Terminate(); + } + } } } ``` -Or, more concisely, +Since our implementation of `Result` doesn't benefit from having direct control +of the object representation, we could instead define it more concisely like so: ``` closed choice Result(Type:$$ T, Type:$$ Error) { - pattern Success(T: value) -> _; - pattern Failure(Error: error) -> _; + pattern Success(T: value) -> Self; + pattern Failure(Error: error) -> Self; + var Self:$$ Cancelled; } ``` -Similarly, `Optional(T)` could be defined as +We don't yet have enough of a design for variadics to give an example of a +Carbon counterpart for `std::variant`, but a variant with exactly three +alternative types could be written like so: ``` -closed choice Optional(Type:$$ T) { - pattern Some(T: value) -> _; - var _:$$ None; +choice Variant(Type:$$ T1, Type:$$ T2, Type:$$ T3) { + pattern Value(T1: value) -> Self; + pattern Value(T2: value) -> Self; + pattern Value(T3: value) -> Self; } ``` @@ -264,24 +335,25 @@ pattern like `(.discriminator = Discriminator.IsValue, .storage = 1)`. It's also worth noting some pre-existing requirements for pattern matching, which will inform the design of pattern functions. These requirements are highly desirable in any pattern matching system, but they are hard requirements if we -want to treat C++-like overload resolution as a form of pattern matching. - -Carbon pattern matching must not only be able to determine whether a pattern -matches a value, and bind values to its variables, but also be able to select -the _best_ pattern when more than one matches. Patterns are ranked by -specificity, so for example `(Int: x, 1)` is a better match than -`(Int: x, Int: y)` in the cases where they both match. 
This is not a total -order, or even a weak order (intuitively, a total order with "ties"); for -example, there is no ordering between `(Int: x, 1, 1)` and either -`(1, Int: y, Int: z)` or `(1, 1, Int:z)`, but we can't just say that all three -are "tied", because `(1, 1, Int: z)` is more specialized than -`(1, Int: y, Int: z)`. - -Carbon must be able to determine whether a given set of patterns is -_exhaustive_, meaning that for any possible value at least one of them will -match. It must also be able to determine whether a set of patterns is -_ambiguous_, meaning that the set contains two or more patterns that are not -ordered with respect to each other, and that can match the same value. +want to treat C++-like overload resolution as a form of pattern matching: + +- Carbon pattern matching must not only be able to determine whether a pattern + matches a value, and bind values to its variables, but also be able to + select the _best_ pattern when more than one matches. Patterns are partially + ordered by specificity, so for example `(Int: x, 1)` is a better match than + `(Int: x, Int: y)` in the cases where they both match. This is not a total + order, or even a weak order (intuitively, a total order with "ties"); for + example, there is no ordering between `(Int: x, 1, 1)` and either + `(1, Int: y, Int: z)` or `(1, 1, Int:z)`, but we can't just say that all + three are "tied", because `(1, 1, Int: z)` is more specialized than + `(1, Int: y, Int: z)`. This order should at least form a + [lattice](), but that's not + directly relevant here. +- Carbon must be able to determine whether a given set of patterns is + _exhaustive_, meaning that for any possible value at least one of them will + match. It must also be able to determine whether a set of patterns is + _ambiguous_, meaning that the set contains two or more patterns that are not + ordered with respect to each other, and that can match the same value. 
## Pattern functions @@ -377,25 +449,33 @@ equivalent of C++'s placement `new`) rather than initialization. ## `choice` -A `choice` type definition has the same general form as a `struct` definition, -but a `choice` type can only contain declarations of pattern functions and -compile-time constants (in other words, variables declared with `$$`). The -definitions of those members will be provided by the compiler; they cannot be -defined in user code. The return type of the pattern functions, and the type of -the constants, must be the `choice` type being declared, and so to avoid -redundancy we allow that type to be deduced from the compiler-provided -definitions. The examples in this proposal use `_` as the placeholder, but -something like `auto` would work too, if we don't want to model type deduction -as a form of pattern matching. +A choice type definition has the same general form as a struct definition, but a +choice type can only contain declarations of pattern functions and compile-time +constants (in other words, variables declared with `$$`). The definitions of +those members will be provided by the compiler; they cannot be defined in user +code. The return type of the pattern functions, and the type of the constants, +must be the choice type being declared. + +The pattern functions of a choice type can be overloaded (as in the `Variant` +example above), but they cannot be templates. More precisely, the parameter +types of a pattern function in a choice type can't depend on any of the +arguments. Carbon will probably have some mechanism for allowing a struct to have compiler-generated default implementations of operations such as copy, move, -hashing, and equality comparison, so long as the struct's members support those -operations. Assuming that mechanism exists, `choice` types will support it as -well, with the parameter types of the pattern functions taking the place of the -member types. However, `choice` types cannot be default constructible. 
A future -proposal for this mechanism will need to consider whether to require an explicit -opt-in to generate these operations. +assignment, hashing, and equality comparison, so long as the struct's members +support those operations. Assuming that mechanism exists, choice types will +support it as well, with the parameter types of the pattern functions taking the +place of the member types. However, there are a couple of special cases: + +- choice types cannot be default constructible, unless we provide a separate + mechanism for specifying which alternative is the default. +- choice types can be assignable, regardless of whether the parameter types + are assignable, because assigning to a choice type always destroys the + existing alternative, rather than assigning to it. + +A future proposal for this mechanism will need to consider whether to require an +explicit opt-in to generate these operations. The compiler-generated definitions of a choice type's members are unspecified, except that they will satisfy the following two properties: @@ -411,7 +491,7 @@ except that they will satisfy the following two properties: Although choice types are always exhaustive as far as the language semantics are concerned, by default user code will be required to treat them as non-exhaustive. For example, a `match` statement that operates on a value of a -choice type will be required to have a `defaut` case, even if it also has +choice type will be required to have a `default` case, even if it also has patterns that match all of the declared alternatives. This ensures that choice types can be extended with new alternatives without breaking any existing code. Declaring a type with `closed choice` rather than `choice` allows client code to @@ -425,7 +505,7 @@ This proposal supports enumerated types as a special case of choice types. However, there may be some benefit to providing special-case support for enumerated types, similar to C++'s `enum`. 
In particular: -- We could more easily avoid the `var _:$$` boilerplate in enumerator +- We could more easily avoid the `var Self:$$` boilerplate in enumerator declarations. - We could allow the developer to specify an underlying type, associate a specific value of the underlying type with each enumerator, and convert @@ -469,3 +549,123 @@ making it mandatory would be poor ergonomics, because developers would be forced to make an up-front decision between the two, rather than relying on a safe default. Furthermore, reserving `open` as a keyword seems even more problematic than reserving `closed`. + +#### Allowing templated pattern functions + +We could remove the restriction that pattern functions can't be templated. This +would allow defining something like `std::any` as a choice type: + +``` +choice Any { + pattern Value[Type:$$ T](T: value) -> Self; +} +``` + +The problem is that there's no bound on the amount of storage that an instance +of this type would require, so the compiler-generated code would have to +allocate storage on the heap, and decide whether to apply a small-buffer +optimization, and if so what size threshold to use. This would make the +performance of choice types far less predictable, contrary to Carbon's +performance goals, and would have little offsetting benefit: these sorts of +types appear to be rare, and when needed they should be implemented in library +code, where the performance tradeoffs are explicit and under programmer control. + +> TODO: we should ensure that such a library type can support pattern matching. +> The primary challenge is that the factory function will need to allocate heap +> memory, so we will need to ensure that pattern functions are permitted to do so. +> Resolution of this issue must await a design for heap allocation. + +#### Make `choice` an element of a `struct` + +Instead of introducing `choice` as a new kind of type declaration, we could +instead treat it as an optional component of a `struct` declaration.
For +example, with this approach `Result(T, Error)` would be defined as: + +``` +struct Result(Type:$$ T, Type:$$ Error) { + choice { + pattern Success(T: value) -> Self; + pattern Failure(Error: error) -> Self; + var Self:$$ Cancelled; + } +} +``` + +This approach is somewhat more flexible, because it would permit the type owner +to give `Result` additional member functions by placing them outside the +`choice` block. However, it still wouldn't be possible to give `Result` +additional data members, because the alternatives would have no way to +initialize them, so structs containing a `choice` block would be starkly +different from ones that don't. This approach would also be more verbose, +especially vertically, which will be particularly noticeable since choice types +will probably be quite small in the common case. + +A variant of this approach would be to allow `choice` declarations to be either +named, in which case they behave as currently proposed, or anonymous, in which +case they must appear inside a struct, and behave as described here. This would +address the increased verbosity by limiting it to the cases where there's an +offsetting benefit. However, it would make the language more ambiguous, since a +`choice` declaration would have a substantially different meaning depending on +whether it's followed by an identifier. + +#### More concise syntax for alternatives + +We could introduce a special syntax for declaring alternatives, rather than +using the existing syntaxes for pattern functions and static variables. Such a +syntax could be substantially more concise, because it could omit the components +that are just boilerplate in the context of a choice type.
For example, the +definition of `Result(T, Error)` might look like this: + +``` +choice Result(Type:$$ T, Type:$$ Error) { + alt Success(T); + alt Failure(Error); + alt Cancelled; +} +``` + +However, this brevity would come at the cost of consistency: there would now be +two structurally different syntaxes for declaring functions, and two +structurally different syntaxes for declaring static constants. + +## Alternatives considered + +### Indexing by type + +Rather than requiring each alternative to have a distinct name (or at least a +distinct function signature), we could pursue a design that requires each +alternative to have a distinct type. With this approach, which I'll call +"type-indexed" as opposed to "name-indexed", Carbon sum types would much more +closely resemble C++'s `std::variant`, rather than Swift and Rust's `enum` or +the sum types of various functional programming languages. + +Either approach can be emulated in terms of the other: the `Variant` example +above shows how we can use overloading to emulate type-indexing in our +name-indexed framework, and conversely a type-indexed type like `std::variant` +can model a name-indexed type like `Result(T,E)` by introducing a wrapper type +for each name, leading to something like `std::variant<Value<T>, Error<E>>` +(note that `std::variant<T, E>` would not work, because `T` and `E` can be the +same type). In either case, emulating the other model introduces some syntactic +overhead: with name-indexing, `Variant`'s factory functions must be given a name +(`Value`) even though it doesn't really convey any information, and emulating +`Result(T,E)` in terms of type-indexing requires separately defining the wrapper +templates `Value` and `Error`. + +The distinction between these two models of sum types seems analogous the +distinction between the tuple and struct models of product types.
Tuples and +type-indexed sum types treat the data structurally, in terms of types and +positional indices, but structs and name-indexed sum types require the +components of the data to have names, which contributes to both readability and +type-safety by attaching higher-level semantics to the data. + +It is possible that both models of sum types could coexist in Carbon, just as +structs and tuples do. However, that seems unlikely to be a good idea: the +coexistence of tuples and structs is necessitated by the fact that it is quite +difficult to emulate either of them in terms of the other in a type-safe way, +but as we've seen, it's fairly straightforward to emulate either model of sum +types in terms of the other. + +Use cases that work best with type-indexing appear to be quite rare, just as use +cases for tuples appear to be quite rare compared to use cases for structs. +Consequently, if Carbon has only one form of sum types, it should probably be +the name-indexed form, as proposed here. From e536993fac316d0cec1194cac4008d68aec1400f Mon Sep 17 00:00:00 2001 From: Geoffrey Romer Date: Fri, 11 Sep 2020 16:45:33 -0700 Subject: [PATCH 04/28] Minor cleanup of FIXMEs --- proposals/p0157.md | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/proposals/p0157.md b/proposals/p0157.md index f93bf271bc95d..f36e7c152fe4f 100644 --- a/proposals/p0157.md +++ b/proposals/p0157.md @@ -327,10 +327,10 @@ until after we have successfully matched `.discriminator` with `Discriminator.IsValue`. Otherwise, we risk trying to access a `T` object that isn't actually present in `.storage`. -FIXME: How do we constrain evaluation to accomplish this? We've discussed -requiring that any subpatterns without placeholders be evaluated before any -subpatterns with placeholders, but that seems ad-hoc; it wouldn't protect a -pattern like `(.discriminator = Discriminator.IsValue, .storage = 1)`. +> FIXME: How do we constrain evaluation to accomplish this? 
We've discussed +> requiring that any subpatterns without placeholders be evaluated before any +> subpatterns with placeholders, but that seems ad-hoc; it wouldn't protect a +> pattern like `(.discriminator = Discriminator.IsValue, .storage = 1)`. It's also worth noting some pre-existing requirements for pattern matching, which will inform the design of pattern functions. These requirements are highly @@ -367,8 +367,6 @@ they will probably need to generate very different code depending on the structure of their arguments, particularly if the arguments contain placeholders. -FIXME: anything else to say about pattern functions? - ### Alternatives considered The substitution principle implies that pattern matching can be thought of as From 7783cb6d5bf05c21f4431df98d3d38e44408d001 Mon Sep 17 00:00:00 2001 From: Geoffrey Romer Date: Tue, 15 Sep 2020 14:59:45 -0700 Subject: [PATCH 05/28] Respond to reviewer comments. --- proposals/p0157.md | 64 +++++++++++++++++++++++++++++++++------------- 1 file changed, 46 insertions(+), 18 deletions(-) diff --git a/proposals/p0157.md b/proposals/p0157.md index f36e7c152fe4f..8049307a46aa9 100644 --- a/proposals/p0157.md +++ b/proposals/p0157.md @@ -19,6 +19,8 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception - [Pattern functions](#pattern-functions) - [Alternatives considered](#alternatives-considered) - [`StorageArray`](#storagearray) + - [Initialization](#initialization) + - [Special member functions](#special-member-functions) - [`choice`](#choice) - [Alternatives considered](#alternatives-considered-1) - [Separate support for enumerated types](#separate-support-for-enumerated-types) @@ -219,7 +221,8 @@ matching. To address the lack of manual lifetime control and storage sharing, I propose introducing a new fundamental type `StorageArray`, which represents a contiguous -region of raw memory. +region of raw memory, as well as `create` and `destroy` operations for creating +and destroying objects within that region. 
To address the last two deficiencies, I propose introducing `choice` as a convenient syntax for defining a sum type by specifying the set of possible @@ -252,6 +255,9 @@ struct Result(Type:$$ T, Type:$$ Error) { } var Self:$$ Cancelled = (.discriminator = Discriminator.IsCancelled); + + // Copy, move, assign, destroy, and similar operations need to be defined + // explicitly, but are omitted for brevity. } ``` @@ -399,19 +405,22 @@ would work. ## `StorageArray` This approach to sum types imposes relatively few requirements on the type used -to implement shared storage. I'm proposing an untyped byte buffer here because -it's more general, but union types along the lines of proposal +to implement shared storage, and so this proposal doesn't describe that type in +much detail. I'm proposing an untyped byte buffer here because it's more +general, but union types along the lines of proposal [0139](https://github.com/carbon-language/carbon-lang/pull/139) would work just as well. -However, the one major requirement we do have is a tricky one: it must be -possible to initialize an instance of the shared storage type from an instance -of any of the values that it can represent, and to do so inside a pattern or -pattern function. This proposal uses the obvious assignment-style syntax for -that, so in the above example we use `.storage = value` to initialize `.storage` -to hold the value `value`. However, this implies that there is something like an -implicit conversion to `StorageArray(N)` from _any_ type whose size is at most -`N`, and such broad implicit conversions often cause substantial maintenance and +### Initialization + +The most challenging requirement is that it must be possible to initialize an +instance of the shared storage type from an instance of any of the values that +it can represent, and to do so inside a pattern or pattern function. 
This +proposal uses the obvious assignment-style syntax for that, so in the above +example we use `.storage = value` to initialize `.storage` to hold the value +`value`. However, this implies that there is something like an implicit +conversion to `StorageArray(N)` from _any_ type whose size is at most `N`, and +such broad implicit conversions often cause substantial maintenance and readability problems, at least in C++. As an alternative, we could require the conversion to be explicit, making the @@ -445,6 +454,24 @@ forward and reverse functions would avoid this whole problem, by allowing us to express storing the alternative in terms of procedural code (such as some equivalent of C++'s placement `new`) rather than initialization. +### Special member functions + +`StorageArray(N)` can't keep track of whether it currently holds any objects, or +the types or offsets of those objects, because that would require it to maintain +additional hidden storage, and a major goal of this design is to give the +developer explicit control of the object representation. Consequently, it can't +be copyable, movable, or assignable, because it has no way of knowing how to +perform those operations safely. Similarly, although it must be destructible in +order to be usable as a field type, it will not be safe to destroy until user +code has manually destroyed any objects it contains. In short, user-defined sum +types will not be able to rely on compiler-generated defaults for any special +member functions. + +In order for user code to define those functions manually, the design for +`StorageArray(N)` will need to include operations for creating and destroying +objects within it, analogous to placement-new and pseudo-destructor calls in +C++. 
+ +## `choice` A choice type definition has the same general form as a struct definition, but a @@ -641,13 +668,14 @@ Either approach can be emulated in terms of the other: the `Variant` example above shows how we can use overloading to emulate type-indexing in our name-indexed framework, and conversely a type-indexed type like `std::variant` can model a name-indexed type like `Result(T,E)` by introducing a wrapper type -for each name, leading to something like `std::variant<Value<T>, Error<E>>` -(note that `std::variant<T, E>` would not work, because `T` and `E` can be the -same type). In either case, emulating the other model introduces some syntactic -overhead: with name-indexing, `Variant`'s factory functions must be given a name -(`Value`) even though it doesn't really convey any information, and emulating -`Result(T,E)` in terms of type-indexing requires separately defining the wrapper -templates `Value` and `Error`. +for each name, leading to something like +`std::variant<Value<T>, Error<E>, Cancelled>` (note that `std::variant<T, E>` +would not work, because `T` and `E` can be the same type). In either case, +emulating the other model introduces some syntactic overhead: with +name-indexing, `Variant`'s factory functions must be given a name (`Value`) even +though it doesn't really convey any information, and emulating `Result(T,E)` in +terms of type-indexing requires separately defining the wrapper templates +`Value` and `Error`. The distinction between these two models of sum types seems analogous the distinction between the tuple and struct models of product types.
Tuples and From 6cab1d9e2f475ad8ba2591da690b67eb49af7fd5 Mon Sep 17 00:00:00 2001 From: Geoffrey Romer Date: Fri, 25 Sep 2020 13:22:47 -0700 Subject: [PATCH 06/28] Switch proposal to make conversion to `StorageArray` explicit, and consequently switch to using `Array(Storage)` instead of `StorageArray` --- proposals/p0157.md | 161 ++++++++++++++++++++++++++------------------- 1 file changed, 92 insertions(+), 69 deletions(-) diff --git a/proposals/p0157.md b/proposals/p0157.md index 8049307a46aa9..0a6bdc60c4a8c 100644 --- a/proposals/p0157.md +++ b/proposals/p0157.md @@ -18,18 +18,18 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception - [Pattern matching](#pattern-matching) - [Pattern functions](#pattern-functions) - [Alternatives considered](#alternatives-considered) -- [`StorageArray`](#storagearray) +- [Shareable storage](#shareable-storage) - [Initialization](#initialization) - - [Special member functions](#special-member-functions) + - [Alternatives considered](#alternatives-considered-1) - [`choice`](#choice) - - [Alternatives considered](#alternatives-considered-1) + - [Alternatives considered](#alternatives-considered-2) - [Separate support for enumerated types](#separate-support-for-enumerated-types) - [Different spelling for `choice`](#different-spelling-for-choice) - [Different syntax for `closed`](#different-syntax-for-closed) - [Allowing templated pattern functions](#allowing-templated-pattern-functions) - [Make `choice` an element of a `struct`](#make-choice-an-element-of-a-struct) - [More concise syntax for alternatives](#more-concise-syntax-for-alternatives) -- [Alternatives considered](#alternatives-considered-2) +- [Alternatives considered](#alternatives-considered-3) - [Indexing by type](#indexing-by-type) @@ -220,9 +220,11 @@ things like encapsulate hidden implementation details of the object they're matching. 
To address the lack of manual lifetime control and storage sharing, I propose -introducing a new fundamental type `StorageArray`, which represents a contiguous -region of raw memory, as well as `create` and `destroy` operations for creating -and destroying objects within that region. +introducing a new fundamental type `Storage`, which represents a byte of untyped +memory that may or may not be part of the underlying storage of some object, as +well as `create` and `destroy` operations for creating and destroying objects +within a span of `Storage`, and a `to_storage` operation for obtaining the +object representation of a value. To address the last two deficiencies, I propose introducing `choice` as a convenient syntax for defining a sum type by specifying the set of possible @@ -244,14 +246,16 @@ struct Result(Type:$$ T, Type:$$ Error) { var Self:$$ IsCancelled; } var Discriminator: discriminator; - var StorageArray(Max(Sizeof(T), Sizeof(E))): storage; + var Array(Storage, Max(Sizeof(T), Sizeof(E))): storage; pattern Success(T: value) -> Result(T, Error) { - return (.discriminator = Discriminator.IsValue, .storage = value); + return (.discriminator = Discriminator.IsValue, + .storage = to_storage(value)); } pattern Failure(Error: error) -> Result(T, Error) { - return (.discriminator = Discriminator.IsError, .storage = error); + return (.discriminator = Discriminator.IsError, + .storage = to_storage(error)); } var Self:$$ Cancelled = (.discriminator = Discriminator.IsCancelled); @@ -402,75 +406,94 @@ compiler enough information to do those things, so the developer would need to supply additional information in some way, and it's not at all clear how that would work. -## `StorageArray` +## Shareable storage -This approach to sum types imposes relatively few requirements on the type used -to implement shared storage, and so this proposal doesn't describe that type in -much detail. 
I'm proposing an untyped byte buffer here because it's more -general, but union types along the lines of proposal +This approach to sum types imposes relatively few requirements on the language +features used to implement shareable storage (meaning, storage that can be +inhabited by different objects at different times), and so this proposal doesn't +describe them in much detail. I'm proposing an untyped byte buffer here because +it's more general, but union types along the lines of proposal [0139](https://github.com/carbon-language/carbon-lang/pull/139) would work just as well. +Regardless of the form that shareable storage takes, it won't be able to +intrinsically keep track of whether it currently holds any objects, or the types +or offsets of those objects, because that would require it to maintain +additional hidden storage, and a major goal of this design is to give the +developer explicit control of the object representation. Consequently, it is not +safe to copy, move, assign to, or destroy shareable storage unless it is known +not to be inhabited by an object. + +This means the compiler will not be able to generate safe default +implementations for any special member functions of types that include shareable +storage. In order for user code to define those functions manually, the design +for shared storage will need to include operations for creating and destroying +objects on top of it, analogous to placement-`new` and pseudo-destructor calls +in C++. + ### Initialization -The most challenging requirement is that it must be possible to initialize an -instance of the shared storage type from an instance of any of the values that -it can represent, and to do so inside a pattern or pattern function. This -proposal uses the obvious assignment-style syntax for that, so in the above -example we use `.storage = value` to initialize `.storage` to hold the value -`value`. 
However, this implies that there is something like an implicit -conversion to `StorageArray(N)` from _any_ type whose size is at most `N`, and -such broad implicit conversions often cause substantial maintenance and -readability problems, at least in C++. - -As an alternative, we could require the conversion to be explicit, making the -syntax something like -`.storage = StorageArray(Max(Sizeof(T), Sizeof(E)))(value)`. However, this -requires repeating the size parameter, which is logically redundant and -syntactically somewhat annoying (although an alias could help). We could split -the difference and define a factory function whose return type implicitly -converts to `StorageArray(N)` for any sufficiently large `N`, making the example -look like `.storage = MakeStorageArray(value)`, but that only works if Carbon -provides something equivalent to C++'s "guaranteed RVO", because `T` may not be -movable and `StorageArray(N)` definitely won't (and that guaranteed RVO needs to -work even though the right and left sides of the initialization have different -types). - -Either of those alternatives would also be fairly misleading from a readability -perspective, because they look like function or constructor calls, but they -can't actually _be_ function or constructor calls. Recall that this syntax needs -to be usable inside patterns, so these functions would need to be pattern -functions, which means they would need to consist of a single expression that -initializes the `StorageArray(N)`, and the whole problem we're trying to solve -here is to make it possible to write such an expression. +The most challenging requirement that this approach imposes on the design of +shared storage is that it must be possible to initialize it from an instance of +any of the values that it can represent, and to do so inside a pattern or +pattern function. 
+ +This proposal represents that initialization using a `ToStorage` function call, +such as `.storage = ToStorage(success)` in the example above. However, the +semantics of this code may be somewhat surprising: it doesn't merely copy the +underlying bytes of `success` into `.storage`; it actually creates a new `T` +object within `.storage`, which is directly initialized from `success`. This +assumes that Carbon has some equivalent to C++'s "guaranteed RVO"; this code +cannot be understood as creating a temporary `Storage` array representing a `T` +value and then moving it into `.storage`, because it is not safe to move an +inhabited `Storage` array, as discussed above. + +`ToStorage` is presented here as a function, but it can't actually be +implemented as a function within the Carbon language. Recall that our motivating +use cases involve invoking `ToStorage` inside pattern functions, so it needs to +be a pattern function itself, assuming it's a function at all. If it were +implemented as Carbon code, it would need to consist of a single expression that +initializes an `Array(Storage)` from an object, and the whole problem we're +trying to solve here is to make it possible to write such an expression. +Consequently, we may wish to give it a different syntactic form. This operation +is in many ways a type cast, so the syntactic choice here will depend heavily on +the syntax of Carbon's other casts. At a deeper level, the problem is that the whole notion of allowing multiple objects to successively share the same storage is inherently procedural (because it involves changes in state over time), but pattern matching is fundamentally -descriptive, and hence functional.
- -A design in which user-defined patterns can be expressed in terms of procedural -forward and reverse functions would avoid this whole problem, by allowing us to -express storing the alternative in terms of procedural code (such as some -equivalent of C++'s placement `new`) rather than initialization. - -### Special member functions - -`StorageArray(N)` can't keep track of whether it currently holds any objects, or -the types or offsets of those objects, because that would require it to maintain -additional hidden storage, and a major goal of this design is to give the -developer explicit control of the object representation. Consequently, it can't -be copyable, movable, or assignable, because it has no way of knowing how to -perform those operations safely. Similarly, although it must be destructible in -order to be usable as a field type, it will not be safe to destroy until user -code has manually destroyed any objects it contains. In short, user-defined sum -types will not be able to rely on compiler-generated defaults for any special -member functions. - -In order for user code to define those functions manually, the design for -`StorageArray(N)` will need to include operations for creating and destroying -objects within it, analogous to placement-new and pseudo-destructor calls in -C++. +descriptive, and hence functional. A design in which user-defined patterns can +be expressed in terms of procedural forward and reverse functions would avoid +this whole problem, by allowing us to express storing the alternative in terms +of procedural code (such as the equivalent of C++'s placement-`new`) rather than +initialization. + +#### Alternatives considered + +It is tempting to try to mitigate those problems by making the conversion +implicit, so that the code looks like `.storage = success`. 
However, this would +mean that an array of `Storage` can be implicitly initialized from a value of +any type, and such extremely broad implicit conversions tend to be highly +problematic. At a minimum, we would probably need to introduce a separate type +`StorageArray`, rather than give special semantics to this one specialization of +`Array`. Furthermore, we will also need the ability to initialize a `Storage` +array with specific byte values, but that will interact awkwardly with the +implicit conversion. For example, suppose the syntax for initializing a +`Storage` array to all-zeros is `.storage = MakeZeroStorageArray(N)`, where the +type of `MakeZeroStorageArray(N)` is `StorageArray(N)`. The universal implicit +conversion would mean that `.storage` is a `StorageArray` object that contains +the representation of _another_ `StorageArray` object, which may have a shorter +lifetime, and in fact may need to be explicitly destroyed before the "outer" +`StorageArray` is. There are various ways to finesse this issue, but they all +involve adding additional special-case rules in order to avoid or mitigate this +consequence of the general rules. + +We don't propose this approach because it doesn't really address the problems +with `ToStorage`; it merely obscures those problems, while introducing a new +problem that would itself require further work to obscure. However, it's worth +noting that the drawbacks of this approach are much less severe if we implement +shared storage using a union rather than an untyped byte array, because the +implicit conversion would not need to be universal. ## `choice` From a42883519a884c6c61ddae998f54f0f4ff0bcb0a Mon Sep 17 00:00:00 2001 From: Geoffrey Romer Date: Thu, 29 Oct 2020 17:01:59 -0700 Subject: [PATCH 07/28] Extensive overhaul of sum types proposal: - Introduce `alternatives` blocks as a way of asserting that a set of alternatives is exhaustive and mutually exclusive. 
- Largely eliminate discussion of "substitution principle" for pattern matching as a whole, and instead focus on a "mirroring requirement" that applies to pattern functions specifically. Add extensive discussion of the potential drawbacks of this requirement. - Add discussion of sum types that are implemented with sentinel values rather than discriminators. - More comprehensive discussion of evaluation order --- proposals/p0157.md | 864 +++++++++++++++++++++++++++++++++------------ 1 file changed, 632 insertions(+), 232 deletions(-) diff --git a/proposals/p0157.md b/proposals/p0157.md index 0a6bdc60c4a8c..6bad2f6759741 100644 --- a/proposals/p0157.md +++ b/proposals/p0157.md @@ -17,19 +17,32 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception - [Proposal](#proposal) - [Pattern matching](#pattern-matching) - [Pattern functions](#pattern-functions) + - [Constraints on function evaluation](#constraints-on-function-evaluation) + - [Constraints on pattern matching](#constraints-on-pattern-matching) + - [Underdetermined patterns](#underdetermined-patterns) + - [Match guards](#match-guards) + - [Subpattern bindings](#subpattern-bindings) + - [Pattern-specific sugar syntaxes](#pattern-specific-sugar-syntaxes) + - [Avoidable inconsistencies](#avoidable-inconsistencies) - [Alternatives considered](#alternatives-considered) + - [Explicit inverses](#explicit-inverses) + - [Make alternatives be pattern functions implicitly](#make-alternatives-be-pattern-functions-implicitly) - [Shareable storage](#shareable-storage) - [Initialization](#initialization) - [Alternatives considered](#alternatives-considered-1) -- [`choice`](#choice) + - [Pattern matching evaluation order](#pattern-matching-evaluation-order) +- [The `alternatives` block](#the-alternatives-block) - [Alternatives considered](#alternatives-considered-2) + - [More concise syntax](#more-concise-syntax) + - [Marking alternatives individually](#marking-alternatives-individually) + - [Different syntax for 
`closed`](#different-syntax-for-closed) +- [`choice`](#choice) + - [Alternatives considered](#alternatives-considered-3) - [Separate support for enumerated types](#separate-support-for-enumerated-types) - [Different spelling for `choice`](#different-spelling-for-choice) - - [Different syntax for `closed`](#different-syntax-for-closed) - [Allowing templated pattern functions](#allowing-templated-pattern-functions) - - [Make `choice` an element of a `struct`](#make-choice-an-element-of-a-struct) - - [More concise syntax for alternatives](#more-concise-syntax-for-alternatives) -- [Alternatives considered](#alternatives-considered-3) + - [Extend `alternatives` instead](#extend-alternatives-instead) +- [Alternatives considered](#alternatives-considered-4) - [Indexing by type](#indexing-by-type) @@ -68,10 +81,15 @@ Furthermore, it needs to be possible for type owners to customize the representations of these types. For example, sum types usually need a "discriminator" field to indicate which alternative is present, but since it typically has very few possible values, it can often be packed into padding, or -even the low-order bits of a pointer. This sort of customization inherently -creates a risk of implementation bugs that break type safety, but it must be -possible for correctly-implemented customizations to avoid changing the API, and -hence avoid weakening the static safety guarantees for users. +even the low-order bits of a pointer. Other sum types avoid an explicit +discriminator, and instead reserve certain values to indicate separate +alternatives. For example, a typical C-style pointer can be thought of as an +optional type, with a special null value indicating that no pointer is present, +because the platform guarantees that the null byte pattern is never the +representation of a valid pointer. 
This sort of customization inherently creates +a risk of implementation bugs that break type safety, but it must be possible +for correctly-implemented customizations to avoid changing the API, and hence +avoid weakening the static safety guarantees for users. ## Background @@ -153,7 +171,7 @@ fn GetIntFromUser() -> Int { } ``` -However, this code suffers from a number of serious deficiencies: +However, this code has several serious deficiencies: - The implementation details of `Result` are not encapsulated. This makes the `Result` API unsafe: nothing prevents client code from accessing `.value` @@ -171,17 +189,15 @@ However, this code suffers from a number of serious deficiencies: it when `Result`s are used. So, for example, the `match` must have a `default` case in order for the compiler and other tools to consider it exhaustive, even though that default case should never be entered. - Furthermore, the type of `.discriminator` is somewhat misleading: its values - are not numbers, but only symbolic tags. It will never make sense to perform - arithmetic on it, for example. -- The definition of `Result` is largely boilerplate. Conceptually, the only - information needed to specify this type is the names and parameter types of - the two factory functions, plus the fact that every possible value of - `Result` is uniquely described by the name and parameter values of a call to - one of those two functions. Given that information, the compiler could - easily generate the rest of the struct definition. This generated - implementation may not always be as efficient as a hand-coded one could be, - but in a lot of cases that may not matter. + +It's also worth noting that the definition of `Result` is largely boilerplate. 
+Conceptually, the only information needed to specify this type is the names and +parameter types of the two factory functions, plus the fact that every possible +value of `Result` is uniquely described by the name and parameter values of a +call to one of those two functions. Given that information, the compiler could +easily generate the rest of the struct definition. This generated implementation +may not always be as efficient as a hand-coded one could be, but in a lot of +cases that may not matter. This code arguably exhibits another problem as well: the `return` statements in `ParseAsInt` are quite verbose, due to the need to explicitly qualify the @@ -193,72 +209,70 @@ the design of sum types. ## Proposal -To summarize, the previous section identified four deficiencies in Carbon, which -together prevent it from adequately supporting sum types: +To summarize, the previous section identified three missing features in Carbon, +which together prevent it from adequately supporting sum types: - There is no way for pattern matching to operate through an encapsulation boundary. - There is no way to manually control the lifetimes of subobjects, or enable them to share storage. -- There is no way to define an enumerated type. -- There is no way to define a sum type without micromanaging the details of - its representation. - -I propose supporting sum types by introducing three separate language features -to supply the missing functionality. These features are largely independent and -orthogonal, so their detailed design will be addressed in separate proposals. -This proposal merely establishes the overall design direction for sum types, in -the same way that [p0083](p0083.md) established the overall design direction for -the language as a whole. 
- -To address the lack of encapsulation support in pattern matching, I propose -introducing the concept of a _pattern function_, which is a function that can be -invoked as part of a pattern, even with arguments that contain placeholders. -Pattern functions can only contain the sort of code that could appear directly -in a pattern, but they let us define reusable pattern syntaxes that can do -things like encapsulate hidden implementation details of the object they're -matching. - -To address the lack of manual lifetime control and storage sharing, I propose -introducing a new fundamental type `Storage`, which represents a byte of untyped -memory that may or may not be part of the underlying storage of some object, as -well as `create` and `destroy` operations for creating and destroying objects -within a span of `Storage`, and a `to_storage` operation for obtaining the -object representation of a value. - -To address the last two deficiencies, I propose introducing `choice` as a -convenient syntax for defining a sum type by specifying the set of possible -alternatives. The alternatives can be represented by factory functions like -`Success` and `Failure`, or singleton constants like `Cancelled`. In either -case, the definitions are omitted, because they are supplied by the compiler. -Choice types whose alternatives are all constants behave as enumerated types. A -choice type can be marked `closed`, which indicates that client code can assume -no new alternatives will be added in the future. +- There is no way for a type to specify that a given set of patterns is + exhaustive. + +I propose supporting sum types by introducing three language features to supply +the missing functionality, as well as a sugar syntax for defining a sum type +without micromanaging the implementation details. 
These features are largely +separable, although there are some dependencies between them, so their detailed +design will be addressed in future proposals, and the details discussed here +should be considered provisional. This proposal merely establishes the overall +design direction for sum types, in the same way that [p0083](p0083.md) +established the overall design direction for the language as a whole. + +To support encapsulation in pattern matching, I propose introducing the concept +of a _pattern function_, which is a function that can be invoked as part of a +pattern, even with arguments that contain placeholders. Pattern functions can +only contain the sort of code that could appear directly in a pattern, but they +let us define reusable pattern syntaxes that can do things like encapsulate +hidden implementation details of the object they're matching. + +To support manual lifetime control and storage sharing, I propose introducing a +new fundamental type `Storage`, which represents a byte of untyped memory that +may or may not be part of the underlying storage of some object, as well as +`create` and `destroy` operations for creating and destroying objects within a +span of `Storage`, and a `ToStorage` operation for obtaining the object +representation of a value. + +To allow types to specify an exhaustive set of patterns, as well as to solve +certain problems with accessing untyped memory in a pattern function, I propose +introducing `alternatives { ... }` as a grouping construct within a struct +definition. The elements of an `alternatives` block are pattern functions whose +return type is the enclosing struct type, and they are required to be both +exhaustive and unambiguous, meaning that for any possible value of the type, it +must be possible to obtain it as the return value of exactly one of the +alternatives. Alternatives that take no arguments, which represent singleton +states such as `Cancelled`, can instead be written as static constants. 
An +`alternatives` block can be marked `closed`, which indicates that client code +can assume no new alternatives will be added in the future. Using these features, the `Result(T, Error)` example can be rewritten as follows: ``` struct Result(Type:$$ T, Type:$$ Error) { - closed choice Discriminator { - var Self:$$ IsValue; - var Self:$$ IsError; - var Self:$$ IsCancelled; - } - var Discriminator: discriminator; + var Int: discriminator; var Array(Storage, Max(Sizeof(T), Sizeof(E))): storage; - pattern Success(T: value) -> Result(T, Error) { - return (.discriminator = Discriminator.IsValue, - .storage = to_storage(value)); - } + closed alternatives { + pattern Success(T: value) -> Self { + return (.discriminator = 0, .storage = ToStorage(value)); + } - pattern Failure(Error: error) -> Result(T, Error) { - return (.discriminator = Discriminator.IsError, - .storage = to_storage(error)); - } + pattern Failure(Error: error) -> Self { + return (.discriminator = 1, .storage = ToStorage(error)); + } - var Self:$$ Cancelled = (.discriminator = Discriminator.IsCancelled); + var Self:$$ Cancelled = (.discriminator = 2); + } // Copy, move, assign, destroy, and similar operations need to be defined // explicitly, but are omitted for brevity. @@ -288,8 +302,15 @@ fn GetIntFromUser() -> Int { } ``` -Since our implementation of `Result` doesn't benefit from having direct control -of the object representation, we could instead define it more concisely like so: +To allow users to define sum types without micromanaging the implementation +details, I propose introducing `choice` as a convenient syntax for defining a +sum type by specifying only the declarations of the set of alternatives. From +that information, the compiler generates an appropriate object representation, +and synthesizes definitions for the alternatives and special member functions. 
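Here is a rough Rust analogue of the hand-written `Result(T, Error)`, to make its shape more concrete. It assumes a `union` stands in for the untyped `Storage` array; the proposal notes that union types would work just as well.

`ManualResult`, `View`, and the numeric discriminator values are hypothetical. `view()` plays the role that the `Success`/`Failure`/`Cancelled` pattern functions play in Carbon: callers can match without seeing the representation. Because the union cannot know which field is live, the destructor must be written by hand using the discriminator.

```rust
use std::mem::ManuallyDrop;

/// Shared storage: at most one field is live at a time, and the union itself
/// does not record which one.
union Storage<T, E> {
    value: ManuallyDrop<T>,
    error: ManuallyDrop<E>,
    empty: (),
}

/// Hypothetical hand-written analogue of `Result(T, Error)`.
pub struct ManualResult<T, E> {
    discriminator: u8, // 0 = Success, 1 = Failure, 2 = Cancelled
    storage: Storage<T, E>,
}

/// The only way for client code to inspect a `ManualResult`, so the
/// representation stays encapsulated.
pub enum View<'a, T, E> {
    Success(&'a T),
    Failure(&'a E),
    Cancelled,
}

impl<T, E> ManualResult<T, E> {
    pub fn success(value: T) -> Self {
        let storage = Storage { value: ManuallyDrop::new(value) };
        ManualResult { discriminator: 0, storage }
    }

    pub fn failure(error: E) -> Self {
        let storage = Storage { error: ManuallyDrop::new(error) };
        ManualResult { discriminator: 1, storage }
    }

    pub fn cancelled() -> Self {
        ManualResult { discriminator: 2, storage: Storage { empty: () } }
    }

    pub fn view(&self) -> View<'_, T, E> {
        // SAFETY: the discriminator records which union field is live.
        unsafe {
            match self.discriminator {
                0 => View::Success(&*self.storage.value),
                1 => View::Failure(&*self.storage.error),
                _ => View::Cancelled,
            }
        }
    }
}

impl<T, E> Drop for ManualResult<T, E> {
    fn drop(&mut self) {
        // SAFETY: only the field recorded by the discriminator is dropped.
        unsafe {
            match self.discriminator {
                0 => ManuallyDrop::drop(&mut self.storage.value),
                1 => ManuallyDrop::drop(&mut self.storage.error),
                _ => {}
            }
        }
    }
}
```

By contrast, a Rust `enum` with the same three variants corresponds roughly to `choice`: the compiler chooses the layout and generates the drop logic.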
+ +Our manual implementation of `Result` doesn't really benefit from having direct +control of the object representation, and doesn't seem to have any additional +API surfaces, so it's well-suited to being defined as a choice type instead: ``` closed choice Result(Type:$$ T, Type:$$ Error) { @@ -311,41 +332,25 @@ choice Variant(Type:$$ T1, Type:$$ T2, Type:$$ T3) { } ``` +As an example of a case where manual control of the representation is useful, +here's a type that implements an optional pointer value without +requiring additional storage for a discriminator: + +``` +struct OptionalPtr(Type:$$ T) { + var Array(Storage, Sizeof(Ptr(T))): storage; + + alternatives { + pattern Value(Ptr(T): ptr) -> Self { return (.storage = ToStorage(ptr)); } + var Self:$$ Null = (.storage = 0); + } +} +``` + ## Pattern matching -This proposal presupposes that pattern matching will be built around the -following basic intuition, which I will call the "substitution principle": - -> A pattern is an expression in which zero or more terms have been replaced with -> variable binding declarations (`_` can be considered an anonymous binding -> declaration). If the pattern matches a given value, evaluating that expression -> by substituting the bound values back into the pattern will yield that value. - -In particular, this implies that any operation that can be used in the body of a -pattern can also be used in an expression. This is not strictly a requirement of -this proposal, but to the extent that Carbon deviates from this principle, this -proposal may fit less well. In particular, if we want to support user-defined -operations that can appear only in patterns, we would need a customization -mechanism that's separate from pattern functions as described in this proposal, -and we may not want to have two separate customization mechanisms. - -This proposal requires us to impose some sort of evaluation-order constraint on -pattern matching.
For example, when matching `r` with the pattern -`.Success(T: value)` in the example above, we must be able to guarantee that we -do not recursively attempt to match the contents of `.storage` with `value` -until after we have successfully matched `.discriminator` with -`Discriminator.IsValue`. Otherwise, we risk trying to access a `T` object that -isn't actually present in `.storage`. - -> FIXME: How do we constrain evaluation to accomplish this? We've discussed -> requiring that any subpatterns without placeholders be evaluated before any -> subpatterns with placeholders, but that seems ad-hoc; it wouldn't protect a -> pattern like `(.discriminator = Discriminator.IsValue, .storage = 1)`. - -It's also worth noting some pre-existing requirements for pattern matching, -which will inform the design of pattern functions. These requirements are highly -desirable in any pattern matching system, but they are hard requirements if we -want to treat C++-like overload resolution as a form of pattern matching: +This proposal presupposes the following requirements for pattern matching, which +will inform the design of pattern functions: - Carbon pattern matching must not only be able to determine whether a pattern matches a value, and bind values to its variables, but also be able to @@ -365,6 +370,10 @@ want to treat C++-like overload resolution as a form of pattern matching: _ambiguous_, meaning that the set contains two or more patterns that are not ordered with respect to each other, and that can match the same value. +These requirements are highly desirable in any pattern matching system, but they +are hard requirements if we want to treat C++-like overload resolution as a form +of pattern matching. 
+ ## Pattern functions Pattern functions have the same declaration syntax as ordinary functions, except @@ -377,28 +386,264 @@ they will probably need to generate very different code depending on the structure of their arguments, particularly if the arguments contain placeholders. +The body of a pattern function must consist of a single `return` statement whose +operand is not only a valid expression, but also would be a valid pattern if any +uses of the function parameters were replaced with variable binding patterns. +Furthermore, the semantics of the expression must mirror the semantics of the +pattern. For example, this is a valid pattern function: + +``` +pattern MakeStruct(Int: x, Int: y) -> MyStruct { + return (.x = x, .y = y); +} +``` + +because `(.x = Int: x, .y = Int: y)` is a valid pattern that is guaranteed to +match the result of any call to `MakeStruct` and set `x` and `y` to that call's +arguments, and conversely any value that matches `(.x = Int: x, .y = Int: y)` is +guaranteed to be equal to `MakeStruct(x, y)`. + +This mirroring requirement ensures that we can run the function both forwards +and backwards, either evaluating the function on some arguments to produce a +return value, or pattern matching on a purported return value to deduce the +arguments that produced it. The major drawback of this approach is how it +constrains the expressiveness of pattern functions, and it's useful to consider +the forward and reverse aspects of that constraint separately. + +### Constraints on function evaluation + +In the forward direction, the mirroring requirement is clearly a very limiting +constraint: most function bodies do not consist of a single return statement, +and most expressions are not valid patterns. So it's reasonable to wonder +whether pattern functions have enough expressive power. 
So far I've come across +two notable examples of code that we are likely to want to express in pattern +functions, but that may not satisfy the mirroring requirement. + +The first example is that it seems very difficult to allow a pattern function to +express in-place initialization, such as what C++'s placement-new does. In this +proposal, I work around that problem with a special case rule that essentially +treats `storage = ToStorage(obj)` as constructing a copy of `obj` in-place in +the untyped array `storage`, but only when initializing `storage`. However, this +is decidedly a hack. See the section on shareable storage below for details on +this issue. + +The second example is memory allocation. We will inevitably want pattern +functions to be able to create, and match, objects that store data on the heap. +In principle this seems feasible: for example, Carbon might have a +`UniquePtr(T)` type like C++'s `std::unique_ptr`, and a `MakeUnique` +operation like C++'s `std::make_unique`, which takes a type and a list of +arguments, allocates an instance of that type, constructed with those arguments, +and returns a `UniquePtr` to that object. We could then use that function in +pattern functions. For example, this is a pattern function that creates/matches +objects of type `IndirectInt`, a hypothetical integer-like type that stores its +data on the heap: + +``` +struct IndirectInt { + var UniquePtr(Int): i_ptr; +} + +pattern IndirectIntOf(Int: i) -> IndirectInt { + return (.i_ptr = MakeUnique(Int, i)); +} +``` + +Note that in order for this to work in pattern matching, we would have to ensure +that two `UniquePtr(T)` objects are considered to be equal to each other when +the underlying `T` objects are equal, rather than when they have equal +addresses. This may be initially surprising for C++ programmers, but it's +internally consistent, and seems desirable for other reasons as well.
For +example, it will probably make compiler-generated equality comparisons much more +reliable: when an object has a `std::unique_ptr` member, the actual memory +address it represents is virtually never a meaningful part of the object's +state, but the underlying `T` object often is. + +The primary challenge here is that `MakeUnique` has to both read and mutate some +state that's not part of its arguments, namely the state of the memory +allocator. It's not clear that we can allow pattern functions to have side +effects, or to depend on side inputs, because those operations are likely to be +very difficult to invert. In this case that mutable state has no effect on the +actual value, at least as far as equality comparison is concerned, but it's not +clear whether or how we can take advantage of that. + +In both cases, it seems very possible that we could extend the pattern language +with new primitives that encapsulate the non-invertible part of these +operations, but it's too soon to say what they would look like. + +### Constraints on pattern matching + +In the reverse direction, the mirroring requirement may prevent us from using +certain pattern matching syntaxes inside pattern functions, namely syntaxes that +don't correspond to expression syntaxes. However, that appears to be a tolerable +constraint, at least for the use cases we're interested in here. + +Pattern matching syntaxes in other languages generally don't have many such +syntaxes, but there are a few. The following sections discuss the most notable +examples I've found, and discuss how they affect the viability of this design. + +#### Underdetermined patterns + +One corollary of the mirroring requirement is that the named variable bindings +in the pattern must uniquely determine the value being matched. This precludes +any pattern which "loses information" about the matched value. 
Most notably, +this means a pattern function cannot contain wildcard patterns, such as `_`, or +Rust's `..` partial wildcard. + +As a more complex example, several languages support combining multiple patterns +disjunctively, yielding a pattern that matches when any of the subpatterns +match. For example, in Rust the pattern `(0, x) | (x, 0)` matches any pair whose +first or second element is `0`, and binds `x` to the other element. Such +patterns lose information about which case successfully matched, and so they +cannot occur in pattern functions. + +In such cases, the problem could be partially worked around by changing the code +to preserve the lost information. For example, the `_` pattern can be replaced +with a named function parameter. Note that although `_` cannot appear in a +pattern function, it can appear as an argument to a pattern function in a +top-level pattern, so the caller can discard that part of the pattern if it's +not useful to them. + +In the case of disjunctive patterns, preserving the lost information would +essentially require a pattern syntax for defining a variable binding that +records which alternative was selected. Or, to state the problem in the mirrored +sense, we need an expression syntax that uses a variable to select which of +several cases to return. Fortunately, Carbon is likely to have such a syntax +anyway, namely `match` expressions. If we also allow `match` in patterns, a +pattern like `(0, x) | (x, 0)` might be written as: + +``` +match c { + case 0 => (0, auto:x) + case 1 => (auto:x, 0) + default => Assert(False) +} +``` + +Note that inside a pattern, the case labels must be expressions, whereas the +match operand and the case bodies can be patterns. Interestingly, this is the +exact opposite of the rule for `match` outside of patterns: in effect, switching +from forward to reverse evaluation reverses the direction of the arrows. 
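The Rust or-pattern and the information-preserving rewrite can be compared directly. The function names in this sketch are hypothetical:

- `other_if_zero` uses `(0, x) | (x, 0)` and discards which alternative matched.
- `other_if_zero_with_case` also returns a case number that plays the role of `c`.
- `build` is the forward direction, mirroring the `match c { ... }` expression.

For `(0, 0)` both alternatives match. Rust tries match arms in order, so the case number comes out as `0`.

```rust
/// Rust's or-pattern: matches any pair with a zero in either position and
/// binds `x` to the other element. Which alternative matched is lost.
pub fn other_if_zero(pair: (i32, i32)) -> Option<i32> {
    match pair {
        (0, x) | (x, 0) => Some(x),
        _ => None,
    }
}

/// Information-preserving variant: also reports which alternative matched,
/// playing the role of the `c` variable in the `match c { ... }` pattern.
pub fn other_if_zero_with_case(pair: (i32, i32)) -> Option<(u8, i32)> {
    match pair {
        (0, x) => Some((0, x)),
        (x, 0) => Some((1, x)),
        _ => None,
    }
}

/// Forward ("expression") direction: rebuild the pair from the case and `x`.
pub fn build(c: u8, x: i32) -> (i32, i32) {
    match c {
        0 => (0, x),
        1 => (x, 0),
        _ => panic!("invalid case"), // mirrors `default => Assert(False)`
    }
}
```

Feeding the output of `other_if_zero_with_case` back into `build` always reproduces the original pair. That is the mirroring property; the plain or-pattern cannot offer it, because it discards the case.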
+ +Inside a pattern function, the wildcard and variable bindings would need to be +replaced with function parameters (thereby making both sides expressions): + +``` +match c { + case 0 => (0, x) + case 1 => (x, 0) + default => Assert(False) +} +``` + +This is evidently more verbose than the Rust version, but that could be +mitigated with some syntax adjustments, such as a more concise way of asserting +that a match is exhaustive. + +The primary difficulty with `match` patterns is that they are prone to +ambiguity. For example, when matching with `(0, 0)`, it's not obvious what value +should be deduced for `c`. However, this problem is not unique to `match` +patterns, but is inherent in any sort of disjunctive patterns. Even though Rust +patterns don't provide an explicit indicator of which case was matched, +suitably-chosen variable bindings can indirectly expose that information. It +appears that Rust chooses the first match, but Carbon could in principle choose +the best match, or require the patterns to be unambiguous. Here again, it's as +if the `=>` arrow has switched directions, because the same ambiguity problem, +and the same possible resolutions, come up on the _left_ side of the arrows when +the `match` isn't in a pattern matching context. + +All that being said, it's not clear how important it is to support disjunctive +patterns (Swift doesn't, for example), so it may be simpler to avoid the issue +by not supporting them. + +#### Match guards + +Several languages support _match guards_, which use a predicate to restrict +pattern matching. For example, in Rust the pattern `Some(x) if x < 5` matches an +`Optional` value, and binds `x` to the `Int` it contains, only if that +value is less than 5. Match guards are always expressions, not patterns -- for +example, a pattern like `Some(x) if _ < 5` would be nonsensical. + +Match guards generally do not satisfy the mirroring requirement; for example, +`Some(x) if x < 5` is not a valid Rust expression. 
This may be a problem, +because we are likely to want match guards for Carbon, although it's not clear +if there will be important use cases for match guards inside pattern functions. +If there are, a very plausible path forward would be to allow pattern functions +to contain assertions. In the forward direction, these would behave in the +expected way, documenting and optionally enforcing preconditions on the function +call. In the reverse direction, they would function as match guards: + +``` +pattern Foo(Int: x) -> Optional(Int) { + Assert(x < 5); + return Some(x); +} +``` + +This does somewhat complicate the mapping between functions and patterns, but +the additional complexity seems quite tolerable. + +Note that some languages provide a way for type authors to specify when two +values are considered to match. For example, in Swift you can overload the `~=` +operator, which returns a boolean indicating whether the left-hand side matches +the right. Both sides of the operator are values rather than patterns, so this +mechanism is effectively syntactic sugar for match guards, or an alternative to +them. + +#### Subpattern bindings + +Some languages support binding an identifier to a subpattern, rather than only +using identifiers as placeholders. This generally requires a separate syntax. +For example, in Rust the syntax `id @ p` matches the same values as the pattern +`p`, but also binds the name `id` to the matched value. The mirroring +requirement would imply that in an expression, `id @ p` evaluates to the value +of `id`, and also asserts that `id` matches `p`. These semantics seem +conceptually straightforward, but awkwardly non-orthogonal and not very useful, +so if Carbon has this syntax at all, we could probably safely limit it to +patterns and pattern functions. + +It's interesting to note that we can interpret `p` as a pattern even in the +forward direction (because the primary behavior is supplied by `id`), so we can +probably allow it to contain wildcards.
However, it's not clear whether that +will be useful in practice. + +#### Pattern-specific sugar syntaxes + +In Swift, if `x` is a pattern that matches values of type `T`, `x?` is a pattern +that matches values of type `Optional<T>` that contain a value matching `x`. +However, if `x` is an expression of type `T`, `x?` will generally not be a valid +expression, so such patterns don't satisfy the mirroring requirement. That said, +`x?` is purely syntactic sugar for `.some(x)`, which does satisfy the mirroring +requirement. I would tentatively recommend that we avoid introducing +pattern-specific sugar syntaxes for simplicity, but even if we do, by definition +it will always be possible to avoid using them in pattern functions. + +#### Avoidable inconsistencies + +Swift has an expression syntax `x as T`, which casts the value `x` to static +type `T`. Swift also has a pattern syntax `x as T`, which matches a +dynamically-typed value if its dynamic type is `T`, and its value matches `x`. +Although syntactically identical, these syntaxes don't mirror each other: in the +pattern form, `x` has static type `T`, whereas the expression form is useful +precisely when `x` does _not_ have static type `T`. For example, the pattern +`0 as Int` does not match the expression `0 as Int`, but does match the +expression `0 as Any`. However, this inconsistency doesn't seem necessary, +especially given that in Carbon, the variable binding syntax includes a +mandatory type, so the Carbon counterpart to Swift's `x as T` will presumably be +something like `T: x` or `T: x as Any`, depending on whether converting `T` to +`Any` requires an explicit cast, and that satisfies the mirroring requirement. + ### Alternatives considered -The substitution principle implies that pattern matching can be thought of as -the inverse of expression evaluation: given the purported value of an -expression, we are trying to work backwards to determine the values for -variables in the expression.
This proposal allows users to define custom -patterns in terms of expressions, by restricting their definitions to a narrow -subset of the language that we know how to invert automatically. As an -alternative, we could allow developers to define custom patterns by directly -specifying the code that should be executed during pattern matching. This -obliges them to determine the correct inverse computation themselves, but -permits them to express it in terms of the full Carbon language. - -However, this approach creates the risk that user-defined patterns will not -satisfy the substitution principle, either due to bugs or because developers -want to use pattern matching syntax to express logic that has no counterpart in -expression evaluation. Furthermore, it requires the developer to write code for -both the forward and reverse computations, unless we choose to embrace the -possibility of syntaxes that can only occur in patterns, and so don't satisfy -the substitution principle. - -Furthermore, as discussed above, Carbon pattern matching needs to be able to not +#### Explicit inverses + +Instead of restricting pattern functions to a narrow sublanguage that's usable +in both patterns and expressions, we could allow developers to explicitly +specify the code that should be executed during pattern matching, as well as the +code that should be executed during expression evaluation. This obliges them to +determine the correct inverse computation themselves, but permits them to +express it in terms of the full Carbon language. + +However, as discussed earlier, Carbon pattern matching needs to be able to not only determine whether a pattern matches, but also rank patterns, and determine whether a set of patterns is exhaustive and/or ambiguous. 
Defining a custom pattern in terms of arbitrary forward and reverse functions doesn't give the @@ -406,6 +651,31 @@ compiler enough information to do those things, so the developer would need to supply additional information in some way, and it's not at all clear how that would work. +Furthermore, this approach opens the door to user-defined pattern syntaxes that +do not correspond to an expression syntax (as with Swift's `?`), or worse, +user-defined syntaxes that can be used in both expressions and patterns, but +have different meanings in the two contexts (as with Swift's `x as T`). Of +course, that may not be a problem -- I find Swift's choices surprising, but I +don't have any evidence that they've caused confusion for users. However, Carbon +will be breaking new ground by allowing user code to define pattern-matching +abstractions, so it seems advisable to impose fairly strict semantic constraints +on them, at least initially. + +#### Make alternatives be pattern functions implicitly + +Every function in an `alternatives` block must be a pattern function, and this +proposal has no need for pattern functions outside of `alternatives` blocks. +That being the case, we could eliminate the `pattern` keyword, and instead +specify that functions within an `alternatives` block have special semantics. + +I have opted not to do so here because I think it's somewhat clearer to mark +individual functions, and because the ability to define a pattern matching +shorthand seems likely to be useful outside of the context of sum types. For +example, proposal [0087](https://github.com/carbon-language/carbon-lang/pull/87) +suggests introducing an `NTuple` function as a built-in primitive in order to +support variadics, but given some plausible extensions, it could probably be +implemented as an ordinary pattern function in library code. 
+ ## Shareable storage This approach to sum types imposes relatively few requirements on the language @@ -425,17 +695,17 @@ safe to copy, move, assign to, or destroy shareable storage unless it is known not to be inhabited by an object. This means the compiler will not be able to generate safe default -implementations for any special member functions of types that include shareable -storage. In order for user code to define those functions manually, the design -for shared storage will need to include operations for creating and destroying -objects on top of it, analogous to placement-`new` and pseudo-destructor calls -in C++. +implementations for any special member functions of types that have shareable +storage members. In order for user code to define those functions manually, the +design for shared storage will need to include operations for creating and +destroying objects within it, analogous to placement-`new` and pseudo-destructor +calls in C++. ### Initialization The most challenging requirement that this approach imposes on the design of -shared storage is that it must be possible to initialize it from an instance of -any of the values that it can represent, and to do so inside a pattern or +shareable storage is that it must be possible to initialize it from an instance +of any of the values that it can represent, and to do so inside a pattern or pattern function. This proposal represents that initialization using a `ToStorage` function call, @@ -495,14 +765,199 @@ noting that the drawbacks of this approach are much less severe if we implement shared storage using a union rather than an untyped byte array, because the implicit conversion would not need to be universal. +### Pattern matching evaluation order + +As discussed above, Carbon's pattern language must be restricted to operations +that the compiler can automatically invert. 
The inverse of creating an object of +type `T` in shareable storage is reading an object of type `T` out of shareable +storage. However, since shareable storage will not track the types or offsets of +the objects it contains, this inverse operation is safe only if the shared +storage is known to contain an object of type `T` at that offset. + +Consequently, pattern matching evaluation must occur in an order that guarantees +that a suitable object is present before it is loaded from shareable storage. +Correspondingly, the author of the code must structure the pattern match in such +a way that the compiler can find an appropriate evaluation order. + +At least for our motivating use cases, this appears to be intuitively +straightforward. Consider the `Result` example we saw earlier: + +``` +struct Result(Type:$$ T, Type:$$ Error) { + var Int: discriminator; + var Array(Storage, Max(Sizeof(T), Sizeof(E))): storage; + + closed alternatives { + pattern Success(T: value) -> Self { + return (.discriminator = 0, .storage = ToStorage(value)); + } + + pattern Failure(Error: error) -> Self { + return (.discriminator = 1, .storage = ToStorage(error)); + } + + var Self:$$ Cancelled = (.discriminator = 2); + } +} + +... + +match (ParseAsInt(s)) { + case .Success(var Int: value) => { + return value; + } + case .Failure(var String: error) => { + Display(error); + } + case .Cancelled => { + Terminate(); + } +} +``` + +When matching `ParseAsInt(s)` with the `Success` pattern, the compiler can +observe that no other alternative sets `.discriminator` to `0`, so it can safely +load a `T` value out of storage once it has successfully matched +`.discriminator` with `0`. 
+ +The `OptionalPtr` example is somewhat more subtle: + +``` +struct OptionalPtr(Type:$$ T) { + var Array(Storage, Sizeof(Ptr(T))): storage; + + alternatives { + pattern Value(Ptr(T): ptr) -> Self { return (.storage = ToStorage(ptr)); } + var Self:$$ Null = (.storage = 0); + } +} +``` + +In this case, the compiler can observe that `.storage` must hold either a `T` +value or an all-zeros bit pattern, so it can safely attempt to match the +`.Value` case only after it has ruled out the `.Null` case, which it can +evaluate unconditionally because that case only requires loading raw bytes, +which is always safe. Consequently, we may want to require that a pattern match +that looks for `.Value` must also have an explicit `.Null` case (rather than a +generic `default` case), so as to make it clear to the reader that the `.Null` +case is being evaluated. If Carbon specifies that pattern matching evaluates +cases in order, we would presumably also require that the `.Null` case is above +the `.Value` case. + +We will need to formalize the intuition behind those examples, in the form of +concrete language rules that are strict enough for the compiler to feasibly +perform that sort of symbolic reasoning, and yet permissive enough that type +authors can feasibly understand and follow them, and the compiler can produce +intelligible error messages when they are violated. Furthermore, it would be +strongly preferable if the compiler could detect any errors purely by inspecting +the type definition, rather than delaying them until the type is actually used +in pattern matching. + +The design of those rules is deferred to a subsequent proposal. However, one key +requirement is already clear: the compiler must be able to identify all of the +type's alternatives, and know that they exhaustively describe all possible +values of the type. This is a key reason for introducing the `alternatives` +block. 
For similar reasons, we may also need to introduce some syntax for +marking a single pattern function as being exhaustive (`MakeUnique`, for +example), but such use cases are out of scope for this document because they +don't involve sum types. + +## The `alternatives` block + +An `alternatives` block designates a set of pattern functions and static +constants as representing an exhaustive and unambiguous set of alternatives for +the type. By defining an `alternatives` block, the type author is guaranteeing +that every possible value of the type matches exactly one of the alternatives. +Consequently, the design of this feature will need to determine how Carbon +handles violations of this guarantee without compromising Carbon's safety goals. + +Although alternatives are always exhaustive as far as the language semantics are +concerned, by default user code will be required to treat them as +non-exhaustive. For example, a `match` statement that has cases for the +alternatives of a sum type will be required to have a `default` case, even if it +also has patterns that match all of the declared alternatives. This ensures that +sum types can be extended with new alternatives without breaking any existing +code. Declaring the alternatives with `closed alternatives` rather than +`alternatives` allows client code to treat the type as exhaustive. + +### Alternatives considered + +#### More concise syntax + +We could introduce a special syntax for declaring alternatives, rather than +using the existing syntaxes for pattern functions and static variables. Such a +syntax could be substantially more concise, because it could omit the components +of a function/variable declaration that are just boilerplate in the context of a +set of alternatives. For example, the alternatives of the `Result(T, Error)` +struct might look like this: + +``` +closed alternatives { + Success(T: value) { ... } + Failure(Error: error) { ...
} + Cancelled = (.discriminator = 2); +} +``` + +The same simplifications could be applied to `choice` types, so that for example +the `choice` version of `Result` could look like this: + +``` +choice Result(Type:$$ T, Type:$$ Error) { + Success(T); + Failure(Error); + Cancelled; +} +``` + +However, this brevity would come at the cost of consistency: there would now be +two structurally different syntaxes for declaring functions, and two +structurally different syntaxes for declaring static constants. + +#### Marking alternatives individually + +Rather than making `alternatives` a separate block, we could define a syntax for +marking alternatives individually. If we choose not to support pattern functions +that aren't alternatives, this syntax could be an introducer that takes the +place of `pattern`. This would also synergize well with the option of having a +more concise syntax for alternatives, which otherwise may have problems with the +lack of an introducer. + +The primary drawback of this approach is that the lack of explicit grouping +could make it more difficult for readers (and perhaps even the compiler) to know +when all the alternatives have been enumerated. It would also mean that we can't +support the admittedly unlikely use case of defining a sum type that has +multiple complete sets of alternatives. + +#### Different syntax for `closed` + +The `closed` syntax should be considered little more than a placeholder. It's +somewhat unconventional for a modifier to come before an introducer, as with +`closed alternatives`. Reversing the order would fix that problem, but +`alternatives closed` reads quite awkwardly as English. Making `closed` a +keyword would prevent developers from using `closed` as an identifier, which may +be too high a cost for such a niche use case. We could fix that by making it an +attribute rather than a keyword, but it's not clear that Carbon will have +attributes, much less what the syntax would be. 
+ +It may be surprising that there is no corresponding `open alternatives` syntax, +but `open` would be meaningless syntactic noise unless we made it mandatory, and +making it mandatory would be poor ergonomics, because developers would be forced +to make an up-front decision between the two, rather than relying on a safe +default. Furthermore, reserving `open` as a keyword seems even more problematic +than reserving `closed`. + ## `choice` -A choice type definition has the same general form as a struct definition, but a -choice type can only contain declarations of pattern functions and compile-time -constants (in other words, variables declared with `$$`). The definitions of -those members will be provided by the compiler; they cannot be defined in user -code. The return type of the pattern functions, and the type of the constants, -must be the choice type being declared. +A choice type definition has the same general form as an `alternatives` block, +except that: + +- It has `choice` and the type name in place of `alternatives`. +- It need not (and usually won't) be inside a `struct` definition, because it + defines a new type, rather than specifying part of an enclosing type + definition. +- Its members cannot have definitions, because those definitions will be + provided by the compiler. The pattern functions of a choice type can be overloaded (as in the `Variant` example above), but they cannot be templates. More precisely, the parameter @@ -525,25 +980,9 @@ place of the member types. However, there are a couple of special cases: A future proposal for this mechanism will need to consider whether to require an explicit opt-in to generate these operations. 
-The compiler-generated definitions of a choice type's members are unspecified, -except that they will satisfy the following two properties: - -- The alternatives are _mutually exclusive_: an enumerator can only compare - equal to copies of itself, and the result of calling a factory function can - only compare equal to the result of calling the same factory function with - the same arguments. -- The alternatives are _exhaustive_: every possible value of the type is equal - to one of the enumerators, or to the result of invoking one of the factory - functions with some set of arguments. - -Although choice types are always exhaustive as far as the language semantics are -concerned, by default user code will be required to treat them as -non-exhaustive. For example, a `match` statement that operates on a value of a -choice type will be required to have a `default` case, even if it also has -patterns that match all of the declared alternatives. This ensures that choice -types can be extended with new alternatives without breaking any existing code. -Declaring a type with `closed choice` rather than `choice` allows client code to -treat the type as exhaustive. +The compiler-generated definitions of a choice type's alternatives are +unspecified, except that they will satisfy the semantic requirements that apply +to all alternatives. ### Alternatives considered @@ -563,10 +1002,6 @@ enumerated types, similar to C++'s `enum`. In particular: but that requires a substantial amount of error-prone boilerplate. Furthermore, those functions can't reliably be no-ops at the hardware level, the way they can be with C++ enums. -- It would allow us to treat choice types as purely syntactic sugar for struct - definitions. We can't quite do that at present because the desugared form - would itself contain a `choice` type, as with the long version of `Result` - above. 
I am omitting that from this proposal for simplicity, since it's purely additive, and not necessary for the goals of this proposal. @@ -574,29 +1009,10 @@ additive, and not necessary for the goals of this proposal. #### Different spelling for `choice` The Rust and Swift counterparts of `choice` are spelled `enum`. I have avoided -this because these types are not really "enumerated types" in the sense that all -values are explicitly enumerated in the code, except in the special case where -there are no factory functions. I chose the spelling `choice` because "choice -type" is one of the only available synonyms for "sum type" that doesn't have any -potentially-misleading associations. - -#### Different syntax for `closed` - -The `closed` syntax should be considered little more than a placeholder. It's -somewhat unconventional for a modifier to come before an introducer, as with -`closed choice`. Reversing the order would fix that problem, but `choice closed` -reads quite awkwardly as English. Making `closed` a keyword would prevent -developers from using `closed` as an identifier, which may be too high a cost -for such a niche use case. We could fix that by making it an attribute rather -than a keyword, but it's not clear that Carbon will have attributes, much less -what the syntax would be. - -It may be surprising that there is no corresponding `open choice` syntax, but -`open` would be meaningless syntactic noise unless we made it mandatory, and -making it mandatory would be poor ergonomics, because developers would be forced -to make an up-front decision between the two, rather than relying on a safe -default. Furthermore, reserving `open` as a keyword seems even more problematic -than reserving `closed`. +this because these types are not really "enumerated types" in the sense of all +values being explicitly enumerated in the code. 
I chose the spelling `choice` +because "choice type" is one of the only available synonyms for "sum type" that +doesn't have any potentially-misleading associations. #### Allowing templated pattern functions @@ -618,63 +1034,44 @@ performance goals, and would have little offsetting benefit: these sorts of types appear to be rare, and when needed they should be implemented in library code, where the performance tradeoffs are explicit and under programmer control. -> TODO: we should ensure that such a library type can support pattern matching. -> The primary challenge is that the factory function will need to allocate heap -> memory, so we will need to ensure that pattern functions are permitted do so. -> Resolution of this issue must await a design for heap allocation. - -#### Make `choice` an element of a `struct` +#### Extend `alternatives` instead -Instead of introducing `choice` as a new kind of type declaration, we could -instead treat it as an optional component of a `struct` declaration. For -example, with this approach `Result(T, Error)` would be defined as: +Instead of introducing `choice`, we could extend `alternatives` with a syntax +for requesting that the definitions be synthesized rather than provided by the +user. For example, perhaps the abbreviated definition of `Result(T, Error)` +could be written as: ``` struct Result(Type:$$ T, Type:$$ Error) { - choice { + alternatives { pattern Success(T: value) -> Self; pattern Failure(Error: error) -> Self; var Self:$$ Cancelled; - } + } = default; } ``` This approach is somewhat more flexible, because it would permit the type owner -to give `Result` additional member functions by placing them outside the -`choice` block. However, it still wouldn't be possible to give `Result` -additional data members, because the alternatives would have no way to -initialize them, so structs containing a `choice` block would be starkly -different from ones that don't. 
This approach would also be more verbose, -especially vertically, which will be especially noticeable since choice types -will probably be quite small in the common case. - -A variant of this approach would be to allow `choice` declarations to be either -named, in which case they behave as currently proposed, or anonymous, in which -case they must appear inside a struct, and behave as described here. This would -address the increased verbosity by limiting it to the cases where there's an -offsetting benefit. However, it would make the language more ambiguous, since a -`choice` declaration would have a substantially different meaning depending on -whether it's followed by an identifier. - -#### More concise syntax for alternatives - -We could introduce a special syntax for declaring alternatives, rather than -using the existing syntaxes for pattern functions and static variables. Such a -syntax could be substantially more concise, because it could omit the components -that are just boilerplate in the context of a choice type. For example, the -definition of `Result(T, Error)` might look like this: - -``` -choice Result(Type:$$ T, Type:$$ Error) { - alt Success(T); - alt Failure(Error); - alt Cancelled; -} -``` - -However, this brevity would come at the cost of consistency: there would now be -two structurally different syntaxes for declaring functions, and two -structurally different syntaxes for declaring static constants. +to give `Result` additional member functions without forcing them to supply all +the boilerplate associated with the fully handwritten type definition. However, +it still wouldn't be possible to give `Result` additional data members, because +the generated code for the alternatives would have no way to initialize them, so +structs containing an `alternatives` block would be starkly different from ones +that don't. 
On a related note, this syntax would be somewhat misleading, because +the compiler would be synthesizing not only the definitions of the alternatives, +but also the special member functions and data members. + +This approach would also be more verbose, especially vertically, which will be +especially noticeable since these types will probably be quite small in the +common case. + +Alternatively, we could unify the two keywords by eliminating `alternatives` and +using `choice` in its place, with the presence of a name acting to distinguish a +`choice` block from a `choice` type. However, it seems liable to be confusing +for the presence or absence of a name to trigger such stark differences in +semantics. Note also that unlike the previous option, this approach doesn't +allow users to extend the type with additional methods without losing compiler +generation of the alternatives and special member functions. ## Alternatives considered @@ -718,3 +1115,6 @@ Use cases that work best with type-indexing appear to be quite rare, just as use cases for tuples appear to be quite rare compared to use cases for structs. Consequently, if Carbon has only one form of sum types, it should probably be the name-indexed form, as proposed here. + +> TODO: We should consider ways of minimizing or avoiding the burden of +> boilerplate factory names like `Value` for type-indexed use cases. From 2431758fc0fff306f9e1a7a540243ec01bcc0096 Mon Sep 17 00:00:00 2001 From: Geoffrey Romer Date: Fri, 30 Oct 2020 13:03:48 -0700 Subject: [PATCH 08/28] Respond to reviewer feedback. --- proposals/p0157.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/proposals/p0157.md b/proposals/p0157.md index 6bad2f6759741..54d63a68058fc 100644 --- a/proposals/p0157.md +++ b/proposals/p0157.md @@ -260,6 +260,9 @@ follows: ``` struct Result(Type:$$ T, Type:$$ Error) { var Int: discriminator; + + // This would also need to be properly aligned for `T` and `E`, but + // we don't have syntax for that yet. 
var Array(Storage, Max(Sizeof(T), Sizeof(E))): storage; closed alternatives { From 4cdb352c45cebd6f715c4877c9ff0abad0c09f4e Mon Sep 17 00:00:00 2001 From: Geoff Romer Date: Mon, 2 Nov 2020 15:59:00 -0800 Subject: [PATCH 09/28] Apply suggestions from code review Co-authored-by: josh11b --- proposals/p0157.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/proposals/p0157.md b/proposals/p0157.md index 54d63a68058fc..6a880ed4810e0 100644 --- a/proposals/p0157.md +++ b/proposals/p0157.md @@ -120,7 +120,7 @@ struct Result(Type:$$ T, Type:$$ Error) { var E: error; fn Success(T: value) -> Result(T, Error) { - return (.discriminator = 0, .value = value, .error = E()); + return (.discriminator = 0, .value = value, .error = Error()); } fn Failure(Error: error) -> Result(T, Error) { @@ -128,7 +128,7 @@ struct Result(Type:$$ T, Type:$$ Error) { } var Result(T, Error):$$ Cancelled = - (.discriminator = 2, .value = T(), .error = E()); + (.discriminator = 2, .value = T(), .error = Error()); } ``` @@ -263,7 +263,7 @@ struct Result(Type:$$ T, Type:$$ Error) { // This would also need to be properly aligned for `T` and `E`, but // we don't have syntax for that yet. - var Array(Storage, Max(Sizeof(T), Sizeof(E))): storage; + var Array(Storage, Max(Sizeof(T), Sizeof(Error))): storage; closed alternatives { pattern Success(T: value) -> Self { From b00255ad9d939734c0617587488c513c44c840ce Mon Sep 17 00:00:00 2001 From: Geoffrey Romer Date: Mon, 2 Nov 2020 17:12:55 -0800 Subject: [PATCH 10/28] Respond to reviewer comments. 
--- proposals/p0157.md | 105 +++++++++++++++++++++++---------------------- 1 file changed, 53 insertions(+), 52 deletions(-) diff --git a/proposals/p0157.md b/proposals/p0157.md index 6a880ed4810e0..28a2f477510df 100644 --- a/proposals/p0157.md +++ b/proposals/p0157.md @@ -40,7 +40,6 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception - [Alternatives considered](#alternatives-considered-3) - [Separate support for enumerated types](#separate-support-for-enumerated-types) - [Different spelling for `choice`](#different-spelling-for-choice) - - [Allowing templated pattern functions](#allowing-templated-pattern-functions) - [Extend `alternatives` instead](#extend-alternatives-instead) - [Alternatives considered](#alternatives-considered-4) - [Indexing by type](#indexing-by-type) @@ -432,42 +431,38 @@ this issue. The second example is memory allocation. We will inevitably want pattern functions to be able to create, and match, objects that store data on the heap. -In principle this seems feasible: for example, Carbon might have a -`UniquePtr(T)` type like C++'s `std::unique_ptr`, and a `MakeUnique` -operation like C++'s `std::make_unique`, which takes a type and a list of -arguments, allocates an instance of that type, constructed with those arguments, -and returns a `UniquePtr` to that object. We could then use that function in -pattern functions. For example, this is a pattern function that creates/matches -objects of type `IndirectInt`, a hypothetical integer-like type that stores its -data on the heap: +In principle this seems feasible: for example, Carbon might have a `Box(T)` type +that represents a heap-allocated `T` object, and a `MakeBox` function for +creating a `Box(T)` from a `T` value. We could then use `MakeBox` in pattern +functions. 
For example, this is a pattern function that creates/matches objects +of type `IndirectInt`, a hypothetical integer-like type that stores its data on +the heap: ``` struct IndirectInt { - var UniquePtr(Int): i_ptr; + var Box(Int): i_ptr; } pattern IndirectIntOf(Int: i) -> IndirectInt { - return (.i_ptr = MakeUnique(Int, i)); + return (.i_ptr = MakeBox(Int, i)); } ``` -Note that in order for this to work in pattern matching, we would have to ensure -that two `UniquePtr(T)` objects are considered to be equal to each other when -the underlying `T` objects are equal, rather than when they have equal -addresses. This may be initially surprising for C++ programmers, but it's -internally consistent, and seems desirable for other reasons as well. For -example, it will probably make compiler-generated equality comparisons much more -reliable: when an object has a `std::unique_ptr` member, the actual memory -address it represents is virtually never a meaningful part of the object's -state, but the underlying `T` object often is. - -The primary challenge here is that `MakeUnique` has to both read and mutate some -state that's not part of its arguments, namely the state of the memory -allocator. It's not clear that we can allow pattern functions to have side -effects, or to depend on side inputs, because those operations are likely to be -very difficult to invert. In this case that mutable state has no effect on the -actual value, at least as far as equality comparison is concerned, but it's not -clear whether or how we can take advantage of that. +For this to work, two `Box(T)` values must be considered equal when the `T` +values they hold compare equal, rather than when they are literally the same `T` +object. This is important not only for pattern matching, but also for situations +like automatically generating a struct's comparison operators. 
Note that other +than these "deep" equality semantics, `Box(T)` would be very similar to C++'s +`std::unique_ptr`, and would probably play a comparable role in the Carbon +ecosystem. + +The problem here is that `MakeBox` has to both read and mutate some state that's +not part of its arguments, namely the state of the memory allocator. It's not +clear that we can allow pattern functions to have side effects, or to depend on +side inputs, because those operations are likely to be very difficult to invert. +In this case that mutable state has no effect on the actual value, at least as +far as equality comparison is concerned, but it's not clear whether or how we +can take advantage of that. In both cases, it seems very possible that we could extend the pattern language with new primitives that encapsulate the non-invertible part of these @@ -962,10 +957,36 @@ except that: - Its members cannot have definitions, because those definitions will be provided by the compiler. -The pattern functions of a choice type can be overloaded (as in the `Variant` -example above), but they cannot be templates. More precisely, the parameter -types of a pattern function in a choice type can't depend on any of the -arguments. +The expected implementation of a `choice` type will be very similar to the +`struct` version of `Result` shown earlier, with a discriminator field and a +storage buffer large enough to hold the argument values of the alternatives. +This implementation will be hidden, of course, and the compiler may be able to +generate better code, but we will design this feature to support at least that +baseline implementation strategy. + +One consequence is that although the pattern functions of a choice type can be +overloaded (as in the `Variant` example above), they cannot be templates. More +precisely, the parameter types of a pattern function must be fixed without +knowing the values of any of the arguments. 
To see why, consider a choice type +like the following, which attempts to emulate `std::any`: + +``` +choice Any { + pattern Value[Type:$$ T](T value); +} +``` + +The problem is that since `T` could be any type, and a single `Any` object could +hold values of different types throughout its lifetime, `Any` can't be +implemented using a storage buffer within the `Any` object. Instead, the storage +buffer for the `T` object would have to be allocated on the heap, but then the +compiler would need to decide whether to apply a small buffer optimization, and +if so what size threshold to use, etc. Allowing choice types to be implemented +in terms of heap allocation would make their performance far less predictable, +contrary to Carbon's performance goals, and would have little offsetting +benefit: these sorts of types appear to be rare, and when needed they should be +implemented in library code, where the performance tradeoffs are explicit and +under programmer control. Carbon will probably have some mechanism for allowing a struct to have compiler-generated default implementations of operations such as copy, move, @@ -1017,26 +1038,6 @@ values being explicitly enumerated in the code. I chose the spelling `choice` because "choice type" is one of the only available synonyms for "sum type" that doesn't have any potentially-misleading associations. -#### Allowing templated pattern functions - -We could remove the restriction that pattern functions can't be templated. This -would allow defining something like `std::any` as a choice type: - -``` -choice Any { - pattern Value[Type:$$ T](T value); -} -``` - -The problem is that there's no bound on the amount of storage that an instance -of this type would require, so the compiler-generated code would have to -allocate storage on the heap, and decide whether to apply a small-buffer -optimization, and if so what size threshold to use. 
This would make the -performance of choice types far less predictable, contrary to Carbon's -performance goals, and would have little offsetting benefit: these sorts of -types appear to be rare, and when needed they should be implemented in library -code, where the performance tradeoffs are explicit and under programmer control. - #### Extend `alternatives` instead Instead of introducing `choice`, we could extend `alternatives` with a syntax From 47d627e4bce12b60bb434d00ddea90ece4631f02 Mon Sep 17 00:00:00 2001 From: Geoffrey Romer Date: Thu, 12 Nov 2020 12:50:36 -0800 Subject: [PATCH 11/28] Switch `Storage` to represent whole buffers, not individual bytes. Also, explicitly discuss some issues that the OptionalPtr example was fudging, and mention the issue of `choice` alternative parameters having incomplete types. --- proposals/p0157.md | 136 ++++++++++++++++++++++++++++++++------------- 1 file changed, 98 insertions(+), 38 deletions(-) diff --git a/proposals/p0157.md b/proposals/p0157.md index 28a2f477510df..84a88d442062f 100644 --- a/proposals/p0157.md +++ b/proposals/p0157.md @@ -24,6 +24,7 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception - [Subpattern bindings](#subpattern-bindings) - [Pattern-specific sugar syntaxes](#pattern-specific-sugar-syntaxes) - [Avoidable inconsistencies](#avoidable-inconsistencies) + - [Worked example: `ArrayOfZeroes`](#worked-example-arrayofzeroes) - [Alternatives considered](#alternatives-considered) - [Explicit inverses](#explicit-inverses) - [Make alternatives be pattern functions implicitly](#make-alternatives-be-pattern-functions-implicitly) @@ -235,11 +236,9 @@ let us define reusable pattern syntaxes that can do things like encapsulate hidden implementation details of the object they're matching. 
To support manual lifetime control and storage sharing, I propose introducing a -new fundamental type `Storage`, which represents a byte of untyped memory that -may or may not be part of the underlying storage of some object, as well as -`create` and `destroy` operations for creating and destroying objects within a -span of `Storage`, and a `ToStorage` operation for obtaining the object -representation of a value. +`Storage` type, which represents a fixed-size region of untyped memory and +provides operations for creating and destroying objects within it, as well as a +`ToStorage` operation for obtaining the object representation of a value. To allow types to specify an exhaustive set of patterns, as well as to solve certain problems with accessing untyped memory in a pattern function, I propose @@ -260,9 +259,8 @@ follows: struct Result(Type:$$ T, Type:$$ Error) { var Int: discriminator; - // This would also need to be properly aligned for `T` and `E`, but - // we don't have syntax for that yet. - var Array(Storage, Max(Sizeof(T), Sizeof(Error))): storage; + var Storage(Max(Sizeof(T), Sizeof(Error)), + Max(Alignof(T), Alignof(Error))): storage; closed alternatives { pattern Success(T: value) -> Self { @@ -340,15 +338,19 @@ requiring additional storage for a discriminator: ``` struct OptionalPtr(Type:$$ T) { - var Array(Storage, Sizeof(Ptr(T))): storage; + var Storage(Sizeof(Ptr(T)), Alignof(Ptr(T))): storage; alternatives { pattern Value(Ptr(T): ptr) -> Self { return (.storage = ToStorage(ptr)); } - var Self:$$ Null = (.storage = 0); + var Self:$$ Null = (.storage = ArrayOfZeros(Sizeof(Ptr(T)))); } } ``` +Note that `ArrayOfZeros` can be defined as a pattern function, but only with +some possibly surprising extensions to Carbon's pattern language. See below for +details. + ## Pattern matching This proposal presupposes the following requirements for pattern matching, which @@ -381,7 +383,9 @@ of pattern matching.
Pattern functions have the same declaration syntax as ordinary functions, except that they are introduced with `pattern` rather than `fn`. However, the function body is required to be a single `return` statement whose operand expression is a -valid pattern, with the function parameters acting as variable bindings. +valid pattern, with each use of a function parameter acting as a variable +binding. One consequence of that constraint is that a given function parameter +cannot be accessed more than once when evaluating the pattern function. It seems likely that pattern functions will be required to be inline, because they will probably need to generate very different code depending on the @@ -630,6 +634,56 @@ mandatory type, so the Carbon counterpart to Swift's `x as T` will presumably be something like `T: x` or `T: x as Any`, depending on whether converting `T` to `Any` requires an explicit cast, and that satisfies the mirroring requirement. +### Worked example: `ArrayOfZeroes` + +The `OptionalPtr` example above relies on `ArrayOfZeroes`, a pattern function +that takes an integer `N` and returns an `Array(Byte, N)` whose elements are all +zero. It is possible to implement such a pattern function, but it requires some +extensions to Carbon's pattern language that may be surprising: + +``` +pattern ArrayOfZeroes(UInt:$$ N) -> Array(Byte, N) { + return Array(Byte, N)(ListOfZeroes(N)); +} + +pattern ListOfZeroes(UInt:$$ N) -> List(Byte) { + return match (N) { + case 0 => { .Nil } + case (UInt: M)+1 => { .Cons(Byte(0), ListOfZeroes(M)); } + }; +} +``` + +This design requires Carbon to support `match` expressions inside patterns, as +discussed above in the section on underdetermined patterns. More surprisingly, +it requires the ability to use `+1` as a pattern matching operator on unsigned +integers. This is novel in my experience, but I hope the meaning of the code is +relatively clear.
Furthermore, although it may seem ad-hoc, it's theoretically +well founded: we're effectively destructuring the unary representation of +counting numbers as `0+1+1+...+1`. In other words, we're treating `UInt` as +though it were a sum type, analogous to + +``` +closed choice UInt { + var Self:$$ Zero; + pattern Successor(UInt) -> Self; +} +``` + +We could even use a named `Successor` function in place of `+1`, if we wish to +avoid uncertainty about what kinds of arithmetic expressions can appear in +patterns. Whatever the syntax, the ability to use `+1` in patterns is a +fundamental building block for creating pattern functions that involve counting. + +This implementation also presupposes a type (called `List(Byte)` here) that can +be used to initialize an `Array`, and that supports pattern matching on the +empty list (spelled `.Nil` here) and on destructuring the first element from a +list (spelled `.Cons(first, rest)` here). These seem like reasonable +requirements for whatever type Carbon uses to represent array literals. If +Carbon uses tuples for that purpose, I expect an approach equivalent to the +above would still work, but I can't directly demonstrate that until we have a +design for variadics. + ### Alternatives considered #### Explicit inverses @@ -721,11 +775,11 @@ implemented as a function within the Carbon language. Recall that our motivating use cases involve invoking `ToStorage` inside pattern functions, so it needs to be a pattern function itself, assuming it's a function at all. If it were implemented as Carbon code, it would need to consist of a single expression that -initializes an `Array(Storage)` from an object, and the whole problem we're -trying to solve here is to make it possible to write such an expression. -Consequently, we may wish to give it a different syntactic form. This operation -is in many ways a type cast, so the syntactic choice here will depend heavily on -the syntax of Carbon's other casts.
+initializes a `Storage` from an object, and the whole problem we're trying to +solve here is to make it possible to write such an expression. Consequently, we +may wish to give it a different syntactic form. This operation is in many ways a +type cast, so the syntactic choice here will depend heavily on the syntax of +Carbon's other casts. At a deeper level, the problem is that the whole notion of allowing multiple objects to successively share the same storage is inherently procedural (because @@ -740,21 +794,22 @@ initialization. It is tempting to try to mitigate those problems by making the conversion implicit, so that the code looks like `.storage = success`. However, this would -mean that an array of `Storage` can be implicitly initialized from a value of -any type, and such extremely broad implicit conversions tend to be highly -problematic. At a minimum, we would probably need to introduce a separate type -`StorageArray`, rather than give special semantics to this one specialization of -`Array`. Furthermore, we will also need the ability to initialize a `Storage` -array with specific byte values, but that will interact awkwardly with the -implicit conversion. For example, suppose the syntax for initializing a -`Storage` array to all-zeros is `.storage = MakeZeroStorageArray(N)`, where the -type of `MakeZeroStorageArray(N)` is `StorageArray(N)`. The universal implicit -conversion would mean that `.storage` is a `StorageArray` object that contains -the representation of _another_ `StorageArray` object, which may have a shorter -lifetime, and in fact may need to be explicitly destroyed before the "outer" -`StorageArray` is. There are various ways to finesse this issue, but they all -involve adding additional special-case rules in order to avoid or mitigate this -consequence of the general rules. +mean that `Storage` can be implicitly initialized from a value of any type, and +such extremely broad implicit conversions tend to be highly problematic. 
For +example, consider the definition of `.Null` in the `OptionalPtr` example: + +``` +var Self:$$ Null = (.storage = ArrayOfZeros(Sizeof(Ptr(T)))); +``` + +If `Storage` were implicitly convertible from any type, and that conversion +caused an object to be created within the `Storage`, this code would no longer +just set the bytes of `.storage` to zero: it would also actually create an +`Array(Byte)` object within `.storage`, which would then need to be subsequently +destroyed before `.storage` is destroyed, or used to store a pointer value. +There are various ways to finesse this issue, but they all involve adding +additional special-case rules in order to avoid or mitigate this consequence of +the general rules. We don't propose this approach because it doesn't really address the problems with `ToStorage`; it merely obscures those problems, while introducing a new @@ -783,7 +838,9 @@ straightforward. Consider the `Result` example we saw earlier: ``` struct Result(Type:$$ T, Type:$$ Error) { var Int: discriminator; - var Array(Storage, Max(Sizeof(T), Sizeof(E))): storage; + + var Storage(Max(Sizeof(T), Sizeof(Error)), + Max(Alignof(T), Alignof(Error))): storage; closed alternatives { pattern Success(T: value) -> Self { @@ -822,11 +879,11 @@ The `OptionalPtr` example is somewhat more subtle: ``` struct OptionalPtr(Type:$$ T) { - var Array(Storage, Sizeof(Ptr(T))): storage; + var Storage(Sizeof(Ptr(T)), Alignof(Ptr(T))): storage; alternatives { pattern Value(Ptr(T): ptr) -> Self { return (.storage = ToStorage(ptr)); } - var Self:$$ Null = (.storage = 0); + var Self:$$ Null = (.storage = ArrayOfZeros(Sizeof(Ptr(T)))); } } ``` @@ -959,10 +1016,13 @@ except that: The expected implementation of a `choice` type will be very similar to the `struct` version of `Result` shown earlier, with a discriminator field and a -storage buffer large enough to hold the argument values of the alternatives. 
-This implementation will be hidden, of course, and the compiler may be able to -generate better code, but we will design this feature to support at least that -baseline implementation strategy. +storage buffer large enough to hold the argument values of the alternatives. Any +alternative parameter types that are incomplete (or have unknown size for any +other reason) will be represented using owning pointers; among other things, +this will allow users to define recursive choice types. The implementation will +be hidden, of course, and the compiler may be able to generate better code, but +we will design this feature to support at least that baseline implementation +strategy. One consequence is that although the pattern functions of a choice type can be overloaded (as in the `Variant` example above), they cannot be templates. More From 60d00d1e9e3be1f4d9f0e902a4207335641af5e3 Mon Sep 17 00:00:00 2001 From: Geoffrey Romer Date: Thu, 12 Nov 2020 13:59:30 -0800 Subject: [PATCH 12/28] Explain how special members can be generated automatically. --- proposals/p0157.md | 20 +++++++++++--------- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/proposals/p0157.md b/proposals/p0157.md index 84a88d442062f..102029a586015 100644 --- a/proposals/p0157.md +++ b/proposals/p0157.md @@ -273,9 +273,6 @@ struct Result(Type:$$ T, Type:$$ Error) { var Self:$$ Cancelled = (.discriminator = 2); } - - // Copy, move, assign, destroy, and similar operations need to be defined - // explicitly, but are omitted for brevity. } ``` @@ -746,12 +743,17 @@ developer explicit control of the object representation. Consequently, it is not safe to copy, move, assign to, or destroy shareable storage unless it is known not to be inhabited by an object. -This means the compiler will not be able to generate safe default -implementations for any special member functions of types that have shareable -storage members. 
In order for user code to define those functions manually, the -design for shared storage will need to include operations for creating and -destroying objects within it, analogous to placement-`new` and pseudo-destructor -calls in C++. +This means that in the general case, the compiler will not be able to generate +safe default implementations for any special member functions of types that have +shareable storage members. However, it can do so if the shareable storage is +part of a sum type, because the `alternatives` block contains enough information +for it to infer what objects are present, by effectively pattern-matching on the +sum type object. + +In order for non-sum types to define those functions manually, the design for +shared storage will need to include operations for creating and destroying +objects within it, analogous to placement-`new` and pseudo-destructor calls in +C++. ### Initialization From c1c08f1779c037092049cc187d216b180b5dbe22 Mon Sep 17 00:00:00 2001 From: Geoffrey Romer Date: Thu, 12 Nov 2020 15:26:57 -0800 Subject: [PATCH 13/28] Respond to reviewer comments --- proposals/p0157.md | 21 +++++++++++++++++++-- 1 file changed, 19 insertions(+), 2 deletions(-) diff --git a/proposals/p0157.md b/proposals/p0157.md index 102029a586015..2e65311324ad7 100644 --- a/proposals/p0157.md +++ b/proposals/p0157.md @@ -934,8 +934,21 @@ non-exhaustive. For example, a `match` statement that has cases for the alternatives of a sum type will be required to have a `default` case, even if it also has patterns that match all of the declared alternatives. This ensures that sum types can be extended with new alternatives without breaking any existing -code. Declaring the alternatives with `closed alternatives` rather than -`alternatives` allows client code to treat the type as exhaustive. +code. + +However, code in the same +[library](/docs/design/code_and_name_organization/#libraries) as the choice type +is not subject to this restriction. 
Requiring a `default` has little benefit for +that code, because it can easily be updated when a new alternative is added, +without creating any version-skew concerns. Conversely, requiring a `default` +would have higher costs: code that's part of the sum type's API is more likely +to be required to explicitly handle every alternative (consider, for example, an +`Unparse` method on a sum type representing a parse tree). When that's the case, +omitting a `default` provides a build-time guarantee that every alternative has +been handled. + +Declaring the alternatives with `closed alternatives` rather than `alternatives` +allows all code to treat the alternatives as exhaustive. ### Alternatives considered @@ -1050,6 +1063,10 @@ benefit: these sorts of types appear to be rare, and when needed they should be implemented in library code, where the performance tradeoffs are explicit and under programmer control. +If may be possible to relax this restriction when and if we have a design for +supporting non-fixed-size types, although it's worth noting that even that would +not give us a way for `Any` to support assignment. + Carbon will probably have some mechanism for allowing a struct to have compiler-generated default implementations of operations such as copy, move, assignment, hashing, and equality comparison, so long as the struct's members From 0afe466ff0ea0f29d271fcf55fb2e93717d855c8 Mon Sep 17 00:00:00 2001 From: Geoffrey Romer Date: Fri, 13 Nov 2020 15:46:42 -0800 Subject: [PATCH 14/28] Draft discussion of "Pattern matching proxies" alternative. 
--- proposals/p0157.md | 201 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 201 insertions(+) diff --git a/proposals/p0157.md b/proposals/p0157.md index 2e65311324ad7..f7309ba42e0e0 100644 --- a/proposals/p0157.md +++ b/proposals/p0157.md @@ -44,6 +44,7 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception - [Extend `alternatives` instead](#extend-alternatives-instead) - [Alternatives considered](#alternatives-considered-4) - [Indexing by type](#indexing-by-type) + - [Pattern matching proxies](#pattern-matching-proxies) @@ -1201,3 +1202,203 @@ the name-indexed form, as proposed here. > TODO: We should consider ways of minimizing or avoiding the burden of > boilerplate factory names like `Value` for type-indexed use cases. + +### Pattern matching proxies + +Rather than having user-defined sum types directly participate in pattern +matching, we could allow them to specify a conversion to a type that Carbon +knows how to pattern-match against. Here's what that might look like, again +revisiting our `Result` example: + +``` +struct Result(Type:$$ T, Type:$$ Error) { + var Int: discriminator; + + var Storage(Max(Sizeof(T), Sizeof(Error)), + Max(Alignof(T), Alignof(Error))): storage; + + fn Success(T: value) -> Self { + var Self: result = (.discriminator = 0); + result.storage.Create(T, value); + return result; + } + + fn Failure(Error: error) -> Self { + var Self: result = (.discriminator = 1); + result.storage.Create(Error, error); + return result; + } + + var Self:$$ Cancelled = (.discriminator = 2); + + closed choice Choice { + Success(T), + Failure(Error), + Cancelled + } + + fn operator match(Ptr(Self): this) -> Choice { + match (this->discriminator) { + case 0 => { + return .Success(this->storage.Read(T)); + } + case 1 => { + return .Failure(this->storage.Read(Error)); + } + case 2 => { + return .Cancelled; + } + default => { + Assert(false); + } + } + } + + // Copy, move, assign, destroy, and similar operations need to be defined + // explicitly, but are
omitted for brevity. +} +``` + +In this approach, we no longer have pattern functions, or `alternatives` blocks: +factories like `Success` are entirely ordinary Carbon functions, with no special +restrictions. For example, we can now create objects inside `.storage` +procedurally, via a `Create` method. But the price is they can no longer be used +in pattern matching. + +This approach still has `choice` types, but they are a fundamental part of the +Carbon type system, not syntactic sugar for lower-level constructions. In +particular, the alternatives of a `choice` type are the only function-call-like +construct that can appear in a pattern. Their declaration syntax does not need +to match the syntax of pattern functions, since there is no such feature, so we +can make the syntax much simpler. + +The key feature of this approach is an overloadable `operator match` function. +When pattern matching is applied to an object, `operator match` is implicitly +invoked on it, and its return value is used instead for purposes of pattern +matching. This feature is not limited to sum types. For example, a struct type +with private members might define an `operator match` that returns a struct with +only public members. + +In this example, the names `Success`, `Failure`, and `Cancelled` appear twice, +once as factory functions of `Result` and once as alternatives of +`Result.Choice`, with the same arity in each case. Unlike in the primary +proposal, this correspondence is just an API design choice; there is no +language-level requirement that the alternatives correspond to the factory +functions. Consequently, there is no requirement that the factory functions are +either exhaustive or mutually exclusive. + +This approach has a number of advantages: + +- Types can use the full power of the Carbon language when expressing how they + participate in pattern matching. 
In particular, this means `Storage` doesn't + need a rich initialization API, because its state can be set procedurally. + More broadly, it's much less likely that a type will be unable to + participate in pattern matching because of limitations of the language. +- Patterns are allowed to be underconstrained. This means Carbon can more + easily include operations like `|` in its pattern language, and it means + types can make parts of the object state invisible to pattern matching (for + example, non-salient state like the capacity of a `std::vector`). +- `choice` can more easily have a syntax that's simple, and familiar from + other languages like Rust. +- The language rules are much simpler, and easier to explain, because we don't + need to specify both "forward" and "reverse" semantics for a subset of the + language, or specify the boundaries of that subset. Relatedly, it provides a + simpler correspondence between the Carbon code and the generated assembly, + which could substantially simplify things like debugging. + +However, it also has some substantial drawbacks: + +- User-defined sum types will probably require dramatically more code, both + because the special member functions cannot be generated automatically, and + because the "forward" and "reverse" directions both require explicit code. + This additional verbosity not only makes these types more tedious to write + and read, it also creates a risk of bugs in the code that would otherwise be + generated automatically. The duplication of names may also create + readability problems. For example, an inexperienced reader might + misunderstand `.Success`, `.Failure`, and `.Cancelled` in the + `operator match` body above as referring to the factory functions. + Conversely, an inexperienced programmer might omit the leading `.`, so that + they actually _do_ refer to the factory functions. +- Users will be able to extend pattern matching to support new _types_, but + not new _operations_. 
For example, it wouldn't let us define + `ArrayOfZeroes(N)` (see above) as an operator that can be used in patterns. + We could of course introduce pattern functions as a separate feature, + although it's less likely to pull its weight if it's not needed for the sum + type use case. +- We have to introduce a second, fundamentally separate kind of user-defined + type, namely `choice` types. Among other things, this would rule out the + possibility of changing the `struct` keyword to `type`, as some have + suggested. + +There's a final drawback that's worth discussing in more depth: evaluating an +`operator match` currently seems to involve copying all the data that affects +pattern matching from the original object to the one that's returned from +`operator match`, so a naive implementation of this approach would make pattern +matching much slower than under the primary proposal. It's possible that these +copies could be optimized away, but it's not clear how feasible, and how +reliable, that will be in practice. More fundamentally, reliance on the +optimizer seems contrary to Carbon's goal of making performance both predictable +and controllable by the programmer. + +Programmers can work around this problem by having the return type hold pointers +to the underlying data, rather than copies of it. For example, `Result.Choice` +in the example above could be rewritten as + +``` +closed choice Choice { + Success(Ptr(T)), + Failure(Ptr(Error)), + Cancelled +} +``` + +However, that would require changing the patterns correspondingly. 
For example, +`GetIntFromUser` would have to be rewritten to something like this, where I'm +assuming that `&foo` is a pattern operator that matches a pointer whenever `foo` +matches the pointee: + +``` +fn GetIntFromUser() -> Int { + while(True) { + var String: s = UserPrompt("Please enter a number"); + match (ParseAsInt(s)) { + case .Success(&(var Int: value)) => { + return value; + } + case .Failure(&(var String: error)) => { + Display(error); + } + case .Cancelled => { + // We didn't request cancellation, so something is very wrong. + Terminate(); + } + } + } +} +``` + +Not only does this make the client code somewhat harder to read and write, it +forces the client code to deal with the implementation details of `Result`. In +particular, this means that if a type is initially defined as a `choice` type, +there will typically be no way to redefine it as a struct with an +`operator match` without either atomically rewriting all pattern matches +involving that type, or incurring a large performance hit (which would probably +defeat the purpose of making that change in the first place). As it stands, I +think this is a fatal problem with this approach, because it's so directly +contrary to Carbon's software-evolvability goals. + +It's worth noting that this would probably be a non-issue if Carbon supported +C++-like references. For example, supposing `Ref(T)` were the syntax for +reference types, we could define `Result.Choice` like so: + +``` +closed choice Choice { + Success(Ref(T)), + Failure(Ref(Error)), + Cancelled +} +``` + +This would avoid the need for copying, without forcing the type's clients to +make any changes to their code. However, we currently do not expect Carbon to +support references, so this solution may not be available. 
From 80bac798722aa4aed0902851e942f7a525bc7791 Mon Sep 17 00:00:00 2001 From: Geoffrey Romer Date: Tue, 17 Nov 2020 10:28:59 -0800 Subject: [PATCH 15/28] Initial sketch of callback-based approach --- proposals/p0157.md | 37 +++++++++++++++++++++++++++++++++++++ 1 file changed, 37 insertions(+) diff --git a/proposals/p0157.md b/proposals/p0157.md index f7309ba42e0e0..e1cee2500971f 100644 --- a/proposals/p0157.md +++ b/proposals/p0157.md @@ -45,6 +45,7 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception - [Alternatives considered](#alternatives-considered-4) - [Indexing by type](#indexing-by-type) - [Pattern matching proxies](#pattern-matching-proxies) + - [Pattern matching callbacks](#pattern-matching-callbacks) @@ -1402,3 +1403,39 @@ closed choice Choice { This would avoid the need for copying, without forcing the type's clients to make any changes to their code. However, we currently do not expect Carbon to support references, so this solution may not be available. + +### Pattern matching callbacks + +FIXME add words to explain this. + +``` +struct Result(Type:$$ T, Type:$$ Error) { + // Data members, factories, and special members same as above + + interface ContinuationT { + fn Success(T: value) -> (); + fn Failure(Error: error) -> (); + fn Cancelled() -> (); + } + + // Note that we can't replace `ContinuationT` with `Type` -- the compiler + // needs `ContinuationT` because it's how we specify the set of alternatives. 
+ fn operator match[ContinuationT:$$ Continuation]( + Ptr(Self): this, Ptr(Continuation): continuation) -> () { + match (this->discriminator) { + case 0 => { + return continuation->Success(this->storage.Read(T)); + } + case 1 => { + return continuation->Failure(this->storage.Read(Error)); + } + case 2 => { + return continuation->Cancelled(); + } + default => { + Assert(false); + } + } + } +} +``` From 026aba88cd50b7c6a481ff37319e779f61b48e4a Mon Sep 17 00:00:00 2001 From: Geoffrey Romer Date: Mon, 30 Nov 2020 16:27:07 -0800 Subject: [PATCH 16/28] Discuss callback-based approach in depth, and focus on it as the leading alternative. Also add discussion of binding patterns that mutate in-place. --- proposals/p0157.md | 565 ++++++++++++++++++++++++++++----------------- 1 file changed, 350 insertions(+), 215 deletions(-) diff --git a/proposals/p0157.md b/proposals/p0157.md index e1cee2500971f..4f0ea1ba3e608 100644 --- a/proposals/p0157.md +++ b/proposals/p0157.md @@ -15,7 +15,8 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception - [Problem](#problem) - [Background](#background) - [Proposal](#proposal) -- [Pattern matching](#pattern-matching) + - [Additional motivating use cases](#additional-motivating-use-cases) + - [In-place mutation](#in-place-mutation) - [Pattern functions](#pattern-functions) - [Constraints on function evaluation](#constraints-on-function-evaluation) - [Constraints on pattern matching](#constraints-on-pattern-matching) @@ -44,8 +45,8 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception - [Extend `alternatives` instead](#extend-alternatives-instead) - [Alternatives considered](#alternatives-considered-4) - [Indexing by type](#indexing-by-type) - - [Pattern matching proxies](#pattern-matching-proxies) - [Pattern matching callbacks](#pattern-matching-callbacks) + - [Pattern matching proxies](#pattern-matching-proxies) @@ -173,7 +174,7 @@ fn GetIntFromUser() -> Int { } ``` -However, this code
has several functional deficiencies: - The implementation details of `Result` are not encapsulated. This makes the `Result` API unsafe: nothing prevents client code from accessing `.value` @@ -192,22 +193,22 @@ However, this code has several serious deficiencies: `default` case in order for the compiler and other tools to consider it exhaustive, even though that default case should never be entered. -It's also worth noting that the definition of `Result` is largely boilerplate. -Conceptually, the only information needed to specify this type is the names and -parameter types of the two factory functions, plus the fact that every possible -value of `Result` is uniquely described by the name and parameter values of a -call to one of those two functions. Given that information, the compiler could -easily generate the rest of the struct definition. This generated implementation -may not always be as efficient as a hand-coded one could be, but in a lot of -cases that may not matter. - -This code arguably exhibits another problem as well: the `return` statements in -`ParseAsInt` are quite verbose, due to the need to explicitly qualify the -function calls with `Result(Int, String)`. In fact, developers might prefer to -avoid having a function call at all, especially in the success case, and instead -rely on implicit conversions to write something like `return result;`. This -proposal will not address that problem, because it appears to be orthogonal to -the design of sum types. +It also has a couple of ergonomic problems: + +- The definition of `Result` is largely boilerplate. Conceptually, the only + information needed to specify this type is the names and parameter types of + the two factory functions, the name of the static member `Cancelled`, plus + the fact that every possible value of `Result` is uniquely described by the + name and parameter values of a call to one of those two functions, or else + equal to `Cancelled`. 
Given that information, the compiler could easily + generate the rest of the struct definition. This generated implementation + may not always be as efficient as a hand-coded one could be, but in a lot of + cases that may not matter. +- The `return` statements in `ParseAsInt` are quite verbose, due to the need + to explicitly qualify the function calls with `Result(Int, String)`. In + fact, developers might prefer to avoid having a function call at all, + especially in the success case, and instead rely on implicit conversions to + write something like `return result;`. ## Proposal @@ -220,15 +221,18 @@ which together prevent it from adequately supporting sum types: them to share storage. - There is no way for a type to specify that a given set of patterns is exhaustive. - -I propose supporting sum types by introducing three language features to supply -the missing functionality, as well as a sugar syntax for defining a sum type -without micromanaging the implementation details. These features are largely -separable, although there are some dependencies between them, so their detailed -design will be addressed in future proposals, and the details discussed here -should be considered provisional. This proposal merely establishes the overall -design direction for sum types, in the same way that [p0083](p0083.md) -established the overall design direction for the language as a whole. +- There isn't a concise way to define a sum type based on the form of its + alternatives. +- There isn't a way to return a specific alternative without restating the + return type of the function. + +I propose supporting sum types by introducing five language features to supply +the missing functionality. These features are largely separable, although there +are some dependencies between them, so their detailed design will be addressed +in future proposals, and the details discussed here should be considered +provisional. 
This proposal merely establishes the overall design direction for +sum types, in the same way that [p0083](p0083.md) established the overall design +direction for the language as a whole. To support encapsulation in pattern matching, I propose introducing the concept of a _pattern function_, which is a function that can be invoked as part of a @@ -319,6 +323,12 @@ closed choice Result(Type:$$ T, Type:$$ Error) { } ``` +Finally, to avoid the boilerplate in `return` statements, I propose that in a +function with return type `T`, a statement of the form `return .F(args)` is +equivalent to `return T.F(args)`. + +### Additional motivating use cases + We don't yet have enough of a design for variadics to give an example of a Carbon counterpart for `std::variant`, but a variant with exactly three alternative types could be written like so: @@ -331,6 +341,13 @@ choice Variant(Type:$$ T1, Type:$$ T2, Type:$$ T3) { } ``` +> **Open question:** Should Carbon have a native syntax for pattern matching on +> the dynamic type of an object? If so, should types like `Variant` be able to +> use it, instead of having the `.Value` boilerplate in every pattern? Should +> this mechanism be aware of subtype relationships (so that a subtype pattern is +> a better match than a supertype pattern)? If so, how are those subtype +> relationships defined? + As an example of a case where manual control of the representation is useful, here's an example of a type that implements an optional pointer value without requiring additional storage for a discriminator: @@ -350,32 +367,92 @@ Note that `ArrayOfZeros` can be defined as a pattern function, but only with some possibly surprising extensions to Carbon's pattern language. See below for details. 
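For comparison, Rust applies this kind of discriminator-free representation automatically in at least one well-known case: `Option<&T>` is documented to reuse the null bit pattern (which is never a valid reference) as its `None` value, so it is the same size as `&T`. The sketch below uses Rust purely as an analogue, not as a statement about Carbon layout rules; the helper names `option_ref_is_pointer_sized` and `describe` are hypothetical. It checks the size relationship and shows that client code still matches only on the ordinary alternatives.

```rust
use std::mem::size_of;

// Hypothetical helper for this sketch: true when `Option<&T>` needs no storage
// beyond the reference itself, because the null bit pattern (never a valid
// reference) is reused to represent `None`.
pub fn option_ref_is_pointer_sized<T>() -> bool {
    size_of::<Option<&T>>() == size_of::<&T>()
}

// Hypothetical client function: it matches on the ordinary alternatives and
// never observes how `None` is encoded.
pub fn describe(p: Option<&i32>) -> String {
    match p {
        Some(v) => format!("value {}", v),
        None => String::from("null"),
    }
}
```

The representation trick is invisible at the API level, which is the property the `OptionalPtr` example is meant to preserve.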
-## Pattern matching - -This proposal presupposes the following requirements for pattern matching, which -will inform the design of pattern functions: - -- Carbon pattern matching must not only be able to determine whether a pattern - matches a value, and bind values to its variables, but also be able to - select the _best_ pattern when more than one matches. Patterns are partially - ordered by specificity, so for example `(Int: x, 1)` is a better match than - `(Int: x, Int: y)` in the cases where they both match. This is not a total - order, or even a weak order (intuitively, a total order with "ties"); for - example, there is no ordering between `(Int: x, 1, 1)` and either - `(1, Int: y, Int: z)` or `(1, 1, Int:z)`, but we can't just say that all - three are "tied", because `(1, 1, Int: z)` is more specialized than - `(1, Int: y, Int: z)`. This order should at least form a - [lattice](), but that's not - directly relevant here. -- Carbon must be able to determine whether a given set of patterns is - _exhaustive_, meaning that for any possible value at least one of them will - match. It must also be able to determine whether a set of patterns is - _ambiguous_, meaning that the set contains two or more patterns that are not - ordered with respect to each other, and that can match the same value. - -These requirements are highly desirable in any pattern matching system, but they -are hard requirements if we want to treat C++-like overload resolution as a form -of pattern matching. +### In-place mutation + +Consider the following code: + +``` +var Optional(Vector(Int)): opt = .Some(...); + +match (opt) { + case .Some(Vector(Int): v) => { v[0] += 1; } + default => {} +} +``` + +By analogy with function parameters, it seems clear that `Vector(Int): v` +declares either a new variable or a new constant, so either `v[0] += 1` modifies +that variable, or the code doesn't compile at all. In neither case does it +mutate `opt` itself.
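Rust draws the same by-value versus in-place distinction in its pattern syntax, which may help make the problem concrete. In the sketch below, Rust is only an analogue and `ref mut` is not being proposed as Carbon syntax; the function names are hypothetical. A by-value binding gives the arm a fresh local, so the increment never reaches the original `Option`. A `ref mut` binding aliases the payload stored inside the `Option`, so the increment mutates it in place.

```rust
// Hypothetical example: by-value binding. `v` is a new local initialized
// from a clone of the payload, so `v[0] += 1` changes only that local.
pub fn bump_copy(opt: &Option<Vec<i32>>) {
    match opt.clone() {
        Some(mut v) => v[0] += 1,
        None => {}
    }
}

// Hypothetical example: in-place binding. `ref mut v` binds `v` to the vector
// stored inside `*opt`, so `v[0] += 1` mutates `opt` itself.
pub fn bump_in_place(opt: &mut Option<Vec<i32>>) {
    match *opt {
        Some(ref mut v) => v[0] += 1,
        None => {}
    }
}
```

Rust also infers reference bindings when matching through a reference ("default binding modes"): with `opt: &mut Option<Vec<i32>>`, the pattern `Some(v)` binds `v` as `&mut Vec<i32>`. So Rust offers both an explicit introducer and an inferred form, which may be a useful data point for the syntax question discussed here.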
+ +However, as discussed earlier, pattern matching is the primary interface for +interacting with sum types, so it's essential that pattern matching be able to +express in-place mutation somehow. In particular, there needs to be some way of +changing the above code so that it modifies the `Vector` in place. However, not +all parameters of all alternatives will be able to support this, because in some +cases the parameter value will be computed on the fly. For example, if the +low-order bits of a pointer are used to store a discriminator, a binding of that +pointer would need to expose a _copy_ that has the discriminator masked out, and +so any mutations that the user applied via the binding would apply to the copy, +and not the original sum object. + +Consequently, opting into in-place mutation probably needs to be done on a +per-binding basis, rather than a per-alternative or per-`match` basis. This will +express programmer intent more precisely, which will allow us to diagnose if +they're trying to mutate a field that doesn't support in-place mutation. + +By analogy with function parameters, it's tempting to try to express this opt-in +by changing the type of `v` to `Ptr(Vector(Int))`. However, that doesn't work in +general. For example, suppose we want to pattern match on a +`Variant3(Int, Ptr(Int), String)` -- does `.Value(Ptr(Int))` match the `Int` +case with in-place semantics, the `Ptr(Int)` case with by-value semantics, or +can it match both? The same argument applies to any other type-based opt-in, +unless the two types are indistinguishable by overload resolution (as with `T` +vs. `const T&` in C++). + +One way to solve this would be to provide a new kind of binding pattern. For +example, we could introduce `ref Type: name` as a syntax for binding the name +`name` to the object this pattern is matched against (rather than to a new +object with the same value). 
This syntax is intended to suggest that `ref` is +not part of the type, but rather an introducer for the declaration of `name`, +analogous to `var` and `val`: + +``` +match (opt) { + case .Some(ref Vector(Int): v) => { v[0] += 1; } + default => {} +} +``` + +Alternatively, we could have a syntax that binds a name to a pointer, rather +than directly to the object, although it's harder to find a concise but mnemonic +introducer to use in that case: + +``` +match (opt) { + case .Some(inplace Ptr(Vector(Int)): v) => { (*v)[0] += 1; } + default => {} +} +``` + +Rather than introduce a new kind of binding, it might be possible to solve this +problem by introducing a new pattern operator. In particular, we could allow +unary `*` to appear in patterns, with the semantics that `*P` matches an +object whenever the pattern `P` matches the address of that object. + +``` +match (opt) { + case .Some(*(Ptr(Vector(Int)): v)) => { (*v)[0] += 1; } + default => {} +} +``` + +This is in some ways more parsimonious, because we're just slightly broadening +the set of expression operators that have a dual role as pattern operators, +rather than introducing an entirely new kind of binding. + +This proposal doesn't take a position on what syntax to use, but I am inclined +to prefer `ref Type: name`. ## Pattern functions @@ -1204,12 +1281,15 @@ the name-indexed form, as proposed here. > TODO: We should consider ways of minimizing or avoiding the burden of > boilerplate factory names like `Value` for type-indexed use cases. -### Pattern matching proxies +### Pattern matching callbacks -Rather than having user-defined sum types directly participate in pattern -matching, we could allow them to specify a conversion to a type that Carbon -knows how to pattern-match against.
Here's what that might look like, again -revisiting our `Result` example: +Rather than requiring the compiler to automatically invert a set of factory +functions in order to use them as patterns, we could allow types to specify how +they participate in pattern matching by supplying explicit code for both +directions. In this approach, the method that implements the reverse direction +would receive a set of continuations representing the different branches of the +match, and it would be responsible for choosing which one to execute. Here's +what that might look like, again revisiting our `Result` example: ``` struct Result(Type:$$ T, Type:$$ Error) { @@ -1232,25 +1312,28 @@ struct Result(Type:$$ T, Type:$$ Error) { var Self:$$ Cancelled = (.discriminator = 2); - closed choice Choice { - Success(T), - Failure(Error), - Cancelled + interface MatchContinuation { + fn Success(lval T: value); + fn Failure(lval Error: error); + fn Cancelled(); } - fn operator match(Ptr(Self): this) -> Choice { - match (discriminator) { - case 0 => { - return .Success(this->storage.Read(T)); - } - case 1 => { - return .Failure(this->storage.Read(Error)); - } - case 2 => { - return .Cancelled; - } - default => { - Assert(false); + impl Matchable(MatchContinuation) { + method Match[MatchContinuation:$ Continuation]( + Ptr(Self): this, Ptr(Continuation): continuation) { + match (discriminator) { + case 0 => { + continuation->Success(this->storage.Read(T)); + } + case 1 => { + continuation->Failure(this->storage.Read(Error)); + } + case 2 => { + continuation->Cancelled(); + } + default => { + Assert(false); + } } } } @@ -1260,47 +1343,84 @@ struct Result(Type:$$ T, Type:$$ Error) { } ``` -In this approach, we no longer have pattern functions, or `alternatives` blocks: -factories like `Success` are entirely ordinary Carbon functions, with no special -restrictions. For example, we can now create objects inside `.storage` -procedurally, via a `Create` method. 
But the price is they can no longer be used -in pattern matching. +In this code, `Result` makes itself available for use in pattern matching by +declaring that it implements the `Matchable` interface with a given continuation +interface. That interface, `MatchContinuation` in this case, tells the compiler +what kinds of patterns can be matched against this type, so that it can +typecheck the `match` expression. The compiler then invokes `Match` with a +concrete continuation object containing the actual code that implements the +different branches of the `match` expression, and `Match` invokes the +appropriate continuation. + +This proposal assumes that Carbon will have support for defining and +implementing generic interfaces, including interfaces that take interface +parameters, and uses `interface`, `impl`, `method` etc. as **placeholder** +syntax. It can probably be revised to work if interfaces can't be parameterized +that way, or if we don't have a feature like this at all, but it might be +somewhat more awkward. + +`lval T: value` is a **placeholder** syntax that indicates that this parameter +can be bound either in-place or by value when pattern matching against `Result`; +omitting the `lval` introducer would indicate that the parameter can only be +bound by value. The "`lval`" spelling is intended to pair with the +`ref Type: value` spelling for opting into in-place semantics in a pattern, and +is motivated by analogy with C++'s concept of an "lvalue": `ref` patterns can +only bind to `lval` parameters, just as non-const references can only bind to +lvalues in C++, because they represent durable objects rather than ephemeral +values. If we choose a different in-place syntax for patterns, we could +presumably find a corresponding syntax to use here. + +With this approach, we no longer have pattern functions, or `alternatives` +blocks: factories like `Success` are entirely ordinary Carbon functions, with no +special restrictions. 
For example, we can now create objects inside `.storage` +procedurally, via a `Create` method. However, the price is that the factory +functions can no longer be used in patterns. + +Consequently, the names `Success`, `Failure`, and `Cancelled` are defined twice, +once as factory functions of `Result` and once as member functions of +`MatchContinuation`, with the same arity and argument types in each case. Unlike +in the primary proposal, this correspondence is a design choice by the type +author; there is no language-level requirement that the alternatives correspond +to the factory functions. Consequently, there is no requirement that the factory +functions are exhaustive. There is a requirement that the interface methods are +mutually exclusive, but only in the sense that `Match` is required to call +exactly one of them. + +This approach can be extended to support non-sum patterns as well. For example, +a type that wants to match a tuple-shaped pattern like `(Int: i, String: s)` +could define a continuation interface like -This approach still has `choice` types, but they are a fundamental part of the -Carbon type system, not syntactic sugar for lower-level constructions. In -particular, the alternatives of a `choice` type are the only function-call-like -construct that can appear in a pattern. Their declaration syntax does not need -to match the syntax of pattern functions, since there is no such feature, so we -can make the syntax much simpler. - -The key feature of this approach is an overloadable `operator match` function. -When pattern matching is applied to an object, `operator match` is implicitly -invoked on it, and its return value is used instead for purposes of pattern -matching. This feature is not limited to sum types. For example, a struct type -with private members might define an `operator match` that returns a struct with -only public members. 
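The callback design closely resembles the classic visitor pattern. The following is a rough Rust analogue, not Carbon syntax. The trait `ResultContinuation`, the type `ManualResult`, and the continuation `Describe` are hypothetical names invented for this sketch. `ManualResult` keeps its representation private and can be matched only through `match_with`, which inspects that private state and invokes exactly one continuation method. The storage uses separate `Option` fields for simplicity; what matters is that clients cannot observe it.

```rust
// Hypothetical continuation interface: one method per alternative, playing
// the role of `MatchContinuation`.
pub trait ResultContinuation<T, E> {
    type Output;
    fn success(self, value: &T) -> Self::Output;
    fn failure(self, error: &E) -> Self::Output;
    fn cancelled(self) -> Self::Output;
}

// Hypothetical sum type with an encapsulated, hand-managed representation.
pub struct ManualResult<T, E> {
    discriminator: u8, // 0 = success, 1 = failure, 2 = cancelled
    value: Option<T>,
    error: Option<E>,
}

impl<T, E> ManualResult<T, E> {
    // Ordinary factory functions ("forward" direction); matching does not use them.
    pub fn success(value: T) -> Self {
        ManualResult { discriminator: 0, value: Some(value), error: None }
    }
    pub fn failure(error: E) -> Self {
        ManualResult { discriminator: 1, value: None, error: Some(error) }
    }
    pub fn cancelled() -> Self {
        ManualResult { discriminator: 2, value: None, error: None }
    }

    // "Reverse" direction: inspect private state and call exactly one
    // continuation method, analogous to `Matchable.Match`.
    pub fn match_with<C: ResultContinuation<T, E>>(&self, k: C) -> C::Output {
        match self.discriminator {
            0 => k.success(self.value.as_ref().expect("success holds a value")),
            1 => k.failure(self.error.as_ref().expect("failure holds an error")),
            2 => k.cancelled(),
            _ => unreachable!("invalid discriminator"),
        }
    }
}

// Hypothetical continuation: one concrete "match expression" written as an object.
pub struct Describe;

impl ResultContinuation<i32, String> for Describe {
    type Output = String;
    fn success(self, value: &i32) -> String {
        format!("ok: {}", value)
    }
    fn failure(self, error: &String) -> String {
        format!("error: {}", error)
    }
    fn cancelled(self) -> String {
        String::from("cancelled")
    }
}
```

The names `success`, `failure`, and `cancelled` appear twice: once as factories and once as continuation methods. Only the author's convention ties the two sets together, which is the same name duplication this section discusses.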
- -In this example, the names `Success`, `Failure`, and `Cancelled` appear twice, -once as factory functions of `Result` and once as alternatives of -`Result.Choice`, with the same arity in each case. Unlike in the primary -proposal, this correspondence is just an API design choice; there is no -language-level requirement that the alternatives correspond to the factory -functions. Consequently, there is no requirement that the factory functions are -either exhaustive or mutually exclusive. +``` +interface MatchContinuation { + fn operator()(Int: i, String: s); +} +``` + +A type's continuation interface can also include a function named +`operator default`, which implements the `default` case of the `match`. +Consequently, any `match` on that type will be required to have a `default` +case, so this has roughly the same effect as omitting `closed` from the +`alternatives` block in the primary proposal. + +> **Open question:** Can `Match` actually invoke `operator default`? It might +> sometimes be useful to define a type that can't be matched exhaustively, and +> the mere possibility that the `default` could actually run might help +> encourage client code to implement it robustly, rather than blindly providing +> something like `Assert(False)`. However, if there's no language-level +> guarantee that the `default` case is unreachable if all other alternatives are +> handled, then we won't be able to let same-library code omit the `default`. This approach has a number of advantages: - Types can use the full power of the Carbon language when expressing how they participate in pattern matching. In particular, this means `Storage` doesn't - need a rich initialization API, because its state can be set procedurally. - More broadly, it's much less likely that a type will be unable to - participate in pattern matching because of limitations of the language. + need a complex initialization API, because its state can be set + procedurally. 
More broadly, it's much less likely that a type will be unable + to participate in pattern matching because of limitations of the language. - Patterns are allowed to be underconstrained. This means Carbon can more easily include operations like `|` in its pattern language, and it means types can make parts of the object state invisible to pattern matching (for example, non-salient state like the capacity of a `std::vector`). -- `choice` can more easily have a syntax that's simple, and familiar from - other languages like Rust. - The language rules are much simpler, and easier to explain, because we don't need to specify both "forward" and "reverse" semantics for a subset of the language, or specify the boundaries of that subset. Relatedly, it provides a @@ -1309,133 +1429,148 @@ This approach has a number of advantages: However, it also has some substantial drawbacks: -- User-defined sum types will probably require dramatically more code, both - because the special member functions cannot be generated automatically, and - because the "forward" and "reverse" directions both require explicit code. - This additional verbosity not only makes these types more tedious to write - and read, it also creates a risk of bugs in the code that would otherwise be - generated automatically. The duplication of names may also create - readability problems. For example, an inexperienced reader might - misunderstand `.Success`, `.Failure`, and `.Cancelled` in the - `operator match` body above as referring to the factory functions. - Conversely, an inexperienced programmer might omit the leading `.`, so that - they actually _do_ refer to the factory functions. -- Users will be able to extend pattern matching to support new _types_, but - not new _operations_. For example, it wouldn't let us define - `ArrayOfZeroes(N)` (see above) as an operator that can be used in patterns. 
- We could of course introduce pattern functions as a separate feature, - although it's less likely to pull its weight if it's not needed for the sum - type use case. -- We have to introduce a second, fundamentally separate kind of user-defined - type, namely `choice` types. Among other things, this would rule out the - possibility of changing the `struct` keyword to `type`, as some have - suggested. - -There's a final drawback that's worth discussing in more depth: evaluating an -`operator match` currently seems to involve copying all the data that affects -pattern matching from the original object to the one that's returned from -`operator match`, so a naive implementation of this approach would make pattern -matching much slower than under the primary proposal. It's possible that these -copies could be optimized away, but it's not clear how feasible, and how -reliable, that will be in practice. More fundamentally, reliance on the -optimizer seems contrary to Carbon's goal of making performance both predictable -and controllable by the programmer. - -Programmers can work around this problem by having the return type hold pointers -to the underlying data, rather than copies of it. For example, `Result.Choice` -in the example above could be rewritten as - -``` -closed choice Choice { - Success(Ptr(T)), - Failure(Ptr(Error)), - Cancelled -} -``` - -However, that would require changing the patterns correspondingly. For example, -`GetIntFromUser` would have to be rewritten to something like this, where I'm -assuming that `&foo` is a pattern operator that matches a pointer whenever `foo` -matches the pointee: - -``` -fn GetIntFromUser() -> Int { - while(True) { - var String: s = UserPrompt("Please enter a number"); - match (ParseAsInt(s)) { - case .Success(&(var Int: value)) => { - return value; - } - case .Failure(&(var String: error)) => { - Display(error); - } - case .Cancelled => { - // We didn't request cancellation, so something is very wrong. 
- Terminate(); - } - } - } -} -``` - -Not only does this make the client code somewhat harder to read and write, it -forces the client code to deal with the implementation details of `Result`. In -particular, this means that if a type is initially defined as a `choice` type, -there will typically be no way to redefine it as a struct with an -`operator match` without either atomically rewriting all pattern matches -involving that type, or incurring a large performance hit (which would probably -defeat the purpose of making that change in the first place). As it stands, I -think this is a fatal problem with this approach, because it's so directly -contrary to Carbon's software-evolvability goals. - -It's worth noting that this would probably be a non-issue if Carbon supported -C++-like references. For example, supposing `Ref(T)` were the syntax for -reference types, we could define `Result.Choice` like so: +- Manually-defined sum types will probably require dramatically more code, + both because the special member functions cannot be generated automatically, + and because the "forward" and "reverse" directions both require explicit + code. This additional verbosity not only makes these types more tedious to + write and read, it also creates a risk of bugs in the code that's being + written by fallible humans instead of the compiler. The duplication of names + may also create readability problems. +- This approach only allows programmers to extend pattern matching to support + new _types_, not new _operations_. For example, it doesn't directly give us + a way to define `ArrayOfZeroes(N)` (see above) as an operator that can be + used in patterns. However, it's possible that this feature could be extended + to support user-defined pattern operations, by treating them as comparable + to `Matchable.Match` except that they must be invoked by name rather than + being invoked implicitly. 
+- The process of translating a pattern-match into a `Match` call may require + special overload resolution rules. Overload resolution is normally driven by + the types of the arguments, but the concrete type of the argument to `Match` + may not be known until quite late in the compilation process, because it + could depend on the details of code generation. Instead, overload resolution + will have to be driven by some more abstract representation of the "shape" + of the callsite, such as a dummy type with the same interface as the actual + argument, but potentially different implementation details. However, this + might not be visible to users so long as the continuation parameter is + passed as a generic rather than a template (`:$` rather than `:$$`). +- Manually-defined sum types default to being closed, rather than open. This + creates some risk that sum type authors will accidentally lock themselves + out of the ability to add new alternatives. + +One drawback is worth discussing in more depth: Carbon's parameter-passing +semantics might not be expressive enough to support this approach. The current +design is expressed in terms of a restricted form of pass-by-reference, but +there is substantial resistance to supporting any form of reference-like +parameter passing in Carbon. However, it is not clear how a purely pass-by-value +approach could replace the pass-by-reference placeholder design while retaining +its most important properties, such as: + +- The interface definition makes the structure of the sum type clearly legible + to the reader, and makes it easy for the reader to disregard the distinction + between parameters that do and do not support in-place binding if it's not + relevant to them. +- The code for a by-value binding doesn't depend on whether an in-place + binding would also be supported. 
+- The `Match` function's behavior depends solely on the value of the object, + and not on any characteristics of the pattern, such as whether any given + parameter is being bound in-place. Correspondingly, there's no need for a + mechanism to convey information about the pattern to `Match`. + +However, there are a number of other problems that Carbon will need to solve if +we want to avoid supporting pass-by-reference, and it's quite possible that +solving those problems will naturally solve these as well, or at least get us +closer to a solution. However, at this point that's no more than speculation, so +this approach carries a risk that we will eventually be forced to either abandon +it, or accept some form of pass-by-reference in Carbon. + +> **Open question:** Does Carbon need to allow user-defined pattern matching +> code to optimize based on the presence of wildcards, and if so, how? In +> theory, `Match` might sometimes be able to avoid substantial amounts of work +> if it knows about wildcards in the pattern, because then it can supply them +> with dummy data rather than having to compute correct values. This is likely +> to be especially valuable if Carbon supports list patterns and variable-length +> wildcards (like Rust's `..`): most list-like types could in principle +> determine whether they match `{1, 2, ..}` in constant time, rather than pass +> their entire contents into the continuation, but the approach described so far +> doesn't allow them to implement such an optimization. -``` -closed choice Choice { - Success(Ref(T)), - Failure(Ref(Error)), - Cancelled -} -``` - -This would avoid the need for copying, without forcing the type's clients to -make any changes to their code. However, we currently do not expect Carbon to -support references, so this solution may not be available. - -### Pattern matching callbacks +### Pattern matching proxies -FIXME add words to explain this. 
+As a variant of the previous approach, we could allow types to specify their +pattern-matching behavior in terms of a proxy type that Carbon "natively" knows +how to pattern-match against. In the case of a sum type, this proxy would be a +`choice` type, which means that `choice` needs to be a fundamental part of the +language, rather than syntactic sugar for a sum type struct. Returning once more +to the `Result` example: ``` struct Result(Type:$$ T, Type:$$ Error) { // Data members, factories, and special members same as above - interface ContinuationT { - fn Success(T: value) -> (); - fn Failure(Error: error) -> (); - fn Cancelled() -> (); + closed choice Choice { + Success(ref T), + Failure(ref Error), + Cancelled } - // Note that we can't replace `ContinuationT` with `Type` -- the compiler - // needs `ContinuationT` because it's how we specify the set of alternatives. - fn operator match[ContinuationT:$$ Continuation]( - Ptr(Self): this, Ptr(Continuation): continuation) -> () { + fn operator match(Ptr(Self): this) -> Choice { match (discriminator) { case 0 => { - return continuation->Success(this->storage.Read(T)); + return .Success(this->storage.Read(T)); } case 1 => { - return continuation->Failure(this->storage.Read(Error)); + return .Failure(this->storage.Read(Error)); } case 2 => { - return continuation->Cancelled(); + return .Cancelled; } default => { - Assert(false); + Assert(False); } } } } ``` + +This approach has many of the same tradeoffs as the callback-based approach: it +substantially simplifies the language, and gives much greater freedom to type +authors, but obliges them to write substantially more code. It also has some +advantages over the callback-based approach: + +- It's somewhat simpler, because it uses return values instead of + continuation-passing. +- It could generalize more easily to allow things like types that can match + list patterns (if Carbon has those).
+- Since choice types are no longer syntactic sugar for structs with factory + functions, their syntax no longer needs to mirror the syntax of function + declarations. + +However, it also has several significant drawbacks: + +- It forces us to treat `choice` as a fundamental part of the language: in + order to implement a sum type, you have to work with an object type whose + layout and implementation is inherently opaque. This would be a substantial + departure from C++, and it's difficult to foresee the consequences of that. + Possibly the closest analogy in C++ is virtual calls, and especially virtual + base classes, where fundamental operations like `->` and pointer casts can + involve nontrivial generated code, and some aspects of object layout are + required to be hidden from the user. However, Carbon seems to be moving away + from C++ in precisely those ways. +- It may be somewhat less efficient, because it requires instantiating the + enclosing `Choice` type (presumably including a discriminator field), rather + than merely passing the appropriate alternative into the continuation. +- Not only might it require Carbon to support pass-by-reference, it comes very + close to requiring Carbon to support reference _types_, which are even more + contentious and problematic. This is already somewhat evident in the way + `Choice` is defined, but becomes much clearer when this mechanism is used + for product types, where allowing mutable bindings would require + `operator match` to return a tuple of references. +- The risk of confusion due to duplicate names is somewhat greater: a naive + reader might think that, for example, `.Success` in the body of + `operator match` refers to `Self.Success` rather than `Self.Choice.Success`. + Relatedly, there's some risk that the author may omit the leading `.`, and + thereby invoke `Self.Success` instead of `Self.Choice.Success`. This will + probably fail to build, but the errors may be confusing. 
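A Rust analogue of the proxy approach may make the reference-type concern above more tangible. This is again only an illustration; `ResultView`, `Opaque`, and `describe` are hypothetical names. An `as_view` method plays the role of `operator match` and returns a native enum that clients match on directly. To avoid copying the payloads, the view's alternatives must hold references, which Rust expresses with the lifetime parameter `'a`.

```rust
// Hypothetical proxy: a native sum type whose payloads borrow the original
// object's storage, so building the view copies no payload data.
pub enum ResultView<'a, T, E> {
    Success(&'a T),
    Failure(&'a E),
    Cancelled,
}

// Hypothetical type with a private representation, matched only via its proxy.
pub struct Opaque<T, E> {
    discriminator: u8, // 0 = success, 1 = failure, 2 = cancelled
    value: Option<T>,
    error: Option<E>,
}

impl<T, E> Opaque<T, E> {
    pub fn success(value: T) -> Self {
        Opaque { discriminator: 0, value: Some(value), error: None }
    }
    pub fn failure(error: E) -> Self {
        Opaque { discriminator: 1, value: None, error: Some(error) }
    }
    pub fn cancelled() -> Self {
        Opaque { discriminator: 2, value: None, error: None }
    }

    // Analogue of `operator match`: translate private state into the proxy.
    pub fn as_view(&self) -> ResultView<'_, T, E> {
        match self.discriminator {
            0 => ResultView::Success(self.value.as_ref().expect("success holds a value")),
            1 => ResultView::Failure(self.error.as_ref().expect("failure holds an error")),
            2 => ResultView::Cancelled,
            _ => unreachable!("invalid discriminator"),
        }
    }
}

// Hypothetical client: an ordinary `match` on the proxy value.
pub fn describe(r: &Opaque<i32, String>) -> String {
    match r.as_view() {
        ResultView::Success(v) => format!("ok: {}", v),
        ResultView::Failure(e) => format!("error: {}", e),
        ResultView::Cancelled => String::from("cancelled"),
    }
}
```

Client code uses the language's ordinary `match` syntax here. The cost is that every match first constructs an intermediate `ResultView`, including its own discriminator, which mirrors the efficiency concern listed above.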
+ +I think these drawbacks decisively outweigh the advantages, so I do not +recommend this approach. From 4dbd31d71e02895892f97a211df4b5fff8cae5c3 Mon Sep 17 00:00:00 2001 From: Geoffrey Romer Date: Mon, 30 Nov 2020 16:49:45 -0800 Subject: [PATCH 17/28] Respond to reviewer comments, and delete redundant/inaccurate discussion of "explicit inverses" --- proposals/p0157.md | 64 ++++++++++++++-------------------------------- 1 file changed, 19 insertions(+), 45 deletions(-) diff --git a/proposals/p0157.md b/proposals/p0157.md index 4f0ea1ba3e608..74ecd1fa2e35f 100644 --- a/proposals/p0157.md +++ b/proposals/p0157.md @@ -27,7 +27,6 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception - [Avoidable inconsistencies](#avoidable-inconsistencies) - [Worked example: `ArrayOfZeroes`](#worked-example-arrayofzeroes) - [Alternatives considered](#alternatives-considered) - - [Explicit inverses](#explicit-inverses) - [Make alternatives be pattern functions implicitly](#make-alternatives-be-pattern-functions-implicitly) - [Shareable storage](#shareable-storage) - [Initialization](#initialization) @@ -90,9 +89,9 @@ alternatives. For example, a typical C-style pointer can be thought of as an optional type, with a special null value indicating that no pointer is present, because the platform guarantees that the null byte pattern is never the representation of a valid pointer. This sort of customization inherently creates -a risk of implementation bugs that break type safety, but it must be possible -for correctly-implemented customizations to avoid changing the API, and hence -avoid weakening the static safety guarantees for users. +a risk of implementation bugs that break type safety, but it must be possible to +implement such customizations without changing the type's API, and hence without +altering the static safety guarantees for users. ## Background @@ -120,7 +119,7 @@ struct Result(Type:$$ T, Type:$$ Error) { // represents the cancelled state. 
var Int: discriminator; var T: value; - var E: error; + var Error: error; fn Success(T: value) -> Result(T, Error) { return (.discriminator = 0, .value = value, .error = Error()); @@ -185,8 +184,8 @@ However, this code has several functional deficiencies: `.error` with a default-constructed dummy value (and so it won't work if `Error` is not default-constructible), `Failure` must do the same for `.value`, and `Cancelled` must do the same for both. Furthermore, `Result` - is bloated by the fact that the two must have separately-allocated storage, - even though at most one at a time actually stores any data. + is bloated by the fact that the two fields must have separately-allocated + storage, even though at most one at a time actually stores any data. - `.discriminator` should never have any value other than 0, 1, or 2, but the compiler can't enforce that property when `Result`s are created, or exploit it when `Result`s are used. So, for example, the `match` must have a @@ -256,7 +255,8 @@ must be possible to obtain it as the return value of exactly one of the alternatives. Alternatives that take no arguments, which represent singleton states such as `Cancelled`, can instead be written as static constants. An `alternatives` block can be marked `closed`, which indicates that client code -can assume no new alternatives will be added in the future. +need not be prepared to accept alternatives other than the ones currently +present. Using these features, the `Result(T, Error)` example can be rewritten as follows: @@ -762,33 +762,6 @@ design for variadics. ### Alternatives considered -#### Explicit inverses - -Instead of restricting pattern functions to a narrow sublanguage that's usable -in both patterns and expressions, we could allow developers to explicitly -specify the code that should be executed during pattern matching, as well as the -code that should be executed during expression evaluation. 
This obliges them to -determine the correct inverse computation themselves, but permits them to -express it in terms of the full Carbon language. - -However, as discussed earlier, Carbon pattern matching needs to be able to not -only determine whether a pattern matches, but also rank patterns, and determine -whether a set of patterns is exhaustive and/or ambiguous. Defining a custom -pattern in terms of arbitrary forward and reverse functions doesn't give the -compiler enough information to do those things, so the developer would need to -supply additional information in some way, and it's not at all clear how that -would work. - -Furthermore, this approach opens the door to user-defined pattern syntaxes that -do not correspond to an expression syntax (as with Swift's `?`), or worse, -user-defined syntaxes that can be used in both expressions and patterns, but -have different meanings in the two contexts (as with Swift's `x as T`). Of -course, that may not be a problem -- I find Swift's choices surprising, but I -don't have any evidence that they've caused confusion for users. However, Carbon -will be breaking new ground by allowing user code to define pattern-matching -abstractions, so it seems advisable to impose fairly strict semantic constraints -on them, at least initially. - #### Make alternatives be pattern functions implicitly Every function in an `alternatives` block must be a pattern function, and this @@ -969,16 +942,17 @@ struct OptionalPtr(Type:$$ T) { } ``` -In this case, the compiler can observe that `.storage` must hold either a `T` -value or an all-zeros bit pattern, so it can safely attempt to match the -`.Value` case only after it has ruled out the `.Null` case, which it can -evaluate unconditionally because that case only requires loading raw bytes, -which is always safe. 
Consequently, we may want to require that a pattern match -that looks for `.Value` must also have an explicit `.Null` case (rather than a -generic `default` case), so as to make it clear to the reader that the `.Null` -case is being evaluated. If Carbon specifies that pattern matching evaluates -cases in order, we would presumably also require that the `.Null` case is above -the `.Value` case. +In this case, the compiler can observe that since the `alternatives` block is +required to be exhaustive, `.storage` must hold either a `T` value or an +all-zeros bit pattern, so it can safely attempt to match the `.Value` case only +after it has ruled out the `.Null` case, which it can evaluate unconditionally +because that case only requires loading raw bytes, which is always safe. +Consequently, we may want to require that a pattern match that looks for +`.Value` must also have an explicit `.Null` case (rather than a generic +`default` case), so as to make it clear to the reader that the `.Null` case is +being evaluated. If Carbon specifies that pattern matching evaluates cases in +order, we would presumably also require that the `.Null` case is above the +`.Value` case. We will need to formalize the intuition behind those examples, in the form of concrete language rules that are strict enough for the compiler to feasibly From bdaac5335cdaa87a40eda8361f038885033fb1aa Mon Sep 17 00:00:00 2001 From: Geoffrey Romer Date: Mon, 14 Dec 2020 15:13:18 -0800 Subject: [PATCH 18/28] Pivot to make "Pattern matching callbacks" the primary proposal, and move reference semantics to "open question" status. 
--- proposals/p0157.md | 1514 ++++++++++++-------------------------------- 1 file changed, 401 insertions(+), 1113 deletions(-) diff --git a/proposals/p0157.md b/proposals/p0157.md index 74ecd1fa2e35f..c0393399f863b 100644 --- a/proposals/p0157.md +++ b/proposals/p0157.md @@ -15,37 +15,16 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception - [Problem](#problem) - [Background](#background) - [Proposal](#proposal) - - [Additional motivating use cases](#additional-motivating-use-cases) - - [In-place mutation](#in-place-mutation) -- [Pattern functions](#pattern-functions) - - [Constraints on function evaluation](#constraints-on-function-evaluation) - - [Constraints on pattern matching](#constraints-on-pattern-matching) - - [Underdetermined patterns](#underdetermined-patterns) - - [Match guards](#match-guards) - - [Subpattern bindings](#subpattern-bindings) - - [Pattern-specific sugar syntaxes](#pattern-specific-sugar-syntaxes) - - [Avoidable inconsistencies](#avoidable-inconsistencies) - - [Worked example: `ArrayOfZeroes`](#worked-example-arrayofzeroes) - - [Alternatives considered](#alternatives-considered) - - [Make alternatives be pattern functions implicitly](#make-alternatives-be-pattern-functions-implicitly) - [Shareable storage](#shareable-storage) - - [Initialization](#initialization) - - [Alternatives considered](#alternatives-considered-1) - - [Pattern matching evaluation order](#pattern-matching-evaluation-order) -- [The `alternatives` block](#the-alternatives-block) - - [Alternatives considered](#alternatives-considered-2) - - [More concise syntax](#more-concise-syntax) - - [Marking alternatives individually](#marking-alternatives-individually) - - [Different syntax for `closed`](#different-syntax-for-closed) -- [`choice`](#choice) - - [Alternatives considered](#alternatives-considered-3) +- [User-defined pattern matching](#user-defined-pattern-matching) +- [Choice types](#choice-types) + - [Alternatives considered](#alternatives-considered) - 
[Separate support for enumerated types](#separate-support-for-enumerated-types) - [Different spelling for `choice`](#different-spelling-for-choice) - - [Extend `alternatives` instead](#extend-alternatives-instead) -- [Alternatives considered](#alternatives-considered-4) +- [Alternatives considered](#alternatives-considered-1) - [Indexing by type](#indexing-by-type) - - [Pattern matching callbacks](#pattern-matching-callbacks) - [Pattern matching proxies](#pattern-matching-proxies) + - [Pattern functions](#pattern-functions) @@ -175,10 +154,6 @@ fn GetIntFromUser() -> Int { However, this code has several functional deficiencies: -- The implementation details of `Result` are not encapsulated. This makes the - `Result` API unsafe: nothing prevents client code from accessing `.value` - even when `.discriminator` is not 0. This also makes the patterns extremely - verbose. - `.value` and `.error` must both be live throughout the `Result`'s lifetime, even when they are not meaningful. Consequently, `Success` must populate `.error` with a default-constructed dummy value (and so it won't work if @@ -186,6 +161,10 @@ However, this code has several functional deficiencies: `.value`, and `Cancelled` must do the same for both. Furthermore, `Result` is bloated by the fact that the two fields must have separately-allocated storage, even though at most one at a time actually stores any data. +- The implementation details of `Result` are not encapsulated. This makes the + `Result` API unsafe: nothing prevents client code from accessing `.value` + even when `.discriminator` is not 0. This also makes the patterns extremely + verbose. - `.discriminator` should never have any value other than 0, 1, or 2, but the compiler can't enforce that property when `Result`s are created, or exploit it when `Result`s are used. 
So, for example, the `match` must have a @@ -211,81 +190,64 @@ It also has a couple of ergonomic problems: ## Proposal -To summarize, the previous section identified three missing features in Carbon, -which together prevent it from adequately supporting sum types: +To summarize, the previous section identified several missing features in +Carbon, which together prevent it from adequately supporting sum types: -- There is no way for pattern matching to operate through an encapsulation - boundary. -- There is no way to manually control the lifetimes of subobjects, or enable +- There's no way to manually control the lifetimes of subobjects, or enable them to share storage. -- There is no way for a type to specify that a given set of patterns is +- There's no way for pattern matching to operate through an encapsulation + boundary. +- There's no way for a type to specify that a given set of patterns is exhaustive. -- There isn't a concise way to define a sum type based on the form of its +- There's no concise way to define a sum type based on the form of its alternatives. -- There isn't a way to return a specific alternative without restating the - return type of the function. - -I propose supporting sum types by introducing five language features to supply -the missing functionality. These features are largely separable, although there -are some dependencies between them, so their detailed design will be addressed -in future proposals, and the details discussed here should be considered -provisional. This proposal merely establishes the overall design direction for -sum types, in the same way that [p0083](p0083.md) established the overall design -direction for the language as a whole. - -To support encapsulation in pattern matching, I propose introducing the concept -of a _pattern function_, which is a function that can be invoked as part of a -pattern, even with arguments that contain placeholders. 
Pattern functions can -only contain the sort of code that could appear directly in a pattern, but they -let us define reusable pattern syntaxes that can do things like encapsulate -hidden implementation details of the object they're matching. +- There's no way to return a specific alternative without restating the return + type of the function. + +I propose supporting sum types by introducing several language features to +supply the missing functionality identified above. These features are largely +separable, although there are some dependencies between them, so their detailed +design will be addressed in future proposals, and the details discussed here +should be considered provisional. This proposal merely establishes the overall +design direction for sum types, in the same way that [p0083](p0083.md) +established the overall design direction for the language as a whole. To support manual lifetime control and storage sharing, I propose introducing a `Storage` type, which represents a fixed-size region of untyped memory and -provides operations for creating and destroying within it, as well as a -`ToStorage` operation for obtaining the object representation of a value. - -To allow types to specify an exhaustive set of patterns, as well as to solve -certain problems with accessing untyped memory in a pattern function, I propose -introducing `alternatives { ... }` as a grouping construct within a struct -definition. The elements of an `alternatives` block are pattern functions whose -return type is the enclosing struct type, and they are required to be both -exhaustive and unambiguous, meaning that for any possible value of the type, it -must be possible to obtain it as the return value of exactly one of the -alternatives. Alternatives that take no arguments, which represent singleton -states such as `Cancelled`, can instead be written as static constants. 
An -`alternatives` block can be marked `closed`, which indicates that client code -need not be prepared to accept alternatives other than the ones currently -present. - -Using these features, the `Result(T, Error)` example can be rewritten as -follows: +provides operations for creating and destroying objects within it. -``` -struct Result(Type:$$ T, Type:$$ Error) { - var Int: discriminator; +To support encapsulation in pattern matching, I propose introducing a +`Matchable` interface, which a type can implement in order to specify how it +behaves in pattern matching, including what (if anything) constitutes an +exhaustive set of patterns for that type. - var Storage(Max(Sizeof(T), Sizeof(Error)), - Max(Alignof(T), Alignof(Error))): storage; +To allow users to concisely specify a sum type when they don't need to +micro-optimize the implementation details, I propose a `choice` syntax that +specifies a sum type purely in terms of its alternatives, which acts as a sugar +syntax for those lower-level features. - closed alternatives { - pattern Success(T: value) -> Self { - return (.discriminator = 0, .storage = ToStorage(value)); - } +To avoid redundant boilerplate in functions that return sum types, I propose +allowing statements of the form `return .X;` and `return .F();`, which are +interpreted as if the function's return type appeared immediately prior to the +`.` character. 
+ - pattern Failure(Error: error) -> Self { - return (.discriminator = 1, .storage = ToStorage(error)); - } +Cumulatively, these features will allow `Result` to be defined so that the usage +portions of the above example can be rewritten as follows: - var Self:$$ Cancelled = (.discriminator = 2); +``` +fn ParseAsInt(String: s) -> Result(Int, String) { + var Int: result = 0; + var auto: it = s.begin(); + while (it != s.end()) { + if (*it < '0' || *it > '9') { + return .Failure("String contains non-digit"); + } + result = result * 10 + (*it - '0'); + ++it; } + return .Success(result); } -``` -The code for `ParseAsInt` is unchanged, and the rest of the usage example would -look like: - -``` fn GetIntFromUser() -> Int { while(True) { var String: s = UserPrompt("Please enter a number"); @@ -305,477 +267,8 @@ fn GetIntFromUser() -> Int { } ``` -To allow users to define sum types without micromanaging the implementation -details, I propose introducing `choice` as a convenient syntax for defining a -sum type by specifying only the declarations of the set of alternatives. From -that information, the compiler generates an appropriate object representation, -and synthesizes definitions for the alternatives and special member functions. - -Our manual implementation of `Result` doesn't really benefit from having direct -control of the object representation, and doesn't seem to have any additional -API surfaces, so it's well-suited to being defined as a choice type instead: - -``` -closed choice Result(Type:$$ T, Type:$$ Error) { - pattern Success(T: value) -> Self; - pattern Failure(Error: error) -> Self; - var Self:$$ Cancelled; -} -``` - -Finally, to avoid the boilerplate in `return` statements, I propose that in a -function with return type `T`, a statement of the form `return .F(args)` is -equivalent to `return T.F(args)`.
- -### Additional motivating use cases - -We don't yet have enough of a design for variadics to give an example of a -Carbon counterpart for `std::variant`, but a variant with exactly three -alternative types could be written like so: - -``` -choice Variant(Type:$$ T1, Type:$$ T2, Type:$$ T3) { - pattern Value(T1: value) -> Self; - pattern Value(T2: value) -> Self; - pattern Value(T3: value) -> Self; -} -``` - -> **Open question:** Should Carbon have a native syntax for pattern matching on -> the dynamic type of an object? If so, should types like `Variant` be able to -> use it, instead of having the `.Value` boilerplate in every pattern? Should -> this mechanism be aware of subtype relationships (so that a subtype pattern is -> a better match than a supertype pattern)? If so, how are those subtype -> relationships defined? - -As an example of a case where manual control of the representation is useful, -here's an example of a type that implements an optional pointer value without -requiring additional storage for a discriminator: - -``` -struct OptionalPtr(Type:$$ T) { - var Storage(Sizeof(Ptr(T)), Alignof(Ptr(T))): storage; - - alternatives { - pattern Value(Ptr(T): ptr) -> Self { return (.storage = ToStorage(ptr)); } - var Self:$$ Null = (.storage = ArrayOfZeros(Sizeof(Ptr(T)))); - } -} -``` - -Note that `ArrayOfZeros` can be defined as a pattern function, but only with -some possibly surprising extensions to Carbon's pattern language. See below for -details. - -### In-place mutation - -Consider the following code: - -``` -var Optional(Vector(Int)) opt = .Some(...); - -match (opt) { - case .Some(Vector(Int): v) => { v[0] += 1; } - default => {} -} -``` - -By analogy with function parameters, it seems clear that `Vector(Int): i` -declares either a new variable or a new constant, so either `v[0] += 1` modifies -that variable, or the code doesn't compile at all. In neither case does it -mutate `opt` itself. 
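For comparison, Rust already draws this distinction at the level of individual bindings. A plain binding such as `Some(mut a)` binds a new copy of the matched element, while `Some(ref mut a)` binds the element stored inside the matched value. The sketch below is an illustration in Rust, not proposed Carbon syntax, and the function names are hypothetical. It uses a `Copy` array rather than a vector so that the by-value binding copies instead of moving.

```rust
/// By-value binding: `a` is a new copy of the array (`[i32; 3]` is `Copy`),
/// so the increment never reaches `*opt`.
fn increment_copy(opt: &Option<[i32; 3]>) {
    match *opt {
        Some(mut a) => {
            a[0] += 1; // changes the local copy only
        }
        None => {}
    }
}

/// By-reference binding: `ref mut a` binds to the array stored inside
/// `*opt`, so the increment is visible to the caller.
fn increment_in_place(opt: &mut Option<[i32; 3]>) {
    match *opt {
        Some(ref mut a) => {
            a[0] += 1; // changes the array inside `opt`
        }
        None => {}
    }
}
```

Because `ref mut` is chosen per binding, a single pattern can mix in-place and by-value bindings.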
- -However, as discussed earlier, pattern matching is the primary interface for -interacting with sum types, so it's essential that pattern matching be able to -express in-place mutation somehow. In particular, there needs to be some way of -changing the above code so that it modifies the `Vector` in place. However, not -all parameters of all alternatives will be able to support this, because in some -cases the parameter value will be computed on the fly. For example, if the -low-order bits of a pointer are used to store a discriminator, a binding of that -pointer would need to expose a _copy_ that has the discriminator masked out, and -so any mutations that the user applied via the binding would apply to the copy, -and not the original sum object. - -Consequently, opting into in-place mutation probably needs to be done on a -per-binding basis, rather than a per-alternative or per-`match` basis. This will -express programmer intent more precisely, which will allow us to diagnose if -they're trying to mutate a field that doesn't support in-place mutation. - -By analogy with function parameters, it's tempting to try to express this opt-in -by changing the type of `v` to `Ptr(Vector(Int))`. However, that doesn't work in -general. For example, suppose we want to pattern match on a -`Variant3(Int, Ptr(Int), String)` -- does `.Value(Ptr(Int))` match the `Int` -case with in-place semantics, the `Ptr(Int)` case with by-value semantics, or -can it match both? The same argument applies to any other type-based opt-in, -unless the two types are indistinguishable by overload resolution (as with `T` -vs. `const T&` in C++). - -One way to solve this would be to provide a new kind of binding pattern. For -example, we could introduce `ref Type: name` as a syntax for binding the name -`name` to the object this pattern is matched against (rather than to a new -object with the same value). 
This syntax is intended to suggest that `ref` is -not part of the type, but rather an introducer for the declaration of `name`, -analogous to `var` and `val`: - -``` -match (opt) { - case .Some(ref Vector(Int): v) => { v[0] += 1; } - default => {} -} -``` - -Alternatively, we could have a syntax that binds a name to a pointer, rather -than directly to the object, although it's harder to find a concise but mnemonic -introducer to use in that case: - -``` -match (opt) { - case .Some(inplace Ptr(Vector(Int)): v) => { (*v)[0] += 1; } - default => {} -} -``` - -Rather than introduce a new kind of binding, it might be possible to solve this -problem by introducing a new pattern operator. In particular, we could allow -unary `*` to appear in patterns, with the semantics that `*` matches an -object whenever `` matches the address of that object. - -``` -match (opt) { - case .Some(*(Ptr(Vector(Int)): v)) => { (*v)[0] += 1; } - default => {} -} -``` - -This is in some ways more parsimonious, because we're just slightly broadening -the set of expression operators that have a dual role as pattern operators, -rather than introducing an entirely new kind of binding. - -This proposal doesn't take a position on what syntax to use, but I an inclined -to prefer `ref Type: name`. - -## Pattern functions - -Pattern functions have the same declaration syntax as ordinary functions, except -that they are introduced with `pattern` rather than `fn`. However, the function -body is required to be a single `return` statement whose operand expression is a -valid pattern, with each use of a function parameter acting as a variable -binding. One consequence of that constraint is that a given function parameter -cannot be accessed more than once when evaluating the pattern function. 
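Rust has no pattern functions, but its struct syntax illustrates the mirroring idea in miniature. The same text, `Point { x, y }`, is an expression in one position and a pattern in another, so a constructor can be "run backwards" by destructuring its result. In this sketch, `Point`, `make_point`, and `unmake_point` are hypothetical names. Rust also needs two separate functions here, whereas a Carbon pattern function would express both directions in a single declaration.

```rust
/// Hypothetical two-field struct used only for this illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

/// Forward direction: `Point { x, y }` is an expression built from the arguments.
fn make_point(x: i32, y: i32) -> Point {
    Point { x, y }
}

/// Reverse direction: the same text used as an irrefutable pattern recovers
/// the arguments that `make_point` was called with.
fn unmake_point(p: Point) -> (i32, i32) {
    let Point { x, y } = p;
    (x, y)
}
```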
- -It seems likely that pattern functions will be required to be inline, because -they will probably need to generate very different code depending on the -structure of their arguments, particularly if the arguments contain -placeholders. - -The body of a pattern function must consist of a single `return` statement whose -operand is not only a valid expression, but also would be a valid pattern if any -uses of the function parameters were replaced with variable binding patterns. -Furthermore, the semantics of the expression must mirror the semantics of the -pattern. For example, this is a valid pattern function: - -``` -pattern MakeStruct(Int: x, Int: y) -> MyStruct { - return (.x = x, .y = y); -} -``` - -because `(.x = Int: x, .y = Int: y)` is a valid pattern that is guaranteed to -match the result of any call to `MakeStruct` and set `x` and `y` to that call's -arguments, and conversely any value that matches `(.x = Int: x, .y = Int: y)` is -guaranteed to be equal to `MakeStruct(x, y)`. - -This mirroring requirement ensures that we can run the function both forwards -and backwards, either evaluating the function on some arguments to produce a -return value, or pattern matching on a purported return value to deduce the -arguments that produced it. The major drawback of this approach is how it -constrains the expressiveness of pattern functions, and it's useful to consider -the forward and reverse aspects of that constraint separately. - -### Constraints on function evaluation - -In the forward direction, the mirroring requirement is clearly a very limiting -constraint: most function bodies do not consist of a single return statement, -and most expressions are not valid patterns. So it's reasonable to wonder -whether pattern functions have enough expressive power. So far I've come across -two notable examples of code that we are likely to want to express in pattern -functions, but that may not satisfy the mirroring requirement. 
- -The first example is that it seems very difficult to allow a pattern function to -express in-place initialization, such what C++'s placement-new does. In this -proposal, I work around that problem with a special case rule that essentially -treats `storage = ToStorage(obj)` as constructing a copy of `obj` in-place in -the untyped array `storage`, but only when initializing `storage`. However, this -is decidedly a hack. See the section on shareable storage below for details on -this issue. - -The second example is memory allocation. We will inevitably want pattern -functions to be able to create, and match, objects that store data on the heap. -In principle this seems feasible: for example, Carbon might have a `Box(T)` type -that represents a heap-allocated `T` object, and a `MakeBox` function for -creating a `Box(T)` from a `T` value. We could then use `MakeBox` in pattern -functions. For example, this is a pattern function that creates/matches objects -of type `IndirectInt`, a hypothetical integer-like type that stores its data on -the heap: - -``` -struct IndirectInt { - var Box(Int): i_ptr; -} - -pattern IndirectIntOf(Int: i) -> IndirectInt { - return (.i_ptr = MakeBox(Int, i)); -} -``` - -For this to work, two `Box(T)` values must be considered equal when the `T` -values they hold compare equal, rather than when they are literally the same `T` -object. This is important not only for pattern matching, but also for situations -like automatically generating a struct's comparison operators. Note that other -than these "deep" equality semantics, `Box(T)` would be very similar to C++'s -`std::unique_ptr`, and would probably play a comparable role in the Carbon -ecosystem. - -The problem here is that `MakeBox` has to both read and mutate some state that's -not part of its arguments, namely the state of the memory allocator. 
It's not -clear that we can allow pattern functions to have side effects, or to depend on -side inputs, because those operations are likely to be very difficult to invert. -In this case that mutable state has no effect on the actual value, at least as -far as equality comparison is concerned, but it's not clear whether or how we -can take advantage of that. - -In both cases, it seems very possible that we could extend the pattern language -with new primitives that encapsulate the non-invertible part of these -operations, but it's too soon to say what they would look like. - -### Constraints on pattern matching - -In the reverse direction, the mirroring requirement may prevent us from using -certain pattern matching syntaxes inside pattern functions, namely syntaxes that -don't correspond to expression syntaxes. However, that appears to be a tolerable -constraint, at least for the use cases we're interested in here. - -Pattern matching syntaxes in other languages generally don't have many such -syntaxes, but there are a few. The following sections discuss the most notable -examples I've found, and discuss how they affect the viability of this design. - -#### Underdetermined patterns - -One corollary of the mirroring requirement is that the named variable bindings -in the pattern must uniquely determine the value being matched. This precludes -any pattern which "loses information" about the matched value. Most notably, -this means a pattern function cannot contain wildcard patterns, such as `_`, or -Rust's `..` partial wildcard. - -As a more complex example, several languages support combining multiple patterns -disjunctively, yielding a pattern that matches when any of the subpatterns -match. For example, in Rust the pattern `(0, x) | (x, 0)` matches any pair whose -first or second element is `0`, and binds `x` to the other element. Such -patterns lose information about which case successfully matched, and so they -cannot occur in pattern functions. 
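The information loss is easy to observe in Rust itself. In the sketch below, the function names are hypothetical. The or-pattern returns the same result for `(0, 7)` and `(7, 0)`. Splitting it into separate arms lets the function also report which alternative matched. Because Rust tries match arms from top to bottom, `(0, 0)` is reported as matching the first arm.

```rust
/// Or-pattern from the text: binds `x` to the element paired with a zero,
/// but discards which side the zero was on.
fn other_than_zero(pair: (i32, i32)) -> Option<i32> {
    match pair {
        (0, x) | (x, 0) => Some(x),
        _ => None,
    }
}

/// The same matches split into separate arms, so the result also records
/// which alternative matched (0 for `(0, x)`, 1 for `(x, 0)`).
fn other_than_zero_with_case(pair: (i32, i32)) -> Option<(u8, i32)> {
    match pair {
        (0, x) => Some((0, x)),
        (x, 0) => Some((1, x)),
        _ => None,
    }
}
```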
- -In such cases, the problem could be partially worked around by changing the code -to preserve the lost information. For example, the `_` pattern can be replaced -with a named function parameter. Note that although `_` cannot appear in a -pattern function, it can appear as an argument to a pattern function in a -top-level pattern, so the caller can discard that part of the pattern if it's -not useful to them. - -In the case of disjunctive patterns, preserving the lost information would -essentially require a pattern syntax for defining a variable binding that -records which alternative was selected. Or, to state the problem in the mirrored -sense, we need an expression syntax that uses a variable to select which of -several cases to return. Fortunately, Carbon is likely to have such a syntax -anyway, namely `match` expressions. If we also allow `match` in patterns, a -pattern like `(0, x) | (x, 0)` might be written as: - -``` -match c { - case 0 => (0, auto:x) - case 1 => (auto:x, 0) - default => Assert(False) -} -``` - -Note that inside a pattern, the case labels must be expressions, whereas the -match operand and the case bodies can be patterns. Interestingly, this is the -exact opposite of the rule for `match` outside of patterns: in effect, switching -from forward to reverse evaluation reverses the direction of the arrows. - -Inside a pattern function, the wildcard and variable bindings would need to be -replaced with function parameters (thereby making both sides expressions): - -``` -match c { - case 0 => (0, x) - case 1 => (x, 0) - default => Assert(False) -} -``` - -This is evidently more verbose than the Rust version, but that could be -mitigated with some syntax adjustments, such as a more concise way of asserting -that a match is exhaustive. - -The primary difficulty with `match` patterns is that they are prone to -ambiguity. For example, when matching with `(0, 0)`, it's not obvious what value -should be deduced for `c`. 
However, this problem is not unique to `match` -patterns, but is inherent in any sort of disjunctive patterns. Even though Rust -patterns don't provide an explicit indicator of which case was matched, -suitably-chosen variable bindings can indirectly expose that information. It -appears that Rust chooses the first match, but Carbon could in principle choose -the best match, or require the patterns to be unambiguous. Here again, it's as -if the `=>` arrow has switched directions, because the same ambiguity problem, -and the same possible resolutions, come up on the _left_ side of the arrows when -the `match` isn't in a pattern matching context. - -All that being said, it's not clear how important it is to support disjunctive -patterns (Swift doesn't, for example), so it may be simpler to avoid the issue -by not supporting them. - -#### Match guards - -Several languages support _match guards_, which use a predicate to restrict -pattern matching. For example, in Rust the pattern `Some(x) if x < 5` matches an -`Optional` value, and binds `x` to the `Int` it contains, only if that -value is less than 5. Match guards are always expressions, not patterns -- for -example, a pattern like `Some(x) if _ < 5` would be nonsensical. - -Match guards generally do not satisfy the mirroring requirement; for example, -`Some(x) if x < 5` is not a valid Rust expression. This may be a problem, -because we are likely to want match guards for Carbon, although it's not clear -if there will be important use cases for match guards inside pattern functions. -If there are, a very plausible path forward would be to allow pattern functions -to contain assertions. In the forward direction, these would behave in the -expected way, documenting and optionally enforcing preconditions on the function -call. 
In the reverse direction, they would function as match guards: - -``` -fn Foo(Int: x) -> Optional(Int) { - Assert(x < 5); - return Some(x); -} -``` - -This does somewhat complicate the mapping between functions and patterns, but -the additional complexity seems quite tolerable. - -Note that some languages provide a way for type authors to specify when two -values are considered to match. For example, in Swift you can overload the `~=` -operator, which returns a boolean indicating whether the left-hand side matches -the right. Both sides of the operator are values rather than patterns, so this -mechanism is effectively syntactic sugar for match guards, or an alternative to -them. - -#### Subpattern bindings - -Some languages support binding an identifier to a subpattern, rather than only -using identifiers as placeholders. This generally requires a separate syntax. -For example, in Rust the syntax `id @ p` matches the same values as the pattern -`p`, but also binds the name `id` to the matched value. The mirroring -requirement would imply that in an expression, `id @ p` evaluates to the value -of `id`, and also asserts that `id` matches `p`. These semantics seem -conceptually straightforward, but awkardly non-orthogonal and not very useful, -so if Carbon has this syntax at all, we could probably safely limit it to -patterns and pattern functions. - -It's interesting to note that we can interpret `p` as a pattern even in the -forward direction (because the primary behavior is supplied by `id`), so we can -probably allow it to contain wildcards. However, it's not clear whether that -will be useful in practice. - -#### Pattern-specific sugar syntaxes - -In Swift, if `x` is a pattern that matches values of type `T`, `x?` is a pattern -that matches values of type `Optional` that contain a value matching `x`. -However, if `x` is an expression of type `T`, `x?` will generally not be a valid -expression, so such patterns don't satisfy the mirroring requirement. 
However, -`x?` is purely syntactic sugar for `.some(x)`, which does satisfy the mirroring -requirement. I would tentatively recommend that we avoid introducing -pattern-specific sugar syntaxes for simplicity, but even if we do, by definition -it will always be possible to avoid using them in pattern functions. - -#### Avoidable inconsistencies - -Swift has an expression syntax `x as T`, which casts the value `x` to static -type `T`. Swift also has a pattern syntax `x as T`, which matches a -dynamically-typed value if its dynamic type is `T`, and its value matches `x`. -Although syntactically identical, these syntaxes don't mirror each other: in the -pattern form, `x` has static type `T`, whereas the expression form is useful -precisely when `x` does _not_ have static type `T`. For example, the pattern -`0 as Int` does not match the expression `0 as Int`, but does match the -expression `0 as Any`. However, this inconsistency doesn't seem necessary, -especially given that in Carbon, the variable binding syntax includes a -mandatory type, so the Carbon counterpart to Swift's `x as T` will presumably be -something like `T: x` or `T: x as Any`, depending on whether converting `T` to -`Any` requires an explicit cast, and that satisfies the mirroring requirement. - -### Worked example: `ArrayOfZeroes` - -The `OptionalPtr` example above relies on `ArrayOfZeroes`, a pattern function -that takes an integer `N` and returns an `Array(Byte, N)` whose elements are all -zero. 
It is possible to implement such a pattern function, but it requires some -extensions to Carbon's pattern language that may be surprising: - -``` -pattern ArrayOfZeroes(UInt:$$ N) -> Array(Byte, N) { - Array(Byte, N)(ListOfZeroes(N)); -} - -pattern ListOfZeroes(UInt:$$ N) -> List(Byte) { - return match (N) { - case 0 => { .Nil } - case (UInt: M)+1 => { .Cons(Byte(0), ListOfZeroes(M)); } - }; -} -``` - -This design requires Carbon to support `match` expressions inside patterns, as -discussed above in the section on underdetermined patterns. More surprisingly, -it requires the ability to use `+1` as a pattern matching operator on unsigned -integers. This is novel in my experience, but I hope the meaning of the code is -relatively clear. Furthermore, although it may seem ad-hoc, it's theoretically -well founded: we're effectively destructuring the unary representation of -counting numbers as `0+1+1+...+1`. In other words, we're treating `UInt` as -though it were a sum type, analogous to - -``` -closed choice UInt { - var Self:$$ Zero; - pattern Successor(Uint) -> Self; -} -``` - -We could even use a named `Successor` function in place of `+1`, if we wish to -avoid uncertainty about what kinds of arithmetic expressions can appear in -patterns. Whatever the syntax, the ability to use `+1` in patterns is a -fundamental building block for creating pattern functions that involve counting. - -This implementation also presupposes a type (called `List(Byte)` here) that can -be used to initialize an `Array`, and that supports pattern matching on the -empty list (spelled `.Nil` here) and on destructuring the first element from a -list (spelled `.Cons(first, rest)` here). These seem like reasonable -requirements for whatever type Carbon uses to represent array literals. If -Carbon uses tuples for that purpose, I expect an approach equivalent to the -above would still work, but I can't directly demonstrate that until we have a -design for variadics. 
- -### Alternatives considered - -#### Make alternatives be pattern functions implicitly - -Every function in an `alternatives` block must be a pattern function, and this -proposal has no need for pattern functions outside of `alternatives` blocks. -That being the case, we could eliminate the `pattern` keyword, and instead -specify that functions within an `alternatives` block have special semantics. - -I have opted not to do so here because I think it's somewhat clearer to mark -individual functions, and because the ability to define a pattern matching -shorthand seems likely to be useful outside of the context of sum types. For -example, proposal [0087](https://github.com/carbon-language/carbon-lang/pull/87) -suggests introducing an `NTuple` function as a built-in primitive in order to -support variadics, but given some plausible extensions, it could probably be -implemented as an ordinary pattern function in library code. +> **Open question:** How will user-defined sum types (and pattern matching in +> general) support name bindings that allow you to mutate the underlying object? ## Shareable storage @@ -797,97 +290,15 @@ not to be inhabited by an object. This means that in the general case, the compiler will not be able to generate safe default implementations for any special member functions of types that have -shareable storage members. However, it can do so if the shareable storage is -part of a sum type, because the `alternatives` block contains enough information -for it to infer what objects are present, by effectively pattern-matching on the -sum type object. - -In order for non-sum types to define those functions manually, the design for -shared storage will need to include operations for creating and destroying -objects within it, analogous to placement-`new` and pseudo-destructor calls in -C++. 
- -### Initialization - -The most challenging requirement that this approach imposes on the design of -shareable storage is that it must be possible to initialize it from an instance -of any of the values that it can represent, and to do so inside a pattern or -pattern function. - -This proposal represents that initialization using a `ToStorage` function call, -such as `.storage = ToStorage(success)` in the example above. However, the -semantics of this code may be somewhat surprising: it doesn't merely copy the -underlying bytes of `success` into `.storage`, it actually creates a new `T` -object within `.storage`, which is directly initialized from `success`. This -assumes that Carbon has some equivalent to C++'s "guaranteed RVO"; this code -cannot be understood as creating a temporary `Storage` array representing a `T` -value and then moving it into `.storage`, because it is not safe to move an -inhabited `Storage` storage array, as discussed above. - -`ToStorage` is presented here as a function, but it can't actually be -implemented as a function within the Carbon language. Recall that our motivating -use cases involve invoking `ToStorage` inside pattern functions, so it needs to -be a pattern function itself, assuming it's a function at all. If it were -implemented as Carbon code, it would need to consist of a single expression that -initializes a `Storage` from an object, and the whole problem we're trying to -solve here is to make it possible to write such an expression. Consequently, we -may wish to give it a different syntactic form. This operation is in many ways a -type cast, so the syntactic choice here will depend heavily on the syntax of -Carbon's other casts. - -At a deeper level, the problem is that the whole notion of allowing multiple -objects to successively share the same storage is inherently procedural (because -it involves changes in state over time), but pattern matching is fundamentally -descriptive, and hence functional. 
A design in which user-defined patterns can -be expressed in terms of procedural forward and reverse functions would avoid -this whole problem, by allowing us to express storing the alternative in terms -of procedural code (such as the equivalent of C++'s placement-`new`) rather than -initialization. - -#### Alternatives considered - -It is tempting to try to mitigate those problems by making the conversion -implicit, so that the code looks like `.storage = success`. However, this would -mean that `Storage` can be implicitly initialized from a value of any type, and -such extremely broad implicit conversions tend to be highly problematic. For -example, consider the definition of `.Null` in the `OptionalPtr` example: +shareable storage members. -``` -var Self:$$ Null = (.storage = ArrayOfZeros(Sizeof(Ptr(T)))); -``` +For purposes of illustration in this proposal, I will treat +`Storage(SizeT size, SizeT align)` as a library template representing an untyped +buffer of `size` bytes, aligned to `align`. It provides `Create`, `Read`, and +`Destroy` methods which create, access, and destroy an object of a specified +type within the buffer. -If `Storage` were implicitly convertible from any type, and that conversion -caused an object to be created within the `Storage`, this code would no longer -just set the bytes of `.storage` to zero: it would also actually create an -`Array(Byte)` object within `.storage`, which would then need to be subsequently -destroyed before `.storage` is destroyed, or used to store a pointer value. -There are various ways to finesse this issue, but they all involve adding -additional special-case rules in order to avoid or mitigate this consequence of -the general rules. - -We don't propose this approach because it doesn't really address the problems -with `ToStorage`; it merely obscures those problems, while introducing a new -problem that would itself require further work to obscure. 
However, it's worth -noting that the drawbacks of this approach are much less severe if we implement -shared storage using a union rather than an untyped byte array, because the -implicit conversion would not need to be universal. - -### Pattern matching evaluation order - -As discussed above, Carbon's pattern language must be restricted to operations -that the compiler can automatically invert. The inverse of creating an object of -type `T` in shareable storage is reading an object of type `T` out of shareable -storage. However, since shareable storage will not track the types or offsets of -the objects it contains, this inverse operation is safe only if the shared -storage is known to contain an object of type `T` at that offset. - -Consequently, pattern matching evaluation must occur in an order that guarantees -that a suitable object is present before it is loaded from shareable storage. -Correspondingly, the author of the code must structure the pattern match in such -a way that the compiler can find an appropriate evaluation order. - -At least for our motivating use cases, this appears to be intuitively -straightforward. 
Consider the `Result` example we saw earlier: +Using `Storage`, we can redefine `Result` as follows: ``` struct Result(Type:$$ T, Type:$$ Error) { @@ -896,21 +307,31 @@ struct Result(Type:$$ T, Type:$$ Error) { var Storage(Max(Sizeof(T), Sizeof(Error)), Max(Alignof(T), Alignof(Error))): storage; - closed alternatives { - pattern Success(T: value) -> Self { - return (.discriminator = 0, .storage = ToStorage(value)); - } - - pattern Failure(Error: error) -> Self { - return (.discriminator = 1, .storage = ToStorage(error)); - } + fn Success(T: value) -> Self { + var Self: result = (.discriminator = 0); + result.storage.Create(T, value); + return result; + } - var Self:$$ Cancelled = (.discriminator = 2); + fn Failure(Error: error) -> Self { + var Self: result = (.discriminator = 1); + result.storage.Create(Error, error); + return result; } + + var Self:$$ Cancelled = (.discriminator = 2); + + // Copy, move, assign, destroy, and similar operations need to be defined + // explicitly, but are omitted for brevity. } +``` -... +## User-defined pattern matching + +As seen in the example above, we want to allow pattern-matching on `Result` to +look like this: + +``` match (ParseAsInt(s)) { case .Success(var Int: value) => { return value; @@ -919,188 +340,188 @@ match (ParseAsInt(s)) { Display(error); } case .Cancelled => { + // We didn't request cancellation, so something is very wrong. Terminate(); } } ``` -When matching `ParseAsInt(s)` with the `Success` pattern, the compiler can -observe that no other alternative sets `.discriminator` to `0`, so it can safely -load a `T` value out of storage once it has successfully matched -`.discriminator` with `0`. +For this to work, `Result` needs to specify two things: + +- The set of all possible alternatives, including their names and parameter + types, so that the compiler can typecheck the `match` body, identify any + unreachable `case`s, and determine whether any `case`s are missing.
+- The algorithm that, given a `Result` object, determines which alternative is + present, and specifies the values of its parameters. -The `OptionalPtr` example is somewhat more subtle: +Here's how `Result` can do that under this proposal: ``` -struct OptionalPtr(Type:$$ T) { - var Storage(Sizeof(Ptr(T)), Alignof(Ptr(T))): storage; +struct Result(Type:$$ T, Type:$$ Error) { + var Int: discriminator; - alternatives { - pattern Value(Ptr(T): ptr) -> Self { return (.storage = ToStorage(ptr)); } - var Self:$$ Null = (.storage = ArrayOfZeros(Sizeof(Ptr(T)))); + var Storage(Max(Sizeof(T), Sizeof(Error)), + Max(Alignof(T), Alignof(Error))): storage; + + interface MatchContinuation { + fn Success(T: value); + fn Failure(Error: error); + fn Cancelled(); } + + impl Matchable(MatchContinuation) { + method Match[MatchContinuation:$ Continuation]( + Ptr(Self): this, Ptr(Continuation): continuation) { + match (this->discriminator) { + case 0 => { + continuation->Success(this->storage.Read(T)); + } + case 1 => { + continuation->Failure(this->storage.Read(Error)); + } + case 2 => { + continuation->Cancelled(); + } + default => { + Assert(false); + } + } + } + } + + // Success() and Failure() factory functions, and the Cancelled static + // constant, are defined as above. + + // Copy, move, assign, destroy, and similar operations need to be defined + // explicitly, but are omitted for brevity. } ``` -In this case, the compiler can observe that since the `alternatives` block is -required to be exhaustive, `.storage` must hold either a `T` value or an -all-zeros bit pattern, so it can safely attempt to match the `.Value` case only -after it has ruled out the `.Null` case, which it can evaluate unconditionally -because that case only requires loading raw bytes, which is always safe.
-Consequently, we may want to require that a pattern match that looks for -`.Value` must also have an explicit `.Null` case (rather than a generic -`default` case), so as to make it clear to the reader that the `.Null` case is -being evaluated. If Carbon specifies that pattern matching evaluates cases in -order, we would presumably also require that the `.Null` case is above the -`.Value` case. - -We will need to formalize the intuition behind those examples, in the form of -concrete language rules that are strict enough for the compiler to feasibly -perform that sort of symbolic reasoning, and yet permissive enough that type -authors can feasibly understand and follow them, and the compiler can produce -intelligible error messages when they are violated. Furthermore, it would be -strongly preferable if the compiler could detect any errors purely by inspecting -the type definition, rather than delaying them until the type is actually used -in pattern matching. - -The design of those rules is deferred to a subsequent proposal. However, one key -requirement is already clear: the compiler must be able to identify all of the -type's alternatives, and know that they exhaustively describe all possible -values of the type. This is a key reason for introducing the `alternatives` -block. For similar reasons, we may also need to introduce some syntax for -marking a single pattern function as being exhaustive (`MakeUnique`, for -example), but such use cases are out of scope for this document because they -don't involve sum types. - -## The `alternatives` block - -An `alternatives` block designates a set of pattern functions and static -constants as representing an exhaustive and unambiguous set of alternatives for -the type. By defining an `alternatives` block, the type author is guaranteeing -that every possible value of the type matches exactly one of the alternatives. 
Consequently, the design of this feature will need to determine how Carbon -handles violations of this guarantee without compromsing Carbon's safety goals. - -Although alternatives are always exhaustive as far as the language semantics are -concerned, by default user code will be required to treat them as -non-exhaustive. For example, a `match` statement that has cases for the -alternatives of a sum type will be required to have a `default` case, even if it -also has patterns that match all of the declared alternatives. This ensures that -sum types can be extended with new alternatives without breaking any existing -code. - -However, code in the same -[library](/docs/design/code_and_name_organization/#libraries) as the choice type -is not subject to this restriction. Requiring a `default` has little benefit for -that code, because it can easily be updated when a new alternative is added, -without creating any version-skew concerns. Conversely, requiring a `default` -would have higher costs: code that's part of the sum type's API is more likely -to be required to explicitly handle every alternative (consider, for example, an -`Unparse` method on a sum type representing a parse tree). When that's the case, -omitting a `default` provides a build-time guarantee that every alternative has -been handled. - -Declaring the alternatives with `closed alternatives` rather than `alternatives` -allows all code to treat the alternatives as exhaustive. +In this code, `Result` makes itself available for use in pattern matching by +declaring that it implements the `Matchable` interface. `Matchable` takes an +interface argument, `MatchContinuation` in this case, which specifies the set of +possible alternatives by declaring a method for each one. I'll call that +argument the _continuation interface_, for reasons that are about to become +clear. + +The heart of the `Matchable` interface is its `Match` method.
This method takes two +parameters: the value being matched against, and an instance of the continuation +interface, which the compiler generates from the `match` expression being +evaluated, with method bodies that correspond to the bodies of the corresponding +`case`s. Once `Match` has determined which alternative is present, and the +values of its parameters, it invokes the corresponding method of the +continuation object. `Match` is required to invoke exactly one continuation +method, and to do so exactly once. -### Alternatives considered +This proposal assumes that Carbon will have support for defining and +implementing generic interfaces, including interfaces that take interface +parameters, and uses `interface`, `impl`, `method` etc. as **placeholder** +syntax. It can probably be revised to work if interfaces can't be parameterized +that way, or if we don't have a feature like this at all, but it might be +somewhat more awkward. -#### More concise syntax +Notice that the names `Success`, `Failure`, and `Cancelled` are defined twice, +once as factory functions of `Result` and once as methods of +`MatchContinuation`, with the same parameter types in each case. The two +effectively act as inverses of each other: the factory functions compute a +`Result` from their parameters, and the methods are used to report the parameter +values that compute a given `Result`. This mirroring between expression and +pattern syntax is ultimately a design choice by the type author; there is no +language-level requirement that the alternatives correspond to the factory +functions. -We could introduce a special syntax for declaring alternatives, rather than -using the existing syntaxes for pattern functions and static variables. Such a -syntax could be substantially more concise, because it could omit the components -of a function/variable declaration that are just boilerplate in the context of a -set of alternatives. 
For example, the alternatives of the `Result(T, Error)` -struct might look like this: +This approach can be extended to support non-sum patterns as well. For example, +a type that wants to match a tuple-shaped pattern like `(Int: i, String: s)` +could define a continuation interface like ``` -closed alternatives { - Success(T: value) { ... } - Failure(Error: error) { ... } - Cancelled = (.discriminator = 2); +interface MatchContinuation { + fn operator()(Int: i, String: s); } ``` -The same simplifications could be applied to `choice` types, so that for example -the `choice` version of `Result` could look like this: +A type's continuation interface can also include a function named +`operator default`, which implements the `default` case of the `match`. +Consequently, any `match` on that type will be required to have a `default` +case. This is valuable because it protects the type author's ability to add new +alternatives in the future, without causing build failures in client code. + +> **Open question:** Can `Match` actually invoke `operator default`? It might +> sometimes be useful to define a type that can't be matched exhaustively, and +> the mere possibility that the `default` case could actually run might help +> encourage client code to implement it robustly, rather than blindly providing +> something like `Assert(False)`. However, if there's a language-level guarantee +> that the `default` case is unreachable if all other alternatives are handled, +> then we can allow code in the same library as the type to omit the `default` +> case. That's desirable because code in the same library doesn't pose an +> evolutionary risk, and it's often valuable to have a build-time guarantee that +> code within the type's own API explicitly handles every case. + +Note that the process of translating a pattern-matching operation into a `Match` +call may require special overload resolution rules. 
Overload resolution is +normally driven by the types of the arguments, but the concrete type of the +argument to `Match` may not be known until quite late in the compilation +process, because it could depend on the details of code generation. Instead, +overload resolution will have to be driven by some more abstract representation +of the "shape" of the callsite, such as a dummy type with the same interface as +the actual argument, but potentially different implementation details. However, +this might not be visible to users so long as the continuation parameter is +passed as a generic rather than a template (`:$` rather than `:$$`). + +## Choice types + +To allow users to define sum types without micromanaging the implementation +details, I propose introducing `choice` as a convenient syntax for defining a +sum type by specifying only the declarations of the set of alternatives. From +that information, the compiler generates an appropriate object representation, +and synthesizes definitions for the alternatives and special member functions. + +Our manual implementation of `Result` above doesn't really benefit from having +direct control of the object representation, and doesn't seem to have any +additional API surfaces, so it's well-suited to being defined as a choice type +instead: ``` choice Result(Type:$$ T, Type:$$ Error) { - Success(T); - Failure(Error); - Cancelled; + Success(T: value), + Failure(Error: error), + Cancelled } ``` -However, this brevity would come at the cost of consistency: there would now be -two structurally different syntaxes for declaring functions, and two -structurally different syntaxes for declaring static constants. - -#### Marking alternatives individually - -Rather than making `alternatives` a separate block, we could define a syntax for -marking alternatives individually. If we choose not to support pattern functions -that aren't alternatives, this syntax could be an introducer that takes the -place of `pattern`. 
This would also synergize well with the option of having a -more concise syntax for alternatives, which otherwise may have problems with the -lack of an introducer. - -The primary drawback of this approach is that the lack of explicit grouping -could make it more difficult for readers (and perhaps even the compiler) to know -when all the alternatives have been enumerated. It would also mean that we can't -support the admittedly unlikely use case of defining a sum type that has -multiple complete sets of alternatives. - -#### Different syntax for `closed` - -The `closed` syntax should be considered little more than a placeholder. It's -somewhat unconventional for a modifier to come before an introducer, as with -`closed alternatives`. Reversing the order would fix that problem, but -`alternatives closed` reads quite awkwardly as English. Making `closed` a -keyword would prevent developers from using `closed` as an identifier, which may -be too high a cost for such a niche use case. We could fix that by making it an -attribute rather than a keyword, but it's not clear that Carbon will have -attributes, much less what the syntax would be. - -It may be surprising that there is no corresponding `open alternatives` syntax, -but `open` would be meaningless syntactic noise unless we made it mandatory, and -making it mandatory would be poor ergonomics, because developers would be forced -to make an up-front decision between the two, rather than relying on a safe -default. Furthermore, reserving `open` as a keyword seems even more problematic -than reserving `closed`. - -## `choice` - -A choice type definition has the same general form as an `alternatives` block, -except that: - -- It has `choice` and the type name in place of `alternatives`. -- It need not (and usually won't) be inside a `struct` definition, because it - defines a new type, rather than specifying part of an enclosing type - definition. 
-- Its members cannot have definitions, because those definitions will be - provided by the compiler. - -The expected implementation of a `choice` type will be very similar to the -`struct` version of `Result` shown earlier, with a discriminator field and a -storage buffer large enough to hold the argument values of the alternatives. Any -alternative parameter types that are incomplete (or have unknown size for any -other reason) will be represented using owning pointers; among other things, -this will allow users to define recursive choice types. The implementation will -be hidden, of course, and the compiler may be able to generate better code, but -we will design this feature to support at least that baseline implementation -strategy. - -One consequence is that although the pattern functions of a choice type can be -overloaded (as in the `Variant` example above), they cannot be templates. More +The body of a `choice` type definition consists of a comma-separated list of +alternatives. These have the same syntax as a function declaration, but with +`fn` and the return type omitted. If there are no arguments, the parentheses may +also be omitted, as with `Cancelled`. `default` may also be included in the +list, with the same meaning as `operator default` in a continuation interface: +it means that pattern-matching operations on this type must be prepared to +handle alternatives other than those explicitly listed. + +The choice type will have a static factory function corresponding to each of the +alternatives with parentheses, and a static data member corresponding to each +alternative with no parentheses. The choice type will also implement the +`Matchable` interface, and its continuation interface will have a method +corresponding to each alternative. + +In short, this definition of `Result` as a choice type will have the same +semantics as the earlier definition of it as a struct. 
It will probably also +have the same implementation, with a discriminator field and a storage buffer +large enough to hold the argument values of the alternatives. Any alternative +parameter types that are incomplete (or have unknown size for any other reason) +will be represented using owning pointers; among other things, this will allow +users to define recursive choice types. The implementation will be hidden, of +course, and the compiler may be able to generate better code, but we will design +this feature to support at least that baseline implementation strategy. + +One consequence is that although the alternatives of a choice type can be +overloaded (as in the `Variant` example below), they cannot be templates. More precisely, the parameter types of an alternative must be fixed without knowing the values of any of the arguments. To see why, consider a choice type like the following, which attempts to emulate `std::any`: ``` choice Any { - pattern Value[Type:$$ T](T value); + Value[Type:$$ T](T: value) } ``` @@ -1116,7 +537,7 @@ benefit: these sorts of types appear to be rare, and when needed they should be implemented in library code, where the performance tradeoffs are explicit and under programmer control. -If may be possible to relax this restriction when and if we have a design for +It may be possible to relax this restriction when and if we have a design for supporting non-fixed-size types, although it's worth noting that even that would not give us a way for `Any` to support assignment. @@ -1136,28 +557,20 @@ place of the member types. However, there are a couple of special cases: A future proposal for this mechanism will need to consider whether to require an explicit opt-in to generate these operations. -The compiler-generated definitions of a choice type's alternatives are -unspecified, except that they will satisfy the semantic requirements that apply -to all alternatives.
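For readers who think in C++ terms, the baseline strategy described above (a discriminator field, an untyped storage buffer, and a `Match` method that reports the active alternative to a continuation object) can be approximated as a hand-written tagged union. This is a hypothetical illustration only, not part of the proposal and not what a Carbon compiler would emit. `Storage` mimics the placeholder `Storage(size, align)` template using placement-`new`, `std::launder`, and explicit destructor calls. `MaxOf`, `DescribeContinuation`, and `Describe` are invented names. `Cancelled` is a factory function rather than a static constant, and copying and assignment are left out, as in the Carbon version.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>
#include <utility>

// Hypothetical analogue of the proposal's placeholder Storage(size, align):
// an untyped, aligned byte buffer that does not record what it holds.
template <std::size_t Size, std::size_t Align>
struct Storage {
  alignas(Align) unsigned char bytes[Size];

  // Creates a U inside the buffer (placement new).
  template <typename U, typename... Args>
  void Create(Args&&... args) {
    static_assert(sizeof(U) <= Size && alignof(U) <= Align, "U does not fit");
    ::new (static_cast<void*>(bytes)) U(std::forward<Args>(args)...);
  }
  // Accesses the U in the buffer; valid only while a U actually lives there.
  template <typename U>
  U& Read() { return *std::launder(reinterpret_cast<U*>(bytes)); }
  // Destroys the U in the buffer (explicit destructor call).
  template <typename U>
  void Destroy() { Read<U>().~U(); }
};

constexpr std::size_t MaxOf(std::size_t a, std::size_t b) { return a > b ? a : b; }

// Counterpart of the proposal's hand-written struct Result(T, Error).
template <typename T, typename Error>
class Result {
 public:
  static Result Success(T value) {
    Result result(0);
    result.storage_.template Create<T>(std::move(value));
    return result;
  }
  static Result Failure(Error error) {
    Result result(1);
    result.storage_.template Create<Error>(std::move(error));
    return result;
  }
  // A factory function stands in for the proposal's static constant.
  static Result Cancelled() { return Result(2); }

  // Moving re-creates the active alternative's object in the new buffer.
  Result(Result&& other) : discriminator_(other.discriminator_) {
    if (discriminator_ == 0) {
      storage_.template Create<T>(std::move(other.storage_.template Read<T>()));
    } else if (discriminator_ == 1) {
      storage_.template Create<Error>(
          std::move(other.storage_.template Read<Error>()));
    }
  }
  Result& operator=(Result&&) = delete;  // Assignment omitted for brevity.
  ~Result() {
    if (discriminator_ == 0) storage_.template Destroy<T>();
    if (discriminator_ == 1) storage_.template Destroy<Error>();
  }

  // Determines which alternative is present and invokes exactly one method of
  // the continuation, exactly once, passing that alternative's value.
  template <typename Continuation>
  void Match(Continuation& continuation) {
    switch (discriminator_) {
      case 0: continuation.Success(storage_.template Read<T>()); break;
      case 1: continuation.Failure(storage_.template Read<Error>()); break;
      case 2: continuation.Cancelled(); break;
      default: assert(false && "corrupt discriminator");
    }
  }

 private:
  explicit Result(int discriminator) : discriminator_(discriminator) {}
  int discriminator_;
  Storage<MaxOf(sizeof(T), sizeof(Error)), MaxOf(alignof(T), alignof(Error))>
      storage_;
};

// Hand-written stand-in for the continuation a compiler might generate from
// a `match`; each method plays the role of one case body.
struct DescribeContinuation {
  std::string text;
  void Success(int value) { text = "success " + std::to_string(value); }
  void Failure(const std::string& error) { text = "failure: " + error; }
  void Cancelled() { text = "cancelled"; }
};

std::string Describe(Result<int, std::string> result) {
  DescribeContinuation continuation;
  result.Match(continuation);
  return continuation.text;
}
```

Nothing in this C++ sketch enforces the requirement that `Match` invokes exactly one continuation method exactly once, and each call site has to spell out its continuation type by hand. In the proposal, the continuation is generated from the `case` bodies, and the continuation interface is what lets the compiler detect missing or unreachable cases.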
- ### Alternatives considered #### Separate support for enumerated types This proposal supports enumerated types as a special case of choice types. However, there may be some benefit to providing special-case support for -enumerated types, similar to C++'s `enum`. In particular: - -- We could more easily avoid the `var Self:$$` boilerplate in enumerator - declarations. -- We could allow the developer to specify an underlying type, associate a - specific value of the underlying type with each enumerator, and convert - between the enum and the underlying type, which are fairly common practices - in C++ code. When using choice types, these practices can be emulated by - defining functions that map between the choice type and the underlying type, - but that requires a substantial amount of error-prone boilerplate. - Furthermore, those functions can't reliably be no-ops at the hardware level, - the way they can be with C++ enums. +enumerated types, similar to C++'s `enum`. In particular, we could allow the +developer to specify an underlying type, associate a specific value of the +underlying type with each enumerator, and convert between the enum and the +underlying type, which are fairly common practices in C++ code. When using +choice types, these practices can be emulated by defining functions that map +between the choice type and the underlying type, but that requires a substantial +amount of error-prone boilerplate. Furthermore, those functions can't reliably +be no-ops at the hardware level, the way they can be with C++ enums. I am omitting that from this proposal for simplicity, since it's purely additive, and not necessary for the goals of this proposal. @@ -1170,45 +583,6 @@ values being explicitly enumerated in the code. I chose the spelling `choice` because "choice type" is one of the only available synonyms for "sum type" that doesn't have any potentially-misleading associations. 
-#### Extend `alternatives` instead - -Instead of introducing `choice`, we could extend `alternatives` with a syntax -for requesting that the definitions be synthesized rather than provided by the -user. For example, perhaps the abbreviated definition of `Result(T, Error)` -could be written as: - -``` -struct Result(Type:$$ T, Type:$$ Error) { - alternatives { - pattern Success(T: value) -> Self; - pattern Failure(Error: error) -> Self; - var Self:$$ Cancelled; - } = default; -} -``` - -This approach is somewhat more flexible, because it would permit the type owner -to give `Result` additional member functions without forcing them to supply all -the boilerplate associated with the fully handwritten type definition. However, -it still wouldn't be possible to give `Result` additional data members, because -the generated code for the alternatives would have no way to initialize them, so -structs containing an `alternatives` block would be starkly different from ones -that don't. On a related note, this syntax would be somewhat misleading, because -the compiler would be synthesizing not only the definitions of the alternatives, -but also the special member functions and data members. - -This approach would also be more verbose, especially vertically, which will be -especially noticeable since these types will probably be quite small in the -common case. - -Alternatively, we could unify the two keywords by eliminating `alternatives` and -using `choice` in its place, with the presence of a name acting to distinguish a -`choice` block from a `choice` type. However, it seems liable to be confusing -for the presence or absence of a name to trigger such stark differences in -semantics. Note also that unlike the previous option, this approach doesn't -allow users to extend the type with additional methods without losing compiler -generation of the alternatives and special member functions. 
- ## Alternatives considered ### Indexing by type @@ -1220,18 +594,28 @@ alternative to have a distinct type. With this approach, which I'll call closely resemble C++'s `std::variant`, rather than Swift and Rust's `enum` or the sum types of various functional programming languages. -Either approach can be emulated in terms of the other: the `Variant` example -above shows how we can use overloading to emulate type-indexing in our -name-indexed framework, and conversely a type-indexed type like `std::variant` -can model a name-indexed type like `Result(T,E)` by introducing a wrapper type -for each name, leading to something like -`std::variant<Value<T>, Error<E>, Cancelled>` (note that `std::variant<T, E>` -would not work, because `T` and `E` can be the same type). In either case, -emulating the other model introduces some syntactic overhead: with -name-indexing, `Variant`'s factory functions must be given a name (`Value`) even -though it doesn't really convey any information, and emulating `Result(T,E)` in -terms of type-indexing requires separately defining the wrapper templates -`Value` and `Error`. +Either approach can be emulated in terms of the other. 
For example, we don't yet +have enough of a design for variadics to give an example of a Carbon counterpart +for `std::variant`, but a variant with exactly three alternative types +could be written like so: + +``` +choice Variant(Type:$$ T1, Type:$$ T2, Type:$$ T3) { + pattern Value(T1: value) -> Self; + pattern Value(T2: value) -> Self; + pattern Value(T3: value) -> Self; +} +``` + +Conversely a type-indexed type like `std::variant` can model a name-indexed type +like `Result(T,E)` by introducing a wrapper type for each name, leading to +something like `std::variant<Value<T>, Error<E>, Cancelled>` (note that +`std::variant<T, E>` would not work, because `T` and `E` can be the same type).
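The wrapper-type emulation of a name-indexed `Result(T,E)` in terms of `std::variant` can be sketched in C++17 as follows. This is a minimal illustration, not part of the proposal: `Value`, `Error`, `Cancelled`, and `Describe` are hypothetical names, and only `std::variant` and `std::visit` are standard facilities.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical wrapper templates: giving each alternative its own named type
// keeps the alternatives distinct even when T and E are the same type.
template <typename T>
struct Value {
  T value;
};

template <typename E>
struct Error {
  E error;
};

struct Cancelled {};

// Name-indexed Result(T, E), emulated with the type-indexed std::variant.
template <typename T, typename E>
using Result = std::variant<Value<T>, Error<E>, Cancelled>;

// With a plain std::variant<int, int>, std::get<int> and
// std::holds_alternative<int> would be ill-formed; the wrappers stay distinct.
static_assert(!std::is_same_v<Value<int>, Error<int>>);

// Dispatches on the active wrapper type, i.e. on the alternative's "name".
inline std::string Describe(const Result<int, int>& result) {
  return std::visit(
      [](const auto& alt) -> std::string {
        using Alt = std::decay_t<decltype(alt)>;
        if constexpr (std::is_same_v<Alt, Value<int>>) {
          return "success: " + std::to_string(alt.value);
        } else if constexpr (std::is_same_v<Alt, Error<int>>) {
          return "failure: " + std::to_string(alt.error);
        } else {
          return "cancelled";
        }
      },
      result);
}
```

The syntactic overhead described in the text shows up here as the `Value<int>{...}` and `Error<int>{...}` wrappers needed at every construction site, for example `Describe(Result<int, int>{Error<int>{7}})`.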
+In either case, emulating the other model introduces some syntactic overhead: +with name-indexing, `Variant`'s factory functions must be given a name (`Value`) +even though it doesn't really convey any information, and emulating +`Result(T,E)` in terms of type-indexing requires separately defining the wrapper +templates `Value` and `Error`. The distinction between these two models of sum types seems analogous the distinction between the tuple and struct models of product types. Tuples and @@ -1252,221 +636,12 @@ cases for tuples appear to be quite rare compared to use cases for structs. Consequently, if Carbon has only one form of sum types, it should probably be the name-indexed form, as proposed here. -> TODO: We should consider ways of minimizing or avoiding the burden of -> boilerplate factory names like `Value` for type-indexed use cases. - -### Pattern matching callbacks - -Rather than requiring the compiler to automatically invert a set of factory -functions in order to use them as patterns, we could allow types to specify how -they participate in pattern matching by supplying explicit code for both -directions. In this approach, the method that implements the reverse direction -would receive a set of continuations representing the different branches of the -match, and it would be responsible for choosing which one to execute. 
Here's -what that might look like, again revisiting our `Result` example: - -``` -struct Result(Type:$$ T, Type:$$ Error) { - var Int: discriminator; - - var Storage(Max(Sizeof(T), Sizeof(Error)), - Max(Alignof(T), Alignof(Error))): storage; - - fn Success(T: value) -> Self { - Self result = (.discriminator = 0); - result.storage->Create(T, value); - return result; - } - - fn Failure(Error: error) -> Self { - Self result = (.discriminator = 1); - result.storage->Create(Error, error); - return result; - } - - var Self:$$ Cancelled = (.discriminator = 2); - - interface MatchContinuation { - fn Success(lval T: value); - fn Failure(lval Error: error); - fn Cancelled(); - } - - impl Matchable(MatchContinuation) { - method Match[MatchContinuation:$ Continuation]( - Ptr(Self): this, Ptr(Continuation): continuation) { - match (discriminator) { - case 0 => { - continuation->Success(this->storage.Read(T)); - } - case 1 => { - continuation->Failure(this->storage.Read(Error)); - } - case 2 => { - continuation->Cancelled(); - } - default => { - Assert(false); - } - } - } - } - - // Copy, move, assign, destroy, and similar operations need to be defined - // explicitly, but are omitted for brevity. -} -``` - -In this code, `Result` makes itself available for use in pattern matching by -declaring that it implements the `Matchable` interface with a given continuation -interface. That interface, `MatchContinuation` in this case, tells the compiler -what kinds of patterns can be matched against this type, so that it can -typecheck the `match` expression. The compiler then invokes `Match` with a -concrete continuation object containing the actual code that implements the -different branches of the `match` expression, and `Match` invokes the -appropriate continuation. - -This proposal assumes that Carbon will have support for defining and -implementing generic interfaces, including interfaces that take interface -parameters, and uses `interface`, `impl`, `method` etc. 
as **placeholder** -syntax. It can probably be revised to work if interfaces can't be parameterized -that way, or if we don't have a feature like this at all, but it might be -somewhat more awkward. - -`lval T: value` is a **placeholder** syntax that indicates that this parameter -can be bound either in-place or by value when pattern matching against `Result`; -omitting the `lval` introducer would indicate that the parameter can only be -bound by value. The "`lval`" spelling is intended to pair with the -`ref Type: value` spelling for opting into in-place semantics in a pattern, and -is motivated by analogy with C++'s concept of an "lvalue": `ref` patterns can -only bind to `lval` parameters, just as non-const references can only bind to -lvalues in C++, because they represent durable objects rather than ephemeral -values. If we choose a different in-place syntax for patterns, we could -presumably find a corresponding syntax to use here. - -With this approach, we no longer have pattern functions, or `alternatives` -blocks: factories like `Success` are entirely ordinary Carbon functions, with no -special restrictions. For example, we can now create objects inside `.storage` -procedurally, via a `Create` method. However, the price is that the factory -functions can no longer be used in patterns. - -Consequently, the names `Success`, `Failure`, and `Cancelled` are defined twice, -once as factory functions of `Result` and once as member functions of -`MatchContinuation`, with the same arity and argument types in each case. Unlike -in the primary proposal, this correspondence is a design choice by the type -author; there is no language-level requirement that the alternatives correspond -to the factory functions. Consequently, there is no requirement that the factory -functions are exhaustive. There is a requirement that the interface methods are -mutually exclusive, but only in the sense that `Match` is required to call -exactly one of them. 
- -This approach can be extended to support non-sum patterns as well. For example, -a type that wants to match a tuple-shaped pattern like `(Int: i, String: s)` -could define a continuation interface like - -``` -interface MatchContinuation { - fn operator()(Int: i, String: s); -} -``` - -A type's continuation interface can also include a function named -`operator default`, which implements the `default` case of the `match`. -Consequently, any `match` on that type will be required to have a `default` -case, so this has roughly the same effect as omitting `closed` from the -`alternatives` block in the primary proposal. - -> **Open question:** Can `Match` actually invoke `operator default`? It might -> sometimes be useful to define a type that can't be matched exhaustively, and -> the mere possibility that the `default` could actually run might help -> encourage client code to implement it robustly, rather than blindly providing -> something like `Assert(False)`. However, if there's no language-level -> guarantee that the `default` case is unreachable if all other alternatives are -> handled, then we won't be able to let same-library code omit the `default`. - -This approach has a number of advantages: - -- Types can use the full power of the Carbon language when expressing how they - participate in pattern matching. In particular, this means `Storage` doesn't - need a complex initialization API, because its state can be set - procedurally. More broadly, it's much less likely that a type will be unable - to participate in pattern matching because of limitations of the language. -- Patterns are allowed to be underconstrained. This means Carbon can more - easily include operations like `|` in its pattern language, and it means - types can make parts of the object state invisible to pattern matching (for - example, non-salient state like the capacity of a `std::vector`). 
-- The language rules are much simpler, and easier to explain, because we don't - need to specify both "forward" and "reverse" semantics for a subset of the - language, or specify the boundaries of that subset. Relatedly, it provides a - simpler correspondence between the Carbon code and the generated assembly, - which could substantially simplify things like debugging. - -However, it also has some substantial drawbacks: - -- Manually-defined sum types will probably require dramatically more code, - both because the special member functions cannot be generated automatically, - and because the "forward" and "reverse" directions both require explicit - code. This additional verbosity not only makes these types more tedious to - write and read, it also creates a risk of bugs in the code that's being - written by fallible humans instead of the compiler. The duplication of names - may also create readability problems. -- This approach only allows programmers to extend pattern matching to support - new _types_, not new _operations_. For example, it doesn't directly give us - a way to define `ArrayOfZeroes(N)` (see above) as an operator that can be - used in patterns. However, it's possible that this feature could be extended - to support user-defined pattern operations, by treating them as comparable - to `Matchable.Match` except that they must be invoked by name rather than - being invoked implicitly. -- The process of translating a pattern-match into a `Match` call may require - special overload resolution rules. Overload resolution is normally driven by - the types of the arguments, but the concrete type of the argument to `Match` - may not be known until quite late in the compilation process, because it - could depend on the details of code generation. 
Instead, overload resolution - will have to be driven by some more abstract representation of the "shape" - of the callsite, such as a dummy type with the same interface as the actual - argument, but potentially different implementation details. However, this - might not be visible to users so long as the continuation parameter is - passed as a generic rather than a template (`:$` rather than `:$$`). -- Manually-defined sum types default to being closed, rather than open. This - creates some risk that sum type authors will accidentally lock themselves - out of the ability to add new alternatives. - -One drawback is worth discussing in more depth: Carbon's parameter-passing -semantics might not be expressive enough to support this approach. The current -design is expressed in terms of a restricted form of pass-by-reference, but -there is substantial resistance to supporting any form of reference-like -parameter passing in Carbon. However, it is not clear how a purely pass-by-value -approach could replace the pass-by-reference placeholder design while retaining -its most important properties, such as: - -- The interface definition makes the structure of the sum type clearly legible - to the reader, and makes it easy for the reader to disregard the distinction - between parameters that do and do not support in-place binding if it's not - relevant to them. -- The code for a by-value binding doesn't depend on whether an in-place - binding would also be supported. -- The `Match` function's behavior depends solely on the value of the object, - and not on any characteristics of the pattern, such as whether any given - parameter is being bound in-place. Correspondingly, there's no need for a - mechanism to convey information about the pattern to `Match`. 
- -However, there are a number of other problems that Carbon will need to solve if -we want to avoid supporting pass-by-reference, and it's quite possible that -solving those problems will naturally solve these as well, or at least get us -closer to a solution. However, at this point that's no more than speculation, so -this approach carries a risk that we will eventually be forced to either abandon -it, or accept some form of pass-by-reference in Carbon. - -> **Open question:** Does Carbon need to allow user-defined pattern matching -> code to optimize based on the presence of wildcards, and if so, how? In -> theory, `Match` might sometimes be able to avoid substantial amounts of work -> if it knows about wildcards in the pattern, because then it can supply them -> with dummy data rather than having to compute correct values. This is likely -> to be especially valuable if Carbon supports list patterns and variable-length -> wildcards (like Rust's `..`): most list-like types could in principle -> determine whether they match `{1, 2, ..}` in constant time, rather than pass -> their entire contents into the continuation, but the approach described so far -> doesn't allow them to implement such an optimization. +> **Open question:** Should Carbon have a native syntax for pattern matching on +> the dynamic type of an object? If so, should types like `Variant` be able to +> use it, instead of having the `.Value` boilerplate in every pattern? Should +> this mechanism be aware of subtype relationships (so that a subtype pattern is +> a better match than a supertype pattern)? If so, how are those subtype +> relationships defined? 
### Pattern matching proxies @@ -1481,9 +656,9 @@ to the `Result` example: struct Result(Type:$$ T, Type:$$ Error) { // Data members, factories, and special members same as above - closed choice Choice { - Success(ref T), - Failure(ref Error), + choice Choice { + Success(T: value), + Failure(Error: error), Cancelled } @@ -1506,18 +681,9 @@ struct Result(Type:$$ T, Type:$$ Error) { } ``` -This approach has many of the same tradeoffs as the callback-based approach: it -substantially simplifies the language, and gives much greater freedom to type -authors, but obliges them to write substantially more code. It also has some -advantages over the callback-based approach: - -- It's somewhat simpler, because it uses return values instead of - continution-passing -- It could generalize more easily to allow things like types that can match - list patterns (if Carbon has those). -- Since choice types are no longer syntactic sugar for structs with factory - functions, their syntax no longer needs to mirror the syntax of function - declarations. +This approach is somewhat simpler, because it uses return values instead of +continuation-passing. It could also generalize more easily to allow things like +user-defined types that can match list patterns (if Carbon has those). However, it also has several significant drawbacks: @@ -1533,18 +699,140 @@ However, it also has several significant drawbacks: - It may be somewhat less efficient, because it requires instantiating the enclosing `Choice` type (presumably including a discriminator field), rather than merely passing the appropriate alternative into the continuation. -- Not only might it require Carbon to support pass-by-reference, it comes very - close to requiring Carbon to support reference _types_, which are even more - contentious and problematic.
This is already somewhat evident in the way - `Choice` is defined, but becomes much clearer when this mechanism is used - for product types, where allowing mutable bindings would require - `operator match` to return a tuple of references. - The risk of confusion due to duplicate names is somewhat greater: a naive reader might think that, for example, `.Success` in the body of `operator match` refers to `Self.Success` rather than `Self.Choice.Success`. Relatedly, there's some risk that the author may omit the leading `.`, and thereby invoke `Self.Success` instead of `Self.Choice.Success`. This will probably fail to build, but the errors may be confusing. +- Extending this approach to support in-place mutation during pattern matching + is likely to require Carbon to support reference _types_, whereas the + primary proposal would probably only require reference _patterns_, which are + substantially less problematic. This is a consequence of using a return type + rather than a set of callback parameters (which are patterns) to define the + type's pattern matching interface. I think these drawbacks decisively outweigh the advantages, so I do not recommend this approach. + +### Pattern functions + +Rather than require user types to define both the pattern-matching logic and the +factory functions, with the expectation that they will be inverses of each +other, we could instead enable them to define the set of alternatives as factory +functions that the compiler can invert automatically. This approach, like the +primary proposal, would consist of several parts. + +A _pattern function_ is a function that can be invoked as part of a pattern, +even with arguments that contain placeholders. Pattern functions use the +introducer `pattern` instead of `fn`, and can only contain the sort of code that +could appear directly in a pattern. This lets us define reusable pattern +syntaxes that can do things like encapsulate hidden implementation details of +the object they're matching. 
+ +Next, we would introduce the concept of an `alternatives` block, which groups +together a set of factory functions and designates them as a set of +alternatives. They are required to be both exhaustive and unambiguous, meaning +that for any possible value of the type, it must be possible to obtain it as the +return value of exactly one of the alternatives. Alternatives that take no +arguments, which represent singleton states such as `Cancelled`, can instead be +written as static constants. An `alternatives` block can be marked `closed`, +which plays the same role as omitting `operator default` in the primary +proposal: it indicates that client code need not be prepared to accept +alternatives other than the ones currently present. + +As with the primary proposal, `Storage` is used to represent a span of memory +that the user can create objects within. However, with this approach we also +need it to support initialization from a `ToStorage` factory function, because +pattern functions can't contain procedural code. Note that for the same reason, +`ToStorage` will probably need to be a language intrinsic, or implemented in +terms of one. + +Using these features, the `Result(T, Error)` example can be written as follows: + +``` +struct Result(Type:$$ T, Type:$$ Error) { + var Int: discriminator; + + var Storage(Max(Sizeof(T), Sizeof(Error)), + Max(Alignof(T), Alignof(Error))): storage; + + closed alternatives { + pattern Success(T: value) -> Self { + return (.discriminator = 0, .storage = ToStorage(value)); + } + + pattern Failure(Error: error) -> Self { + return (.discriminator = 1, .storage = ToStorage(error)); + } + + var Self:$$ Cancelled = (.discriminator = 2); + } +} +``` + +Notice that with this approach, we do not need to define the special member +functions of `Result` manually. The compiler can infer appropriate definitions +in the same way that it infers how to invert these functions during pattern +matching. 
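To illustrate the special member functions that would otherwise have to be written by hand, here is a hypothetical C++17 counterpart of `Result(T, Error)` whose alternatives share one untyped buffer. It is a sketch under stated assumptions: an aligned byte array stands in for `Storage`, placement `new` stands in for `ToStorage`, copy assignment is deleted, and exception safety is ignored. Every special member must switch on the discriminator to learn which object, if any, lives in the buffer.

```cpp
#include <algorithm>
#include <cassert>
#include <new>
#include <string>
#include <utility>

// Hypothetical C++17 counterpart of Result(T, Error) with shared storage.
template <typename T, typename E>
class Result {
 public:
  static Result Success(T value) {
    Result r(0);
    ::new (static_cast<void*>(r.storage_)) T(std::move(value));
    return r;
  }
  static Result Failure(E error) {
    Result r(1);
    ::new (static_cast<void*>(r.storage_)) E(std::move(error));
    return r;
  }
  static Result Cancelled() { return Result(2); }

  // Hand-written copy constructor: recreate whichever object is active.
  Result(const Result& other) : discriminator_(other.discriminator_) {
    switch (discriminator_) {
      case 0:
        ::new (static_cast<void*>(storage_)) T(other.value());
        break;
      case 1:
        ::new (static_cast<void*>(storage_)) E(other.error());
        break;
      default:
        break;  // Cancelled carries no payload.
    }
  }
  Result& operator=(const Result&) = delete;  // Omitted for brevity.

  // Hand-written destructor: destroy whichever object is active.
  ~Result() {
    switch (discriminator_) {
      case 0:
        value_ptr()->~T();
        break;
      case 1:
        error_ptr()->~E();
        break;
      default:
        break;
    }
  }

  int discriminator() const { return discriminator_; }
  const T& value() const {
    assert(discriminator_ == 0);
    return *value_ptr();
  }
  const E& error() const {
    assert(discriminator_ == 1);
    return *error_ptr();
  }

 private:
  explicit Result(int discriminator) : discriminator_(discriminator) {}

  T* value_ptr() { return std::launder(reinterpret_cast<T*>(storage_)); }
  const T* value_ptr() const {
    return std::launder(reinterpret_cast<const T*>(storage_));
  }
  E* error_ptr() { return std::launder(reinterpret_cast<E*>(storage_)); }
  const E* error_ptr() const {
    return std::launder(reinterpret_cast<const E*>(storage_));
  }

  int discriminator_;
  // Untyped buffer large and aligned enough for either alternative.
  alignas(std::max(alignof(T), alignof(E)))
      unsigned char storage_[std::max(sizeof(T), sizeof(E))];
};
```

With a non-trivial error type such as `std::string`, omitting any of these hand-written members would leak or double-destroy the payload. That kind of bug is what compiler-synthesized definitions would rule out.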
+ +As with the primary proposal, `choice` would be available as syntactic sugar, +but its syntax would mirror the syntax of an `alternatives` block: + +``` +closed choice Result(Type:$$ T, Type:$$ Error) { + pattern Success(T: value) -> Self; + pattern Failure(Error: error) -> Self; + var Self:$$ Cancelled; +} +``` + +This ensures that a sum type's API is defined using essentially the same syntax, +regardless of how the type author chooses to implement it. + +This approach is described in much more detail in an +[earlier draft](https://github.com/carbon-language/carbon-lang/blob/4dbd31d71e02895892f97a211df4b5fff8cae5c3/proposals/p0157.md) +of this document, where it was the primary proposal. It has a number of +advantages over the primary proposal: + +- Manually-defined sum types could probably be implemented with dramatically + less code, because both the special member functions and the code that + implements pattern matching can be generated automatically. This would not + only make these types less tedious to implement, it would probably also + reduce the risk of bugs, and avoid readability problems arising from the + duplication of names between factory functions and continuation interface + methods. +- Programmers would be able to define functions that encapsulate complex + pattern-matching logic behind simple interfaces. +- Pattern matching would not require the special overload resolution rules that + are needed to translate a pattern-matching operation into a `Match` call. +- User-defined sum types would default to being open, whereas they default to + being closed under the primary proposal. This is probably a better default, + because an open sum type can always be closed, but a closed sum type can't + be opened (much less extended with new alternatives) without the risk of + breaking user code.
+ +However, this approach also carries some substantial drawbacks: + +- Types can't use the full power of the Carbon language when defining their + pattern-matching behavior, or the corresponding factory functions. Instead, + they are restricted to a very narrow subset of the language that is valid in + both patterns and procedural code. This forces us to introduce intrinsics + like `ToStorage` to make that subset even minimally usable, and creates a + substantial risk that some user-defined sum types just won't be able to + support pattern matching. +- The language rules are substantially more complicated and harder to explain, + because we need to define the language subset that is usable in pattern + functions, and define both "forward" and "reverse" semantics for it. + Relatedly, it means that during pattern matching, there will be no + straightforward correspondence between the Carbon code and the generated + assembly, which could substantially complicate things like debugging. +- Carbon's pattern language can't allow you to write _underconstrained_ + patterns, which are patterns where the values of all bindings aren't + sufficient to uniquely determine the state of the object being matched. This + rules out things like the `|` pattern operator, and even prevents us from + using pattern matching with types that have non-salient state, like the + capacity of a `std::vector`. + +These issues, especially the overall complexity of this approach, lead me to +recommend against adopting it. From 8b8fb3afa20cb690c1f3bb5c1957999b65cb0a6a Mon Sep 17 00:00:00 2001 From: Geoffrey Romer Date: Mon, 14 Dec 2020 15:18:17 -0800 Subject: [PATCH 19/28] Respond to reviewer comment.
--- proposals/p0157.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/proposals/p0157.md b/proposals/p0157.md index c0393399f863b..8b63a734acecc 100644 --- a/proposals/p0157.md +++ b/proposals/p0157.md @@ -370,8 +370,8 @@ struct Result(Type:$$ T, Type:$$ Error) { } impl Matchable(MatchContinuation) { - method Match[MatchContinuation:$ Continuation]( - Ptr(Self): this, Ptr(Continuation): continuation) { + method (Ptr(Self): this) Match[MatchContinuation:$ Continuation]( + Ptr(Continuation): continuation) { match (discriminator) { case 0 => { continuation->Success(this->storage.Read(T)); From 6010589e59b7f3af7fb5e238ea820c5a1ba5f6d3 Mon Sep 17 00:00:00 2001 From: Geoffrey Romer Date: Mon, 11 Jan 2021 15:14:48 -0800 Subject: [PATCH 20/28] Respond to reviewer comments. --- proposals/p0157.md | 67 ++++++++++++++++++++++++++-------------------- 1 file changed, 38 insertions(+), 29 deletions(-) diff --git a/proposals/p0157.md b/proposals/p0157.md index 8b63a734acecc..aae78cb4d2a79 100644 --- a/proposals/p0157.md +++ b/proposals/p0157.md @@ -19,7 +19,6 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception - [User-defined pattern matching](#user-defined-pattern-matching) - [Choice types](#choice-types) - [Alternatives considered](#alternatives-considered) - - [Separate support for enumerated types](#separate-support-for-enumerated-types) - [Different spelling for `choice`](#different-spelling-for-choice) - [Alternatives considered](#alternatives-considered-1) - [Indexing by type](#indexing-by-type) @@ -45,7 +44,8 @@ for short). For example, expression with two child nodes representing the operands, or a parenthesized expression with a single child node representing the contents of the parentheses. -- Boolean values take the form of either a "true" value or a "false" value. +- The error codes returned by APIs like POSIX have a fixed set of named + values. 
What unites these use cases is that the set of alternatives is fixed by the API, it is possible for user code to determine which alternative is present, and @@ -191,7 +191,8 @@ It also has a couple of ergonomic problems: ## Proposal To summarize, the previous section identified several missing features in -Carbon, which together prevent it from adequately supporting sum types: +Carbon, which together would enable Carbon to support efficient and ergonomic +sum types: - There's no way to manually control the lifetimes of subobjects, or enable them to share storage. @@ -212,9 +213,13 @@ should be considered provisional. This proposal merely establishes the overall design direction for sum types, in the same way that [p0083](p0083.md) established the overall design direction for the language as a whole. -To support manual lifetime control and storage sharing, I propose introducing a -`Storage` type, which represents a fixed-size region of untyped memory and -provides operations for creating and destroying objects within it. +To support manual lifetime control and storage sharing, I propose introducing at +least one and preferably both of the following: + +- A `Storage` type, which represents a fixed-size buffer of untyped memory and + provides operations for creating and destroying objects within it. +- A typed `union` facility, such as the one described in proposal + [0139](https://github.com/carbon-language/carbon-lang/pull/139). To support encapsulation in pattern matching, I propose introducing a `Matchable` interface, which a type can implement in order to specify how it @@ -275,10 +280,22 @@ fn GetIntFromUser() -> Int { This approach to sum types imposes relatively few requirements on the language features used to implement shareable storage (meaning, storage that can be inhabited by different objects at different times), and so this proposal doesn't -describe them in much detail. 
I'm proposing an untyped byte buffer here because -it's more general, but union types along the lines of proposal -[0139](https://github.com/carbon-language/carbon-lang/pull/139) would work just -as well. +describe them in much detail. The primary options are typed unions along the +lines of proposal +[0139](https://github.com/carbon-language/carbon-lang/pull/139), and untyped +byte buffers along the lines described below. Typed unions are somewhat safer +and more readable, but less general. For example, they can't support use cases +like implementing a small-object-optimized version of +[`std::any`](https://en.cppreference.com/w/cpp/utility/any), because the set of +possible types is not known in advance. + +This proposal takes the position that Carbon must have at least one of these two +features. Qhether it should have only untyped byte buffers, only typed unions, +or both, is left as an **open question**, because the answer is orthogonal to +the overall design direction that is the focus of this proposal. Consequently, I +will not further discuss the tradeoffs between the two features. This proposal's +examples focus on untyped byte buffers because they are simpler to describe, and +aren't already covered by another proposal. Regardless of the form that shareable storage takes, it won't be able to intrinsically keep track of whether it currently holds any objects, or the types @@ -557,23 +574,14 @@ place of the member types. However, there are a couple of special cases: A future proposal for this mechanism will need to consider whether to require an explicit opt-in to generate these operations. -### Alternatives considered - -#### Separate support for enumerated types +**Open question:** Should `choice` provide a way to directly access the +discriminator? Correspondingly, should it provide a way to specify the +discriminator type, and which discriminator values correspond to which +alternatives? 
These features would enable choice types to support all the same +use cases as C++ `enum`s, and permit zero-overhead conversion between the two at +language boundaries. -This proposal supports enumerated types as a special case of choice types. -However, there may be some benefit to providing special-case support for -enumerated types, similar to C++'s `enum`. In particular, we could allow the -developer to specify an underlying type, associate a specific value of the -underlying type with each enumerator, and convert between the enum and the -underlying type, which are fairly common practices in C++ code. When using -choice types, these practices can be emulated by defining functions that map -between the choice type and the underlying type, but that requires a substantial -amount of error-prone boilerplate. Furthermore, those functions can't reliably -be no-ops at the hardware level, the way they can be with C++ enums. - -I am omitting that from this proposal for simplicity, since it's purely -additive, and not necessary for the goals of this proposal. +### Alternatives considered #### Different spelling for `choice` @@ -696,9 +704,10 @@ However, it also has several significant drawbacks: involve nontrivial generated code, and some aspects of object layout are required to be hidden from the user. However, Carbon seems to be moving away from C++ in precisely those ways. -- It may be somewhat less efficient, because it requires instantiating the - enclosing `Choice` type (presumably including a discriminator field), rather - than merely passing the appropriate alternative into the continuation. +- It may be somewhat less efficient (or impose more load on the optimizer), + because it requires instantiating the enclosing `Choice` type (presumably + including a discriminator field), rather than merely passing the appropriate + alternative into the continuation. 
- The risk of confusion due to duplicate names is somewhat greater: a naive reader might think that, for example, `.Success` in the body of `operator match` refers to `Self.Success` rather than `Self.Choice.Success`. From 581267c07270f74d34aaf900c5be720655f5b22c Mon Sep 17 00:00:00 2001 From: Geoff Romer Date: Mon, 11 Jan 2021 15:16:48 -0800 Subject: [PATCH 21/28] Apply suggestions from code review Co-authored-by: josh11b --- proposals/p0157.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/p0157.md b/proposals/p0157.md index aae78cb4d2a79..0ad0fa6f34549 100644 --- a/proposals/p0157.md +++ b/proposals/p0157.md @@ -538,7 +538,7 @@ like the following, which attempts to emulate `std::any`: ``` choice Any { - Value[Type:$$ T](T value) + Value[Type:$$ T](T: value) } ``` From 8ffcba5890fe99a36d6ae5375908df5b96df7911 Mon Sep 17 00:00:00 2001 From: Geoffrey Romer Date: Wed, 13 Jan 2021 17:54:29 -0800 Subject: [PATCH 22/28] Respond to reviewer comments --- proposals/p0157.md | 117 +++++++++++++++++++++++++++++++++++++-------- 1 file changed, 96 insertions(+), 21 deletions(-) diff --git a/proposals/p0157.md b/proposals/p0157.md index 0ad0fa6f34549..4a57cf2b77b30 100644 --- a/proposals/p0157.md +++ b/proposals/p0157.md @@ -20,7 +20,11 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception - [Choice types](#choice-types) - [Alternatives considered](#alternatives-considered) - [Different spelling for `choice`](#different-spelling-for-choice) -- [Alternatives considered](#alternatives-considered-1) +- ["Bare" designator syntax](#bare-designator-syntax) + - [Alternatives considered](#alternatives-considered-1) + - [Placeholder keywords](#placeholder-keywords) + - [Designator types](#designator-types) +- [Alternatives considered](#alternatives-considered-2) - [Indexing by type](#indexing-by-type) - [Pattern matching proxies](#pattern-matching-proxies) - [Pattern functions](#pattern-functions) @@ -290,7 +294,7 @@ like implementing a 
small-object-optimized version of possible types is not known in advance. This proposal takes the position that Carbon must have at least one of these two -features. Qhether it should have only untyped byte buffers, only typed unions, +features. Whether it should have only untyped byte buffers, only typed unions, or both, is left as an **open question**, because the answer is orthogonal to the overall design direction that is the focus of this proposal. Consequently, I will not further discuss the tradeoffs between the two features. This proposal's @@ -381,23 +385,24 @@ struct Result(Type:$$ T, Type:$$ Error) { Max(Alignof(T), Alignof(Error))): storage; interface MatchContinuation { - fn Success(T: value); - fn Failure(Error: error); - fn Cancelled(); + var Type:$$ ReturnType; + fn Success(T: value) -> ReturnType; + fn Failure(Error: error) -> ReturnType; + fn Cancelled() -> ReturnType; } impl Matchable(MatchContinuation) { method (Ptr(Self): this) Match[MatchContinuation:$ Continuation]( - Ptr(Continuation): continuation) { + Ptr(Continuation): continuation) -> Continuation.ReturnType { match (discriminator) { case 0 => { - continuation->Success(this->storage.Read(T)); + return continuation->Success(this->storage.Read(T)); } case 1 => { - continuation->Failure(this->storage.Read(Error)); + return continuation->Failure(this->storage.Read(Error)); } case 2 => { - continuation->Cancelled(); + return continuation->Cancelled(); } default => { Assert(false); @@ -591,6 +596,66 @@ values being explicitly enumerated in the code. I chose the spelling `choice` because "choice type" is one of the only available synonyms for "sum type" that doesn't have any potentially-misleading associations. +## "Bare" designator syntax + +A _designator_ is a token consisting of `.` followed by an identifier. The +canonical use case for designators is member access, as in `foo.bar`, where the +designator applies to the preceding expression. 
However, we expect Carbon will +also have some use cases for "bare" designators, where there is no preceding +expression. In particular, we expect to use bare designators to initialize named +tuple fields, as in `(.my_int = 42, .my_str = "Foo")`, which produces a tuple +with fields named `my_int` and `my_str`. + +I propose to also permit using bare designators to refer to the alternatives of +a sum type, in cases where the sum type is clear from context. In particular, I +propose that in statements of the form `return R.Alt;` or +`return R.Alt();`, where `R` is the function return type, the `R` can be +omitted. Similarly, I propose that patterns of the form `S.Alt` or +`S.Alt()`, where `S` is the type being matched against, the `S` can +be omitted. Note that both of these shorthands are allowed only at top level, +not as subexpressions or subpatterns. + +**Open question:** Can we also permit these shorthands to be nested? This would +be more consistent and less surprising, but could create ambiguity with the use +of bare designators in tuple initialization: does `case (.Foo, .Bar)` match a +tuple of two fields named `Foo` and `Bar`, or does it match a tuple of two +positional fields, of sum types that respectively have `.Foo` and `.Bar` as +alternatives? Furthermore, allowing such nested usages may conflict with the +[proposed principle](https://github.com/carbon-language/carbon-lang/pull/103) +that type information should propagate only from an expression to its enclosing +context, and not vice-versa. + +### Alternatives considered + +#### Placeholder keywords + +Rather than allowing code to omit the type altogether, we could allow code to +replace the type with a placeholder keyword, e.g. `case auto.Foo`. This would +avoid the ambiguity with tuple initialization, but would still mean that type +information is propagating into the expression from its surrounding context +rather than vice-versa (which also means that `auto` may not be an appropriate +spelling). 
Furthermore, this could wind up feeling fairly boilerplate-heavy, +even if the keyword is very short. + +#### Designator types + +We could treat each bare designator as essentially defining its own type, which +may then implicitly convert to a suitable sum type. For example, we could think +of `.Some(42)` as having a type `DesignatedTuple("Some", (Int))`, and +`Optional(Int)` would define an implicit conversion from that type. This would +ensure that type information does not propagate into the expression from the +context, but would not resolve the ambiguity with tuple initialization. +Furthermore, this would mean that the names of bare designators do not need to +be declared before they are used, and in fact can't be meaningfully declared at +all. This could have very surprising consequences. For example, a typo in a line +of code like `var auto: x = .Sone(42);` cannot be diagnosed at that line. +Instead, the problem can't be diagnosed until `x` is used, and even then it will +show up as a type error rather than a name lookup error. + +Note that this approach would work well with the "Indexing by type" alternative +discussed below, with instances of `DesignatedTuple` (or whatever we call it) +acting as tagged wrapper types. + ## Alternatives considered ### Indexing by type @@ -622,8 +687,8 @@ something like `std::variant, Error, Cancelled>` (note that In either case, emulating the other model introduces some syntactic overhead: with name-indexing, `Variant`'s factory functions must be given a name (`Value`) even though it doesn't really convey any information, and emulating -`Result(T,E)` in terms of type-indexing requires separately defining the wrapper -templates `Value` and `Error`. +`Result(T,E)` in terms of type-indexing requires separately defining the tagged +wrapper templates `Value` and `Error`. The distinction between these two models of sum types seems analogous the distinction between the tuple and struct models of product types. 
Tuples and @@ -689,11 +754,17 @@ struct Result(Type:$$ T, Type:$$ Error) { } ``` -This approach is somewhat simpler, because it uses return values instead of -continution-passing. It could also generalize more easily to allow things like -user-defined types that can match list patterns (if Carbon has those). +This approach has several advantages: + +- It's somewhat simpler, because it uses return values instead of + continuation-passing. +- It will be easier for the compiler to reason about, because of that + simplicity and the somewhat narrower API surface. This may lead to better + compiler performance, and better generated code. +- It could generalize more easily to allow things like user-defined types that + can match list patterns (if Carbon has those). -However, it also has several significant drawbacks: +However, it also has several drawbacks: - It forces us to treat `choice` as a fundamental part of the language: in order to implement a sum type, you have to work with an object type whose @@ -704,16 +775,18 @@ However, it also has several significant drawbacks: involve nontrivial generated code, and some aspects of object layout are required to be hidden from the user. However, Carbon seems to be moving away from C++ in precisely those ways. -- It may be somewhat less efficient (or impose more load on the optimizer), - because it requires instantiating the enclosing `Choice` type (presumably - including a discriminator field), rather than merely passing the appropriate - alternative into the continuation. - The risk of confusion due to duplicate names is somewhat greater: a naive reader might think that, for example, `.Success` in the body of `operator match` refers to `Self.Success` rather than `Self.Choice.Success`. Relatedly, there's some risk that the author may omit the leading `.`, and thereby invoke `Self.Success` instead of `Self.Choice.Success`. This will probably fail to build, but the errors may be confusing.
+- It has less expressive power, because there's no straightforward way for the + library type to specify code that should run after pattern matching is + complete. For example, the callback approach could allow `Match` to create a + mutable local variable, pass it to the callback by pointer/reference, and + then write the (possibly modified) contents of the variable back into the + sum type object. - Extending this approach to support in-place mutation during pattern matching is likely to require Carbon to support reference _types_, whereas the primary proposal would probably only require reference _patterns_, which are @@ -721,8 +794,10 @@ However, it also has several significant drawbacks: rather than a set of callback parameters (which are patterns) to define the type's pattern matching interface. -I think these drawbacks decisively outweigh the advantages, so I do not -recommend this approach. +I very tentatively recommend the callback approach rather than this one, +primarily because of the last point above: the Carbon type system is likely to +be dramatically simplified if there are no reference types, but I think the +proxy approach will make reference types all but unavoidable. ### Pattern functions From 9e1ba6304d7732ff5f9e043698063659ababa633 Mon Sep 17 00:00:00 2001 From: Geoffrey Romer Date: Mon, 18 Jan 2021 12:40:08 -0800 Subject: [PATCH 23/28] Add discussion of how to distinguish patterns from expressions, and move the discussion of bare designators to integrate better with it. 
--- proposals/p0157.md | 246 ++++++++++++++++++++++++++++++++------------- 1 file changed, 178 insertions(+), 68 deletions(-) diff --git a/proposals/p0157.md b/proposals/p0157.md index 4a57cf2b77b30..3f69d009db757 100644 --- a/proposals/p0157.md +++ b/proposals/p0157.md @@ -17,14 +17,19 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception - [Proposal](#proposal) - [Shareable storage](#shareable-storage) - [User-defined pattern matching](#user-defined-pattern-matching) -- [Choice types](#choice-types) - - [Alternatives considered](#alternatives-considered) - - [Different spelling for `choice`](#different-spelling-for-choice) - ["Bare" designator syntax](#bare-designator-syntax) - - [Alternatives considered](#alternatives-considered-1) + - [Alternatives considered](#alternatives-considered) - [Placeholder keywords](#placeholder-keywords) - [Designator types](#designator-types) -- [Alternatives considered](#alternatives-considered-2) +- [Distinguishing pattern and expression semantics](#distinguishing-pattern-and-expression-semantics) + - [Alternatives considered](#alternatives-considered-1) + - [Separate syntaxes](#separate-syntaxes) + - [Disambiguation by fixed priority](#disambiguation-by-fixed-priority) + - [Disambiguation based on pattern content](#disambiguation-based-on-pattern-content) +- [Choice types](#choice-types) + - [Alternatives considered](#alternatives-considered-2) + - [Different spelling for `choice`](#different-spelling-for-choice) +- [Alternatives considered](#alternatives-considered-3) - [Indexing by type](#indexing-by-type) - [Pattern matching proxies](#pattern-matching-proxies) - [Pattern functions](#pattern-functions) @@ -490,6 +495,171 @@ the actual argument, but potentially different implementation details. However, this might not be visible to users so long as the continuation parameter is passed as a generic rather than a template (`:$` rather than `:$$`). 
+## "Bare" designator syntax + +A _designator_ is a token consisting of `.` followed by an identifier. The +canonical use case for designators is member access, as in `foo.bar`, where the +designator applies to the preceding expression. However, we expect Carbon will +also have some use cases for "bare" designators, where there is no preceding +expression. In particular, we expect to use bare designators to initialize named +tuple fields, as in `(.my_int = 42, .my_str = "Foo")`, which produces a tuple +with fields named `my_int` and `my_str`. + +I propose to also permit using bare designators to refer to the alternatives of +a sum type, in cases where the sum type is clear from context. In particular, I +propose that in statements of the form `return R.Alt;` or +`return R.Alt();`, where `R` is the function return type, the `R` can be +omitted. Similarly, I propose that patterns of the form `S.Alt` or +`S.Alt()`, where `S` is the type being matched against, the `S` can +be omitted. Note that both of these shorthands are allowed only at top level, +not as subexpressions or subpatterns. + +> **Open question:** Can we also permit these shorthands to be nested? This +> would be more consistent and less surprising, but could create ambiguity with +> the use of bare designators in tuple initialization: does `case (.Foo, .Bar)` +> match a tuple of two fields named `Foo` and `Bar`, or does it match a tuple of +> two positional fields, of sum types that respectively have `.Foo` and `.Bar` +> as alternatives? Furthermore, allowing such nested usages may conflict with +> the +> [proposed principle](https://github.com/carbon-language/carbon-lang/pull/103) +> that type information should propagate only from an expression to its +> enclosing context, and not vice-versa. +> +> The issue of type propagation is particularly acute if we want to allow +> nesting of alternatives, such as `case .Foo(.Bar(_))`. 
The problem there is +> that we are relying on the type of `Foo`'s parameter to tell us the type of +> the argument expression `.Bar(_)`, but if `Foo` is overloaded, we can't +> determine the type of the parameter until the overload is resolved, and to do +> that we first need to know the type of the argument expression. And even if +> `Foo` is not currently overloaded, an overload might be added in the future +> (at least if the sum type has an `operator default`). Adding overloads is a +> canonical example of the kind of software evolution that Carbon is intended to +> allow, so the validity of this code can't depend on whether or not `Foo` is +> overloaded. + +### Alternatives considered + +#### Placeholder keywords + +Rather than allowing code to omit the type altogether, we could allow code to +replace the type with a placeholder keyword, e.g. `case auto.Foo`. This would +avoid the ambiguity with tuple initialization, but would still mean that type +information is propagating into the expression from its surrounding context +rather than vice-versa (which also means that `auto` may not be an appropriate +spelling). Furthermore, this could wind up feeling fairly boilerplate-heavy, +even if the keyword is very short. + +#### Designator types + +We could treat each bare designator as essentially defining its own type, which +may then implicitly convert to a suitable sum type. For example, we could think +of `.Some(42)` as having a type `DesignatedTuple("Some", (Int))`, and +`Optional(Int)` would define an implicit conversion from that type. This would +ensure that type information does not propagate into the expression from the +context, but would not resolve the ambiguity with tuple initialization. +Furthermore, this would mean that the names of bare designators do not need to +be declared before they are used, and in fact can't be meaningfully declared at +all. This could have very surprising consequences. 
For example, a typo in a line +of code like `var auto: x = .Sone(42);` cannot be diagnosed at that line. +Instead, the problem can't be diagnosed until `x` is used, and even then it will +show up as a type error rather than a name lookup error. + +Note that this approach would work well with the "Indexing by type" alternative +discussed below, with instances of `DesignatedTuple` (or whatever we call it) +acting as tagged wrapper types. + +## Distinguishing pattern and expression semantics + +As discussed above, we expect a well-behaved sum type to define factory +functions that correspond to each of the alternatives in its continuation +interface. This creates some potential ambiguity about whether a given use of an +alternative name refers to the factory function or the continuation interface. +For example, consider the following code: + +``` +var Result(Int, String): r = ...; +match (r) { + case Result(Int, String).Value(0) => ... +``` + +This could potentially be interpreted two ways: + +- `Result(Int, String).Value(0)` is evaluated as an ordinary function call, + and the result is compared with `r` to see if they match, presumably using + the `==` operator. +- The whole `match` expression is evaluated by invoking + `Result(Int, String)`'s implementation of `Matchable`, with a continuation + object whose `.Value(Int)` method compares the `Int` parameter with 0. + +This can also be thought of as a name lookup problem: is `.Value` looked up in +`Result(Int, String)`, or in `Result(Int, String)`'s implementation of +`Matchable`? + +I propose to leave this choice unspecified, so that the compiler may validly +generate code either way. This gives the compiler more freedom to optimize, and +perhaps more importantly, it helps discourage sum type authors from +intentionally making its factory function behavior inconsistent with its pattern +matching behavior. By extension, I propose treating the shorthand syntax +`case .Value(0)` the same way. 
+ +### Alternatives considered + +#### Separate syntaxes + +We could instead avoid the ambiguity by providing separate syntaxes for the two +semantics. The more straightforward version of this would be to say that +`Result(Int, String).Value(0)` is always interpreted as an ordinary function +call, and patterns like `Result(Int, String).Value(Int: i)` are ill-formed +because they cannot be interpreted as function calls. However, this would +require us to have a syntax for matching alternatives that is disjoint from the +syntax for constructing alternatives. This would be at odds with existing +practice in languages like Rust, Swift, Haskell, and ML, all of which use the +same syntax for constructing and matching alternatives. + +Note that it is tempting to use bare designators as the syntax for matching +alternatives, so that `Result(Int, String).Value(0)` is an expression, but +`.Value(0)` is a pattern. However, that is unlikely to be sufficient on its own, +because bare designators rely on type information that may not always be +available, especially in nested patterns. Hence, in order for this approach to +work, we would need to introduce a separate syntax for specifying the type of a +bare designator, such as `.Value(0) as Result(Int, String)`. + +Alternatively, we could say that code in a pattern-matching context is always +interpreted as a pattern, even if it could otherwise be interpreted as an +expression. We would then need to introduce a pattern operator for explicitly +evaluating a subpattern as an expression, such as +`is Result(Int, String).Value(0)` or `== Result(Int, String).Value(0)`. However, +this would impose an educational and cognitive burden on users: FAQ entries like +"What's the difference between `case .Foo` and `case is .Foo`" and "How do I +choose between `case .Foo` and `case is .Foo`" seem inevitable, and would +require fairly nuanced answers. 
It would also add syntactic noise to the use +cases that correspond to C++ `switch` statements, where all of the cases are +fixed values. + +#### Disambiguation by fixed priority + +We could instead specify that, when `Result(Int, String).Value(0)` appears in a +context where a pattern is expected, the name `.Value` is looked up as both an +ordinary function call and as a use of the `Matchable` interface, and specify +one of the two as the "winner" in the case where both lookups succeed. This is +effectively a variant of the previous option, except that some usages that would +be build errors under that approach would be saved by the fallback +interpretation here. In particular, it would still require us to introduce a +second syntax for the case where the programmer wants the lower-priority of the +two behaviors. Thus, it would carry largely the same drawbacks as the previous +option, with the additional drawback that there wouldn't be a consistent +correspondence between syntax and semantics. + +#### Disambiguation based on pattern content + +We could instead specify that `Result(Int, String).Value(0)` is always +interpreted as an ordinary function call, but patterns like +`Result(Int, String).Value(Int: i)` are evaluated using the `Matchable` +interface, because no other implementation is possible. However, this would mean +that the name-lookup behavior of a function call depends on code that can be +arbitrarily deeply nested within it, which seems likely to be hostile to both +programmers and tools. + ## Choice types To allow users to define sum types without micromanaging the implementation @@ -596,66 +766,6 @@ values being explicitly enumerated in the code. I chose the spelling `choice` because "choice type" is one of the only available synonyms for "sum type" that doesn't have any potentially-misleading associations. -## "Bare" designator syntax - -A _designator_ is a token consisting of `.` followed by an identifier. 
The -canonical use case for designators is member access, as in `foo.bar`, where the -designator applies to the preceding expression. However, we expect Carbon will -also have some use cases for "bare" designators, where there is no preceding -expression. In particular, we expect to use bare designators to initialize named -tuple fields, as in `(.my_int = 42, .my_str = "Foo")`, which produces a tuple -with fields named `my_int` and `my_str`. - -I propose to also permit using bare designators to refer to the alternatives of -a sum type, in cases where the sum type is clear from context. In particular, I -propose that in statements of the form `return R.Alt;` or -`return R.Alt();`, where `R` is the function return type, the `R` can be -omitted. Similarly, I propose that patterns of the form `S.Alt` or -`S.Alt()`, where `S` is the type being matched against, the `S` can -be omitted. Note that both of these shorthands are allowed only at top level, -not as subexpressions or subpatterns. - -**Open question:** Can we also permit these shorthands to be nested? This would -be more consistent and less surprising, but could create ambiguity with the use -of bare designators in tuple initialization: does `case (.Foo, .Bar)` match a -tuple of two fields named `Foo` and `Bar`, or does it match a tuple of two -positional fields, of sum types that respectively have `.Foo` and `.Bar` as -alternatives? Furthermore, allowing such nested usages may conflict with the -[proposed principle](https://github.com/carbon-language/carbon-lang/pull/103) -that type information should propagate only from an expression to its enclosing -context, and not vice-versa. - -### Alternatives considered - -#### Placeholder keywords - -Rather than allowing code to omit the type altogether, we could allow code to -replace the type with a placeholder keyword, e.g. `case auto.Foo`. 
This would -avoid the ambiguity with tuple initialization, but would still mean that type -information is propagating into the expression from its surrounding context -rather than vice-versa (which also means that `auto` may not be an appropriate -spelling). Furthermore, this could wind up feeling fairly boilerplate-heavy, -even if the keyword is very short. - -#### Designator types - -We could treat each bare designator as essentially defining its own type, which -may then implicitly convert to a suitable sum type. For example, we could think -of `.Some(42)` as having a type `DesignatedTuple("Some", (Int))`, and -`Optional(Int)` would define an implicit conversion from that type. This would -ensure that type information does not propagate into the expression from the -context, but would not resolve the ambiguity with tuple initialization. -Furthermore, this would mean that the names of bare designators do not need to -be declared before they are used, and in fact can't be meaningfully declared at -all. This could have very surprising consequences. For example, a typo in a line -of code like `var auto: x = .Sone(42);` cannot be diagnosed at that line. -Instead, the problem can't be diagnosed until `x` is used, and even then it will -show up as a type error rather than a name lookup error. - -Note that this approach would work well with the "Indexing by type" alternative -discussed below, with instances of `DesignatedTuple` (or whatever we call it) -acting as tagged wrapper types. 
- ## Alternatives considered ### Indexing by type @@ -674,9 +784,9 @@ could be written like so: ``` choice Variant(Type:$$ T1, Type:$$ T2, Type:$$ T3) { - pattern Value(T1: value) -> Self; - pattern Value(T2: value) -> Self; - pattern Value(T3: value) -> Self; + Value(T1: value), + Value(T2: value), + Value(T3: value) } ``` From 0e988ef81b91fa26d11d945eb1c929b02118cc12 Mon Sep 17 00:00:00 2001 From: Geoffrey Romer Date: Mon, 18 Jan 2021 13:21:08 -0800 Subject: [PATCH 24/28] Expand the discussion of the drawbacks of the "Designator types" approach. --- proposals/p0157.md | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/proposals/p0157.md b/proposals/p0157.md index 3f69d009db757..f2852d6ca14b3 100644 --- a/proposals/p0157.md +++ b/proposals/p0157.md @@ -559,10 +559,17 @@ ensure that type information does not propagate into the expression from the context, but would not resolve the ambiguity with tuple initialization. Furthermore, this would mean that the names of bare designators do not need to be declared before they are used, and in fact can't be meaningfully declared at -all. This could have very surprising consequences. For example, a typo in a line -of code like `var auto: x = .Sone(42);` cannot be diagnosed at that line. -Instead, the problem can't be diagnosed until `x` is used, and even then it will -show up as a type error rather than a name lookup error. +all. + +This could have very surprising consequences. For example, a typo in a line of +code like `var auto: x = .Sone(42);` cannot be diagnosed at that line. Instead, +the problem can't be diagnosed until `x` is used, and even then it will show up +as an overload resolution error rather than a name lookup error. This is +particularly problematic because it is much harder to provide useful, actionable +diagnostics for overload resolution failure than for name lookup failure. 
+Relatedly, in the case of bare designators, IDEs would not be able to implement +tab-completion using Carbon's name lookup rules, but would effectively have to +invent their own ad hoc name lookup rules. Note that this approach would work well with the "Indexing by type" alternative discussed below, with instances of `DesignatedTuple` (or whatever we call it) From e14d6937eb930fa976f03c9fc6bcc26d3e8a1c30 Mon Sep 17 00:00:00 2001 From: Geoffrey Romer Date: Thu, 21 Jan 2021 14:14:02 -0800 Subject: [PATCH 25/28] Respond to reviewer comments. --- proposals/p0157.md | 49 ++++++++++++++++++++++++---------------------- 1 file changed, 26 insertions(+), 23 deletions(-) diff --git a/proposals/p0157.md b/proposals/p0157.md index f2852d6ca14b3..2125c62a604b7 100644 --- a/proposals/p0157.md +++ b/proposals/p0157.md @@ -65,21 +65,27 @@ Carbon needs to support defining and working with types representing such values. Following Carbon's principles, these types need to be easy to define, understand, and use, and they need to be safe -- in ordinary usage, the type system should ensure that user code cannot accidentally access the wrong -alternative. +alternative. These types should be writeable as well as readable, and writing +should be type-safe and efficient. In particular, it should be possible to +mutate a single sub-field of an alternative, without having to overwrite the +entire alternative, and without a risk of accidentally doing so when that +alternative is not present. Furthermore, it needs to be possible for type owners to customize the -representations of these types. For example, sum types usually need a -"discriminator" field to indicate which alternative is present, but since it -typically has very few possible values, it can often be packed into padding, or -even the low-order bits of a pointer. Other sum types avoid an explicit -discriminator, and instead reserve certain values to indicate separate -alternatives. 
For example, a typical C-style pointer can be thought of as an -optional type, with a special null value indicating that no pointer is present, -because the platform guarantees that the null byte pattern is never the -representation of a valid pointer. This sort of customization inherently creates -a risk of implementation bugs that break type safety, but it must be possible to -implement such customizations without changing the type's API, and hence without -altering the static safety guarantees for users. +representations of these types. For example: + +- Most sum types need a "discriminator" field to indicate which alternative is + present, but since it typically has very few possible values, it can often + be packed into padding, or even the low-order bits of a pointer. +- Other sum types avoid an explicit discriminator, and instead reserve certain + values to indicate separate alternatives. For example, a typical C-style + pointer can be thought of as an optional type, with a special null value + indicating that no pointer is present, because the platform guarantees that + the null byte pattern is never the representation of a valid pointer. + +It must be possible to implement such customizations without changing the type's +API, and hence without altering the static safety guarantees for users of the +type. ## Background @@ -484,16 +490,13 @@ alternatives in the future, without causing build failures in client code. > evolutionary risk, and it's often valuable to have a build-time guarantee that > code within the type's own API explicitly handles every case. -Note that the process of translating a pattern-matching operation into a `Match` -call may require special overload resolution rules. Overload resolution is -normally driven by the types of the arguments, but the concrete type of the -argument to `Match` may not be known until quite late in the compilation -process, because it could depend on the details of code generation. 
Instead, -overload resolution will have to be driven by some more abstract representation -of the "shape" of the callsite, such as a dummy type with the same interface as -the actual argument, but potentially different implementation details. However, -this might not be visible to users so long as the continuation parameter is -passed as a generic rather than a template (`:$` rather than `:$$`). +Note that `Match`'s continuation parameter type must be generic rather than +templated (`:$` rather than `:$$`). Template specialization is driven by the +concrete values of the template arguments, but the type of the continuation +parameter may depend on code generation details that aren't yet known when +template specialization takes place. By the same token, `Match` cannot be +overloaded, because overload resolution is likewise driven by the concrete types +of the function arguments, which may not be known at that point. ## "Bare" designator syntax From e4bb86dc77c0cd985e8f497233af22d26703fea9 Mon Sep 17 00:00:00 2001 From: Geoffrey Romer Date: Thu, 21 Jan 2021 14:37:46 -0800 Subject: [PATCH 26/28] Respond to reviewer comments. --- proposals/p0157.md | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/proposals/p0157.md b/proposals/p0157.md index 2125c62a604b7..4f3cfdb66a9c1 100644 --- a/proposals/p0157.md +++ b/proposals/p0157.md @@ -895,12 +895,13 @@ However, it also has several drawbacks: involve nontrivial generated code, and some aspects of object layout are required to be hidden from the user. However, Carbon seems to be moving away from C++ in precisely those ways. -- The risk of confusion due to duplicate names is somewhat greater: a naive - reader might think that, for example, `.Success` in the body of - `operator match` refers to `Self.Success` rather than `Self.Choice.Success`. - Relatedly, there's some risk that the author may omit the leading `.`, and - thereby invoke `Self.Success` instead of `Self.Choice.Success`. 
This will - probably fail to build, but the errors may be confusing. +- Although both approaches carry a risk of confusion due to duplicate names, + the risk is somewhat greater here: a naive reader might think that, for + example, `.Success` in the body of `operator match` refers to `Self.Success` + rather than `Self.Choice.Success`. Relatedly, there's some risk that the + author may omit the leading `.`, and thereby invoke `Self.Success` instead + of `Self.Choice.Success`. This will probably fail to build, but the errors + may be confusing. - It has less expressive power, because there's no straightforward way for the library type to specify code that should run after pattern matching is complete. For example, the callback approach could allow `Match` to create a From 2ab98c2a194728d1dc0009b6f8406a0cc985ff6a Mon Sep 17 00:00:00 2001 From: Geoffrey Romer Date: Wed, 3 Feb 2021 14:17:58 -0800 Subject: [PATCH 27/28] Respond to reviewer comments --- proposals/p0157.md | 19 ++++++++++++++++--- 1 file changed, 16 insertions(+), 3 deletions(-) diff --git a/proposals/p0157.md b/proposals/p0157.md index 4f3cfdb66a9c1..fcd5dd9dc2909 100644 --- a/proposals/p0157.md +++ b/proposals/p0157.md @@ -30,6 +30,7 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception - [Alternatives considered](#alternatives-considered-2) - [Different spelling for `choice`](#different-spelling-for-choice) - [Alternatives considered](#alternatives-considered-3) + - [`choice` types only](#choice-types-only) - [Indexing by type](#indexing-by-type) - [Pattern matching proxies](#pattern-matching-proxies) - [Pattern functions](#pattern-functions) @@ -494,9 +495,9 @@ Note that `Match`'s continuation parameter type must be generic rather than templated (`:$` rather than `:$$`). Template specialization is driven by the concrete values of the template arguments, but the type of the continuation parameter may depend on code generation details that aren't yet known when -template specialization takes place. 
By the same token, `Match` cannot be -overloaded, because overload resolution is likewise driven by the concrete types -of the function arguments, which may not be known at that point. +template specialization takes place. By the same token, we won't support +overloading `Match`, because overload resolution is likewise driven by the +concrete types of the function arguments, which may not be known at that point. ## "Bare" designator syntax @@ -778,6 +779,18 @@ doesn't have any potentially-misleading associations. ## Alternatives considered +### `choice` types only + +Rather than layering `choice` types on top of lower level features, we could +make them a primitive language feature, and simply not provide a way for user +code to customize the representation of sum types. However, this would mean that +users who encounter performance problems with the compiler-generated code for a +`choice` type would have no way to address those problems without rewriting all +code that uses that type. This would be contrary to Carbon's performance and +evolvability goals. Furthermore, the rewritten code would probably be +substantially less readable, and less safe, because it wouldn't be able to use +pattern matching. + ### Indexing by type Rather than requiring each alternative to have a distinct name (or at least a From 468a65a6cfb219a875210daea8cd9586a6640a29 Mon Sep 17 00:00:00 2001 From: Geoffrey Romer Date: Wed, 10 Feb 2021 11:30:28 -0800 Subject: [PATCH 28/28] Fix presubmit failure. --- proposals/p0157.md | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/proposals/p0157.md b/proposals/p0157.md index fcd5dd9dc2909..56072a421f73c 100644 --- a/proposals/p0157.md +++ b/proposals/p0157.md @@ -12,6 +12,8 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +## Table of contents + - [Problem](#problem) - [Background](#background) - [Proposal](#proposal) @@ -546,9 +548,9 @@ not as subexpressions or subpatterns. 
#### Placeholder keywords Rather than allowing code to omit the type altogether, we could allow code to -replace the type with a placeholder keyword, e.g. `case auto.Foo`. This would -avoid the ambiguity with tuple initialization, but would still mean that type -information is propagating into the expression from its surrounding context +replace the type with a placeholder keyword, for example `case auto.Foo`. This +would avoid the ambiguity with tuple initialization, but would still mean that +type information is propagating into the expression from its surrounding context rather than vice-versa (which also means that `auto` may not be an appropriate spelling). Furthermore, this could wind up feeling fairly boilerplate-heavy, even if the keyword is very short. @@ -1022,8 +1024,9 @@ advantages over the primary proposal: methods. - Programmers would be able to define functions that encapsulate complex pattern-matching logic behind simple interfaces. -- Pattern matching would not rquire the special overload resolution rules that - are needed to translate a pattern-matching operation into a `Match` call. +- Pattern matching would not require the special overload resolution rules + that are needed to translate a pattern-matching operation into a `Match` + call. - User-defined sum types would default to being open, whereas they default to being closed under the primary proposal. This is probably a better default, because an open sum type can always be closed, but a closed sum type can't