Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[spec] Reverse subtyping #110

Merged
merged 6 commits into from
Oct 6, 2020
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
300 changes: 239 additions & 61 deletions spec/Candid.md
Original file line number Diff line number Diff line change
Expand Up @@ -98,12 +98,8 @@ This is a summary of the grammar proposed:
| principal

<name> ::= <id> | <text>
<id> ::= (A..Z|a..z|_)(A..Z|a..z|_|0..9)*
<text> ::= "<char>*"
<nat> ::= (0..9)(_? 0..9)* | 0x(0..9|a..f|A..F)(_? 0..9|a..f|A..F)*
```

A `<char>` is a *Unicode scalar value* (i.e., a codepoint that is not a surrogate part).
See [below](#values) for the definitions of `<id>`, `<nat>`, and `<text>`.


#### Syntactic Shorthands
Expand Down Expand Up @@ -538,6 +534,96 @@ An *interface description* consists of a sequence of imports and type definition
The optional name given to the service in an interface description is immaterial; it only serves as documentation.


## Values

To enable convenient debugging, the following grammar specifies a text format for values.
The types of these values are assumed to be known from context, so the syntax does not attempt to be self-describing.

```
<val> ::=
| <primval> | <consval> | <refval>
| ( <annval> )

<annval> ::=
| <val>
| <val> : <datatype>

<primval> ::=
| <nat> | <int> | <float>
| <text>
| true | false
| null

<consval> ::=
| opt <val>
| vec { <annval>;* }
| record { <fieldval>;* }
| variant { <fieldval> }

<fieldval> ::= <nat> = <annval>

<refval> ::=
| service <text> (canister URI)
| func <text> . <id> (canister URI and message name)
| principal <text> (principal URI)

<arg> ::= ( <annval>,* )

<letter> ::= A..Z | a..z
<digit> ::= 0..9
<id> ::= (<letter> | _)(<letter> | <digit> | _)*

<sign> ::= + | -
<hex> ::= <digit> | A..F | a..f
<num> ::= <digit>(_? <digit>)*
<hexnum> ::= <hex>(_? <hex>)*
<nat> ::= <num> | 0x<hexnum>
<int> ::= <sign>? <num>
<float> ::=
| <sign>? <num> . <num>?
| <sign>? <num> (. <frac>?)? (e | E) <sign>? <num>
| <sign>? 0x<hexnum> . <hexnum>?
| <sign>? 0x<hexnum> (. <hexnum>?)? (p | P) <sign>? <num>

<text> ::= " <char>* "
<char> ::=
| <utf8>
| \ <hex> <hex>
| \ <escape>
| \u{ <hexnum> }
<escape> ::= n | r | t | \ | " | '
<utf8> ::= <ascii> | <utf8enc>
<ascii> ::= '\20'..'\7e' except " or \
<utf8enc> ::=
| '\c2'..'\df' <utf8cont>
| '\e0' '\a0'..'\bf' <utf8cont>
| '\ed' '\80'..'\9f' <utf8cont>
| '\e1'..'\ec' <utf8cont> <utf8cont>
| '\ee'..'\xef' <utf8cont> <utf8cont>
| '\f0' '\90'..'\bf' <utf8cont> <utf8cont>
| '\f4' '\80'..'\8f' <utf8cont> <utf8cont>
| '\f1'..'\f3' <utf8cont> <utf8cont> <utf8cont>
<utf8cont> ::= '\80'..'\bf'
```
A `<char>` is a *Unicode scalar value* (i.e., a codepoint that is not a surrogate part).


#### Syntactic Shorthands

Analoguous to types, a few syntactic shorthands are supported that can be reduced to the basic value forms:

```
<consval> ::= ...
| blob <text> := vec { N;* } where N* are of bytes in the string, interpreted [as in the WebAssembly textual format](https://webassembly.github.io/spec/core/text/values.html#strings)

<fieldval> ::= ...
| <name> = <annval> := <hash(name)> = <annval>
| <annval> := N = <annval> where N is either 0 or previous + 1 (only in records)
| <nat> := <nat> = null (only in variants)
| <name> := <name> = null (only in variants)
```


## Upgrading and Subtyping

Interfaces are allowed to evolve over time in a manner that is *robust*, i.e., cannot break existing client code. To capture this notion precisely, a service of type `T` is *upgradable* to a version with another type `T'` if and only if `T'` is *structural subtype* of `T`, written `T' <: T`. This defines that `T'` is more *specialised* than `T`. (Note: A more specialised type is less general, i.e., denotes a smaller set of possible values, thus the direction of the subtype ordering, even though a subtype record can have *more* fields.)
Expand All @@ -552,6 +638,77 @@ That is, outbound message results can only be replaced with a subtype (more fiel

Subtyping applies recursively to the types of the fields themselves. Moreover, the directions get *inverted* for inbound function and service references, in compliance with standard rules.

In addition to the usual subtyping rules, the subtyping relation has some more unusual rules.
In particular, it also allows fields to be *added* to inbound values (and conversely, removed from outbound ones), as long as they are optional.
Similarly, it allows *removing* cases from inbound values (and conversely, adding them to outbound ones), as long as the variant itself is optional.
In all these cases, a receiver who cannot handle the tag or field will simply see `null`.
This allows for maximal flexibility when evolving an interface over time, while still remaining sound.


### Examples

For example, a representative case is an interface of the following form:
```
// Version 1
type t = {x : nat};
service : {
produce : () -> t;
consume : t -> ();
}
```
The subtyping rules allow extending type `t` with additional fields later, as long as they are given optional type:
```
// Version 2
type t = {x : nat; y : opt nat};
service : {
produce : () -> t;
consume : t -> ();
}
```
Under normal subtyping rules, this wouldn't be allowed, because such record extension isn't usually compatible (sound) when `t` occurs in inbound position, such as with the `consume` function.
There might be existing clients that are not aware of the new fields, and would fail to provide them.
However, by restricting such fields to option types, and interpreting them as `null` when missing, such a mismatch is bridged.

Such extensibility also extends to *higher-order* examples, where functions themselves become parameters:
```
type t = {x : nat};
service : {
h1 : (f1 : () -> t) -> (); // might call f() and expects a t
h2 : (f2 : t -> ()) -> (); // might call f({x = 5})
rossberg marked this conversation as resolved.
Show resolved Hide resolved
}
```
If type `t` is later extended with a new optional field, then an existing client passing some function to `f1` or `f2` that is not yet aware of this change will still work correctly.
This works at any order, for example, `t` can safely be extended with a new optional field in the following scenario:
```
type t = {x : nat};
type f = t -> ();
type g = () -> t;
service : {
h : (f, g) -> (); // might compose f(g())
}
```

### Design Goals

To summarize, the subtyping relation for validating upgrades is designed with the following design goals in mind:

* Soundness: Subtyping implies that no deserialisation can fail.
nomeata marked this conversation as resolved.
Show resolved Hide resolved

* Transitivity: Subtyping implies that deserialisation cannot fail even across multiple upgrades.

* Record Extensibility: Records are upgradable by adding new (optional) fields, even when they appear in both outbound and inbound positions, such that round-trips always remain possible.

* Higher-order Extensibility: Subtyping extends to higher-order cases, where functions become parameters, so that there is no unique owner for their input/output contract.

* Language Injectivity: Subtyping does not depend on version or role annotations in interfaces that have no counterpart in source languages, i.e., an interface description can be generated from source-level types without special features or cross-version knowledge.

* Simple Deserialisation: Deserialisation does not require (sub)type checks other than the type-directed traversal of the value blob.

* Non-coercive Deserialisation: Deserialisation of a value is invariant across super- and sub-types.
nomeata marked this conversation as resolved.
Show resolved Hide resolved

* Type Erasure: Deserialised values do not require carrying dynamic type information on the language side.
nomeata marked this conversation as resolved.
Show resolved Hide resolved


### Rules

#### Primitive Types
Expand Down Expand Up @@ -608,6 +765,23 @@ The premise means that the rule does not apply when the constituent type is itse

Q: The negated nature of this premise isn't really compatible with parametric polymorphism. Is that a problem? We could always introduce a supertype of all non-nullable types and rephrase it with that.

Finally, in order to maintain *transitivity* of subtyping, an unusual rule allows, in fact, *any* option type to be regarded as a subtype of any other.
```
not (<datatype> <: <datatype'>)
---------------------------------
opt <datatype> <: opt <datatype'>
```
*Note:* This rule is necessary in the presence of the unusual record and variant rules shown below. Without it, certain upgrades may generally be valid one step at a time, but not taken together, which could cause problems for clients catching up with multiple upgrades.
For example, given a record type `record {666 : opt nat}` it is valid to remove the field `666` by the rule below and evolve the type to `record {}`.
A later step might legally re-add a field of the same name but with a different type, producing, e.g.,`record {666 : opt text}`.
A client having missed the intermediate step will have to upgrade directly from the original to the newest version of the type.
If two option type do not match up, its value will be treated as `null`.

In practice, users are strongly discouraged to ever remove a record field or ariant tag and later re-add it with a different meaning.
rossberg marked this conversation as resolved.
Show resolved Hide resolved
However, there is no general way for the type system to prevent this, since it cannot know the history of a type definition.
Consequently, the rule above is needed for technical more than for practical reasons.
Implementations of static upgrade checking are encouraged to warn if this rule is used.


#### Records

Expand All @@ -623,7 +797,16 @@ record { <fieldtype>;* } <: record { <fieldtype'>;* }
record { <nat> : <datatype>; <fieldtype>;* } <: record { <nat> : <datatype'>; <fieldtype'>;* }
```

**NOTE**: There is a need for a mechanism to also remove fields (which means adding a field when a record appears as an argument). The precise mechanism is still work in progress.
In order to be able to evolve and extend record types that also occur in inbound position (i.e., are used both as function results and function parameters), the subtype relation also supports *removing* fields from records, provided they are optional.
```
record { <fieldtype>;* } <: record { <fieldtype'>;* }
----------------------------------------------------------------------------------------------
record { <fieldtype>;* } <: record { <nat> : opt <datatype'>; <fieldtype'>;* }
```
*Note:* This rule is unusual form a regular subtyping perspective, but necessary in practice.
rossberg marked this conversation as resolved.
Show resolved Hide resolved
Together with the previous rule, it allows extending any record with optional fields in an upgrade, regardless of how it is used.
Any party not aware of the extension will treat the field as `null`.


#### Variants

Expand All @@ -639,6 +822,17 @@ variant { <fieldtype>;* } <: variant { <fieldtype'>;* }
variant { <nat> : <datatype>; <fieldtype>;* } <: variant { <nat> : <datatype'>; <fieldtype'>;* }
```

In order to be able to evolve and extend variant types that also occur in outbound position (i.e., are used both as function results and function parameters), the subtype relation also supports *adding* tags to variants, provided the variant itself is optional.
```
opt variant { <fieldtype>;* } <: opt variant { <fieldtype'>;* }
----------------------------------------------------------------------------------------------
opt variant { <nat> : opt <datatype>; <fieldtype>;* } <: opt variant { <fieldtype'>;* }
```
Copy link
Collaborator

@nomeata nomeata Oct 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that this rule is actually just an instance of the

not (<datatype> <: <datatype'>)
---------------------------------
opt <datatype> <: opt <datatype'>

rule, isn’t it?

Ah, but they have different elaboration of course… this makes your rules overlapping (I think you avoided that before…) A problem? Since this is just defining a binary relation on types at this point (and not an elaboration) it feels strange to use more rules than necessary.

Maybe it’s worth defining <: in a non-overlapping way, with simply an unrestricted

---------------------------------
opt <datatype> <: opt <datatype'>

And then doing the different cases in the elaborated version.

Or simply only define the elaborated version…

But I thought we’d agree we need different relations for “sensible evolution” and “decoding”?

Copy link
Contributor Author

@rossberg rossberg Oct 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, but they have different elaboration of course… this makes your rules overlapping (I think you avoided that before…) A problem?

Yes, it would be better if this wasn't there. Same problem as above. For now, I added similar handwaving to pepper over the non-determinism.

But I thought we’d agree we need different relations for “sensible evolution” and “decoding”?

I realised that doesn't actually work. You need to allow all of decoding in the evolution check, because cases like that might not be under your own control: some type you use may originate from another canister, that has performed multiple evolutionary steps that are transitively non-coherent, but you have missed the intermediate step that would avoid the "ugly" transitive case when you upgrade yourself.

Hence the informal language in L785+ instead, about discouraging users to create that case and implementations to warn about it.

IOW, we need the completeness property after all, even if some cases hopefully never occur in practice if everybody follows good style.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't follow the example. Even if you miss a step, and even if there there is incoherence, the “interface evolution” relation would still be transitive, woudn’t it? But maybe there is some complicate higher-order case where some canister is in the unfortunate situation where it is force to upgrade it’s interface in an undesirable way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our previous conclusion was (unless I misremember something) that the transitive rule (opt t <: opt t' with incompatible t, t') should not be allowed in the upgrade check, because that check always deals with a single evolutionary step where you simply shouldn't be doing that. And that we hence should define two relations, one with and the other without this rule (or something like that).

But the assumption that the check always deals with a single evolutionary step is bogus if your interface has to adapt to (the latest version of) somebody else's interface types, whose evolution might have a different pace, and who makes bad transitive evolutionary choices outside your control. In that case you may need the "bad" rule on options even during upgrade checks.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our previous conclusion was (unless I misremember something) that the transitive rule (opt t <: opt t' with incompatible t, t') should not be allowed in the upgrade check,

yes… but

because that check always deals with a single evolutionary step where you simply shouldn't be doing that

wasn’t the reason, I think. I thought the reason was that we only had problems when you need the bad rules due to transitivity because you composed two serivces, and each was doing one (or more) sensible steps. But the transtive relation was only relevant for decoding, not an actual service evolution.

That’s why I hope we can have a “can sensibly evolve” relation (probably with polarities), that’s transitive, and a separate relation “can be decoded at” relation that has the unwanted relations.

But maybe I am too optimistic, and there will always be corner cases where a service needs to change its interface is a “bad” way.

*Note:* This rule is unusual form a regular subtyping perspective, but it is the dual to the one for records.
rossberg marked this conversation as resolved.
Show resolved Hide resolved
Together with the previous rule, it allows extending any optional variant with new tags in an upgrade, regardless of how it is used.
Any party not aware of the extension will treat the new case as `null`.


#### Functions

For a specialised function, any parameter type can be generalised and any result type specialised. Moreover, arguments can be dropped while results can be added. That is, the rules mirror those of tuple-like records, i.e., they are ordered and can only be extended at the end.
Expand Down Expand Up @@ -669,7 +863,7 @@ service { <name> : <functype>; <methtype>;* } <: service { <name> : <functype'>;

### Elaboration

To define the actual coercion function, we extend the subtyping relation to a ternary *elaboration* relation `T <: T' ~> f`, where `f` is a suitable coercion function of type `T -> T'`.
To define the actual coercion function used during deserialisation of a value, we extend the subtyping relation to a ternary *elaboration* relation `T <: T' ~> f`, where `f` is a suitable coercion function of type `T -> T'`.


#### Primitive Types
Expand Down Expand Up @@ -711,7 +905,33 @@ not (null <: <datatype>)
<datatype> <: <datatype'> ~> f
------------------------------------------
<datatype> <: opt <datatype'> ~> \x.?(f x)

not (<datatype> <: <datatype'>)
---------------------------------
opt <datatype> <: opt <datatype'>
~> \x.join_opt (\y.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t think we can allow such subtyping premises on the elaborated rules, as these define deserialization.

Shouldn’t this rule (without the premise, and with the if ∃ elaboration) subsume the normal rule?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I tried to keep the structure of the rules without elaboration. But I think the problem is that this rules try to express a phase separation that doesn't actually exist. All the rules should be considered "runtime". I removed the other rule and reformulated this one non-deterministically and added a comment admitting that the formulation is somewhat hand-wavy. Doing it properly would require a 4-place relation like v:t <: v':t', I think, but I'd prefer leaving that for another time.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we can clean up separtely.

I don’t think it’d be a four-place relation: Didn’t we determine that the input data should be considered untyped (and the type description just be an encoding help)? So I’d expect v <: v' : t or maybe more suggestively v ~> v':t.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose we could phrase it like that, but that would be a bit of a disconnect with the actual serialisation format, which does include a type -- it just may not be "principal".

Copy link
Collaborator

@nomeata nomeata Oct 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that disconnect would be intentional: Didn't we just last week realize that it was misguided to take that type as more than just a way to help understand the binary structure of the value?

I think the conceptual phase distinction between “decoding binary to abstract value” and “interpreting a value at the expected type” is very helpful (and how many implementatoins will naturally approach this, too).

nomeata marked this conversation as resolved.
Show resolved Hide resolved
if (exists <datatype''>.
y : <datatype''> /\
<datatype''> <: <datatype> /\
rossberg marked this conversation as resolved.
Show resolved Hide resolved
<datatype''> <: <datatype> ~> f)
then ?(f y)
else null)
```
where
```
v : <datatype> iff M(v : <datatype>) is defined
```
The last rule *optimistically* tries to decode an option value when the types are not statically known to match.
nomeata marked this conversation as resolved.
Show resolved Hide resolved
It succceeds if there would have been a valid type for the value (written `v : t`) that is a subtype of both types.
The effect of this rule is that decoders do not need to perform a subtype check during deserialisation.
Instead, they can simply try to deserialise, and if they encounter a mismatch abort, to the innermost option type, returning `null`.

*Note:* The working assumption is that the type describing its incoming value is typically *principal*, i.e., the most precise type assignable to the value (in a sense that could be made precise).
nomeata marked this conversation as resolved.
Show resolved Hide resolved
In that case, `<datatype''>` will equal `<datatype>` and deserialisation succeeds exactly if the types match.
rossberg marked this conversation as resolved.
Show resolved Hide resolved
For example, to be principal, an empty vector would need to have type `vec empty`, `null` could only have type `null`, and a variant would only be allowed to include tags that actually occur in the value.

However, in practice it would be costly to enforce the requirement that all type descriptions in a serialised value are principal, for both encoder (who would need to compute the principal type) and decoders (who would need to check it).
Consequently, the semantics allows for the possibility that the encoder ascribes a less specific value with some redundant information, and allows the decoder to ignore redundant information like the element type of empty arrays, of null options, or unused variant tags.


#### Records
Expand All @@ -726,8 +946,14 @@ record { <fieldtype>;* } <: record { <fieldtype'>;* } ~> f2
----------------------------------------------------------------------------------------------
record { <nat> : <datatype>; <fieldtype>;* } <: record { <nat> : <datatype'>; <fieldtype'>;* }
~> \x.{f2 x with <nat> = f1 x.<nat>}

record { <fieldtype>;* } <: record { <fieldtype'>;* } ~> f
----------------------------------------------------------------------------------------------
record { <fieldtype>;* } <: record { <nat> : opt <datatype'>; <fieldtype'>;* }
~> \x.{f x with <nat> = null}
```


#### Variants

```
Expand All @@ -740,8 +966,14 @@ variant { <fieldtype>;* } <: variant { <fieldtype'>;* } ~> f2
------------------------------------------------------------------------------------------------
variant { <nat> : <datatype>; <fieldtype>;* } <: variant { <nat> : <datatype'>; <fieldtype'>;* }
~> \x.case x of <nat> y => <nat> (f1 y) | _ => f2 x

opt variant { <fieldtype>;* } <: opt variant { <fieldtype'>;* } ~> f
----------------------------------------------------------------------------------------------
opt variant { <nat> : opt <datatype>; <fieldtype>;* } <: opt variant { <fieldtype'>;* }
~> \x.join_opt (map_opt (\y.case y of <nat> z => null | _ => ?(f x)))
nomeata marked this conversation as resolved.
Show resolved Hide resolved
```


#### Functions

```
Expand Down Expand Up @@ -770,7 +1002,6 @@ service { <name> : <functype>; <methtype>;* } <: service { <name> : <functype'>;
## Open Questions

* Support default field values?
* Better upgradability for variants?
* Support generic type definitions?
* Namespaces for imports?

Expand Down Expand Up @@ -1000,56 +1231,3 @@ The same representation is used for function results.
Note:

* It is unspecified how the pair (B,R) representing a serialised value is bundled together in an external environment.


## Text Format

To enable convenient debugging, we also specify a text format for Candid values.
The types of these values are assumed to be known from context, so the syntax does not attempt to be self-describing.

```
<val> ::=
| <primval> | <consval> | <refval>
| ( <annval> )

<annval> ::=
| <val>
| <val> : <datatype>

<primval> ::=
| <nat> | <int> | <float> (TODO: same as Motoko grammar plus sign)
| <text> (TODO: same as Motoko grammar)
| true | false
| null

<consval> ::=
| opt <val>
| vec { <annval>;* }
| record { <fieldval>;* }
| variant { <fieldval> }

<fieldval> ::= <nat> = <annval>

<refval> ::=
| service <text> (canister URI)
| func <text> . <id> (canister URI and message name)
| principal <text> (principal URI)

<arg> ::= ( <annval>,* )

```

#### Syntactic Shorthands

Analoguous to types, a few syntactic shorthands are supported that can be reduced to the basic value forms:

```
<consval> ::= ...
| blob <text> := vec { N;* } where N* are of bytes in the string, interpreted [as in the WebAssembly textual format](https://webassembly.github.io/spec/core/text/values.html#strings)

<fieldval> ::= ...
| <name> = <annval> := <hash(name)> = <annval>
| <annval> := N = <annval> where N is either 0 or previous + 1 (only in records)
| <nat> := <nat> = null (only in variants)
| <name> := <name> = null (only in variants)
```