[IDL] Optimistic subtyping #1959

Closed

rossberg opened this issue Sep 21, 2020 · 9 comments
Comments

rossberg commented Sep 21, 2020

After going through our various discussions and non-solutions to the upgradability problem again (see #1523), I am tempted to revisit one option that we only briefly discussed under the name of “optimistic” deserialisation or “backtracking”.

Here is the problem again. To maintain transitivity in the presence of record extension that can work both in contra-variant position and in higher-order cases (see below for summary of design constraints), we have concluded that it seems inevitable to have a rule

----------------
opt t1 <: opt t2

that works for all t1, t2. The big question, then, is what operational behaviour to assign to this.

Previously, I had naively proposed the “ideal” semantics, which, as @nomeata pointed out, amounts to a runtime test:

-------------------
opt t1 <: opt t2 ~>
  \x. case x (\_. null) (\y. if (t1 <: t2 ~> f) then opt (f y) else null)

The problem with that is that every deserialiser would have to implement subtyping on IDL types, which is an extra cost and arguably an unreasonable burden, and chances are that many implementations will take shortcuts.
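To make that cost concrete, here is a minimal Haskell sketch of what the ideal rule demands of a decoder. Everything in it (the tiny Type/Value fragment, subtype, coerce) is a hypothetical stand-in for illustration, not code from any Candid implementation; the point is that coerceOpt needs a full subtype test at runtime:

```haskell
-- Hypothetical fragment of IDL types and values, for illustration only.
data Type  = Nat | Int | Opt Type               deriving (Eq, Show)
data Value = VNat Integer | VNull | VOpt Value  deriving (Eq, Show)

-- Assumed subtype check on this fragment. This is exactly the "extra
-- cost": every deserialiser must implement full subtyping on IDL types.
subtype :: Type -> Type -> Bool
subtype Nat     Nat     = True
subtype Int     Int     = True
subtype Nat     Int     = True   -- nat <: int
subtype (Opt _) (Opt _) = True   -- the unconditional rule above
subtype _       _       = False

-- Coercion induced by subtyping; trivial on this fragment, where
-- nat and int values share a representation.
coerce :: Type -> Type -> Value -> Value
coerce (Opt t1) (Opt t2) v = coerceOpt t1 t2 v
coerce _        _        v = v

-- The "ideal" semantics of opt t1 <: opt t2: a runtime subtype test.
coerceOpt :: Type -> Type -> Value -> Value
coerceOpt _  _  VNull = VNull
coerceOpt t1 t2 (VOpt y)
  | subtype t1 t2 = VOpt (coerce t1 t2 y)
  | otherwise     = VNull
coerceOpt _  _  v     = v
```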

Another option that we only briefly discussed but dismissed earlier was the “optimistic” approach, i.e., deserialisation will try to deserialise the optional value as t2, but if that fails it aborts and returns null for the option. (Perhaps not really “backtracking” since it never explores a second option but just aborts.)
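In the same hypothetical fragment, the optimistic behaviour is just try-and-abort; decodeAt stands in for the type-directed traversal of the value blob, and no subtype check appears anywhere:

```haskell
import Data.Word (Word8)

type Blob = [Word8]

-- Assumed type-directed decoder: reads a value from the blob at the
-- expected type, failing if the bytes don't fit that type (elided).
decodeAt :: Type -> Blob -> Maybe Value
decodeAt = undefined

-- Optimistic handling of opt t2: try to decode the payload at t2; on
-- failure, don't backtrack or try another type -- abort this subtree
-- and yield null.
decodeOpt :: Type -> Blob -> Value
decodeOpt t2 blob =
  case decodeAt t2 blob of
    Just v  -> VOpt v
    Nothing -> VNull
```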

For an implementation, this is a fairly natural way of handling things, but semantically, it raises at least two questions:

  1. What’s a simple declarative specification?
  2. What semantic property does this have?

For the former, the following might work:

-------------------
opt t1 <: opt t2 ~>
  \x. case x (\_. null) (\y. if (exists t3. y : t3 /\ t3 <: t1 /\ t3 <: t2 ~> f) then opt (f y) else null)

In other words, if there is a “more principal” type t3 for the value, with which we can successfully deserialise, then we do so.

This corresponds to the observation that the only way the extra type check of the ideal approach can fail (I believe) is when the serialisation type is not principal, e.g., because it contains empty vectors, or variants with cases whose types don’t match between t1 and t2 but that never actually occur in the value. (For instance, an empty vector serialised at type opt vec int can be decoded at opt vec nat: its principal type opt vec empty is a subtype of both, even though vec int <: vec nat fails.) These pairs of types are never explored during actual deserialisation, so maintaining the ideal semantics would require an extra subtype check just for them. In the above formulation, we simply allow these “redundant” type components to be ignored.

Consequently, the semantic property of deserialisation generalises in a corresponding manner:

deser(v:t1, t2) = v’ if and only if there exists t3 with t3 <: t1 and t3 <: t2, such that v:t3.

Now, this is a bit odd, but do we think this property is “good enough” for our purposes? Notably, it is no worse than the property we’d get if we didn’t type-annotate serialised values in the first place but made them untyped (as we originally had). In a way, you could argue, we have brought this whole problem on ourselves only by extracting structural information about the interpretation of the serialised value into a “type”, a feature that other formats do not have. The intended use is that this type is principal, but to make it easier for implementations, our semantics does not enforce that.

WDYT?

For the record, here is a summary of requirements for the upgradability relation and the operational semantics of deserialisation, which was previously scattered across various discussions:

  • Soundness: The upgradability relation implies that no deserialisation can fail.

  • Completeness: The upgradability relation covers all cases of successful deserialisation.

  • Transitivity: The upgradability relation implies that deserialisation cannot fail even across multiple upgrades.

  • Record Extensibility: Records are upgradable by adding new (optional) fields, even when they appear in both positive and negative positions, such that round-trips always remain possible.

  • Higher-order Extensibility: The upgradability relation extends to higher-order cases, where there is no obvious owner of the input/output contract.

  • Language Injectivity: The upgradability relation does not depend on version or role annotations in interfaces that have no counterpart in source languages, i.e., an interface description can be generated from source-level types without special features or cross-version knowledge.

  • Simple Deserialisation: Deserialisation does not require (sub)type checks other than the type-directed traversal of the value blob.

  • Non-coercive Deserialisation: Deserialisation of a value is invariant across super- and sub-types.

  • Type Erasure: Deserialised values do not require carrying dynamic type information on the language side.

rossberg added the language design, typing, and idl labels Sep 21, 2020
rossberg self-assigned this Sep 21, 2020
rossberg commented:

@nomeata, @crusso, @matthewhammer

nomeata commented Sep 21, 2020

Thanks for picking up that ball again. So this ties into #870 (semantics of deserialization) and would have to be solved together with that.

The original motivation was record fields, but you don’t talk about record fields here… I assume you’d also add a rule that treats a missing field as null when trying to decode it as an opt t?

My gut feeling is that this still amounts to “try to decode, turn failure into silent null”, which doesn't seem to give much value over untyped formats…

Also, so far we use the same relation for the subtyping used in decoding and the subtyping used for upgrades. You give evidence why, during decoding, opt true should be accepted (and treated as null) at type opt nat. But do we really want our tooling to allow an actor returning an opt bool to change that to opt nat in the next version?

I could imagine that we keep the previous subtyping rules for opt for interface evolution (but allow adding opt fields in argument records), while using a larger relation as the subtyping for decoding. Looking at your list, it seems I am questioning whether “Completeness” is actually required, or even a desired property.

Might be nice to write these properties down in terms of the deser function (or is it a relation?), the subtyping relationship and the elaborated subtyping relationship.

Does this proposal have the “coherence” and “transitive coherence” properties? I guess coherence holds by construction, given the way you phrase the rules. Transitive coherence … no, that can’t hold. We have opt nat <: opt int <: opt nat, but the composition of the coercion functions would always return null, while the direct opt nat <: opt nat coercion would be the identity.
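Concretely, in the hypothetical sketch of the ideal rule from above, the composed coercion is constantly null while the direct one is the identity:

```haskell
-- Direct coercion at opt nat <: opt nat: the identity.
direct :: Value
direct = coerceOpt Nat Nat (VOpt (VNat 5))
-- ==> VOpt (VNat 5)

-- Composed via opt int: the second step tests int <: nat, which
-- fails, so every non-null value collapses to null.
composed :: Value
composed = coerceOpt Int Nat (coerceOpt Nat Int (VOpt (VNat 5)))
-- ==> VNull
```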

rossberg commented:

> Thanks for picking up that ball again. So this ties into #870 (semantics of deserialization) and would have to be solved together with that.

Right.

> The original motivation was record fields, but you don’t talk about record fields here… I assume you’d also add a rule that treats a missing field as null when trying to decode it as an opt t?

Yes, I'm only focussing on the problematic rule.

> My gut feeling is that this still amounts to “try to decode, turn failure into silent null”, which doesn't seem to give much value over untyped formats…

That is correct. I was considering the history of how we arrived at the typed format. IIRC, our original motivation wasn't actually more checking, but a more compressed representation, e.g., saving the repetition of label indices in a vector etc. So from that perspective, a less typeful semantics could be argued for.

It can also well be argued against, but then we need a feasible alternative, which we so far have failed to produce. :(

> Also, so far we use the same relation for the subtyping used in decoding and the subtyping used for upgrades. You give evidence why, during decoding, opt true should be accepted (and treated as null) at type opt nat. But do we really want our tooling to allow an actor returning an opt bool to change that to opt nat in the next version?

Probably not. In an actual specification, I'm sure we'll want to define two relations for these different use cases.

> I could imagine that we keep the previous subtyping rules for opt for interface evolution (but allow adding opt fields in argument records), while using a larger relation as the subtyping for decoding. Looking at your list, it seems I am questioning whether “Completeness” is actually required, or even a desired property.

Yes, agreed, this property may require some qualification (or even removal).

> Might be nice to write these properties down in terms of the deser function (or is it a relation?), the subtyping relationship and the elaborated subtyping relationship.

deser should certainly be a function, we don't want non-determinism in our deserialisers.

> Does this proposal have the “coherence” and “transitive coherence” properties? I guess coherence holds by construction, given the way you phrase the rules. Transitive coherence … no, that can’t hold. We have opt nat <: opt int <: opt nat, but the composition of the coercion functions would always return null, while the direct opt nat <: opt nat coercion would be the identity.

Right, coherence cannot hold under this proposal. Desirable as it would be, I don't think coherence is possible given such a typing rule, regardless of what dynamic semantics we assign to it.

nomeata commented Sep 22, 2020

I get the feeling that there is an underlying shift in implicit goals, which I can’t phrase fully formally. Maybe we have been too ambitious previously. But under these goals, it seems to make sense.

Maybe one way to phrase it formally in a pleasant way is the following:

  • We use the <: relation only for subtyping of interfaces, and possibly make it more restrictive (e.g. disallow opt nat <: opt int or other “accidentally” working decoding relations).

  • We formally define an untyped data model for Candid. (It turns out that most implementations do that anyway, so better to make that explicit.)

  • The specification of deserialization is factored into two steps (partial functions or relations; see the sketch after this list):

    1. decoding binary blob to the abstract data model
    2. interpretation of the untyped value at the expected type.

    This underlines what you wrote: the type section was added as an encoding aid to describe the value, and nothing more, so it makes sense to forget about it after step 1.

    The property “deser(v:t1, t2) = v if and only if there exists t3 with t3 <: t1 and t3 <: t2, such that v:t3” should then become something we can prove.
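A minimal sketch of that two-step factoring, reusing the hypothetical Type, Value, and Blob definitions from the sketches above (UValue, decodeBlob, and interpretAt are made-up names, not from any spec):

```haskell
-- An untyped data model, as most implementations effectively use.
data UValue = UInt Integer | UNull | UOpt UValue  -- elided: full model

-- Step 1: binary blob to untyped value, guided only by the
-- transmitted type section (elided).
decodeBlob :: Blob -> Maybe UValue
decodeBlob = undefined

-- Step 2: interpret the untyped value at the expected type. This is
-- where the opt rule turns a failing payload into null.
interpretAt :: Type -> UValue -> Maybe Value
interpretAt Nat     (UInt n) | n >= 0 = Just (VNat n)
interpretAt (Opt _) UNull             = Just VNull
interpretAt (Opt t) (UOpt u)          = Just (maybe VNull VOpt (interpretAt t u))
interpretAt _       _                 = Nothing   -- int etc. elided

deser :: Blob -> Type -> Maybe Value
deser blob t = decodeBlob blob >>= interpretAt t
```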

My gut feeling says that this framing might give us a better framework for expressing some of the properties you list above?

rossberg commented:

Yes, that might make sense.

The underlying goal shift is that the type annotation in the value is no longer viewed as prescriptive, but only as descriptive (it's like a transparent type annotation, not a coercive one). Although we never explicitly stated the former as an intention, I think we all implicitly assumed it and feel that losing it is somewhat unfortunate. But as you say, it might actually align better with implementation reality.

An alternative, btw, would be to require encoders to always produce principal types, so that t1 and t3 always coincide. But that's significant extra work on both encoders (who must compute it) and decoders (who must check it), so seems undesirable. But if you want to sell this to a type-inclined person, you could describe that as the core model minus the requirement to enforce it.

matthewhammer commented Sep 23, 2020

> The intended use is that this type is principal, but to make it easier for implementations, our semantics does not enforce that.

> ... if you want to sell this to a type-inclined person, you could describe that as the core model minus the requirement to enforce it.

Makes sense and seems like a reasonable compromise to me.

As for helping implementors, how do the Candid tools relate to that question (cc @chenyan-dfinity)? E.g., perhaps we can have a use case for didc (or the Haskell equivalent) that helps test and verify this for data produced elsewhere?

rossberg commented:

Closely related: we have so far glossed over coming up with a definition of when a serialised value is actually well-typed, i.e., abstractly, when v:t is valid per se. Clearly, t cannot just be any supertype of v's "principal" one, as the implicit assumption underlying our coercive formulation of deserialisation is that, e.g., t cannot forget any fields in a record. On the other hand, it can add cases in a sum, which is a bit asymmetric, but see above.

It seems like specifying this requires a separate, more restrictive, non-coercive notion of subtyping. Which may reinforce @matthewhammer's earlier point that upgradability is not exactly the same as subtyping -- at least we need to differentiate between the general relation and the smaller non-coercive subrelation.

Notably, the larger relation does not actually have anything like "principal" types, since it lacks an antisymmetry property (there may be minimal types in some sense, but they are not unique). But the smaller relation ought to have principal types for the values it ascribes.

matthewhammer commented:

> Which may reinforce @matthewhammer's earlier point that upgradability is not exactly the same as subtyping -- at least we need to differentiate between the general relation and the smaller non-coercive subrelation.

Humbly accepting this gracious validation with great inner joy. Thank you. : )

nomeata commented Nov 23, 2020

I believe this has turned into dfinity/candid#110 and is done.

nomeata closed this as completed Nov 23, 2020