diff --git a/design/IDL-AS.md b/design/IDL-AS.md new file mode 100644 index 00000000000..5737dac70fa --- /dev/null +++ b/design/IDL-AS.md @@ -0,0 +1,287 @@ +The IDL-ActorScript integration +=============================== + +## Goals + +This document specifies the integration of the IDL with the ActorScript +language, in particular: + + * How to translate between an ActorScript `actor` type and an IDL `service` + description in both ways (i.e. _exporting_ IDL from ActorScript, and _importing_ + IDL into ActorScript), and how to translate between the values of these types. + + * The supported work-flows, including a sketch of the involved tools. + +We try to achieve the following goals (but do not achieve them completely) + + * We can round-trip all ActorScript values of sharable type. More precisely: + + When exporting an ActorScript type `ta` into an IDL type `ti`, then + round-tripping a value `va : ta` through `ti` yields a value that is + indistinguishable from `va` at type `ta`. + + * ActorScript can receive all IDL types: The type export has an inverse, the + type import mapping, which is injective (up-to IDL type equivalence via + shorthands). + + * ActorScript can receive all IDL values when importing an IDL: + + When importing an IDL type `ti` into `ta`, then every IDL value `vi : ti` + will successfully translated to an ActorScript value of type `ta`. + +The following are not necessary true: + + * The type export mapping is not total: there are ActorScript types that cannot + be exported, such as mutable arrays. + + * The type export mapping is not injective: there may be different + ActorScript types that map to the same IDL type, e.g. `Char`, `Word32` and + `Nat32`. + + This implies that round-tripping an ActorScript type via the IDL can yield + different types. + + * For some types, not all IDL values may be accepted: for example, `Char` is + may be exported as `nat32`, but not all `nat32` values can be read as + `Char`. This can only be the case for types not in the image of the type + import mapping. + +NB: These properties (and non-properties) are not specific to ActorScript, and +we expect that they will hold similarly for interfaces to other typed languages +with seamless serialization (e.g. built-in, macro or type class based +integration). In this sense, this document serves as a blueprint. Untyped +languages or languages with a code-generation workflow may have a different +story. + +## The type mappings + +We define + * a partial function `e` from ActorScript types to IDL types. + Types that are not in the domain of `e` cannot be exported. + * a partial function `i` from IDL types to ActorScript types. + Types that are not in the domain of `i` cannot be imported. + +These definition treats ActorScript types and IDL types as structural and +infinite; a concrete implementation will have to look through type constructors +in ActorScript and introduce type definitions in the IDL as necessary. + +It assumes that the IDL short-hands (e.g. named or anonymous fields) +are part of the grammar of types, and that `i` is allowed to make difference +choices for types that are short-hands. + +The function is defined with regard to the grammars in [IDL.md](IDL.md) and [Syntax.md](Syntax.md). + +### Type export + +``` +e : -> +e(Null) = null +e(Bool) = bool +e(Nat) = nat +e(Int) = int +e(Nat) = nat for n = 8, 16, 32, 64 +e(Int) = int for n = 8, 16, 32, 64 +e(Word) = nat for n = 8, 16, 32, 64 +e(Float) = float64 +e(Char) = nat32 +e(Text) = text +e(shared { ;* }) = record { ef();* } +e(variant { ;* }) = variant { ef();* } +e([]) = vec (e()) +e(? ) = opt (e()) +e(shared -> ) = func (efn(shared -> )) +e(actor { ;* }) = service { em();* } +e( ( ,* ) ) = record { e();* } +e(Any) = reserved +e(None) = empty + +ef : -> +ef ( : ) = unescape() : e() + +efn : -> +efn(shared -> ()) = ea() -> () oneway +efn(shared -> async ) = ea() -> ea() + +ea : -> ,* +ea( ( ,* ) ) = e();* +ea() = ( e() ) otherwise + +em : -> +em( : ) = unescape() : efn() + +unescape : -> | +unescape("_" "_") = +unescape( "_") = +unescape() = +``` + +### Type import + +``` +i : -> +i(null) = Null +i(bool) = Bool +i(nat) = Nat +i(int) = Int +i(nat) = Nat for n = 8, 16, 32, 64 +i(int) = Int for n = 8, 16, 32, 64 +// i(float32) not defined +i(float64) = Float +i(text) = Text +i(reserved) = Any +i(opt ) = ? i() +i(vec ) = [ i() ] +i(blob) = [ word8 ] // if ActorScript had a bytes type, it would show up here +i(record { ;* }) = ( i(),* ) // matches tuple short-hand +i(record { ;* }) = shared { if();* } +i(variant { ;* }) = variant { if();* } +i(func ) = ifn() +i(service { ;* }) = actor { im();* } + +if : -> +if( : ) = escape() : i() +if( : ) = "_" "_": i() // also for implicit labels + +ifn : -> +ifn((,*) -> () oneway pure?) = shared ia() -> () +ifn((,*) -> (,*) pure?) = shared ia(,*) -> ia(,*) + +ia : ,* -> +ia(,) = i() +ia(,*) = ( i(),* ) otherwise + +im : -> +im( : ) = escape() : ifn() + +escape : -> +escape = "_" if is a reserved identifier in ActorScript +escape = "_" if ends with "_" +escape = if is a valid ActorScript not ending in "_" +escape = "_" hash() "_" otherwise +``` + +### Notes: + + * Up-to short-hands, `i` is injective and the right-inverse of `e`. + + Formally: For all IDL types `t ∈ dom i`, we have that `e(i(t))` is equivalent to + `t`, i.e. either they are the same types, or short-hands of each other. + + * Tuples are exported using the unnamed field short-hand, which is how tuples + are idiomatically expressed in the IDL: + ``` + e(()) = record {} + e((Int, )) = record {int} + e((Int, Nat)) = record {int;nat} + e(shared {i:Int, n:Nat)) = record {i:int; n:nat} + e(shared {_0_:Int, _1_:Nat)) = record {0:int; 1:nat} + ``` + + * The mapping `i` tries to detect types that can be expressed as tuples in + ActorScript. + ``` + i(record {int;nat}) = (Int, Nat) + i(record {int; nat; foo:text}) = shared {_0_:Int; _1_:Nat; foo:Text} + i(record {0:Int, 1:Nat)) = shared {_0_:int; _1_:nat} + ``` + + * The `escape` and `unescape` functions allow round-tripping of IDL field + names that are not valid ActorScript names (fake hash values): + ``` + i(record {int; if:text; foobar_:nat; "_0_":bool}) + = shared (_0_:Int; if_:Text; _1234_:Nat; _4321_:Bool) + ``` + This is another source of partiality for `e`: + ``` + e(shared {clash_ : Nat; clash : Int}) + ``` + is undefined, because `unescape(clash_) = unescape(clash)`. + + * ActorScript functions with type parameters are not in the domain of `e`. + + * Abstract ActorScript types are not in the domain of `e` + + * The translation produces IDL functions without parameter names. But both + the IDL and ActorScript conveniently support non-significant names in + parameter lists. These are essentially comments, and do not affect, for + example, the type section in a message, so it is not necessary to specify + them here. + + But tooling (e.g. `asc` exporting an IDL from an ActorScript type) is of + course free to use any annotations in the ActorScript type (or even names + from pattern in function definitions) also in the exported IDL. + + * The soundness of the ActorScript type system, when it comes to higher-order + use of actor and function references, relies on + ``` + ∀ t1 t2 : dom(e), t1 <: t2 ⟹ e(t1) <: e(t2) + ``` + In other words: ActorScript subtyping must be contained in IDL subtyping. + + * There is no way to produce `float32` or functions with a `pure` annotation. + Importing interfaces that contain these types fails. + +## The value mappings + +For each ActorScript type `t` in the domain of `e`, we need mapping from +ActorScript value of type `t` to an IDL value of type `e(t)`, and vice-versa. + +Note that decoding may only fail for those `t` that are not in the range of `i`. + +These mappings should be straight-forward, given the following clarifications: + +* Characters (of type `Char`) are mapped to their Unicode scalar as a `nat32`. + Decoding a `nat32` that is not a valid Unicode scalar fails. + +## Works flows + +The mapping specified here can be used to support the following use-cases. The +user interfaces (e.g. flag names, or whether `asc`, `idlc`, `dfx` is used) are +just suggestions. + +* Generating IDL from ActorScript + + If `foo.as` is an ActorScript `actor` compilation unit, then + + asc --generate-idl foo.as -o foo.didl + + will type-check `foo.as` as `t = actor { … }`, map the ActorScript type `t` + to an IDL type `e(t)` of the form `service `, and produce a + textual IDL file `foo.didl` that ends with a `service n : `, + where `n` is the name of the actor class, actor, or basename of the source + file. + +* Checking ActorScript against a given IDL + + If `foo.as` is an ActorScript `actor` compilation unit and `foo.didl` a + textual IDL file, then + + asc --check-idl foo.didl foo.as -o foo.wasm + + will import the type service `t_spec` specified in `foo.didl`, using the + mapping `i`, will generate an IDL type `e(t)` as in the previous point, and + and check that `e(t) <: t_spec` (using IDL subtyping). + +* Converting IDL types to ActorScript types + + If `foo.didl` a textual IDL file, then + + idlc foo.didl -o foo.as + + will create an ActorScript library unit `foo.as` that consists of type + definitions. + All ``s and the final `` from `foo.didl` is turned into a `type` + declaration in ActorScript, according to `i`. + Imported IDL files are recursively inlined. + + Variant: Imported IDL files are translated separately and included via + `import` in ActorScript. + +* Importing IDL types from the ActorScript compiler + + If `path/to/foo.didl` a textual IDL file, then a declaration + + import Foo "path/to/foo" + + is treated by `asc` by reading `foo.didl` as if the developer had + run `idlc path/to/foo.didl -o path/to/foo.as`.