An experimental object-oriented language and implementation for exploring and evaluating the WasmFX extension.
Wob is a typed mini-OO language in the style of C# and friends. It is meant to be representative of the most relevant problems that arise when compiling such languages to Wasm with GC extensions. These arguably are:
- generics
- classes
- safe downcasts
- separate compilation
To that end, Wob features:
- primitive data types
- tuples and arrays
- functions and closures
- classes and inheritance
- first-class generics
- safe runtime casts
- simple modules with separate compilation and client-side linking
For simplicity, Wob omits various other common features, such as access control to classes and modules, more advanced control constructs, or sophisticated type inference, which are mostly independent from code generation and data representation problems. It also doesn't currently have interface types.
The wob
implementation encompasses:
- Interpreter
- Compiler to Wasm (WIP)
- A read-eval-print-loop that can run either interpreted or compiled code
The language is fully implemented in the interpreter, but the compiler does not yet support closures. It does, however, fully implement garbage-collected objects, tuples, arrays, text strings, classes, casts, and generics, making use of most of the constructs in the GC proposal's MVP.
For example, here is a short transcript of a REPL session with wob -c -x
:
proposals/gc/wob$ ./wob -x -c
wob 0.1 interpreter
> func f(x : Int) : Int { x + 7 }; f(5);
(module
(type $0 (func))
(type $1 (func (param i32) (result i32)))
(global $0 (mut i32) (i32.const 0))
(func $0 (type 0) (i32.const 5) (call 1) (global.set 0))
(func $1 (type 1) (local.get 0) (i32.const 7) (i32.add))
(export "return" (global 0))
(export "f" (func 1))
(start 0)
)
12 : Int
f : Int -> Int
>
See below for a brief explanation regarding the produced code.
Saying
make
in the wob
directory should produce a binary wob
in that directory.
The wob
binary is both compiler and REPL. For example:
wob
starts the interactive prompt, using the interpreter.wob -c
starts the interactive prompt, but each input is compiled and run via Wasm.wob -c -x
same, but the generated Wasm code is also shown for each input.wob -r <file.wob>
batch-executes the file via the interpreter.wob -r -c <file.wob>
batch-executes the file via Wasm.wob -c <file.wob>
compiles the file to<file.wasm>
.
See wob -h
for further options.
Points of note:
-
The Wasm code produced is self-contained with no imports except the small Wob runtime system (unless the source declares explicit imports of other Wob modules). Consequently, it can run in any Wasm environment supporting the GC proposal.
-
The compiler even supports a headless mode in which the relevant parts of the runtime system is included into each module.
-
That means that there is no I/O. However, a program can communicate results via module exports or run assertions.
-
When batch-executing, all Wasm code is itself executed via the Wasm reference interpreter, so don't expect performance miracles.
Run
make test
to run the interpreter against the tests, or
make wasmtest
to run the compiler.
typ ::=
id ('<' typ,* '>')? named use
'(' typ,* ')' tuple
typ '[' ']' array
typ [' ']' '!' readonly array
typ '$' boxed
('<' id,* '>')? '(' typ,* ')' '->' typ function
typ '->' typ function (shorthand)
Notes:
-
Generics can only be instantiated with boxed types, which excludes the primitive data types
Bool
,Byte
,Int
, andFloat
. These can be converted to boxed types viaBool$
,Int$
, etc. There is no autoboxing. -
Readonly arrays are a supertype of regular arrays, and only they allow covariant subtyping on the element type.
unop ::= '+' | '-' | '^' | '!'
binop ::= '+' | '-' | '*' | '/' | '%' | '&' | '|' | '^' | '<<' | '>>'
relop ::= '==' | '!=' | '<' | '>' | '<=' | '>='
logop ::= '&&' | '||'
exp ::=
int integer literal
float floating-point literal
text text string
id variable use
unop exp unary operator
exp binop exp binary operator
exp relop exp comparison operator
exp logop exp logical operator
exp ':=' exp assignment
'(' exp,* ')' tuple
'[' exp,* ']' array
exp '.' nat tuple access
exp '.' id object access
exp '[' exp ']' array or text access
'#' exp array or text length
exp ('<' typ,* '>')? '(' exp,* ')' function call
'new' id ('<' typ,* '>')? '(' exp,* ')' class instantiation
'new' typ '[' exp ']' '(' exp ')' array instantiation
exp '$' box value
exp '.' '$' unbox boxed value
exp ':' typ static type annotation
exp ':>' typ dynamic type cast
'assert' exp assertion
'return' exp? function return
block block
'if' exp block ('else' block)? conditional
'while' exp block loop
'func' ('<' id,* '>')? '(' param,* ')' anonymous function (shorthand)
(':' typ) block
block :
'{' dec;* '}' block
Notes:
-
There is no distinction between expressions and statements: a block can be used in any expression context and it evaluates to the value of its last expression, or
()
(the empty tuple) if its lastdec
isn't an expression. -
In the REPL, the entire input is treated as a block, and its result is the input result.
-
An
assert
failure is indicated by executing anunreachable
instruction in Wasm and thereby trapping.
param ::=
id ':' typ
dec ::=
exp expression
'let' id (':' typ)? '=' exp immutable binding
'var' id ':' typ '=' exp mutable binding
'func' id ('<' id,* '>')? '(' param,* ')' function
(':' typ) block
'class' id ('<' id,* '>')? '(' param,* ')' class
('<:' id ('<' typ,* '>')? '(' exp,* ')')?
block
'type' id ('<' id,* '>')? '=' typ type alias
Note:
-
All scoping is strictly linear, i.e., it is not currently possible to forward-reference a binding, not even inside a class, except through the use of
this
. Functions and classes can, however, recursively refer to themselves. -
Classes have just one constructor, whose parameters are the class definition's parameters and whose body is the class body.
-
Inside class scope, immutable
let
bindings can only refer to the class parameters orlet
variables from a super-class, or any bindings from outer scope; they have neither access to methods northis
. That prevents the ability to observe uninitialisedlet
through dispatch, which is crucial, since these have to be represented as immutable Wasm fields on the instance. -
Classes are nominal types.
-
For simplicity, there is no access control for objects.
imp ::=
'import' id? '{' id,* '}' 'from' text import
module ::=
imp;* dec;*
Notes:
-
A module executes by executing all contained declarations in order.
-
For simplicity, there is no access control beyond using blocks, that is, all top-level declarations are exported.
-
Imports are loaded eagerly and recursively. When an import specifies a qualifying identifier
M
, then all names from the import list are renamed to include the prefixM
-- this is a hack to avoid the introduction of name spacing constructs. -
With batch compilation, modifying a module generally requires recompiling its dependencies, otherwise Wasm linking may fail.
Bool Byte Int Float Text Object
true false null nan this
Notes:
- The
this
identifier is only bound within a class.
func fac(x : Int) : Int {
if x == 0 { return 1 };
x * fac(x - 1);
};
assert fac(5) == 120;
func foreach(a : Int[], f : Int -> ()) {
var i : Int = 0;
while i < #a {
f(a[i]);
i := i + 1;
}
};
let a = [1, 2, 5, 6, -8];
var sum : Int = 0;
foreach(a, func(k : Int) { sum := sum + k });
assert sum == 6;
func fold<T, R>(a : T[], x : R, f : (Int, T, R) -> R) : R {
var i : Int = 0;
var r : R = x;
while (i < #a) {
r := f(i, a[i], r);
i := i + 1;
};
return r;
};
func f<X>(x : X, f : <Y>(Y) -> Y) : (Int, X, Float) {
let t = (1, x, 1e100);
(f<Int$>(t.0$), f<X>(t.1), f<Float$>(t.2$));
};
let t = f<Bool$>(false$, func<T>(x : T) : T { x });
assert t.0 == 1;
assert !t.1;
assert t.2 == 1e100;
class Counter(x : Int) {
var c : Int = x;
func get() : Int { return c };
func set(x : Int) { c := x };
func inc() { c := c + 1 };
};
class DCounter(x : Int) <: Counter(x) {
func dec() { c := c - 1 };
};
class ECounter(x : Int) <: DCounter(x*2) {
func inc() { c := c + 2 };
func dec() { c := c - 2 };
};
let e = new ECounter(8);
// Module "intpair"
type IntPair = (Int, Int);
func pair(x : Int, y : Int) : IntPair { (x, y) };
func fst(p : IntPair) : Int { p.0 };
func snd(p : IntPair) : Int { p.1 };
A client:
import IP_{pair, fst} from "intpair";
let p = IP_pair(4, 5);
assert IP_fst(p) == 4;
The compiler makes use of the following constructs from the GC proposal.
i8
anyref eqref dataref i31ref (ref $t) (ref null $t) (rtt n $t)
struct array
(Currently unused is i16
).
ref.eq
ref.is_null ref.as_non_null ref.as_i31 ref.as_data ref.cast
i31.new i31.get_u
struct.new struct.get struct.get_u struct.set
array.new array.new_default array.get array.get_u array.set array.len
rtt.canon rtt.sub
(Currently unused are the remaining ref.is_*
and ref.as_*
instructions, ref.test
, the br_on_*
instructions, the signed *.get_s
instructions, and struct.new_default
.)
Wob types are lowered to Wasm as shown in the following table.
Wob type | Wasm value type | Wasm field type |
---|---|---|
Bool | i32 | i8 |
Byte | i32 | i8 |
Int | i32 | i32 |
Float | f64 | f64 |
Bool$ | i31ref | eqref |
Byte$ | i31ref | eqref |
Int$ | ref (struct i32) | eqref |
Float$ | ref (struct f64) | eqref |
Text | ref (array i8) | eqref |
Object | ref (struct) | eqref |
(Float,Text) | ref (struct f64 eqref) | eqref |
Float[] | ref (array f64) | eqref |
Text[] | ref (array eqref) | eqref |
C | ref (struct (ref $vt) ...) | eqref |
eqref | eqref |
Notably, all fields of boxed type have to be represented as eqref
, in order to be compatible with generics.
Types and type parameters have a runtime representation as well, in order to enable casts over generic types.
Wob bindings are compiled to Wasm as follows.
Wob declaration | in global scope | in func/block scope | in class scope |
---|---|---|---|
let |
mutable global |
local |
immutable field in instance struct |
var |
mutable global |
local |
mutable field in instance struct |
func |
func |
not supported yet | immutable field in dispatch struct |
class |
immutable global |
not supported yet | not supported yet |
Note that all global declarations have to be compiled into mutable globals, since they could not be initialised otherwise.
Objects are represented in the obvious manner, as a struct whose first field is a reference to the dispatch table, while the rest of the fields represent:
- class type arguments (immutable)
- constructor arguments (immutable)
let
fields (immutable)var
fields (mutable)
of the class and its super classes, in definition order. All fields but var
ones are immutable. Type arguments are represented as runtime type, and used to check casts. Identical (type or value) arguments just passed forward to a super class are not currently deduped, but could be.
The dispatch table contains references to the function of the class and its super classes, in definition order. In addition, the last field of each dispatch table is a pointer to itself. This is also duplicated in place by dispatch tables for subclasses, so that each dispatch table is interleaved with references to the dispatch tables of all its super classes. The type of these references is eqref
, and it is used to implement the class check for explicit casts. That is, the identity of dispatch tables is used as a proxy for class identities.
For example, the dispatch tables for the classes in the following snippet,
class C() {
func f() {};
};
class D() <: C() {
func g() {};
func h() {};
};
class E() <: D() {
func f() {}; // override
};
are:
$disp_C = [ $f_C | $disp_C ]
$disp_D = [ $f_C | $disp_C | $g_D | $h_D | $disp_D ]
$disp_E = [ $f_E | $disp_C | $g_D | $h_D | $disp_D | $disp_E ]
To check that an object's class is a subclass of D
, the code first uses br_on_cast
to down cast the object to the expected representation, which also implies that its dispatch table contains a slot for $disp_D
. It then checks that the reference in that slot is equal to class D
's dispatch table. Finally, it checks that the runtime types for generic parameters match up.
Class declarations translate to a global binding a class descriptor, which is represented as a struct with the following fields:
Index | Type | Description |
---|---|---|
0 | ref (struct ...) | Dispatch table |
1 | rtt $C | Instance RTT |
2 | ref (func ...) | Constructor function |
3 | ref (func ...) | Pre-allocation |
4 | ref (func ...) | Post-allocation |
5 | ref $Super | Super class descriptor |
Objects are allocated by a class'es constructor function with a 2-phase initialisation protocol:
-
The first phase evaluates constructor args and invokes the pre-alloc hook, which initialises parameter and
let
fields, after possibly invoking the pre-alloc hook of the super class to do the same; this returns the values of all immutable fields (including constructor parameters as hidden fields). -
With these values as initialisation values, the instance struct is allocated.
-
The second phase invokes the post-alloc hook, which evaluates expression declarations and
var
fields, after possibly invoking the post-alloc hook of the super class to do the same; this initialises var fields via mutation. -
Finally, the instance struct is returned.
Generics require boxed types and use eqref
as a universal representation for any value of variable type.
References returned from a generic function call are cast back to their concrete type when that type is known at the call site.
Moreover, arrays and tuples likewise represent all fields of boxed type with eqref
, i.e., they are treated like generic types as well. This is necessary to enable passing them to generic functions that abstract some of their types, for example:
func fst<X, Y>(p : (X, Y)) : X { p.0 };
let p = ("foo", "bar");
fst<Text, Text>(p);
If p
was not represented as ref (struct eqref eqref)
, then the call to fst
would produce invalid Wasm.
All type parameters to a function or class are reified as value parameters of type eqref
, representing a runtime description of the source-level type. Similarly, an object of generic class stores each of the class'es type parameter instantiations as hidden fields.
When invoking a generic function or constructor, a suitable representation is built dynamically for the type arguments (possibly using surrounding type parameters) and passed along.
The type representation is a fairly simple tree structure, where non-generic types are the leaves, which are a reference that is either an i31ref
representing a primitive type (or a box thereof), or a reference to a class'es dispatch table representing that class. Generic types or any other structural types are represented by an array whose first slot is the tag, again an i31ref
for built-in types or a reference to a class'es dispatch table, and the remaining slots are the representations of the type arguments.
Wob type | Runtime type |
---|---|
Bool | i31(1) |
Byte | i31(2) |
Int | i31(3) |
Float | i31(4) |
Text | i31(5) |
Object | i31(6) |
Bool$ | i31(-1) |
Byte$ | i31(-2) |
Int$ | i31(-3) |
Float$ | i31(-4) |
Text$ | [i31(7); i31(5)] |
(Float$,Text) | [i31(8); i31(-4); i31(5)] |
Object[] | [i31(9); i31(6)] |
Object[]! | [i31(10); i31(6)] |
C | $disp_C |
C<Int$,Text> | array [$disp_C; i31(-3); i31(5)] |
Checking type equivalence is a simple parallel tree recursion, short-cut by reference equality. The runtime does not currently perform type canonicalisation.
Type definitions (with the type
keyword) are implemented as functions returning a runtime type.
All runtime representations (including reified generic type parameters) are omitted in "parametric" compilation mode (command line flag -p
). In that mode, casts are forbidden.
Wob requires a tiny runtime system containing a handful of instrinsic operations, such as for Text
type and for comparing runtime types. It is imported by each output module under the name "wob-runtime".
And it comes in the form of a little Wasm module that is generated internally on the fly when using the interpreter.
When running outside the interpreter, the "wob-runtime"
module has to be provided externally. A suitable standalone runtime module can be generated with the compiler invocation
wob -g wob-runtime.wasm
Using option -n
when compiling sources, the compiler also allows producing headless Wasm modules that do not import a runtime. Instead they, include all necessary intrinisics in the produced module.
Each Wob module compiles into a Wasm module. The body of a Wob module is compiled into the Wasm start function.
All global definitions are automatically turned into exports, of their respective binding form (for simplicity, there currently is no access control). In addition, an exported global named "return"
is generated for the program block's result value.
Import declarations directly map to Wasm imports of the same name. No other imports are generated for a module.
A Wob program can be executed by linking its main module. Linking does not require any magic, it simply locates the runime (if used) and all imported modules (by appending .wasm
to the URL, which is interpreted as file path), recursively links them (using a simple registry to share instantiations), and maps exports to imports by name.
The batch compiler inserts a custom section named "wob-sig"
into the Wasm binary, which contains all static type information about the module. This is read back when compiling against an import of another module. (In principle, this information could also be stored in a separate file, since the Wasm code itself is not needed to compile an importing module.)
In the REPL, each input is compiled into a new module and linked immediately.
Furthermore, before compilation, each input is preprocessed by injecting imports from an implicit module named "*env*"
, which represents the REPL's environment. That makes previous interactive definitions available to the module.
Wob's linking model is as straightforward as it can get, and requires no language-specific support. It merely assumes that modules are loaded and instantiated in a bottom-up manner.
The simple template for runner glue code for node.js is provided in js/wob.js
and shown below. With that, invoking
node js/wob.js name
suffices to run a compiled module name.wasm
, provided the dependencies are avaliable in the right place. The Wob runtime system is assumed to be found (as wob-runtime.wasm
) in the current working directory (if not running headless), and imported modules (in compiled form) at their respective import paths, again relative to the current directory.
let fs = require('fs');
let registry = {__proto__: null};
async function link(name) {
if (! (name in registry)) {
let binary = fs.readFileSync(name + ".wasm");
let module = await WebAssembly.compile(new Uint8Array(binary));
for (let im of WebAssembly.Module.imports(module)) {
await link(im.module);
}
let instance = await WebAssembly.instantiate(module, registry);
registry[name] = instance.exports;
}
return registry[name];
}
async function run(name) {
let exports = await link(name);
return exports.return;
}
process.argv.splice(0, 2);
for (let file of process.argv) {
run(file);
}