Data section offsets #163

Open
ballercat opened this issue Oct 14, 2018 · 11 comments

Comments

@ballercat
Owner

Problem

There are currently two methods of encoding a Data Section entry into the binary. These are

  • const hello: i32 = 'Hello World!'; - static strings
  • const array: i32[] = [1, 2, 3, 4]; - static array/raw data regions

In both cases the data is encoded into the data section such that it'll take up the first available offset in memory.

The data sections in the WASM spec allow for an explicit offset. An example of this can be seen in the reference spec tests for data section here.
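For anyone who wants to see what an explicit offset looks like at the binary level, here is a minimal sketch in plain JavaScript. It hand-assembles a tiny module with one memory and one active data segment whose offset expression is `i32.const 1024`, then reads the bytes back. The helper names (`uleb`, `sleb`, `section`, `moduleWithData`) are made up for this example and are not Walt's encoder.

```javascript
// Unsigned LEB128, used for section sizes and vector lengths.
function uleb(n) {
  const out = [];
  do {
    let b = n & 0x7f;
    n >>>= 7;
    if (n !== 0) b |= 0x80;
    out.push(b);
  } while (n !== 0);
  return out;
}

// Signed LEB128, used for the i32.const immediate.
function sleb(n) {
  const out = [];
  while (true) {
    const b = n & 0x7f;
    n >>= 7;
    const done = (n === 0 && (b & 0x40) === 0) || (n === -1 && (b & 0x40) !== 0);
    out.push(done ? b : b | 0x80);
    if (done) return out;
  }
}

const section = (id, body) => [id, ...uleb(body.length), ...body];

function moduleWithData(offset, value) {
  const bytes = [...new TextEncoder().encode(value)];
  const name = [...new TextEncoder().encode('memory')];
  return new Uint8Array([
    0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version
    ...section(5, [0x01, 0x00, 0x01]), // one memory, minimum 1 page
    ...section(7, [0x01, ...uleb(name.length), ...name, 0x02, 0x00]), // export "memory"
    // one active segment for memory 0: (i32.const offset) end, then the bytes
    ...section(11, [0x01, 0x00, 0x41, ...sleb(offset), 0x0b, ...uleb(bytes.length), ...bytes]),
  ]);
}

const wasmModule = new WebAssembly.Module(moduleWithData(1024, 'Hello World!'));
const instance = new WebAssembly.Instance(wasmModule);
const text = new TextDecoder().decode(
  new Uint8Array(instance.exports.memory.buffer, 1024, 12)
);
console.log(text);
```

The only part that changes between "first available offset" and "explicit offset" is the constant inside the segment's offset expression.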

Goal

Define and implement a syntax for allowing an explicit offset to be defined when defining data sections in Walt.

Possible syntax:

A pseudo function call

const memory: Memory = { initial: 1 };
const string: i32 = memory.data(1024 /* offset */, 'Hello World!' /* value */);

This would require altering the grammar to allow top-level function calls, as well as some guards against top-level calls other than memory.data().
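To make the bookkeeping behind a call like memory.data(offset, value) concrete, here is a rough sketch in plain JavaScript of what the compiler would have to track. The `DataLayout` class and its `place`/`append` methods are hypothetical names for illustration, not anything in Walt today. Rejecting overlapping entries is a policy choice of this sketch; WebAssembly itself simply applies active segments in order.

```javascript
// Hypothetical compile-time bookkeeping for data entries (not Walt's actual code).
class DataLayout {
  constructor() {
    this.entries = []; // each entry: { offset, bytes }
  }

  // Explicit placement, as memory.data(offset, value) would request.
  place(offset, bytes) {
    const end = offset + bytes.length;
    for (const e of this.entries) {
      const eEnd = e.offset + e.bytes.length;
      if (offset < eEnd && e.offset < end) {
        throw new RangeError(`data at ${offset} overlaps entry at ${e.offset}`);
      }
    }
    this.entries.push({ offset, bytes });
    return offset;
  }

  // Implicit placement: take the first gap that is large enough.
  append(bytes) {
    let cursor = 0;
    const sorted = [...this.entries].sort((a, b) => a.offset - b.offset);
    for (const e of sorted) {
      if (cursor + bytes.length <= e.offset) break; // fits before this entry
      cursor = Math.max(cursor, e.offset + e.bytes.length);
    }
    return this.place(cursor, bytes);
  }
}

const layout = new DataLayout();
const hello = layout.place(1024, new TextEncoder().encode('Hello World!'));
const array = layout.append(new Uint8Array([1, 2, 3, 4]));
const next = layout.append(new Uint8Array([5, 6, 7, 8]));
console.log(hello, array, next);
```

Mixing the two modes is where it gets interesting: implicit entries have to flow around whatever was pinned explicitly.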

A static object

const string: i32 = { offset: 1024, value: 'Hello World!' };

This would work great for strings but not so well for static arrays. Also, having an object property(offset) not be part of the object is very odd.

???

Maybe there is an additional way to define this which would make sense, but it does seem like having the memory involved in some way makes the most sense. Especially since in the future it will be possible to have N > 1 memories in a single binary.

@Pyrolistical

How about just continuing the pointer-like concept?

const string: i32 = 1024;
string[] = 'Hello World!';

const array: i32[] = 2024;
array[] = [1, 2, 3];

@Pyrolistical

Pyrolistical commented Oct 15, 2018

Regarding the multiple memories, how about just a reference?

const string: i32 = &someMemory[1024];
string[] = 'Hello World!';

@elifarley

elifarley commented Oct 25, 2018

Here's a suggestion that involves changing the parser:

const hello1: i32 = 'Hello World!'@1024#1; // static string at position 1024 of memory # 1
const hello1b: i32 = 'Hello World!'@1024; // static string at position 1024 of memory # 1 as well
const array: i32[] = [1, 2, 3, 4]@4096#7; // static array/raw data region of memory # 7

Instead, maybe it's better to have offset info near the type declaration, like this:

const hello1: i32@1024#1;
hello1 = 'Hello World!'; // static string at position 1024 of memory # 1

const hello1b: i32@1024;
hello1b = 'Hello World!'; // static string at position 1024 of memory # 1 as well

const array: i32[]@4096#7;
array = [1, 2, 3, 4]; // static array/raw data region of memory # 7

@elifarley

And here's another suggestion...

const memory: Memory = Memory.allocate({ initial: 1, number: 1 });
const hello32: i32[] = memory.view({offset: 40}); // starts at position 40
const array64: i64[] = memory.view(); // starts at 0

hello32 = 'Hello!';

array64 = [1, 2, 3, 4];

@jtiscione
Contributor

jtiscione commented Nov 16, 2018

This project takes a low level approach, but for strings and arrays, I think this C-style practice of "we'll track the offset as an integer, you're responsible for the length" is just a tiny bit too low-level and is going to elicit a WTF reaction among the project's intended users, who aren't familiar with the C approach. Tracking lengths manually is a burden familiar to C, but alien to JS. There should be sugar for this, or JS programmers are going to reinvent zero-terminated strings.

Maybe there should be five special higher-level sugary types available for use: slice_i32, slice_i64, slice_f32, slice_f64, and string. (Or some other names like array_i32, vector_i32, list_i32, etc.) These could basically be implemented as a standardized set of structs, with first-class syntax support, which programmers would be encouraged to use instead of hacking up their own structs. The only difference between slice_i32 and string would be that string supports string literal syntax like i32 currently does; otherwise they could be implemented identically in WASM.

Each of these types would be a struct holding a pair of scalars: an i64 length and an i32[], i64[], f32[], or f64[] pointer. It doesn't need to bother enforcing an index range (what would it do anyway?) so a programmer should be able to ignore a length if he wants. But it should be made available, at least, whether it's mutable or not, or people will get upset.

So it would look like this:

// Offset is a byte offset, length is number of elements,
// in keeping with ArrayBuffer/TypedArray syntax
const memory: Memory = Memory.allocate({ initial: 1, number: 1});
const mySlice: slice_i32 = memory.view({offset: 1024, length: 10}); // elifarley's suggested syntax
const length: i64 = mySlice.length; 
log(length); // prints 10
const primitiveArray: i32[] = mySlice.offset; // explicit syntax
// OR
const primitiveArray: i32[] = mySlice; // coercive syntax?
// THEN... mySlice can support bracket-syntax?
mySlice[0] = 42;
primitiveArray[1] = 666;
log(primitiveArray[0]); // prints 42;
log(mySlice[1]); // prints 666;
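For comparison, the JS side already behaves this way with typed arrays. A small sketch in plain JavaScript, using an ArrayBuffer as a stand-in for one page of linear memory (the variable names mirror the Walt example above, but this is ordinary JS, not proposed Walt syntax):

```javascript
const memory = new ArrayBuffer(65536); // stand-in for one 64 KiB page

// Byte offset 1024, length of 10 elements: the "slice".
const mySlice = new Int32Array(memory, 1024, 10);

// Same starting address, no explicit length: closer to a bare i32[] pointer.
const primitiveArray = new Int32Array(memory, 1024);

mySlice[0] = 42;
primitiveArray[1] = 666;

console.log(mySlice.length);    // 10
console.log(primitiveArray[0]); // 42
console.log(mySlice[1]);        // 666
```

Both views alias the same bytes, which is the behaviour the coercive syntax would need to preserve.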

@Pyrolistical

I like it, but how about some sugar where both mean the same thing

const pointer: slice_i32 = memory.view({offset: 1024, length: 1});
const pointer: i32* = &memory[1024];

@jtiscione
Contributor

The first one is a pointer to a struct having a pointer and a length. It basically maps to this:

type Slice32Type = { 'array': i32[], 'length': i32 };  // "sugarless" type
const foo: Slice32Type = memory.view32({array: 1024, length: 1});

(Instead of memory.view(), two methods called memory.view32() and memory.view64() would indicate the element width.)
A sugarless type wouldn't be equipped to handle bracket notation, but the compiler could support it with a sugary type. So you could do foo[3] instead of having to say foo.array[3].

This is how Go does it. There's an array type in Go but they don't want you to pay attention to it. They want you to use their slice type everywhere instead. It's similar to ArrayBuffer and TypedArrays in JS.

In the second, there is no struct or length information, just a pointer. It would map to this, I think:

const pointer: i32[] = 1024;

The ampersand might be useful if it can be prepended to things other than memory. But dereference and address operators are not in JS.
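To illustrate the difference between the two, here is a sketch in plain JavaScript that models linear memory with a DataView. The header layout (pointer at +0, length at +4) and the helper names `makeSlice`, `sliceLength`, `sliceGet` and `sliceSet` are assumptions for the example, not a settled design.

```javascript
const memory = new DataView(new ArrayBuffer(65536));
const LE = true; // wasm linear memory is little-endian

// Write a { array, length } header at headerAddr and return its address.
function makeSlice(headerAddr, dataAddr, length) {
  memory.setInt32(headerAddr, dataAddr, LE);   // 'array' field
  memory.setInt32(headerAddr + 4, length, LE); // 'length' field
  return headerAddr;
}

const sliceLength = (slice) => memory.getInt32(slice + 4, LE);

// foo[i] on the sugary type: follow the pointer, then index.
const sliceGet = (slice, i) =>
  memory.getInt32(memory.getInt32(slice, LE) + i * 4, LE);
const sliceSet = (slice, i, value) =>
  memory.setInt32(memory.getInt32(slice, LE) + i * 4, value, LE);

const foo = makeSlice(16, 1024, 1); // header lives at 16, data at 1024
sliceSet(foo, 0, 7);

const pointer = 1024; // the bare-pointer version: just the address
console.log(sliceLength(foo), sliceGet(foo, 0), memory.getInt32(pointer, LE));
```

The slice costs one extra load per access compared with the bare pointer, in exchange for carrying its length around.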

@ballercat
Owner Author

I've been thinking about this a bunch, mostly about the fact that we would need a sugary type for strings/slices. The way things are shaping up, it seems like the existing syntax + types are not expressive enough.

I don't think I want to keep adding sugary types without allowing the user to do something similar (w/o compiler changes). Even though adding a new type to the compiler directly would be very trivial. I think (and this idea isn't fully baked yet) I'll have to pivot on the types a bit and expose operator overloading (among other things) to the user, so that things like string or slice could be implemented. Better primitives would also make the topic of the issue easier to implement IMO.

In some ways it would be better (less magic in the compiler), in some other ways it would be worse since the syntax would be even farther from JS/flow, at least when dealing with types. Then again, most of this stuff can be built-in/std-lib-ed so that most users don't need to worry about it.

For example indexing into a string could be done via better primitives like so

type String = ({ length: i32 }, i32); // can be used as an i32 or object with field .length
// syntax TBD
// " -> { " denotes a Block not a function, kind of important distinction
operator String[] = (target : String, index : i32) -> {
    // do target.length sanity check perhaps
    return i32.load(target + ((1 + index) << 2));
};

Current array logic would also be "implemented" in the same way, technically it already is, just inside the compiler not in user-land.
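The address arithmetic in that operator implies a layout with the length in the first i32 word and one i32 per character after it. A sketch of the same computation in plain JavaScript, with a DataView standing in for linear memory (`writeString` and `stringIndex` are hypothetical helper names, and the one-word-per-character layout is an assumption read off the `(1 + index) << 2` expression):

```javascript
const view = new DataView(new ArrayBuffer(65536));

// Store length in word 0, then one i32 per character.
function writeString(addr, text) {
  view.setInt32(addr, text.length, true);
  for (let i = 0; i < text.length; i++) {
    view.setInt32(addr + ((1 + i) << 2), text.charCodeAt(i), true);
  }
  return addr;
}

// Equivalent of the String[] operator, including the length sanity check.
function stringIndex(target, index) {
  const length = view.getInt32(target, true);
  if (index < 0 || index >= length) {
    throw new RangeError(`index ${index} out of range for length ${length}`);
  }
  return view.getInt32(target + ((1 + index) << 2), true);
}

const foo = writeString(1024, 'bar');
console.log(stringIndex(foo, 2)); // 114, the char code of 'r'
```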

Along these lines, the goal at the end of the day is to allow the user to use wasm however they like. And as much as I'd like to avoid creating a new language/type system, there seems to be no way around it without limiting usability.

I'll likely open up a new discussion on the topic as it's a bigger issue than "how to set data sections".

@Pyrolistical

Pyrolistical commented Nov 21, 2018

As you can see from my comment history, I never really cared about the "seems like js" feature. I think C is close enough, and this is why I proposed C-like solutions. We have a C-like memory management problem here!

As for your operator/block idea, you can simplify the syntax by not implementing an operator keyword, but predefining operators as syntax sugar that maps to a named function.

For example,

function array_accessor(target : String, index : i32) {
    // do target.length sanity check perhaps
    return i32.load(target + ((1 + index) << 2));
}
const foo: String = 'bar';
// foo[2] is just sugar for array_accessor(foo, 2)

But this would imply you need to implement function overloading or generics: function array_accessor<T>(target: T, index: i32) { yikes
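One way around full overloading might be to resolve the accessor statically from the operand's declared type and mangle the name. A rough sketch in plain JavaScript of what such a rewrite step could look like; the `accessorNames` table, the mangled names, and `desugarIndex` are all hypothetical and only show the idea.

```javascript
// Hypothetical table: static type name -> mangled accessor function name.
const accessorNames = new Map([
  ['String', 'array_accessor_String'],
  ['slice_i32', 'array_accessor_slice_i32'],
]);

// Rewrite `target[index]` into a plain call, given the target's static type.
function desugarIndex(typeName, target, index) {
  const fn = accessorNames.get(typeName);
  if (fn === undefined) {
    throw new TypeError(`no array accessor defined for type ${typeName}`);
  }
  return `${fn}(${target}, ${index})`;
}

const rewritten = desugarIndex('String', 'foo', '2');
console.log(rewritten); // array_accessor_String(foo, 2)
```

Since every variable in Walt has a declared type, the lookup can happen entirely at compile time, with no runtime dispatch.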

@jtiscione
Contributor

Hacky idea: preprocessor directives at the top that include type definitions:

#import 'stdlib.walt';
#import 'memory-manager.walt';

Or you could go really nuts with a module system for the compiler that makes it like Babel:

const walt = require('walt-compiler');
const unsigned = require('walt-unsigned-types');
const pointerSyntax = require('walt-pointer-syntax');
walt.compile(`
      export function test(n: u32): void {
        const array: u32[] = &n;
        n[0] = x * y;
      }
`, [unsigned, pointerSyntax]);

@ballercat
Owner Author

Yup, the compiler already supports extensions. All of the current features are written as internal (enabled by default) language extensions and grammar. There is a reference implementation of closures as a plugin to demonstrate how a complex extension could be made and injected into a compiler.

I'll probably make a package (similar to Babel presets) for all experimental-* features for these types of changes.


4 participants