Create Wasm globals from C code? #12793

Open
juj opened this issue Nov 17, 2020 · 15 comments

Comments

@juj
Collaborator

juj commented Nov 17, 2020

I would like to define WebAssembly globals in C code, something like

int __attribute__((wasm_global)) myGlobal = 5;

that would then emit a

(global $myGlobal i32 (i32.const 5))

into the build. Setting and reading the variable would then emit global.get and global.set, and taking an address &myGlobal would then be forbidden at compile time. Is there a way to do that today, or would it be possible to add?

@sbc100
Collaborator

sbc100 commented Nov 17, 2020

Currently the only way to do that is to use .S or .s files.

See system/lib/compiler-rt/stack_ops.S for an example.

There are plans to expose first-class WebAssembly concepts to LLVM's higher levels and eventually perhaps to C/C++ as well. I believe current efforts are focused on reference types, but it seems like the techniques could equally well be applied to other first-class WebAssembly elements such as globals.

Here is some of the work being done to enable ref types: https://reviews.llvm.org/D91428

@tlively is there a short route we could use to expose wasm globals?

Anyway, hopefully the .S approach will serve well enough for the time being.

@juj
Collaborator Author

juj commented Nov 17, 2020

> Anyway, hopefully the .S approach will serve well enough for the time being.

Unfortunately that will not let me do what I want: I really need support for defining globals directly in C code. The intent is twofold: 1) to allow adding globals so that even if I wipe the heap clear, the globals still preserve the state, and 2) to enable creating "zero extra disk space overhead" TLS items without needing to implement TLS array slots etc.

For solving 1) .S files would be enough, but for 2), .S files would be clumsy.

It'd be amazing if we were to add support for this! :)

@tlively
Member

tlively commented Nov 17, 2020

Yes, this is on the roadmap for Igalia's reference types work in LLVM. Specifically, it should be implemented by their fifth milestone. I don't see a shorter route to implementing this, but they are making good progress so it shouldn't be too long.

@juj
Collaborator Author

juj commented Nov 17, 2020

Just to be clear: after that work, if one is just creating globals but not using reference types for anything, one would not need to run in a Wasm VM that supports Wasm reference types?

@tlively
Member

tlively commented Nov 17, 2020

Yes, that's correct :) This work forms the foundation for being able to declare WebAssembly tables, memories, and globals from C/C++ and it just so happens that their particular motivating use case needs tables (and reference types) specifically.

@juj juj mentioned this issue Nov 19, 2020
@juj
Collaborator Author

juj commented Sep 6, 2021

I wonder if there have been any updates/progress on the LLVM side on this? It has been about 10 months since the last check-in.

@sbc100
Collaborator

sbc100 commented Sep 7, 2021

@wingo and @pmatos have been landing llvm changes related to this work. I don't think anything has landed on the clang/C++ side yet. Perhaps they can give a more precise update and/or provide some way for you to follow their progress?

@wingo
Contributor

wingo commented Sep 8, 2021

Some context: our goal is reference types in C/C++. Storage locations for reference types (C globals and locals) can't be in linear memory, so we enhanced the WebAssembly backend for LLVM IR to allow a designated non-default address space (AS 1) to indicate allocations in a WebAssembly-managed storage (globals and locals, in practice). This is how you can tell LLVM that a given allocation must be in a global or a local in LLVM -- they are allocations (global variables or allocas) in AS 1. (Locals can also be allocated as part of the backend lowering process, for some SSA variables, but that's after LLVM IR.)

On the front-end, the initial idea was to have an address space qualifier to indicate a definition that should be allocated to a global -- similar in style to OpenCL's __global, __local etc. qualifiers. You need some front-end support for these values, because they carry restrictions: e.g. if you declare a global variable as being a wasm_global, you can't take its address. The front-end needs to ensure that it produces IR that we can handle, and that the user gets a sensible error otherwise.

However, upstream clang saw such a generic feature as being too invasive -- see e.g. https://reviews.llvm.org/D108464#2959591. And they're right in a way -- there is only a weak argument for being able to declare globals as being wasm globals, and no argument at all for function locals. Rather, what you want is an attribute on a type, indicating that values of this type should be allocated to globals; and not all types would have this attribute. That way we hit the reference types use case -- the only real use case -- and we punt on the generic feature.

So, I am refactoring my patch set. The last couple of weeks have had a lot of admin on my plate but I hope to have externref/funcref wasm globals in C within 3-4 weeks.

Questions very welcome, @juj and others :)

@wingo
Contributor

wingo commented Sep 8, 2021

i should mention that if there's a strong argument for e.g. __wasm_global int x;, we can make that happen. i just don't know what the argument is for it yet

@tlively
Member

tlively commented Sep 8, 2021

Besides @juj's reasons above, being able to declare globals would be useful because they can be exported or imported as part of a Module's interface. For modules that are meant to be easily used without large amounts of JS glue, being able to use globals is a much better UX and more direct than having to write values into memory.

@wingo
Contributor

wingo commented Sep 9, 2021

Wouldn't the UX concerns be satisfyingly fulfilled by exposing getter and (possibly) setter functions?
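A minimal sketch of the getter/setter approach raised above, assuming clang's `export_name` attribute when targeting WebAssembly. The names `my_state`, `get_my_state`, `set_my_state`, and the `WASM_EXPORT` macro are hypothetical and chosen only for illustration.

```c
/* Hypothetical helper macro: apply clang's export_name attribute only when
   compiling for WebAssembly, so the same file also builds natively. */
#if defined(__wasm__)
#define WASM_EXPORT(name) __attribute__((export_name(name)))
#else
#define WASM_EXPORT(name)
#endif

/* Ordinary C variable: it lives in linear memory, not in a Wasm global. */
static int my_state = 5;

/* Exported accessor that the embedder can call to read the value. */
WASM_EXPORT("get_my_state")
int get_my_state(void) {
    return my_state;
}

/* Exported accessor that the embedder can call to update the value. */
WASM_EXPORT("set_my_state")
void set_my_state(int value) {
    my_state = value;
}
```

Note that the value is still stored in linear memory, so this covers the module-interface use case but not the requirement that state survive a cleared heap.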

@juj
Collaborator Author

juj commented Sep 9, 2021

Thanks @wingo,

the main carrot for us would be the ability to implement TLS variables in multithreaded builds much more efficiently and conveniently. Currently, an access to a C

_Thread_local int i;

will generate an inconvenient access

  (i32.store16 offset=30 align=1
   (local.get $0)
   (i32.const 105)
  )
  (i32.store offset=16
   (local.get $0)
   (i32.load
    (local.tee $1
     (global.get $__tls_base)
    )
   )
  )

which could be simplified to a single global.get with a __wasm_global modifier.

Currently, managing the JS object references in dictionaries outside the Wasm module has not caused many problems, though it would be interesting to be able to use externrefs in the future, if paired with the ability to make direct calls to Web APIs from Wasm.

I suppose in general case with wasm reference types, all the wasm reference type globals would be JS GC roots? I wonder how the GC would work with the heap embedding example:

struct { int x; externref_t y; } z __attribute__((wasm_var));

How would JS track liveness of the object held by y? Also, I suppose such structs would need to be thread local? Or how would that work in multithreaded/SAB environment?

@wingo
Contributor

wingo commented Sep 9, 2021

@juj the issue is the semantic restrictions, afaics -- what would happen if user code took the address of i, would that be allowed? I would assume so. In that case you couldn't say that "all thread-locals are wasm globals", because you can't take the address of a wasm global. I haven't looked deeply into this but what I think you need is a pass that turns thread-local globals whose address is never taken into WebAssembly globals -- but at the IR level, not the clang level. A kind of special-purpose SROA, which would be an optimization without impact on language semantics. Of course I could be misunderstanding!
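To illustrate the distinction such a pass would have to draw, here is a small sketch in plain C11. The variable and function names (`hits`, `scratch`, `bump_hits`, `scratch_ptr`) are hypothetical, and the comments describe what a pass of this kind could do in principle, not what any existing compiler does.

```c
/* Thread-local whose address is never taken anywhere in the program:
   in principle a candidate for being lowered to a Wasm global. */
_Thread_local int hits;

/* Thread-local whose address escapes through scratch_ptr() below:
   it needs a byte address, so it has to stay in linear memory. */
_Thread_local int scratch;

/* Only reads and writes hits by value. */
int bump_hits(void) {
    return ++hits;
}

/* Takes the address of scratch, which rules out a Wasm global for it. */
int *scratch_ptr(void) {
    return &scratch;
}
```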

For the struct-containing-an-externref example, here it is clear that z cannot be allocated to linear memory (from a semantics POV) because y has no byte representation. Instead I would expect that typeof(z) would have an annotation indicating that it is a reference type (the new approach suggested by rjmccall, instead of address space attributes). But how to represent z? Of course if it is never passed by reference, its scalar members could be allocated to individual globals. In that way you solve the rooting problem. Or if the GC MVP lands, you could declare a struct type, and have one global for the struct type -- that would allow you to pass z by reference (but with what semantics?). I think those are the possibilities on the low-level side. So you have some restrictions that the backend imposes on the frontend semantics, and we will need to propagate them to the user in some kind of sensible way. Anyway, aggregate reference types are on my early 2022 list :) Your thoughts are very welcome! For a longer-range vision, here's a possible sketch from a little while back.

@juj
Collaborator Author

juj commented Sep 9, 2021

> what would happen if user code took the address of i, would that be allowed?

No, that does not need to be allowed.

Note that our use case is not about implementing all TLS variables this way, and not even the thread_local keyword in this way, but just the ability to create wasm_global variables (that one cannot take an address of) that would behave as TLSes.

> I think you need is a pass that turns thread-local globals whose address is never taken into WebAssembly globals

In general, I don't think this kind of machinery is needed (for our intended use case). We would be able to annotate our interesting TLS variables manually/explicitly, and only those would ever need to become TLS globals.

@tlively
Member

tlively commented Sep 9, 2021

For that level of fine-grained control, perhaps it would be appropriate to use assembly files, as @sbc100 suggested in an earlier comment. I don't think we would be able to convince clang to accept a patch for a language extension motivated by a small codegen improvement like that. That being said, it would be great to see an optimization pass to turn non-address-taken thread-local variables into globals like @wingo mentioned, either at the IR level or in the WebAssembly backend. With such an optimization pass, you wouldn't need to do anything to get the codegen you want.
