Remove unnecessary host functions allocator usage #4
base: main
Conversation
## Future Possibilities
The existence of the host-side allocator has become questionable over time. It is implemented in a very naive way, and for determinism and backwards compatibility reasons it needs to be implemented exactly identically in every client implementation. Furthermore, runtimes make substantial use of heap memory allocations, and each allocation needs to go twice through the runtime <-> host boundary (once for allocating and once for freeing). Moving the allocator to the runtime side, while it would increase the size of the runtime, would be a good idea. But before the host-side allocator can be deprecated, all the host functions that make use of it need to be updated to not use it.
How do you move the allocator to the runtime side?
Do you just give it a big block of memory and let it split it up as desired?
I don't really see the problem? The runtime could just do whatever it wants with its memory.

If the question is how to make sure that the memory usage is bounded, my personal opinion would be that at initialization the runtime does `storage_get(":heappages")` and puts the value in its memory allocator, and the allocator crashes if it reaches the limit.

An alternative way is to rely on the fact that the wasm memory needs to be grown by the wasm code with a `memory.grow` opcode, so the host can detect when memory usage is over a certain threshold.
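To make that first idea concrete, here is a minimal sketch, assuming `:heappages` holds a little-endian `u64` page count (consistent with SCALE's fixed-width encoding); `storage_get` is a stub, and the bump allocator is illustrative, not Substrate's actual allocator:

```rust
// Sketch only: `storage_get` stands in for the real storage host function.
fn storage_get(_key: &[u8]) -> Option<Vec<u8>> {
    None // a real runtime would call into the host here
}

/// Read `:heappages` at initialization and turn it into a byte limit
/// (one wasm page is 64 KiB), falling back to a default when unset.
fn heap_limit_bytes() -> usize {
    const WASM_PAGE_SIZE: usize = 64 * 1024;
    const DEFAULT_PAGES: u64 = 2048;
    let pages = storage_get(b":heappages")
        .and_then(|raw| <[u8; 8]>::try_from(raw).ok())
        .map(u64::from_le_bytes)
        .unwrap_or(DEFAULT_PAGES);
    pages as usize * WASM_PAGE_SIZE
}

/// Trivial bump allocator that crashes deterministically at the limit.
struct BumpAlloc {
    next: usize,
    limit: usize,
}

impl BumpAlloc {
    fn alloc(&mut self, size: usize) -> usize {
        assert!(self.limit - self.next >= size, "heap limit reached");
        let offset = self.next; // offset into the runtime's linear memory
        self.next += size;
        offset
    }
}
```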
Not an issue, I'm just not familiar enough with Wasm and was curious about how this would be done in practice
Moving the allocator back into the runtime would make the runtime more similar to a normal program (as normal programs manage their memory themselves, with the exception of "the sbrk mechanism", which is mimicked by `memory.grow` in wasm).

The fact that the allocator is outside of the runtime is what requires additional code/hacks. So the answer on how to move the allocator back to the runtime is simply "remove the hacks".
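For readers unfamiliar with wasm, a sketch of that `sbrk`-like primitive as seen from Rust compiled to wasm32, using the `core::arch::wasm32::memory_grow` intrinsic; the surrounding allocator design is omitted and is an assumption, not how any particular runtime does it:

```rust
/// wasm32-only: ask for `delta_pages` more pages of linear memory, the way
/// `sbrk` would in a native program.
#[cfg(target_arch = "wasm32")]
fn grow_heap(delta_pages: usize) -> Option<usize> {
    // `memory_grow` returns the previous memory size in 64 KiB pages,
    // or `usize::MAX` if the growth was refused.
    let prev_pages = core::arch::wasm32::memory_grow::<0>(delta_pages);
    if prev_pages == usize::MAX {
        None // a refusal here is also how a host could bound memory usage
    } else {
        Some(prev_pages * 64 * 1024) // byte offset where the new region starts
    }
}
```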
Is my understanding of the RFC correct that it proposes to introduce the new functions but not to remove the old ones?
This seems valid, but after being accepted/implemented, the old versions should be deprecated and then removed entirely (after some significant period of time, to allow backwards compatibility).
Yes!
They should be deprecated, yes, but removing them would mean that you could no longer sync older blocks.
This is definitely a step in the right direction. I also agree that the C API approach makes sense under the current conditions. I am a bit worried, though, that this proposal doesn't provide any numbers. I am thinking that in the future we might need to consider an API design overhaul as if it were actually 2023. But this is outside of the scope of this proposal. Just for linking purposes, here is a relevant discussion on Substrate: paritytech/substrate#11883
Is there an obstacle to host functions actually doing borrowing, aka just passing pointers? We need not even copy memory in, say: …

Also: a hash interface like … We could have wasm types designed for usage directly in the host, like demanding u64 or SIMD alignment, and then impl the appropriate … You'd then cargo-patch in sp-sha3 over sha3, and then you'd have a faster hashing interface. It's more work to do this, however.
I have opened this RFC with the desire of making things move. I would be more than happy if someone else took over this effort and provided numbers. Personally it seems obvious to me that "allocating memory, writing a hash to it, and freeing said memory" is slower than "writing a hash to memory". And if it turns out that in practice it wasn't slower, I would search for the problem elsewhere.
This is very precisely what this RFC fixes.
```wat
(func $ext_hashing_twox_128_version_2
    (param $data i64) (param $out i32))
(func $ext_hashing_twox_256_version_2
    (param $data i64) (param $out i32))
```
Are the twox hashes still being used anywhere? I'd thought we deprecated them since they're footguns.
Yes, they are still used for hashing the keys of storage values that have static keys.
They are used in order to know where to store stuff. For example, the pallet `Foo` stores stuff under the key `xxhash64("Foo")`.

While this is out of scope of this RFC, this is also very inefficient (literally every single storage access calculates the xxhashes of the pallet and the item) and should be moved to compile time.
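For reference, a sketch of that key derivation, assuming the `sp_core::hashing::twox_128` helper and the `twox_128(pallet) ++ twox_128(item)` layout that FRAME conventionally uses for storage values; the wrapper itself is illustrative:

```rust
use sp_core::hashing::twox_128;

/// Key of the storage value `item` in pallet `pallet`:
/// twox_128(pallet_name) ++ twox_128(storage_item_name).
fn storage_value_key(pallet: &str, item: &str) -> Vec<u8> {
    let mut key = Vec::with_capacity(32);
    key.extend_from_slice(&twox_128(pallet.as_bytes()));
    key.extend_from_slice(&twox_128(item.as_bytes()));
    key
}
```

Both hashes in, e.g., `storage_value_key("Foo", "Something")` have hardcoded inputs, which is why they could in principle be computed at compile time.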
We deprecated twox entirely though, no? twox_64 is xxhash64. twox_128 is simply two copies of xxhash64: https://github.com/paritytech/polkadot-sdk/blob/cfa19c37e60f094a2954ffa73b1da261c050f61f/substrate/primitives/core/hashing/src/lib.rs#L77

It's therefore easy to find collisions for both twox flavors, which yields two storage locations which your code believes to be different, but which actually wind up being the same. It technically facilitates supply-chain attacks against parachains, even if used only at compile time.
Don't quote me on that, but it is likely that nowadays the only input to the twox hash functions are strings hardcoded in the source code (pallet names and storage key names).
> Don't quote me on that, but it is likely that nowadays the only input to the twox hash functions are strings hardcoded in the source code (pallet names and storage key names).

Yes, this. I mean, it is up to the user, but for maps the default is blake2.
> should be moved to compile time.

This is also already done.
We'll avoid any pallet dependencies with bizarre hard-coded storage names then, I guess. lol
> This is also already done.

We do not need host calls for the twox hashes then, I guess. Just for the case where a fixed-length buffer is being hashed.

Anyways, if we typically hash fixed-length buffers, then maybe this variable-length buffer allocation is best to just keep, as the above interface demands quite a few more calls.
I see, we need roughly this for backwards compatibility anyways.
```wat
    (param $key_type_id i32) (param $seed i64) (param $out i32) (result i32))
```

The behaviour of these functions is identical to their version 1 or version 2 equivalents. Instead of allocating a buffer, writing the output to it, and returning a pointer to it, the new version of these functions accepts an `out` parameter containing the memory location where the host writes the output. The output is always of a size known at compilation time.
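To make the calling convention concrete, a guest-side sketch, assuming the usual Polkadot host-API packing of pointer+length values (pointer in the lower 32 bits, length in the upper 32 bits); the wrapper name is made up:

```rust
#[link(wasm_import_module = "env")]
extern "C" {
    /// Proposed version-2 signature: `data` is a packed pointer+length,
    /// `out` points at a 16-byte buffer owned by the runtime.
    fn ext_hashing_twox_128_version_2(data: i64, out: i32);
}

/// Hash `data` without any host-side allocation: the output buffer
/// lives on the guest's own stack.
fn twox_128_v2(data: &[u8]) -> [u8; 16] {
    let mut out = [0u8; 16];
    // Lower 32 bits: pointer; upper 32 bits: length.
    let packed = (data.as_ptr() as u32 as u64) | ((data.len() as u64) << 32);
    unsafe {
        ext_hashing_twox_128_version_2(packed as i64, out.as_mut_ptr() as u32 as i32);
    }
    out
}
```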
The expected behavior (trap?) for an out-of-bounds memory access ought to be specified.
You're actually (I guess semi-unintentionally) raising a good point.

If we take for example `ext_crypto_ed25519_sign_version_2`: on success the function returns 0 and writes to `out`, and on failure the function returns 1 and doesn't write anything. Should a trap happen if `out` is out of range even in the situation where 1 would be returned? The answer is important for determinism.

I've added some text mentioning that yes, the range of `out` must always be checked. This is a bit arbitrary, but I think that passing a completely invalid input has "higher priority" than a failure in what is being demanded from the function.
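A host-side sketch of that rule, with `Memory`, `Trap`, and `sign_with_keystore` as illustrative stand-ins rather than real client types: the bounds check happens before the fallible operation, so the trap/no-trap outcome is deterministic across clients.

```rust
struct Memory {
    data: Vec<u8>, // the guest's linear memory
}

impl Memory {
    fn in_bounds(&self, ptr: u32, len: usize) -> bool {
        (ptr as usize)
            .checked_add(len)
            .map_or(false, |end| end <= self.data.len())
    }
    fn write(&mut self, ptr: u32, bytes: &[u8]) {
        let start = ptr as usize;
        self.data[start..start + bytes.len()].copy_from_slice(bytes);
    }
}

enum Trap {
    OutOfBounds,
}

fn sign_with_keystore(_msg: &[u8]) -> Option<[u8; 64]> {
    None // stand-in: real signing would look the key up in the keystore
}

/// Hypothetical shape of `ext_crypto_ed25519_sign_version_2` on the host.
fn ed25519_sign_v2(mem: &mut Memory, msg: &[u8], out: u32) -> Result<i32, Trap> {
    const SIG_LEN: usize = 64;
    // Always validate `out` first, even if signing would fail anyway.
    if !mem.in_bounds(out, SIG_LEN) {
        return Err(Trap::OutOfBounds);
    }
    match sign_with_keystore(msg) {
        Some(sig) => {
            mem.write(out, &sig);
            Ok(0) // success: signature written to `out`
        }
        None => Ok(1), // failure: nothing written, but bounds were still checked
    }
}
```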
Hi, I want to know whether there is any plan to move the allocator inside the wasm runtime.
Same remark as #8 (comment): I would like to propose this RFC, but I don't know how to proceed.
To move forward with this pull request, I would like to see some basic benchmarks being done. @yjhmelody could you please help with this?
I would also like to see this directly move the entire allocator to the runtime. Not sure why we should only do this halfway.
I can expand the RFC to cover all the host functions. When I opened this RFC (two and a half months ago), I actually expected a swift and unanimous approval. But if RFCs inherently need a long time, I might as well make it as big as possible.
+1 for going all the way and moving the whole allocator inside of the runtime, so that for new blocks the allocator host functions are never called anymore.
Yeah, sorry about that, but it should actually also not be "swift". Let's do it properly.
I'm currently updating the RFC. When it comes to …: the only place where this function is used is in the …. For the sake of not breaking things, I'm going to add a …
I've updated the RFC. As I've mentioned in the "unresolved questions" section, I expect the offchain worker functions to be slightly slower than right now, but this is IMO negligible since they are offchain. To me, the biggest potential source of performance degradation comes from the deprecation of …. The changes to …
Is something in progress? While I agree that RFCs shouldn't necessarily be swift, if nothing is done then the ETA of this RFC is "infinity", and it would be great if it were finite. The more we wait, the more other changes (such as #31) will conflict with this RFC. (And the reason why I originally submitted a stripped-out version is that I anticipated this happening.)
```wat
    (param $key_type_id i32) (param $seed i64) (param $out i32) (result i32))
```

The behaviour of these functions is identical to their version 1 or version 2 counterparts. Instead of allocating a buffer, writing the output to it, and returning a pointer to it, the new version of these functions accepts an `out` parameter containing the memory location where the host writes the output. The output is always of a size known at compilation time. The runtime execution stops with an error if `out` is outside of the range of the memory of the virtual machine.
How does one actually implement a host function that borrows like this?
This probably requires some changes to the Substrate procedural macro stuff (which I'm not familiar with).
This is why I mention in the "drawbacks" section that it might be difficult to implement, but I have no idea how much.
> This probably requires some changes to the Substrate procedural macro stuff (which I'm not familiar with). This is why I mention in the "drawbacks" section that it might be difficult to implement, but I have no idea how much.

I don't remember from the top of my head whether the macro machinery supports this right now, but yeah, we can do this; not a problem.
The macros have supported borrowing since the beginning; you can pass `&mut [u8]` quite easily.

Agreed, our unstable host calls for elliptic curves pass …
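For the curious, a minimal sketch of that borrowing pattern with the `#[runtime_interface]` macro from `sp-runtime-interface`; the trait and function names here are hypothetical, and only the `&mut [u8]` parameter is the point:

```rust
use sp_runtime_interface::runtime_interface;

/// Hypothetical interface: the macro passes `out` as a borrow into wasm
/// memory, so the host writes the digest in place, with no allocator involved.
#[runtime_interface]
pub trait HashingV2 {
    fn twox_128_into(data: &[u8], out: &mut [u8]) {
        let hash = sp_core::hashing::twox_128(data);
        out[..16].copy_from_slice(&hash);
    }
}
```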
Sorry, I hadn't seen this message before. You can see that when various Ethereum ecosystems use wasm or other virtual machines, they all use a runtime-side allocator instead of a host-side one. This is very convenient in verification scenarios, because verification is then a pure function, which helps support more … ecosystem scenarios; it is a pity that Substrate has not supported this. Sorry again, I don't have the energy to push related improvements forward at the moment.
Sorry for the long delay. I left some nitpicks, generally looking good and we could continue with this RFC from my side.
```wat
(func $ext_input_size_version_1
    (result i64))
(func $ext_input_read_version_1
    (param $offset i64) (param $out i64))
```
Why not just have the `read` function and follow the way the other `read` functions are implemented?
I think that doing it this way (first getting the size, then getting the content) is way more clear and simple, and more importantly avoids the risk of allocating a buffer too small and having to re-allocate it again later and copy or throw away all the data that was already read.

When reading the storage, knowing the size of the storage value would require a database access, so if reading a storage value required two function calls we would need to access the database twice, which seems not great.

When reading the input, I assume that `ext_input_size_version_1` just returns something like `self.input.len()`, so there's no problem. Also, you typically read the input only once.
I mean, you could pass `out` with a zero length to only get the length of the value.
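A guest-side sketch of the two-call pattern under discussion, following the proposed signatures and the usual pointer+length packing for `out`; the wrapper itself is illustrative:

```rust
#[link(wasm_import_module = "env")]
extern "C" {
    fn ext_input_size_version_1() -> i64;
    /// `out` is a packed pointer+length; the host copies that many bytes
    /// of the input, starting at `offset`, into the guest buffer.
    fn ext_input_read_version_1(offset: i64, out: i64);
}

/// Read the whole runtime-call input: one call for the size, one
/// allocation of exactly the right size, one call for the bytes.
fn read_input() -> Vec<u8> {
    unsafe {
        let len = ext_input_size_version_1() as usize;
        let mut buf = vec![0u8; len];
        let packed = (buf.as_mut_ptr() as u32 as u64) | ((len as u64) << 32);
        ext_input_read_version_1(0, packed as i64);
        buf
    }
}
```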
- `ext_input_size_version_1`/`ext_input_read_version_1` is inherently slower than obtaining a buffer with the entire data, due to the two extra function calls and the extra copying. However, given that this only happens once per runtime call, the cost is expected to be negligible.

- The `ext_crypto_*_public_keys`, `ext_offchain_network_state`, and `ext_offchain_http_*` host functions are likely slightly slower than their deprecated counterparts, but given that they are used only in offchain workers, this is acceptable.
`ext_offchain_network_state` doesn't exist anymore?
I mention it here: #4 (comment). I think that this function should be deprecated entirely.
Fine by me. Maybe it should be mentioned a little more obviously in the RFC.
The runtime execution stops with an error if `value_out` is outside of the range of the memory of the virtual machine, even if the size of the buffer is 0 or if the amount of data to write would be 0 bytes.

```wat
(func $ext_offchain_network_peer_id_version_1
```
There is no way to read the external addresses? (Not really sure we still need them)
I also mention it here: #4 (comment), and I think it should be deprecated. It's completely unclear to me how to define "an external address".
I've run a quick small experiment and tried to extract the allocator out of smoldot (in other words, to do to smoldot the opposite of what this RFC wants to do to the runtime), and then measured the size of smoldot compiled to Wasm. With the allocator outside of smoldot: 2,789.9 kB.

So basically the size of the code of the allocator is <5 kiB, and that's uncompressed. According to this page, compression seems to reduce the size by ~75%, so we're closer to 1 kiB to 1.5 kiB of overhead if we moved the allocator to the runtime.
Sorry if this isn't the right place for this, but I would like to share our insights on the topic: I'm part of a team developing runtimes in Go, and we've had to modify the compiler to ensure compatibility with the external allocator. We've implemented a custom garbage collector that integrates seamlessly with this external allocator, but this is one of the reasons we're not using the standard Go compiler (another is related to the Wasm target). +1 for this change to happen.
- The following function signature is now also accepted for runtime entry points: `(func (result i64))`.
- Runtimes no longer need to expose a constant named `__heap_base`.

All the host functions that are being superseded by new host functions are now considered deprecated and should no longer be used.
What exactly does "deprecated" mean here? Do you mean increasing the version number? If not, wouldn't an existing chain still need these host functions if it starts synchronizing from an old block? So it seems like we always need to ensure these functions are available?
It means what is written: they should no longer be used. Runtimes are what use host functions. Therefore, runtimes should no longer use these host functions.

> Do you mean increasing the version number?

This text simply says that if the RFC adds `foo_v2`, then `foo_v1` should no longer be used. You can see it as increasing the version number, yes, but given that there's no semver involved, it's better to think of `foo_v1` and `foo_v2` as two completely unrelated functions.
- It is unclear how replacing `ext_storage_get` with `ext_storage_read` and `ext_default_child_storage_get` with `ext_default_child_storage_read` will impact performance.

- It is unclear how the changes to `ext_storage_next_key` and `ext_default_child_storage_next_key` will impact performance.
For the variable-length host functions, is it really necessary to revise them? In fact, the host always needs an allocator to support copying the entrypoint parameters. So if the new versions cannot improve performance, they will only increase the complexity of use.
Can you define "variable-length host function" and "entrypoint parameter"?
If you're talking about the input to the runtime call, it's also tackled in this RFC.
It's just what you are talking about: the input to the runtime call.

The current specification of the input to the runtime call needs the host to call an alloc function (defined on the host side or the runtime side).
```wat
(func $ext_storage_read_version_2
    (param $key i64) (param $value_out i64) (param $offset i32) (result i64))
```
`result` could be an `i32` here; even with an `i32`, a return value of `-1` is not ambiguous, as you'll physically never be able to actually allocate a buffer of such a size.

So, let me see if I got this right. Is the idea here that:

- the guest prepares a buffer of `n` bytes,
- it calls this host function and passes a pointer to this buffer and its size,
- the host reads the storage and writes `m` bytes into the buffer, where `m <= n`, and returns `m`,
- if `m == n` (so either the buffer was exactly the right size or we still have some data left) then the guest enlarges the buffer and calls the host function again with a bigger offset, and repeats this until the whole value is read (and if the guest guesses wrong then this could be called many times),

correct?

If so, maybe instead we could do this: return an `i64` which would be a packed `(i32, i32)` (or an `i64` with its upper bit reserved) where the first value is whether the read succeeded and the second value is the actual size of the storage value, and have the algorithm go like this:

- the guest prepares a buffer of `n` bytes,
- it calls this host function and passes a pointer to this buffer and its size,
- let `m` be the size of the storage value; if `n >= m` then write it to the buffer and return `(1, m)`,
- if `n < m` then return `(0, m)`,
- on the guest side, if a `0` was returned then enlarge the buffer to the exact size needed and call the function again (which is guaranteed to succeed now); otherwise we're done.

And also do not require that "if the pointer to the buffer is invalid then runtime execution stops even if length is 0".

This way we gain the following benefits:

- if the buffer is exactly the right size we will only call this function once (as opposed to twice with the current proposal),
- if the buffer is not big enough then we will only call this function twice (as opposed to twice or more with the current proposal),
- there's only going to be one host-guest memcpy (because if the buffer is not big enough then nothing will be copied), which is going to be faster/more efficient,
- the guest can choose whether it wants to try its luck guessing the initial size of the buffer (risk having to reallocate the buffer in the worst case, but read the value in only one call in the best case), or it can call the host function with a size of "0" to ask for the right size and prepare a buffer of exactly the right size (always read the value in two calls, but won't ever need to reallocate the buffer).
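A sketch of the suggested encoding and the resulting at-most-two-calls guest loop; which half of the `i64` holds which field is an assumption, and `storage_read` is a stand-in closure rather than the actual host function:

```rust
/// Pack (success, size) into an i64: success flag in the upper 32 bits,
/// actual size of the storage value in the lower 32 bits. The layout is
/// an illustrative assumption.
fn pack(success: bool, size: u32) -> i64 {
    (((success as u64) << 32) | size as u64) as i64
}

fn unpack(ret: i64) -> (bool, u32) {
    let ret = ret as u64;
    ((ret >> 32) != 0, ret as u32)
}

/// Guest-side loop: at most two host calls, and at most one memcpy.
fn read_value(storage_read: impl Fn(&mut [u8]) -> i64, guess: usize) -> Vec<u8> {
    let mut buf = vec![0u8; guess];
    let (ok, size) = unpack(storage_read(&mut buf));
    if ok {
        buf.truncate(size as usize);
        return buf;
    }
    // Buffer too small: nothing was copied; resize to the exact size.
    buf.resize(size as usize, 0);
    let (ok, size) = unpack(storage_read(&mut buf));
    debug_assert!(ok); // guaranteed to succeed on the second call
    buf.truncate(size as usize);
    buf
}
```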
> result could be an i32 here; even with an i32 a return value of -1 is not ambiguous as you'll physically never be able to actually allocate a buffer of such size.

I went with `i64` because I'm not really a fan of "`-1i32` is not ambiguous". If we ever switch to a 64-bit runtime, then it's no longer impossible for 4 GiB to be allocated. Sure, a 4 GiB storage item would be unusually big, but it's also not in the realm of the completely absurd.

> correct?

Yes, but I think that your explanation builds upon a wrong assumption.

> if m == n (so either the buffer was exactly the right size or we still have some data left) then the guest enlarges the buffer and calls the host function again with a bigger offset and repeats this until the whole value is read (and if the guest guesses wrong then this could be called many times),

The idea when I designed this function is that the runtime is supposed to know what the size of the storage item is. When you're for example reading a `u32` from storage, you know it's always 4 bytes. When you're reading a `Vec<something>`, you can first read the length prefix in front of the `Vec` (which realistically shouldn't be more than 5 bytes) and then allocate a `Vec` of exactly the correct size.

Allocating a buffer whose size is equal to the size of the runtime storage item builds upon the assumption that you can't decode SCALE-encoded items in a streaming way. This assumption is true in practice now, but in the absolute it isn't.

To give an example, if you have a storage item whose format is `Vec<Vec<u8>>`, right now you have to allocate a big buffer that contains the entire storage item, then when decoding you copy the data within each inner `Vec`. It seems more optimal to me (but I could be wrong) to call `storage_read` once to get the size of the outer `Vec`, then twice for each inner `Vec`, and read the data of each inner `Vec<u8>` directly into its destination buffer.
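A sketch of that `Vec<Vec<u8>>` example: `read` stands in for the proposed `storage_read` (read `out.len()` bytes at `offset`), and SCALE compact lengths are simplified to the single-byte (< 64) case for brevity.

```rust
/// Single-byte SCALE compact: the two low bits are 00 and the value is b >> 2.
fn decode_compact_u8(b: u8) -> usize {
    debug_assert!(b & 0b11 == 0, "multi-byte compact not handled in this sketch");
    (b >> 2) as usize
}

/// Stream a `Vec<Vec<u8>>` out of storage: each inner vector's bytes land
/// directly in their final buffer, with no big intermediate allocation.
fn read_vec_of_vecs(read: impl Fn(usize, &mut [u8])) -> Vec<Vec<u8>> {
    let mut prefix = [0u8; 1];
    let mut offset = 0;

    read(offset, &mut prefix); // outer length prefix
    offset += 1;
    let outer_len = decode_compact_u8(prefix[0]);

    let mut result = Vec::with_capacity(outer_len);
    for _ in 0..outer_len {
        read(offset, &mut prefix); // inner length prefix
        offset += 1;
        let inner_len = decode_compact_u8(prefix[0]);

        let mut inner = vec![0u8; inner_len];
        read(offset, &mut inner); // copied straight into its destination
        offset += inner_len;
        result.push(inner);
    }
    result
}
```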
This RFC is kind of dead, so unless anyone else steps up I volunteer to push it forward. I will implement this in Substrate and report back; let's get this over the finish line.
As discussed above, it appears twox is sufficiently deprecated to ignore here.

A polkavm-ish question: is there some minimum overhead for a host function call? If small enough, then, like I described above, one could open up the hash functions' guts and provide host functions for …
For WASM VMs, depending on the implementation, hostcalls can be expensive (e.g. if the VM fiddles with the signal handlers when going through the boundary), and for PolkaVM specifically it does involve crossing a thread/process boundary (essentially similar to sending a message through a channel to another thread, but possibly a little cheaper than your average Rust async channel implementation due to the optimistic spinlocking I can do), so it's really hard to say, and it depends on the ratio of work saved vs the number of host calls.

Another alternative would be a vectored call, that is, something like ….

Nevertheless, I'm not sure if we need this?
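Since the concrete shape of that vectored call was lost above, here is one entirely hypothetical way it could look (no such host function exists today; every name here is made up):

```rust
/// One guest-side descriptor per buffer to hash.
#[repr(C)]
struct IoSlice {
    ptr: u32,
    len: u32,
}

#[link(wasm_import_module = "env")]
extern "C" {
    /// Hypothetical: hash `count` buffers described by `slices` in a single
    /// host-boundary crossing, writing one 16-byte digest per buffer to `out`.
    fn ext_hashing_twox_128_vectored(slices: i32, count: i32, out: i32);
}
```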
These calls are purely for performance, and have no reason to transfer control to the host. We could have machine code which gets linked in directly, maybe? I guess that'd bring a different auditing problem.

Nah. It helps avoid some performance footguns, but whatever. IRTF standards where those footguns look likely, like hash-to-curve and HKDF, pay some small cost to make this not a problem; not that we support any of their hash functions anyways.
I haven't been free recently, but if you implement it, I can help review it :)
This PR adds a new PolkaVM-based executor to Substrate.

- The executor can now be used to actually run a PolkaVM-based runtime, and successfully produces blocks.
- The executor is always compiled-in, but is disabled by default.
- The `SUBSTRATE_ENABLE_POLKAVM` environment variable must be set to `1` to enable the executor, in which case the node will accept both WASM and PolkaVM program blobs (otherwise it'll default to WASM-only). This is deliberately undocumented and not explicitly exposed anywhere (e.g. in the command line arguments, or in the API) to disincentivize anyone from enabling it in production. If/when we move this into production usage I'll remove the environment variable and do it "properly".
- I did not use our legacy runtime allocator for the PolkaVM executor, so currently every allocation inside of the runtime will leak guest memory until that particular instance is destroyed. The idea here is that I will work on polkadot-fellows/RFCs#4, which will remove the need for the legacy allocator under WASM, and that will also allow us to use a proper non-leaking allocator under PolkaVM.
- I also did some minor cleanups of the WASM executor and deleted some dead code.

No prdocs included, since this is not intended to be an end-user feature but an unofficial experiment, and it shouldn't affect any current production user. Once this is production-ready, a full Polkadot Fellowship RFC will be necessary anyway.
I have followed the template in #2.