Remove unnecessary host functions allocator usage #4
base: main
Conversation
## Future Possibilities
The existence of the host-side allocator has become questionable over time. It is implemented in a very naive way, and for determinism and backwards compatibility reasons it needs to be implemented exactly identically in every client implementation. Furthermore, runtimes make substantial use of heap memory allocations, and each allocation needs to go twice through the runtime <-> host boundary (once for allocating and once for freeing). Moving the allocator to the runtime side, while it would increase the size of the runtime, would be a good idea. But before the host-side allocator can be deprecated, all the host functions that make use of it need to be updated to not use it.
How do you move the allocator to the runtime side?
Do you just give it a big block of memory and let it split it up as desired?
I don't really see the problem? The runtime could just do whatever it wants with its memory.

If the question is how to make sure that the memory usage is bounded, my personal opinion would be that at initialization the runtime does `storage_get(":heappages")` and puts the value in its memory allocator, and the allocator crashes if it reaches the limit.

An alternative way is to rely on the fact that the wasm memory needs to be grown by the wasm code with a `memory.grow` opcode, so the host can detect when memory usage is over a certain threshold.
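To make that first idea concrete, here is a minimal sketch, assuming `:heappages` holds a little-endian `u64` page count (consistent with SCALE's fixed-width encoding); `storage_get` is a stub, and the bump allocator is illustrative, not Substrate's actual allocator:

```rust
// Sketch only: `storage_get` stands in for the real storage host function.
fn storage_get(_key: &[u8]) -> Option<Vec<u8>> {
    None // a real runtime would call into the host here
}

/// Read `:heappages` at initialization and turn it into a byte limit
/// (one wasm page is 64 KiB), falling back to a default when unset.
fn heap_limit_bytes() -> usize {
    const WASM_PAGE_SIZE: usize = 64 * 1024;
    const DEFAULT_PAGES: u64 = 2048;
    let pages = storage_get(b":heappages")
        .and_then(|raw| <[u8; 8]>::try_from(raw).ok())
        .map(u64::from_le_bytes)
        .unwrap_or(DEFAULT_PAGES);
    pages as usize * WASM_PAGE_SIZE
}

/// Trivial bump allocator that crashes deterministically at the limit.
struct BumpAlloc {
    next: usize,
    limit: usize,
}

impl BumpAlloc {
    fn alloc(&mut self, size: usize) -> usize {
        assert!(self.limit - self.next >= size, "heap limit reached");
        let offset = self.next; // offset into the runtime's linear memory
        self.next += size;
        offset
    }
}
```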
Not an issue, I'm just not familiar enough with Wasm and was curious about how this would be done in practice
Moving the allocator back into the runtime would make the runtime more similar to a normal program (as normal programs manage their memory themselves, with the exception of "the sbrk mechanism", which is mimicked by `memory.grow` in wasm).

The fact that the allocator is outside of the runtime is what requires additional code/hacks. So the answer on how to move the allocator back to the runtime is simply "remove the hacks".
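For readers unfamiliar with wasm, a sketch of that `sbrk`-like primitive as seen from Rust compiled to wasm32, using the `core::arch::wasm32::memory_grow` intrinsic; the surrounding allocator design is omitted and is an assumption, not how any particular runtime does it:

```rust
/// wasm32-only: ask for `delta_pages` more pages of linear memory, the way
/// `sbrk` would in a native program.
#[cfg(target_arch = "wasm32")]
fn grow_heap(delta_pages: usize) -> Option<usize> {
    // `memory_grow` returns the previous memory size in 64 KiB pages,
    // or `usize::MAX` if the growth was refused.
    let prev_pages = core::arch::wasm32::memory_grow::<0>(delta_pages);
    if prev_pages == usize::MAX {
        None // a refusal here is also how a host could bound memory usage
    } else {
        Some(prev_pages * 64 * 1024) // byte offset where the new region starts
    }
}
```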
Is my understanding of the RFC correct that it proposes to introduce the new functions but not to remove the old ones?
This seems valid, but after being accepted/implemented, the old versions should be deprecated and then removed entirely (after some significant period of time, to allow backwards compatibility).
Yes!
They should be deprecated, yes, but removing them would mean that you could no longer sync older blocks.
This is definitely a step in the right direction. I also agree that the C API approach makes sense under the current conditions. I am a bit worried, though, that this proposal doesn't provide any numbers. I am thinking that in the future we might need to consider an API design overhaul as if it were actually 2023. But this is outside of the scope of this proposal. Just for linking purposes, here is a relevant discussion on Substrate: paritytech/substrate#11883
Is there an obstacle to host functions actually doing borrowing, aka just passing pointers? We need not even copy memory in, say: …

Also: a hash interface like … We could have wasm types designed for usage directly in the host, like demanding u64 or SIMD alignment, and then impl the appropriate … You'd then cargo-patch in sp-sha3 over sha3, and then you'd have a faster hashing interface. It's more work to do this, however.
I have opened this RFC with the desire of making things move. I would be more than happy if someone else took over this effort and provided numbers. Personally it seems obvious to me that "allocating memory, writing a hash to it, and freeing said memory" is slower than "writing a hash to memory". And if it turns out that in practice it wasn't slower, I would search for the problem elsewhere.
This is very precisely what this RFC fixes.
```wat
(func $ext_hashing_twox_128_version_2
    (param $data i64) (param $out i32))
(func $ext_hashing_twox_256_version_2
    (param $data i64) (param $out i32))
```
Are the twox hashes still being used anywhere? I'd thought we deprecated them since they're footguns.
Yes, they are still used for hashing the keys of storage values that have static keys.
They are used in order to know where to store stuff. For example, the pallet `Foo` stores stuff under the key `xxhash64("Foo")`.

While this is out of scope of this RFC, this is also very inefficient (literally every single storage access calculates the xxhashes of the pallet and the item) and should be moved to compile time.
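For reference, a sketch of that key derivation, assuming the `sp_core::hashing::twox_128` helper and the `twox_128(pallet) ++ twox_128(item)` layout that FRAME conventionally uses for storage values; the wrapper itself is illustrative:

```rust
use sp_core::hashing::twox_128;

/// Key of the storage value `item` in pallet `pallet`:
/// twox_128(pallet_name) ++ twox_128(storage_item_name).
fn storage_value_key(pallet: &str, item: &str) -> Vec<u8> {
    let mut key = Vec::with_capacity(32);
    key.extend_from_slice(&twox_128(pallet.as_bytes()));
    key.extend_from_slice(&twox_128(item.as_bytes()));
    key
}
```

Both hashes in, e.g., `storage_value_key("Foo", "Something")` have hardcoded inputs, which is why they could in principle be computed at compile time.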
We deprecated twox entirely though, no? twox_64 is xxhash64. twox_128 is simply two copies of xxhash64: https://github.com/paritytech/polkadot-sdk/blob/cfa19c37e60f094a2954ffa73b1da261c050f61f/substrate/primitives/core/hashing/src/lib.rs#L77

It's therefore easy to find collisions for both twox flavors, which yields two storage locations which your code believes to be different, but which actually wind up being the same. It technically facilitates supply-chain attacks against parachains, even if used only at compile time.
Don't quote me on that, but it is likely that nowadays the only input to the twox hash functions are strings hardcoded in the source code (pallet names and storage key names).
> Don't quote me on that, but it is likely that nowadays the only input to the twox hash functions are strings hardcoded in the source code (pallet names and storage key names).

Yes, this. I mean, it is up to the user, but for maps the default is blake2.
> should be moved to compile time.

This is also already done.
We'll avoid any pallet dependencies with bizarre hard-coded storage names then, I guess. lol
> This is also already done.

We do not need host calls for the twox hashes then, I guess. Just for the case where a fixed-length buffer is being hashed.

Anyways, if we typically hash fixed-length buffers, then maybe this variable-length buffer allocation is best to just keep, as the above interface demands quite a few more calls.
I see, we need roughly this for backwards compatibility anyways.
```wat
    (param $key_type_id i32) (param $seed i64) (param $out i32) (result i32))
```

The behaviour of these functions is identical to their version 1 or version 2 equivalents. Instead of allocating a buffer, writing the output to it, and returning a pointer to it, the new version of these functions accepts an `out` parameter containing the memory location where the host writes the output. The output is always of a size known at compilation time.
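To make the calling convention concrete, a guest-side sketch, assuming the usual Polkadot host-API packing of pointer+length values (pointer in the lower 32 bits, length in the upper 32 bits); the wrapper name is made up:

```rust
#[link(wasm_import_module = "env")]
extern "C" {
    /// Proposed version-2 signature: `data` is a packed pointer+length,
    /// `out` points at a 16-byte buffer owned by the runtime.
    fn ext_hashing_twox_128_version_2(data: i64, out: i32);
}

/// Hash `data` without any host-side allocation: the output buffer
/// lives on the guest's own stack.
fn twox_128_v2(data: &[u8]) -> [u8; 16] {
    let mut out = [0u8; 16];
    // Lower 32 bits: pointer; upper 32 bits: length.
    let packed = (data.as_ptr() as u32 as u64) | ((data.len() as u64) << 32);
    unsafe {
        ext_hashing_twox_128_version_2(packed as i64, out.as_mut_ptr() as u32 as i32);
    }
    out
}
```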
The expected behavior (trap?) for an out-of-bounds memory access ought to be specified.
You're actually (I guess semi-unintentionally) raising a good point.

If we take for example `ext_crypto_ed25519_sign_version_2`: on success the function returns 0 and writes to `out`, and on failure the function returns 1 and doesn't write anything. Should a trap happen if `out` is out of range even in the situation where 1 would be returned? The answer is important for determinism.

I've added some text mentioning that yes, the range of `out` must always be checked. This is a bit arbitrary, but I think that passing a completely invalid input has "higher priority" than a failure in what is being demanded from the function.
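A host-side sketch of that rule, with `Memory`, `Trap`, and `sign_with_keystore` as illustrative stand-ins rather than real client types: the bounds check happens before the fallible operation, so the trap/no-trap outcome is deterministic across clients.

```rust
struct Memory {
    data: Vec<u8>, // the guest's linear memory
}

impl Memory {
    fn in_bounds(&self, ptr: u32, len: usize) -> bool {
        (ptr as usize)
            .checked_add(len)
            .map_or(false, |end| end <= self.data.len())
    }
    fn write(&mut self, ptr: u32, bytes: &[u8]) {
        let start = ptr as usize;
        self.data[start..start + bytes.len()].copy_from_slice(bytes);
    }
}

enum Trap {
    OutOfBounds,
}

fn sign_with_keystore(_msg: &[u8]) -> Option<[u8; 64]> {
    None // stand-in: real signing would look the key up in the keystore
}

/// Hypothetical shape of `ext_crypto_ed25519_sign_version_2` on the host.
fn ed25519_sign_v2(mem: &mut Memory, msg: &[u8], out: u32) -> Result<i32, Trap> {
    const SIG_LEN: usize = 64;
    // Always validate `out` first, even if signing would fail anyway.
    if !mem.in_bounds(out, SIG_LEN) {
        return Err(Trap::OutOfBounds);
    }
    match sign_with_keystore(msg) {
        Some(sig) => {
            mem.write(out, &sig);
            Ok(0) // success: signature written to `out`
        }
        None => Ok(1), // failure: nothing written, but bounds were still checked
    }
}
```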
Hi, I want to know whether there is any plan to move the allocator inside the wasm runtime.
Same remark as #8 (comment): I would like to propose this RFC, but I don't know how to proceed.
To move forward with this pull request, I would like to see some basic benchmarks being done. @yjhmelody could you please help with this?
I would also like to see this directly move the entire allocator to the runtime. Not sure why we should only do this halfway.
I can expand the RFC to cover all the host functions. When I opened this RFC (two and a half months ago), I actually expected a swift and unanimous approval. But if RFCs inherently need a long time, I might as well make it as big as possible.
+1 for going all the way and moving the whole allocator inside of the runtime, so that for new blocks the allocator host functions are never called anymore.
Yeah, sorry about that, but it should actually also not be "swift". Let's do it properly.
I'm currently updating the RFC. When it comes to …: the only place where this function is used is in the …. For the sake of not breaking things, I'm going to add a …
I've updated the RFC. As I've mentioned in the "unresolved questions" section, I expect the offchain worker functions to be slightly slower than right now, but this is IMO negligible since they are offchain. To me, the biggest potential source of performance degradation comes from the deprecation of …. The changes to …
Is something in progress? While I agree that RFCs shouldn't necessarily be swift, if nothing is done then the ETA of this RFC is "infinity", and it would be great if it were finite. The more we wait, the more other changes (such as #31) will conflict with this RFC. (And the reason why I originally submitted a stripped-out version is that I anticipated this happening.)
```wat
    (param $key_type_id i32) (param $seed i64) (param $out i32) (result i32))
```

The behaviour of these functions is identical to their version 1 or version 2 counterparts. Instead of allocating a buffer, writing the output to it, and returning a pointer to it, the new version of these functions accepts an `out` parameter containing the memory location where the host writes the output. The output is always of a size known at compilation time. The runtime execution stops with an error if `out` is outside of the range of the memory of the virtual machine.
How does one actually implement a host function that borrows like this?
This probably requires some changes to the Substrate procedural macro stuff (which I'm not familiar with).
This is why I mention in the "drawbacks" section that it might be difficult to implement, but I have no idea how much.
> This probably requires some changes to the Substrate procedural macro stuff (which I'm not familiar with). This is why I mention in the "drawbacks" section that it might be difficult to implement, but I have no idea how much.

I don't remember from the top of my head whether the macro machinery supports this right now, but yeah, we can do this; not a problem.
The macros have supported borrowing since the beginning; you can pass `&mut [u8]` quite easily.

Agreed, our unstable host calls for elliptic curves pass …
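For the curious, a minimal sketch of that borrowing pattern with the `#[runtime_interface]` macro from `sp-runtime-interface`; the trait and function names here are hypothetical, and only the `&mut [u8]` parameter is the point:

```rust
use sp_runtime_interface::runtime_interface;

/// Hypothetical interface: the macro passes `out` as a borrow into wasm
/// memory, so the host writes the digest in place, with no allocator involved.
#[runtime_interface]
pub trait HashingV2 {
    fn twox_128_into(data: &[u8], out: &mut [u8]) {
        let hash = sp_core::hashing::twox_128(data);
        out[..16].copy_from_slice(&hash);
    }
}
```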
Sorry, I hadn't seen this message before. You can see that when various Ethereum ecosystems use wasm or other virtual machines, they all use a runtime-side allocator instead of a host-side one. This is very convenient in verification scenarios, because verification is then a pure function, which helps support more … ecosystem scenarios; it is a pity that Substrate has not supported this. Sorry again, I don't have the energy to push related improvements forward at the moment.
Sorry for the long delay. I left some nitpicks, generally looking good and we could continue with this RFC from my side.
```wat
(func $ext_input_size_version_1
    (result i64))
(func $ext_input_read_version_1
    (param $offset i64) (param $out i64))
```
Why not just have the `read` function and follow the way the other `read` functions are implemented?
I think that doing it this way (first getting the size, then getting the content) is way more clear and simple, and more importantly avoids the risk of allocating a buffer too small and having to re-allocate it again later and copy or throw away all the data that was already read.

When reading the storage, knowing the size of the storage value would require a database access, so if reading a storage value required two function calls we would need to access the database twice, which seems not great.

When reading the input, I assume that `ext_input_size_version_1` just returns something like `self.input.len()`, so there's no problem. Also, you typically read the input only once.
I mean, you could pass `out` with a zero length to only get the length of the value.
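A guest-side sketch of the two-call pattern under discussion, following the proposed signatures and the usual pointer+length packing for `out`; the wrapper itself is illustrative:

```rust
#[link(wasm_import_module = "env")]
extern "C" {
    fn ext_input_size_version_1() -> i64;
    /// `out` is a packed pointer+length; the host copies that many bytes
    /// of the input, starting at `offset`, into the guest buffer.
    fn ext_input_read_version_1(offset: i64, out: i64);
}

/// Read the whole runtime-call input: one call for the size, one
/// allocation of exactly the right size, one call for the bytes.
fn read_input() -> Vec<u8> {
    unsafe {
        let len = ext_input_size_version_1() as usize;
        let mut buf = vec![0u8; len];
        let packed = (buf.as_mut_ptr() as u32 as u64) | ((len as u64) << 32);
        ext_input_read_version_1(0, packed as i64);
        buf
    }
}
```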
- `ext_input_size_version_1`/`ext_input_read_version_1` is inherently slower than obtaining a buffer with the entire data, due to the two extra function calls and the extra copying. However, given that this only happens once per runtime call, the cost is expected to be negligible.

- The `ext_crypto_*_public_keys`, `ext_offchain_network_state`, and `ext_offchain_http_*` host functions are likely slightly slower than their deprecated counterparts, but given that they are used only in offchain workers, this is acceptable.
`ext_offchain_network_state` doesn't exist anymore?
I mention it here: #4 (comment). I think that this function should be deprecated entirely.
Fine by me. Maybe it should be mentioned a little more obviously in the RFC.
The runtime execution stops with an error if `value_out` is outside of the range of the memory of the virtual machine, even if the size of the buffer is 0 or if the amount of data to write would be 0 bytes.

```wat
(func $ext_offchain_network_peer_id_version_1
```
There is no way to read the external addresses? (Not really sure we still need them)
I also mention it here: #4 (comment), and I think it should be deprecated. It's completely unclear to me how to define "an external address".
I've run a quick small experiment and tried to extract the allocator out of smoldot (in other words, to do to smoldot the opposite of what this RFC wants to do to the runtime), and then measured the size of smoldot compiled to Wasm. With the allocator outside of smoldot: 2,789.9 kB.

So basically the size of the code of the allocator is <5 kiB, and that's uncompressed. According to this page, compression seems to reduce the size by ~75%, so we're closer to 1 kiB to 1.5 kiB of overhead if we moved the allocator to the runtime.
Sorry if this isn't the right place for this, but I would like to share our insights on the topic: I'm part of a team developing runtimes in Go, and we've had to modify the compiler to ensure compatibility with the external allocator. We've implemented a custom garbage collector that integrates seamlessly with this external allocator, but this is one of the reasons we're not using the standard Go compiler (another is related to the Wasm target). +1 for this change to happen.
- The following function signature is now also accepted for runtime entry points: `(func (result i64))`.
- Runtimes no longer need to expose a constant named `__heap_base`.

All the host functions that are being superseded by new host functions are now considered deprecated and should no longer be used.
What exactly does "deprecated" mean here? Do you mean increasing the version number? If not, wouldn't an existing chain still need these host functions if it starts synchronizing from an old block? So it seems like we always need to ensure these functions are available?
It means what is written: they should no longer be used. Runtimes are what use host functions. Therefore, runtimes should no longer use these host functions.

> Do you mean increasing the version number?

This text simply says that if the RFC adds `foo_v2`, then `foo_v1` should no longer be used. You can see it as increasing the version number, yes, but given that there's no semver involved, it's better to think of `foo_v1` and `foo_v2` as two completely unrelated functions.
- It is unclear how replacing `ext_storage_get` with `ext_storage_read` and `ext_default_child_storage_get` with `ext_default_child_storage_read` will impact performance.

- It is unclear how the changes to `ext_storage_next_key` and `ext_default_child_storage_next_key` will impact performance.
For the variable-length host functions, is it really necessary to revise them? In fact, the host always needs an allocator to support copying the entrypoint parameters. So if the new versions cannot improve performance, they will only increase the complexity of use.
Can you define "variable-length host function" and "entrypoint parameter"?
If you're talking about the input to the runtime call, it's also tackled in this RFC.
It's just what you are talking about: the input to the runtime call.

The current specification of the input to the runtime call needs the host to call an alloc function (defined on the host side or the runtime side).
```wat
(func $ext_storage_read_version_2
    (param $key i64) (param $value_out i64) (param $offset i32) (result i64))
```
`result` could be an `i32` here; even with an `i32`, a return value of `-1` is not ambiguous, as you'll physically never be able to actually allocate a buffer of such a size.

So, let me see if I got this right. Is the idea here that:

- the guest prepares a buffer of `n` bytes,
- it calls this host function and passes a pointer to this buffer and its size,
- the host reads the storage and writes `m` bytes into the buffer, where `m <= n`, and returns `m`,
- if `m == n` (so either the buffer was exactly the right size or we still have some data left) then the guest enlarges the buffer and calls the host function again with a bigger offset, and repeats this until the whole value is read (and if the guest guesses wrong then this could be called many times),

correct?

If so, maybe instead we could do this: return an `i64` which would be a packed `(i32, i32)` (or an `i64` with its upper bit reserved) where the first value is whether the read succeeded and the second value is the actual size of the storage value, and have the algorithm go like this:

- the guest prepares a buffer of `n` bytes,
- it calls this host function and passes a pointer to this buffer and its size,
- let `m` be the size of the storage value; if `n >= m` then write it to the buffer and return `(1, m)`,
- if `n < m` then return `(0, m)`,
- on the guest side, if a `0` was returned then enlarge the buffer to the exact size needed and call the function again (which is guaranteed to succeed now); otherwise we're done.

And also do not require that "if the pointer to the buffer is invalid then runtime execution stops even if length is 0".

This way we gain the following benefits:

- if the buffer is exactly the right size we will only call this function once (as opposed to twice with the current proposal),
- if the buffer is not big enough then we will only call this function twice (as opposed to twice or more with the current proposal),
- there's only going to be one host-guest memcpy (because if the buffer is not big enough then nothing will be copied), which is going to be faster/more efficient,
- the guest can choose whether it wants to try its luck guessing the initial size of the buffer (risk having to reallocate the buffer in the worst case, but read the value in only one call in the best case), or it can call the host function with a size of "0" to ask for the right size and prepare a buffer of exactly the right size (always read the value in two calls, but won't ever need to reallocate the buffer).
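A sketch of the suggested encoding and the resulting at-most-two-calls guest loop; which half of the `i64` holds which field is an assumption, and `storage_read` is a stand-in closure rather than the actual host function:

```rust
/// Pack (success, size) into an i64: success flag in the upper 32 bits,
/// actual size of the storage value in the lower 32 bits. The layout is
/// an illustrative assumption.
fn pack(success: bool, size: u32) -> i64 {
    (((success as u64) << 32) | size as u64) as i64
}

fn unpack(ret: i64) -> (bool, u32) {
    let ret = ret as u64;
    ((ret >> 32) != 0, ret as u32)
}

/// Guest-side loop: at most two host calls, and at most one memcpy.
fn read_value(storage_read: impl Fn(&mut [u8]) -> i64, guess: usize) -> Vec<u8> {
    let mut buf = vec![0u8; guess];
    let (ok, size) = unpack(storage_read(&mut buf));
    if ok {
        buf.truncate(size as usize);
        return buf;
    }
    // Buffer too small: nothing was copied; resize to the exact size.
    buf.resize(size as usize, 0);
    let (ok, size) = unpack(storage_read(&mut buf));
    debug_assert!(ok); // guaranteed to succeed on the second call
    buf.truncate(size as usize);
    buf
}
```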
> result could be an i32 here; even with an i32 a return value of -1 is not ambiguous as you'll physically never be able to actually allocate a buffer of such size.

I went with `i64` because I'm not really a fan of "`-1i32` is not ambiguous". If we ever switch to a 64-bit runtime, then it's no longer impossible for 4 GiB to be allocated. Sure, a 4 GiB storage item would be unusually big, but it's also not in the realm of the completely absurd.

> correct?

Yes, but I think that your explanation builds upon a wrong assumption.

> if m == n (so either the buffer was exactly the right size or we still have some data left) then the guest enlarges the buffer and calls the host function again with a bigger offset and repeats this until the whole value is read (and if the guest guesses wrong then this could be called many times),

The idea when I designed this function is that the runtime is supposed to know what the size of the storage item is. When you're for example reading a `u32` from storage, you know it's always 4 bytes. When you're reading a `Vec<something>`, you can first read the length prefix in front of the `Vec` (which realistically shouldn't be more than 5 bytes) and then allocate a `Vec` of exactly the correct size.

Allocating a buffer whose size is equal to the size of the runtime storage item builds upon the assumption that you can't decode SCALE-encoded items in a streaming way. This assumption is true in practice now, but in the absolute it isn't.

To give an example, if you have a storage item whose format is `Vec<Vec<u8>>`, right now you have to allocate a big buffer that contains the entire storage item, then when decoding you copy the data within each inner `Vec`. It seems more optimal to me (but I could be wrong) to call `storage_read` once to get the size of the outer `Vec`, then twice for each inner `Vec`, and read the data of each inner `Vec<u8>` directly into its destination buffer.
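A sketch of that `Vec<Vec<u8>>` example: `read` stands in for the proposed `storage_read` (read `out.len()` bytes at `offset`), and SCALE compact lengths are simplified to the single-byte (< 64) case for brevity.

```rust
/// Single-byte SCALE compact: the two low bits are 00 and the value is b >> 2.
fn decode_compact_u8(b: u8) -> usize {
    debug_assert!(b & 0b11 == 0, "multi-byte compact not handled in this sketch");
    (b >> 2) as usize
}

/// Stream a `Vec<Vec<u8>>` out of storage: each inner vector's bytes land
/// directly in their final buffer, with no big intermediate allocation.
fn read_vec_of_vecs(read: impl Fn(usize, &mut [u8])) -> Vec<Vec<u8>> {
    let mut prefix = [0u8; 1];
    let mut offset = 0;

    read(offset, &mut prefix); // outer length prefix
    offset += 1;
    let outer_len = decode_compact_u8(prefix[0]);

    let mut result = Vec::with_capacity(outer_len);
    for _ in 0..outer_len {
        read(offset, &mut prefix); // inner length prefix
        offset += 1;
        let inner_len = decode_compact_u8(prefix[0]);

        let mut inner = vec![0u8; inner_len];
        read(offset, &mut inner); // copied straight into its destination
        offset += inner_len;
        result.push(inner);
    }
    result
}
```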
This RFC is kind of dead, so unless anyone else steps up I volunteer to push it forward. I will implement this in Substrate and report back; let's get this over the finish line.
As discussed above, it appears twox is sufficiently deprecated to ignore here.

A polkavm-ish question: is there some minimum overhead for a host function call? If small enough, then, like I described above, one could open up the hash functions' guts and provide host functions for …
For WASM VMs, depending on the implementation, hostcalls can be expensive (e.g. if the VM fiddles with the signal handlers when going through the boundary), and for PolkaVM specifically it does involve crossing a thread/process boundary (essentially similar to sending a message through a channel to another thread, but possibly a little cheaper than your average Rust async channel implementation due to the optimistic spinlocking I can do), so it's really hard to say, and it depends on the ratio of work saved vs the number of host calls.

Another alternative would be a vectored call, that is, something like ….

Nevertheless, I'm not sure if we need this?
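Since the concrete shape of that vectored call was lost above, here is one entirely hypothetical way it could look (no such host function exists today; every name here is made up):

```rust
/// One guest-side descriptor per buffer to hash.
#[repr(C)]
struct IoSlice {
    ptr: u32,
    len: u32,
}

#[link(wasm_import_module = "env")]
extern "C" {
    /// Hypothetical: hash `count` buffers described by `slices` in a single
    /// host-boundary crossing, writing one 16-byte digest per buffer to `out`.
    fn ext_hashing_twox_128_vectored(slices: i32, count: i32, out: i32);
}
```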
These calls are purely for performance, and have no reason to transfer control to the host. We could have machine code which gets linked in directly, maybe? I guess that'd bring a different auditing problem.

Nah. It helps avoid some performance footguns, but whatever. IRTF standards where those footguns look likely, like hash-to-curve and HKDF, pay some small cost to make this not a problem; not that we support any of their hash functions anyways.
I haven't been free recently, but if you implement it, I can help review it :)
This PR adds a new PolkaVM-based executor to Substrate.

- The executor can now be used to actually run a PolkaVM-based runtime, and successfully produces blocks.
- The executor is always compiled-in, but is disabled by default.
- The `SUBSTRATE_ENABLE_POLKAVM` environment variable must be set to `1` to enable the executor, in which case the node will accept both WASM and PolkaVM program blobs (otherwise it'll default to WASM-only). This is deliberately undocumented and not explicitly exposed anywhere (e.g. in the command line arguments, or in the API) to disincentivize anyone from enabling it in production. If/when we move this into production usage I'll remove the environment variable and do it "properly".
- I did not use our legacy runtime allocator for the PolkaVM executor, so currently every allocation inside of the runtime will leak guest memory until that particular instance is destroyed. The idea here is that I will work on polkadot-fellows/RFCs#4, which will remove the need for the legacy allocator under WASM, and that will also allow us to use a proper non-leaking allocator under PolkaVM.
- I also did some minor cleanups of the WASM executor and deleted some dead code.

No prdocs included, since this is not intended to be an end-user feature but an unofficial experiment, and it shouldn't affect any current production user. Once this is production-ready, a full Polkadot Fellowship RFC will be necessary anyway.
I have followed the template in #2.