Update to a more recent version of wasmi #18
I have absolutely no idea why, but I'm getting out-of-bounds memory access errors coming from wasmi. I have made sure that the behavior of all the memory-growth-related functions is identical to before this PR and also identical to wasmtime. |
@Robbepop I hope you don't mind me pinging you in case you have an idea, as I'm a bit clueless. After this PR, wasmi emits a trap due to out-of-bounds memory accesses ( This didn't happen before this PR, and also doesn't happen with wasmtime. The PR was relatively straightforward. The code in itself is also relatively straightforward. There could be a bug in the upper layers, but then I don't see why the problem wouldn't happen with wasmtime. It could maybe be caused because the |
Hmmm, for questions like these it would have been amazing to have better debugging support built into edit: Are you using |
The Substrate/Polkadot client has its own memory allocator. It's not the Wasm code that manages its memory. Instead, it calls a host function to allocate memory and another host function to free it. While the behavior of this function is very complicated, my suspicion would be that the return value of that function isn't communicated properly to the Wasm code, maybe due to a bug in the new resumable functions feature. The Wasm code would then use an incorrect value as if it was the pointer. Of course I'm basing that suspicion on the fact that the code in smoldot has always been relatively robust, and based on what has changed between before and after this PR.
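To illustrate the suspicion above, here is a minimal sketch in plain Rust, with no wasmi involved, of a host-managed allocator. `HostAllocator`, `alloc`, `store_u32`, and `OutOfBounds` are hypothetical names, and the bump allocation (with no `free`) is an assumption made for brevity, not how the Substrate allocator actually works. The sketch only shows why a wrongly communicated return value would surface as an out-of-bounds access in the guest.

```rust
/// Stand-in for the out-of-bounds trap an interpreter would raise.
#[derive(Debug, PartialEq, Eq)]
pub struct OutOfBounds;

/// Toy model: the host owns the guest's linear memory and hands out offsets.
/// (Hypothetical type; not part of smoldot, Substrate, or wasmi.)
pub struct HostAllocator {
    memory: Vec<u8>,
    next_free: u32,
}

impl HostAllocator {
    /// Creates a zeroed linear memory of `size` bytes.
    /// The first 8 bytes are kept unused so that 0 is never a valid pointer.
    pub fn new(size: u32) -> Self {
        HostAllocator { memory: vec![0; size as usize], next_free: 8 }
    }

    /// The "allocate" host function: returns the offset of a fresh region,
    /// or `None` if the memory is exhausted.
    pub fn alloc(&mut self, size: u32) -> Option<u32> {
        let start = self.next_free;
        let end = start.checked_add(size)?;
        if end as usize > self.memory.len() {
            return None;
        }
        self.next_free = end;
        Some(start)
    }

    /// What the guest does with the pointer it believes it received:
    /// a 4-byte little-endian store, bounds-checked like a Wasm store.
    pub fn store_u32(&mut self, ptr: u32, value: u32) -> Result<(), OutOfBounds> {
        let start = ptr as usize;
        let end = start.checked_add(4).ok_or(OutOfBounds)?;
        let slot = self.memory.get_mut(start..end).ok_or(OutOfBounds)?;
        slot.copy_from_slice(&value.to_le_bytes());
        Ok(())
    }
}
```

If the guest receives the value returned by `alloc`, the store succeeds. If it receives anything else in its place, the same store can land outside the memory and fail, even though the allocator itself behaved correctly.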
I don't disable them in the config 🤷 Maybe |
Yeah it could be connected to a bug in the relatively new resumable function feature. I will probably need some more tests for it. Unfortunately there is no Wasm spec testsuite for resumable functions. :( |
twiggy diff report
Difference in .wasm size before and after this pull request.
Note for myself: The easiest way to reproduce would be to call The next step to debug would be to log the list of all functions being called, along with their parameters and return values, before and after this PR, and compare. If that doesn't give any hint, also log the calls to |
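The "log every call and compare" step described above could be sketched roughly as follows. This is only an illustration: `CallRecord` and `first_divergence` are hypothetical names, and it assumes each logged call can be reduced to a name, integer parameters, and an optional integer return value.

```rust
/// One logged function call: name, parameters, and return value.
/// (Hypothetical type for illustration; not part of smoldot or wasmi.)
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CallRecord {
    pub name: String,
    pub params: Vec<i64>,
    pub ret: Option<i64>,
}

impl CallRecord {
    pub fn new(name: &str, params: &[i64], ret: Option<i64>) -> Self {
        CallRecord {
            name: name.to_owned(),
            params: params.to_vec(),
            ret,
        }
    }
}

/// Returns the index of the first call at which the two traces differ,
/// or `None` if the traces are identical.
pub fn first_divergence(before: &[CallRecord], after: &[CallRecord]) -> Option<usize> {
    // Look for the first position where both traces have a call but disagree.
    if let Some(index) = before.iter().zip(after).position(|(a, b)| a != b) {
        return Some(index);
    }
    // No mismatch in the shared prefix: the traces differ only if one is longer.
    if before.len() != after.len() {
        Some(before.len().min(after.len()))
    } else {
        None
    }
}
```

Recording one trace on the old version and one on the new version, then calling `first_divergence(&old_trace, &new_trace)`, points at the first call whose name, parameters, or return value changed.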
My hypothesis was wrong, as the error happens after a single host function call to However, one interesting thing is that the error doesn't happen with Polkadot runtime v0.9.16. It happens with Westend v0.9.30. |
Disabling all Wasm features (such as |
I think I've managed to reproduce the problem. Here's a "minimal" failing test:
#[test]
fn repro() {
    let engine = wasmi::Engine::default();
    let module = wasmi::Module::new(
        &engine,
        &include_bytes!("wasm_data.wasm")[..]
    )
    .unwrap();
    let mut store = wasmi::Store::new(&engine, ());
    #[derive(Debug, Clone)]
    struct InterruptedTrap {
        name: String,
    }
    impl core::fmt::Display for InterruptedTrap {
        fn fmt(&self, f: &mut core::fmt::Formatter) -> core::fmt::Result {
            write!(f, "Interrupted")
        }
    }
    impl wasmi::core::HostError for InterruptedTrap {}
    let mut linker = wasmi::Linker::<()>::new();
    for import in module.imports() {
        match import.ty() {
            wasmi::ExternType::Func(func_type) => {
                let name = import.name().to_owned();
                let func = wasmi::Func::new(
                    &mut store,
                    func_type.clone(),
                    move |_caller, _params, _ret| {
                        Err(wasmi::core::Trap::from(InterruptedTrap {
                            name: name.clone(),
                        }))
                    },
                );
                linker.define(import.module(), import.name(), func).unwrap();
            }
            wasmi::ExternType::Memory(memory_type) => {
                let memory = wasmi::Memory::new(&mut store, *memory_type).unwrap();
                linker
                    .define(import.module(), import.name(), memory)
                    .unwrap();
            }
            _ => unreachable!(),
        }
    }
    let instance = linker.instantiate(&mut store, &module).unwrap();
    let instance = instance.ensure_no_start(&mut store).unwrap();
    let mut out = [wasmi::Value::I64(0)];
    let mut call = instance
        .get_func(&store, "Core_version")
        .unwrap()
        .call_resumable(
            &mut store,
            &[wasmi::Value::I32(0), wasmi::Value::I32(0)],
            &mut out,
        )
        .unwrap();
    loop {
        match call {
            wasmi::ResumableCall::Resumable(r) => {
                let err = r.host_error().downcast_ref::<InterruptedTrap>().unwrap();
                if err.name == "ext_logging_max_level_version_1" {
                    call = r
                        .resume(&mut store, &[wasmi::Value::I32(0)], &mut out)
                        .unwrap()
                } else {
                    println!("{:?}", err.name);
                    println!("success!");
                    break;
                }
            }
            wasmi::ResumableCall::Finished => unreachable!(),
        }
    }
}
I have attached the wasm data here: wasm_data.zip (zipped in order to satisfy GitHub)
The way the Wasm code behaves is that it calls
If I replace the body of the closure with this, then the test succeeds (the test succeeds when
move |_caller, _params, _ret| {
    if name == "ext_logging_max_level_version_1" {
        _ret[0] = wasmi::Value::I32(0);
        Ok(())
    } else {
        Err(wasmi::core::Trap::from(InterruptedTrap {
            name: name.clone(),
        }))
    }
}
In other words, if the function directly returns when it is called, then the test works. If, however, the function emits a trap, then is resumed, then the test fails. Please note, as I've mentioned in a comment above, that this happens only with a specific Wasm module (the Westend v0.9.30 runtime). A different but very similar Wasm module (the Polkadot v0.9.16 runtime) works just fine. |
@tomaka Thanks a lot for digging into the problem and producing a "minimal" test case! I am going to try to further minimize it for
Also thanks for letting me know about this. Feels good to know that at least sometimes it is actually working. 😅 |
Fixes bug encountered in this GitHub PR: smol-dot/smoldot#18
@tomaka I fixed the bug in this PR: wasmi-labs/wasmi#671 |
* fix bug in resumable calls Fixes bug encountered in this GitHub PR: smol-dot/smoldot#18 * add tests for the resumable call bug fix
Seems to work, thank you! |
Cargo.toml
Outdated
@@ -82,7 +82,7 @@ smallvec = "1.10.0"
 snow = { version = "0.9.1", default-features = false, features = ["default-resolver"] }
 tiny-keccak = { version = "2.0", features = ["keccak"] }
 twox-hash = { version = "1.6.3", default-features = false }
-wasmi = { version = "0.9.1", default-features = false, features = ["core"] } # TODO: having to add `core` is sketchy; maybe report this
+wasmi = { git = "https://github.com/paritytech/wasmi", default-features = false } # TODO: { version = "0.25.0", default-features = false }
PR is ready to go after this is switched back to a published version
@tomaka I just released |
@tomaka btw. if you want to run
[profile.release]
lto = "fat"
codegen-units = 1
makes a huge difference for |
It's already the case: Lines 128 to 136 in 3b8f08b
|
You are correct. The runtime (and PvF) are restricted to the wasm MVP for now. AFAIK the only exception is mutable globals. But don't nail me down on this. But most def nothing as big as bulk memory.
I brought this up because substrate itself will most likely never upgrade to a newer wasmi. We doubled down on wasmtime and will probably simplify the code soon by removing swappable executor support. So smoldot right now would be the only piece of software which we can use to do a @tomaka Sorry for posting this into this closed issue. This is a separate discussion and I can transfer it to a new issue if you want to engage with it at all. |
No. I don't benchmark this because the light client has no performance issue at all w.r.t. running the runtime. Even if wasmi was ten times slower it would still be fine.
Haven't tested either, as the light client doesn't sync from the genesis, and the full node uses wasmtime by default. Syncing from genesis with smoldot takes a long time and might stall due to unfixed networking issues, so I'm not very motivated to try it out. However, I've just tried syncing a bit around block 5 million (where my database is) and everything seems to work. |
Also, smoldot can only sync up to around block 6 million or so if I remember correctly. At some point, the runtime starts using features that smoldot doesn't implement yet (code substitutes, state_version v1, and child tries come to mind). |
Sorry for my ignorance: Those network issues arise when you use smoldot as a full node with wasmi? Sorry if this didn't make sense. I didn't look into smoldot for some time.
Child tries are pretty old. But they were only used for contracts, which are not on the relay chain. But I think they were used for crowdloans and this is probably when they became a problem for smoldot. But I get it. It adds a lot of API surface when this could have been implemented with prefix trees as well. |
Completely orthogonal to wasmi. |
Work in progress. Blocked by wasmi-labs/wasmi#658 and wasmi-labs/wasmi#659.
cc #17