Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

transmute fatal errors into SYS_ASSERTION_FAILED exit code. #548

Merged
merged 6 commits into from
May 11, 2022

Conversation

raulk
Copy link
Member

@raulk raulk commented May 11, 2022

Closes #508.

This PR transmutes fatal errors into the SYS_ASSERTION_FAILED exit code at the top layer (Executor) to produce a consensus outcome on fatal errors.

It also modifies the Backtrace components to generalise them so they can tackle syscall errors as well as fatal errors. The cause is informed at different times for each, so the APIs have been adapted accordingly:

  • For syscall errors, we inform the cause as soon as the syscall fails. Thus we reset the backtrace and accumulate the propagation chain (Backtrace#begin()).
  • For fatal errors, we inform the cause at the top. Thus we accumulate the propagation chain, then we inform the cause (Backtrace#set_cause()).

I think capturing backtraces (both FVM backtraces and Rust backtraces) is crucial for debuggability. Because I had no way of generating fatal errors to verify the behaviour, I added an integration test which performs no assertions, but prints the backtrace on stdout.

Backtraces from syscall-motivated aborts look like this:

message failed with backtrace:
00: f010000 (method 1) -- actor aborted with code 1 (9)
--> caused by: ipld::open -- failed to parse cid: CIDv0 requires a DagPB codec (1: illegal argument)

Whereas fatal error backtraces display the error chain from anyhow, as well as the backtrace if captured:

fatal backtrace: message failed with backtrace:
00: f010001 (method 1) -- fatal error (10)
--> caused by: [FATAL] Error: [from=f1d3nehuc4u3l5mn7hazppnogf3oe6l6ymaicbkhi, to=f010001, seq=1, m=1, h=0]: missing state: baeaikaia, Stacktrace:
   0: std::backtrace_rs::backtrace::libunwind::trace
             at /rustc/cb121987158d69bb894ba1bcc21dc45d1e0a488f/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
   1: std::backtrace_rs::backtrace::trace_unsynchronized
             at /rustc/cb121987158d69bb894ba1bcc21dc45d1e0a488f/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2: std::backtrace::Backtrace::create
             at /rustc/cb121987158d69bb894ba1bcc21dc45d1e0a488f/library/std/src/backtrace.rs:328:13
   3: std::backtrace::Backtrace::capture
             at /rustc/cb121987158d69bb894ba1bcc21dc45d1e0a488f/library/std/src/backtrace.rs:296:9
   4: anyhow::error::<impl anyhow::Error>::msg
             at /Users/raul/.cargo/registry/src/github.com-1ecc6299db9ec823/anyhow-1.0.56/src/error.rs:81:36
   5: <fvm::kernel::default::DefaultKernel<C> as fvm::kernel::BlockOps>::block_open::{{closure}}
             at /Users/raul/W/pl/fvm/fvm/src/kernel/default.rs:304:28
   6: core::option::Option<T>::ok_or_else
             at /rustc/cb121987158d69bb894ba1bcc21dc45d1e0a488f/library/core/src/option.rs:1067:25
   7: <fvm::kernel::default::DefaultKernel<C> as fvm::kernel::BlockOps>::block_open
             at /Users/raul/W/pl/fvm/fvm/src/kernel/default.rs:299:20
   8: fvm::syscalls::ipld::open
             at /Users/raul/W/pl/fvm/fvm/src/syscalls/ipld.rs:10:22
   9: core::ops::function::Fn::call
             at /rustc/cb121987158d69bb894ba1bcc21dc45d1e0a488f/library/core/src/ops/function.rs:77:5
  10: <wasmtime::linker::Linker<fvm::syscalls::InvocationData<K>> as fvm::syscalls::bind::BindSyscall<(A,),Ret,Func>>::bind::{{closure}}
             at /Users/raul/W/pl/fvm/fvm/src/syscalls/bind.rs:171:44
  ...

@raulk raulk force-pushed the raulk/fatal-error-consensus-outcome branch from 4ba2efb to c2bf65f Compare May 11, 2022 16:52
@raulk raulk changed the title transmute fatal errors into SYS_INTERNAL_ERROR exit code. transmute fatal errors into SYS_ASSERTION_FAILED exit code. May 11, 2022
@raulk raulk force-pushed the raulk/fatal-error-consensus-outcome branch from aa0efaa to 168dd9c Compare May 11, 2022 19:17
@raulk raulk force-pushed the raulk/fatal-error-consensus-outcome branch from b1f8125 to fcfc70f Compare May 11, 2022 20:06
@raulk raulk marked this pull request as ready for review May 11, 2022 20:07
@raulk raulk changed the title transmute fatal errors into SYS_ASSERTION_FAILED exit code. transmute fatal errors into SYS_ASSERTION_FAILED exit code. May 11, 2022
@raulk raulk mentioned this pull request May 11, 2022
48 tasks
Copy link
Member

@Stebalien Stebalien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One nit, but feel free to ignore it.

/// every time an actor returns. That's why `begin()` resets any currently
/// accumulated state, as once an error occurs, we want to track its
/// propagation all the way up.
pub fn begin(&mut self, cause: Cause) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So much nicer.

Comment on lines 114 to 130
pub struct FatalCause {
/// The alternate-formatted message from the anyhow error.
pub error_msg: String,
/// The backtrace, captured if the relevant
/// [environment variables](https://doc.rust-lang.org/std/backtrace/index.html#environment-variables) are enabled.
pub backtrace: String,
}

/// The ultimate "cause" of a failed message.
#[derive(Clone, Debug)]
pub enum Cause {
/// The original cause was a syscall error.
Syscall(SyscallCause),
/// The original cause was a fatal error.
Fatal(FatalCause),
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can collapse these into a single enum.

/// The ultimate "cause" of a failed message.
#[derive(Clone, Debug)]
pub enum Cause {
    /// The original cause was a syscall error.
    Syscall {
        module: &'static str,
        /// The syscall function name.
        function: &'static str,
        /// The exact syscall error.
        error: ErrorNumber,
        /// The informational syscall message.
        message: String,
    },
    /// The original cause was a fatal error.
    Fatal {
        /// The alternate-formatted message from the anyhow error.
        error_msg: String,
        /// The backtrace, captured if the relevant
        /// [environment variables](https://doc.rust-lang.org/std/backtrace/index.html#environment-variables) are enabled.
        backtrace: String,
    },
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had inline structs in a previous version, but promoted them to types so I could annotate them with derive macros. I think I don't need those now, so I can demote them again.

@raulk raulk enabled auto-merge (squash) May 11, 2022 22:50
@raulk raulk merged commit 8d4d002 into master May 11, 2022
@raulk raulk deleted the raulk/fatal-error-consensus-outcome branch May 11, 2022 23:03
Copy link
Contributor

@arajasek arajasek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

produce a consensus outcome for fatal errors
3 participants