Set RUST_BACKTRACE=1 for production services: create a crate and use it #5360

sunshowers · 2024-03-30T03:26:51Z

While debugging an instance of #2416, I saw at gc08's /pool/ext/8a199f12-4f5c-483a-8aca-f97856658a35/crypt/debug/oxz_nexus_65a11c18-7f59-41ac-b9e7-680627f996e7/oxide-nexus:default.log.1711665000:

thread 'tokio-runtime-worker' panicked at nexus/db-queries/src/db/sec_store.rs:65:60:
called `Result::unwrap()` on an `Err` value: InternalError { internal_message: "database error (kind = Unknown): result is ambiguous: error=rpc error: code = Unavailable desc = error reading from server: read tcp [fd00:1122:3344:109::3]:56722->[fd00:1122:3344:105::3]:32221: read: connection reset by peer [exhausted]\n" }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
[ Mar 28 22:01:58 Stopping because all processes in service exited. ]
[ Mar 28 22:01:58 Executing stop method (:kill). ]

In this case the issue is pretty clear, but I'm wondering if we've considered setting RUST_BACKTRACE=1 in our production environment. Having backtraces is something that can definitely aid in debugging, but maybe it isn't a big deal because the core file can show what's going on. (But see #5359.)

According to https://stackoverflow.com/questions/29421727/how-much-overhead-does-rust-backtrace-1-have it seems like there's some performance cost, so we'd have to measure it carefully.

Wonder if @hawkw has thoughts here.

Tasks

Create a small crate to set RUST_BACKTRACE=1 if it isn't set already (and maybe RUST_LIB_BACKTRACE as well)
Use the crate in nexus
Use the crate in sled-agent
Use it in wicketd
Use it elsewhere (add tasks for other services that would benefit)

davepacheco · 2024-04-01T17:28:37Z

I definitely think this is worthwhile. Even in environments where I've had easy access to core files and easy ways to get stuff like a stack trace out, it was still very valuable to have the runtime-printed stack trace available.

jclulow · 2024-04-01T17:31:50Z

Is it possible to build the binary so it defaults to this behaviour, instead of requiring the environment be set?

hawkw · 2024-04-01T18:42:55Z

> Wonder if @hawkw has thoughts here.

I don't have a strong opinion --- IMO, we should have some way of collecting backtraces from production crashes; if we can get this data from the core, it's maybe less pressing, but I would err on the side of including them.

jgallagher · 2024-04-02T15:21:51Z

> Is it possible to build the binary so it defaults to this behaviour, instead of requiring the environment be set?

It doesn't look like it. The default panic hook calls get_backtrace_style, which reads from RUST_BACKTRACE unless set_backtrace_style has been called. set_backtrace_style is unstable (rust-lang/rust#93346), so we can't call that even if we wanted to.

We could set RUST_BACKTRACE from inside the program, presumably as one of the first things we do in main? 😬 Gross, but it works, so I thought I'd mention it.

sunshowers · 2024-04-02T20:11:27Z

I like the idea of setting RUST_BACKTRACE=1 within main. I'd do something like: if RUST_BACKTRACE isn't set, set it to 1.
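A minimal sketch of that idea, for illustration only. The function names `backtrace_default` and `set_backtrace_default` are hypothetical and are not taken from any existing crate. The decision is split into a pure function so it can be checked without touching the process environment.

```rust
use std::env;
use std::ffi::OsString;

/// Hypothetical helper: given the current value of RUST_BACKTRACE (if any),
/// return the value we would set, or None to leave the environment alone.
fn backtrace_default(current: Option<OsString>) -> Option<&'static str> {
    match current {
        // Respect whatever the operator already chose, including "0".
        Some(_) => None,
        // Unset: default to short backtraces.
        None => Some("1"),
    }
}

/// Hypothetical entry point, meant to be called at the very top of `main`,
/// before any other threads exist (mutating the environment while other
/// threads may read it is not safe on most platforms).
#[allow(unused_unsafe)] // `set_var` is only an `unsafe fn` in the 2024 edition
pub fn set_backtrace_default() {
    if let Some(value) = backtrace_default(env::var_os("RUST_BACKTRACE")) {
        unsafe { env::set_var("RUST_BACKTRACE", value) };
    }
}
```

Calling `set_backtrace_default()` as the first statement of `main`, before a tokio runtime is started, would keep an explicit `RUST_BACKTRACE=0` or `RUST_BACKTRACE=full` intact and only fill in the default when nothing is set.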

sunshowers · 2024-04-02T20:14:35Z

(Note to people wondering in the future: setting a panic hook isn't enough. For example, anyhow reads RUST_BACKTRACE.)

sunshowers · 2024-04-02T20:28:08Z

Another consideration here is whether we want anyhow to also capture backtraces. That is controlled with both RUST_BACKTRACE and another env var, RUST_LIB_BACKTRACE. anyhow documents the logic.
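For reference, here is a small model of how the two variables interact, based on my reading of the documented `std::backtrace` behaviour that anyhow relies on. It is a sketch of the decision logic only, not anyhow's or std's actual code, and the function names are made up for this example.

```rust
/// Model of whether a panic prints a backtrace: RUST_BACKTRACE must be
/// set to something other than "0".
fn panic_backtrace_enabled(rust_backtrace: Option<&str>) -> bool {
    matches!(rust_backtrace, Some(v) if v != "0")
}

/// Model of whether library code (for example an error type capturing a
/// `std::backtrace::Backtrace`) records a backtrace: RUST_LIB_BACKTRACE
/// wins if it is set, otherwise RUST_BACKTRACE is consulted.
fn lib_backtrace_enabled(
    rust_lib_backtrace: Option<&str>,
    rust_backtrace: Option<&str>,
) -> bool {
    match rust_lib_backtrace {
        Some(v) => v != "0",
        None => panic_backtrace_enabled(rust_backtrace),
    }
}
```

Under this model, setting only RUST_BACKTRACE=1 turns on backtraces for both panics and errors, while RUST_BACKTRACE=1 together with RUST_LIB_BACKTRACE=0 keeps them for panics only. That is the trade-off to decide on if capturing a backtrace for every error turns out to be too expensive.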

sunshowers · 2024-04-02T21:47:55Z

rust-analyzer uses the same strategy too, so we'd be in good company:

https://github.com/rust-lang/rust-analyzer/blob/c3b8c2a25413e2aa58295d18c12902a624471b74/crates/rust-analyzer/src/bin/main.rs#L105-L107

sunshowers changed the title from "Should we run our production services with RUST_BACKTRACE=1?" to "Set RUST_BACKTRACE=1 for production services: create a crate and use it" on Apr 2, 2024
