You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{{ message }}
This repository has been archived by the owner on Jun 3, 2020. It is now read-only.
The Secret Connection implementation in tendermint-rs presently has no notion of timeouts for things like connecting/"dialing" validators or reading from/writing to sockets.
Without timeouts, there are many cases in which the KMS could potentially deadlock indefinitely inside of blocking I/O operations, and at least one user has experienced this.
Here are a few recommendations:
futures/tokio: this is the up-and-coming ecosystem solution to this general problem, and the one I'd recommend. It's a full asynchronous event model which solves, among other things, timeouts. When async/await support ships in Rust 1.38 (scheduled to be released in August), migrating from blocking I/O should be fairly straightforward.
libc crate + poll(2) system call: if we wanted to stick with blocking I/O, the poll(2) system call, as invoked through the libc crate, can be used to determine I/O readiness prior to performing a blocking I/O call, and also takes a timeout as an argument. This would probably be the lowest impact way to implement timeouts as it wouldn't involve switching away from blocking I/O.
Watchdog: the existing threads run in a loop, which can periodically send a heartbeat (via e.g. a channel) to a watchdog thread. The watchdog can kill and restart threads which appear to be deadlocked. I think this might be a good idea in general, but it seems like the nuclear option and it would probably be good only as a last resort, with one of the more graceful approaches above used as the first line of defense. I'm a bit unclear on what the Rust semantics are when e.g. a thread dies while holding a mutex - it may cause a PoisonError which would corrupt the state of the whole program and cause it to crash.
The text was updated successfully, but these errors were encountered:
The Secret Connection implementation in tendermint-rs presently has no notion of timeouts for things like connecting/"dialing" validators or reading from/writing to sockets.
Without timeouts, there are many cases in which the KMS could potentially deadlock indefinitely inside of blocking I/O operations, and at least one user has experienced this.
Here are a few recommendations:
futures
/tokio
: this is the up-and-coming ecosystem solution to this general problem, and the one I'd recommend. It's a full asynchronous event model which solves, among other things, timeouts. Whenasync
/await
support ships in Rust 1.38 (scheduled to be released in August), migrating from blocking I/O should be fairly straightforward.libc
crate +poll(2)
system call: if we wanted to stick with blocking I/O, thepoll(2)
system call, as invoked through thelibc
crate, can be used to determine I/O readiness prior to performing a blocking I/O call, and also takes a timeout as an argument. This would probably be the lowest impact way to implement timeouts as it wouldn't involve switching away from blocking I/O.PoisonError
which would corrupt the state of the whole program and cause it to crash.The text was updated successfully, but these errors were encountered: