fix: Improve DB deadlock detection/logging #5137
Conversation
Force-pushed from a8b7e96 to c21cf45
Good idea @jbencin. I just had one request for a little simplification.
👍
Force-pushed from cb78652 to 0bec5b2
👍
There was a version of
Force-pushed from 032def0 to dbfd7ea
Force-pushed from dbfd7ea to 1bf3417
Rebased and fixed a conflict, please re-approve
Putting this as a NACK for now until we can get the below addressed.

Per the sprint call today (9 Sep), there are basically two ways under consideration for discovering deadlocks:

- Crashing the node and printing a caller thread stack trace if the caller of `tx_begin_immediate()` fails to begin a transaction within a specified amount of time.
- Having `tx_begin_immediate()` return an error after a specified amount of time.

I don't think either approach is desirable, because neither tells us what we really want to know: which thread is holding the transaction at the time the caller tries (and fails) to run `tx_begin_immediate()`? To answer that, we'd need to somehow get the lock-holder to print a stack trace.

The approach I'd recommend is trying to find a way to crash the node and print all threads' stack traces. I don't think returning an error is the right thing here, because the caller may livelock instead -- it could continuously try and fail to begin the transaction, which makes finding the root cause from the logs harder (since the livelocked thread would keep the node running for longer than when the deadlock condition arose).
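For concreteness, here is a minimal sketch of what the "crash and print a backtrace" option could look like for the calling thread, using rusqlite's busy-handler hook (editorial, not code from this PR; the retry limit, sleep interval, and the decision to abort are all assumptions):

```rust
use rusqlite::Connection;

// Sketch only: give up after a bounded wait, print the calling thread's backtrace,
// and abort the process. Note that this still only shows the waiter's stack, not
// the lock holder's, which is the limitation discussed above.
fn crash_and_dump_busy_handler(run_count: i32) -> bool {
    const MAX_RETRIES: i32 = 300; // assumption: roughly 30s at 100ms per retry
    if run_count > MAX_RETRIES {
        eprintln!(
            "Suspected deadlock after {} retries; caller backtrace:\n{}",
            run_count,
            std::backtrace::Backtrace::force_capture()
        );
        std::process::abort();
    }
    std::thread::sleep(std::time::Duration::from_millis(100));
    true // tell SQLite to keep retrying
}

// Registration would look roughly like this:
fn install_handler(conn: &Connection) -> rusqlite::Result<()> {
    conn.busy_handler(Some(crash_and_dump_busy_handler))
}
```

Capturing all threads' backtraces, as recommended above, needs more than this: the standard library can only capture the current thread's backtrace, so something like a shared backtrace registry or post-mortem inspection of the aborted process would still be required.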
It almost certainly would with this PR. The first process holding the lock for an abnormally long period of time is almost certainly doing so because it is trying to acquire another lock. That's how deadlocks generally occur (it's also possible that it got stuck in an infinite loop). So both processes should be trying and failing to acquire locks, and so both should print a stack trace to the warn/error logs.
As mentioned in the call, it's entirely possible that the thread acquires one lock and then spins while trying to acquire the second lock (or just spins in general, without regard to any other locks), which would not result in its stack trace being printed.
I see what you mean now: if you don't set the busy handler, rusqlite defaults to returning an error. How about I reduce the scope of this PR to just upgrading the logging to warn/error levels? Also, let me know what you think about using a re-entrant mutex as described in #5112.
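As a quick illustration of that default behavior (an editorial sketch, not project code; the database path is made up), here are two rusqlite connections to the same file where neither configures a busy handler or timeout, so the second `BEGIN IMMEDIATE` should fail with a busy error right away instead of blocking:

```rust
use rusqlite::{Connection, TransactionBehavior};

fn main() -> rusqlite::Result<()> {
    // Two connections to the same on-disk database (illustrative path).
    let mut a = Connection::open("/tmp/busy-demo.sqlite")?;
    let mut b = Connection::open("/tmp/busy-demo.sqlite")?;

    // Connection `a` takes the write lock with BEGIN IMMEDIATE and holds it...
    let _tx_a = a.transaction_with_behavior(TransactionBehavior::Immediate)?;

    // ...so with no busy handler or busy timeout set, `b` gets an error immediately.
    match b.transaction_with_behavior(TransactionBehavior::Immediate) {
        Ok(_) => println!("unexpectedly acquired a second write transaction"),
        Err(e) => println!("second BEGIN IMMEDIATE failed as expected: {e}"),
    }
    Ok(())
}
```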
Gah, it appears Github ate my original comment. I don't know if it hit your inboxes or not. Lemme try again.

There are several places in the code where multiple transactions have to be initiated in order to carry out a task. For example, processing a Stacks block requires holding open a transaction to both the Stacks chain state and the sortition DB. The problem is that different threads may acquire transactions in conflicting orders, leading to livelocks and/or deadlocks, depending on how a failure to acquire one of the requisite locks is handled. Basically, the way I'd like to get this fixed involves two tasks:

We need to provide a singleton "database manager" that can ensure that whenever a thread acquires a particular set of transactions, it does so in the right order, and does so atomically (so all transactions begin, or all abort). The database manager system would log a backtrace of each caller whenever it tries to begin a set of transactions, so if it gets stuck, we can figure out who is blocking it.

How do we build this in a way that minimizes the refactoring burden? I think we'd treat the singleton database manager state as a global within the code, and provide a static API for initiating these transaction sequences and recording (to the global) the callers' backtraces. Specifically, we'd need the following:

```rust
pub enum TransactionSequence { /* ... */ }
```

This would be the set of valid transaction sequences, which we convince ourselves cannot conflict with one another. Fortunately, there aren't that many of them.

```rust
pub struct TransactionSet {
    pub(crate) txs: HashMap<TransactionSequence, DBTx>,
    /* ... */
}
```

This would represent a set of open transactions. It would have `commit()` and `rollback()` methods:

```rust
impl TransactionSet {
    pub(crate) fn commit(self) -> Result<(), (Self, Error)> {
        /* ... */
    }

    pub(crate) fn rollback(self) -> Result<(), (Self, Error)> {
        /* ... */
    }
}
```

The `commit()` and `rollback()` methods would commit or roll back all of the transactions in the set.

```rust
pub mod DBManager {
    /* This is a sqlite DB connection; see below */
    static State: std::sync::Mutex<Option<DBConn>> = std::sync::Mutex::new(None);

    pub fn tx_begin_multi(sequence: TransactionSequence) -> Result<TransactionSet, Error> {
        /* ... */
    }
}
```

The `DBManager` module would hold the global state, and `tx_begin_multi()` would be the single entry point for beginning a transaction sequence. We would remove all direct calls to `tx_begin_immediate()`. I've opted to use a Sqlite DB to implement the manager's state.

This approach addresses @obycode's original intent by capturing transaction acquisition orders within the type system, and it helps us find regressions and deadlocks more readily than only logging failures to open transactions (since more often than not, we also want to know which process is preventing the caller from opening them).
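To illustrate how a call site might read under this proposal (purely hypothetical: `process_block` and the `StacksBlockProcessing` variant are made-up names, and the types are only the sketches above):

```rust
// Hypothetical call site for the proposed API; none of this exists in the codebase today.
fn process_block() -> Result<(), Error> {
    // Begin every transaction in this sequence atomically and in the canonical order.
    // The manager would also record this caller's backtrace, so a stuck sequence can
    // be traced back to whoever is blocking it.
    let txs = DBManager::tx_begin_multi(TransactionSequence::StacksBlockProcessing)?;

    /* ... do work against the open transactions in `txs` ... */

    // Either every transaction commits, or we get the set back along with the error.
    txs.commit().map_err(|(_txs, err)| err)
}
```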
If, for the moment, we're just interested in improving the debugging process here, we could do something like the following:

```rust
pub fn tx_begin_immediate_sqlite<'a>(conn: &'a mut Connection) -> Result<DBTx<'a>, sqlite_error> {
    conn.busy_handler(Some(tx_busy_handler))?;
    let tx = Transaction::new(conn, TransactionBehavior::Immediate)?;
    let mut locks_table = LOCKS_TABLE.lock().unwrap();
    // the debug format for `Connection` includes the path, so deref the Transaction to a Connection
    locks_table.insert(format!("{:?}", tx.deref()), format!("{:?}", std::thread::current().name()));
    Ok(tx)
}
```

And then when the busy handler reaches enough wait time, dump the stack and the locks table. That would at least tell us what thread last held the lock in question. This is kind of hacky (it doesn't clear the entries in the table when they close their transactions, and it relies on debug formats), but it would require very little refactoring (no refactoring, really: the only lines changed are the ones above, plus an initializer for the static table), and it should make the debugging process much easier.
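For completeness, the static table and the dump step this comment alludes to might look something like the following (editorial sketch; the use of `lazy_static` and the exact dump format are assumptions, not part of the suggestion above):

```rust
use std::collections::HashMap;
use std::sync::Mutex;

use lazy_static::lazy_static;

lazy_static! {
    // Maps the debug representation of a Connection (which includes its path) to the
    // name of the thread that most recently opened a transaction on it.
    static ref LOCKS_TABLE: Mutex<HashMap<String, String>> = Mutex::new(HashMap::new());
}

// Hypothetical helper the busy handler could call once it has waited "long enough"
// (the threshold and the plain eprintln! logging are assumptions).
fn dump_lock_holders() {
    let locks_table = LOCKS_TABLE.lock().unwrap();
    for (db, thread) in locks_table.iter() {
        eprintln!("lock on {db} last taken by thread {thread}");
    }
    eprintln!(
        "caller backtrace:\n{}",
        std::backtrace::Backtrace::force_capture()
    );
}
```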
@kantai I'm not sure how useful this is if the entries are never cleared from the hash table, but that can be easily addressed with a couple of changes:
While it would make the data structure somewhat clearer if entries are cleared, the most recent holder of a DB's lock is sufficient to debug a deadlock. If Thread A is waiting on DB 1's lock, presumably whichever thread was the last to obtain DB 1's lock is still holding it (otherwise, Thread A would be able to get the lock).

Threads can hold multiple transactions (indeed, that's the problem), so we'd want the table to map from DBs to threads.

While that could be a relatively straightforward refactor, I don't think it's necessary for the reasoning above: the last holder of each lock is really all the information we would need to debug deadlocks. This refactor isn't just a matter of implementing drop() for DBTx: DBTx isn't a struct, it's just a type alias (for a foreign type), so this would actually require creating a new struct, probably implementing Deref, etc. to simplify the refactor, and then making sure that everywhere in the codebase that currently expects a `DBTx` is updated accordingly.
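A rough sketch of the wrapper-struct refactor being described (hypothetical; `TrackedTx` is a made-up name, and it reuses the `LOCKS_TABLE` idea from the earlier comment):

```rust
use std::ops::{Deref, DerefMut};

use rusqlite::Transaction;

// Hypothetical newtype around rusqlite's Transaction that removes its locks-table
// entry when it goes out of scope. DBTx is currently just a type alias, so a struct
// like this does not exist in the codebase today.
pub struct TrackedTx<'a> {
    tx: Transaction<'a>,
    key: String, // the key inserted into LOCKS_TABLE when this transaction was opened
}

impl<'a> Deref for TrackedTx<'a> {
    type Target = Transaction<'a>;
    fn deref(&self) -> &Self::Target {
        &self.tx
    }
}

impl<'a> DerefMut for TrackedTx<'a> {
    fn deref_mut(&mut self) -> &mut Self::Target {
        &mut self.tx
    }
}

impl<'a> Drop for TrackedTx<'a> {
    fn drop(&mut self) {
        // Clear this transaction's entry once it is committed, rolled back, or dropped.
        if let Ok(mut table) = LOCKS_TABLE.lock() {
            table.remove(&self.key);
        }
    }
}
```

One wrinkle this glosses over: `Transaction::commit()` consumes the transaction, and a field cannot be moved out of a type that implements `Drop`, so the inner transaction would need to live in an `Option` (or `ManuallyDrop`), which is part of why this is more than a one-line change.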
@jcnelson I appreciate the detailed reply. I like this idea, but I think it would take a while to implement and test adequately, so I'll try to get some simple logging changes merged in the next couple of days, so we can be sure to have something for the Nakamoto release. A couple of questions/comments on this proposal:
Force-pushed from 1bf3417 to 502d039
Force-pushed from 502d039 to 021ab58
Force-pushed from 32a6ccf to 292cd89
This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Description
This PR makes two changes to `tx_busy_handler()`, which is passed as a callback to rusqlite and called if we wait too long for a DB lock:

- Messages logged in `tx_busy_handler()` were debug level; now we log warn/error level messages if we are waiting too long
- An error is generated when a likely deadlock is detected

Note that this PR doesn't make deadlocks less likely to occur; it just makes them easier to find, and it generates an error when they are detected, which may be recoverable or may crash the application (depending on how the caller handles the error).
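A minimal sketch of the kind of escalation described above (editorial; the thresholds, sleep interval, and the plain `eprintln!` stand-ins for the project's log macros are assumptions, not this PR's exact code):

```rust
// Sketch: approximate the wait time from the number of handler invocations (no
// Instant::now(); see "Additional info" below) and escalate from silence to warn
// to error, finally handing a busy error back to the caller.
fn tx_busy_handler(run_count: i32) -> bool {
    let approx_wait_ms = (run_count as u64) * 100;

    if approx_wait_ms > 30_000 {
        eprintln!("ERROR: still waiting for a DB lock after ~{approx_wait_ms} ms; possible deadlock");
        // Returning false makes SQLite give up, so the caller's tx_begin_immediate()
        // receives an error it can handle, propagate, or crash on.
        return false;
    }
    if approx_wait_ms > 5_000 {
        eprintln!("WARN: waiting a long time for a DB lock (~{approx_wait_ms} ms)");
    }

    std::thread::sleep(std::time::Duration::from_millis(100));
    true // keep retrying
}
```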
Applicable issues
Additional info (benefits, drawbacks, caveats)
I could have added precise timing using `Instant::now()`, but that would require wrapping `tx_busy_handler()` in a closure, which creates lifetime issues because it's a callback. This could be addressed by adding the `Box`ed callback to the `DBTx` type, but this would require changing a lot of code, which I didn't think was worth it.

Checklist
- Required documentation changes (e.g., `docs/rpc/openapi.yaml` and `rpc-endpoints.md` for v2 endpoints, `event-dispatcher.md` for new events)
- New clarity functions have a corresponding PR in the `clarity-benchmarking` repo
- New integration test(s) added to `bitcoin-tests.yml`