`jl_record_backtrace` may miss tasks? #56046

d-netto · 2024-10-08T14:29:34Z

We've been using jl_record_backtrace extensively in production to print backtraces when one of our servers hits a state of degraded performance or deadlock.

We're afraid, however, that there could be a ABA-like bug in jl_record_backtrace which may lead us to miss a task for which we're attempting to record the backtrace.

Here is the code for reference:

JL_DLLEXPORT size_t jl_record_backtrace(jl_task_t *t, jl_bt_element_t *bt_data, size_t max_bt_size) JL_NOTSAFEPOINT
{
    jl_task_t *ct = jl_current_task;
    jl_ptls_t ptls = ct->ptls;
    if (t == ct) {
        return rec_backtrace(bt_data, max_bt_size, 0);
    }
    bt_context_t *context = NULL;
    bt_context_t c;
    int16_t old = -1;
    while (!jl_atomic_cmpswap(&t->tid, &old, ptls->tid) && old != ptls->tid) {
        int lockret = jl_lock_stackwalk();
        // if this task is already running somewhere, we need to stop the thread it is running on and query its state
        if (!jl_thread_suspend_and_get_state(old, 1, &c)) {
            jl_unlock_stackwalk(lockret);
            if (jl_atomic_load_relaxed(&t->tid) != old)
                continue;
            return 0;
        }
        jl_unlock_stackwalk(lockret);
        if (jl_atomic_load_relaxed(&t->tid) == old) {
            jl_ptls_t ptls2 = jl_atomic_load_relaxed(&jl_all_tls_states)[old];
            if (ptls2->previous_task == t || // we might print the wrong stack here, since we can't know whether we executed the swapcontext yet or not, but it at least avoids trying to access the state inside uc_mcontext which might not be set yet
                (ptls2->previous_task == NULL && jl_atomic_load_relaxed(&ptls2->current_task) == t)) { // this case should be always accurate
                // use the thread context for the unwind state
                context = &c;
            }
            break;
        }
        // got the wrong thread stopped, try again
        jl_thread_resume(old);
    }
    if (context == NULL && (!t->ctx.copy_stack && t->ctx.started && t->ctx.ctx != NULL)) {
        // need to read the context from the task stored state
        jl_jmp_buf *mctx = &t->ctx.ctx->uc_mcontext;
#if defined(_OS_WINDOWS_)
        memset(&c, 0, sizeof(c));
        if (jl_simulate_longjmp(*mctx, &c))
            context = &c;
#elif defined(JL_HAVE_UNW_CONTEXT)
        context = t->ctx.ctx;
#elif defined(JL_HAVE_UCONTEXT)
        context = jl_to_bt_context(t->ctx.ctx);
#elif defined(JL_HAVE_ASM)
        memset(&c, 0, sizeof(c));
        if (jl_simulate_longjmp(*mctx, &c))
            context = &c;
#else
     #pragma message("jl_record_backtrace not defined for unknown task system")
#endif
    }
    size_t bt_size = 0;
    if (context)
        bt_size = rec_backtrace_ctx(bt_data, max_bt_size, context, t->gcstack);
    if (old == -1)
        jl_atomic_store_relaxed(&t->tid, old);
    else if (old != ptls->tid)
        jl_thread_resume(old);
    return bt_size;
}

The case which I think may be pathological is the following:

t is initially scheduled on thread 1. The compare-and-swap at the while loop will fail and old will get a value of 1.
We will stop thread 1 in jl_thread_suspend_and_get_state, but let's say that task t was faster than us and got rescheduled in thread 2.
The check t->tid == 1 fails, and we will resume thread 1.
Task t is again faster than us, and very quickly got migrated from thread 2 to thread 1.
At the time we hit the top of the while loop again, we will have t->tid == 1, and old == 1. The compare-and-swap succeeds even though this task is still running in a thread, which will cause us to leave early from the while loop.

Is the case I'm describing here indeed pathological? Are there any invariants that I could be missing that will make this scenario impossible to occur?

Thanks in advance.

CC: @vtjnash.

The text was updated successfully, but these errors were encountered:

vtjnash · 2024-10-08T14:41:13Z

Yes, good catch, there appears to be a missing re-assignment of old = -1; at the end of that loop which means in the ABA case, we accidentally actually acquire the lock on the thread despite not actually having stopped the thread; or in the counter-case, we try to run through this logic with old==-1 on the next iteration, and that isn't valid either (jl_thread_suspend_and_get_state should return failurea and the loop with abort too early)

There was a missing re-assignment of old = -1; at the end of that loop which means in the ABA case, we accidentally actually acquire the lock on the thread despite not actually having stopped the thread; or in the counter-case, we try to run through this logic with old==-1 on the next iteration, and that isn't valid either (jl_thread_suspend_and_get_state should return failure and the loop will abort too early). Fix #56046

vtjnash mentioned this issue Oct 8, 2024

stackwalk: fix jl_thread_suspend_and_get_state race #56047

Merged

vtjnash closed this as completed in #56047 Oct 10, 2024

vtjnash closed this as completed in 224ff57 Oct 10, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`jl_record_backtrace` may miss tasks? #56046

`jl_record_backtrace` may miss tasks? #56046

d-netto commented Oct 8, 2024 •

edited

Loading

vtjnash commented Oct 8, 2024

jl_record_backtrace may miss tasks? #56046

jl_record_backtrace may miss tasks? #56046

Comments

d-netto commented Oct 8, 2024 • edited Loading

vtjnash commented Oct 8, 2024

`jl_record_backtrace` may miss tasks? #56046

`jl_record_backtrace` may miss tasks? #56046

d-netto commented Oct 8, 2024 •

edited

Loading