Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tracing, core: fix deadlock in register_callsite #2020

Closed
wants to merge 6 commits into from

Conversation

hawkw
Copy link
Member

@hawkw hawkw commented Mar 24, 2022

Motivation

Currently, there is a potential deadlock in register_callsite that can
occur if registering a callsite causes another callsite to be hit for
the first time. If this occurs, the thread will try to lock the callsite
registry while it is already holding the lock, resulting in a deadlock.

This is unfortunate, since it can occur if a subscriber's
register_callsite calls any function that contains a tracing event, if
that event has not previously been registered. This has occurred in e.g.
tokio-console due to using a tokio channel in the subscriber.

Solution

This commit fixes the deadlock by using the callsite::try_register
function added to tracing-core, which uses a thread-local to bail if
the current thread is already registering a callsite. In order to do
this, we replace the std::sync::Once in MacroCallsite with a
hand-written atomic once-like loop. This is because, while we only want
to register the callsite once, we may have to try to register it more
than one time before try_register returns true. Now, we set a flag
only if the callsite has been successfully registered.

Signed-off-by: Eliza Weisman [email protected]

Signed-off-by: Eliza Weisman <[email protected]>
## Motivation

Currently, the `callsite::register` function will unconditionally
attempt to lock the global callsite registry. This is unfortunate,
because it can result in a deadlock if a subscriber's
`register_callsite` method calls into code that itself contains
`tracing` instrumentation with callsites that haven't yet been
registered.

## Solution

This commit adds a new `callsite::try_register` function that *attempts*
to register a callsite, returning `true` if it was successfully
registered. If the current thread is registering a callsite (or, on
`no_std`, if a callsite is being registered at all), it will return
`false`, indicating that the registration must be retried.

Signed-off-by: Eliza Weisman <[email protected]>
## Motivation

Currently, there is a potential deadlock in `register_callsite` that can
occur if registering a callsite causes _another_ callsite to be hit for
the first time. If this occurs, the thread will try to lock the callsite
registry while it is already holding the lock, resulting in a deadlock.

This is unfortunate, since it can occur if a subscriber's
`register_callsite` calls any function that contains a tracing event, if
that event has not previously been registered. This has occurred in e.g.
`tokio-console` due to using a tokio channel in the subscriber.

## Solution

This commit fixes the deadlock by using the `callsite::try_register`
function added to `tracing-core`, which uses a thread-local to bail if
the *current* thread is already registering a callsite. In order to do
this, we replace the `std::sync::Once` in `MacroCallsite` with a
hand-written atomic once-like loop. This is because, while we only want
to register the callsite once, we may have to *try* to register it more
than one time before `try_register` returns true. Now, we set a flag
only if the callsite has been successfully registered.

Signed-off-by: Eliza Weisman <[email protected]>
@hawkw hawkw requested a review from a team as a code owner March 24, 2022 20:08
jswrenn added a commit to jswrenn/tracing-allocations that referenced this pull request Mar 24, 2022
hawkw added a commit that referenced this pull request Apr 22, 2022
…1.x (#2083)

## Motivation

Currently on `v0.1.x`, the global callsite registry is implemented as a
`Mutex<Vec<&'static dyn Callsite>>`. This has a few downsides:

* Every time a callsite is registered, the `Mutex` must be locked. This
  can cause a deadlock when a `register_callsite` implementation calls
  into code that _also_ registers a callsite. See #2020 for details.

* Registering callsites is relatively slow and forces the program to
  synchronize on the callsite registry (and not on *individual*
  callsites). This means that if two threads are both registering
  totally different callsites, both threads must wait for the lock.
  Although this overhead is amortized over the entire rest of the
  program, it does have an impact in short-running applications where
  any given callsite may only be hit once or twice.

* Memory allocation may occur while the registry is locked. This makes
  the use of `tracing` inside of memory allocators difficult or
  impossible.

On the `master` branch (v0.2.x), PR #988 rewrote the callsite registry
to use an intrusive atomic singly-linked list of `Callsite`s. This
removed the need for locking the callsite registry, and made it possible
to register callsites without ever allocating memory. Because the
callsite registry on v0.2 will *never* allocate, this also makes it
possible to compile `tracing-core` for `no_std` platforms without
requiring `liballoc`. Unfortunately, however, we cannot use an identical
implementation on v0.1.x, because using the intrusive linked-list
registry for *all* callsites requires a breaking change to the
`Callsite` trait (in order to add a way to get the callsite's
linked-list node), which we cannot make on v0.1.x.

## Solution

This branch adds a linked-list callsite registry for v0.1.x in a way
that allows us to benefit from *some* of the advantages of this approach
in a majority of cases. The trick is introducing a new `DefaultCallsite`
type in `tracing-core` that implements the `Callsite` trait. This type
can contain an intrusive list node, and *when a callsite is a
`DefaultCallsite`*, we can register it without having to push it to the
`Mutex<Vec<...>>`. The locked vec still _exists_, for `Callsite`s that
are *not* `DefaultCallsite`s, so we can't remove the `liballoc`
dependency, but in most cases, we can avoid the mutex and allocation.

Naturally, `tracing` has been updated to use the `DefaultCallsite` type
from `tracing-core`, so the `Vec` will only be used in the following
cases:

* User code has a custom implementation of the `Callsite` trait, which
  is [not terribly common][1].
* An older version of the `tracing` crate is in use.

This fixes the deadlock described in #2020 when `DefaultCallsite`s are
used. Additionally, it should reduce the performance impact and memory
usage of the callsite registry.

Furthermore, I've changed the subscriber registry so that a
`RwLock<Vec<dyn Dispatch>>` is only used when there actually are
multiple subscribers in use. When there's only a global default
subscriber, we can once again avoid locking for the registry of
subscribers. When the `std` feature is disabled, thread-local scoped
dispatchers are unavailable, so the single global subscriber will always
be used on `no_std`.

We can make additional changes, such as the ones from #2020, to _also_
resolve potential deadlocks when non-default callsites are in use, but
since this branch rewrites a lot of the callsite registry code, that
should probably be done in a follow-up.

[1]: https://cs.github.com/?scopeName=All+repos&scope=&q=%28%2Fimpl+.*Callsite%2F+AND+language%3Arust%29+NOT+%28path%3A%2Ftracing%2F**+OR+path%3A%2Ftracing-*%2F**+OR+path%3A%2Ftokio-trace*%2F**%29%29

Signed-off-by: Eliza Weisman <[email protected]>
@hawkw
Copy link
Member Author

hawkw commented May 2, 2022

Whoops, this PR was supposed to be closed by #2083, which kind of obsoleted it. We may still want to bring back a version of this change as well, for non-linked-list callsites.

@hawkw hawkw closed this May 2, 2022
kaffarell pushed a commit to kaffarell/tracing that referenced this pull request May 22, 2024
…1.x (tokio-rs#2083)

## Motivation

Currently on `v0.1.x`, the global callsite registry is implemented as a
`Mutex<Vec<&'static dyn Callsite>>`. This has a few downsides:

* Every time a callsite is registered, the `Mutex` must be locked. This
  can cause a deadlock when a `register_callsite` implementation calls
  into code that _also_ registers a callsite. See tokio-rs#2020 for details.

* Registering callsites is relatively slow and forces the program to
  synchronize on the callsite registry (and not on *individual*
  callsites). This means that if two threads are both registering
  totally different callsites, both threads must wait for the lock.
  Although this overhead is amortized over the entire rest of the
  program, it does have an impact in short-running applications where
  any given callsite may only be hit once or twice.

* Memory allocation may occur while the registry is locked. This makes
  the use of `tracing` inside of memory allocators difficult or
  impossible.

On the `master` branch (v0.2.x), PR tokio-rs#988 rewrote the callsite registry
to use an intrusive atomic singly-linked list of `Callsite`s. This
removed the need for locking the callsite registry, and made it possible
to register callsites without ever allocating memory. Because the
callsite registry on v0.2 will *never* allocate, this also makes it
possible to compile `tracing-core` for `no_std` platforms without
requiring `liballoc`. Unfortunately, however, we cannot use an identical
implementation on v0.1.x, because using the intrusive linked-list
registry for *all* callsites requires a breaking change to the
`Callsite` trait (in order to add a way to get the callsite's
linked-list node), which we cannot make on v0.1.x.

## Solution

This branch adds a linked-list callsite registry for v0.1.x in a way
that allows us to benefit from *some* of the advantages of this approach
in a majority of cases. The trick is introducing a new `DefaultCallsite`
type in `tracing-core` that implements the `Callsite` trait. This type
can contain an intrusive list node, and *when a callsite is a
`DefaultCallsite`*, we can register it without having to push it to the
`Mutex<Vec<...>>`. The locked vec still _exists_, for `Callsite`s that
are *not* `DefaultCallsite`s, so we can't remove the `liballoc`
dependency, but in most cases, we can avoid the mutex and allocation.

Naturally, `tracing` has been updated to use the `DefaultCallsite` type
from `tracing-core`, so the `Vec` will only be used in the following
cases:

* User code has a custom implementation of the `Callsite` trait, which
  is [not terribly common][1].
* An older version of the `tracing` crate is in use.

This fixes the deadlock described in tokio-rs#2020 when `DefaultCallsite`s are
used. Additionally, it should reduce the performance impact and memory
usage of the callsite registry.

Furthermore, I've changed the subscriber registry so that a
`RwLock<Vec<dyn Dispatch>>` is only used when there actually are
multiple subscribers in use. When there's only a global default
subscriber, we can once again avoid locking for the registry of
subscribers. When the `std` feature is disabled, thread-local scoped
dispatchers are unavailable, so the single global subscriber will always
be used on `no_std`.

We can make additional changes, such as the ones from tokio-rs#2020, to _also_
resolve potential deadlocks when non-default callsites are in use, but
since this branch rewrites a lot of the callsite registry code, that
should probably be done in a follow-up.

[1]: https://cs.github.com/?scopeName=All+repos&scope=&q=%28%2Fimpl+.*Callsite%2F+AND+language%3Arust%29+NOT+%28path%3A%2Ftracing%2F**+OR+path%3A%2Ftracing-*%2F**+OR+path%3A%2Ftokio-trace*%2F**%29%29

Signed-off-by: Eliza Weisman <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant