-
-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rt: do not trace tasks while locking OwnedTasks
#6036
Conversation
If a polled tasks completes while `OwnedTasks` is locked, it will not be able to remove itself from `OwnedTasks`, resulting in a deadlock. Fixes tokio-rs#6035.
tokio/src/runtime/task/trace/mod.rs
Outdated
task.as_raw().state().transition_to_notified_for_tracing(); | ||
// store the raw tasks into a vec | ||
tasks.push(task.as_raw()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be taking a refcount to the task instead of using as_raw
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the API for that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can add this:
impl<S: 'static> Clone for Task<S> {
fn clone(&self) -> Task<S> {
self.raw.ref_inc();
Task::new(self.raw)
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh. I realized that transition_to_notified_for_tracing
actually performs a refcount increment. Because of that, you don't need the additional refcount increment. However, I would probably prefer to see this refcount management a bit more wrapped up in a less error-prone api.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm. Unfortunately, now I'm quite confused by how refcount management is supposed to work. I removed only the increment in transition_to_notified_for_tracing
, but doing so causes the tests to SIGABRT:
free(): invalid pointer
What's the relationship between notifying a task, polling a task, and its refcount?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The behavior of each function is documented in harness.rs
. There's also a bunch of documentation at the top of mod.rs
.
Basically, each Task
(or Notified
) object owns a refcount. Various methods will consume the refcount. We want the refcount that transition_to_notified_for_tracing
creates to be owned by a Task
somehow, rather than implicitly via a RawTask
. This is because it generally makes code much more clear, and it also makes the code more robust towards error paths due to RAII cleanup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, thanks! So, to summarize, the crash occurred because:
- cloning
Task
incremented the refcount - polling the task decremented the refcount
- dropping
Task
decremented the refcount
...which is one too many decrements. transition_to_notified_for_tracing
needs to increment the refcount (much like transition_to_notified_by_ref
), because polling is going to consume that count.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Edit: I confused myself...
2849fb3
to
c0d5cf7
Compare
c0d5cf7
to
2849fb3
Compare
tokio/src/runtime/task/trace/mod.rs
Outdated
// set the notified bit | ||
task.as_raw().state().transition_to_notified_for_tracing(); | ||
// store the tasks into a vec | ||
tasks.push(task.clone()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, so, I've figured out how I think we should handle this refcount business:
- First, we wrap
transition_to_notified_for_tracing
with a method onTask
that returns aNotified
. ThisNotified
will own the refcount that we created inside oftransition_to_notified_for_tracing
. - Next, when you want to poll the task, instead of calling
poll
directly, you instead useOwnedTasks::assert_owner
to convert theNotified
into aLocalNotified
. - To poll the task, you call
LocalNotified::run
.
This way, we only increment the refcount once (so there's no call to Task::clone
), and all refcounts we create are owned by Notified
object, so their destructor will clear it if it gets dropped before it is polled, e.g. due to a panic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
tokio/src/runtime/task/mod.rs
Outdated
cfg_taskdump! { | ||
pub(super) fn notify_for_tracing(&self) -> Notified<S> { | ||
self.as_raw().state().transition_to_notified_for_tracing(); | ||
Notified(self.clone()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since transition_to_notified_for_tracing
already increments the refcount, you do not also need to clone it. You probably have a memory leak right now.
Notified(self.clone()) | |
Notified(self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there's a leak here. Remember, when we're calling notify_for_tracing
, we're iterating over OwnedTasks
, and need to get notified copies of those tasks into a separate vector. A clone has to occur somewhere. Either we can do it here, or we can modified this method to consume self
and do the clone in trace_helper
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can tell, you increment twice (in transition_to_notified_for_tracing
and clone
), but only decrement once (in run
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yep. I didn't catch that run mem::forget
s the LocalNotified
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of removing the ref_inc
from transition_to_notified_for_tracing
, it makes more sense to me to remove the call to clone
. This way, you only touch the atomic once instead of twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 4264f61, but note that there's now a need for cloning in trace_owned
, because we have to stash the Task
s (or Notified
)s outside of OwnedTasks
.
`notify_for_tracing` already increments refcount with its `clone`, and `LocalNotified::run` forgets (not drops) the underlying task.
tokio/src/runtime/task/mod.rs
Outdated
@@ -361,6 +361,13 @@ impl<S: 'static> Task<S> { | |||
fn header_ptr(&self) -> NonNull<Header> { | |||
self.raw.header_ptr() | |||
} | |||
|
|||
cfg_taskdump! { | |||
pub(super) fn notify_for_tracing(self) -> Notified<S> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should be able to fix that like this:
pub(super) fn notify_for_tracing(self) -> Notified<S> { | |
pub(super) fn notify_for_tracing(&self) -> Notified<S> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a compile error. Notified
wraps a Task
, not a &Task
.
error[E0308]: mismatched types
--> tokio/src/runtime/task/mod.rs:368:22
|
368 | Notified(self)
| -------- ^^^^ expected `Task<S>`, found `&Task<S>`
| |
| arguments to this struct are incorrect
|
= note: expected struct `runtime::task::Task<S>`
found reference `&runtime::task::Task<S>`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you created a new refcount, you can create a new Task
using the raw
field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, thanks! Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one nit remaining.
If a polled tasks completes while
OwnedTasks
is locked, it will not be able to remove itself fromOwnedTasks
, resulting in a deadlock.Fixes #6035.