[Merged by Bors] - Fix asset_debug_server hang. There should be at most one ThreadExecut… #7825
…or's ticker for one thread.
Are you sure both these changes are necessary? The MainThreadExecutor doesn't get cloned into debug asset server app.
Is this only a problem with
Is it possible to write a test in bevy_tasks that shows that this deadlocks before this PR?
@NiklasEi can you check if this fixes your use of the debug asset server? I want to make sure this fixes that key issue before giving this a more thorough review.
There are mainly two changes:
use bevy_app::App;
use bevy_ecs::prelude::*;

fn run_sub_app(mut sub_app: NonSendMut<DebugApp>) {
    sub_app.app.update();
}

struct DebugApp {
    app: App,
}

fn main() {
    let mut app = bevy_app::App::new();
    let sub_app = bevy_app::App::new();
    app.insert_non_send_resource(DebugApp { app: sub_app });
    app.add_system(run_sub_app);
    app.update();
}
Yes, it is possible to write a test to reproduce the deadlock; I'll give it a try. This bug is hard to reason about, I spent 20+ hours on it :'). The only fact I am sure of is: "if the Ticker gets leaked, the async_executor enters a "troubled" state in which it can't be notified." But this doesn't guarantee a deadlock: if the thread can be unparked by any other means, it is still able to proceed. I tried creating a separate thread that just unparks the main thread, and it was also able to run.
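The observation that a blocked thread still proceeds if something else unparks it can be illustrated with the standard library alone. This is a minimal sketch, not code from bevy_tasks or async_executor: `worker_proceeds` and its parameters are hypothetical names, and the "lost notification" is modeled simply by a producer that sets a flag without waking the parked worker.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical helper: returns true if the parked worker observed `ready`
/// before `timeout` elapsed, false if it gave up first.
pub fn worker_proceeds(external_unpark: bool, timeout: Duration) -> bool {
    let ready = Arc::new(AtomicBool::new(false));
    let worker_ready = Arc::clone(&ready);

    let worker = thread::spawn(move || {
        let deadline = Instant::now() + timeout;
        loop {
            let now = Instant::now();
            if now >= deadline {
                // Nobody woke us in time: this models the stuck thread.
                return false;
            }
            if worker_ready.load(Ordering::Acquire) {
                return true;
            }
            // Park until someone unparks us or the deadline passes.
            thread::park_timeout(deadline - now);
        }
    });

    // Give the worker time to park, then publish the work without
    // notifying it (the modeled "lost" notification).
    thread::sleep(Duration::from_millis(50));
    ready.store(true, Ordering::Release);

    if external_unpark {
        // An unrelated wake-up source, like the extra thread described above.
        worker.thread().unpark();
    }
    worker.join().unwrap()
}
```

Calling `worker_proceeds(true, Duration::from_secs(5))` lets the worker finish as soon as the external unpark arrives. With `false`, the worker normally stays parked until the timeout, although spurious wake-ups are permitted by `park_timeout`, so that case is not guaranteed.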
In theory EDIT: format |
…onflict check, it would block.
Just added an example; without the fix, it would block. You can try disabling the check by returning
I think I figured out the details. The deadlock is caused by the following steps.
let forever = async {
    loop {
        ticker_1.tick().or(ticker_2.tick()).await;
    }
};
future::block_on(forever, work_future);
Back to the code fix, if we replace the EDIT: format |
@hymm @james7132 ping |
This explanation makes sense to me. Thanks for figuring it out. So my PR #7564 fixes things by not reusing the executor, so the inner executor doesn't get into the weird state, while this PR fixes things by never having the second ticker in an or. My test code in the other PR's comments never deadlocked because I needed to add a second executor for the outer schedule to use. I'm pretty sure I prefer the change in this PR: we don't need to keep recreating the scope executor, and we're no longer doing the double ticking, which always felt a little weird. In the longer term, this seems to be a bug in async executor, and we should consider upstreaming a fix.
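The idea of never combining two tickers from the same executor can be sketched as a small decision function. This is an illustration only: the `ThreadExecutor` below is a hypothetical stand-in that shares state through an `Arc`, and `TickPlan` and `plan_tickers` are made-up names, not the types or functions in bevy_tasks.

```rust
use std::sync::Arc;

/// Hypothetical shared state behind an executor handle.
struct ExecutorState {
    #[allow(dead_code)]
    label: &'static str,
}

/// Hypothetical stand-in for a per-thread executor; clones share one state.
#[derive(Clone)]
pub struct ThreadExecutor {
    state: Arc<ExecutorState>,
}

impl ThreadExecutor {
    pub fn new(label: &'static str) -> Self {
        Self { state: Arc::new(ExecutorState { label }) }
    }

    /// Two handles are "the same" when they point at the same shared state.
    pub fn is_same(&self, other: &Self) -> bool {
        Arc::ptr_eq(&self.state, &other.state)
    }
}

/// Which tickers a scope should drive.
#[derive(Debug, PartialEq, Eq)]
pub enum TickPlan {
    /// Both handles refer to one executor, so tick it only once.
    ScopeOnly,
    /// Distinct executors, so both can be ticked together.
    ScopeAndExternal,
}

pub fn plan_tickers(scope: &ThreadExecutor, external: &ThreadExecutor) -> TickPlan {
    if external.is_same(scope) {
        TickPlan::ScopeOnly
    } else {
        TickPlan::ScopeAndExternal
    }
}
```

With this shape, a cloned handle and its original always yield `ScopeOnly`, so the second ticker never ends up in the or.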
@NiklasEi can you check if this fixes your use of the debug asset server? I want to make sure this fixes that key issue before giving this a more thorough review.
I just checked it and yes, this PR also fixes my stuck integration tests 👍
Just wanted to quickly chime in and thank @shuoli84 for digging into this rather complex bug. I'll leave a full review soon. Definitely want this fix before 0.10 goes live.
Code looks good to me now. The logic for which tickers need to be ticked in scope is getting a little complicated, so it'd be nice to have some unit tests for that, but not going to block on that. The multiple tickers code should be getting removed when we remove !Send resources from the world.
Sans a few code quality nits, this looks good to me. Great work!
Just opened a pr #7865, which basically runs the |
let scope_ticker = scope_executor.ticker().unwrap();
if let Some(external_ticker) = external_executor.ticker() {
if tick_task_pool_executor {
let external_ticker = if !external_executor.is_same(scope_executor) {
Nice change. This is definitely easier to follow.
bors r+ |
#7825) …or's ticker for one thread.

# Objective

- Fix debug_asset_server hang.

## Solution

- Reuse the thread_local executor for the MainThreadExecutor resource, so there will be only one ThreadExecutor for the main thread.
- If ThreadTickers come from the same executor, they conflict with each other, so only one of them is ticked.
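The first part of the solution, reusing one thread-local executor so that a thread only ever has a single ThreadExecutor, could look roughly like the sketch below. The types here are hypothetical stand-ins written against the standard library only; they mirror the names used in this PR but are not Bevy's actual definitions.

```rust
use std::sync::Arc;
use std::thread::{self, ThreadId};

/// Hypothetical stand-in for an executor bound to the thread that created it.
pub struct ThreadExecutor {
    owner: ThreadId,
}

impl ThreadExecutor {
    /// The thread this executor was created on.
    pub fn owner(&self) -> ThreadId {
        self.owner
    }
}

thread_local! {
    // One executor per thread, created lazily on first access.
    static THREAD_EXECUTOR: Arc<ThreadExecutor> = Arc::new(ThreadExecutor {
        owner: thread::current().id(),
    });
}

/// Hypothetical resource wrapper: it holds a handle to the thread-local
/// executor instead of constructing a fresh one each time.
pub struct MainThreadExecutor(pub Arc<ThreadExecutor>);

impl MainThreadExecutor {
    pub fn new() -> Self {
        MainThreadExecutor(THREAD_EXECUTOR.with(|executor| Arc::clone(executor)))
    }
}
```

Every `MainThreadExecutor::new()` call on a given thread returns a handle to the same underlying executor, which is what makes an identity check between two handles meaningful.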