create a new scope executor for every scope #7564
Conversation
This test works on main. But if my mental model of what is causing the deadlock was right, this should deadlock.

```rust
fn can_run_nested_multithreaded_schedules() {
    let mut world = World::default();
    world.init_resource::<MainThreadExecutor>();
    world.init_resource::<SystemOrder>();

    let mut inner_schedule = Schedule::default();
    inner_schedule.set_executor_kind(ExecutorKind::MultiThreaded);
    inner_schedule.add_system(make_function_system(0));

    let mut outer_schedule = Schedule::default();
    outer_schedule.set_executor_kind(ExecutorKind::MultiThreaded);
    outer_schedule.add_system(move |world: &mut World| {
        inner_schedule.run(world);
    });

    outer_schedule.run(&mut world);

    assert_eq!(world.resource::<SystemOrder>().0, vec![0]);
}
```
There are definitely some timing issues associated with the bug. I eventually added enough
Are there any downsides to this PR? The debug asset server is very important for rendering development. The ability to iterate on shaders live saves a ton of time, given that bevy_render, bevy_pbr, etc. take a long time to compile.
This looks like it removes reuse of a per-thread executor and instead creates a new thread executor every time a scope is used. That sounds like it has the potential for a performance regression, due to creating a new thread executor for the duration of each scope.
I won't have time to look into this issue more for at least a week. I do plan on investigating more when I have time, but in case no one else figures anything out, I consider this change to be low risk. The change to reuse a thread-local executor instead of creating a new one was made during this release cycle in #7087, so this is basically just reverting that change.
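For reference, a minimal sketch (not Bevy's actual `bevy_tasks` code) of the two approaches being discussed, using `async_executor` directly; the names `REUSED` and the `run_scope_with_*` functions are made up for illustration:

```rust
use std::sync::Arc;

use async_executor::Executor;

thread_local! {
    // The approach on `main` before this PR: one executor is created lazily per
    // thread and reused by every scope that runs on that thread.
    static REUSED: Arc<Executor<'static>> = Arc::new(Executor::new());
}

fn run_scope_with_reused_executor() {
    REUSED.with(|ex| {
        // The same executor instance services every call made on this thread.
        futures_lite::future::block_on(ex.run(async { /* scope body */ }));
    });
}

fn run_scope_with_fresh_executor() {
    // The approach in this PR: each scope builds its own executor, paying a small
    // allocation/setup cost but sharing no state with any outer or previous scope.
    let ex = Executor::new();
    futures_lite::future::block_on(ex.run(async { /* scope body */ }));
}

fn main() {
    run_scope_with_reused_executor();
    run_scope_with_fresh_executor();
}
```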
The following code can reproduce the deadlock:

```rust
use bevy_app::App;
use bevy_ecs::prelude::*;

fn run_sub_app(mut sub_app: NonSendMut<DebugApp>) {
    sub_app.app.update();
}

struct DebugApp {
    app: App,
}

fn main() {
    let mut app = bevy_app::App::new();
    let sub_app = bevy_app::App::new();
    app.insert_non_send_resource(DebugApp { app: sub_app });
    app.add_system(run_sub_app);
    app.update();
}
```
I think the reason is,
I wondered how the LocalExecutor on the main thread gets ticked, until I saw the code.
There are two executors here, with one running inside the other. The exclusivity is per executor, so running two systems that want exclusive access in different executors is allowed.
They're also ticked inside the bevy_tasks::scope that the multithreaded executor runs inside.
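A minimal sketch of that shape, using `async_executor` directly rather than Bevy's schedule executors (the task bodies and the `40 + 2` value are made up): an outer executor runs a task that itself drives an inner executor, so each executor only enforces exclusivity for its own work.

```rust
use async_executor::Executor;
use futures_lite::future;

fn main() {
    let outer = Executor::new();
    let inner = Executor::new();

    // Work queued on the inner executor, analogous to the systems of a nested schedule.
    let inner_task = inner.spawn(async { 40 + 2 });

    // The outer task drives the inner executor until that work completes, the way
    // an exclusive system can run a whole nested multithreaded schedule.
    let outer_task = outer.spawn(async move { inner.run(inner_task).await });

    // Tick the outer executor on this thread until everything is done.
    let result = future::block_on(outer.run(outer_task));
    assert_eq!(result, 42);
}
```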
The test code above is a little out of date; the inner schedule should have some apply_system_buffers in it. My investigation showed that the deadlock was happening during the startup schedule of the debug app, typically during the 2nd or 3rd apply_system_buffers in that schedule. The schedule doesn't otherwise have any systems in it. But even after adding some apply_system_buffers to the test, it doesn't deadlock.
I have run into issues with integration tests of
Comparing this to the state prior to pipelined rendering's merge, this is indeed just a revert. If this is fixing the usage of the debug asset server, we shouldn't ship 0.10 without it.
I think I found something interesting. The problem actually is that the async-executor entered a state where it cannot be woken up just by a spawn. Normally, when `spawn_exclusive_system_task` runs, it triggers/notifies the main thread:

```
spawn thread: ThreadId(4) -> Executor { id: 9, active: 1, global_tasks: 0, local_runners: [], sleepers: 2 } 0 f: bevy_ecs::schedule::executor::multi_threaded::MultiThreadedExecutor::spawn_exclusive_system_task::{{closure}}
executor[9] notify wake 2 Waker { data: 0x600001da4850, vtable: 0x107eb9698 } now: "\"count:2 free_ids:[] wakers:1 <wakers: 1 Waker { data: 0x600001dd2fb0, vtable: 0x107eb9698 }>\\n\""
```

But when the problem happens, it can't trigger the executor:

```
spawn thread: ThreadId(4) -> Executor { id: 9, active: 1, global_tasks: 0, local_runners: [], sleepers: 2 } 0 f: bevy_ecs::schedule::executor::multi_threaded::MultiThreadedExecutor::spawn_exclusive_system_task::{{closure}}
executor[9] notify no effort
```

Why? That's something I am still figuring out. It appears the executor's
Actually, when the ticker/sleeper has just been notified and is running, this is not a problem. In our case the main thread is already parked, and then it is a problem: the executor believes the ticker is running or has already been notified, so it won't notify it again, and the main thread stays parked forever.
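A toy model of that missed-wakeup state (this is not async-executor's real bookkeeping; the notified flag, the printed messages, and the use of park_timeout so the sketch terminates are all made up for illustration):

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

fn main() {
    // `true` means "a ticker has already been notified / is running", so further
    // notifications are skipped. Here the flag is stale: nobody is actually running.
    let notified = Arc::new(AtomicBool::new(true));

    let flag = Arc::clone(&notified);
    let spawner = thread::spawn(move || {
        // The spawn path only wakes the ticker if it does not already look notified.
        if !flag.swap(true, Ordering::SeqCst) {
            println!("notify: waking the ticker");
        } else {
            println!("notify: no effect, the ticker already looks awake");
        }
    });
    spawner.join().unwrap();

    // The main thread expects to be unparked when new work arrives, but the
    // notification above was swallowed; in the real bug it would park forever.
    println!("main thread parking, waiting for a wakeup that never comes...");
    thread::park_timeout(Duration::from_millis(100));
    println!("(this sketch only returns because it uses park_timeout)");
}
```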
These are the other executors' states. Only Executor [9]'s count is
I think I found the root cause: there are two executors involved.

```rust
/// the thread local one
thread_local! {
    static LOCAL_EXECUTOR: async_executor::LocalExecutor<'static> = async_executor::LocalExecutor::new("local executor");
    static THREAD_EXECUTOR: Arc<ThreadExecutor<'static>> = Arc::new(ThreadExecutor::new());
}

/// and also the MainThreadExecutor resource
#[derive(Resource, Default, Clone)]
pub struct MainThreadExecutor(pub Arc<ThreadExecutor<'static>>);
```

And the troublesome task is created by this code:

```rust
external_ticker.tick().or(scope_ticker.tick()).await;
```
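As a small aside on that combinator: futures_lite's `FutureExt::or` polls the first future and falls back to the second, completing with whichever finishes first. The pending/ready futures below are stand-ins for the two tickers, not Bevy's ThreadExecutor API.

```rust
use futures_lite::future::{self, FutureExt};

fn main() {
    // Stand-in for `external_ticker.tick()`: this side never becomes ready.
    let external_side = future::pending::<&str>();

    // Stand-in for `scope_ticker.tick()`: this side has work ready immediately.
    let scope_side = async { "scope ticker ran" };

    // `or` polls the left future first and then the right one, completing as
    // soon as either side completes.
    let result = future::block_on(external_side.or(scope_side));
    println!("{result}");
}
```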
Just opened #7825, check it out?
Closing in favor of #7825.
Objective

Solution

Create a new thread executor for every scope and remove the reused thread local one. `cargo run --example load_gltf --features debug_asset_server` deadlocks without this PR, but works with it. But I'm unsure of the root cause of the deadlock, so this is not a guaranteed fix.

Changelog