ResourceLoader.load_threaded_request use_sub_threads = true causes Editor + Player to hang. #78734

Closed
cosmoddd opened this issue Jun 27, 2023 · 17 comments · Fixed by #85039

Comments

@cosmoddd

Godot version

4.1

System information

Godot v4.1.beta3 - Windows 10.0.22621 - Vulkan (Forward+) - dedicated NVIDIA GeForce RTX 2060 (NVIDIA; 31.0.15.3203) - Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz (12 Threads)

Issue description

When I use ResourceLoader.load_threaded_request(_scene, "", true) in 4.0.3 Stable, my set of three scenes loads in about five seconds.

When I use the same command in 4.1b3, to load the same data, the program hangs for roughly 3-5 minutes, or crashes. After spending time troubleshooting, I am left with the impression that the use_sub_threads feature seems to slow down the editor when there are heavier scenes to load.

I tried moving the level loading thread to a new thread group (the new 4.1 feature), but this threw a ton of errors dealing with the level loading being on the wrong thread.

Other things I've done to try to alleviate the issue:

  • clean reimport of .godot project file
  • clean export to standalone build in Windows
  • disabled physics where possible

All to no avail. I'm including a smaller project so you can try to see what's happening during the ResourceLoader.load_threaded_request(level, "", true) command yourself.

I didn't change anything in my project - except upgrade to 4.1.

Possible regression?

Steps to reproduce

Download and extract the project: here

Import the project.

Load in a series of models that test the engine's ability to load a large scene.

Play the scene and notice the load speed timer in the output window.

On line 16 of BaseLevel.gd, compare the editor behavior for
ResourceLoader.load_threaded_request(level, "", true)
to
ResourceLoader.load_threaded_request(level, "", false)

Notice how there's a serious bottleneck when the use_sub_threads setting is set to 'true'.
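For anyone following along without the project, a minimal sketch of that comparison might look like the script below. It is not the code from BaseLevel.gd: the scene path, the exported flag, and the variable names are placeholders chosen for illustration. It only assumes the documented Godot 4 calls `ResourceLoader.load_threaded_request`, `load_threaded_get_status`, and `load_threaded_get`.

```gdscript
extends Node

# Hypothetical path: point this at a heavy scene in your own project.
const LEVEL_PATH := "res://Levels/level.tscn"

# Toggle in the inspector to compare the two modes from the steps above.
@export var use_sub_threads := true

var _start_msec := 0
var _progress: Array = []


func _ready() -> void:
    _start_msec = Time.get_ticks_msec()
    var err := ResourceLoader.load_threaded_request(LEVEL_PATH, "", use_sub_threads)
    if err != OK:
        push_error("load_threaded_request failed: %s" % error_string(err))
        set_process(false)


func _process(_delta: float) -> void:
    # Poll once per frame; the engine fills _progress with a single 0..1 value.
    var status := ResourceLoader.load_threaded_get_status(LEVEL_PATH, _progress)
    match status:
        ResourceLoader.THREAD_LOAD_IN_PROGRESS:
            if _progress.size() > 0:
                print("progress: ", _progress[0])
        ResourceLoader.THREAD_LOAD_LOADED:
            var elapsed := Time.get_ticks_msec() - _start_msec
            print("loaded in %d ms (use_sub_threads=%s)" % [elapsed, use_sub_threads])
            var scene := ResourceLoader.load_threaded_get(LEVEL_PATH) as PackedScene
            if scene:
                add_child(scene.instantiate())
            set_process(false)
        _:
            # THREAD_LOAD_FAILED or THREAD_LOAD_INVALID_RESOURCE.
            push_error("threaded load failed or the resource path is invalid")
            set_process(false)
```

Running the scene once with the flag on and once with it off gives two elapsed times to compare. A progress value that stops changing for a long time points to the stall described in this report.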

Minimal reproduction project

Download and extract the project: here

@RandomShaper
Member

[...] compare the editor behavior [...]

From the GDScript in the MRP:

# setting the threaded_request ot true causes a bottleneck in the editor

I'm not sure I'm understanding the issue. The editor is not affected by that GDScript code, which is only executed at runtime.

I've tried anyway to trigger problems in multi-threaded loading, to no avail. I've even forced a lower number of threads than my system actually has (12, to match yours; and even 4) and added an artificial delay to the load of each individual resource, to simulate a slower hard disk, to see if I could create some kind of deadlock.

So, I need more clues. It would be great to get minidumps at the moments of both the reported situations (locked for a long time and on the crash).

@cosmoddd
Author

Thank you for taking the time to prioritize this. Since it doesn't seem to be happening for you on the Minimal Reproduction Project, I've done two things.

The first is that I've included a crash dump from my most recent attempt to run my full project. I'm attaching it as a .zip below.

Error code:

Faulting application name: Godot_v4.1-rc1_win64.exe, version: 4.1.0.0, time stamp: 0x00000000
Faulting module name: ntdll.dll, version: 10.0.22621.1848, time stamp: 0x48d14984
Exception code: 0xc0000374
Fault offset: 0x000000000010be19
Faulting process id: 0x0x8238
Faulting application start time: 0x0x1D9A9BD89CFF0B2
Faulting application path: C:\Users\gregh\AppData\Roaming\Godot\app_userdata\Godot Version Manager\versions\Godot_v4.1-rc1_win64.exe
Faulting module path: C:\WINDOWS\SYSTEM32\ntdll.dll
Report Id: 366515d4-e472-4fb9-98be-6f9280d5634b
Faulting package full name: 
Faulting package-relative application ID: 

The file is in Windows Event Viewer format (.evtx), and I've saved it inside a .zip file.

Download godot level load crash event files here

The second thing is I've included my project in full so you can see the issue happening directly.

Steps to reproduce:

  1. Download and extract the project: here (currently in 4.1 RC1)

  2. Import the project.

  3. Press F5 to start a new game.

  4. Pick two colors, enter your name, and click "Proceed".

  5. Notice the load speed timer in the output window.

  6. Navigate to res://Scripts/SceneManagement/SceneManager.gd

  7. On line 63, change
    ResourceLoader.load_threaded_request(level, "", false)
    to
    ResourceLoader.load_threaded_request(level, "", true)

  8. Compare the speeds of how the level loads between the two modes.

I hope this provides more clarity to the issue. If this is still not giving you useful information, let me know.

@cosmoddd
Author

cosmoddd commented Jun 28, 2023

Here's a visual indicator of what happens when I load the game using both methods.
ResourceLoader.load_threaded_request(level, "", false)
[Image: dimaond start no_thread_sm2]

ResourceLoader.load_threaded_request(level, "", true)
[Image: dimaond start thread]

If you have a preferred method of getting the minidumps of these processes, let me know and I'll provide them.

@RandomShaper
Member

Thank you very much for the extra info, but sadly I'm still not able to reproduce the issue.

The best way of providing .dmp files would be to use WinDbg (unless you have it already installed, you can get it from https://learn.microsoft.com/en-us/windows-hardware/drivers/debugger/).

Once installed:

  • Start the game.
  • Start WinDbg (X64) from the Start menu.
  • In the menu bar, File -> Attach to a Process. It's easier if you select By Executable and find the one starting with godot in the list (be sure to pick the one corresponding to the running game, not the editor).
  • Exercise the game until you run into one of the pathological situations.
  • If it's the crash, the debugger will likely break itself.
  • If it's the deadlock, you'll have to choose Debug -> Break from the menu bar.
  • Now, in the Command window, at the bottom, you can type commands. Type this one: .dump /mA C:\Users\<your_windows_user>\Desktop\godot.dmp and hit Enter.
  • Now you should have the roughly 1.5 GiB file there. Please provide it to me via some appropriate file sharing method (ideally, one for the crash and one for the deadlock).

I know it's a lot to ask, but it will help immensely. If someone from the production team can reproduce the issue, they may be able to help obtain the minidumps.

@cosmoddd
Author

Thanks for getting back to me with detailed instructions on debugging this issue.

Using the method above, I was not able to get my build to crash, only lock up.

Here is the minidump file, compressed in a .zip. (lmk if you need me to post uncompressed)

Here is a gif of my screen following the steps given, including the errors thrown.

[GIF: dimaond start debug dump2_sm]

This error only occurred when use_sub_threads was set to true, i.e. with ResourceLoader.load_threaded_request(level, "", true).

I appreciate the attention on this issue and I hope my information is useful.

@kaist-kihwan

I have a similar issue. When I use ResourceLoader.load_threaded_request(scene, "", true), the progress stops at a certain point, as shown below.

0.53125
0.54166662693024
0.69014418125153
0.86001598834991
0.86001598834991
0.86001598834991
0.86001598834991
0.86001598834991
0.86001598834991

@RandomShaper
Member

@cosmoddd, hmm... I was expecting the WinDbg you could download from there to be the same one I have on my machine (which is part of the Windows SDK), but it turns out it's a different one (maybe just newer?).

Thank you very much for providing the dump! It's actually a crash, by the way. If you can reproduce the stall at some point, please capture it, too.

Unfortunately, I didn't realize that we don't (and can't) have PDB files for official builds. However, I may still have a trick up my sleeve before having to ask you to get the minidumps from a custom build I would provide to you.

As an aside, can you please try with the project setting threading/worker_pool/use_system_threads_for_low_priority_tasks set to false? (@akien-mga, if that turns out to avoid the issues, we could at least document it as a workaround in the release notes in case I'm not able to do the proper fix, or make a PR that assumes the functional value of the setting.)
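To confirm which value a running game actually sees for that setting, a small check like the following can be attached to any node. This is only a sketch: it assumes the build in use still exposes the setting, and falls back to a message if it does not. `ProjectSettings.has_setting` and `ProjectSettings.get_setting` are the standard accessors.

```gdscript
extends Node

# Setting name as written in this thread.
const SETTING := "threading/worker_pool/use_system_threads_for_low_priority_tasks"


func _ready() -> void:
    if ProjectSettings.has_setting(SETTING):
        print(SETTING, " = ", ProjectSettings.get_setting(SETTING))
    else:
        print(SETTING, " is not exposed by this build")
```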

@univeous
Contributor

I'm having the same issue in Godot 4.1 rc1.
After disabling threading/worker_pool/use_system_threads_for_low_priority_tasks, ResourceLoader.load_threaded_request(level, "", true) can properly load without hanging, but it spits out some error logs about Attempting to initialize the wrong RID.
There are no such logs when ResourceLoader.load_threaded_request(level, "", false).

@cosmoddd
Author

cosmoddd commented Jun 30, 2023

I'd be happy to use a custom build if it gives you the info you need. Let me know what the preferred protocol is for sending that over.

As for setting threading/worker_pool/use_system_threads_for_low_priority_tasks to false, that did make a difference! My game's loading times behave as they did in version 4.0.3.

One thing of note:

When I start a new game using ResourceLoader.load_threaded_request(level, "", true)
with threading/worker_pool/use_system_threads_for_low_priority_tasks set to false, these errors are thrown.

image

This is not a consistent error. It has happened in only about 5 of the last 10 times I loaded my scenes. Troubleshooting where exactly this happens in my code is proving difficult, since it only tells me which script is the source of the error, but doesn't give a stack trace beyond the C++ level.

Update: I've traced the code where the error is thrown... ResourceLoader.load_threaded_request(level, "", true)

image

@RandomShaper
Member

Interestingly enough, I'm able to reproduce the issue with the official build, but neither with my local MSVC nor MinGW builds, despite being non-dev, etc. I still can investigate it with what I have (and @akien-mga will kindly provide). Stay tuned.

@akien-mga
Member

akien-mga commented Jun 30, 2023

Here's the config used for official builds: https://github.com/godotengine/build-containers#toolchains

Windows: MinGW 9.0.0, GCC 11.2.0, binutils 2.37, on Fedora 36

I'll make you an official build of 4.1-rc2 with debug symbols. In the past we've had crashes due to regressions in binutils.
The problem may come from LTO.

Edit: Here you go. It's built with production=yes debug_symbols=yes, so the options are -std=gnu++17 -flto -mwindows -gdwarf-4 -g2 -O2 -w -isystem thirdparty/glad -DTOOLS_ENABLED -DDEBUG_ENABLED -DNDEBUG.

https://downloads.tuxfamily.org/godotengine/testing/Godot_v4.1-rc2_win64_debugsyms.x86_64.zip

@RandomShaper
Member

If tomorrow is the release day, I won't have time to fix this for 4.1.0-stable in the end.

@akien-mga
Member

akien-mga commented Jun 30, 2023

Not tomorrow, but some time next week. But depending on the complexity of the bugfix, we might consider it out of scope for 4.1 already as we don't plan another RC (unless really needed), so we wouldn't take much risk.

4.1.1 will come a couple of weeks after 4.1.0 so we can include the fix then.

@akien-mga
Member

The issue is worked around by #78977, so this can be considered fixed. But I'll keep the issue open as I believe RandomShaper intends to look into a better fix later (though it's starting to look like we'd have to fix MinGW or GCC itself :)).

@cosmoddd
Author

cosmoddd commented Nov 14, 2023

This issue has resurfaced in Godot 4.2 Beta6.

The steps to reproduce this bug are the same as described in my initial report.

The Minimal Reproduction Project will consistently reproduce this issue.

@akien-mga
Member

I can't seem to reproduce the issue on Linux, but that's consistent with the fact that this seemed to be a Windows/MinGW specific issue.

We changed the MinGW version we target in 4.2-beta3 to use a newer version. This might have impacted this. Could you test both 4.2-beta3 and 4.2-beta2 and let us know if you see any difference?

If both still have the bug, could you test older dev snapshots to pinpoint the first one which regressed? (Assuming that you don't have the problem in 4.1.3-stable.)
Conversely, if neither has the bug, please test newer snapshots leading to beta6.

@cosmoddd
Author

cosmoddd commented Nov 15, 2023

I've tested 4.1.3 and all the 4.2 betas in sequence with the Minimal Reproduction Project:

4.1.3-stable: no issues after Run Current Scene 10 times
4.2-beta1: no issues after Run Current Scene 10 times
4.2-beta2: no issues after Run Current Scene 10 times
4.2-beta3: no issues after Run Current Scene 10 times
4.2-beta4: no issues after Run Current Scene 10 times
4.2-beta5: no issues after Run Current Scene 10 times

4.2-beta6:

When I press Run Current Scene in beta6, the loading is unpredictable. The test either stalls and then crashes or loads quickly as expected. This gif illustrates what's happening.

1 opens to scene
2 hang then crash
3 hang then open
4 opens
5 opens
6 opens
7 opens
8 opens
9 opens
10 opens
11 opens
12 hang then open
13 hang then crash
14 opens

When I reopen the editor, I can usually get it to hang then crash at least once on the first three tries.

[GIF: loading thread issues return]

A second test:

1 hang then open
2 hang then open
3 hang then crash
4 open
5 open
6 open
7 hang then crash

[GIF: loading thread issues return 2]
