Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Thread::wait_to_finish throws error when called on a joined/returned thread #71048

Open
Jimmio92 opened this issue Jan 8, 2023 · 7 comments
Open

Comments

@Jimmio92
Copy link

Jimmio92 commented Jan 8, 2023

Godot version

4.0 master

System information

Win 10, AMD Radeon RX 590

Issue description

I have a thread to handle UPNP stuff which blocks when called.

When the thread is done, it signals. The main thread awaits this signal, and then calls wait_to_finish -- this throws an error about not being able to join a thread from itself. This means it's being cleaned up already like expected -- yet there is also a warning saying we must call wait_to_finish or it won't be properly cleaned up.

I have solved this problem with my personal fork of Godot by changing Thread and everything in the engine that uses it to use normal naming conventions (wait_to_finish becomes join, like everything else is named), and providing detach. I also moved the "cleanup" code, of which there was none (why was that warning even there?), to the destructor, making the class no longer require the warning about calling wait_to_finish, and also bringing it more in line with what most people expect out of a thread.

My only concern is.... I don't know what I'm doing when it comes to GitHub -- do I make a pull request? The fact that I changed the name of a method that's in.... some 45 files if my text editor didn't lie... means I doubt this will make it in to 4.0, if ever. But there's definitely a problem here, even if my code isn't the solution.

Steps to reproduce

Create a thread; don't call wait_to_finish on it. Warning appears that seems needless.

Create a thread and await a signal saying it's done. After the await, call wait_to_finish. Note it throws an error.

Minimal reproduction project

test_case_thread.zip
Open the example.tscn scene and run it with F6. Note the warning about not having called wait_to_finish and also the error about trying to call join on itself.

NOTE: I tried to keep it exactly accurate to what went wrong so it still does do a UPNP discover -- this is not part of the problem, merely there to be the same timed external event.

@Chaosus
Copy link
Member

Chaosus commented Jan 14, 2023

Not sure whether it's a bug, or intended behaviour for threads. Maybe some of @godotengine/core devs may say better.

@Jimmio92
Copy link
Author

Pretty sure it's a bug as there's no way to use threads without a warning/error.

Also, with all the talk about Godot becoming AAA compatible (see https://godotengine.org/article/whats-missing-in-godot-for-aaa/ ).... the change to C++ standard threads is backwards from what they're wanting. Huge performance losses on Windows -- each mutex lock/unlock has to jump ring levels to the kernel and back. Critical Sections can be used as a drop in replacement and they often do not have to jump to the kernel.

Design by committee is always challenging, but we should never backpedal, imo.

@vnen
Copy link
Member

vnen commented Jan 30, 2023

The thing is that when you do await discover_done the function will be resumed by the thread that emitted the signal. So the wait_to_finish() call will happen on the thread, which triggers the can't wait to finish on itself error (which means it never really waited). Such error will end the _ready() call/resume, which then will clean up the stack, freeing the Thread object that was created locally, thus triggering the warning (thread destroyed without call to wait_to_finish()). It seems to me everything is working as intended.

You can solve this by emitting the signal in a deferred call, since that will be queued and then called from the main thread. The issue is that there is no emit_deferred() (which I think would an welcome addition), so you need to do something like emit_signal.call_deferred(discover_done.get_name(), true) (the get_name() here just to avoid strings and typos).

For me the only "solution" to this would be adding an emit_deffered() method to Signal to be easier to use on this case.

@Jimmio92
Copy link
Author

Thank you for providing a way to work around my issue.. but I still feel this can be more elegantly solved by just allowing detach to be called on the internal thread object and changing the logic on thread destruct internally as I've done in my personal fork of Godot.. but I'm totally unfamiliar with GitHub and the whole process. (Note I also renamed "wait_to_finish" to "join" to better match... everything else... but I did so without talking to anyone, so I closed the pull request -- I don't know if I even did that correctly or not)

I've been kind of upset about the whole issue -- I love Godot, but having dug into the source here and finding they threw out OS-specific, perfectly good, and more performant code in favor of black-box magic voodoo provided by the C++ standard library when Juan just started mentioning things needing fixed for AAA usage... There's some mixed signals here. Is the engine getting more performant or easier to write? You cannot have both.

@RandomShaper
Copy link
Member

While it's sad that std::mutex is not backed by Windows' critical section objects, if you can prove actual performance issues as a result of that, please open a proposal so we re-evaluate the matter. In scripting you are not ayway expected to squeeze the processor at such a low level. Godot also has SpinLock internally, used where it matters.

Regarding the mechanics of threads (self-destruction), naming of methods, etc., you can as well open as proposal. Probably you're not alone in believing wait_to_finish() is a needlessly non-standard name.

Now, I believe @vnen's reply addresses this sufficiently as not a bug.

@Jimmio92
Copy link
Author

Jimmio92 commented Feb 1, 2023

I'll write up a separate proposal for the Windows threading stuff; I shouldn't have mentioned it at all here just because it was related to threads... it seems to have distracted from the actual issue.

I personally feel the current Thread implementation can exhibit undefined behavior as it detaches a thread during the deconstruct. This seems fine for an unjoined thread, but a thread deconstructing from a join is already gone -- a detach at this point is likely in error. It also states you must call wait_to_finish, which is not true -- it detaches when deconstructing. These two things IMO are bugs and the reason for this report, a proposal for my changes, and (an attempt at a) pull request to at least partially fix it.

Though, I am now seeing that detached threads are technically supported as it currently stands -- if the destructor is called it detaches -- but I was under the impression it's best to call detach early, not right before/during destruct, though this may be fine and defined behavior (I'm not sure) and thus only half as big of a bug as I thought originally. Still, don't detach when not joinable even if detach doesn't get added to the script side (an if check in the destructor would solve it), though I really feel it should be up to the end-user/dev to call detach/join depending on their needs, not for detach to automagically get called behind the scenes.

Again, sorry for bringing the OS specific stuff up in this; that's not what this bug report is about -- warnings, errors, and lack of detach is all this is about.

@RandomShaper
Copy link
Member

Detaching happens to try to avoid a crash as much as possible. If the thread has not been joined, the C++ runtime is free to crash the program. That's why the user is warned they should join (wait_to_finish()). Detaching is not done for supporting fire-and-forget thread dynamics in Godot (which, on the other hand, would require much more than that).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants