Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor Node Processing to allow Scene Multithreading #75901

Merged
merged 1 commit into from
May 10, 2023

Conversation

reduz
Copy link
Member

@reduz reduz commented Apr 10, 2023

  • Node processing works on the concept of process groups.
  • A node group can be inherited, run on main thread, or a sub-thread.
  • Groups can be ordered.
  • Process priority is now present for physics.

This is the first steps towards implementing godotengine/godot-proposals#6424. No threading or thread guards exist yet.

Steps:

1. Ability to create thread groups in nodes.
2. Ability to run groups threaded.
3. Create message passing system.
4. Protect the whole Node API for thread misuse in debug mode.
5. Test and bechmark.

A small benchmark (near a hundred characters playing run animation in 3D) has a 3x speedup with this PR (it should have more, but after running vtune to investigate, there are many memory synchronization bottlenecks in the animation code that have to be fixed that are unrelated to this PR).

How does it work?

Setting up groups

Nodes now have a Thread Group option in the process section. If a group is selected, this node and children nodes will now belong to a thread group. This means, that PROCESS and PHYSICS_PROCESS callbacks will be called in parallel for any node belonging to this thread group. Thread groups also have an order variable. Groups are executed in-order from lesser to greater and everything in the same group is executed in parallel.

image

By default, all nodes belong to the default group, which has process order 0, and processes in the main thread unless overriden with the thread_group option. This ensures backwards compatibility.

Synchronizing

Because it is common to want to use the results of the processing and set it to something that is not in the current scene, new functions were added to Node, such as : call_deferred_thread_group, which works like call_deferred(), but using the thread group.
With these functions, it is possible to send messages to a process group when its processed in a thread, as well as send messages from the process group to another one (be it in the main thread, or a sub-thread).

Thread safety

To ensure thread safety when processing groups, the following changes were made:

  • SceneTree is now fully thread safe.
  • Node is fully thread safe

The following error macros were added so they can be used in nodes (on the C++ side):

  • ERR_THREAD_GUARD and ERR_THREAD_GUARD_V Use them at the beginning of functions to ensure that they are not being called from the wrong process group.
  • ERR_MAIN_THREAD_GUARD and ERR_MAIN_THREAD_GUARD_V Use them at the beginning of functions to ensure that this function can only be called from the main thread (calling them from groups will fail). Examples of functions that can only be called from the main thread are Node::add_child or Node::remove_child.

The functions above only perform checks if nodes are inside the SceneTree (so you can still build a scene in a thread, and then use call_deferred() to put it in the main thread). Additionally, the checks above only take place on debug builds. They are not compiled during release builds.

These functions also perform no locking or memory synchronization, they simply check the thread ID validity using TLS, so they
are very efficient.

Testing

There is a new command-line option that allows running Godot while forcing single-threaded mode in order to compare or debug:

$ godot --single-threaded-scene

How do you normally use this?

The idea is that, in Godot, most scenes are generally self contained units. An enemy, a bullet, an NPC, etc. If those scenes have something that does significant complex processing (such as animation playback, character physics, pathfinding, etc), then these scenes will greatly benefit from running their process iteration in parallel when there is a significant amount of them (dozens or hundreds).

If they do something simple (like, just motion), then its likely there will be not much of a performance improvement and the performance could be affected instead. For those cases (lots of elements, simple logic), the swarm proposal is up as a proposed solution.

TODO (after merging this PR):

Protecting the Object API

Things that happen at Object level (set/get/set_meta/get_property_list/call/etc) are not protected. Those functions will need to become virtual and protections be added at Node level in the overrides.

Protecting the rest of the Node API

The Node API (all existing nodes) is quite large. Protecting all of it with the guard macros described above would ensure that the scene system is fully and completely thread-safe, but it's a lot of work to do it everywhere (and should be done as follow-up PRs). One possibility would do this only in the most common nodes for now (Control, Node2D, Node3D) and do the rest at a later time, gradually.

GDScript and other languages

GDScript should probably implement a call to ERR_THREAD_GUARD on functions inheriting from Node automatically (it should have no cost), which can only be disabled with some function annotation like @no_thread_guard or something.

For other languages, I think it should be possible to efficiently allow this logic manually by encapsulating a function call. Of course, it would be up to the user (if they don't do anything with thread groups, they don't need to do this).

Multiplayer API is not thread safe.

The multiplayer code (rpc as example) is not thread safe, the Node API does not care about this, so we need to discuss if we want to implement safety at Node level or Multiplayer level.

Some nodes need changes

  • AnimationTreePlayer can't be run in parallel because the state variable is shared. This needs to be moved to thread local storage to enable running it in parallel.
  • CharacterBody currently can't probably work in parallel because the physics server expects it to work in the main thread. This needs to be revised. It is possible it works, though.

Transform notifications are not threaded

Currently, accumulated transform notifications are not threaded, the list of notifications should be moved to the thread group. This would improve parallelism a bit more.

Synchronization pressure is not great

While for the most part the scene system runs well in single threaded mode, Godot abuses in many places of things that can seriously degrade multithreaded performance. Example of this is synchronization instructions (mutexes) and atomics (used a lot in types such as Vector, String, StringName, Ref<>, etc). The later can put serious memory pressure on multiple threads if the same memory is being accessed (as an example, a lot of nodes playing and accessing the same animation). This needs to be redesigned/fixed and proper benchmarks need to be done.

Multi threaded debugging

Merging this PR and making multi-threaded development easier means that the debugger ability to debug multiple threads has to finally be fixed.

Multi threaded profiling

A multithreaded profiler is overdue in Godot , so its possible to visualize all threads working in all frames over time.

TODO (before merging this PR).

Agree on things, then relevant documentation can be filled.

@aaronfranke aaronfranke added this to the 4.x milestone Apr 10, 2023
@reduz reduz changed the title Refactor Node Processing (WIP) Refactor Node Processing to allow Scene Multithreading(WIP) Apr 10, 2023
@reduz
Copy link
Member Author

reduz commented Apr 10, 2023

Changed the approach and code, it should be a lot simpler now.

@fire
Copy link
Member

fire commented Apr 13, 2023

There seems to be some errors in the cicd.

  1. b/doc/classes/Node.xml changed
  2. ERROR: ZIPReader must be opened before use in the sanitizer
  3. Warning: scene\main\scene_tree.cpp(849): warning C4458: declaration of 'node_count' hides class member

@reduz reduz force-pushed the refactor-node-processing branch 5 times, most recently from fde9a2e to b579f3e Compare April 22, 2023 09:05
@reduz reduz marked this pull request as ready for review April 23, 2023 14:34
@reduz reduz requested a review from a team as a code owner April 23, 2023 14:34
@reduz reduz requested a review from a team as a code owner April 23, 2023 15:08
@reduz reduz changed the title Refactor Node Processing to allow Scene Multithreading(WIP) Refactor Node Processing to allow Scene Multithreading Apr 23, 2023
@reduz reduz requested a review from a team as a code owner April 23, 2023 15:20
@WolfgangSenff
Copy link
Contributor

How (if at all) does this affect input functions across iOS and Android and other exports?

@reduz
Copy link
Member Author

reduz commented Apr 23, 2023

@WolfgangSenff Input still happens on the main thread, this is only for threading the process callbacks. Given most users just use the Input singleton, not much should change.

@fire
Copy link
Member

fire commented Apr 23, 2023

Can you estimate the technical feasibility of a backport to Godot Engine 4.0?

@reduz
Copy link
Member Author

reduz commented Apr 23, 2023

@fire I think its very unlikely because this will be a very large change when completed.
But if you read this article, means 4.1 will not be that far off.

@RevoluPowered
Copy link
Contributor

@reduz do you think this will work with ENet by default to move it from the main thread to a separate thread?

Would be really handy for something I am working on.

* Node processing works on the concept of process groups.
* A node group can be inherited, run on main thread, or a sub-thread.
* Groups can be ordered.
* Process priority is now present for physics.

This is the first steps towards implementing godotengine/godot-proposals#6424.
No threading or thread guards exist yet in most of the scene code other than Node. That will have to be added later.
@reduz reduz force-pushed the refactor-node-processing branch from eecd1f1 to 98c655e Compare May 9, 2023 17:18
@adamscott
Copy link
Member

adamscott commented May 9, 2023

I created a test project, tell me @reduz if it is an appropriate benchmark.

image

Each instance (robot) lives in its own subthread, and has a script that makes it select one animation after n seconds. My computer struggles to run the project, but the multithreaded scenes perform a little better in general then when running --single-threaded-scene.

Files

75901.zip
75901.zip (with more personality!)

Footage

Footage without --single-threaded-scene
Capture vidéo du 2023-05-09 14-06-17.webm

Footage with --single-threaded-scene
Capture vidéo du 2023-05-09 14-07-22.webm

What's missing from the test

  • Making calls with call_deferred_thread_group()
  • Making calls with call_thread_safe()
  • Set properties with set_deferred_thread_group()
  • Set properties with set_thread_safe()
  • Test different group orders

Notes

@Calinou I could use your benchmark tool in the test.

I used the Godot 3D Robot Character made by @arianagchow
https://captainripley.itch.io/godot-3d-robot-character
https://github.com/AGChow/3D-Godot-Robot-Platformer-Character

@reduz
Copy link
Member Author

reduz commented May 9, 2023

@adamscott There is a good amount of things that need to be improved that are killing threaded performance right now in AnimationPlayer. After the PR is merged I will do a pass at fixing some things.

Copy link
Member

@akien-mga akien-mga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed with other maintainers, this is a first step towards allowing proper scene multithreading, which is a work that will span multiple months and probably take until 4.2 or 4.3 to be fully stable. Until then, the feature will be flagged as experimental to make it clear that it's a work in progress, and might change from release to release.

This first step lays the foundation and shouldn't negatively impact the engine when the scene multithreading is not used, so we agreed to merge this now to unblock other parts of the work that will be based on this, even if it's not the fully finalized form that this will take.

@reduz also started working on some guidelines for contributors for writing thread safe scene code which will be shared soon.

@akien-mga akien-mga merged commit 5271186 into godotengine:master May 10, 2023
@akien-mga
Copy link
Member

Thanks!

@Targma
Copy link

Targma commented Jun 5, 2023

I would appreciate if error regarding thread guard would include some identifiable information for node (name/ call line number from C# code would be perfect). Currently i have no good way to pinpoint errors working in C#.

@Calinou
Copy link
Member

Calinou commented Jun 5, 2023

I would appreciate if error regarding thread guard would include some identifiable information for node (name/ call line number from C# code would be perfect). Currently i have no good way to pinpoint errors working in C#.

Can you paste the full error message you're referring to?

@Targma
Copy link

Targma commented Jun 5, 2023

Godot_v4.1-dev3_mono_win64

ERROR: This function in this node can only be accessed from either the main
thread or a thread group. Use call_deferred() instead.
   at: (scene/gui/control.cpp:3068)

image

What i would want is to at least include node name on which this was called to help with debugging. Preferably with script name too.
image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.