-
-
Notifications
You must be signed in to change notification settings - Fork 21.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dynamic BVH for rendering and godot physics [3.2] #44901
Conversation
d9124a3
to
58df1d2
Compare
I'm moving this out of draft now, as it is ready for some review. I've probably got a few more functions to change parameter names to include As for reviewers, anyone who is interested, feel free to test it out and look through. It doesn't require a whole lot of specific Godot knowledge (e.g. in visual server or physics) as it is quite encapsulated. I'll try and put up some demo projects tomorrow. A key aspect is the whole PR is made to be backward compatible, and the default is to use the octree (same as before). Users would actively have to set
Demo ProjectsGodot Physicsoctree_physics_test.zip Godot physics - Octree 47fps RenderingSmall objects, small screen size. 10K non moving objects 10K moving objects Note that lots of other bottlenecks occur too. Just updating 10K objects in the scene tree can take a while, so overall frame rates don't measure how much faster BVH is relative to octree. However quite part from performance the main criteria is correctness and solving the octree bugs. |
f4046e9
to
b7b48fd
Compare
I will today try and convert some of the recursive functions to iterative. The performance difference does seem to be negligible (as expected, because of the large leaf sizes), but it does offer some protection against stack overflow, so it is probably better in some cases despite the extra complexity. Particularly the convex cull test I hadn't noticed used an alloca function within the recursion, which could have increased stack usage when culling against complicated hulls, so it's good to have this potential problem closed. I've put in a separate commit for now so it is easier to see the changes, can squash later or on request when Akien is ready for merging. The iterative stack is shamelessly copied from reduz BVH, so if there are any bugs, it's all his fault, honest! 😁 EDIT: Ok I think that's done now, I've converted I think all the non-debug recursive routines to iterative. Unrelated, physics memory useMoved discussion to #45024. |
c4c3d12
to
6157368
Compare
8cbdbdf
to
9c7c37e
Compare
Complete rewrite of spatial partitioning using a bounding volume hierarchy rather than octree. Switchable in project settings between using octree or BVH for rendering and physics.
On @akien-mga suggestion changed defaults to on for godot physics and rendering. Settings are in: These settings may end up being temporary in the long run, as once any problems are ironed out there should be no need to keep octree. |
Thanks! |
@lawnjelly Could you please explain in simple terms how the BVH (ex. octree) tree balance parameter ranging from 0.0 to 1.0 may affect performance and why the default value is 0.0. And when we say performance, the word sounds too generic in this context. So I have a few specific questions:
I tried the bvh render benchmark project but it doesn't seem to make any difference whether the balance is at 0.0 or 1.0. Thanks! Awesome work, as usual. |
Ah that's probably something I should change the explanation slightly for in the classref. The tree balance is for the octree only (it was written before the BVH, and is unrelated to the BVH). Default value is 0.0 to maintain compatibility with the old octree. Increasing it results in larger leaf nodes, to a maximum of 1.0, which often results in best peformance. The only snag turned out to be that it often exposes a bug in octree, which turned out to be unfixable, hence the need for BVH. Relevant PRs are #41123 and #42927. Also note that octree and BVH are 3D only. |
Complete rewrite of spatial partitioning using a bounding volume hierarchy rather than octree.
Switchable in project settings between using octree or BVH for rendering and physics.
Background
Octree suffers from several problems, including 2 rather nasty bugs:
Mesh flickering [Octo-tree] #42895 (comment)
I'd already done a PR to try and patch up and work around some of the problems in 3.2 #41123, but overall I'm of the opinion that the current octree is groaning with multiple bugs and is not a good solution, so I think if we can replace it in the long run that will be best.
BVH is conceptually simpler than octree, especially in that items only appear in one node, whereas in octree the item has to keep track of being in multiple nodes, which is very slow and error prone (see bug (2)).
There are several pre-existing open source dynamic BVH implementations for reference, e.g.:
Unfortunately all of these only have one item stored per leaf node. This makes the code simpler in places, however if doesn't take good advantage of cache and data orientated design. So here I have written a template that can handle multiple items per leaf.
In particular this means that:
As well as allowing culling tests, the Godot octree also handles collision pairing (for Godot physics, lights and many other situations). This had to be handled for the BVH to replace octree. Pairing is handled as an extra top layer, in bvh.h. The template can either be used with the pairing layer or without. In 3.2 pairing is always used (so far).
In addition to these advantages, the BVH also uses pools for nodes and leaves, assuring efficient memory use and O(1) allocation / deallocation, and good cache coherency. In general it uses less memory than octree for the same scene.
Correctness & Swappability
The main attraction for 3.2 is stability and correctness, neatly working around the octree problems. Of course being a large template this will need testing, it is possible there will be bugs. For this reason I have written the PR so that the user can swap between octree and BVH using project settings. The default is currently octree in both cases, but this can be changed.
The code to swap between the two in visual_server is slightly more verbose than I would like but it seemed a good compromise (any alternative ideas to do this welcome).
Internals
Internally the BVH maintains 2 trees, one for pairable items and one for non-pairable. This makes collision pairing faster for non-pairable moving items - they only need to be checked against the pairable tree.
I also considered separating static and dynamic trees, using a metric if an object hadn't moved in a while to move it to the static tree. This would be faster but the code is so fast already I think everything else has to catch up first (unless it were used for e.g. ray tracing).
It is written to potentially take advantage of SIMD but there's really been no need so far. The custom AABB class uses negative maximum which is a trick to allow AABB checks in one SIMD operation. To go down this route I would probably just add a couple of dummy values to the custom AABB (to make 8 checks), or use structure of arrays or similar.
The convex hull culling uses a neat trick - when entering a leaf node (which may contain e.g. 256 items) it precalculates which hull planes cut the node AABB. This means that only these planes need to be checked. This can more than double speed.
BVH parameters (node expansion and pairing expansion) are now determined internally from the world bound. They didn't seem to have a massive effect on performance, and for now I've decided against having them settable manually as project settings, as the trade off for confusion may not be worth it.
Performance
Although octree is not often a bottleneck, unless using games with large numbers of objects, the BVH implementation is considerably faster than octree.
The actual performance increase of the system itself is far higher, but obviously frame rate increases will only be seen when spatial partitioning is a bottleneck. In Godot physics for example, the BVH is 50x faster than the physics code. So while the BVH is probably signifcantly faster than Bullet, Bullet is still faster in most cases than Godot physics. Hopefully @pouleyKetchoupp will be able to improve this in 4.0.
Future
Although written for 3.2, the template is mostly self contained and we can try it in Godot 4.0 as well. There are plans to use linear methods in some cases, but we need a BVH or similar for pairing etc.
Fixes #42895
Fixes #38142 (better than the octree fix, and the octree fix only works for render_tree_balance > 0.0).
May fix #42563
May fix #40059