Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mesh Instances randomly not rendering (regression since 3.3) #53374

Closed
Tracked by #63198
TokisanGames opened this issue Oct 3, 2021 · 34 comments
Closed
Tracked by #63198

Mesh Instances randomly not rendering (regression since 3.3) #53374

TokisanGames opened this issue Oct 3, 2021 · 34 comments

Comments

@TokisanGames
Copy link
Contributor

TokisanGames commented Oct 3, 2021

Godot version

3.3.4 official

System information

Win 10/64 NVIDIA GeForce GTX 1060/PCIe/SSE2

Issue description

Too frequently some meshes won't render in the editor, and sometimes in game. Switching to another scene and switching back, or closing and reopening will temporarily fix it.

We were developing on 3.2.1 and I never had the issue. As we explored the 3.3 branch I started noticing this error. It's frequent enough that it's a problem. It could have happened any time from 3.2.2.

@Arnklit Already reported this in #47051, and provided a test project, but it was closed prematurely. Though since it's intermittent I doubt the core devs will see it until they start making a large scene with tons of objects. @mbrlabs also confirmed the problem in that ticket.

It happens to all sorts of meshes. It has also affected @Zylann's heightmap terrain, one time, in the editor. One quad (say 5mx5m) was invisible and I couldn't get it to become visible until reloading the scene (prior to any of my gdnative work).

Here I'm building a house from a kit. I already had this scene open. I was in another scene, then switched back to it and saw this. I moved the camera closer for the shot:

image

I switched to the other scene, then came back, and the two meshes rendered again:

image

The floor is a fairly simple mesh with one wood material. The wall is more complex. Notice in the first picture you can see the shutter and glass. Here is the full object:

image

This wall object has separate glass and shutter meshes. The mesh that disappeared includes two materials, the same wood used on all objects in all of these pictures, and a rusty metal frame. Both materials on the mesh didn't render, and both materials are visible on all the other meshes.

The mesh instances have their mesh data stored in the tscn file. They were originally imported from fbx, but that file has already been deleted. The textures are separate dds files. The materials are separate binary files, though most of our other materials are tres. Here we are using MTLOD and an immediate geometry node, however I've experienced this on our meshes that are not using MTLOD. These shots are all close enough that we're only looking at LOD0.

I do not believe this is not an issue with LOD, file types or materials. It decides whether it is going to render these objects when the scene opens. Nothing I can do in the scene will get it to render. Not selecting it, moving it, changing visibility, or moving the camera. Only by switching or reopening scenes fixes (or breaks) these meshes.

My console is full of these errors, which repeat when I switch back and forth between my scenes sometimes.

ERROR: get: Condition "!id_map.has(p_rid.get_data())" is true. Returned: __null
   At: ./core/rid.h:150
ERROR: instance_set_transform: Condition "!instance" is true.
   At: servers/visual/visual_server_scene.cpp:713
ERROR: get: Condition "!id_map.has(p_rid.get_data())" is true. Returned: __null
   At: ./core/rid.h:150
ERROR: instance_set_scenario: Condition "!instance" is true.
   At: servers/visual/visual_server_scene.cpp:629
ERROR: get: Condition "!id_map.has(p_rid.get_data())" is true. Returned: __null
   At: ./core/rid.h:150
ERROR: instance_set_visible: Condition "!instance" is true.
   At: servers/visual/visual_server_scene.cpp:780
ERROR: get: Condition "!id_map.has(p_rid.get_data())" is true. Returned: __null
   At: ./core/rid.h:150
ERROR: instance_attach_skeleton: Condition "!instance" is true.
   At: servers/visual/visual_server_scene.cpp:913

Steps to reproduce

Build out a complex scene for a game with a lot of objects to greatly increase your likelihood of seeing the error.
Now at least 3 known gamedevs experience this problem.

Minimal reproduction project

Look in #47051

@Calinou
Copy link
Member

Calinou commented Oct 3, 2021

Try disabling the BVH in the project settings and see if the issue persists.

@Zylann
Copy link
Contributor

Zylann commented Oct 3, 2021

Last time I experienced meshes not rendering with RID errors was in #44642, although you might be hitting another variant of that if you also get it in game.

@lawnjelly
Copy link
Member

lawnjelly commented Oct 4, 2021

Try disabling the BVH in the project settings and see if the issue persists.

It's worth a try (just in case there is some indirect effect), but these errors are coming from setting object data, not reading, and the BVH doesn't set anything on objects (or read if I remember correctly, all the data is sent in exactly the same way as octree). Although it could possibly be caused by the client code leaking objects in the BVH, i.e. creating them and not deleting them during a scene change. 🤔

Another experimental option to try if it is was the BVH, is rendering/threads/thread_safe_bvh. This may only be in the 3.4 betas I can't remember when it was added.

I'll see if I can reproduce it.

Also knowing whether it occurs in the latest 3.4 beta would be useful too, just in case it has been fixed since.

There's also an existing RID bug we found in 4.x (#49501) I can't remember if this same vulnerability exists in 3.x. reduz said he would do his own fix but it hasn't materialised yet.

The association with scene switching is interesting. I'm not really familiar with how the editor does scene switching, whether it detaches / attaches branches in a main scene tree or what. It could be there is a dangling reference, maybe a RID or ObjectID that is involved.

EDIT:
The addon in #47051 no longer seems compatible with 3.x dev, so that makes it less useful as an MRP.

@akien-mga akien-mga modified the milestones: 3.3, 3.4 Oct 4, 2021
@Arnklit
Copy link
Contributor

Arnklit commented Oct 4, 2021

Yeah I've been seeing this exact same issue on the last two complicated 3D scenes I've made for my "How Good Can I Make Godot Look" series. Sometimes meshes are just gone until you reopen. But yeah it was so random that I thought it was difficult to log a bug on it.

@akien-mga
Copy link
Member

akien-mga commented Oct 4, 2021

Likely the same issue as #50630, and I remember another one with GridMap tiles randomly disappearing but GitHub search is failing me.
#48790 might also be related.

@akien-mga
Copy link
Member

For people who are able to reproduce the issue, it would be useful to test and track down which build first introduced the bug.

According to #47051 it dates back at least to 3.2.4 RC 5, and I haven't seen it reported against 3.2.3. So it would be good to confirm that 3.2.3 is not affected, and which 3.2.4 beta or RC build introduced the regression.
https://downloads.tuxfamily.org/godotengine/3.2.4/

@mbrlabs
Copy link
Contributor

mbrlabs commented Oct 4, 2021

Yes i remember that issue. I did a lot of testing back then but could not reproduce it reliably (only happend in the editor for me so it was not that critical). I still occasionally have this problem when working on big scenes.

The earliest version i encountered this with was 3.2.4 beta3. See #47051 (comment)

@akien-mga
Copy link
Member

So far it sounds like you might all be on Windows with Nvidia drivers? Not that it's necessarily a factor as that's the most common gamer config ;)

@mbrlabs
Copy link
Contributor

mbrlabs commented Oct 4, 2021

Yes, i'm on Windows 10 with the latest Nvidia (GeForce RTX 2060/PCIe/SSE) driver. I also had this issue with an older GTX card.

@TokisanGames
Copy link
Contributor Author

I will experiment with the bvh settings and other versions and see if I can narrow it down. It might take some time. A change in the bvh was exactly the kind of thing I was looking for, so I hopeful this is it.

I was working on importing the village kit and switching between scenes hundreds of times over a few hours yesterday. I noticed this rendering bug occur twice in all that time. It likely has hidden meshes that weren't obvious. But it still happens consistently, maybe daily.

The console errors may or may not be related.

I'm using the nvdia studio driver. The issue has been going on so long that I've upgraded my drivers at least a few times.

@lawnjelly
Copy link
Member

I'm using the nvdia studio driver. The issue has been going on so long that I've upgraded my drivers at least a few times.

I suspect the relationship with drivers may be a red herring - it is likely this is affecting the timing, and it is likely a semi-random timing issue which is triggering the bug. Driver issues can cause objects to disappear, but I struggle to think of how they could cause error message our side with RIDs etc.

@TokisanGames
Copy link
Contributor Author

Would the BVH apply to lights? Here is 3.3.4 w/ use_bvh disabled in the project settings. I noticed this issue yesterday and occasionally other times also.

Simple scene, no light is rendered.

image

Switching scenes reports errors (below) but does not fix the light. Closing and reopening it once did not fix it. A second time it finally loaded.

image

My other asset scenes I'm making with the same set up render the lighting just fine.

Or what about gizmos? Sometimes they disappear as well. Unselecting and re-selecting the collision shape, no gizmo.

image

Switch to another loaded scene and back, select the object:

image

Errors w/ a few additional line numbers:

ERROR: instance_set_visible: Condition "!instance" is true.
   At: servers/visual/visual_server_scene.cpp:780
ERROR: get: Condition "!id_map.has(p_rid.get_data())" is true. Returned: __null
   At: ./core/rid.h:150
ERROR: instance_set_scenario: Condition "!instance" is true.
   At: servers/visual/visual_server_scene.cpp:629
ERROR: get: Condition "!id_map.has(p_rid.get_data())" is true. Returned: __null
   At: ./core/rid.h:150
ERROR: instance_set_transform: Condition "!instance" is true.
   At: servers/visual/visual_server_scene.cpp:713
ERROR: get: Condition "!id_map.has(p_rid.get_data())" is true. Returned: __null
   At: ./core/rid.h:150
ERROR: instance_attach_skeleton: Condition "!instance" is true.
   At: servers/visual/visual_server_scene.cpp:913
ERROR: get: Condition "!id_map.has(p_rid.get_data())" is true. Returned: __null
   At: ./core/rid.h:150
ERROR: instance_set_base: Condition "!instance" is true.
   At: servers/visual/visual_server_scene.cpp:456
ERROR: get: Condition "!id_map.has(p_rid.get_data())" is true. Returned: __null
   At: ./core/rid.h:150

@lawnjelly
Copy link
Member

lawnjelly commented Oct 5, 2021

Would the BVH apply to lights? Here is 3.3.4 w/ use_bvh disabled in the project settings. I noticed this issue yesterday and occasionally other times also.

Yes it also applies to lights, BVH is the spatial partitioning scheme used for culling. However, if these screenshots etc are still showing problems with the BVH disabled in rendering/quality/spatial_partitioning/use_bvh, then that would seem to rule it out as the direct cause of the problems (because the bug is still occurring when you are using octree instead).

I stress direct because it is also possible that the problem is occurring due to misuse of the spatial partitioning scheme by the client code. I.e. if the client code is creating the same objects multiple times in the tree or not deleting them correctly, the culling can still potentially return this deleted object as being seen by a camera.

One possibility is that these bugs are being caused by multithreaded access to the spatial partitioning scheme, by it octree or bvh. This is not meant to be supported, but to try and identify this as a cause, I have added the project setting rendering/threads/thread_safe_bvh. If you enable this and enable the BVH, and the problem disappears, then it would identify the problem as being caused by multithread access.

There are also lots of other possibilities, this may have nothing to do with spatial partitioning. The reason for the speculation is that the BVH was introduced around the time this bug was first reported, but it may be coincidence.

EDIT:
Looking at the latest error lines, I don't think all of these look likely to be caused by a spatial partitioning error, some of these are called by instances entering or exiting the tree. This could possibly be a race condition of setting these before the instance has been created... 🤔 or the instance is being used after deletion, dangling reference.

This will probably need to be debugged by someone who can reproduce. And just to note that the MRP referenced no longer works in 3.4 dev, so we could do with an up to date MRP - I can't even attempt to debug it at the moment with no MRP.

@TokisanGames
Copy link
Contributor Author

I'm now using 3.x 72a922e with use_bvh and threadsafe bvh.

I just made this scene and had saved it, imported it in a test scene and verified it. Then I just renamed the parent node from Gate to ShedDoorWay, saved the scene, and saw a bunch of console errors pop up all at once, far more than normal. When I looked at my viewport I noticed that RightFrame is invisible.

image

ERROR: Condition "!id_map.has(p_rid.get_data())" is true. Returned: nullptr
   at: RID_Owner<struct VisualServerScene::Instance>::get (D:\gd\godot\core/rid.h:140)
ERROR: Condition "!instance" is true.
   at: VisualServerScene::instance_set_scenario (servers\visual\visual_server_scene.cpp:599)
ERROR: Condition "!id_map.has(p_rid.get_data())" is true. Returned: nullptr
   at: RID_Owner<struct VisualServerScene::Instance>::get (D:\gd\godot\core/rid.h:140)
ERROR: Condition "!instance" is true.
   at: VisualServerScene::instance_attach_skeleton (servers\visual\visual_server_scene.cpp:876)
ERROR: Condition "!id_map.has(p_rid.get_data())" is true. Returned: nullptr
   at: RID_Owner<struct RasterizerStorageGLES3::Mesh>::getornull (D:\gd\godot\core/rid.h:149)
ERROR: Condition "!mesh" is true. Returned: VisualServer::PRIMITIVE_MAX
   at: RasterizerStorageGLES3::mesh_surface_get_primitive_type (drivers\gles3\rasterizer_storage_gles3.cpp:3999)
ERROR: Condition "!id_map.has(p_rid.get_data())" is true. Returned: nullptr
   at: RID_Owner<struct RasterizerStorageGLES3::Mesh>::getornull (D:\gd\godot\core/rid.h:149)
ERROR: Condition "!mesh" is true. Returned: PoolVector<uint8_t>()
   at: RasterizerStorageGLES3::mesh_surface_get_array (drivers\gles3\rasterizer_storage_gles3.cpp:3926)
ERROR: Condition "!id_map.has(p_rid.get_data())" is true. Returned: nullptr
   at: RID_Owner<struct RasterizerStorageGLES3::Mesh>::getornull (D:\gd\godot\core/rid.h:149)
ERROR: Condition "!mesh" is true. Returned: 0
   at: RasterizerStorageGLES3::mesh_surface_get_array_len (drivers\gles3\rasterizer_storage_gles3.cpp:3911)
ERROR: Condition "!id_map.has(p_rid.get_data())" is true. Returned: nullptr
   at: RID_Owner<struct RasterizerStorageGLES3::Mesh>::getornull (D:\gd\godot\core/rid.h:149)
ERROR: Condition "!mesh" is true. Returned: PoolVector<uint8_t>()
   at: RasterizerStorageGLES3::mesh_surface_get_index_array (drivers\gles3\rasterizer_storage_gles3.cpp:3956)
ERROR: Condition "!id_map.has(p_rid.get_data())" is true. Returned: nullptr
   at: RID_Owner<struct RasterizerStorageGLES3::Mesh>::getornull (D:\gd\godot\core/rid.h:149)
ERROR: Condition "!mesh" is true. Returned: 0
   at: RasterizerStorageGLES3::mesh_surface_get_array_index_len (drivers\gles3\rasterizer_storage_gles3.cpp:3918)
ERROR: Condition "!id_map.has(p_rid.get_data())" is true. Returned: nullptr
   at: RID_Owner<struct RasterizerStorageGLES3::Mesh>::getornull (D:\gd\godot\core/rid.h:149)
ERROR: Condition "!mesh" is true. Returned: VisualServer::PRIMITIVE_MAX
   at: RasterizerStorageGLES3::mesh_surface_get_primitive_type (drivers\gles3\rasterizer_storage_gles3.cpp:3999)
ERROR: Condition "!id_map.has(p_rid.get_data())" is true. Returned: nullptr
   at: RID_Owner<struct RasterizerStorageGLES3::Mesh>::getornull (D:\gd\godot\core/rid.h:149)
ERROR: Condition "!mesh" is true. Returned: 0
   at: RasterizerStorageGLES3::mesh_surface_get_format (drivers\gles3\rasterizer_storage_gles3.cpp:3991)
ERROR: Condition "!id_map.has(p_rid.get_data())" is true. Returned: nullptr
   at: RID_Owner<struct RasterizerStorageGLES3::Mesh>::getornull (D:\gd\godot\core/rid.h:149)
ERROR: Condition "!mesh" is true. Returned: AABB()
   at: RasterizerStorageGLES3::mesh_surface_get_aabb (drivers\gles3\rasterizer_storage_gles3.cpp:4007)
ERROR: Condition "!id_map.has(p_rid.get_data())" is true. Returned: nullptr
   at: RID_Owner<struct RasterizerStorageGLES3::Mesh>::getornull (D:\gd\godot\core/rid.h:149)
ERROR: Condition "!mesh" is true. Returned: Vector<AABB>()
   at: RasterizerStorageGLES3::mesh_surface_get_skeleton_aabb (drivers\gles3\rasterizer_storage_gles3.cpp:4047)
ERROR: Condition "!id_map.has(p_rid.get_data())" is true. Returned: nullptr
   at: RID_Owner<struct RasterizerStorageGLES3::Mesh>::getornull (D:\gd\godot\core/rid.h:149)
ERROR: Condition "!mesh" is true. Returned: Vector<PoolVector<uint8_t>>()
   at: RasterizerStorageGLES3::mesh_surface_get_blend_shapes (drivers\gles3\rasterizer_storage_gles3.cpp:4014)
ERROR: Condition "!id_map.has(p_rid.get_data())" is true. Returned: nullptr
   at: RID_Owner<struct RasterizerStorageGLES3::Mesh>::getornull (D:\gd\godot\core/rid.h:149)
ERROR: Condition "!mesh" is true. Returned: PoolVector<uint8_t>()
   at: RasterizerStorageGLES3::mesh_surface_get_array (drivers\gles3\rasterizer_storage_gles3.cpp:3926)
ERROR: Condition "!id_map.has(p_rid.get_data())" is true. Returned: nullptr
   at: RID_Owner<struct RasterizerStorageGLES3::Mesh>::getornull (D:\gd\godot\core/rid.h:149)
ERROR: Condition "!mesh" is true. Returned: 0
   at: RasterizerStorageGLES3::mesh_surface_get_array_len (drivers\gles3\rasterizer_storage_gles3.cpp:3911)
ERROR: Condition "!id_map.has(p_rid.get_data())" is true. Returned: nullptr
   at: RID_Owner<struct RasterizerStorageGLES3::Mesh>::getornull (D:\gd\godot\core/rid.h:149)
ERROR: Condition "!mesh" is true. Returned: PoolVector<uint8_t>()
   at: RasterizerStorageGLES3::mesh_surface_get_index_array (drivers\gles3\rasterizer_storage_gles3.cpp:3956)
ERROR: Condition "!id_map.has(p_rid.get_data())" is true. Returned: nullptr
   at: RID_Owner<struct RasterizerStorageGLES3::Mesh>::getornull (D:\gd\godot\core/rid.h:149)
ERROR: Condition "!mesh" is true. Returned: 0
   at: RasterizerStorageGLES3::mesh_surface_get_array_index_len (drivers\gles3\rasterizer_storage_gles3.cpp:3918)
ERROR: Condition "!id_map.has(p_rid.get_data())" is true. Returned: nullptr
   at: RID_Owner<struct RasterizerStorageGLES3::Mesh>::getornull (D:\gd\godot\core/rid.h:149)
ERROR: Condition "!mesh" is true. Returned: VisualServer::PRIMITIVE_MAX
   at: RasterizerStorageGLES3::mesh_surface_get_primitive_type (drivers\gles3\rasterizer_storage_gles3.cpp:3999)
ERROR: Condition "!id_map.has(p_rid.get_data())" is true. Returned: nullptr
   at: RID_Owner<struct RasterizerStorageGLES3::Mesh>::getornull (D:\gd\godot\core/rid.h:149)
ERROR: Condition "!mesh" is true. Returned: 0
   at: RasterizerStorageGLES3::mesh_surface_get_format (drivers\gles3\rasterizer_storage_gles3.cpp:3991)
ERROR: Condition "!id_map.has(p_rid.get_data())" is true. Returned: nullptr
   at: RID_Owner<struct RasterizerStorageGLES3::Mesh>::getornull (D:\gd\godot\core/rid.h:149)
ERROR: Condition "!mesh" is true. Returned: AABB()
   at: RasterizerStorageGLES3::mesh_surface_get_aabb (drivers\gles3\rasterizer_storage_gles3.cpp:4007)
ERROR: Condition "!id_map.has(p_rid.get_data())" is true. Returned: nullptr
   at: RID_Owner<struct RasterizerStorageGLES3::Mesh>::getornull (D:\gd\godot\core/rid.h:149)
ERROR: Condition "!mesh" is true. Returned: Vector<AABB>()
   at: RasterizerStorageGLES3::mesh_surface_get_skeleton_aabb (drivers\gles3\rasterizer_storage_gles3.cpp:4047)
ERROR: Condition "!id_map.has(p_rid.get_data())" is true. Returned: nullptr
   at: RID_Owner<struct RasterizerStorageGLES3::Mesh>::getornull (D:\gd\godot\core/rid.h:149)
ERROR: Condition "!mesh" is true. Returned: Vector<PoolVector<uint8_t>>()
   at: RasterizerStorageGLES3::mesh_surface_get_blend_shapes (drivers\gles3\rasterizer_storage_gles3.cpp:4014)

If I click on any of the nodes in the tree, I get this:

ERROR: (Node not found: "/root/EditorNode/@@592/@@593/@@601/@@603/@@607/@@611/@@612/@@613/@@629/@@630/@@639/@@640/@@6281/@@6115/@@6116/@@6117/@@6118/@@6119/Gate/LeftFrame/Wall_Gate_L_LOD1" (absolute path attempted from "/root/EditorNode/@@592/@@593/@@601/@@603/@@607/@@611/@@614/@@615/@@616/Scene/@@2847").)
   at: (scene\main\node.cpp:1323)

I've had no problem renaming root nodes in the past. However, I've noticed this editor error recently on previous days too, using other versions (most likely 3.3.4). And it just happened on another scene when I renamed the root node. Renaming root, then saving the scene does work, but I can't do anything with it until I close and reopen it. Then I made another scene and upon importing the asset got those Node not found errors, but everything worked, including renaming root. I also get those errors some times when I do other things like importing files, making them local, and moving the nodes around. Throughout this work session, and many times in 3.3.4 I've continually had these editor node console errors and challenges renaming nodes which I never had in previous versions. Once I get one of these errors, i can't right-click or delete a node unless I go to another scene and back. This occurred in 3.3.4 w/ bvh and today in 3.x w/ threadsafe bvh.

Back to the invisible mesh. Closing and reopening the scene, even restarting Godot, RightFrame is still invisible and gives me these errors:

ERROR: Condition "!(p_format & VisualServer::ARRAY_FORMAT_VERTEX)" is true.
   at: RasterizerStorageGLES3::mesh_add_surface (drivers\gles3\rasterizer_storage_gles3.cpp:3344)
ERROR: Index p_surface = 0 is out of bounds (mesh->surfaces.size() = 0).
   at: RasterizerStorageGLES3::mesh_surface_set_material (drivers\gles3\rasterizer_storage_gles3.cpp:3883)
ERROR: Index p_surface = 0 is out of bounds (instance->materials.size() = 0).
   at: VisualServerScene::instance_set_surface_material (servers\visual\visual_server_scene.cpp:730)
ERROR: Index p_surface = 0 is out of bounds (mesh->surfaces.size() = 0).
   at: RasterizerStorageGLES3::mesh_surface_get_primitive_type (drivers\gles3\rasterizer_storage_gles3.cpp:4000)

Actually, now my Wall_Gate_R_LOD0 mesh is corrupt. The inspector shows ArrayMesh instead of my mesh. LOD1, LOD2 are fine. This is the first time I've seen Godot corrupt my mesh upon saving. I reimported from my fbx and resaved the file. So I don't know if this is a BVH issue or a different one. This mesh disappearred because Godot corrupted it, rather than just not displaying. I haven't yet seen an instance of not displaying yet, but its early.

@lawnjelly
Copy link
Member

lawnjelly commented Oct 7, 2021

Thanks that is more useful info 👍 , it helps us discount threading issue in the spatial partitioning.

I think we have now covered most of the possibles for a bug within spatial partitioning, and it now looks like the bug is elsewhere, and we should be widening the net, especially trying to bisect the commit where this first appeared. This could be a pain because not everyone seems to exhibit the issue, and there aren't premade builds for those commits. It should be possible to bisect to a particular beta though, which would give us a shortlist.

You may well be onto something with the renaming / handling of duplicate nodes.

Again I would ask if you or anyone is able to paste one of these projects into the thread (especially a before / after corruption, if that seems to be happening). Without a minimum reproduction project it is hard to work out what is happening.

@TokisanGames
Copy link
Contributor Author

TokisanGames commented Oct 17, 2021

I've been using a 3.4 build @akien-mga made (v3.4.beta-pr53411-a2e374d.official [a2e374d41]) w/ bvh and threadsafe bvh turned on. It took a while, but it finally hid a mesh. It seems this set up works a lot better than 3.3, which seemed to occur daily or even a couple times a day. This is the first time in several days that it occurred where I caught it. However that does include life, work, and working on other aspects so it's possible this is just an appearance and not reality.

Here you can see I found my Stairs5 failed to render.

image

The object was there and I could select it in the tree. When I moved the gizmo, it did move, and these error messages appeared in the console. Then I opened the stairs5 scene, switched back, and the stairs reappeared. Then I when I moved it, I didn't receive any error messages.

ERROR: Condition "!instance" is true.
   at: instance_set_transform (servers/visual/visual_server_scene.cpp:680)
ERROR: Condition "!id_map.has(p_rid.get_data())" is true. Returned: nullptr
   at: get (./core/rid.h:140)
ERROR: Condition "!instance" is true.
   at: instance_set_visible (servers/visual/visual_server_scene.cpp:745)
ERROR: Condition "!id_map.has(p_rid.get_data())" is true. Returned: nullptr
   at: get (./core/rid.h:140)
ERROR: Condition "!instance" is true.
   at: instance_set_scenario (servers/visual/visual_server_scene.cpp:599)
ERROR: Condition "!id_map.has(p_rid.get_data())" is true. Returned: nullptr
   at: get (./core/rid.h:140)
ERROR: Condition "!instance" is true.
   at: instance_attach_skeleton (servers/visual/visual_server_scene.cpp:876)

What I reported in the last comment was that the mesh was actually corrupted. Every other instance on this issue has only been a display error. I have not encountered that again. One of my team members did have an interesting bug where the mesh was gone from the viewport, and the mesh display in the inspector was blank. Mine have always displayed except the one time when the mesh got corrupted. His mesh did not get corrupted either.

Also what Zylann connected in the terrain ticket above ( Zylann/godot_heightmap_plugin#260 (comment)) is exactly what I was referring to in the original post.

image

@lawnjelly

Again I would ask if you or anyone is able to paste one of these projects into the thread (especially a before / after corruption, if that seems to be happening). Without a minimum reproduction project it is hard to work out what is happening.

I have assets I cannot share. Also the minimum project is not useful because it's a temporary problem. As soon as I switch scenes, it goes away. If I saved it for you, closed it and reopened it on my own system, the problem is gone. There's nothing in the file that is a problem. You just have to work with a large project yourself if you want to see it on your own system.

@TokisanGames
Copy link
Contributor Author

TokisanGames commented Oct 17, 2021

It just happened with hterrain. Twice in one day with threadsafe bvh turned on. :/ Switching scenes does not turn it back on. I had to close and reopen the scene to fix it.

image

@akien-mga akien-mga changed the title 3.3 Mesh Instances not rendering Mesh Instances randomly not rendering in the editor (regression since 3.3) Oct 18, 2021
@TokisanGames
Copy link
Contributor Author

Mesh Instances randomly not rendering in the editor (regression since 3.3)

@akien-mga though almost all of the instances I have experienced have been in the editor, I have had meshes not appear while running in game since 3.2.1. And since the editor is a running game, there's really little difference between the renderer and visual server for either scenario.

@akien-mga akien-mga changed the title Mesh Instances randomly not rendering in the editor (regression since 3.3) Mesh Instances randomly not rendering (regression since 3.3) Oct 19, 2021
@TokisanGames
Copy link
Contributor Author

I tried running my game in 3.2.1, but there are too many problems with it. Code that should work is invalid. It doesn't read all of the meshes that were imported in 3.3 or 3.4 and saved in a native mesh format. So I'm not going to work backward.

Instead, I built a new 3.x da5c843 debug version w/ threadsafe BVH on, and worked today while running the editor through the MSVC debugger. I set some breakpoints in the code and recorded a video of my screen whenever I hit an error to debug the callstack and objects.

Here are some conclusions so far:

  1. The errors are happening all the time. While opening, closing, and creating object scenes, and switching between scenes, I'm hitting a block of 3-4 errors at least hourly if not more like every 10-15 minutes. However the times I hit an error that provides an obvious visual artifact is much rarer, maybe 1/10th the time.
  2. The problems occur when VisualServer wants to set_visible, set_transform, or set_scenario on a particular resource (RID), which happens on all objects every time it switches scenes, including editor viewport objects (e.g. SpatialEditorViewport whatever that is). Only once in a while does it trigger on set_visible, hence point 1. On each scene switch, the scene goes through _enter_tree() and _exit_tree().
  3. The Set (red-black tree) that stores the RIDs of the objects in the scene is not properly populated. These errors are occurring because the RIDs of the objects in the scene are missing from the Set of RIDs the visual server queries.

Here is one of the three areas where the error messages come from, where VisualServer attempts to get the instance of the RID.

VisualServerScene::instance_set_transform(RID p_instance, const Transform &p_transform) {
	Instance *instance = instance_owner.get(p_instance);
	ERR_FAIL_COND(!instance);

instance_owner is an RID_Owner, which is a class with a Set called id_map. A Set is a red-black tree and is used to contain all RIDs for the scene: Set<RID_Data *> id_map;

Here is instance_owner.get()

rid.h::get(p_rid)
		ERR_FAIL_COND_V(!p_rid.is_valid(), nullptr);
		ERR_FAIL_COND_V(!id_map.has(p_rid.get_data()), nullptr);

The first line doesn't produce an error, so the RID object is valid.

I expanded the second macro and added some debugging code to determine that in almost all cases the RID object and the resource ID integer within are valid. However when this fails, the ID is NOT in the Set (id_map). In only one instance today was the RID data invalid.

  1. I dumped the entire red-black tree across several error instances to confirm that the element wasn't just misplaced. It's unlikely the problem is in the implementation of the tree. It's much more likely that the problem occurs when the Set is being populated. On another day I'll explore that.

Here's a video showing my debugging progress. It's long, and gets more efficient in later chapters. Look for the chapter markers on the timeline. I captured 5 error sessions, but only one was visual, at 1:06:45.

https://youtu.be/ozET9itjeD4

@lawnjelly
Copy link
Member

lawnjelly commented Oct 20, 2021

Yes RIDs are used as handles to the objects in the visual server. Your bug is most likely occurring because the objects have been deleted from the owner lists, but there are still dangling references.

Essentially there are two obvious possibilities:

  • The RID itself is garbage. This can happen due to overwritten data but is usually less likely.
  • The RID is a handle to an object that has been previously deleted.

Debugging it may involve finding out what area of code is freeing the RIDs in question (looking at the call stack for the free, most probably VisualServerScene::free), and then tracking down where the RID dangling reference is still being held (call stack).

The difficulty may be there will be a lot of legit 'free' calls to the VisualServer, and you won't know ahead of time which are likely to cause the problem.

New / Free debug database

This is such a common problem (I've encountered it on most codebases / engines I've worked on) and usually the solution for tracking these bugs is to have a debug build memory tracker that maintains a database of where the 'new' and 'delete' calls are coming from, in terms of lines of code and files etc. Then when an out of order delete occurs, it can easily pinpoint the offending line.

Godot doesn't have this yet. It is something that could potentially be added but it's a lot easier when you have a reproducible example bug. It's a bit complex to add to Godot, because these new / delete calls are all happening indirectly through the VisualServer, so you essentially need to add some macro magic to the call site, and do everything to pass this through to the VisualServer, and also have this turn on / offable with the DEV_ENABLED build flag.

#43826 (or the code within) could be useful for printing stack traces once the offending free has been found.

You can alternatively rig it so that when you call free, it doesn't actually free the objects but just sets a flag on them to mark them as freed, then check every time you get a RID from an owner whether the flag is invalid. And hope it hits the bug before you run out of RAM, because you will essentially be leaking everything in this situation.

Where the bug lies

I suspect the bug is in the editor code somewhere (or plugin, addon or user code), or something triggered from here, something out of order or a threading race condition, in the scene loading / unloading code maybe. I have no idea what the editor does to change scenes, I generally know very little about the editor code.

You may stumble on the problem by luck, but these types of bugs can turn out to be incredible wild goose chases, and it is usually better to add proper debugging mechanisms as above, for this and future bugs of this type.

@TokisanGames
Copy link
Contributor Author

TokisanGames commented Nov 3, 2021

Thanks. I didn't know about that one. It certainly looks suspicious, and that class shows up in some of my call stack traces. I was debugging it quite a bit today, but nothing conclusive yet.

SpatialEditor is sus. It does some weird stuff.

Also I just realized where I pointed out that errors were occurring (add_child) is only when the problem discovered and messages are printed. The RIDs are probably being freed before then, and my logs should be able to identify that.

@TokisanGames
Copy link
Contributor Author

Turning off the grid in the viewport prevents the error messages and symptoms on my system. Specifically disabling all three Editor->Editor Settings->Editors->3d->grid_*_plane, though only xz is enabled by default.

Guys, please test this on your systems and let me know if you get any !id_map.has(p_rid.get_data()) error messages. You should be able to test this with any build of 3.3 and later.

If that fixes it for everyone, then it looks like I've solved the issue.

The Cause

SpatialEditor has two 3-item arrays: grid and grid_instance. These hold the meshes and instances of the 3D grid shown in the viewports for each of the 3 grid planes.

These arrays are continually freed and repopulated whenever the camera or an object is moved, or several other events. They are repopulated only if the grid is enabled. Which means that sometimes (all the time), the old RIDs are repeatedly freed.

This is the root problem: It is not safe to free RIDs multiple times. When they are freed, they leave a dangling pointer.

Every update cycle in SpatialEditor, RIDs are sent to the visual server to be freed. Sometimes with current pointers, sometimes with dangling pointers.

VisualServer has no way to tell if an RID is valid or not. rid.is_valid() is true even if its dangling, and rid.get_id()!=0) will cause an exception if the pointer is dangling.

VS gets the pointer from each RID and searches for it in each of ~20 red black trees to see if the pointer matches any instance in any tree. If there's no match, nothing happens. However if there is a match, the instance is removed from the tree.

When you have a scene with hundreds of objects, it only takes a few scene switches to generate thousands of compares. Eventually it finds a match and a random instance gets removed from whichever r/b tree owned it (cameras, lights, meshes, etc).

The instance is still referenced elsewhere in the engine, but since the VisualServer can't find the instance in any of its id_maps, it spams the console with error messages, usually for the same instance over and over. Switching scenes or closing/reopening scenes causes SpatialEditor to repopulate its grid arrays, but it could free another random instance.

The Fix

I could just edit SpatialEditor and manually assign RID() to the grid arrays after free(). However this issue has highlighted a bigger problem in the engine. Freeing RIDs leaves dangling pointers. It's not safe to free an RID multiple times. If you do, you could remove instances from random id_maps throughout the engine, not just in VisualServer but also in Physics and anywhere else they are used.

My PR will tweak RID and every instance of free(RID) to enforce cleaning up these dangling pointers (passing the RIDs by reference and setting _data=nullptr). This also means RID.is_valid() will actually mean something. I've already got it working in all the VisualServers. I just need to do it for Physics and any other place. I'll push up a PR probably tomorrow with the RID changes and VS::free() sanity checks.

Also, RID.h, handles release and debug mode differently. Debug mode searches a red/black tree to see if it contains an RID. In release mode it doesn't even use a red/black tree. Instead the RID just keeps a direct reference to its owner. I really don't like having two versions of code, with release mode having more functions available. I intend to bring the two versions closer together to reduce complexity of finding bugs in the future.

@lawnjelly
Copy link
Member

lawnjelly commented Nov 5, 2021

@tinmanjuggernaut well done for finding this (and @Zylann )! Yes that does sound plausible.

Before spending any time attempting changes to RID I'd recommend discussing with reduz (on godot rocket chat), as it is a very core area of the engine (he may want to make any changes there himself, or propose a different solution).

@Zylann
Copy link
Contributor

Zylann commented Nov 5, 2021

My PR will tweak RID and every instance of free(RID) to enforce cleaning up these dangling pointers (passing the RIDs by reference and setting _data=nullptr)

You cannot have this work for the script API, free is exposed and RIDs cannot be passed by reference there... so your best option still remains to reset every freed RID afterward.
In C++ and GDScript I've been avoiding this mess by creating thin wrappers, which is why I knew it wasn't a problem on my side:
https://github.com/Zylann/godot_voxel/blob/370bce9de37d7a89f929e62ec01863c927ffbb7c/util/godot/direct_mesh_instance.h#L10
https://github.com/Zylann/godot_heightmap_plugin/blob/master/addons/zylann.hterrain/util/direct_mesh_instance.gd (interest is more limited in GDScript , but in C++ it's liberating)

Besides it would really be cleaner if freeing an empty RID generated a proper error so issues can be found earlier, it's prone to mistakes.

@TokisanGames
Copy link
Contributor Author

I had to tear RID apart to find the issue, so I'm familiar with it. I will ask reduz.

Either every caller of any free(RID) needs to also do rid=RID(), or RID needs to clean itself up, or a combination, like it's clean in the engine and dirty in script.

I currently have it working the original way for the script API, and a clean way for the engine. Several areas of the engine regularly send bad data to free.

As for script, it could possibly generate an error if desired, but there's no way for it to know that the pointer is dangling, unless 1) it try{}s to access the memory and 2) it's not pointing to a valid object, but this issue demonstrates that it does far too often. Many times I watched SpatialEditor, which should have only referenced VisualInstance, referencing AnimationNodeOneShot.

@TokisanGames
Copy link
Contributor Author

Find the PR at #54650.
Juan asked @RandomShaper to be the primary reviewer, but I invite all of you guys to look and comment.

@RandomShaper
Copy link
Member

RandomShaper commented Nov 6, 2021

It would be awesome to get an MRP based in the project in the original post. The one in #47051 has indeed a problem related to tab switching, but it's no longer the one reported there.

What happens there now is related to timing. If you switch to and away from the fur ball scene, a piece of script in the Fur addon causes errors about certain nodes not being in the tree:

func update_physics_object(delay : float) -> void:
	yield(_shell_fur_object.get_tree().create_timer(delay), "timeout")
	_physics_pos = _current_physics_object().global_transform.origin
	_physics_rot = _current_physics_object().global_transform.basis.get_rotation_quat()

While the fur scene is the active one, update_physics_object() is called. If you switch away from it in a shorter time than delay (which seems to be 0.5), by the time the yield() ends, the "current physics object" is no longer in the tree, causing the errors. I woulnd't call that a Godot bug. It may be possible to freeze the state of a scene in a tab that becomes inactive and unfreeze when it's active again, but it's possibly not worth the hassle. Sounds like a non trivial at all thing to implement.

@akien-mga
Copy link
Member

Fixed by #54650.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants