Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[3.x] Clear freed RIDs to fix disappearing meshes and id_map errors #54650

Merged
merged 1 commit into from
Dec 9, 2021

Conversation

TokisanGames
Copy link
Contributor

@TokisanGames TokisanGames commented Nov 5, 2021

Background

#53374 highlights an issue of disappearing meshes and error spam.

The root cause is when RIDs are freed, they leave a dangling pointer. Freed RIDs are not safe to free again, at least by the VisualServer or PhysicsServer. Doing so may randomly free other mesh instances if the invalid pointer matches one already in use by another RID.

Some areas in the engine, such as SpatialEditor continually spam VisualServer::free(RID) with already freed RIDs, so though uncommon, freeing RIDs that other classes own occurs far too frequently.

Changes

  • This PR manually clears RIDs wherever they are freed by VisualServer or PhysicsServer in the engine, and possibly reused (i.e. not freed in the destructor) with rid = RID();.

Fixes #53374, #44642, #38229, #48433, #47356, #44712
Probably fixes: #41360, #50630

core/rid.h Outdated Show resolved Hide resolved
core/rid.h Outdated Show resolved Hide resolved
@TokisanGames TokisanGames changed the title RID enforces clearing dangling pointers [WIP] RID enforces clearing dangling pointers Nov 6, 2021
@lawnjelly
Copy link
Member

lawnjelly commented Nov 6, 2021

I haven't examined in great details the VisualServer multithread wrapping / queue system as it contains a bunch of macro magic, but as far as I understand it, the free calls for the RIDs pass through this, so there may be a delay and it may be taking place on a different thread, so race conditions may occur, so I'm a bit wary we cover any possibilities due to this. This afaik is why the majority of fast functions have no return value (as that could potentially stall such a queue).

I'm sure RandomShaper will be more familiar but to me it seems there is a potential danger of the ordering not being as expected:
Client code:

1) Call free on RID
2) Delete RID

Server code

3) Receives free call for deleted RID object
4) accesses freed memory etc... crash / corruption

Whereas with the existing system of passing the RID by value this danger is avoided. This could easily happen if a RID is used e.g. as a local variable.

EDIT : All that said I'm not super familiar with RIDs (having just treated them like black boxes 😁 ), I'm having a look now. Ahh seeing that they actually do have a shared RID_Data which the objects derive from may make it a bit easier, and may work around the above problem making it moot - the RID itself is just a glorified pointer it looks like, and doesn't do anything on deletion. Ah but it can still crash I think, because you could be passing a RID by reference that has since been deleted. When it tries to set _data to NULL on the RID .. boom.

Second thing
I'm not sure I understand why setting _data in RID to null (by reference) will help in the situation where the RID is only an intermediate:

  • RID stored in list
  • calls another function passing RID by value
  • this function calls free, the intermediate RID _data is set to null
  • RID in the list still contains a non-null RID...

Overall the RIDs in 3.x do look a bit error prone in terms of lifetime and validity, and I wonder why they weren't implemented as handles which are usually better in this regard, or at least handles with pointers that can be refreshed on demand and checked for validity etc. Maybe handles will have threading overhead anyway because in the Godot 'free-for-all' everytime you need to get / check one, you strictly speaking need to check for race conditions etc where something else has freed the object. I gather they have been changed quite a bit in 4.x, I wonder if any of the improvement ideas there could be backported.

Irrespective of whether we keep with pointers in 3.x, having a DEV_ENABLED implementation that uses handles (maybe in parallel to the pointers) might be an effective way of tracking down misuse errors. 🤔

EDIT: Yes on looking at 4.x, it looks like it is using pretty much handles, with a pool ID and a counter. I don't think this would be a big deal to get working in 3.x. See what @RandomShaper thinks.

core/rid.h Outdated Show resolved Hide resolved
core/rid.h Outdated Show resolved Hide resolved
@lawnjelly
Copy link
Member

Just to keep up to date on this, I've discussed this with @RandomShaper and I'm trying out an implementation in 3.x with handles. This may end up being a better approach for RIDs overall in 3.x (similar to 4.x), or for just for bug finding in a specific build.. we'll know better soon, but don't spend too much time on this in the meantime.

What would be useful though if you are interested is if you can track down the bugs in the spatial editor and do a fix PR for these only. This will be useful whatever approach we use for the RIDs, and it's good practice anyway to have one fix per PR (as is there seems to be two things in this PR).

@TokisanGames
Copy link
Contributor Author

@lawnjelly has rewritten RID to use handles as is done in 4.0 with PR #54907. Once that PR is merged, I'll redo this PR to address the other issues in VisualServer and SpatialEditor (# 3 in OP, 1,2,4 are replaced.).

@akien-mga
Copy link
Member

@tinmanjuggernaut It's not visible to me either, I guess it wasn't sent. Maybe it's in a pending review?

@TokisanGames
Copy link
Contributor Author

@akien-mga It does say "Pending" but I have no option to have it not pend.

@lawnjelly Here is my comment:

The old code freed every grid_instance and grid, every tick, regardless of whether they were in use or not, and valid or not.

Now it only considers the ones that are enabled, using the same wonky iteration scheme defined in SpatialEditor::_init_grid() here:

https://github.com/godotengine/godot/blob/30033e7670346b705a10ffe0253d813c1c1db8a7/editor/plugins/spatial_editor_plugin.cpp#L5752

Where the RIDs are created in the two paragraphs here:
https://github.com/godotengine/godot/blob/30033e7670346b705a10ffe0253d813c1c1db8a7/editor/plugins/spatial_editor_plugin.cpp#L5846

@lawnjelly
Copy link
Member

Ah sorry for confusion, it may be some moderation thing just until your first PR is merged, something like that. 🙂

I'm not sure the if grid_enabled is wise to add in this PR. Reason being that it opens up potential for leaks, either now or in future. Consider if someone changes grid_enabled[i] while a RID still exists.

So I would go for simply setting to NULL here, and we can evaluate improvements in another PR?

As an aside the whole grid thing here I can see a few potential problems so it could do with an overhaul, but it would be nice to just get the immediate bugs fixed first.

Copy link
Member

@lawnjelly lawnjelly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look good! 😄
Sorry for being picky it's always traumatic especially with first PRs. Great work on finding this! 👍

@TokisanGames
Copy link
Contributor Author

TokisanGames commented Dec 9, 2021

Alright, but stand by. I'm putting in a few more if... is_valid() checks that should be in these files.

@akien-mga
Copy link
Member

BTW is the list of fixed issues in the OP still up-to-date with the current state of this PR?

@lawnjelly
Copy link
Member

BTW is the list of fixed issues in the OP still up-to-date with the current state of this PR?

Ah yes this is correct the commit message needs to be properly filled out. See:
https://github.com/godotengine/godot/blob/master/CONTRIBUTING.md

@TokisanGames
Copy link
Contributor Author

Ok. I added more is_valid checks to every instance where they were missing. I also adjusted the commit message.

It should fix all of the Fixes I listed. The probablies and bugsquad add are unknown.

@TokisanGames
Copy link
Contributor Author

I have moved the assignments inside the if statements. I have limited the commit message to only fixing my bug report.

@akien-mga
Copy link
Member

Thanks!

@akien-mga
Copy link
Member

Cherry-picked for 3.4.1.

@EzraT
Copy link

EzraT commented Dec 12, 2021

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants