Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Latent Coroutine Resumes for Unloaded Streamed Actor, Crash #19

Closed
nwilson100 opened this issue Sep 30, 2023 · 5 comments
Closed

Latent Coroutine Resumes for Unloaded Streamed Actor, Crash #19

nwilson100 opened this issue Sep 30, 2023 · 5 comments

Comments

@nwilson100
Copy link

Context:

The issue occurs when using a latent coroutine (with FForceLatentCoroutine) on an Actor that gets unloaded as a result of asynchronously unloading a level using level streaming. Specifically in my case, UGameplayStatics::UnloadStreamLevelBySoftObjectPtr.

The apparent issue observed is that coroutines resume on actors that have been unloaded from the world. At the point of resuming, the __coro_frame_ptr's __this points to the Actor that has been unloaded: Their UWorld * is null, and their ActorHasBegunPlay data member has been set to HasNotBegunPlay. While using this appears to be a valid memory access, the coroutine has resumed for an actor who is no longer a part of any UWorld, resulting in crashes for any functionality that requires the actor to have a world. It's a valid Actor in terms of existing in memory, but you certainly would not want to do anything with it at that point.

I'm assuming this behavior is undesirable, as the latent coroutine continues running on the actor's original world that it was loaded into, despite that actor no longer existing in the world, but still existing in memory. While we have latent this protection, there doesn't seem to be protection against resuming on actors that have been unloaded and shouldn't be executing its usual gameplay code.

Expected/Desired behavior:

Some mechanism should prevent coroutines from resuming if the owning Actor has been unloaded via streaming. Additionally, if possible, other UObject's who have that unloaded Actor set as their Outer will also have their coroutines cancelled in a way that prevents resuming, so they also are not left executing without a UWorld in their outer chain or with an owner that shouldn't be used.

Repro steps:

This issue reproduces for me 100% of the time.

  1. Ensure "Use Background Level Streaming" is enabled in project settings.
  2. Create an Actor class which defines a coroutine with FForceLatentCoroutine.
  3. For ease reproducing a crash, define the coroutine such that it does something with the pointer from GetWorld() frequently. My test uses co_await UE5Coro::Latent::Seconds to access GetWorld() at an interval of 0.1 seconds.
  4. Run this Coroutine on BeginPlay().
  5. Place the actor in a level which you can load via level streaming.
  6. Load the level using UGameplayStatics::LoadStreamLevelBySoftObjectPtr.
  7. Wait at least a moment before unloading, to ensure the coroutine has began.
  8. Unload the level using UGameplayStatics::UnloadStreamLevelBySoftObjectPtr.
  9. During the unload process, the game crashes due to the coroutine resuming after the actor has been unloaded and its world is null.
  10. While debugging the crash, check the __coro_frame_ptr and look into the data of __this to see that object is in an unloaded state. ActorHasBegunPlay will be HasNotBegunPlay, and the Actor will have no UWorld.
@nwilson100
Copy link
Author

Update:
In the provided repro case, sometimes the latent coroutine resumes after "this" has become invalid, so calling any method on the actor results in a crash and its use is undefined behavior. Here is a snippet of an example __coro_frame_ptr in this state. In this way, it appears unloading an actor can also cause latent coroutine "this" protection to be unreliable.

image

@landelare
Copy link
Owner

Thank you for the detailed bug report! I managed to reproduce something similar myself.

@landelare
Copy link
Owner

@nwilson100 is the test branch any better?

@nwilson100
Copy link
Author

@landelare The fix worked! I tested two cases for 1000 repeats with automation that reproduced consistently before (with the worst case at 1/10 repro rate). The first case I tested the fix against was the one shared with the report, and the second case involved a coroutine running in a UObject who had an actor component as its outer, then unloading the owning actor.

Thanks for the fix and great work with this plugin.

@landelare
Copy link
Owner

Thank you again for the detailed bug report, the fix will ship in 1.10.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants