New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

OSX FileSystemWatcher needs to root RunningInstance #51538

Closed

carlossanlop wants to merge 1 commit into dotnet:main from carlossanlop:gchandle

Member

carlossanlop commented Apr 20, 2021 •

edited

Loading

Fixes #30056

This is an implementation of the hypothesis shared here: #30056 (comment)

FileSystemWatcher is currently causing CI failures, and one potential root cause may be that we are using an event that was generated by an object that is not rooted. The issue might go away if we root the RunningInstance instance and get it freed when the FileSystemWatcher is disposed.


          POC: OSX FileSystemWatcher needs to root RunningInstance

2399c5e

carlossanlop added the area-System.IO label

carlossanlop self-assigned this

ghost commented Apr 20, 2021

Tagging subscribers to this area: @carlossanlop
See info in area-owners.md if you want to be subscribed.

Issue Details

Proof of concept. Please do not review yet.

I'm quickly testing the hypothesis shared here: #30056 (comment)

I'm currently having trouble building on my Macbook so I need to use the CI.

Author:	carlossanlop
Assignees:	carlossanlop
Labels:	`area-System.IO`
Milestone:	-

carlossanlop added the NO-MERGE label

carlossanlop mentioned this pull request

SafeHandle use-after-dispose in FileSystemWatcher on OSX #30056

Closed

carlossanlop removed the NO-MERGE label

carlossanlop changed the title ~~[Do not merge] Test OSX FileSystemWatcher needs to root RunningInstance~~ OSX FileSystemWatcher needs to root RunningInstance

carlossanlop requested a review from stephentoub

April 22, 2021 20:16

carlossanlop marked this pull request as ready for review

April 22, 2021 20:17

carlossanlop requested review from ericstj, danmoseley, jkotas, wfurt, jozkee and adamsitnik

April 22, 2021 20:19

carlossanlop commented

View reviewed changes

src/libraries/System.IO.FileSystem.Watcher/src/System/IO/FileSystemWatcher.OSX.cs

@@ @@ -49,6 +55,11 @@ private void StartRaisingEvents() @@
                               _enabled = true;
                               var instance = new RunningInstance(this, _directory, _includeSubdirectories, TranslateFlags(_notifyFilters));
+                              if (_cachedRunningInstance.IsAllocated)
+                              {
+                                  throw new Exception("Should've called StopRaisingEvents before restarting");

Member Author

carlossanlop Apr 22, 2021 •

edited

Loading

Obviously need a resource string here. But before I add one, do we want to throw here? Should we simply reuse the existing running instance if the user accidentally calls StartRaisingEvents more than once in a row?

The previous behavior was to create a brand new instance every time, which is why I thought it would make sense to continue trying to do the same.

Another option is to free any allocated instances, then set the new instance with Alloc.

Member

danmoseley Apr 22, 2021

To today if I call Start twice in a row, do I experience the same behavior as if I called Start once? Ie, it's effectively a no-op? If so then it seems to me that we don't have any good reason to break them. We essentially document that any redundant Start calls are ignored, and preserve that behavior.
If instead today the user gets into a bad state, then starting to throw might be helpful.

Member

stephentoub Apr 23, 2021

Should we simply reuse the existing running instance if the user accidentally calls StartRaisingEvents more than once in a row?

Can you elaborate on what someone's code would look like to call StartRaisingEvents twice in a row? The method is private. What path through the code would enable that, other than erroneously using an FSW instance in parallel?

carlossanlop commented

View reviewed changes

src/libraries/System.IO.FileSystem.Watcher/src/System/IO/FileSystemWatcher.OSX.cs

+                          if (_cachedRunningInstance.IsAllocated)
+                          {
+                              _cachedRunningInstance.Free();

Member Author

carlossanlop Apr 22, 2021

Do we want it freed in StopRaisingEvents, or should it be freed only on FinalizeDispose?

Member

danmoseley Apr 22, 2021

If it is held by RunningInstance then it should presumably be freed in the Dispose of RunningInstance. And you call Dispose here unless you want to preserve the RunningInstance across Start/Stop calls. Which depends on the semantics of FSW. I'm guessing the semantics are that we release all native resources when you do Stop, in case it holds a handle to the directory being watched for example. So in that case StartRaisingEVents creates a RI and StopRaisingEvents disposes the RI

Member

danmoseley Apr 22, 2021

and of course Dispose should also release it in case StopRaisingEvents never got called.

danmoseley reviewed

View reviewed changes

src/libraries/System.IO.FileSystem.Watcher/src/System/IO/FileSystemWatcher.OSX.cs Show resolved Hide resolved

danmoseley reviewed

View reviewed changes

src/libraries/System.IO.FileSystem.Watcher/src/System/IO/FileSystemWatcher.OSX.cs

@@ @@ -20,11 +20,17 @@ namespace System.IO @@
               {
                   public partial class FileSystemWatcher
                   {
+                      private GCHandle _cachedRunningInstance;

Member

danmoseley Apr 22, 2021

nit, newline after.

jkotas reviewed

View reviewed changes

src/libraries/System.IO.FileSystem.Watcher/src/System/IO/FileSystemWatcher.OSX.cs

@@ @@ -20,11 +20,17 @@ namespace System.IO @@
               {
                   public partial class FileSystemWatcher
                   {
+                      private GCHandle _cachedRunningInstance;

Member

jkotas Apr 22, 2021

Should this GCHandle rather be in the RunningInstance ? RunningInstance has the callback that the GCHandle protects, it would be best if this GCHandle is close to where the protected callbacks are.

Member

jkotas Apr 22, 2021

You can actually simplify the RunningInstance to just use function pointer + GCHandle once you do that. The marshaled delegate that it uses right now is less efficient than function pointer + GCHandle.

kevincathcart-cas commented Apr 23, 2021

There is a lot that is questionable here (in the underlying code, I'm not talking about the proposed changes per-se).

The RuntimeInstance is normally de-facto rooted while the FileSystemWatcher is rooted (at least up until the FileSystemWatcher is disabled or disposed). This happens though a rather roundabout route: One of its methods (CleanupEventStream()) is registered as a callback on the CancellationTokenSource that gets held by the FileSystemWatcher. But this feels like an awfully indirect way to ensure the RuntimeInstance remains rooted.

It is worth noting that CleanupEventStream() is thread-safe, and it needs to be, because it can get called from both the callback, and the from the the FileSystemWatcher's finalizer at pretty much the same time, just by chance, since both the callback and the finalizer will try to clean things up if the watcher is no longer reachable. However, this implies it is both the callback and the FileSystemWatcher's responsibility to stop the RuntimeInstance if the watcher becomes unreachable.

That would almost make sense if leaking RuntimeInstances could happen (i.e. having the CancellationTokenSource not reference a still executing RuntimeInstance), and if the RuntimeInstance also rooted itself, because it would let it eventually clean itself after the watcher has gone away when/if the callback gets triggered, but if this were supposed to be such a weird edge case failsafe, I would have expected a comment saying as much. Instead this just looks a lot like confusion of responsibilities.

It is also very much unclear why CleanupEventStream() bothers calling FSEventStreamStop() against the SafeHandle, when the SafeHandle does this itself in its ReleaseHandle method. Furthermore note the try finally around it. If that were actually needed, then why does the exact same call not need that same logic inside the SafeHandle's ReleaseHandle method?

@stephentoub's analysis would basically be that the RuntimeInstance was somehow not rooted via the CTS, so the FSW's finalizer does not trigger CleanupEventStream(). But the callback comes in, and enters CleanupEventStream() while racing the SafeHandle's finalizer. Sure that absolutely does seem like it could cause a crash, and one that having a GCHandle root the RuntimeInstance could avoid. However, I'm not clear on how the RuntimeInstance would end up not rooted by the CancellationTokenSource, while still allowing the callbacks to happen.

It sure looks like that would only happen if:
a) a FSW is invalidly being manipulated by multiple threads at once, such that two StartRaisingEvents() are running at once, or
b) MacOS's FSEventStreamStart function succeeds but returns false.

jkotas added a commit to jkotas/runtime that referenced this pull request


          Cleanup lifetime management in OSX FileSystemWatcher

f03049d

May fix dotnet#51538

jkotas mentioned this pull request

Cleanup lifetime management in OSX FileSystemWatcher #52019

Merged

Member

jkotas commented Apr 28, 2021

I have posted #52019 with a different take on this issue.

carlossanlop closed this

carlossanlop deleted the gchandle branch

April 29, 2021 02:50

jkotas added a commit that referenced this pull request


          Cleanup lifetime management in OSX FileSystemWatcher (#52019)

964b500

May fix #51538

karelz added this to the 6.0.0 milestone

ghost locked as resolved and limited conversation to collaborators

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Reviewers

jkotas jkotas left review comments

danmoseley danmoseley left review comments

stephentoub Awaiting requested review from stephentoub

ericstj Awaiting requested review from ericstj

wfurt Awaiting requested review from wfurt

jozkee Awaiting requested review from jozkee

adamsitnik Awaiting requested review from adamsitnik

Labels