Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OSX FileSystemWatcher needs to root RunningInstance #51538

Closed
wants to merge 1 commit into from

Conversation

carlossanlop
Copy link
Member

@carlossanlop carlossanlop commented Apr 20, 2021

Fixes #30056

This is an implementation of the hypothesis shared here: #30056 (comment)

FileSystemWatcher is currently causing CI failures, and one potential root cause may be that we are using an event that was generated by an object that is not rooted. The issue might go away if we root the RunningInstance instance and get it freed when the FileSystemWatcher is disposed.

@ghost
Copy link

ghost commented Apr 20, 2021

Tagging subscribers to this area: @carlossanlop
See info in area-owners.md if you want to be subscribed.

Issue Details

Proof of concept. Please do not review yet.

I'm quickly testing the hypothesis shared here: #30056 (comment)

I'm currently having trouble building on my Macbook so I need to use the CI.

Author: carlossanlop
Assignees: carlossanlop
Labels:

area-System.IO

Milestone: -

@carlossanlop carlossanlop added the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Apr 20, 2021
@carlossanlop carlossanlop removed the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Apr 22, 2021
@carlossanlop carlossanlop changed the title [Do not merge] Test OSX FileSystemWatcher needs to root RunningInstance OSX FileSystemWatcher needs to root RunningInstance Apr 22, 2021
@carlossanlop carlossanlop marked this pull request as ready for review April 22, 2021 20:17
@@ -49,6 +55,11 @@ private void StartRaisingEvents()
_enabled = true;

var instance = new RunningInstance(this, _directory, _includeSubdirectories, TranslateFlags(_notifyFilters));
if (_cachedRunningInstance.IsAllocated)
{
throw new Exception("Should've called StopRaisingEvents before restarting");
Copy link
Member Author

@carlossanlop carlossanlop Apr 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Obviously need a resource string here. But before I add one, do we want to throw here? Should we simply reuse the existing running instance if the user accidentally calls StartRaisingEvents more than once in a row?

The previous behavior was to create a brand new instance every time, which is why I thought it would make sense to continue trying to do the same.

Another option is to free any allocated instances, then set the new instance with Alloc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To today if I call Start twice in a row, do I experience the same behavior as if I called Start once? Ie, it's effectively a no-op? If so then it seems to me that we don't have any good reason to break them. We essentially document that any redundant Start calls are ignored, and preserve that behavior.
If instead today the user gets into a bad state, then starting to throw might be helpful.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we simply reuse the existing running instance if the user accidentally calls StartRaisingEvents more than once in a row?

Can you elaborate on what someone's code would look like to call StartRaisingEvents twice in a row? The method is private. What path through the code would enable that, other than erroneously using an FSW instance in parallel?


if (_cachedRunningInstance.IsAllocated)
{
_cachedRunningInstance.Free();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want it freed in StopRaisingEvents, or should it be freed only on FinalizeDispose?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is held by RunningInstance then it should presumably be freed in the Dispose of RunningInstance. And you call Dispose here unless you want to preserve the RunningInstance across Start/Stop calls. Which depends on the semantics of FSW. I'm guessing the semantics are that we release all native resources when you do Stop, in case it holds a handle to the directory being watched for example. So in that case StartRaisingEVents creates a RI and StopRaisingEvents disposes the RI

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and of course Dispose should also release it in case StopRaisingEvents never got called.

@@ -20,11 +20,17 @@ namespace System.IO
{
public partial class FileSystemWatcher
{
private GCHandle _cachedRunningInstance;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, newline after.

@@ -20,11 +20,17 @@ namespace System.IO
{
public partial class FileSystemWatcher
{
private GCHandle _cachedRunningInstance;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this GCHandle rather be in the RunningInstance ? RunningInstance has the callback that the GCHandle protects, it would be best if this GCHandle is close to where the protected callbacks are.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can actually simplify the RunningInstance to just use function pointer + GCHandle once you do that. The marshaled delegate that it uses right now is less efficient than function pointer + GCHandle.

@kevincathcart-cas
Copy link

There is a lot that is questionable here (in the underlying code, I'm not talking about the proposed changes per-se).

The RuntimeInstance is normally de-facto rooted while the FileSystemWatcher is rooted (at least up until the FileSystemWatcher is disabled or disposed). This happens though a rather roundabout route: One of its methods (CleanupEventStream()) is registered as a callback on the CancellationTokenSource that gets held by the FileSystemWatcher. But this feels like an awfully indirect way to ensure the RuntimeInstance remains rooted.

It is worth noting that CleanupEventStream() is thread-safe, and it needs to be, because it can get called from both the callback, and the from the the FileSystemWatcher's finalizer at pretty much the same time, just by chance, since both the callback and the finalizer will try to clean things up if the watcher is no longer reachable. However, this implies it is both the callback and the FileSystemWatcher's responsibility to stop the RuntimeInstance if the watcher becomes unreachable.

That would almost make sense if leaking RuntimeInstances could happen (i.e. having the CancellationTokenSource not reference a still executing RuntimeInstance), and if the RuntimeInstance also rooted itself, because it would let it eventually clean itself after the watcher has gone away when/if the callback gets triggered, but if this were supposed to be such a weird edge case failsafe, I would have expected a comment saying as much. Instead this just looks a lot like confusion of responsibilities.

It is also very much unclear why CleanupEventStream() bothers calling FSEventStreamStop() against the SafeHandle, when the SafeHandle does this itself in its ReleaseHandle method. Furthermore note the try finally around it. If that were actually needed, then why does the exact same call not need that same logic inside the SafeHandle's ReleaseHandle method?


@stephentoub's analysis would basically be that the RuntimeInstance was somehow not rooted via the CTS, so the FSW's finalizer does not trigger CleanupEventStream(). But the callback comes in, and enters CleanupEventStream() while racing the SafeHandle's finalizer. Sure that absolutely does seem like it could cause a crash, and one that having a GCHandle root the RuntimeInstance could avoid. However, I'm not clear on how the RuntimeInstance would end up not rooted by the CancellationTokenSource, while still allowing the callbacks to happen.

It sure looks like that would only happen if:
a) a FSW is invalidly being manipulated by multiple threads at once, such that two StartRaisingEvents() are running at once, or
b) MacOS's FSEventStreamStart function succeeds but returns false.

jkotas added a commit to jkotas/runtime that referenced this pull request Apr 28, 2021
@jkotas
Copy link
Member

jkotas commented Apr 28, 2021

I have posted #52019 with a different take on this issue.

@carlossanlop carlossanlop deleted the gchandle branch April 29, 2021 02:50
jkotas added a commit that referenced this pull request Apr 29, 2021
@karelz karelz added this to the 6.0.0 milestone May 20, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Jun 19, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SafeHandle use-after-dispose in FileSystemWatcher on OSX
6 participants