Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

filewatcher: Implement support for directories #561

Merged
merged 1 commit into from
Sep 1, 2022

Conversation

fishy
Copy link
Member

@fishy fishy commented Aug 26, 2022

This replaces the filewatcher part of changes of
#472.

@fishy fishy requested a review from a team as a code owner August 26, 2022 20:07
@fishy fishy requested review from kylelemons, pacejackson and ghirsch-reddit and removed request for a team August 26, 2022 20:07
filewatcher/filewatcher.go Outdated Show resolved Hide resolved
} else {
r.data.Store(&atomicData{
data: d,
mtime: mtime,
})
// remove all previously watched files
for _, path := range watchedFiles {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

loop through watcher.WatchedFiles() or whatever instead?

Copy link
Member Author

@fishy fishy Aug 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's actually added by a fork of the original fsnotify package we were using, but since the original one is already stopped developing it's time to switch to the fork.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:ohno:

// of fs events happens (for example, when multiple files within the
// directory changed).
if timer == nil {
timer = time.AfterFunc(fsEventsDelay, forceReload)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's still a race condition here it looks like... if it takes longer than 200ms for the files to get written, and then you have an error watching the file(s), I think it can cause the watcher to effectively "forget" about the files forever. It doesn't currently look to me like the polling interval will recover, though I may be missing code that would make it do so.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we only remove and readd watched files after parser returned success, we also log and ignore all failures of adding files to watch, so failure to watch one file would not affect other files (but yes that particular file could be forgotten forever).

if it takes longer than 200ms for the files to get written

The intended usage of filewatcher is really for those rename swaps, not really open the files one-by-one and write the updates (we can probably add some more documents to clarify on that). and if someone really is using it to watch directories that's updated one-by-one, the delay is still configurable so they can configure it to be a larger number to mitigate those risks.

Copy link
Contributor

@kylelemons kylelemons Aug 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the problem that I forsee is that this is used to monitor a CSI driver mount point, and those currently (and for a long time since they haven't even fixed it yet) rm the files before writing the new ones, it seems very easy for a lot of or large files to exceed 200ms, especially given the issues we've seen with iops being exhausted during pod churn.

I recognize that this is pretty hard to do though, and this is already a general purpose API, so we're slightly tied to it. This is why for v2 we're not using a general purpose library, so we can paper over these issues. In this case, we can do so because we know the exact file paths we expect to not forget.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, the k8s implementation is not great :/

do you want to make the default longer? I do not want to make it too long mainly to not make the secrets test to sleep for too long (we currently don't expose the api to customize it so it has to use the default one). but maybe 1s is not too bad?

@fishy fishy force-pushed the directory-watch branch 2 times, most recently from 669f2b6 to cbf06a7 Compare August 27, 2022 15:21
Copy link
Contributor

@pacejackson pacejackson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some comments about comments

Comment on lines 70 to 71
// Please note that DirParser should always return the consistent type.
// Inconsistent type will cause panic, as does returning nil data and nil error.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Please note that DirParser should always return the consistent type.
// Inconsistent type will cause panic, as does returning nil data and nil error.
// Please note that a DirParser must return a consistent type.
// Inconsistent types will cause a panic, as does returning nil data and nil error.

Why would a nil error cause a panic?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the caveat from using atomic.Value:

Store sets the value of the Value to x. All calls to Store for a given Value must use values of the same concrete type. Store of an inconsistent type panics, as does Store(nil).

We can bypass this by doing something similar to 316e844, but I don't think we should do that unless absolutely necessary.

for _, path := range watcher.WatchList() {
watcher.Remove(path)
}
// then readd all new files to watch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// then readd all new files to watch
// then read all new files to watch

Comment on lines 286 to 287
// Optional, the delay between fs events and actually read and parse the
// changes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Optional, the delay between fs events and actually read and parse the
// changes.
// Optional, the delay between receiving the fs events and actually reading and parsing the
// changes.

// Optional, the delay between fs events and actually read and parse the
// changes.
//
// It's used to avoid short burst of fs events (for example, when watching a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// It's used to avoid short burst of fs events (for example, when watching a
// It's used to avoid short bursts of fs events (for example, when watching a

This replaces the filewatcher part of changes of
reddit#472.
@fishy fishy merged commit 0b74864 into reddit:master Sep 1, 2022
@fishy fishy deleted the directory-watch branch September 1, 2022 16:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants