Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes Resolver delete race against file resolve #3581

Merged
merged 5 commits into from
Oct 2, 2020

Conversation

pmlopes
Copy link
Contributor

@pmlopes pmlopes commented Sep 23, 2020

Signed-off-by: Paulo Lopes [email protected]

Motivation:

Cleaning the FileSystem cache happens usually on a shutdown thread, outside the resolver monitor. It also changes the resolver state to set the cache dir object to null.

This has a couple of side effects:

  1. A call to resolve while the delete is happening will use the "to be deleted directory" as root.
  2. A resolve call during clear cache will see that the cacheDir object is null which will lead to resolved file to be written to the ${user.dir}.
  3. As cacheDir shared the same directory for all instances (by default) a clear for instance A will wipe the cache for instance B

The last item was also reported by #3510

This PR addresses the issue in the following way:

  1. cacheDir is final and computed at <ctor> time.
  2. clearCache will be synchronized on the resolver iif the is cacheDir
  3. caches are suffixed with an UUID instead of sharing a common directory.

The last item only addresses the uniqueness and independence of caches not the full discussion on #3510 . As a note the mentioned issue warns about potential collisions, however wikipedia tells us that the change of a collision is 1 in a billion: https://en.wikipedia.org/wiki/Universally_unique_identifier#Version_4_(random) . Given that most unices can handle a few million processes:

cat /proc/sys/kernel/pid_max
4194304

I'm not attempting to change the current use of UUIDs + on regular operations caches are deleted on VM shutdown, which means the probability will be even smaller.

Signed-off-by: Paulo Lopes <[email protected]>
Copy link
Contributor

@tsegismont tsegismont left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@vietj vietj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a couple of changes.

@pmlopes
Copy link
Contributor Author

pmlopes commented Sep 25, 2020

I've applied the "fixes" from the comments and added a few extra checks:

  • on close, if the shutdown hook is null, avoid enter a synchronized block.
  • the shutdown hook thread, will not spawn a new thread with deadline to delete the cache if a delete is already in progress.
  • the resolve file operation will throw ISE if the cache dir is being deleted

All validation tests pass on my side + the issue that was used to identify the problem (vertx-web tests) also pass on my side (no leaks of caches to the CWD or TMP.

Copy link
Member

@vietj vietj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some changes

@pmlopes
Copy link
Contributor Author

pmlopes commented Sep 28, 2020

All changes added.

@pmlopes pmlopes requested a review from vietj September 28, 2020 11:09
Signed-off-by: Paulo Lopes <[email protected]>
@pmlopes pmlopes merged commit 4744506 into master Oct 2, 2020
@vietj vietj deleted the issues/race-resolver-close branch May 3, 2022 07:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants