[BUG] CSI failed to recover FUSE mount point for AlluxioRuntime #2719

Closed
TrafalgarZZZ opened this issue Mar 9, 2023 · 2 comments · Fixed by #2747
Labels
bug Something isn't working

Comments

@TrafalgarZZZ
Member

What is your environment (Kubernetes version, Fluid version, etc.)

Describe the bug
FUSE recovery fails when using AlluxioRuntime: the CSI plugin does not recover the FUSE mount point.

What you expect to happen:
Alluxio FUSE should be successfully recovered after deleting the FUSE pod.

How to reproduce it

Running the e2e script in #2477 reproduces this bug.

Additional Information

TrafalgarZZZ added the bug label on Mar 9, 2023
@TrafalgarZZZ
Member Author

The bug is probably caused by an unexpected ordering among Pod readiness, FUSE mount point readiness, and the CSI plugin's recover() logic.

Currently, the CSI plugin recovers broken mount points only when it detects that a FUSE container has restarted and become ready. However, the execution order can be:

  1. Alluxio FUSE Pod restarts
  2. Alluxio FUSE Pod becomes ready
  3. CSI plugin detects the container restart and tries to recover broken mount points (at this point the Alluxio FUSE mount point is not ready yet, so nothing is recovered)
  4. Alluxio FUSE mount point becomes ready
  5. CSI plugin never retries the recovery, because it detects no further container restarts
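
The timeline above can be replayed with a small, purely hypothetical model; none of these names come from Fluid's CSI plugin. It contrasts an edge-triggered check that acts only on a newly observed restart (the behaviour described above) with a level-triggered alternative that re-checks mount health on every tick. The level-triggered variant is an assumed remedy for illustration, not necessarily the approach taken in #2747.

```python
from dataclasses import dataclass


@dataclass
class FuseState:
    """Hypothetical view of the FUSE pod as seen by the CSI plugin."""
    restart_count: int = 0
    pod_ready: bool = False    # Kubernetes readiness of the FUSE container
    mount_ready: bool = False  # whether the FUSE mount point is actually serving


@dataclass
class BindMount:
    """Hypothetical bind mount in an application pod that depends on the FUSE mount."""
    broken: bool = False


def edge_triggered_tick(fuse: FuseState, mount: BindMount, seen_restarts: int) -> int:
    """Recover only when a new restart is observed, then mark that restart as handled."""
    if fuse.restart_count != seen_restarts and fuse.pod_ready:
        if fuse.mount_ready:
            mount.broken = False
        # The restart is consumed even if the FUSE mount point was not ready yet.
        return fuse.restart_count
    return seen_restarts


def level_triggered_tick(fuse: FuseState, mount: BindMount) -> None:
    """Recover on every tick whenever a mount is broken and the FUSE mount point is ready."""
    if mount.broken and fuse.mount_ready:
        mount.broken = False


def simulate(policy: str) -> bool:
    """Replay steps 1-5 above; return True if the bind mount is still broken at the end."""
    fuse, mount = FuseState(), BindMount()
    seen_restarts = 0

    def csi_tick() -> None:
        nonlocal seen_restarts
        if policy == "edge":
            seen_restarts = edge_triggered_tick(fuse, mount, seen_restarts)
        else:
            level_triggered_tick(fuse, mount)

    # 1. FUSE pod restarts; the old FUSE process is gone, so the bind mount breaks.
    fuse.restart_count += 1
    fuse.pod_ready = fuse.mount_ready = False
    mount.broken = True
    # 2. FUSE pod becomes ready (readiness says nothing about the mount point).
    fuse.pod_ready = True
    # 3. CSI plugin runs its recovery check while the mount point is not ready.
    csi_tick()
    # 4. FUSE mount point becomes ready.
    fuse.mount_ready = True
    # 5. CSI plugin runs again; there has been no new restart since step 3.
    csi_tick()
    return mount.broken


print(simulate("edge"))   # True: the mount stays broken
print(simulate("level"))  # False: the later tick repairs it
```

The point of the model is that the edge-triggered check "spends" the restart event in step 3, so once the mount point becomes ready in step 4 nothing is left to trigger recovery; a check keyed on the current state of the mount has no such window.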

@cheyang
Collaborator

cheyang commented Mar 9, 2023

Can scanning /proc/self/mountinfo solve this issue?
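
One way to act on the /proc/self/mountinfo suggestion is to scan it for FUSE mounts and probe each one directly, instead of waiting for a restart event. The sketch below assumes the standard Linux mountinfo layout (optional fields terminated by a lone "-", followed by filesystem type and source), treats a type of "fuse" or "fuse.<subtype>" as a FUSE mount, and assumes a dead FUSE daemon surfaces as ENOTCONN ("Transport endpoint is not connected") on stat. All helper names are hypothetical, and whether the CSI plugin's own mountinfo shows the relevant mounts depends on how its container shares the host mount namespace.

```python
import errno
import os
import re
from dataclasses import dataclass
from typing import List

_OCTAL_ESCAPE = re.compile(r"\\([0-7]{3})")


@dataclass
class MountInfo:
    mount_point: str
    fs_type: str
    source: str


def _unescape(field: str) -> str:
    """mountinfo encodes space, tab, newline and backslash as octal escapes, e.g. '\\040'."""
    return _OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), field)


def parse_mountinfo(text: str) -> List[MountInfo]:
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        sep = fields.index("-")  # a lone '-' ends the variable-length optional fields
        mounts.append(MountInfo(
            mount_point=_unescape(fields[4]),  # 5th field is the mount point
            fs_type=fields[sep + 1],
            source=_unescape(fields[sep + 2]),
        ))
    return mounts


def fuse_mounts(mounts: List[MountInfo]) -> List[MountInfo]:
    """Keep 'fuse' and 'fuse.<subtype>' mounts; other types such as fusectl are skipped."""
    return [m for m in mounts if m.fs_type == "fuse" or m.fs_type.startswith("fuse.")]


def is_broken(mount_point: str) -> bool:
    """Assumes a dead FUSE daemon makes stat() fail with ENOTCONN."""
    try:
        os.stat(mount_point)
    except OSError as e:
        return e.errno == errno.ENOTCONN
    return False


def find_broken_fuse_mounts(path: str = "/proc/self/mountinfo") -> List[str]:
    """Return FUSE mount points listed in `path` that currently fail the stat probe."""
    with open(path) as f:
        mounts = parse_mountinfo(f.read())
    return [m.mount_point for m in fuse_mounts(mounts) if is_broken(m.mount_point)]
```

Because this scan looks at the current state of every mount rather than at container restart events, running it periodically would also catch the case where the mount point becomes ready after the restart has already been observed. In practice the probe may need a timeout, since stat on a FUSE mount whose daemon is alive but stuck can block.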
