Try to produce a bad behavior related to setting up frequently stopped pools with over ten devices #714

Open
mulkieran opened this issue Jun 18, 2024 · 5 comments
@mulkieran
Member

No description provided.

@mulkieran mulkieran self-assigned this Jun 18, 2024
@jbaublitz
Member

Can you elaborate here on why you're confident there's a bug with this? Is it related to code that you saw?

@mulkieran
Member Author

The requirements for the bug are:

  • More than ten devices, so that at least one isn't written to when the pool-level metadata gets written.
  • A version of stratisd that predates #3606.

The following steps should then trigger the bug, with some degree of probability:

  1. Create a pool.
  2. Perform some action that causes the pool-level metadata to be written.
  3. Stop the pool.
  4. Start the pool. My belief is that the following will happen:
    The last_update_time() result for each BDA will be stale. Sorting the BDAs by last update time is what selects the device from which the pool-level metadata is read, but because the last update time is invalid for every BDA, the sort order will be meaningless. After enough stops and starts (fewer if the number of devices far exceeds ten), a device holding older pool-level metadata will be picked, and the pool will be set up using that older pool-level metadata.
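
The hypothesis above can be sketched in a few lines of Rust. This is a toy model, not the stratisd code: `Device`, `pick_source`, and `example_pool` are hypothetical names, the timestamps are plain integers, and the assumption that exactly the first ten devices receive the new metadata is an illustration of the ten-device limit described in this thread. The tie-breaking shown (last maximum wins) is how Rust's `Iterator::max_by_key` behaves and is not necessarily how stratisd breaks ties.

```rust
/// Hypothetical stand-in for the per-device state relevant to this issue.
#[derive(Debug, Clone, PartialEq)]
struct Device {
    index: usize,
    /// Last update time held in the cached (in-memory) BDA.
    cached_last_update: Option<u64>,
    /// Last update time actually recorded on the device.
    on_disk_last_update: Option<u64>,
}

/// Pick the device to read pool-level metadata from: the one with the
/// greatest last update time according to `key`. On ties, `max_by_key`
/// returns the last maximal element.
fn pick_source<'a>(
    devices: &'a [Device],
    key: impl Fn(&Device) -> Option<u64>,
) -> Option<&'a Device> {
    devices.iter().max_by_key(|d| key(*d))
}

/// Twelve devices. Assumption for illustration: a metadata write at time 2
/// reached only the first ten, while every cached BDA still reports time 1.
fn example_pool() -> Vec<Device> {
    (0..12)
        .map(|i| Device {
            index: i,
            cached_last_update: Some(1),
            on_disk_last_update: Some(if i < 10 { 2 } else { 1 }),
        })
        .collect()
}
```

With `example_pool()`, selecting by `cached_last_update` sees twelve identical keys, so the result depends only on tie-breaking and here lands on a device that was not written in the last update. Selecting by `on_disk_last_update` always lands on one of the ten devices that were.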

So far we haven't been able to produce this, using the easily checked filesystem limit as a canary and twenty test devices.
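
To get a rough feel for how many stop/start cycles a test might need, here is a back-of-the-envelope estimate in Rust. It rests on a strong assumption that this thread does not establish: that each start picks one device uniformly at random and independently of previous starts. A sort over equal keys may well be deterministic, in which case this model does not apply. `prob_stale_pick` and its parameters are hypothetical names introduced for this sketch.

```rust
/// Probability that at least one of `cycles` starts reads pool-level metadata
/// from a device that was not written in the last update.
///
/// Assumption (not from the stratisd code): each start picks one of `devices`
/// uniformly at random, independently, and exactly `written` of them hold the
/// newest metadata.
fn prob_stale_pick(devices: u32, written: u32, cycles: u32) -> f64 {
    if devices == 0 || written >= devices {
        // Every device (or none at all) holds the newest metadata.
        return 0.0;
    }
    let p_fresh = f64::from(written) / f64::from(devices);
    1.0 - p_fresh.powi(cycles as i32)
}
```

Under that assumption, twenty devices with ten written gives `prob_stale_pick(20, 10, 1) == 0.5` for a single start and `prob_stale_pick(20, 10, 5) == 0.96875` over five starts, and the value rises as the device count grows past ten.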

@jbaublitz
Member

Can you point to the code that you found that makes you believe this bug exists?

@mulkieran
Member Author

> Can you point to the code that you found that makes you believe this bug exists?

https://github.com/stratis-storage/stratisd/blob/master/src/engine/strat_engine/liminal/setup.rs#L42

If, at this point, the BDAs are stale with respect to last update time, then the device selected for reading the pool-level metadata may contain stale metadata.

@jbaublitz
Member

Ah, I think I understand. This is related to the fact that we cached the BDA but didn't update it if it was changed, is that correct? And we've since corrected this problem in more recent versions?

@mulkieran mulkieran added this to the Metadata-related milestone Sep 18, 2024
Labels
None yet
Projects
Status: Pending
Development

No branches or pull requests

2 participants