-
Notifications
You must be signed in to change notification settings - Fork 243
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
store: simplify imagestore implementation #1784
Merged
openshift-merge-bot
merged 4 commits into
containers:main
from
giuseppe:refactor-image-store
Mar 1, 2024
Merged
Changes from all commits
Commits
Show all changes
4 commits
Select commit
Hold shift + click to select a range
f31412a
layers: add infra to lock multiple files
giuseppe 92b9480
store: simplify imagestore implementation
giuseppe 0ab61df
store: lock both graphroot and the imagestore root
giuseppe d1cc476
overlay: support imageStore with partial images
giuseppe File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is still outstanding.
I don't see how this is different than the previous version, what locking do we need here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, see the top-level comment #1784 (review) .
One immediate thing that springs to mind is deleting the parent layer while the child is being created/mounted/unmounted. I don’t know that we can fully defend against those kinds of things (we don’t hold layers locked for the whole time a child is mounted, and mount reference counts are, I think, basically per-machine and not visible over network shares).
Or maybe a layer with the same ID (pulled layers have deterministic IDs) being simultaneously created in both stores, making the path returned by
d.dir[2]()
change over time — e.g. forgetStagingDir
as an example.There might well be more, or maybe these are handled by some mechanism I don’t know about.
Assuming the concern is real:
getROLayerStoresLocked
to find the right layer, and handling locking, the driver should not be iterating over the secondary layer stores again / redundantly without locking. Structure the code, or document it, in a way that is likely to be maintained correctly over time.LOCKING BUG
inlayers.go
)I could see an argument that this is out of scope of this PR … I’m thinking that with this touching
dir2
anddir
, while the feature is still comparatively new, now is the best chance for fixing this we’ll ever have. Admittedly the timing is not super convenient now, but it will probably never be ideal.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the assumption with additional image stores is that images/layers are not removed. There are cases where we don't grab any lock, e.g. a read-only file system, so that is already the current state.
When we use a read-only layer, the layer could disappear also while a container is running, and there is nothing that prevents the removal now:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the silent deletion of dependent parents seems hard to fix. It’s also not really documented that the users must prevent that, AFAICT.
What about concurrent creations, and other operations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've not thought about them, but are they very different than delete though? We would end up with the same layer in multiple stores and use it from the first one where we find it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could see e.g. a partial pull of a layer being started on our store, triggering the creation of a staging directory; before it finishes, the other image store creates the same layer; and now the code trying too commit the partially-pulled layer sees the new layer, determines a different staging directory path, and rejects the commit attempt. (I’m not saying that this is the only problem.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's address this in a separate PR. I've played with it this morning, but I am not sure yet how this should be done.
Do you've any suggestions?