store: simplify imagestore implementation #1784
Conversation
Force-pushed from f7ad9de to a5de3f7.
Can you run a test against podman and Buildah as well?
Force-pushed from f6e38aa to 5385212.
Force-pushed from bc35798 to 814fddc.
This is ready for review. Both Podman and Buildah pass. Podman test PR: containers/podman#21070
LGTM
There are parts of this, like removing all the d.imageStore conditions throughout overlay, which I very much like.

On the general principle, I suppose that’s pre-existing, but I just can’t see how this works. All of the read-only stores need to be RO locked, don’t they? And locks are handled in the generic code; drivers can’t do that for themselves. So any kind of iteration over stores in dir2 seems completely broken to me.

So… what’s really going on? Is dir2 only there to handle cross-store parent layers during an overlay mount, or something? If that were so, the driver code should not be touching the other stores during any other operation; and still that cross-store parent situation would need to be somehow explicitly tracked and locked.
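To make the concern concrete, a minimal illustrative sketch (hypothetical types; an in-process mutex stands in for the real cross-process lock files, and findLayer is not the actual containers/storage API):

package sketch

import "sync"

// Illustrative stubs only.
type layer struct{ id string }

type roLayerStore struct {
    lock   sync.RWMutex
    layers map[string]*layer
}

// The generic store code owns the locking: each read-only store is RO-locked
// around the lookup.
func findLayer(stores []*roLayerStore, id string) *layer {
    for _, s := range stores {
        s.lock.RLock()
        l := s.layers[id]
        s.lock.RUnlock()
        if l != nil {
            return l
        }
    }
    return nil
}

// A driver method like dir2, which stats paths under the other stores
// directly, has no access to these locks and can therefore race with writers.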
I’m afraid I can’t do a full review now. Maybe there is some way to split this into much smaller PRs?
drivers/overlay/overlay.go
Outdated
if d.imageStore != "" {
    _ = os.RemoveAll(filepath.Join(d.imageStore, stagingDir))
}
_ = os.RemoveAll(filepath.Join(d.home, stagingDir))
How does this work? stagingDir is here a per-driver directory, but getStagingDir creates per-layer-ID staging directories.
Similarly, the use in ListLayers seems inappropriate now — or quite possibly it needs to be retained for compatibility with older writers, but then it should be explicitly documented as such a non-obvious case.
Stylistically, I’d prefer (a sketch follows below):
- Define separate globalStagingDir / perLayerStagingDir constants; don’t use a single string for two different purposes!
- As I always want, document top-level symbols.
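A minimal sketch of that split, with hypothetical names (stagingDirBase / layerStagingDir are illustrations, not the existing code):

package sketch

import "path/filepath"

// stagingDirBase is the per-driver directory, under the driver home, that
// holds all staging directories; driver-wide cleanup (the os.RemoveAll calls
// above) should target this.
const stagingDirBase = "staging-dir"

// layerStagingDir is the per-layer-ID staging directory that
// getStagingDir-style callers create and commit from.
func layerStagingDir(home, layerID string) string {
    return filepath.Join(home, stagingDirBase, layerID)
}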
store.go
Outdated
stores := driver.AdditionalImageStores()
if s.imageStoreDir != "" {
    stores = append([]string{s.graphRoot}, stores...)
}
I would greatly appreciate consolidating this data further (a sketch follows this list):
- Only ever call AdditionalImageStores() once, in load.
- Only ever have one if s.imageStoreDir != "" condition, somewhere around there.
- Together, populate something like s.rwImageLayerStoreDirs / s.roImageLayerStoreDirs, and then have all other graph iterations be trivial loops.
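A hedged sketch of that consolidation; rwImageLayerStoreDirs / roImageLayerStoreDirs are the names proposed above, not existing fields, and the exact membership would have to match the PR’s semantics:

package sketch

type store struct {
    graphRoot     string
    imageStoreDir string

    // Proposed fields, populated exactly once in load():
    rwImageLayerStoreDirs []string
    roImageLayerStoreDirs []string
}

// populateStoreDirs would be the only place that calls
// AdditionalImageStores() or tests s.imageStoreDir != "".
func (s *store) populateStoreDirs(additional []string) {
    s.rwImageLayerStoreDirs = []string{s.graphRoot}
    if s.imageStoreDir != "" {
        // Matches the ro := ... condition below: both graphRoot and
        // imageStoreDir are treated as writable.
        s.rwImageLayerStoreDirs = append(s.rwImageLayerStoreDirs, s.imageStoreDir)
    }
    s.roImageLayerStoreDirs = additional
    // All other graph iterations then become trivial loops over these slices.
}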
 func (s *store) createGraphDriverLocked() (drivers.Driver, error) {
-	driverRoot := s.imageStoreDir
-	imageStoreBase := s.graphRoot
-	if driverRoot == "" {
-		driverRoot = s.graphRoot
-		imageStoreBase = ""
-	}
 	config := drivers.Options{
-		Root:       driverRoot,
-		ImageStore: imageStoreBase,
+		Root:       s.graphRoot,
+		ImageStore: s.imageStoreDir,
 		RunRoot:    s.runRoot,
AFAICS this effectively reverses the semantics of the two fields. While I 100% support the resulting simpler code and semantics, I’m not smart enough to review such changes in a GitHub PR without ~re-doing the work.
One possible approach (sketched below):
- Define completely new fields with the new semantics. Set them in the first commit.
- In the first and/or subsequent commits, migrate the users to the new fields.
- Now that the old fields have no readers, possibly rename the fields back to the original names, with the new semantics.
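To illustrate the mechanics of the first step (field names here are hypothetical, not a proposal for the actual names):

package sketch

// Old fields keep their old, swapped semantics until every reader is
// migrated in later commits; only then can they be removed or renamed back.
type Options struct {
    Root       string // old semantics: the image store when one is set
    ImageStore string // old semantics: the graph root in that case

    // New fields, new semantics, set from the first commit onwards:
    GraphRoot          string // always the graph root
    SeparateImageStore string // the image store directory, or ""
}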
store.go
Outdated
ro := store != s.imageStoreDir && store != s.graphRoot

rls, err := s.newLayerStore(rlpath, glpath, s.graphDriver, false, ro)
Why is this introducing the “second read-write layer store” concept? Yes, it mirrors the “second read-write image store” thing (which I found dubious in #1578), but AFAICS the Delete code which uses the “second RW image store” never deletes layers from the second RW layer store.

So either that should be added as well, or (hopefully) we don’t need this complexity.
This is needed because we have this weird specification that a layer must be deleted from the current store if it is found there. We even have a test for it :/

https://github.com/containers/storage/blob/main/tests/split-store.bats#L96-L101

I am fine if we decide to break this specification, but let's do it separately, after we verify with the CRI-O folks that it is not necessary.
Fine… but I can’t see how that works. layersToRemove is only applied to one rlstore. That uses getLayerStoreLocked, and AFAICS never gets to write to this special member of roLayerStoresUseGetters.

I’m probably being stupid.
newpath := filepath.Join(homedir, d.String(), "dir", filepath.Base(id))
if _, err := os.Stat(newpath); err != nil {
    additionalHomes := d.homes[1:]
    if d.imageStore != "" {
Can’t this condition be determined at driver initialization time?

Every single consumer of d.homes is either indexing [0] or [1:], suggesting that the data can be recorded in some more convenient way.
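A sketch of what recording that split at initialization could look like (hypothetical structure, not the actual vfs driver):

package sketch

// Record the d.homes[0] / d.homes[1:] split once, instead of re-deriving it
// (and re-checking d.imageStore != "") at every use site.
type driver struct {
    home            string   // read-write home, previously d.homes[0]
    additionalHomes []string // read-only homes, previously d.homes[1:]
}

func newDriver(homes []string, imageStore string) *driver {
    d := &driver{
        home:            homes[0],
        additionalHomes: append([]string{}, homes[1:]...),
    }
    if imageStore != "" {
        // Decide once, at initialization, that the image store participates
        // in read-only lookups.
        d.additionalHomes = append(d.additionalHomes, imageStore)
    }
    return d
}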
drivers/vfs/driver.go
Outdated
    homedir = d.homes[0]
}
newpath := filepath.Join(homedir, d.String(), "dir", filepath.Base(id))
if _, err := os.Stat(newpath); err != nil {
This is still outstanding.

I don't see how this is different from the previous version; what locking do we need here?
}
newpath := path.Join(homedir, id)
if _, err := os.Stat(newpath); err != nil {
This is still outstanding.

I don't see how this is different from the previous version; what locking do we need here?
Yes, see the top-level comment #1784 (review).

One immediate thing that springs to mind is deleting the parent layer while the child is being created/mounted/unmounted. I don’t know that we can fully defend against those kinds of things (we don’t hold layers locked for the whole time a child is mounted, and mount reference counts are, I think, basically per-machine and not visible over network shares).

Or maybe a layer with the same ID (pulled layers have deterministic IDs) being simultaneously created in both stores, making the path returned by d.dir2() change over time, e.g. for getStagingDir.

There might well be more, or maybe these are handled by some mechanism I don’t know about.
Assuming the concern is real:
- I’d like to see the scope of the problem restricted to the unavoidable minimum (whatever that is); in all situations where the generic code is iterating over getROLayerStoresLocked to find the right layer, and handling locking, the driver should not be iterating over the secondary layer stores again, redundantly, without locking. Structure the code, or document it, in a way that is likely to be maintained correctly over time.
- In the situations where the problem remains outstanding afterwards and is not fixable short-term, we should loudly document the problem, the situations where it arises, and the constraints, for future maintainers (compare LOCKING BUG in layers.go; a sketch of such a comment follows below).
- Alternatively, if this is never going to be fixed, and users need to use the feature appropriately (“the image store must never be modified while it is used with outer stores”?), clearly document that in end-user documentation, and maybe point at it in the code to acknowledge and justify the unsynchronized accesses.
- (As another option, a few years ago overlay used to do its own per-layer locking. I removed that because I thought it was redundant; actually external users can get a pointer to the driver object, so it is not 100% redundant, and doing that locking here might also be an option. I would prefer not to add that cost to every operation.)
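As an illustration of the second option above, the kind of loud marker being asked for; the wording here is hypothetical, modeled on the existing LOCKING BUG comments in layers.go:

// LOCKING BUG (hypothetical wording): this walks the additional layer store
// directories without holding the store locks that the generic code takes via
// getROLayerStoresLocked. A layer found here can be deleted from, or appear
// in, another store between the check and its use. Believed unavoidable for
// cross-store parent lookups during mounts; do not add new callers.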
I could see an argument that this is out of scope of this PR… I’m thinking that with this touching dir2 and dir, while the feature is still comparatively new, now is the best chance for fixing this we’ll ever have. Admittedly the timing is not super convenient now, but it will probably never be ideal.
The assumption with additional image stores is that images/layers are not removed. There are cases where we don't grab any lock, e.g. on a read-only file system, so that is already the current state.

When we use a read-only layer, the layer could also disappear while a container is running, and there is nothing that prevents the removal now:
# mkdir -p /tmp/store/{rw,ro,empty}
# podman --root /tmp/store/rw pull busybox
# mount --bind --read-only /tmp/store/rw /tmp/store/ro
# podman --root /tmp/store/empty --storage-opt overlay.imagestore=/tmp/store/ro run -d busybox top
5d4baa2643c77deca49ff9b1a0a10df5c2965626b725b8a2225db54706e69980
# podman --root /tmp/store/rw rmi busybox
Untagged: docker.io/library/busybox:latest
Deleted: 9211bbaa0dbd68fed073065eb9f0a6ed00a75090a9235eca2554c62d1e75c58f
Yeah, the silent deletion of dependent parents seems hard to fix. It’s also not really documented that the users must prevent that, AFAICT.
What about concurrent creations, and other operations?
I've not thought about them, but are they very different from delete, though? We would end up with the same layer in multiple stores and use it from the first one where we find it.
We could see e.g. a partial pull of a layer being started on our store, triggering the creation of a staging directory; before it finishes, the other image store creates the same layer; and now the code trying to commit the partially-pulled layer sees the new layer, determines a different staging directory path, and rejects the commit attempt. (I’m not saying that this is the only problem.)
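A toy sketch of that failure mode (not the real getStagingDir): when the staging path depends on whichever store currently appears to contain the layer, the answer can change between the start of a pull and its commit.

package sketch

import (
    "os"
    "path/filepath"
)

// stagingDirFor picks a staging path based on which store currently holds
// layerID. If another store gains the same ID mid-pull, the path computed at
// commit time differs from the one the staging directory was created in, and
// the commit is rejected.
func stagingDirFor(storeDirs []string, layerID string) string {
    for _, dir := range storeDirs {
        if _, err := os.Stat(filepath.Join(dir, layerID)); err == nil {
            return filepath.Join(dir, "staging", layerID)
        }
    }
    // Layer not present anywhere yet: stage in the primary store.
    return filepath.Join(storeDirs[0], "staging", layerID)
}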
Let's address this in a separate PR. I've played with it this morning, but I am not sure yet how this should be done.

Do you have any suggestions?
Force-pushed from f3ffbd6 to 5c00dd3.
Force-pushed from 19fc9a6 to be5720d.
Is this related to the Zstd work?
No, this is not related to zstd.
I don't think it is too important for Podman 5.0, but it solves an issue that was reported some time ago: #1779
@kolyshkin @nalind PTAL
@saschagrunert PTAL
Force-pushed from 591d900 to 7d1b3f3.
@haircommander @saschagrunert this PR affects the "split store" functionality in CRI-O
cc @kannon92
Force-pushed from 7d1b3f3 to 2b50bc4.
rebased
this is a preparation for the next commit, which will use this feature.

Signed-off-by: Giuseppe Scrivano <[email protected]>
Force-pushed from 2b50bc4 to 2a625d5.
simplify the imagestore implementation and use it as a first-class store. It reduces the amount of code to deal with an image store in the driver (only overlay supports it at the moment).

Closes: containers#1779

Signed-off-by: Giuseppe Scrivano <[email protected]>
when an image store is used, lock it as well as the graphroot.

Signed-off-by: Giuseppe Scrivano <[email protected]>
Signed-off-by: Giuseppe Scrivano <[email protected]>
Force-pushed from 2a625d5 to d1cc476.
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: giuseppe, saschagrunert.
/lgtm
simplify the imagestore implementation and use it as a first-class store. It reduces the amount of code to deal with an image store in the driver (only overlay supports it at the moment).
Closes: #1779
Signed-off-by: Giuseppe Scrivano <[email protected]>