Why are COPY layers cacheable? #1357
Comments
My guess is that this was added for non disk-local contexts like S3 & friends. If that is the case maybe disabling COPY caching when the context is a |
As far as I can tell from reading the docs, you can only COPY things that are already in the local docker build context. Even if they could be remote, you'd still have to hash them in order to check the layer cache. And to hash them you'd presumably have to download them locally first. |
@donmccasland sorry to ping you again, but if you can provide any guidance here I'd appreciate it! If there's anything to do I'm more than happy to implement. |
@isker Sorry for the delay. @donmccasland is no longer actively working on this project. The reason COPY layers are cacheable is to determine whether the subsequent RUN commands can be served from the cache.
The way kaniko caching works is that it uses the cache until the cache key for one command in the Dockerfile changes. I would suggest rearranging your Dockerfile so that file copies that do not affect RUN commands come later. |
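A toy model of the behaviour described in this comment may help. It is only a sketch, not kaniko's actual code: the function names (`composite_key`, `warm_cache`, `plan_build`), the use of SHA-256, and the representation of each command as a `(command, files_digest)` pair are all assumptions made for illustration. Each command's key is derived from the previous key, so the first command whose key changes ends cache use for everything after it.

```python
import hashlib


def composite_key(previous_key, command, files_digest=""):
    """Chain the previous key with this command and the digest of any files it reads."""
    h = hashlib.sha256()
    for part in (previous_key, command, files_digest):
        h.update(part.encode())
        h.update(b"\0")  # separator so adjacent parts cannot run together
    return h.hexdigest()


def warm_cache(commands):
    """Hypothetical helper: the set of keys a fully cached earlier build would have stored."""
    keys, key = set(), ""
    for command, files_digest in commands:
        key = composite_key(key, command, files_digest)
        keys.add(key)
    return keys


def plan_build(commands, cache):
    """Mark each command 'cached' until the first miss, then 'executed' for the rest."""
    plan, key, cache_usable = [], "", True
    for command, files_digest in commands:
        key = composite_key(key, command, files_digest)
        if cache_usable and key in cache:
            plan.append((command, "cached"))
        else:
            cache_usable = False  # one changed key ends cache use for the whole build
            plan.append((command, "executed"))
    return plan


# Example: the copied files change between two builds of the same Dockerfile.
first_build = [
    ("RUN apt-get install -y curl", ""),
    ("COPY src/ /app", "digest-a"),
    ("RUN make", ""),
]
second_build = [first_build[0], ("COPY src/ /app", "digest-b"), first_build[2]]
plan = plan_build(second_build, warm_cache(first_build))
```

In this model, moving a frequently changing COPY below the RUN commands it does not affect keeps those RUN commands on the cached side of the first miss, which is the rearrangement suggested above.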
@tejal29 Thanks for responding.
This is what I did not understand. Unfortunately our Dockerfile is fundamentally copy-heavy. The ordering of COPYs and RUNs is already optimal in this regard. If you'll entertain more questions, why does Kaniko's layer cache work that way? Why can't it go back to the cache after producing a layer locally? I will open a PR documenting this behavior in the README. |
Thanks for the explanation @tejal29 that makes sense! I'm trying to follow the caching logic here and I'm probably missing something obvious, but would something like this work? With COPY commands specifically, instead of:
do something like:
This way the local context is always used as a source for COPY commands, but would also be used to determine whether the existing cache for subsequent layers can be used or not. Edit: thinking about it some more, we don’t even need to upload an empty layer or check for one’s existence. Just include the COPY command’s key in the subsequent commands’ composite keys, which is already done? |
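The proposal in this comment can be sketched roughly as follows. This is an illustration under stated assumptions, not kaniko's implementation: `chain_key`, `cached_run_keys`, and `plan_with_local_copy` are hypothetical names, and the key derivation is a plain SHA-256 chain. COPY is never looked up in or stored to the cache, but the digest of the copied files still flows into the keys of later commands.

```python
import hashlib


def chain_key(previous_key, command, files_digest=""):
    """Fold this command, and the digest of any files it copies, into the running key."""
    data = "\n".join((previous_key, command, files_digest))
    return hashlib.sha256(data.encode()).hexdigest()


def cached_run_keys(commands):
    """Hypothetical helper: keys an earlier build would have stored (RUN layers only)."""
    keys, key = set(), ""
    for command, files_digest in commands:
        key = chain_key(key, command, files_digest)
        if not command.startswith("COPY"):
            keys.add(key)  # COPY layers are never pushed to the cache
    return keys


def plan_with_local_copy(commands, cache):
    """COPY always runs from the local context; other commands use the cache until a miss."""
    plan, key, cache_usable = [], "", True
    for command, files_digest in commands:
        key = chain_key(key, command, files_digest)
        if command.startswith("COPY"):
            plan.append((command, "executed locally"))  # no network round trip
        elif cache_usable and key in cache:
            plan.append((command, "cached"))
        else:
            cache_usable = False
            plan.append((command, "executed"))
    return plan


# Example: an unchanged COPY still lets the following RUN hit the cache.
build = [("COPY package.json /app/", "digest-a"), ("RUN npm install", "")]
plan = plan_with_local_copy(build, cached_run_keys(build))
```

If the copied files change, the RUN key changes with them, so a stale RUN layer is not reused even though the COPY layer itself was never cached.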
@isker we can do an in-person sync sometime if that helps. I like @kamaln7's solution to not stop caching where
@kamaln7 would you like to submit a PR or a Design Doc for this? |
@tejal29 I will try to produce a PR for this. It looks to me that we could achieve all of the things you listed just by deleting the code that makes COPYs cacheable. Like @kamaln7 said, we already calculate the composite cache key for non-cacheable COPYs: Lines 186 to 187 in f20f495
This makes me wonder why COPYs were cacheable in the first place 🤔 . |
Cached COPY layers are expensive in that they both need to be retrieved over the network and occupy space in the layer cache. They are unnecessary in that we already have all resources needed to execute the COPY locally, and doing so is a trivial file-system operation. This is in contrast to RUN layers, which can do arbitrary and unbounded work. The end result is that cached COPY commands were more expensive when cached, not less. Remove them. Resolves GoogleContainerTools#1357
The README says that only RUN layers are cacheable. But last year COPYs were also made cacheable in dbabcb1 (cc @donmccasland).
I'm not an expert in docker, but it's not clear at face value why you'd want to cache such layers. If you're COPYing some data in, you already have it locally, right? But when layer caching is enabled with a remote repository, instead of just executing that local copy, you hash that local data, then go to the network to download a layer containing the same data. Or, upon a miss, you spend time pushing the layer to your cache.
We have a performance-sensitive image build using Kaniko that has a few RUNs that we'd like cached, and some large COPYs that, being cached, are actually offsetting performance improvements from caching RUNs. And our image cache is much larger than we'd like.
@donmccasland can you clarify the intent of cacheable COPYs? And would you be receptive to a PR adding a flag to disable caching for COPY commands?
Thanks for your time!