Is there a way not to include build-args in the cache key? #1008

thomasfulton · 2020-01-28T15:37:19Z

We are trying to use the kaniko executor in cloud build. We pass in a build timestamp and build commit sha into our dockerfile as build-args. Since these change with every build, the cache keys change every build even though the contents of the layers are the same. This causes a 100% cache miss rate. Is there any way around this? We would like to be able to pass in build args that change every build but still use kaniko cache.

tejal29 · 2020-01-29T18:36:32Z

Thanks @thomasfulton for your issue.
We actually did some to add build-args as part of cache key.
Can you be more specific about

Do you need all builds args to be excluded or just some?

We can provide a flag to configure it.

thomasfulton · 2020-01-29T21:01:54Z

@tejal29 Either way, either excluding all build args or having it be configurable so we can select which arguments to exclude. Both ways would work.

Out of curiosity why are the docker layer hashes themselves not used as the cache keys?

james-emerton · 2020-01-29T22:00:23Z

I'm experiencing the same issue; our build uses a build-arg to specify which version of a python wheel to install.

Our Dockerfile is as follows:

FROM python:3.7-buster

# Required for pg_dump and pg_restore
RUN echo "deb http://apt.postgresql.org/pub/repos/apt buster-pgdg main" >> /etc/apt/sources.list \
    && curl -s https://www.postgresql.org/media/keys/ACCC4CF8.asc | apt-key add - \
    && apt-get update \
    && apt-get install -y postgresql-client-10 \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

COPY ./requirements.txt /

RUN pip install -r requirements.txt

ARG version
WORKDIR /app
RUN pip install --target /app spire-server==$version

EXPOSE 10880
ENTRYPOINT ["/usr/local/bin/python", "-m", "spired.cli"]

It would be nice if the build-arg wasn't considered part of the cache key until the ARG or, even better, until it was actually used.

raijinsetsu · 2020-02-03T21:48:39Z

I've run into exactly the same issue and I'm also getting 100% cache miss.

Before I happened on this issue, I had assumed that the hash of a RUN layer was done by hashing the expanded command line and the hashes of prior layers and state (ex. ENV declarations do not create layers but do create "state"). In that case, if the command line used a build argument, the existing cached layer would get used as long as THAT build argument hadn't changed after expansion.

hlavacek · 2020-02-05T15:12:56Z

We are also experiencing this issue, in our case we would like to pass git commit IDs as build arguments to our builds, so the app can use it for logging or traceability - but with the current approach it is always invalidating the whole cache, not just from the moment the argument is used in the docker file like documented on https://docs.docker.com/engine/reference/builder/#impact-on-build-caching

raijinsetsu · 2020-02-07T16:56:03Z

Actually... It looks like Kaniko uses EVERY variable passed through --build-arg on the command line, even if it is never referenced by an ARG line.

tejal29 · 2020-02-12T21:33:10Z

Thanks folks, as soon as we get some bandwidth, we will fix this.

cvgw · 2020-02-13T00:39:21Z

Out of curiosity why are the docker layer hashes themselves not used as the cache keys?

Because layer digest produced by kaniko are not idempotent; running the same build twice will always produce different layer digests unless the --reproducible flag is used. This is working as expected (just a consequence of Kaniko's fundamental design).

Also, I'm not sure how you would get the layer digest without building the layer which at that point you aren't caching.

Edit - I realize now you might have meant something other than the digest of the layer, perhaps you meant a hash of all paths in the filesystem or something like that. Apologies for the confusion.

@thomasfulton

cvgw · 2020-02-13T00:41:45Z

I'd want to understand how docker handles this case before making any major changes and if at all possible avoid adding another flag (kaniko already has a lot of configuration).

I'm having a hard time imagining how you could build a deterministic cache key without the build args. How do you know the build args won't affect the output of the layer?

raijinsetsu · 2020-02-13T01:00:23Z

Hi Cole, Docker only takes the ARG and ENV directives into account from the line that appear downwards. A FROM directive resets that - prior ARE are forgotten after the FROM. This is documented in the Dockerfile documentation. Kaniko, on the other hand, doesn't look at the ARG directive at all and all build args specified on the command line, regardless of if they are used, are composed into the cache key. I saw this when running - v=debug. We just want Kaniko to behave like Docker. Thanks

…

On Wed, Feb 12, 2020, 19:41 Cole Wippern ***@***.***> wrote: I'd want to understand how docker handles this case before making any major changes and if at all possible avoid adding another flag (kaniko already has a lot of configuration). I'm having a hard time imagining how you could build a deterministic cache key without the build args. How do you know the build args won't affect the output of the layer? — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#1008?email_source=notifications&email_token=ABT6LNX3COF5IC2FHUZN7GLRCSJMTA5CNFSM4KMUQZP2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOELS6DNQ#issuecomment-585490870>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ABT6LNVVEDPBNEUMFUK3HEDRCSJMTANCNFSM4KMUQZPQ> .

raijinsetsu · 2020-02-13T01:02:32Z

PS - I worked around this by writing the value to a file, copying that file just before the RUN that needs it, and having the script which is run use the file instead of an environment variable. It's not pretty, but I could not wait for this fix.

…

On Wed, Feb 12, 2020 at 8:00 PM Lucas Lacroix ***@***.***> wrote: Hi Cole, Docker only takes the ARG and ENV directives into account from the line that appear downwards. A FROM directive resets that - prior ARE are forgotten after the FROM. This is documented in the Dockerfile documentation. Kaniko, on the other hand, doesn't look at the ARG directive at all and all build args specified on the command line, regardless of if they are used, are composed into the cache key. I saw this when running - v=debug. We just want Kaniko to behave like Docker. Thanks On Wed, Feb 12, 2020, 19:41 Cole Wippern ***@***.***> wrote: > I'd want to understand how docker handles this case before making any > major changes and if at all possible avoid adding another flag (kaniko > already has a lot of configuration). > > I'm having a hard time imagining how you could build a deterministic > cache key without the build args. How do you know the build args won't > affect the output of the layer? > > — > You are receiving this because you are subscribed to this thread. > Reply to this email directly, view it on GitHub > <#1008?email_source=notifications&email_token=ABT6LNX3COF5IC2FHUZN7GLRCSJMTA5CNFSM4KMUQZP2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOELS6DNQ#issuecomment-585490870>, > or unsubscribe > <https://github.com/notifications/unsubscribe-auth/ABT6LNVVEDPBNEUMFUK3HEDRCSJMTANCNFSM4KMUQZPQ> > . >

cvgw · 2020-02-13T17:43:03Z

Hi Cole, Docker only takes the ARG and ENV directives into account from the line that appear downwards. A FROM directive resets that - prior ARE are forgotten after the FROM. This is documented in the Dockerfile documentation. Kaniko, on the other hand, doesn't look at the ARG directive at all and all build args specified on the command line, regardless of if they are used, are composed into the cache key. I saw this when running - v=debug. We just want Kaniko to behave like Docker. Thanks
…
On Wed, Feb 12, 2020, 19:41 Cole Wippern @.***> wrote: I'd want to understand how docker handles this case before making any major changes and if at all possible avoid adding another flag (kaniko already has a lot of configuration). I'm having a hard time imagining how you could build a deterministic cache key without the build args. How do you know the build args won't affect the output of the layer? — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#1008?email_source=notifications&email_token=ABT6LNX3COF5IC2FHUZN7GLRCSJMTA5CNFSM4KMUQZP2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOELS6DNQ#issuecomment-585490870>, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABT6LNVVEDPBNEUMFUK3HEDRCSJMTANCNFSM4KMUQZPQ .

@raijinsetsu thanks for the info; to make sure I understand correctly given the following scenario

Dockerfile

FROM foo:latest
ARG VERSION
RUN apt-get install -y bar-${VERSION}

FROM baz:latest
ARG COMMIT
RUN git checkout ${COMMIT}

docker build --build-arg COMMIT=xf443d VERSION=v0.14.3 ./

The cache key for stage 0 (FROM foo:latest) would contain the ARG VERSION, but not COMMIT

The cache key for stage 1 (FROM baz:latest) would contain the ARG COMMIT, but not VERSION

Is that essentially correct?

raijinsetsu · 2020-02-13T21:28:25Z

According to the Docker documentation: FROM foo:latest key=() ARG VERSION RUN apt-get install -y bar-${VERSION} key=(ARG VERSION=v0.14.3) FROM baz:latest key=(ARG VERSION=v0.14.3) ARG COMMIT ARG BUILD_TYPE RUN git checkout ${COMMIT} key=(ARG COMMIT=xf443d,ARG BUILD_TYPE=development) RUN make key=(ARG COMMIT=xf443d,ARG BUILD_TYPE=development) The "key" value I show on the right is that additional data to be included in the key with the commands. Reference: https://docs.docker.com/engine/reference/builder/#scope Hope that helps. On Thu, Feb 13, 2020 at 12:43 PM Cole Wippern <[email protected]> wrote:

…

Hi Cole, Docker only takes the ARG and ENV directives into account from the line that appear downwards. A FROM directive resets that - prior ARE are forgotten after the FROM. This is documented in the Dockerfile documentation. Kaniko, on the other hand, doesn't look at the ARG directive at all and all build args specified on the command line, regardless of if they are used, are composed into the cache key. I saw this when running - v=debug. We just want Kaniko to behave like Docker. Thanks … <#m_4694278128194927049_> On Wed, Feb 12, 2020, 19:41 Cole Wippern *@*.***> wrote: I'd want to understand how docker handles this case before making any major changes and if at all possible avoid adding another flag (kaniko already has a lot of configuration). I'm having a hard time imagining how you could build a deterministic cache key without the build args. How do you know the build args won't affect the output of the layer? — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#1008 <#1008>?email_source=notifications&email_token=ABT6LNX3COF5IC2FHUZN7GLRCSJMTA5CNFSM4KMUQZP2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOELS6DNQ#issuecomment-585490870>, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABT6LNVVEDPBNEUMFUK3HEDRCSJMTANCNFSM4KMUQZPQ . @raijinsetsu <https://github.com/raijinsetsu> thanks for the info; to make sure I understand correctly given the following scenario Dockerfile FROM foo:latest ARG VERSION RUN apt-get install -y bar-${VERSION} FROM baz:latest ARG COMMIT RUN git checkout ${COMMIT} docker build --build-arg COMMIT=xf443d VERSION=v0.14.3 ./ The cacher key for stage 0 (FROM foo:latest) would contain the ARG VERSION, but not COMMIT The cache key for stage 1 (FROM baz:latest) would contain the ARG COMMIT, but not VERSION Is that essentially correct? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#1008?email_source=notifications&email_token=ABT6LNSTTWHRPAPMBAPRIBDRCWBCRA5CNFSM4KMUQZP2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOELV5XQI#issuecomment-585882561>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ABT6LNQ2XBDK4R2XEZKEISTRCWBCRANCNFSM4KMUQZPQ> .

tejal29 · 2020-02-28T10:06:29Z

It would be nice if the build-arg wasn't considered part of the cache key until the ARG or, even better, until it was actually used.

Agreed. #1085 addresses this issue.

tejal29 · 2020-02-28T21:12:41Z

Hey folks i pushed following images with a fix if anyone is interested in trying out

gcr.io/kaniko-project/executor:cache_fix
gcr.io/kaniko-project/executor:cache_fix_debug 
gcr.io/kaniko-project/warmer:cache_fix_debug

evheny0 · 2020-03-10T16:08:46Z

gcr.io/kaniko-project/executor:cache_fix with different build args from command line works for me

zzh8829 · 2021-12-17T00:29:18Z

@tejal29 the fix in #1085 is not according to the dockerfile standard https://docs.docker.com/engine/reference/builder/#impact-on-build-caching

In particular, all RUN instructions following an ARG instruction use the ARG variable implicitly (as an environment variable), thus can cause a cache miss. All predefined ARG variables are exempt from caching unless there is a matching ARG statement in the Dockerfile.

In this fix, only explicit ARG usage are treated as cache miss where in the docker standard all RUN statement should in fact trigger cache miss. A simple way to reproduce the bug is like this

ARG test
RUN printenv

with the kaniko cache RUN printenv will incorrectly print the cached ARG instead of using the latest result. With dockerbuild this behaves as expected.

tejal29 added area/caching For all bugs related to cache issues kind/enhancement New feature or request labels Jan 29, 2020

tejal29 added this to the Release v1.0.0 milestone Jan 29, 2020

tejal29 added good first issue Good for newcomers help wanted Looking for a volunteer! labels Jan 29, 2020

tejal29 self-assigned this Feb 4, 2020

cvgw added needs-discussion Items which need more discussion before commitment and removed good first issue Good for newcomers labels Feb 13, 2020

tejal29 modified the milestones: Release v1.0.0, Release v1.1.0 Feb 27, 2020

tejal29 mentioned this issue Feb 28, 2020

remove build args from composite key and replace all build args #1085

Merged

3 tasks

tejal29 added the in progress label Feb 28, 2020

tejal29 closed this as completed in #1085 Mar 17, 2020

zzh8829 mentioned this issue Dec 17, 2021

ARG/ENV used in script does not invalidate build cache #1688

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Is there a way not to include build-args in the cache key? #1008

Is there a way not to include build-args in the cache key? #1008

thomasfulton commented Jan 28, 2020

tejal29 commented Jan 29, 2020

thomasfulton commented Jan 29, 2020

james-emerton commented Jan 29, 2020

raijinsetsu commented Feb 3, 2020

hlavacek commented Feb 5, 2020

raijinsetsu commented Feb 7, 2020

tejal29 commented Feb 12, 2020

cvgw commented Feb 13, 2020 •

edited

Loading

cvgw commented Feb 13, 2020

raijinsetsu commented Feb 13, 2020 via email

raijinsetsu commented Feb 13, 2020 via email

cvgw commented Feb 13, 2020 •

edited

Loading

raijinsetsu commented Feb 13, 2020 via email

tejal29 commented Feb 28, 2020

tejal29 commented Feb 28, 2020

evheny0 commented Mar 10, 2020

zzh8829 commented Dec 17, 2021

Is there a way not to include build-args in the cache key? #1008

Is there a way not to include build-args in the cache key? #1008

Comments

thomasfulton commented Jan 28, 2020

tejal29 commented Jan 29, 2020

thomasfulton commented Jan 29, 2020

james-emerton commented Jan 29, 2020

raijinsetsu commented Feb 3, 2020

hlavacek commented Feb 5, 2020

raijinsetsu commented Feb 7, 2020

tejal29 commented Feb 12, 2020

cvgw commented Feb 13, 2020 • edited Loading

cvgw commented Feb 13, 2020

raijinsetsu commented Feb 13, 2020 via email

raijinsetsu commented Feb 13, 2020 via email

cvgw commented Feb 13, 2020 • edited Loading

raijinsetsu commented Feb 13, 2020 via email

tejal29 commented Feb 28, 2020

tejal29 commented Feb 28, 2020

evheny0 commented Mar 10, 2020

zzh8829 commented Dec 17, 2021

cvgw commented Feb 13, 2020 •

edited

Loading

cvgw commented Feb 13, 2020 •

edited

Loading