Cleaning Jib's local base image cache #1956

Closed
chanseokoh opened this issue Sep 5, 2019 · 9 comments

Comments

@chanseokoh
Member

chanseokoh commented Sep 5, 2019

Jib never deletes cached base image layers. (Note that application layers are cached under the project's build directory.)

Some ideas:

  • Automatic (configurable?) or manual (new task/goal)? Or both?
  • LRU? Add access time metadata? Keep a log?
  • Print info on local cache usage during builds?

See #1982 as well.
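
To make the "access time metadata" idea concrete, here is a minimal sketch of a last-use log kept next to the cache. It is not something Jib implements; `record_access`, `least_recently_used`, and the JSON log file are hypothetical names chosen for illustration. An explicit log is assumed here because filesystem access times are often unreliable (for example on `noatime` mounts).

```python
import json
import time
from pathlib import Path
from typing import Dict, List, Optional


def record_access(log_path: Path, key: str, now: Optional[float] = None) -> Dict[str, float]:
    """Store the last-use time of `key` (e.g. a layer digest) in a JSON file.

    Hypothetical helper: returns the updated log as a dict of key -> epoch seconds.
    """
    log: Dict[str, float] = {}
    if log_path.exists():
        log = json.loads(log_path.read_text())
    log[key] = time.time() if now is None else now
    log_path.write_text(json.dumps(log, indent=2, sort_keys=True))
    return log


def least_recently_used(log: Dict[str, float]) -> List[str]:
    """Return the keys ordered from least to most recently used."""
    return sorted(log, key=lambda k: log[k])
```

This sketch is not safe for concurrent builds writing the same log; a real implementation would need file locking or an atomic rename.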

@ndarilek

Out of curiosity, where is this cache?

It'd be nice to have a bit more transparency about what is cached where. I know about the cache under build/, but I'm trying to track down slow builds whenever we update our base image (slower than I think is justified for a pull, that is), and I have no idea where base images might be cached. Are they docker saved, cached in the daemon, etc.?

@TadCordle
Contributor

TadCordle commented Sep 11, 2019

On Linux, the base image cache is at $HOME/.cache/google-cloud-tools-java/jib, unless you set -Djib.useOnlyProjectCache=true, in which case Jib uses the same cache in build/ for both base image and application layers.
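
To see how much disk space that cache takes, one option is to walk the directory and total the file sizes. The following is a minimal sketch and not part of Jib; `cache_usage` is a hypothetical helper, and the default path assumes the Linux location given above.

```python
import os
from pathlib import Path
from typing import Tuple

# Linux default mentioned above; other platforms use a different location.
DEFAULT_CACHE = Path.home() / ".cache" / "google-cloud-tools-java" / "jib"


def cache_usage(cache_dir: Path = DEFAULT_CACHE) -> Tuple[int, int]:
    """Return (file_count, total_bytes) for all regular files under cache_dir."""
    file_count = 0
    total_bytes = 0
    # os.walk yields nothing for a missing directory, so the result is (0, 0).
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            path = Path(root) / name
            if path.is_symlink():
                continue  # do not follow or count symbolic links
            total_bytes += path.stat().st_size
            file_count += 1
    return file_count, total_bytes


if __name__ == "__main__":
    count, size = cache_usage()
    print(f"{DEFAULT_CACHE}: {count} files, {size / 1_000_000:.1f} MB")
```

Pass a different directory (for example the project's build/ cache when jib.useOnlyProjectCache is set) to measure that location instead.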

FYI, Jib doesn't use Docker for anything other than for building to the Docker daemon (i.e. ./gradlew jibDockerBuild or ./mvnw jib:dockerBuild) or using a Docker daemon base image (which hasn't been released yet). Jib maintains its own cache separate from Docker.

@ndarilek

ndarilek commented Sep 11, 2019 via email

@TadCordle
Contributor

TadCordle commented Sep 11, 2019

jib.useOnlyProjectCache is a system property without a corresponding configuration parameter, so I think you need to either pass it on the command line or put it in a gradle.properties file. It looks like you can add something like systemProp.jib.useOnlyProjectCache=true to gradle.properties, but I haven't tested it, so I'm not sure.

As for skaffold.yaml, I think you can just pass it as an arg.

...
build:
  artifacts:
  - image: ...
    jibGradle:
      args:
      - -Djib.useOnlyProjectCache=true
...

@ndarilek

ndarilek commented Sep 11, 2019 via email

@chanseokoh
Member Author

chanseokoh commented Sep 11, 2019

FYI, you can set the environment variables for the docker command in recent versions.

jib {
  dockerClient.environment = [ DOCKER_HOST: '...', DOCKER_TLS_VERIFY: '...' ]
}

You can also pass the environment through
-Djib.dockerClient.environment=key1="value1",key2="value2".

Having an FAQ entry sounds like a great idea. We will also think about exposing this information in other ways.

For diagnosing #1946, I'd start with a standalone build without Skaffold or Minikube. See #1970 and #1917 for ideas. And do #1946 (comment) to understand what exactly is happening. We can follow up in #1946. I'd like to know and fix the problem as much as you do.

@ndarilek

ndarilek commented Sep 11, 2019 via email

@sstock

sstock commented Dec 3, 2021

When considering any cache cleaning options, please take into account people with slow, unreliable, and/or metered Internet connections.

Automatic removal of cached images may be an insignificant concern on a high-speed connection with unlimited usage. But when it takes an hour to download 50MB (a real scenario with T-Mobile), the picture is very different. The same goes when there is a 15GB per month quota before throughput is greatly throttled (Verizon). And that assumes the connection is reliable enough for the download to finish. In these situations, once something has been successfully downloaded, having to download it again can be a real problem. On metered connections, it also incurs a monetary cost.

Obviously manual removal is an option. Automatic removal can work for the above cases provided it can be configured. For example, LRU combined with a minimum TTL (months to years) can work well, perhaps with an option to exempt some images from removal. A nice feature would be to allow reviewing the cleanup plan before anything is erased (i.e. a dry run). If a few minutes of review can save hours of waiting to re-download an inadvertently removed image, it is worthwhile.
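
As a rough illustration of that policy, here is a minimal sketch of LRU eviction with a minimum TTL, an exemption list, and a dry-run default. Nothing here is Jib's API or behavior; `CacheEntry`, `plan_cleanup`, `apply_cleanup`, and all parameter names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, List


@dataclass(frozen=True)
class CacheEntry:
    image: str         # base image reference (hypothetical field)
    size_bytes: int
    last_used: float   # seconds since the epoch


def plan_cleanup(
    entries: Iterable[CacheEntry],
    now: float,
    min_ttl_seconds: float,
    max_total_bytes: int,
    exempt: FrozenSet[str] = frozenset(),
) -> List[CacheEntry]:
    """Pick entries to evict, least recently used first.

    An entry is eligible only if it is not exempt and has been unused for
    at least min_ttl_seconds. Eviction stops once the cache fits the budget.
    """
    entries = list(entries)
    total = sum(e.size_bytes for e in entries)
    eligible = sorted(
        (e for e in entries
         if e.image not in exempt and now - e.last_used >= min_ttl_seconds),
        key=lambda e: e.last_used,
    )
    plan: List[CacheEntry] = []
    for entry in eligible:
        if total <= max_total_bytes:
            break
        plan.append(entry)
        total -= entry.size_bytes
    return plan


def apply_cleanup(
    plan: Iterable[CacheEntry],
    delete: Callable[[CacheEntry], None],
    dry_run: bool = True,
) -> None:
    """Print the plan; call delete(entry) only when dry_run is False."""
    for entry in plan:
        action = "would remove" if dry_run else "removing"
        print(f"{action}: {entry.image} ({entry.size_bytes} bytes)")
        if not dry_run:
            delete(entry)
```

Defaulting `dry_run` to True means nothing is erased unless the user opts in after reviewing the printed plan, which matches the concern about inadvertent removal on slow or metered connections.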

@JoeWang1127

Closing as not planned.
