
daemon.Write implementation is pretty naive #205

Open
mattmoor opened this issue Jun 6, 2018 · 8 comments
Labels
enhancement (New feature or request), good first issue (Good for newcomers), lifecycle/frozen

Comments

@mattmoor
Collaborator

mattmoor commented Jun 6, 2018


Today the `daemon.Write` implementation just uses `tarball.Write` to stream an image into the `docker load` interface. While correct, this can be incredibly slow for scenarios like `ko -L` because it has no incrementality.

In particular, for a large base image we produce and stream a fat tarball to the daemon on every publish. On top of this, since we don't have a local cache wrapping `remote.Image`, we download the base image every time. `remote.Write` elides both the upload and the download through careful use of existence checks; we should be equally careful in `daemon.Write`.

One option to explore is what rules_docker did in its incremental load script. However, we should be careful to measure how this performs against a full daemon (we've seen superlinear behavior in some of the daemon calls before).
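
For context, a rough sketch of the naive flow described above, written against the public `tarball.Write` API and the classic Docker client `ImageLoad` call (the real `daemon.Write` plumbing differs in its details, and newer Docker client versions have changed the `ImageLoad` signature):

```go
package naive

import (
	"context"
	"io"

	"github.com/docker/docker/client"
	"github.com/google/go-containerregistry/pkg/name"
	v1 "github.com/google/go-containerregistry/pkg/v1"
	"github.com/google/go-containerregistry/pkg/v1/tarball"
)

// write streams the ENTIRE image as a tarball into `docker load`,
// re-serializing every layer blob even if the daemon already has it.
func write(ctx context.Context, tag name.Tag, img v1.Image) error {
	cli, err := client.NewClientWithOpts(client.FromEnv)
	if err != nil {
		return err
	}
	pr, pw := io.Pipe()
	go func() {
		// tarball.Write walks the config plus all layers, including the fat base.
		pw.CloseWithError(tarball.Write(tag, img, pw))
	}()
	resp, err := cli.ImageLoad(ctx, pr, false)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	_, err = io.Copy(io.Discard, resp.Body)
	return err
}
```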

mattmoor added the ko label Jun 6, 2018
@mattmoor
Collaborator Author

mattmoor commented Jun 6, 2018

cc @dlorenc: not sure which of your portfolio of tools might be using `daemon.Write`.

This would certainly be mitigated by the local caching we've been talking about, which feels relevant to kaniko + FROM caching, but ideally we'd just never access the content of layers already present in the daemon.

@mattmoor
Collaborator Author

I think this could take advantage of the recently added `l.(*remote.Layer)` trick to direct the daemon to pull the image itself (vs. having us side-load it).

Then we'd want to add the ability to omit layers from the tarball via a callback (see the sketch after the list below).

This should avoid potential pathological behavior in the daemon, which I believe would otherwise require us to:

  1. Enumerate all images in the daemon
  2. Enumerate all the diff-ids for each image
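
A rough sketch of that layer-filtering callback; the `keep` parameter is hypothetical and not part of the `tarball` package API:

```go
package sketch

import (
	v1 "github.com/google/go-containerregistry/pkg/v1"
)

// layersToSend returns only the layers for which keep reports true.
// keep is a hypothetical caller-supplied callback that would return
// false for layers the daemon is known to already have, so they can
// be omitted from the tarball we stream to `docker load`.
func layersToSend(img v1.Image, keep func(v1.Layer) (bool, error)) ([]v1.Layer, error) {
	layers, err := img.Layers()
	if err != nil {
		return nil, err
	}
	var out []v1.Layer
	for _, l := range layers {
		ok, err := keep(l)
		if err != nil {
			return nil, err
		}
		if ok {
			out = append(out, l)
		}
	}
	return out, nil
}
```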

@mattmoor
Collaborator Author

Correction: `l.(*remote.MountableLayer)`. We would also have to generalize this to capture the full `name.Reference` instead of just the `name.Repository`.
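
For illustration, the type assertion in question looks roughly like this; `remote.MountableLayer` wraps layers fetched by `remote.Image` and records where they came from (the `Reference` field shown here reflects the later, generalized shape):

```go
package sketch

import (
	"fmt"

	v1 "github.com/google/go-containerregistry/pkg/v1"
	"github.com/google/go-containerregistry/pkg/v1/remote"
)

// describeOrigin reports where a layer can be pulled from, if we know.
// Layers from remote images are wrapped in *remote.MountableLayer, which
// records the reference they were fetched from; locally built layers
// won't satisfy the type assertion and must be side-loaded.
func describeOrigin(l v1.Layer) {
	if ml, ok := l.(*remote.MountableLayer); ok {
		fmt.Printf("layer is pullable from %s\n", ml.Reference)
	} else {
		fmt.Println("layer origin unknown; must be side-loaded")
	}
}
```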

mattmoor added a commit to mattmoor/go-containerregistry that referenced this issue Jun 10, 2018
Mounting is just one way to take advantage of this information, so preserve as much of it as we can to support other potential uses.

Related: google#205
mattmoor added a commit to mattmoor/go-containerregistry that referenced this issue Jun 10, 2018
This should be considered a relatively advanced option, but folks who know
what they are doing can reduce the amount of data they need to encode in
the tarball for the daemon to load it.

The ultimate use case for this option will be `daemon.Write`, which
currently uses the `docker load` interface to pull the image into the daemon;
however, it currently reuploads (and redownloads) the base image on each write
in contexts like `ko`. If we can determine the set of layers that already exist
in the daemon, we can elide them from the tarball to dramatically improve
performance.

Related: google#205
mattmoor added a commit that referenced this issue Jun 11, 2018
Mounting is just one way to take advantage of this information, so preserve as much of it as we can to support other potential uses.

Related: #205
jonjohnsonjr removed the ko label Mar 21, 2019
jonjohnsonjr added enhancement (New feature or request) and good first issue (Good for newcomers) labels Sep 11, 2019
@jonjohnsonjr
Collaborator

cc @ekcasey: is this similar to the hack you were describing? Or does imgutil do a different kind of incremental daemon loading?

@ekcasey
Contributor

ekcasey commented Sep 26, 2019

@jonjohnsonjr We also create a daemon image using the `docker load` interface.

However, we know that we are extending a base image that already exists in the daemon, so we can use a hack. We discovered we could load a tarball that omits layer blobs when a layer with the same chain ID already exists in the daemon. Therefore, when creating images we load tarballs with a `manifest.json` that looks like this:

```json
[
  {
    "Config": "51a25b04935cbaccdbb1dea72fb16a80f2757402e8463e918f2c75eb57ef7469.json",
    "Layers": [
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "/dcc323fceebaf70aef8963672f6a4ada5edf1f949cf2f7412d02c484a2315fc8.tar",
      "/6848676545aa4bac2b01866242d43b5c0edb6d5ec4457262aedecaab12ca0a39.tar",
      "/777892cdea451f26a93d0778f283823c9b09b39368763ca2b424c535d04b255b.tar",
      "/396c22d2de0ac4f52f4abddca706b76828db5b6e4f182cac964dbceda07c91b6.tar"
    ],
    "RepoTags": [
      "pack.local/builder/727574716f636379677a:latest"
    ]
  }
]
```

and only includes the additional layers in the tarball. This significantly speeds things up, but it isn't a general-case solution.

Unfortunately, this hack can't be used to avoid reimporting layers to the daemon unless they appear in the exact same order as in an existing image.
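
Why the exact order matters: the daemon identifies layers by chain ID, which hashes each diff ID together with the entire stack beneath it. A quick sketch of the computation per the OCI/Docker spec (a hypothetical helper, not from imgutil):

```go
package chainid

import (
	"crypto/sha256"
	"fmt"
)

// chainIDs computes Docker/OCI chain IDs for an ordered list of layer
// diff IDs: ChainID(L0) = DiffID(L0), and
// ChainID(L0..Ln) = SHA256(ChainID(L0..Ln-1) + " " + DiffID(Ln)).
// Every chain ID folds in the whole stack beneath it, so inserting or
// reordering any layer changes every chain ID above that point, which
// is why the hack only helps for identical layer prefixes.
func chainIDs(diffIDs []string) []string {
	out := make([]string, len(diffIDs))
	for i, d := range diffIDs {
		if i == 0 {
			out[i] = d
			continue
		}
		sum := sha256.Sum256([]byte(out[i-1] + " " + d))
		out[i] = fmt.Sprintf("sha256:%x", sum)
	}
	return out
}
```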

@jonjohnsonjr
Collaborator

> This hack unfortunately can't be used to avoid reimporting layers to the daemon unless they appear in the exact same order as an existing image.

Yeah, that's what rules_docker is doing here as well. I think it's actually a really common case during development, given that you're often re-loading the same base image over and over again.

@jonjohnsonjr
Collaborator

Looks like rules_docker does a linear probe of the daemon to determine what it already has. I wonder if a binary search would be faster, or if we expect so few layers to be shared that linear is better (since it guarantees only one miss).
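
To make the linear-vs-binary comparison concrete, a sketch of the binary-search variant over prefix length, assuming a hypothetical monotone `has` oracle (e.g. backed by an image-inspect call against the daemon):

```go
package probe

// sharedPrefix returns how many leading layers the daemon already has,
// given their chain IDs in order. has is a hypothetical membership
// oracle. Because layers are only ever shared as prefixes, has is
// monotone: if the daemon has the prefix of length n, it has every
// shorter prefix, so we can binary search on prefix length.
func sharedPrefix(chainIDs []string, has func(chainID string) bool) int {
	lo, hi := 0, len(chainIDs)
	for lo < hi {
		mid := (lo + hi + 1) / 2 // bias up so lo always advances
		if has(chainIDs[mid-1]) {
			lo = mid
		} else {
			hi = mid - 1
		}
	}
	return lo
}
```

With n candidate layers this needs O(log n) probes versus up to n for the linear scan, but when few layers are shared the linear probe's single guaranteed miss may still win.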

jonjohnsonjr added a commit to jonjohnsonjr/go-containerregistry that referenced this issue Oct 3, 2019
Continuation of this PR:
google#209

This should be considered a relatively advanced option, but folks who know
what they are doing can reduce the amount of data they need to encode in
the tarball for the daemon to load it.

The ultimate use case for this option will be `daemon.Write`, which
currently uses the `docker load` interface to pull the image into the daemon;
however, it currently reuploads (and redownloads) the base image on each write
in contexts like `ko`. If we can determine the set of layers that already exist
in the daemon, we can elide them from the tarball to dramatically improve
performance.

Related: google#205
jonjohnsonjr added a commit to jonjohnsonjr/go-containerregistry that referenced this issue Oct 3, 2019
jonjohnsonjr added a commit to jonjohnsonjr/go-containerregistry that referenced this issue Nov 13, 2019
@github-actions

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with `/reopen`. Mark the issue as fresh by adding the comment `/remove-lifecycle stale`.
