
daemon.Write implementation is pretty naive #205

Open
mattmoor opened this issue Jun 6, 2018 · 8 comments
Labels
enhancement (New feature or request), good first issue (Good for newcomers), lifecycle/frozen

Comments

@mattmoor
Collaborator

mattmoor commented Jun 6, 2018


Today the `daemon.Write` implementation just uses `tarball.Write` to stream an image into the `docker load` interface. While correct, this can be incredibly slow for scenarios like `ko -L` because it has no incrementality.

In particular, for a large base image we produce and stream a fat tarball to the daemon on every publish. On top of this, since we don't have a local cache wrapping `remote.Image`, we download the base image every time. `remote.Write` elides both the upload and the download through careful use of existence checks; we should be equally careful in `daemon.Write`.

One option to explore is what rules_docker did in its incremental load script. However, we should be careful to measure how this performs against a full daemon (we've seen superlinear behavior in some of the daemon calls before).
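
For context, a rough sketch of the naive flow described above, written against the public `tarball.Write` API and the classic Docker client `ImageLoad` call (the real `daemon.Write` plumbing differs in its details, and newer Docker client versions have changed the `ImageLoad` signature):

```go
package naive

import (
	"context"
	"io"

	"github.com/docker/docker/client"
	"github.com/google/go-containerregistry/pkg/name"
	v1 "github.com/google/go-containerregistry/pkg/v1"
	"github.com/google/go-containerregistry/pkg/v1/tarball"
)

// write streams the ENTIRE image as a tarball into `docker load`,
// re-serializing every layer blob even if the daemon already has it.
func write(ctx context.Context, tag name.Tag, img v1.Image) error {
	cli, err := client.NewClientWithOpts(client.FromEnv)
	if err != nil {
		return err
	}
	pr, pw := io.Pipe()
	go func() {
		// tarball.Write walks the config plus all layers, including the fat base.
		pw.CloseWithError(tarball.Write(tag, img, pw))
	}()
	resp, err := cli.ImageLoad(ctx, pr, false)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	_, err = io.Copy(io.Discard, resp.Body)
	return err
}
```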

mattmoor added the ko label Jun 6, 2018
@mattmoor
Collaborator Author

mattmoor commented Jun 6, 2018

cc @dlorenc: not sure which of your portfolio of tools might be using `daemon.Write`.

This would certainly be mitigated by the local caching we've been talking about, which feels relevant to kaniko + FROM caching, but ideally we'd just never access the content of layers already present in the daemon.

@mattmoor
Collaborator Author

I think this could take advantage of the recently added `l.(*remote.Layer)` trick to direct the daemon to pull the image itself (vs. having us side-load it).

Then we'd want to add the ability to omit layers from the tarball via a callback (see the sketch after the list below).

This should avoid potential pathological behavior in the daemon, which I believe would otherwise require us to:

  1. Enumerate all images in the daemon
  2. Enumerate all the diff-ids for each image
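
A rough sketch of that layer-filtering callback; the `keep` parameter is hypothetical and not part of the `tarball` package API:

```go
package sketch

import (
	v1 "github.com/google/go-containerregistry/pkg/v1"
)

// layersToSend returns only the layers for which keep reports true.
// keep is a hypothetical caller-supplied callback that would return
// false for layers the daemon is known to already have, so they can
// be omitted from the tarball we stream to `docker load`.
func layersToSend(img v1.Image, keep func(v1.Layer) (bool, error)) ([]v1.Layer, error) {
	layers, err := img.Layers()
	if err != nil {
		return nil, err
	}
	var out []v1.Layer
	for _, l := range layers {
		ok, err := keep(l)
		if err != nil {
			return nil, err
		}
		if ok {
			out = append(out, l)
		}
	}
	return out, nil
}
```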

@mattmoor
Collaborator Author

Correction: `l.(*remote.MountableLayer)`. We would also have to generalize this to capture the full `name.Reference` instead of just the `name.Repository`.
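
For illustration, the type assertion in question looks roughly like this; `remote.MountableLayer` wraps layers fetched by `remote.Image` and records where they came from (the `Reference` field shown here reflects the later, generalized shape):

```go
package sketch

import (
	"fmt"

	v1 "github.com/google/go-containerregistry/pkg/v1"
	"github.com/google/go-containerregistry/pkg/v1/remote"
)

// describeOrigin reports where a layer can be pulled from, if we know.
// Layers from remote images are wrapped in *remote.MountableLayer, which
// records the reference they were fetched from; locally built layers
// won't satisfy the type assertion and must be side-loaded.
func describeOrigin(l v1.Layer) {
	if ml, ok := l.(*remote.MountableLayer); ok {
		fmt.Printf("layer is pullable from %s\n", ml.Reference)
	} else {
		fmt.Println("layer origin unknown; must be side-loaded")
	}
}
```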

mattmoor added a commit to mattmoor/go-containerregistry that referenced this issue Jun 10, 2018
Mounting is just one way to take advantage of this information, so preserve as much of it as we can to support other potential uses.

Related: google#205
mattmoor added a commit to mattmoor/go-containerregistry that referenced this issue Jun 10, 2018
This should be considered a relatively advanced option, but folks who know
what they are doing can reduce the amount of data they need to encode in
the tarball for the daemon to load it.

The ultimate use case for this option will be `daemon.Write`, which
currently uses the `docker load` interface to pull the image into the daemon;
however, it currently reuploads (and redownloads) the base image on each write
in contexts like `ko`. If we can determine the set of layers that already exist
in the daemon, we can elide them from the tarball to dramatically improve
performance.

Related: google#205
mattmoor added a commit that referenced this issue Jun 11, 2018
Mounting is just one way to take advantage of this information, so preserve as much of it as we can to support other potential uses.

Related: #205
jonjohnsonjr removed the ko label Mar 21, 2019
jonjohnsonjr added enhancement (New feature or request) and good first issue (Good for newcomers) labels Sep 11, 2019
@jonjohnsonjr
Collaborator

cc @ekcasey: is this similar to the hack you were describing? Or does imgutil do a different kind of incremental daemon loading?

@ekcasey
Contributor

ekcasey commented Sep 26, 2019

@jonjohnsonjr We also create a daemon image using the `docker load` interface.

However, we know that we are extending a base image that already exists in the daemon, so we can use a hack. We discovered we could load a tarball that omits layer blobs when a layer with the same chain ID already exists in the daemon. Therefore, when creating images we load tarballs with a `manifest.json` that looks like this:

```json
[
  {
    "Config": "51a25b04935cbaccdbb1dea72fb16a80f2757402e8463e918f2c75eb57ef7469.json",
    "Layers": [
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "/dcc323fceebaf70aef8963672f6a4ada5edf1f949cf2f7412d02c484a2315fc8.tar",
      "/6848676545aa4bac2b01866242d43b5c0edb6d5ec4457262aedecaab12ca0a39.tar",
      "/777892cdea451f26a93d0778f283823c9b09b39368763ca2b424c535d04b255b.tar",
      "/396c22d2de0ac4f52f4abddca706b76828db5b6e4f182cac964dbceda07c91b6.tar"
    ],
    "RepoTags": [
      "pack.local/builder/727574716f636379677a:latest"
    ]
  }
]
```

and only includes the additional layers in the tarball. This significantly speeds things up, but it isn't a general-case solution.

Unfortunately, this hack can't be used to avoid reimporting layers to the daemon unless they appear in the exact same order as in an existing image.
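
Why the exact order matters: the daemon identifies layers by chain ID, which hashes each diff ID together with the entire stack beneath it. A quick sketch of the computation per the OCI/Docker spec (a hypothetical helper, not from imgutil):

```go
package chainid

import (
	"crypto/sha256"
	"fmt"
)

// chainIDs computes Docker/OCI chain IDs for an ordered list of layer
// diff IDs: ChainID(L0) = DiffID(L0), and
// ChainID(L0..Ln) = SHA256(ChainID(L0..Ln-1) + " " + DiffID(Ln)).
// Every chain ID folds in the whole stack beneath it, so inserting or
// reordering any layer changes every chain ID above that point, which
// is why the hack only helps for identical layer prefixes.
func chainIDs(diffIDs []string) []string {
	out := make([]string, len(diffIDs))
	for i, d := range diffIDs {
		if i == 0 {
			out[i] = d
			continue
		}
		sum := sha256.Sum256([]byte(out[i-1] + " " + d))
		out[i] = fmt.Sprintf("sha256:%x", sum)
	}
	return out
}
```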

@jonjohnsonjr
Collaborator

> This hack unfortunately can't be used to avoid reimporting layers to the daemon unless they appear in the exact same order as an existing image.

Yeah, that's what rules_docker is doing here as well. I think it's actually a really common case during development, given that you're often re-loading the same base image over and over again.

@jonjohnsonjr
Collaborator

Looks like rules_docker does a linear probe of the daemon to determine what it already has. I wonder if a binary search would be faster, or if we expect so few layers to be shared that linear is better (since it guarantees only one miss).
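
To make the linear-vs-binary comparison concrete, a sketch of the binary-search variant over prefix length, assuming a hypothetical monotone `has` oracle (e.g. backed by an image-inspect call against the daemon):

```go
package probe

// sharedPrefix returns how many leading layers the daemon already has,
// given their chain IDs in order. has is a hypothetical membership
// oracle. Because layers are only ever shared as prefixes, has is
// monotone: if the daemon has the prefix of length n, it has every
// shorter prefix, so we can binary search on prefix length.
func sharedPrefix(chainIDs []string, has func(chainID string) bool) int {
	lo, hi := 0, len(chainIDs)
	for lo < hi {
		mid := (lo + hi + 1) / 2 // bias up so lo always advances
		if has(chainIDs[mid-1]) {
			lo = mid
		} else {
			hi = mid - 1
		}
	}
	return lo
}
```

With n candidate layers this needs O(log n) probes versus up to n for the linear scan, but when few layers are shared the linear probe's single guaranteed miss may still win.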

jonjohnsonjr added a commit to jonjohnsonjr/go-containerregistry that referenced this issue Oct 3, 2019
Continuation of this PR:
google#209

This should be considered a relatively advanced option, but folks who know
what they are doing can reduce the amount of data they need to encode in
the tarball for the daemon to load it.

The ultimate use case for this option will be `daemon.Write`, which
currently uses the `docker load` interface to pull the image into the daemon;
however, it currently reuploads (and redownloads) the base image on each write
in contexts like `ko`. If we can determine the set of layers that already exist
in the daemon, we can elide them from the tarball to dramatically improve
performance.

Related: google#205
jonjohnsonjr added a commit to jonjohnsonjr/go-containerregistry that referenced this issue Oct 3, 2019
jonjohnsonjr added a commit to jonjohnsonjr/go-containerregistry that referenced this issue Nov 13, 2019
@github-actions

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with `/reopen`. Mark the issue as fresh by adding the comment `/remove-lifecycle stale`.
