dockerTools: add streaming image support, improve speed and reduce IO #91084

Merged
merged 13 commits into from
Jun 29, 2020


purcell
Member

@purcell purcell commented Jun 19, 2020

Motivation for this change

In the context of client work, we (@utdemir and myself) have been producing large (multi-GB) docker images with Nix. These images work nicely once produced, with the obvious limitations of unwieldy images, but the machinery for producing them via Nix is costly in a couple of senses:

  1. A great deal of IO is performed in the course of building images, since an intermediate tar file is created for each layer, and then a final tar is created for the full image.
  2. Docker images are often immediately loaded into a docker daemon or copied around with tools like skopeo, but dockerTools always realises them into the Nix store, where they will be included in binary caches and generally take up lots of disk space, particularly in a CI environment.

We set about addressing both of these costs, by using the following scheme:

  • Firstly, we rewrite the Nix + bash code for producing a docker image tarball as a Python program which avoids creating intermediate files.
  • Next, we provide a streamLayeredImage function, analogous to buildLayeredImage, but which produces as its output a script that, when run, writes the corresponding docker image to stdout, streaming directly from the underlying layer content store paths. For many users, this will be all they need.
  • buildLayeredImage uses the new streamLayeredImage machinery to realise the image tarballs into the Nix store as before, for those users who still want this functionality.
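
As a rough illustration of the "no intermediate files" idea, here is a minimal Python sketch that writes a tar archive of some paths straight to an output stream. It is not the code from this PR: the function name stream_layer is made up for the example, and it only relies on the standard tarfile module's non-seekable stream mode ("w|").

```python
import io
import sys
import tarfile


def stream_layer(paths, out):
    """Write a tar archive of `paths` to the writable binary stream `out`.

    `stream_layer` is a hypothetical name used for illustration only.
    """
    # Mode "w|" treats `out` as a non-seekable stream, so the archive is
    # emitted block by block and never staged in a temporary file.
    with tarfile.open(fileobj=out, mode="w|") as tar:
        for path in paths:
            tar.add(path, recursive=True)


if __name__ == "__main__":
    # Example: python stream_layer.py /some/dir | tar -tv
    stream_layer(sys.argv[1:], sys.stdout.buffer)
```

Piping the script's stdout into another tool (for instance tar -t, or a loader that reads an archive from stdin) consumes the archive as it is produced.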

Our results have been encouraging locally. Given the following example large image definition:

{nixpkgs}:
let
  pkgs = import nixpkgs { };
in
pkgs.dockerTools.buildLayeredImage {
  name = "test";
  maxLayers = 4;
  tag = "latest";
  contents = [
    (pkgs.writeScriptBin "test" "echo 1212")
    pkgs.ghc
    pkgs.qt4
    pkgs.qemu
    pkgs.clang
  ];
  config = {
    Cmd = [ "ghci" ];
  };
}

building after these changes takes 1m33s vs 5m58s with the previous code. Additionally, with the streamLayeredImage function, the same image can be piped to docker load in 2 minutes 30 seconds.

We're presenting these changes for feedback on API and approach. Some further notes:

  • These changes pass the existing nixos tests.
  • Other offline testing in our local project environment has taken place.
  • We target a newer Docker image standard (1.2) than the previous code, and therefore do not need to use tarsum. We believe that the newer standard - which has been supported since Docker 1.12 - should be fine for all reasonable real-world uses.
  • There was previously the ability to override ownership of files in images to a non-root user, but this does not seem necessary or desirable, and we have not provided that option.
  • If additional documentation is needed as part of this, we'd be happy to provide it if directed.
  • We have yet to add some separate tests for the streaming API: the code is covered by existing tests, but specific test cases may be helpful to prevent regressions later.
  • buildImage does not use buildLayeredImage to build a one-layered image, since previously those two functions had parallel implementations. We have not changed this, but there is an opportunity to do so.
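
For readers unfamiliar with the archive layout being targeted, the following Python sketch builds the two JSON documents that sit next to the layer tarballs in a `docker save`-style archive: the image config (with rootfs.diff_ids) and manifest.json. It is an illustration under assumptions, not the PR's code: the function name image_metadata, the hard-coded "amd64"/"linux" values and the omission of optional fields such as "created" and "history" are all choices made for this example.

```python
import hashlib
import json


def image_metadata(name, tag, layer_digests, cmd):
    """Return (config_filename, config_bytes, manifest_bytes) for one image.

    `layer_digests` are hex SHA-256 digests of the uncompressed layer tars.
    Hypothetical helper; architecture and os are assumed values.
    """
    config = {
        "architecture": "amd64",
        "os": "linux",
        "config": {"Cmd": cmd},
        "rootfs": {
            "type": "layers",
            "diff_ids": [f"sha256:{d}" for d in layer_digests],
        },
    }
    config_bytes = json.dumps(config, sort_keys=True).encode("utf-8")
    # The config file is named after the digest of its own contents.
    config_name = hashlib.sha256(config_bytes).hexdigest() + ".json"
    manifest = [{
        "Config": config_name,
        "RepoTags": [f"{name}:{tag}"],
        "Layers": [f"{d}/layer.tar" for d in layer_digests],
    }]
    return config_name, config_bytes, json.dumps(manifest).encode("utf-8")
```

Because the layers are identified by content digests, no tarsum computation is involved in this layout.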

Thanks!

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS linux)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.

@ofborg ofborg bot added 8.has: documentation 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild 10.rebuild-linux: 0 This PR does not cause any packages to rebuild labels Jun 19, 2020
@gilligan
Contributor

I guess there aren't really any user-facing changes are there? Which means there is no documentation that needs to be updated, unless there is documentation with implementation details.

@ofborg ofborg bot added the 2.status: merge conflict This PR has merge conflicts with the target branch label Jun 21, 2020
@utdemir
Member

utdemir commented Jun 21, 2020

I guess there aren't really any user-facing changes are there?

buildLayeredImage is not supposed to have any user-facing change, except the uid and gid omission Steve mentioned above (which are already not covered by the existing docs). However, it might be useful to mention the new streamLayeredImage function alongside commands like:

$(nix-build) | docker load
$(nix-build) | gzip --fast | skopeo copy docker-archive:/dev/stdin docker://some_docker_registry/myimage:tag

since they avoid realizing the image tarball in the Nix store, which is desirable in some cases.

@utdemir utdemir force-pushed the intro-stream-layered-image branch from d371ec9 to 1619952 Compare June 21, 2020 00:36
@ofborg ofborg bot removed the 2.status: merge conflict This PR has merge conflicts with the target branch label Jun 21, 2020
@flokli
Contributor

flokli commented Jun 21, 2020

[…] However, it might be useful to mention the new streamLayeredImage function alongside with commands like:

$(nix-build) | docker load
$(nix-build) | gzip --fast | skopeo copy docker-archive:/dev/stdin docker://some_docker_registry/myimage:tag

since they avoid realizing the image tarball in the Nix store, which is desirable in some cases.

Ack 👍 That'd be really nice :-)

@utdemir
Member

utdemir commented Jun 22, 2020

Thank you for the suggestions @gilligan and @flokli. We updated the PR accordingly (and also did a rebase from master).

Let us know if there is anything else we should do to get this into a mergeable state, or if there are any other suggestions or clarifications we can address.

@purcell
Member Author

purcell commented Jul 16, 2020

Hi @adrian-gierakowski -- co-author of the change here. I'm a Mac guy, and I can't get linuxkit to work for me on Catalina, so I'm working on getting access to a Linux build machine for testing purposes.

Here's my take on what you're seeing: in the buildImage world, the image itself is assembled entirely inside the Linux builder. What you (quite reasonably) want in your use case is for the streamLayeredImage result script to run on the darwin host system, but for the store items which form the contents of the image to be built by the Linux builder. So I feel like something like this should work:

let
  linuxPkgs = import <nixpkgs> {  system = "x86_64-linux"; };
  hostPkgs = import <nixpkgs> {};
in
hostPkgs.dockerTools.streamLayeredImage {
  name = "hello";
  contents = [ linuxPkgs.hello ];
  config = {
    Cmd = [ "${linuxPkgs.hello}/bin/hello" ];
  };
}

adrian-gierakowski added a commit to rhinofi/nixpkgs that referenced this pull request Jul 21, 2020
…(MacOS)

to avoid error described in NixOS/nix#3321 (comment)

the problem manifests itself when building streamLayeredImage in single user mode
on MacOS using method described in NixOS#91084 (comment)
(via a remote builder)
@adrian-gierakowski
Contributor

adrian-gierakowski commented Jul 25, 2020

@purcell That worked, thanks! I did have to jump through a bunch of hoops though to make it work as a few things turned out to be broken on MacOS. All of the fixes below are needed to create an image on MacOS with the new implementation:

And somewhat unrelated: /nix/store permissions on the image need a fix to be able to run the image as a non-root user: #93811

Thanks for the great work!

adrian-gierakowski added a commit to adrian-gierakowski/nixpkgs that referenced this pull request Jul 26, 2020
Needed to allow running image as non root user.

Fixes a regression introduced by NixOS#91084
adrian-gierakowski added a commit to rhinofi/nixpkgs that referenced this pull request Jul 26, 2020
Needed to allow running image as non root user.

Fixes a regression introduced by NixOS#91084
@thatsmydoing
Contributor

Another regression is that extraCommands no longer lets you create writable directories inside the image. This is one of the proposed scenarios when it was introduced here #52870

Having the uid and gid functionality would also be nice, but I can understand that it might cause some confusion and that a differently designed API would probably be preferable.

@utdemir
Member

utdemir commented Aug 6, 2020

Another regression is that extraCommands no longer lets you create writable directories inside the image.

That is interesting. As far as I can see extraCommands does not change the permissions, and pretty much just copies a store path inside the container as-is. @thatsmydoing, do you have an example I can use to reproduce your issue?

@thatsmydoing
Contributor

Running the following:

{ pkgs }:

with pkgs;

dockerTools.buildLayeredImage {
  name = "test";
  tag = "latest";
  contents = [ busybox ];
  extraCommands = ''
    mkdir -m 1777 tmp
  '';
  config.Cmd = [ "stat" "/tmp" ];
}

results in

  File: /tmp
  Size: 4096            Blocks: 8          IO Block: 4096   directory
Device: 24h/36d Inode: 1720124     Links: 2
Access: (0555/dr-xr-xr-x)  Uid: (    0/ UNKNOWN)   Gid: (    0/ UNKNOWN)
Access: 2020-08-06 09:52:00.000000000
Modify: 1970-01-01 00:00:01.000000000
Change: 2020-08-06 09:52:00.000000000

The main difference is that previously, the layer tar is made as part of the runCommand so permissions can be preserved as they were. Now that it's only built as a store path, the permissions get reset by nix and then the layer tar is made.

@utdemir
Member

utdemir commented Aug 6, 2020

Oh, right. Thanks @thatsmydoing . I'll fix it tomorrow, probably by making the customisation layer derivation create a tar archive instead of a directory.
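
To illustrate why packing the customisation layer into a tar archive would keep the permissions, here is a small Python sketch showing that a tar member records the mode bits of a directory, including the sticky bit from mkdir -m 1777. It is only a demonstration of the tar format's behaviour; the helper name tar_directory_modes is hypothetical and none of this is taken from the eventual fix.

```python
import io
import os
import tarfile
import tempfile


def tar_directory_modes(root):
    """Archive `root` in memory and return {member name: permission bits}.

    Hypothetical helper used only to inspect what the archive records.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        tar.add(root, arcname=".")
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r") as tar:
        return {m.name: m.mode for m in tar.getmembers()}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        tmp = os.path.join(root, "tmp")
        os.mkdir(tmp)
        os.chmod(tmp, 0o1777)  # same mode as `mkdir -m 1777 tmp`
        for name, mode in tar_directory_modes(root).items():
            print(name, oct(mode))
```

A plain directory output, by contrast, goes through Nix's permission reset described above before any layer tar is made from it.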

utdemir added a commit to utdemir/nixpkgs that referenced this pull request Sep 4, 2020
…rball

This fixes an issue described here[1], where permissions set by 'extraCommands'
were ignored by Nix.

[1] NixOS#91084 (comment)
adisbladis pushed a commit to adisbladis/nixpkgs that referenced this pull request Nov 19, 2020
…rball

This fixes an issue described here[1], where permissions set by 'extraCommands'
were ignored by Nix.

[1] NixOS#91084 (comment)

(cherry picked from commit ae82f81)
adisbladis pushed a commit that referenced this pull request Nov 19, 2020
…rball

This fixes an issue described here[1], where permissions set by 'extraCommands'
were ignored by Nix.

[1] #91084 (comment)

(cherry picked from commit ae82f81)
@groodt
Contributor

groodt commented Aug 9, 2021

@purcell @grahamc

Apologies for digging up a closed PR. I wasn't quite sure where to ask the question.

I'm looking to load some images built with dockerTools.buildLayeredImage into Bazel, but if the intermediate layers are compressed, the load is faster.
https://discourse.nixos.org/t/dockertools-buildlayeredimage-possible-to-compress-layers/14417/5

From what I can gather from the awful specs, I'm looking for OCI-compatible images whose layers are:
application/vnd.oci.image.layer.v1.tar+gzip

From what I can ascertain, the default for Docker is to export images with compressed layers. It seems to be the default, but it doesn't seem to be required or recommended from what I can see.

Is there a reason why the dockerTools uses uncompressed intermediate layers or was this only easier to implement or not seen as necessary / useful? If I was to find time to create a patch, would it be accepted and would you help point me in the right direction? Or should it rather be a different function to streamLayeredImage / buildLayeredImage? Or is my only option to try and perform surgery on the image outside dockerTools using my own function to hack it to work for my needs?

@utdemir
Member

utdemir commented Aug 10, 2021

Hi @groodt,

From what I can gather from the awful specs, is that I'm looking for OCI compatible images that are: application/vnd.oci.image.layer.v1.tar+gzip

What streamLayeredImage aims to implement is Docker Image Specification v1.2, and as far as I can see, it does not mention being able to compress the layers.

I am not super familiar with OCI-compatible images, so I don't know whether they are compatible with or similar to what is implemented, or whether it is even possible to implement this with our unusual approach.

Is there a reason why the dockerTools uses uncompressed intermediate layers or was this only easier to implement or not seen as necessary / useful

The reason I can think of is that streamLayeredImage works by traversing the store paths twice: first just to calculate the sizes of the layers, and again to actually stream them as a tarball. This was the only way to stream the tarball without writing to disk: the tar file format puts the file size before the file contents, which forces us to calculate the layer size before emitting the layer itself. If we were to compress the layers, I assume that would require us to do the compression twice, and I'm afraid that would be too wasteful.
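
The two-pass idea can be sketched in Python roughly as follows. This is a simplified illustration, not the contents of stream_layered_image.py: the names CountingSink, normalize, write_layer and stream_image are invented for the example, the metadata normalisation is an assumption made so that both passes emit identical bytes, and the config and manifest entries of a real image are left out.

```python
import hashlib
import os
import tarfile
import threading


class CountingSink:
    """Write-only sink that records size and SHA-256 and discards the data."""

    def __init__(self):
        self.size = 0
        self._hash = hashlib.sha256()

    def write(self, data):
        self.size += len(data)
        self._hash.update(data)
        return len(data)

    def hexdigest(self):
        return self._hash.hexdigest()


def normalize(info):
    """Pin ownership and mtime so both passes emit identical bytes (assumption)."""
    info.uid = info.gid = 0
    info.uname = info.gname = "root"
    info.mtime = 1
    return info


def write_layer(paths, out):
    """Stream one layer tar for `paths` into the writable object `out`."""
    with tarfile.open(fileobj=out, mode="w|") as tar:
        for path in paths:
            tar.add(path, filter=normalize)


def _produce(paths, write_fd):
    # Runs in a worker thread: regenerate the layer into the pipe.
    with os.fdopen(write_fd, "wb") as pipe_in:
        write_layer(paths, pipe_in)


def stream_image(layers, out):
    """Write an outer tar holding one `<digest>/layer.tar` per layer."""
    with tarfile.open(fileobj=out, mode="w|") as image:
        for paths in layers:
            # Pass 1: learn the layer's size and digest without storing it.
            sink = CountingSink()
            write_layer(paths, sink)
            info = tarfile.TarInfo(f"{sink.hexdigest()}/layer.tar")
            info.size = sink.size  # the tar header needs this up front
            # Pass 2: regenerate the same bytes and copy them into the image.
            read_fd, write_fd = os.pipe()
            worker = threading.Thread(target=_produce, args=(paths, write_fd))
            worker.start()
            with os.fdopen(read_fd, "rb") as pipe_out:
                image.addfile(info, pipe_out)
            worker.join()
```

With compressed layers, the size that goes into the outer header would be the compressed size, so the first pass would have to run the compressor as well; that is the doubled work mentioned above.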

If I was to find time to create a patch, would it be accepted and would you help point me in the right direction?

If it ends up as a performance gain (smaller images, faster image creation etc.), I don't see why it wouldn't be accepted. As for the direction, it's pretty much modifying stream_layered_image.py to emit a tarball according to the standard. I can't say more because I just don't know the OCI standard, but I'd be happy to help if you have any questions with the existing script/approach.

Or is my only option to try and perform surgery on the image outside dockerTools using my own function to hack it to work for my needs?

To be honest, that sounds like the easiest option. Something like skopeo can likely do the trick. If you're keen to contribute and think it would be useful, maybe we should even have it as part of dockerTools in some form.

@thatsmydoing
Contributor

thatsmydoing commented Aug 10, 2021

We've written our own image building tools that generate OCI here https://github.com/iknow/nix-utils/tree/master/oci if you're interested (we also don't compress them though). The paradigm works a bit differently from dockerTools: we eschew runAsRoot entirely and everything has to be specified declaratively. And since it's working with raw layers directly, there are a few footguns.

That said, based on my experience, it probably wouldn't be too hard to port dockerTools to build OCI instead and there are some advantages. Layer-wise, OCI is pretty much just an unpacked docker image so there's no need to do multiple passes as you don't have to put the layer tars into the main docker image tar. That should amount to less building done overall (and less space usage since you can share layers across images).

@purcell
Member Author

purcell commented Aug 10, 2021

Isn't there already a separate ociTools too? @utdemir and I didn't work on that, so at this stage I'm not sure how much overlap there is between it and dockerTools.

@groodt
Contributor

groodt commented Aug 10, 2021

Thanks for the replies everyone. There's a lot for me to look at.

It does appear that OCI is the best-bet for longer-term compatibility and that Docker is now using https://github.com/distribution/distribution/blob/main/docs/spec/manifest-v2-2.md
which is very similar to OCI from what I can gather
https://github.com/opencontainers/image-spec/blob/main/media-types.md#compatibility-matrix

@purcell I think ociTools seems like it isn't really solving the same problem. It seems to build quite a minimal single layer image. It also seems to be the older format and not schemaVersion 2.

@purcell
Member Author

purcell commented Aug 10, 2021

I think ociTools seems like it isn't really solving the same problem. It seems to build quite a minimal single layer image. It also seems to be the older format and not schemaVersion 2.

Yeah, that could well be. Like I say, we overhauled dockerTools, but it was only just after ociTools was added IIRC.

Labels
8.has: documentation 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild 10.rebuild-linux: 0 This PR does not cause any packages to rebuild