
Issues with buildx #20189

Closed
kuza55 opened this issue Nov 15, 2023 · 22 comments
Labels
backend: Docker Docker backend-related issues bug

Comments

@kuza55

kuza55 commented Nov 15, 2023

Describe the bug
I am trying to use the new buildx features from source.

I have a multi-stage build with a docker_image target that builds on top of a base target.

With the regular builder, things work fine, but with the buildx builder, I am running into this error today:

Dockerfile:2
--------------------
   1 |     ARG BASE_IMAGE=docker/base:base
   2 | >>> FROM $BASE_IMAGE
   3 |
   4 |     RUN pip install opentelemetry-distro opentelemetry-exporter-otlp
--------------------
ERROR: failed to solve: europe-west4-docker.pkg.dev/smart-shoreline-391915/espresso-docker/base/base:0.1: failed to authorize: failed to fetch oauth token: unexpected status: 403 Forbidden

The error here is probably unrelated to Pants: docker is failing to pull this image from a remote repo. However, docker should not need to pull this image, since it exists locally (it was just built by Pants):

$ docker images
REPOSITORY                                                                                       TAG               IMAGE ID       CREATED          SIZE
europe-west4-docker.pkg.dev/smart-shoreline-391915/espresso-docker/base/base                     0.1               f8d553f88c7e   14 hours ago     534MB

When I disable buildx, my build works fine.

Pants version
From source on main

OS
Ubuntu via WSL

@kuza55 kuza55 added the bug label Nov 15, 2023
@kaos
Member

kaos commented Nov 15, 2023

Hi, thanks for reporting.

Just to confirm, is the output field for your docker_image target using the default value of {"type": "docker"}?

Also, please include the git sha of your source version, as main is a moving target ;)

@kuza55
Author

kuza55 commented Nov 15, 2023

Git SHA is 7e15e5c

I am using the default value for type.

I am not using a multi-platform build afaik; I am using a multi-stage build where the stages are separate docker_image targets.

@kaos
Member

kaos commented Nov 15, 2023

Git SHA is 7e15e5c

👍🏽

I am using the default value for type.

I am not using a multi-platform build afaik; I am using a multi-stage build where the stages are separate docker_image targets.

yea, I mis-read that for a split second.. just picked up the multi-.. part ;p

@kaos
Member

kaos commented Nov 15, 2023

How do you stitch multiple docker_image targets into a single multi-stage build? Or, I guess it's not a multi-stage build in Docker terms, perhaps? (i.e. in Docker a multi-stage build is one docker build using a single Dockerfile with multiple images defined in it.)

Do you see the europe-west4-docker.pkg.dev/smart-shoreline-391915/espresso-docker/base/base:0.1 image when you run docker images?

@kuza55
Author

kuza55 commented Nov 15, 2023

I have 2 targets,

# docker/base/BUILD
docker_image(
    name="base",
    image_tags=["0.1"],
    registries=[
        "@gcp",
    ],
    cache_to={
        "type": "registry",
        "ref": "europe-west4-docker.pkg.dev/smart-shoreline-391915/espresso-docker/base/cache:latest",
        "mode": "max"
    },
    cache_from={
        "type": "registry",
        "ref": "europe-west4-docker.pkg.dev/smart-shoreline-391915/espresso-docker/base/cache:latest"
    }
)
# docker/prod/BUILD
docker_image(
    name="prod",
    image_tags=["0.1"],
    cache_to={
        "type": "registry",
        "ref": "europe-west4-docker.pkg.dev/smart-shoreline-391915/espresso-docker/prod/cache:latest",
        "mode": "max"
    },
    cache_from={
        "type": "registry",
        "ref": "europe-west4-docker.pkg.dev/smart-shoreline-391915/espresso-docker/prod/cache:latest"
    }
)

And then docker/prod/Dockerfile starts with

ARG BASE_IMAGE=docker/base:base
FROM $BASE_IMAGE

I am not super clear on how pants stitches these together or what the terminology is if not a multi-stage build.

And yes, I do see the relevant image when I run docker images; I included a snippet of that output in the original post.

@kaos
Member

kaos commented Nov 15, 2023

OK, I'm not sure what's up with buildx here. Perhaps @riisi has more insights?

Regarding what Pants does, it simply chains multiple docker builds, something ~eq. to:

docker build -t base:0.1 src/base
docker build -t prod:0.1 --build-arg BASE_IMAGE=base:0.1 src/prod

and the "magic" here is the build arg BASE_IMAGE: Pants passes in the image name of your base image, which you identified using a default value pointing to its target address.

@riisi
Contributor

riisi commented Nov 16, 2023

however docker should not need to pull this image since it exists locally because it was just built by pants

I would have thought that when you use the --cache-from option, the second image build will still try to fetch the remote cache from the registry, so it will need to have creds.
That said, failure to retrieve the cache shouldn't result in a failed build.

I'm wondering if this is a more general problem with fetching/pushing to GCP - have you tried building and pushing a single image?

After you build (package) the image with Pants, can you see the image exists locally with docker images ?

What do you have in pants.toml?

@kuza55
Author

kuza55 commented Nov 16, 2023

This is actually separate from my attempts to use caching.

Here is my toml file:

[docker]
use_buildx = true
env_vars = [
  "DOCKER_CONFIG=%(homedir)s/.docker",
  "DOCKER_BUILDKIT=0",
  "HOME",
  "AWS_PROFILE=apricot",
]
tools = [
  "docker-credential-gcloud", # or docker-credential-gcloud when using artifact registry
  "dirname",
  "readlink",
  "python3",
  # These may be necessary if using Pyenv-installed Python.
  "cut",
  "sed",
  "bash",
  # This is for aws
  "docker-credential-ecr-login",
  "getent"
]
default_repository = "{directory}/{name}"

[docker.registries.gcp]
default = true
address = "europe-west4-docker.pkg.dev"
repository = "smart-shoreline-391915/espresso-docker/{directory}/{name}"

And yes, as noted in my first post, I see the image when I run docker images.

I have read that buildx has its own cache, but I have been unable to figure out how to inspect it.

To be clear, my main concern here is not whether I can build the image or not, but what the provenance of the image is when I have my credentials configured correctly and whether it will unnecessarily read things from the network when it already has the image locally.

@riisi
Contributor

riisi commented Nov 16, 2023

You may need to enable the containerd image store (although actually that may be needed for multiplatform builds only).

Running with the Pants option --docker-build-verbose may help troubleshoot this: it will show you the docker CLI commands that Pants is constructing, as @kaos alluded to above. E.g.

18:42:36.36 [INFO] stdout: "['/usr/local/bin/docker', 'buildx', 'build', '--cache-from=type=inline', '--cache-to=type=inline', '--output=type=docker', '--pull=False']"

@kuza55
Author

kuza55 commented Nov 16, 2023

I see pants running this command:

docker buildx build --pull=False --tag europe-west4-docker.pkg.dev/smart-shoreline-391915/espresso-docker/prod/prod:0.1 --build-arg BASE_IMAGE=europe-west4-docker.pkg.dev/smart-shoreline-391915/espresso-docker/base/base:0.1 --file docker/prod/Dockerfile .

Which reproduces the error when I run it from the sandbox. When I remove the buildx fragment, it builds fine despite auth errors.

After a bit more googling it seems like this is "expected" for buildx:

moby/moby#42893
moby/buildkit#2343

Presumably this means buildx requires a dependency on a registry, though it does make me a bit concerned that the build depends on what is on a remote service and has potential race conditions if someone else writes to the same tag.

@kuza55
Author

kuza55 commented Nov 16, 2023

Enabling the containerd image store did not help, but also seems like a hack around the nonhermetic nature of buildx.

@riisi
Contributor

riisi commented Nov 16, 2023

potential race conditions if someone else writes to the same tag.

Wondering if there's a solution to this by using Pants to generate a deterministic hash in the tag.

@kuza55
Author

kuza55 commented Nov 16, 2023

I think a deterministic hash tag would be great.

Showing these tags in build output and providing a way to access them in other build commands would also be great if possible (e.g. a shell target to start a container).

I have not dug too deep into what pants already supports here with git hashes etc, but I do want to get to a workflow where multiple people can build, publish & run containers without tripping over each other and not needing to manually edit tags in a repo.

@riisi
Contributor

riisi commented Nov 16, 2023

The deterministic hash part should already be possible. I'm not sure re. the rest. Probably worth asking / searching in Slack for these types of questions.

@kuza55
Author

kuza55 commented Nov 16, 2023

I think you pasted the wrong link for the deterministic hashes?

It feels like that should be the standard, though? Having builds be nonhermetic, even when the underlying fault is with docker, feels like a footgun.

@riisi
Contributor

riisi commented Nov 16, 2023

Fixed. Yes, I agree it would be nice to have a default (or even some recommendations) to avoid this.

@kaos
Member

kaos commented Nov 16, 2023

regarding a stable hash, see the {pants.hash} interpolation value from https://www.pantsbuild.org/docs/tagging-docker-images#string-interpolation-using-placeholder-values
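For instance, a docker_image target could tag images with a content hash instead of a fixed version. A minimal sketch using the {pants.hash} placeholder from the docs linked above (the target itself is hypothetical, not from this thread):

```python
# BUILD -- sketch: tag the image with Pants' stable content hash
docker_image(
    name="base",
    # {pants.hash} interpolates a deterministic hash of the target's inputs,
    # so concurrent builders get distinct, reproducible tags instead of
    # racing to overwrite a shared mutable tag like "0.1".
    image_tags=["{pants.hash}"],
)
```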

@huonw huonw added the backend: Docker Docker backend-related issues label Nov 16, 2023
@ndellosa95
Contributor

Based on the extensive discussion on this issue, I think this can potentially be solved by either a) aliasing the upstream image in the build context of the downstream image to point to a local image, or b) setting the value of the build arg in Pants to the address of the local image directly. I'm going to set aside some time either today or tomorrow to experiment with this and prove out that it works.

@ndellosa95
Contributor

ndellosa95 commented Dec 7, 2023

Okay so I did some exploration here - unfortunately I was unable to get something working. Buildx drivers other than the default docker driver are totally unable to pull images from the local image store, they can only pull images from a registry.

There is a solution here though, which is to use buildx bake to package images with buildx - instead of doing these docker builds separately pants could map the docker builds into a single bake file and then call the bake command. I am going to experiment with this now and confirm it works as I anticipate.
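To illustrate the idea: a bake file can wire one target's output in as a named build context of another, so the chained builds resolve locally instead of hitting a registry. A hand-written sketch, not what Pants would actually generate (names and paths are hypothetical):

```hcl
# docker-bake.hcl -- sketch of chaining two dependent images in one bake call
target "base" {
  context    = "docker/base"
  dockerfile = "Dockerfile"
}

target "prod" {
  context    = "docker/prod"
  dockerfile = "Dockerfile"
  contexts = {
    # The "target:" prefix feeds the result of the base target in as a
    # named build context, so the prod Dockerfile can do `FROM base-image`
    # without pulling anything from a remote registry.
    base-image = "target:base"
  }
}

group "default" {
  targets = ["prod"]
}
```

Running `docker buildx bake` would then build both images in dependency order; the prod Dockerfile references the named context (`FROM base-image`) rather than a BASE_IMAGE build arg.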

@ndellosa95
Contributor

Confirmed that bake works pretty nicely!

@riisi
Contributor

riisi commented Jan 5, 2024

@ndellosa95 I've looked into the issue with multiple dependent images and was able to get it working using the Containerd Image Store.

Here's what I've used to test this locally (M2 Mac):

# BUILD
docker_image(
    name="base",
    source="Dockerfile.base",
    cache_to={"type": "local", "dest": "/tmp/docker/pants-test-cache"},
    cache_from={"type": "local", "src": "/tmp/docker/pants-test-cache"},
    build_platform=["linux/amd64" ,"linux/arm64"],
)

docker_image(
    name="final",
    source="Dockerfile.final",
    cache_to={"type": "local", "dest": "/tmp/docker/pants-test-cache"},
    cache_from={"type": "local", "src": "/tmp/docker/pants-test-cache"},
    build_platform=["linux/amd64", "linux/arm64"],
)
# Dockerfile.base
FROM python:3.8
RUN echo "base image" >> base.txt
# Dockerfile.final
ARG PARENT=:base
FROM ${PARENT}

RUN cat base.txt && \
  echo "final image" >> final.txt
# pants.toml (relevant config only)
[GLOBAL]
pants_version = "2.19.0rc3"

[docker]
use_buildx=true
# (Note that I didn't need to map any env vars)

Enable the containerd image store - either using Docker Desktop or by setting Docker Engine config via /etc/docker/daemon.json:

{
  "features": {
    "containerd-snapshotter": true
  }
}

Switch to the default "docker" build driver (i.e., do not use docker-container), e.g. docker builder use desktop-linux (or default).

Note I ran into this issue on my machine causing the error docker: 'buildx' is not a docker command. and was able to resolve it by creating a symlink per this comment.

I was also able to test this successfully with Github Actions.

@ndellosa95 Would you be able to check if this helps with your use case?

I'm going to put in a PR to update the docs to suggest this as the recommended approach. The containerd image store is in beta but as far as I can see has been stable for a while and there are no obvious limitations I can see.

Here is a PR to update the example-docker repo.

@riisi
Contributor

riisi commented Mar 7, 2024

Closing this as I believe it's fixed since 2.19 with the above approach - let me know otherwise.

@riisi riisi closed this as completed Mar 7, 2024