ADD from url fails with 'invalid not-modified ETag' #905

Closed
mcfedr opened this issue Mar 26, 2019 · 19 comments · Fixed by #1159

Comments

@mcfedr
Contributor

mcfedr commented Mar 26, 2019

Build fails at:

ADD https://getcomposer.org/composer.phar /usr/local/bin/composer

Log:

 => [1/17] FROM docker.io/library/php:7.3-fpm@sha256:cf8e94d24d94329f13bcd430ae586f80278247e1c43e5f8b3d52c4ab16d2464f                                                                                                         0.0s
 => ERROR https://getcomposer.org/composer.phar                                                                                                                                                                               0.3s
 => [internal] load build context                                                                                                                                                                                             0.0s
------
 > https://getcomposer.org/composer.phar:
------
invalid not-modified ETag: "5c912760-1d3e0e"

I'm wondering if this is related to #835 but thought worth reporting so it can also be tested.

The etag seems to work just fine, i.e. curl -I -H 'if-none-match: "5c912760-1d3e0e"' https://getcomposer.org/composer.phar returns 304.

@ahmetb

ahmetb commented Aug 23, 2019

Also seeing it from github releases, such as:

	ADD https://github.com/Yelp/dumb-init/releases/download/v1.2.2/dumb-init_1.2.2_amd64 /bin/dumb_init

@rwe
Contributor

rwe commented Sep 4, 2019

I believe I understand the cause.

The fix should be very simple; however, I'm not set up for building in this ecosystem. Hopefully someone can provide a patch. EDIT: see (untested) PR #1159.

I was seeing this too with a Google Cloud Storage URL and so decided to investigate. It seems unlikely that both GitHub releases and Google Cloud Storage are "misbehaving" with respect to ETags; it seems more likely that BuildKit is misinterpreting the If-None-Match spec.

Background info

Docker version info: Docker for Mac 2.1.1.0 (27260)
Client:
 Debug Mode: false
 Plugins:
  app: Docker Application (Docker Inc., v0.8.0)
  buildx: Build with BuildKit (Docker Inc., v0.2.2-10-g3f18b65-tp-docker)

Server:
Containers: 15
Running: 0
Paused: 0
Stopped: 15
Images: 2126
Server Version: 19.03.1
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 894b81a4b802e4eb2a91d1ce216b8817763c29fb
runc version: 425e105d5a03fabd737a126ad93d62a9eeede87f
init version: fec3683
Security Options:
seccomp
Profile: default
Kernel Version: 4.14.131-linuxkit
Operating System: Docker Desktop
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 1.945GiB
Name: docker-desktop
ID: WESG:FJU6:DS3Q:6OS3:D7SX:VK5N:Q4UH:A5BS:T46V:2MJH:YKWB:XQPX
Docker Root Dir: /var/lib/docker
Debug Mode: true
File Descriptors: 29
Goroutines: 44
System Time: 2019-09-04T19:03:23.537795672Z
EventsListeners: 2
HTTP Proxy: gateway.docker.internal:3128
HTTPS Proxy: gateway.docker.internal:3129
Registry: https://index.docker.io/v1/
Labels:
Experimental: true
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine

Reproduction Steps

Dockerfile

FROM scratch
ADD https://storage.googleapis.com/gpt-2/models/774M/checkpoint .

The first run (or a run after the cache is cleared) succeeds:

DOCKER_BUILDKIT=1 docker build .
[+] Building 0.3s (5/5) FINISHED
 => [internal] load .dockerignore                                                              0.0s
 => => transferring context: 244B                                                              0.0s
 => [internal] load build definition from Dockerfile                                           0.0s
 => => transferring dockerfile: 121B                                                           0.0s
 => https://storage.googleapis.com/gpt-2/models/774M/checkpoint                                0.0s
 => [1/1] ADD https://storage.googleapis.com/gpt-2/models/774M/checkpoint .                    0.0s
 => exporting to image                                                                         0.0s
 => => exporting layers                                                                        0.0s
 => => writing image sha256:9a630453cc77705a4f57c121393a38a4f45f65eba6155ba6c3f27ecab18e2b05   0.0s

Running the same command again:

[+] Building 0.2s (3/4)
 => [internal] load build definition from Dockerfile                                           0.0s
 => => transferring dockerfile: 36B                                                            0.0s
 => [internal] load .dockerignore                                                              0.0s
 => => transferring context: 35B                                                               0.0s
 => ERROR https://storage.googleapis.com/gpt-2/models/774M/checkpoint                          0.1s
------
 > https://storage.googleapis.com/gpt-2/models/774M/checkpoint:
------
invalid not-modified ETag:

Clearing the cache with docker builder prune resets the state.

Cause

If a single ETag is requested in If-None-Match, the server may not include that (unambiguous) ETag header in the response.

Detailed demonstration

Requesting the file directly:

curl --http1.1 -s -L -D /dev/stderr -o /dev/null https://storage.googleapis.com/gpt-2/models/774M/checkpoint

Response headers. Notice the ETag.

HTTP/1.1 200 OK
X-GUploader-UploadID: AEnB2Up6PhdYRb_UZ18VAl30f6XLzFkOoBPnSYSjSKqzk90Go8Zqk-zZoenkbL3inKQz1ozoLjcObKKuIbvOV7XFlSSFO6aW0Q
Expires: Wed, 04 Sep 2019 19:23:16 GMT
Date: Wed, 04 Sep 2019 19:23:16 GMT
Cache-Control: private, max-age=0
Last-Modified: Tue, 20 Aug 2019 15:50:08 GMT
ETag: "ca0368fcd3c4c1a99aca42511d0c1f12"
x-goog-generation: 1566316208157027
x-goog-metageneration: 1
x-goog-stored-content-encoding: identity
x-goog-stored-content-length: 77
Content-Type: application/octet-stream
x-goog-hash: crc32c=BI0EFw==
x-goog-hash: md5=ygNo/NPEwamaykJRHQwfEg==
x-goog-storage-class: MULTI_REGIONAL
Accept-Ranges: bytes
Content-Length: 77
Server: UploadServer
Alt-Svc: quic=":443"; ma=2592000; v="46,43,39"

Requesting the file with If-None-Match set to that ETag:

curl --http1.1 -s -L -D /dev/stderr -o /dev/null -H 'If-None-Match: "ca0368fcd3c4c1a99aca42511d0c1f12"' https://storage.googleapis.com/gpt-2/models/774M/checkpoint

Notice that the ETag is not included in the response headers.

HTTP/1.1 304 Not Modified
X-GUploader-UploadID: AEnB2UpEqNrI5fDJjglD2f4--3CCzskMyUg-Fo1RZxoqqHq17HG8W_gURMO6uUVy9B6Mg4450GyA4yRTjPqEJY8v6dtxhuHcLQ
Expires: Wed, 04 Sep 2019 19:23:36 GMT
Date: Wed, 04 Sep 2019 19:23:36 GMT
Cache-Control: private, max-age=0
Last-Modified: Tue, 20 Aug 2019 15:50:08 GMT
Content-Length: 0
Server: UploadServer
Alt-Svc: quic=":443"; ma=2592000; v="46,43,39"

Requesting the file with multiple ETags in If-None-Match:

curl --http1.1 -s -L -D /dev/stderr -o /dev/null -H 'If-None-Match: "ca0368fcd3c4c1a99aca42511d0c1f12", "foobar"' https://storage.googleapis.com/gpt-2/models/774M/checkpoint

Now the ETag is included to disambiguate.

HTTP/1.1 200 OK
X-GUploader-UploadID: AEnB2UpD_nKA4ZMNpJvC97lMJfyXjcr9myMAxojFypxW8lUNiEGiwdaOtezf74-OBCHDXn7T4Ru57oelrDHb01wY9IMU1Qdl6A
Expires: Wed, 04 Sep 2019 19:26:02 GMT
Date: Wed, 04 Sep 2019 19:26:02 GMT
Cache-Control: private, max-age=0
Last-Modified: Tue, 20 Aug 2019 15:50:08 GMT
ETag: "ca0368fcd3c4c1a99aca42511d0c1f12"
x-goog-generation: 1566316208157027
x-goog-metageneration: 1
x-goog-stored-content-encoding: identity
x-goog-stored-content-length: 77
Content-Type: application/octet-stream
x-goog-hash: crc32c=BI0EFw==
x-goog-hash: md5=ygNo/NPEwamaykJRHQwfEg==
x-goog-storage-class: MULTI_REGIONAL
Accept-Ranges: bytes
Content-Length: 77
Server: UploadServer
Alt-Svc: quic=":443"; ma=2592000; v="46,43,39"
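
The three requests above can be reproduced offline with a small Go sketch. The test server is a hypothetical stand-in that only imitates the responses shown above (it is not Google Cloud Storage code): an exact single-tag match gets a 304 without an ETag header, and anything else gets a 200 with one. The names `fakeStorage`, `revalidate`, and `theETag` are assumptions made for this example.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// theETag is the validator taken from the responses above.
const theETag = `"ca0368fcd3c4c1a99aca42511d0c1f12"`

// fakeStorage imitates the observed responses: an exact single-tag match
// yields a 304 without an ETag header; anything else yields a 200 with one.
func fakeStorage() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("If-None-Match") == theETag {
			w.WriteHeader(http.StatusNotModified) // no ETag header is set here
			return
		}
		w.Header().Set("ETag", theETag)
		fmt.Fprint(w, "payload")
	}))
}

// revalidate sends a GET with the given If-None-Match value (if any) and
// returns the status code and the ETag header of the response.
func revalidate(url, ifNoneMatch string) (int, string, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return 0, "", err
	}
	if ifNoneMatch != "" {
		req.Header.Set("If-None-Match", ifNoneMatch)
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return 0, "", err
	}
	defer resp.Body.Close()
	return resp.StatusCode, resp.Header.Get("ETag"), nil
}

func main() {
	srv := fakeStorage()
	defer srv.Close()
	// No validator, a single ETag, and two ETags, as in the curl calls above.
	for _, inm := range []string{"", theETag, theETag + `, "foobar"`} {
		status, etag, err := revalidate(srv.URL, inm)
		if err != nil {
			panic(err)
		}
		fmt.Printf("If-None-Match=%q -> status=%d ETag=%q\n", inm, status, etag)
	}
}
```

A client that insists on reading the ETag from the 304 would see an empty string in the single-tag case, which matches the empty value in the error message from the second build.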

Fix

The following lines of code should be updated so that, if only one ETag was requested, and no ETag header was returned in the response, the requested ETag is assumed:
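
As a rough illustration of that rule (a sketch, not the BuildKit source), the helper below picks the ETag that a 304 response refers to. The function name `resolveNotModifiedETag` and its signature are hypothetical and chosen only for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// resolveNotModifiedETag decides which ETag a 304 response refers to.
// requested holds the ETags that were sent in If-None-Match, and respETag is
// the ETag header of the response ("" when the server omitted it).
// Hypothetical helper for illustration only.
func resolveNotModifiedETag(requested []string, respETag string) (string, error) {
	if respETag != "" {
		return respETag, nil // the server was explicit
	}
	if len(requested) == 1 {
		return requested[0], nil // unambiguous: only one candidate was sent
	}
	return "", errors.New("304 response without ETag is ambiguous")
}

func main() {
	etag, err := resolveNotModifiedETag([]string{`"ca0368fcd3c4c1a99aca42511d0c1f12"`}, "")
	fmt.Println(etag, err)
}
```

When more than one ETag was sent and none comes back, the sketch returns an error rather than guessing, since the client cannot tell which cached copy the server meant.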

rwe added a commit to rwe/buildkit that referenced this issue Sep 4, 2019
If a single ETag is requested in `If-None-Match`, some servers do not
include that (unambiguous) ETag header in the response.

For detailed description, see:
moby#905 (comment)

Signed-off-by: Robert Estelle <[email protected]>
@mcfedr
Contributor Author

mcfedr commented Sep 12, 2019

Not that this should change the fix, as it's clearly needed for real-world cases, but, out of interest, RFC 7232's description of the 304 response does appear to require the server to send back the ETag:

The server generating a 304 response MUST generate any of the
following header fields that would have been sent in a 200 (OK)
response to the same request: Cache-Control, Content-Location, Date,
ETag, Expires, and Vary.
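
For contrast, a handler that follows this requirement sets the ETag before writing the 304. The sketch below uses Go's standard net/http and httptest packages; the names `compliantHandler`, `check`, and `exampleETag` are assumptions made for the example, and the comparison is a simple exact match rather than full If-None-Match list parsing.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// exampleETag reuses the value from the original report, for illustration.
const exampleETag = `"5c912760-1d3e0e"`

// compliantHandler always emits the ETag header, including on a 304.
func compliantHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("ETag", exampleETag)
	if r.Header.Get("If-None-Match") == exampleETag {
		w.WriteHeader(http.StatusNotModified)
		return
	}
	fmt.Fprint(w, "payload")
}

// check runs one in-memory request against the handler and returns the
// status code and ETag header of the recorded response.
func check(ifNoneMatch string) (int, string) {
	req := httptest.NewRequest(http.MethodGet, "/composer.phar", nil)
	if ifNoneMatch != "" {
		req.Header.Set("If-None-Match", ifNoneMatch)
	}
	rec := httptest.NewRecorder()
	compliantHandler(rec, req)
	res := rec.Result()
	defer res.Body.Close()
	return res.StatusCode, res.Header.Get("ETag")
}

func main() {
	status, etag := check(exampleETag)
	fmt.Println(status, etag)
}
```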

tonistiigi pushed a commit to tonistiigi/buildkit that referenced this issue Sep 20, 2019
If a single ETag is requested in `If-None-Match`, some servers do not
include that (unambiguous) ETag header in the response.

For detailed description, see:
moby#905 (comment)

Signed-off-by: Robert Estelle <[email protected]>
(cherry picked from commit c63c6f9)
thaJeztah added a commit to thaJeztah/docker that referenced this issue Oct 24, 2019
full diff: moby/buildkit@f704282...ae10b29

fixes:

- moby/buildkit#1144 Fix socket handling
    - fsutil: Handle copying unix sockets in local sources
    - update github.com/containerd/continuity to 75bee3e2ccb6
    - update github.com/tonistiigi/fsutil to 3d2716dd0a4d
- moby/buildkit#1150 ssh: Fix file descriptor leak when doing SSH forwarding
    - fixes moby/buildkit#1146 SSH Forwarding Doesn't Release File Descriptors
- moby/buildkit#1156 llbsolver: Fix using multiple remote cache importers
- moby/buildkit#1159 http: Handle missing but unambiguous ETags in response
    - fixes moby/buildkit#905 ADD from url fails with 'invalid not-modified ETag'
    - fixes moby/buildkit#784 invalid not-modified ETag with local network ADD in docker
- moby/buildkit#1166 solver: Fix possible inefficient parallelization in solver
    - fix cases where some events were dropped resulting inefficient parallelization.
- moby/buildkit#1168 vendor: update go-runc to e029b79d
- moby/buildkit#1140 contenthash: Fix bug with symlink in source path of a copy operation
    - fixes moby/buildkit#974 COPY --from fails when source path involves a symlink
    - fixes moby/buildkit#785 COPY rpc error: code = Unknown desc = not found: not found
    - fixes moby/buildkit#958 Issue COPY a file to a symlink directory
- moby/buildkit#1139 executor: oom_score_adj is no longer set from main process

Signed-off-by: Sebastiaan van Stijn <[email protected]>
nottrobin added a commit to nottrobin/jaas.ai that referenced this issue Jan 21, 2020
The jaas-dashboard clone was being cached in subsequent builds, so we
weren't pulling in the latest version of the repo for every build.

This should fix that. @jkfran found
[this technique](https://stackoverflow.com/questions/36996046/how-to-prevent-dockerfile-caching-git-clone)
of using `ADD` to query https://api.github.com/,
but unfortunately it didn't work for me because I kept getting the error
"invalid not-modified ETag" from GitHub. This
[should have been fixed](moby/buildkit#905)
but the fix doesn't seem to have made its way into latest docker-ce yet.
So at some point in the future we could use this solution.

For the time being I've instead used https://httpbin.org/uuid to create
a cache-busting line in the `Dockerfile` just above the git clone.
This is less efficient, in that the GitHub solution would allow
GitHub to return a 304 if nothing had changed, which would allow
Docker to make use of the further caching below that point, whereas in
this implementation it will simply clone the repo every time. But I
believe this is the best we can do for now.
@deeky666

Seems like this should have been fixed by #1159. Is the patch released already? I'm on 19.03.8 and still experiencing this issue when trying to add PHP Composer (as described above).

@delboy1978uk

I was getting this issue, but I found doing a docker system prune meant that I could rebuild again. Maybe this will help someone.

@tobia

tobia commented Mar 23, 2021

I'm also still having this issue (with the same exact URL) with docker 20.10.5

Which version is this supposed to have been fixed in?

@delboy1978uk

To get around the docker system prune annoyance, we stopped using ADD and just used a RUN command using curl

@robinjoerke

> To get around the docker system prune annoyance, we stopped using ADD and just used a RUN command using curl

Please be aware that ADD is evaluated on every build and will not use the cache if the resource has changed, while a RUN curl will cache the result once and use the build cache from that point on. So using RUN and curl is not a full replacement here.

@delboy1978uk

Interesting, thanks for that @robinjoerke

@dcoliversun

I'm also still having this issue with docker 20.10.8

@robinjoerke

I'm still having this issue with docker 20.10.11 (Docker for Windows)

@thaJeztah
Member

The original issue was fixed two years ago; if you encounter this, and have more details, please use #2420 instead.

@frafra

frafra commented Oct 26, 2022

Still happening with Docker 20.10.18. No OAuth or authentication is involved:

$ curl -I "https://oss.sonatype.org/service/local/artifact/maven/content?r=releases&g=org.openrefine&a=openrefine&v=3.6.1&c=linux&p=tar.gz"
HTTP/2 200 
date: Wed, 26 Oct 2022 13:25:28 GMT
content-type: application/x-gzip
content-length: 141540183
server: Nexus/2.15.1-02 Noelios-Restlet-Engine/1.1.6-SONATYPE-5348-V8
x-frame-options: SAMEORIGIN
x-content-type-options: nosniff
last-modified: Mon, 22 Aug 2022 19:56:56 GMT
etag: "{SHA1{cf151677b2a0184b73d637dbce9e6c82d98684de}}"
content-disposition: attachment; filename="openrefine-3.6.1-linux.tar.gz"
vary: Accept-Charset,Accept-Encoding,Accept-Language,Accept

@nevmerzhitsky

> Still happening with Docker 20.10.18. No OAuth or authentication is involved:

And in "Docker version 20.10.20, build 9fdeb9c" too:

$ curl -I "https://getcomposer.org/download/2.1.14/composer.phar"
HTTP/2 200
server: nginx
date: Wed, 23 Nov 2022 09:55:36 GMT
content-type: application/octet-stream
content-length: 2291189
last-modified: Tue, 30 Nov 2021 09:51:43 GMT
vary: Accept-Encoding
etag: "61a5f42f-22f5f5"
accept-ranges: bytes

Dockerfile:

FROM php:8.0-cli

RUN export DEBIAN_FRONTEND=noninteractive && \
    apt update && \
    apt install -y --no-install-recommends \
        git \
        unzip \
    && \
    rm -rf /var/lib/apt/lists/*

ADD https://github.com/mlocati/docker-php-extension-installer/releases/latest/download/install-php-extensions /usr/local/bin/
ADD https://getcomposer.org/download/2.1.14/composer.phar /usr/local/bin/

Output:

$ dc build --progress=plain --no-cache
#1 [internal] load build definition from Dockerfile
#1 transferring dockerfile: 1.01kB done
#1 DONE 0.0s

#2 [internal] load .dockerignore
#2 transferring context: 2B done
#2 DONE 0.0s

#3 [internal] load metadata for docker.io/library/php:8.0-cli
#3 DONE 0.0s

#4 [internal] load build context
#4 DONE 0.0s

#5 [ 1/11] FROM docker.io/library/php:8.0-cli
#5 CACHED

#6 [ 2/11] RUN export DEBIAN_FRONTEND=noninteractive &&     apt update &&     apt install -y --no-install-recommends         git         unzip     &&     rm -rf /var/lib/apt/lists/*
#6 ...

#7 https://getcomposer.org/download/2.1.14/composer.phar
#7 ERROR: invalid not-modified ETag: "61a5f42f-22f5f5"

#8 https://github.com/mlocati/docker-php-extension-installer/releases/latest/download/install-php-extensions
#8 CANCELED

#6 [ 2/11] RUN export DEBIAN_FRONTEND=noninteractive &&     apt update &&     apt install -y --no-install-recommends         git         unzip     &&     rm -rf /var/lib/apt/lists/*
#6 0.392
#6 0.392 WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
#6 0.392
#6 CANCELED

@nevmerzhitsky

$ docker info
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc., v0.9.1)
  compose: Docker Compose (Docker Inc., v2.12.1)
  dev: Docker Dev Environments (Docker Inc., v0.0.3)
  extension: Manages Docker extensions (Docker Inc., v0.2.13)
  sbom: View the packaged-based Software Bill Of Materials (SBOM) for an image (Anchore Inc., 0.6.0)
  scan: Docker Scan (Docker Inc., v0.21.0)

Server:
 Containers: 12
  Running: 12
  Paused: 0
  Stopped: 0
 Images: 83
 Server Version: 20.10.20
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runtime.v1.linux runc io.containerd.runc.v2
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 9cd3357b7fd7218e4aec3eae239db1f68a5a6ec6
 runc version: v1.1.4-0-g5fd4c4d
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 5.15.74.2-microsoft-standard-WSL2
 Operating System: Docker Desktop
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 7.761GiB
 Name: docker-desktop
 ID: AFAM:KYJR:GL7S:TRK3:PO6T:UPAF:7PMC:ZWMD:W6WC:GUYS:OBBV:KKCV
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 HTTP Proxy: http.docker.internal:3128
 HTTPS Proxy: http.docker.internal:3128
 No Proxy: hubproxy.docker.internal
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  hubproxy.docker.internal:5000
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No blkio throttle.read_bps_device support
WARNING: No blkio throttle.write_bps_device support
WARNING: No blkio throttle.read_iops_device support
WARNING: No blkio throttle.write_iops_device support

@thaJeztah
Member

thaJeztah commented Nov 23, 2022

The error is produced by code that we use from BuildKit, in this part of the BuildKit code:

if resp.StatusCode == http.StatusNotModified {
	respETag := etagValue(resp.Header.Get("ETag"))
	if respETag == "" && onlyETag != "" {
		respETag = onlyETag
		// Set the missing ETag header on the response so that it's available
		// to .save()
		resp.Header.Set("ETag", onlyETag)
	}
	md, ok := m[respETag]
	if !ok {
		return "", "", nil, false, errors.Errorf("invalid not-modified ETag: %v", respETag)
	}

I'm not sure if it's an issue with the code, or if it's an issue with the server from which you're downloading though. I'm not very familiar with that part of the BuildKit code, but from looking at the code, it seems that it's producing that error if BuildKit previously downloaded that URL (and stored the ETag), and when checking if the URL is still current (or if the cache can be used), it got a 304 "StatusNotModified" status from the server, but the server actually replied with a different ETag (in other words; the server responds that the content wasn't modified, but the ETag indicates that it was)?
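
If that reading is right, the failure mode can be sketched in a few lines. The example assumes, as the excerpt suggests, that earlier downloads are kept in a map keyed by ETag; the type `cachedMeta`, the function `lookupNotModified`, and the digest and ETag values are hypothetical and not BuildKit's.

```go
package main

import "fmt"

// cachedMeta is a hypothetical record of an earlier download.
type cachedMeta struct {
	digest string
}

// lookupNotModified returns the cached entry that a 304 response refers to.
// It fails when the ETag reported by the server is not a key in the cache.
func lookupNotModified(cache map[string]cachedMeta, respETag string) (cachedMeta, error) {
	md, ok := cache[respETag]
	if !ok {
		return cachedMeta{}, fmt.Errorf("invalid not-modified ETag: %v", respETag)
	}
	return md, nil
}

func main() {
	cache := map[string]cachedMeta{
		`"61a5f42f-22f5f5"`: {digest: "sha256:example"},
	}
	// The server echoes the ETag that was stored: the cached entry is reused.
	if md, err := lookupNotModified(cache, `"61a5f42f-22f5f5"`); err == nil {
		fmt.Println("cache hit:", md.digest)
	}
	// The server answers 304 but reports a tag that was never stored.
	if _, err := lookupNotModified(cache, `"some-other-tag"`); err != nil {
		fmt.Println("error:", err)
	}
}
```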

@bronger

bronger commented Oct 19, 2023

And what can one do about it? Is it possible to kill the locally stored ETags somehow?

@thaJeztah
Member

@bronger this is a 4-year, nearly 5-year old ticket; if you're running into an issue on a current version of docker or buildkit, and have steps to reproduce that can help to investigate, please open a new ticket instead with steps to reproduce, and information about your environment.

@duzun

duzun commented Sep 9, 2024

Just use RUN curl -q -o /path/to/file http://url.to/file.sh
