Buildah failing to pull large docker images with error storing to blob file #2224
@smothiki thanks for the issue report. It looks like you're running out of TMPDIR space.
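For reference, a quick way to check whether the temporary directory really is the constraint, and to retry with it pointed somewhere larger (a sketch only; the /var/tmp default and the image name are assumptions, not from the thread):

```shell
# Show free space where buildah stages blobs; /var/tmp is the usual default when TMPDIR is unset
df -h "${TMPDIR:-/var/tmp}"

# Retry the pull with TMPDIR pointed at a filesystem with more room (hypothetical paths and image)
mkdir -p "$HOME/bigtmp"
TMPDIR="$HOME/bigtmp" buildah pull docker://registry.example.com/myimage:latest
```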
TMPDIR is large and it didn't work.
I think this issue is also related to #1092.
@TomSweeneyRedHat what else can I try? We are stuck on migrating to OpenShift because of this error. Shall I try giving a different temp dir? Could you point me to the code where it might be happening?
@smothiki It's happening somewhere deep in the parsing of the layer by the storage code. A couple of questions: Can you pull the image using Docker? Please do try another tmpdir, it won't hurt, but at this point I'm not as optimistic. Can you share the image you're trying to pull and/or the exact command and Dockerfiles (if any) that you're using?
@TomSweeneyRedHat I executed that command. I can't share the image, unfortunately.
@mtrmac @vrothberg WDYT?
Is there anything in particular to point at here? My best guess is that the error actually happens when reading the image (perhaps the server is failing the read, or the compressed file is corrupt), possibly at https://github.com/klauspost/pgzip/blob/3286875a1223e4bd304d6bcc4bb2c463fae762f2/gunzip.go#L543. @smothiki Could you please try the following, to separate the read path from the write path?
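The concrete commands did not survive the formatting here; a plausible sketch of that kind of read/write split, with registry.example.com/test:test standing in for the real image (hypothetical), could be:

```shell
# 1. Exercise only the download/read path: copy the image blobs verbatim into a directory
skopeo copy docker://registry.example.com/test:test dir:/var/tmp/test-dir

# 2. Exercise read plus decompression by writing a docker-archive tarball
skopeo copy docker://registry.example.com/test:test docker-archive:/var/tmp/test.tar

# 3. Exercise the local write path: import the already-downloaded tarball with buildah
buildah pull docker-archive:/var/tmp/test.tar
```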
Also, the debug log of the original failing pull would be useful.
Oh, the …
@mtrmac I did the three steps with skopeo as mentioned.
`buildah pull docker-archive:/home/some/engine/test:test` failed with the error below:
error computing local image name for "docker-archive:/home/some/engine/test:docker.io/library/test:latest": error opening tarfile "/home/some/engine/test:docker.io/library/test:latest" as a source image: error opening file "/home/some/engine/test:docker.io/library/test:latest": open /home/some/engine/test:docker.io/library/test:latest: no such file or directory
The weird part is that we have the same image in a different registry, buildah pull was working with that image, and skopeo inspect of the two images shows the same layers.
I apologize, my mistake. Anyway, @nalind, any idea how to separately test/invoke the c/storage/pkg/archive decompression path?
Outside of writing code to call it directly, there isn't really a "decompress this" path from the CLI, though I suppose a skopeo copy would exercise part of it. This particular error message looks like it's coming from the code that stores the blob to a temporary file. If you've got the image in a directory, I'd be curious whether switching to that directory and decompressing the layer blobs by hand succeeds.
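For anyone wanting to sanity-check the downloaded blobs by hand, a rough sketch along those lines (reusing the dir: copy idea from above; paths and the image name are placeholders, not from the thread):

```shell
# Copy the image to a directory; the layer blobs land there still gzip-compressed
skopeo copy docker://registry.example.com/test:test dir:/var/tmp/test-dir
cd /var/tmp/test-dir

# Test-decompress each gzipped blob; a truncated or corrupt download should fail here
for blob in *; do
    if file "$blob" | grep -q 'gzip compressed'; then
        gunzip -t "$blob" && echo "OK: $blob" || echo "FAILED: $blob"
    fi
done
```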
I am also facing the same issue. I tried the approach suggested above. The strange part is, I was able to build the image using buildah just one day before in the same setup, however the same is not working now. I am able to pull the image using docker but not using buildah; buildah pull is throwing the error "Error writing blob: error storing blob to file "/var/tmp/storage951065376/3": unexpected EOF". Do we have a solution for this issue? Could anyone please help. Thanks
Hi,
The image is hosted on an internal registry, therefore we assume no network issues.
The problem doesn't occur always with the same image and same worker node, but rather occasionally with different images and worker nodes.
Might there be the same root cause, or should we file a separate issue?
Podman and Buildah use the same underlying library, containers/image, and the issue could very well be in there.
Hum, that’s concerning. Can you narrow it down to a set of images, or possibly a probability? Tracking down a concurrency bug in decompression is going to be ugly, especially without a reproducer. If you do get a reproducer (even if only statistical, i.e. “this fails in 10% of cases” implies 1000 tries is probably enough to validate), it would be interesting to try replacing the use of pgzip with the standard library gzip.
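A crude harness for collecting that kind of statistic might look like the following (iteration count and image name are placeholders):

```shell
# Repeat the pull and count how often it fails, to estimate a failure rate
fail=0
for i in $(seq 1 100); do
    buildah rmi registry.example.com/test:test >/dev/null 2>&1
    buildah pull registry.example.com/test:test >/dev/null 2>&1 || fail=$((fail + 1))
done
echo "failed $fail of 100 pulls"
```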
O.k., it's weird. @mtrmac When you say replacing pgzip with gzip, can we do that on our own by configuration, or would this be done programmatically?
That would be a change in the Go code + recompile. BTW we have now seen a report of an “unexpected EOF” when copying large images where no compression is involved; that might well be an entirely different bug (initial data seems to suggest so), or it might mean that the focus on compression is misguided.
Basically, find all import lines like `pgzip "github.com/klauspost/pgzip"` and change them to `pgzip "compress/gzip"` … though it might require a few more changes, or similar changes to not-100%-the-same lines.
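In Go terms the idea is just to re-alias the import so the rest of the code compiles unchanged. A minimal, standalone illustration (not a patch to containers/image itself; the file name is hypothetical):

```go
package main

import (
	"fmt"
	"os"

	// Previously: pgzip "github.com/klauspost/pgzip"
	// The standard library reader has the same NewReader signature, so callers keep working.
	pgzip "compress/gzip"
)

func main() {
	f, err := os.Open("layer.tar.gz") // hypothetical blob on disk
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer f.Close()

	zr, err := pgzip.NewReader(f)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer zr.Close()
	fmt.Println("gzip stream opened successfully")
}
```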
We give up, for now. Due to too many problems we've migrated back to docker-ce.
I have the problem also with a big layer. It comes from ibm/websphere and it would be very nice to fix it. Thanks.
I am not sure this can be fixed. Basically we pull down the image blob to disk, then we explode the image blob into containers/storage. Worst case we end up with 2X the size.
But what can we do to solve this? Maybe some debug logging? Maybe the hint above about replacing pgzip with gzip?
@nalind Ideas?
We have a pull request to update the compression libraries for containers/storage, but this would make the image non-standard and it would not be usable via other container engines. @giuseppe PTAL. Bottom line right now, I believe you need up to 2x the size to store the image.
@rhatdan I have no problem with the storage, the file system is big enough, but it doesn't work when pulling the layer from the Docker registry.
17:58:05 level=debug msg="error copying src image ["docker://was-traditional-base:9.0.5.4"] to dest image ["was-traditional-base:9.0.5.4"] err: Error writing blob: error storing blob to file "/buildah/storage927894400/9": error happened during read: unexpected EOF"
@mtrmac @vrothberg WDYT?
Do we have the full log for this? I'm wondering if we're hitting a timeout.
Here is the full log. Last night I saw the job ran successfully one time; maybe there is a download issue and buildah needed to retry the download. I saw in docker build sometimes retries of layer downloads? Maybe there was an option to make this configurable?
Hmm, unfortunately we're not logging relative timestamps in those messages, so I don't have a way of knowing how much time passed between the point when the client requested the layers and the point when it got those EOFs (I have opened #3168 to restore them). If it was a consistent amount of time for each layer, that would lend some support to the idea that the registry is cutting the client off after a certain amount of time. If you're running the job directly or via a script, can you run it in a way that adds timestamps to the log?
@nalind here is the log with timestamps included. Our Jenkins was building the images last night every hour; two times it worked fine, the other tries were broken. Maybe the download takes too much time?
Wow, that's about five minutes with no messages between the last "copying blob / detected compression / using blob without modification" and the first "unexpected EOF" log message. That sure looks like a timeout to me. I don't see one being set in the client-side logic, but I wouldn't be that surprised if the server was giving up on clients after some amount of time to protect itself from potential denial-of-service attacks.
[engine]
# Maximum number of image layers to be copied (pulled/pushed) simultaneously.
# Not setting this field, or setting it to zero, will fall back to containers/image defaults.
image_parallel_copies=0

Containers.conf has the following, I think we default to 6.
We did not wire that code in yet. AFAIK the config is a nop.
@nalind you are right, the bandwidth is limited because of a company VPN and the layers can be very big. In the normal case the complete build of these images takes 12 minutes. It looks like Docker has bigger timeouts and more retries.
I thought I did wire it up in Podman.
Hi, I tested containers.conf; it looks like the default of six downloads is used.
Another update: I tested the same build with a fast, locally connected registry and everything works fine. Looks like a problem with slowly connected registries.
Adding this feature to allow people with little bandwidth to adjust the amount of threads pulling an image. I don't have an easy way to test this other than manually. [NO TESTS NEEDED] Helps Fix: containers#2224 Signed-off-by: Daniel J Walsh <[email protected]>
A friendly reminder that this issue had no activity for 30 days.
I believe with the parallel fixes this should be a fixed issue.
Hello,
Could you use TMPDIR to set the temporary storage to point to a bigger location? For example: `TMPDIR=$HOME/tmp buildah bud ...`
The error is no longer seen after I added a persistent volume claim with increased storage size.
Faced this issue (I think) when downloading a container with a 2GB layer and a 3.2GB layer. All the other (smaller) layers would get pulled fine, but with these two, as soon as the 1GB mark is reached, I get an "Error writing blob ... Unexpected EOF". Fortunately the following work-around (implicitly suggested above -- nalind, 3/18/2020) worked:
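The work-around itself did not survive the formatting here; based on nalind's earlier suggestion it was presumably a two-step pull that stages the image locally first, roughly like this (image name and paths are placeholders):

```shell
# Download the image to a local OCI layout first, then import that local copy with buildah,
# so the flaky or slow registry connection is only in play during the first step
skopeo copy docker://registry.example.com/big-image:tag oci:/var/tmp/big-image:tag
buildah pull oci:/var/tmp/big-image:tag
```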
I tried the workaround mentioned above. This is on macOS, which might be a factor.
(Edited by @TomSweeneyRedHat just to make the post more readable in GitHub. No content changes)
I'm running buildah in a privileged pod. Trying to pull a docker image of size 10GB and constantly failing.
Steps to reproduce the issue:
Describe the results you received:
Failed with the below error.
Describe the results you expected:
Should be able to pull the image successfully
Output of `rpm -q buildah` or `apt list buildah`:
Output of `buildah version`:
Output of `podman version` if reporting a `podman build` issue:
Output of `cat /etc/*release`:
Output of `uname -a`:
Output of `cat /etc/containers/storage.conf`: