
archive: report error from input stream #2012

Conversation

giuseppe (Member)

If there is an error reading from the input stream, prefer to report it instead of the error from the filter program itself.

We have a test in the buildah CI that expects the "no space left on device" error that comes from the input stream; to avoid changing the test, just fix it here.

Reported here: containers/buildah#5585

pkg/archive/filter.go (review threads below)
@giuseppe force-pushed the archive-filter-reports-error-from-read branch from 9883769 to 22fa550 on July 11, 2024 20:04
@rhatdan (Member) commented Jul 11, 2024

LGTM

@rhatdan (Member) commented Jul 11, 2024

@mtrmac PTAL

```go
err = fmt.Errorf("%s: %w", strings.TrimRight(stderrBuf.String(), "\n"), err)
b := make([]byte, 1)
// if there is an error reading from input, prefer to return that error
_, errRead := input.Read(b)
```
Contributor (cgwalters)

Anyways here... is it generally defined behavior in Go to read from a reader after it has returned EOF? It would seem unusual to cache errors, right?

Actually I am now a bit more confused as to the flow here: how did this break the ENOSPC in the first place? And how does this change fix it? The input here is the HTTP request reader, right? It wouldn't be the thing giving us ENOSPC...

Member Author (giuseppe)

I think it is just a race condition. The decompressor is faster now and detects the error earlier, while before we relied on catching the ENOSPC before the decompressor did.

Contributor (cgwalters)

At a high level I don't understand: why, in a test case where we're checking for ENOSPC, are we hitting corruption in the compression stream? That doesn't make sense to me.


Different topic: My semi-understanding here is that the previous code was synchronously doing decompression and writes in a single goroutine. That will make the errors quite predictable indeed (in practice, they could only change if the decompressor happened to change to start reading larger chunks or something).

But here, we're suddenly doing decompression in a separate process (but logically, we could have done it in a separate goroutine before, sending decompressed chunks via a pipe or channel) - and it's that parallelization that makes things racy, not speed of decompression itself.

Collaborator (mtrmac)

> Anyways here... is it generally defined behavior in Go to read from a reader after it has returned EOF?

The API does not explicitly say, but I would expect a Reader to infinitely return EOF on end-of-file.

> It would seem unusual to cache errors, right?

I wouldn’t generally expect a Read to continue returning a previously-reported error. Some implementations do that, but e.g. for a raw file access, I’d expect it to just issue a syscall again. So, yes, that might plausibly lose data, but we don’t care here — if we don’t see any error, we would report the one from the subprocess.
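For illustration, a minimal sketch of that "keep returning EOF" behavior, using strings.Reader from the standard library; whether a non-EOF error gets cached is implementation-specific:

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

func main() {
	r := strings.NewReader("x")
	buf := make([]byte, 1)

	r.Read(buf)            // consumes the only byte: n=1, err=nil
	_, err1 := r.Read(buf) // end of data: io.EOF
	_, err2 := r.Read(buf) // reading again past the end: still io.EOF

	fmt.Println(err1 == io.EOF, err2 == io.EOF) // true true
}
```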


> Actually I am now a bit more confused as to the flow here: how did this break the ENOSPC in the first place? And how does this change fix it?

> The input here is the HTTP request reader, right? It wouldn't be the thing giving us ENOSPC...

I think here the input is an io.TeeReader actually writing to the restricted-space directory. The storageImageDestination.putBlobToPendingFile pipeline is

HTTP -> … -> Tee -> digest compressed -> decompress -> digest uncompressed
              |
              -> count size -> store to file

and a read from the Tee causes a write to the side branch, which fails with ENOSPC, and is reported to the consumer of the Tee.
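For illustration, a minimal sketch of that failure mode, with a hypothetical failingWriter standing in for the full filesystem (the real pipeline tees an HTTP body into an actual file):

```go
package main

import (
	"fmt"
	"io"
	"strings"
	"syscall"
)

// failingWriter stands in for the side branch writing to a full
// filesystem: every Write fails with ENOSPC (Linux errno, matching
// the scenario here).
type failingWriter struct{}

func (failingWriter) Write(p []byte) (int, error) {
	return 0, syscall.ENOSPC
}

func main() {
	src := strings.NewReader("pretend this is the compressed blob")
	tee := io.TeeReader(src, failingWriter{})

	// Reading from the Tee triggers the side-branch Write; its error
	// is what the consumer of the Tee sees.
	_, err := io.ReadAll(tee)
	fmt.Println(err) // no space left on device
}
```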

(Yes, it would have been nice to have that explanation at the start of the PR, and to avoid the “let’s merge because we are in a rush, what do you mean we don’t understand whether this is actually a fix” controversy.)


> and it's that parallelization that makes things racy, not speed of decompression itself.

It’s not really a race at all, it’s that going across the external process loses the Go error information available for the input of the decompression. The external process is just going to see an EOF on a pipe (or maybe we could send a signal?), there is no way to stuff a “we encountered ENOSPC when providing you data” side channel into the pipe.

Collaborator (mtrmac)

> Anyways here... is it generally defined behavior in Go to read from a reader after it has returned EOF? It would seem unusual to cache errors, right?

> It’s not really a race at all, it’s that going across the external process loses the Go error information available for the input of the decompression.

On second thought, I think this should be implemented in a different way, one that is clearly correct, reporting all errors and not introducing any risk of extra hangs:

Wrap the input stream in a new errorRecordingReader, which stores the first error value reported, if any. Pass that wrapped stream as cmd.Stdin. Then we can consult the reported error if cmd.Run() fails. (cmd.Run() ensures the goroutine consuming cmd.Stdin terminates before returning, so we don’t need to worry about concurrency.)

(It’s a bit ironic that cmd.Run() already has such an error-recording code in cmd.childStdin; but that error is only reported if cmd returns with exit status 0, the ExitError cause is preferred.)
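A minimal sketch of the proposed wrapper; the name and shape follow this comment, and the actual implementation in #2025 may differ in detail:

```go
package archive // hypothetical placement next to tryProcFilter

import "io"

// errorRecordingReader remembers the first non-EOF error returned by
// the wrapped Reader, so the caller can prefer it over the filter
// subprocess's own exit error after cmd.Run() fails.
type errorRecordingReader struct {
	r   io.Reader
	err error // first non-EOF error seen, if any
}

func (e *errorRecordingReader) Read(p []byte) (int, error) {
	n, err := e.r.Read(p)
	if e.err == nil && err != nil && err != io.EOF {
		e.err = err
	}
	return n, err
}
```

Used roughly as: wrap the input, set cmd.Stdin to the wrapper, and if cmd.Run() fails and the recorded err is non-nil, report that instead of the subprocess's ExitError.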

Collaborator (mtrmac)

> I think this should be implemented in a different way

I need to focus on something else; I have filed at least a placeholder #2022 for visibility.

Contributor (cgwalters)

> and a read from the Tee causes a write to the side branch, which fails with ENOSPC, and is reported to the consumer of the Tee.

Thank you! That makes sense to me now. I was unaware of the use of Tee here as I currently just have a superficial awareness of this code, but I'm here to learn. The "reads cause writes via side effect" is definitely not something I am used to seeing.

This is a tangent... and I am sure you guys know what's coming next 😄 but in Go, TeeReader is pretty popular, whereas there's no such thing in the Rust stdlib; people have written it, but it's vanishingly rare by comparison. It's way more common IME to do this type of thing via a custom Write implementation that writes to two places; e.g. ocidir-rs (my code), and there's something similar in coreos-installer I think.

I suspect one reason for this is the "ownership" of things and error handling is much clearer if there's a single "super writer" instead of a tee.
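For comparison, Go's stdlib also offers the "single writer fanning out" shape as io.MultiWriter; a minimal sketch of hashing while storing (the temp-file destination is just for illustration):

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"io"
	"os"
	"strings"
)

func main() {
	src := strings.NewReader("payload")
	hasher := sha256.New()

	// Hypothetical destination file; one Write fans out to both
	// sinks, and the first error from either stops the copy.
	dst, err := os.CreateTemp("", "blob-*")
	if err != nil {
		panic(err)
	}
	defer os.Remove(dst.Name())

	if _, err := io.Copy(io.MultiWriter(dst, hasher), src); err != nil {
		panic(err)
	}
	fmt.Printf("%x\n", hasher.Sum(nil))
}
```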

@mtrmac (Collaborator) commented Jul 12, 2024

*shrug* STREAMS (…) are hard. Go has ended up with two separate Reader and Writer abstractions, which nicely compose individually, but connecting a codebase built around a Reader to a codebase built around a Writer requires an awkward API translation layer; that might be io.Copy or io.TeeReader (easy), or a separate goroutine and an io.Pipe (doable but annoying and comparatively risky).

Whether something interfaces using a Reader or a Writer is frequently, but not always, fairly arbitrary, and often out of the caller's control. Here, the decompression is built around Readers, and using a TeeReader to adapt in that direction is much easier than using pipes to adapt in the other direction.

[This is sort of isomorphic to the “XML streaming API design” dilemma, where, without concurrency, it’s easy to write an XML parser triggering hard-to-consume callbacks, vs. it’s harder to write an XML parser which returns an easy-to-consume stream of events; with XML it’s so much more visible because the API surface is much larger than just some byte arrays and errors.]

Member Author (giuseppe)

> On second thought, I think this should be implemented in a different way […]
>
> Wrap the input stream in a new errorRecordingReader, which stores the first error value reported, if any. Pass that wrapped stream as cmd.Stdin. Then we can consult the reported error if cmd.Run() fails.

Thanks for the hint. Opened a PR: #2025

@@ -46,7 +46,14 @@ func tryProcFilter(args []string, input io.Reader, cleanup func()) (io.ReadClose
go func() {
Contributor (cgwalters)

This is a bit of a tangent, but I am not sure why we're not using https://pkg.go.dev/os/exec#Cmd.StdinPipe above?

Member Author (giuseppe)

I've not used StdoutPipe because I report the error explicitly with CloseWithError on an io.Pipe. I'll need to check whether StdoutPipe allows the same behavior.
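For context, a minimal sketch of the io.Pipe behavior in question: CloseWithError hands the real cause to the reading side, which a plain OS pipe (as returned by StdinPipe/StdoutPipe) cannot do, since its reader only ever sees EOF:

```go
package main

import (
	"fmt"
	"io"
	"syscall"
)

func main() {
	pr, pw := io.Pipe()

	go func() {
		pw.Write([]byte("partial output"))
		// Propagate the real cause to the reader: after the buffered
		// data is consumed, pr.Read returns ENOSPC, not a bare EOF.
		pw.CloseWithError(syscall.ENOSPC)
	}()

	data, err := io.ReadAll(pr)
	fmt.Printf("%q %v\n", data, err) // "partial output" no space left on device
}
```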

@rhatdan (Member) commented Jul 12, 2024

I would like to get this in, so we can move forward with the vendoring into Buildah and then finally into Podman.

@cgwalters (Contributor)

> I would like to get this in, so we can move forward with the vendoring into Buildah and then finally into Podman.

Well, /approve from me at least, FWIW... I don't think this change will make things worse; I just personally don't understand the problem and how this will fix it, but I may be missing something. But if you or someone else is happy with it, then from my PoV feel free to lgtm!

openshift-ci bot commented Jul 12, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, giuseppe


@rhatdan (Member) commented Jul 12, 2024

Buildah is expecting to get an out-of-space error, which it gets from the non-pigz path. Hopefully this fixes it so the user ends up with an understandable error when the system is out of space.

@rhatdan (Member) commented Jul 12, 2024

/lgtm

@cgwalters (Contributor)

Yes, I understood the problem well enough on that end. I just don't quite understand how this change as written will actually fix it. If you (or someone else) can, feel free to explain... but we can also just see if it actually works when retrying the vendored update.

@openshift-merge-bot merged commit 98ad80d into containers:main on Jul 12, 2024
18 checks passed
giuseppe added a commit to giuseppe/storage that referenced this pull request Jul 12, 2024
follow-up for containers#2012

report the error as seen by the input stream, instead of attempting
another read.

Closes: containers#2022

Signed-off-by: Giuseppe Scrivano <[email protected]>
giuseppe added a commit to giuseppe/storage that referenced this pull request Jul 15, 2024 (same follow-up commit as above)