archive: report error from input stream #2012
Conversation
Force-pushed 69cb719 to 9883769
if there is an error reading from the input stream, prefer to report it instead of the error from the filter program itself.
We have a test in the buildah CI that expects the "no space left on device" error that comes from the input stream; to avoid changing the test, just fix it here.
Reported here: containers/buildah#5585
Signed-off-by: Giuseppe Scrivano <[email protected]>
Force-pushed 9883769 to 22fa550
LGTM
@mtrmac PTAL
err = fmt.Errorf("%s: %w", strings.TrimRight(stderrBuf.String(), "\n"), err)
b := make([]byte, 1)
// if there is an error reading from input, prefer to return that error
_, errRead := input.Read(b)
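For readers skimming the diff excerpt above, here is a condensed, hypothetical reconstruction of the shape of the change (runFilter and the package name are illustrative; cmd, input and stderrBuf follow the snippet; this is not the exact code in the PR):

```go
package filtersketch

import (
	"bytes"
	"fmt"
	"io"
	"os/exec"
	"strings"
)

// runFilter is a hypothetical, condensed stand-in for the real function:
// run the filter program and, on failure, prefer any error the input
// stream itself reports over the filter's stderr output.
func runFilter(cmd *exec.Cmd, input io.Reader, stderrBuf *bytes.Buffer) error {
	if err := cmd.Run(); err != nil {
		// if there is an error reading from input, prefer to return that error
		b := make([]byte, 1)
		if _, errRead := input.Read(b); errRead != nil && errRead != io.EOF {
			return errRead // e.g. "no space left on device" surfaced by the input
		}
		return fmt.Errorf("%s: %w", strings.TrimRight(stderrBuf.String(), "\n"), err)
	}
	return nil
}
```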
Anyways here...is it generally defined behavior in Go to read from a reader after it's returned EOF? It would seem unusual to cache errors right?
Actually I am now a bit more confused as to the flow here - how did this break the ENOSPC in the first place? And how does this fix fix it? The input here is the http request reader, right? It wouldn't be the thing giving us ENOSPC...
I think it is just a race condition. The decompressor is faster now and detects the error earlier, while before we were relying on catching the ENOSPC before the decompressor.
At a high level I don't understand: why in a test case that we're checking for ENOSPC are we hitting corruption in the compression stream? That doesn't make sense to me.
Different topic: My semi-understanding here is that the previous code was synchronously doing decompression and writes in a single goroutine. That will make the errors quite predictable indeed (in practice, they could only change if the decompressor happened to change to start reading larger chunks or something).
But here, we're suddenly doing decompression in a separate process (but logically, we could have done it in a separate goroutine before, sending decompressed chunks via a pipe or channel) - and it's that parallelization that makes things racy, not speed of decompression itself.
Anyways here...is it generally defined behavior in Go to read from a reader after it's returned EOF?

The API does not explicitly say, but I would expect a Reader to infinitely return EOF on end-of-file.

It would seem unusual to cache errors right?

I wouldn’t generally expect a Read to continue returning a previously-reported error. Some implementations do that, but e.g. for a raw file access, I’d expect it to just issue a syscall again. So, yes, that might plausibly lose data, but we don’t care here — if we don’t see any error, we would report the one from the subprocess.
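For illustration, the standard library's bytes.Reader behaves the way described above: once exhausted it keeps returning io.EOF on every subsequent Read. A small, verifiable example (not code from this PR):

```go
package main

import (
	"bytes"
	"fmt"
)

func main() {
	r := bytes.NewReader([]byte("x"))
	buf := make([]byte, 4)
	for i := 0; i < 3; i++ {
		n, err := r.Read(buf)
		fmt.Println(n, err) // prints "1 <nil>" once, then "0 EOF" on every further call
	}
}
```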
Actually I am now a bit more confused as to the flow here - how did this break the ENOSPC in the first place? And how does this fix fix it?
The input here is the http request reader, right? It wouldn't be the thing giving us ENOSPC...
I think here the input is an io.TeeReader actually writing to the restricted-space directory. The storageImageDestination.putBlobToPendingFile pipeline is

HTTP -> … -> Tee -> digest compressed -> decompress -> digest uncompressed
              |
              -> count size -> store to file

and a read from the Tee causes a write to the side branch, which fails with ENOSPC, and is reported to the consumer of the Tee.
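A minimal, self-contained sketch of that mechanism (hypothetical stand-ins; the real pipeline in storageImageDestination.putBlobToPendingFile has more stages): the write side of the tee fails, and io.TeeReader hands that error to whoever is reading, which is the role the decompressor plays here.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

// errNoSpace is an illustrative stand-in for ENOSPC from the layer file.
var errNoSpace = errors.New("no space left on device")

// failingWriter stands in for the file on a full filesystem.
type failingWriter struct{}

func (failingWriter) Write(p []byte) (int, error) { return 0, errNoSpace }

func main() {
	body := strings.NewReader("pretend this is the HTTP body")
	// Every Read from tee also writes the same bytes to failingWriter; when
	// that write fails, TeeReader returns the write error to the reader side.
	tee := io.TeeReader(body, failingWriter{})
	_, err := io.ReadAll(tee)
	fmt.Println(err) // no space left on device
}
```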
(Yes, it would have been nice to have that explanation at the start of the PR, and to avoid the “let’s merge because we are in a rush, what do you mean we don’t understand whether this is actually a fix” controversy.)
and it's that parallelization that makes things racy, not speed of decompression itself.
It’s not really a race at all, it’s that going across the external process loses the Go error information available for the input of the decompression. The external process is just going to see an EOF on a pipe (or maybe we could send a signal?), there is no way to stuff a “we encountered ENOSPC when providing you data” side channel into the pipe.
Anyways here...is it generally defined behavior in Go to read from a reader after it's returned EOF? It would seem unusual to cache errors right?
…
It’s not really a race at all, it’s that going across the external process loses the Go error information available for the input of the decompression.
On second thought, I think this should be implemented in a different way, one that is clearly correct, reporting all errors and not introducing any risk of extra hangs:

Wrap the input stream in a new errorRecordingReader, which stores the first error value reported, if any. Pass that wrapped stream as cmd.Stdin. Then we can consult the reported error if cmd.Run() fails. (cmd.Run() ensures the goroutine consuming cmd.Stdin terminates before returning, so we don’t need to worry about concurrency.)

(It’s a bit ironic that cmd.Run() already has such error-recording code in cmd.childStdin; but that error is only reported if cmd returns with exit status 0; otherwise the ExitError cause is preferred.)
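A minimal sketch of that suggestion, assuming an errorRecordingReader helper as described (names and package are illustrative; the eventual implementation in #2025 may differ):

```go
package filtersketch

import (
	"io"
	"os/exec"
)

// errorRecordingReader remembers the first non-EOF error returned by the
// wrapped Reader so it can be consulted after cmd.Run() returns.
type errorRecordingReader struct {
	r   io.Reader
	err error // first non-EOF error seen, if any
}

func (e *errorRecordingReader) Read(p []byte) (int, error) {
	n, err := e.r.Read(p)
	if err != nil && err != io.EOF && e.err == nil {
		e.err = err
	}
	return n, err
}

// runWithInput shows the intended use: prefer the recorded input-side error
// over the subprocess's own failure.
func runWithInput(cmd *exec.Cmd, input io.Reader) error {
	wrapped := &errorRecordingReader{r: input}
	cmd.Stdin = wrapped
	if err := cmd.Run(); err != nil {
		if wrapped.err != nil {
			return wrapped.err // e.g. ENOSPC seen while feeding the filter
		}
		return err
	}
	return nil
}
```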
I think this should be implemented in a different way
I need to focus on something else; I have filed at least a placeholder #2022 for visibility.
and a read from the Tee causes a write to the side branch, which fails with ENOSPC, and is reported to the consumer of the Tee.
Thank you! That makes sense to me now. I was unaware of the use of Tee here as I currently just have a superficial awareness of this code, but I'm here to learn. The "reads cause writes via side effect" is definitely not something I am used to seeing.
This is a tangent...and I am sure you guys know what's coming next 😄 but in Go, TeeReader is pretty popular whereas there's no such thing in the Rust stdlib, and while people have written it, it's really vanishingly rare comparatively. It's way more common IME to do this type of thing via a custom Write implementation that writes to two places; e.g. ocidir-rs (my code) and there's similar in coreos-installer I think.
I suspect one reason for this is the "ownership" of things and error handling is much clearer if there's a single "super writer" instead of a tee.
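For comparison, the "single super writer" shape being described looks roughly like this in Go (a generic sketch, not code from containers/image or ocidir-rs): all consumers are composed as writers and the data is pushed through once, so any write error, typically ENOSPC from the file, surfaces at the single io.Copy.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"io"
	"os"
	"strings"
)

func main() {
	body := strings.NewReader("pretend this is the HTTP body")
	digester := sha256.New()
	dst, err := os.CreateTemp("", "layer-")
	if err != nil {
		panic(err)
	}
	defer os.Remove(dst.Name())
	defer dst.Close()

	// One copy through a composed writer: digest and file see the same bytes,
	// and a failed file write stops the copy and is returned right here.
	n, err := io.Copy(io.MultiWriter(digester, dst), body)
	if err != nil {
		panic(err) // e.g. "no space left on device" would show up here
	}
	fmt.Printf("wrote %d bytes, digest %x\n", n, digester.Sum(nil))
}
```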
*shrug* STREAMS (…) are hard. Go has ended up with two separate Reader and Writer abstractions, which nicely compose individually, but connecting a codebase built around a Reader to a codebase built around a Writer requires an awkward API translation layer; that might be io.Copy or io.TeeReader (easy), or a separate goroutine and an io.Pipe (doable but annoying and comparatively risky).

Whether something interfaces using a Reader or a Writer is frequently, but not always, fairly arbitrary, but often out of control of the caller. Here, the decompression is built around Readers, and using a TeeReader to adapt in that direction is much easier than using pipes to adapt in the other direction.
[This is sort of isomorphic to the “XML streaming API design” dilemma, where, without concurrency, it’s easy to write an XML parser triggering hard-to-consume callbacks, vs. it’s harder to write an XML parser which returns an easy-to-consume stream of events; with XML it’s so much more visible because the API surface is much larger than just some byte arrays and errors.]
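To make the "doable but annoying" option concrete, a self-contained sketch of bridging a Writer-oriented producer to a Reader-oriented consumer with io.Pipe and a goroutine (produceInto and consumeFrom are hypothetical stand-ins):

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// produceInto stands in for a Writer-oriented producer.
func produceInto(w io.Writer) error {
	_, err := io.Copy(w, strings.NewReader("some data"))
	return err
}

// consumeFrom stands in for a Reader-oriented consumer.
func consumeFrom(r io.Reader) error {
	data, err := io.ReadAll(r)
	if err != nil {
		return err
	}
	fmt.Println(string(data))
	return nil
}

func main() {
	pr, pw := io.Pipe()
	go func() {
		// The producer runs in its own goroutine; CloseWithError propagates
		// its error (or nil, i.e. EOF) to the reader side of the pipe.
		pw.CloseWithError(produceInto(pw))
	}()
	if err := consumeFrom(pr); err != nil {
		fmt.Println("error:", err)
	}
}
```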
On second thought, I think this should be implemented in a different way, one that is clearly correct, reporting all errors and not introducing any risk of extra hangs: Wrap the input stream in a new errorRecordingReader, which stores the first error value reported, if any. Pass that wrapped stream as cmd.Stdin. Then we can consult the reported error if cmd.Run() fails.
…
thanks for the hint. Opened a PR: #2025
@@ -46,7 +46,14 @@ func tryProcFilter(args []string, input io.Reader, cleanup func()) (io.ReadClose
go func() {
This is a bit of a tangent but I am not sure why we're not using https://pkg.go.dev/os/exec#Cmd.StdinPipe above?
I've not used StdoutPipe because I want to report the error explicitly with CloseWithError. I'll need to check whether StdoutPipe gives the same behavior.
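For context, a sketch of the pattern in question (not the exact code in this file): wiring the filter's stdout through an explicit io.Pipe lets the returned stream be terminated with a specific error via CloseWithError, which the plain io.ReadCloser returned by cmd.StdoutPipe() does not offer. runFilterStream and the package name are hypothetical.

```go
package filtersketch

import (
	"io"
	"os/exec"
)

// runFilterStream returns the filter's output as a stream whose final
// error reflects how the filter (or its input) failed.
func runFilterStream(cmd *exec.Cmd, input io.Reader) io.ReadCloser {
	pr, pw := io.Pipe()
	cmd.Stdin = input
	cmd.Stdout = pw
	go func() {
		err := cmd.Run()
		pw.CloseWithError(err) // readers of pr see err, or io.EOF if err is nil
	}()
	return pr
}
```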
I would like to get this in, so we can move forward with the vendoring into Buildah and then finally into Podman.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: cgwalters, giuseppe
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Buildah is expecting to get an out of space error, which it gets from the non-pigz path. Hopefully this fixes things so the user ends up with an understandable error when the system is out of space.
/lgtm
Yes, I understood the problem enough on that end. I just don't quite understand how this change as written will actually fix it. If you (or someone else) want to explain, feel free...but we can also just see if it actually works when retrying the vendored update.
follow-up for containers#2012
report the error as seen by the input stream, instead of attempting another read.
Closes: containers#2022
Signed-off-by: Giuseppe Scrivano <[email protected]>