-
Notifications
You must be signed in to change notification settings - Fork 242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chunked: do not use a temporary file #2041
base: main
Are you sure you want to change the base?
chunked: do not use a temporary file #2041
Conversation
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: giuseppe The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
and use directly the stream to create the temporary zstd:chunked file. Signed-off-by: Giuseppe Scrivano <[email protected]>
ef8928f
to
b6ba804
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The goal is improved performance, right? Worth stating in the commit message if so.
blobFile.Close() | ||
blobFile = nil | ||
// Make sure the entire payload is consumed. | ||
_, _ = io.Copy(io.Discard, payload) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we still check for errors here? Maybe it doesn't matter because if e.g. we get a short read or something we'll presumably fail checksum validation anyways.
If that's the idea then I think it at least deserves a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've not measured performance, this is just a preparatory work for extending ApplyDiff()
to use this codepath when we convert images, so we can support pulling from other sources too, not only registries.
I've not yet found a nice way to extend the API though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes (if this goes in) I think it would be better to report a read error (the cause) instead of a digest mismatch that is hard to diagnose.
} | ||
}() | ||
|
||
// calculate the checksum before accessing the file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Presumably we were doing this for a reason (security?) before...is it just that we think convertTarToZstdChunked
can now safely operate on untrusted input?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm I don't remember the security implications, @mtrmac do you think this approach is fine or should I just close this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On balance, I’d prefer this not to be merged as is.
This exposes the conversion code and decompression to malicious input in more situations.
In principle, the conversion code and decompression must be able to handle malicious input anyway, because an attacker could, in many cases, refer to the malicious input in a valid manifest, without triggering this check.
But some users might have a policy which only accepts signed images, i.e. the malicious input would only be digested, not processed otherwise.
I’m really more worried about the decompression than the chunked conversion. That’s a lot of bit manipulation, potentially in an unsafe language “for performance”, with a fair history of vulnerabilities.
… and for ordinary image pulls, we currently also stream the input through decompression, only validating the digest concurrently; we don’t validate the digest before starting to decompress. So, in that sense, we are already accepting a larger part of the risk.
So I don’t feel that strongly about it here.
More importantly, if this is aiming at the c/storage ApplyDiff
= c/image PutBlob
(not PutBlobPartial
) path, c/image is currently creating a temporary file, and verifying the digest, on that path, without c/storage having any way to prevent it; and c/storage wouldn’t need to do this. (Also, c/image streams the data to the file, and digests it, concurrently for several layers, without holding any storage locks, which seems valuable.)
So, if the goal is code structure, not performance or PutBlobPartial
, I don’t see that this makes any difference for the ApplyDiff
path, and it is just a performance/risk trade-off for the PutBlobPartial
path.
If you don’t actually care about the performance at this point, we get the increased risk but not any benefit we value — so I’d prefer to close the PR and not merge this; we can always resurrect it later.
Except for the one line that allows convertTarToZstdChunked
to accept an arbitrary io.Reader
, and the missing payload.Close
.
// copy the entire tarball and compute its digest | ||
_, err = io.CopyBuffer(destination, r, c.copyBuffer) | ||
rc := ioutils.NewReadCloserWrapper(r, func() error { | ||
return payload.Close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We were not closing payload
before? That fix should not be forgotten.
} | ||
}() | ||
|
||
// calculate the checksum before accessing the file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On balance, I’d prefer this not to be merged as is.
This exposes the conversion code and decompression to malicious input in more situations.
In principle, the conversion code and decompression must be able to handle malicious input anyway, because an attacker could, in many cases, refer to the malicious input in a valid manifest, without triggering this check.
But some users might have a policy which only accepts signed images, i.e. the malicious input would only be digested, not processed otherwise.
I’m really more worried about the decompression than the chunked conversion. That’s a lot of bit manipulation, potentially in an unsafe language “for performance”, with a fair history of vulnerabilities.
… and for ordinary image pulls, we currently also stream the input through decompression, only validating the digest concurrently; we don’t validate the digest before starting to decompress. So, in that sense, we are already accepting a larger part of the risk.
So I don’t feel that strongly about it here.
More importantly, if this is aiming at the c/storage ApplyDiff
= c/image PutBlob
(not PutBlobPartial
) path, c/image is currently creating a temporary file, and verifying the digest, on that path, without c/storage having any way to prevent it; and c/storage wouldn’t need to do this. (Also, c/image streams the data to the file, and digests it, concurrently for several layers, without holding any storage locks, which seems valuable.)
So, if the goal is code structure, not performance or PutBlobPartial
, I don’t see that this makes any difference for the ApplyDiff
path, and it is just a performance/risk trade-off for the PutBlobPartial
path.
If you don’t actually care about the performance at this point, we get the increased risk but not any benefit we value — so I’d prefer to close the PR and not merge this; we can always resurrect it later.
Except for the one line that allows convertTarToZstdChunked
to accept an arbitrary io.Reader
, and the missing payload.Close
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See elsewhere for the higher-level evaluation.
Though now that we're forking a separate process we could isolate those quite aggressively. |
Which forking does that refer to? |
|
Right there's that and also ultimately we still need to be safe against untrusted/malicious zstd:chunked inputs even if the checksum matches. So I'm going to proactively reopen this PR since I think it makes sense. |
If this is merged on or before August 12, 2024, please cherry-pick this to the release-1.55 branch |
The payload stream must be closed after being used. Reported here: containers#2041 (comment) Signed-off-by: Giuseppe Scrivano <[email protected]>
and use directly the stream to create the temporary zstd:chunked file.