-
Notifications
You must be signed in to change notification settings - Fork 379
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Record (TOC digest → DiffID) mapping in BlobInfoCache #2321
Conversation
// UncompressedDigest returns an uncompressed digest corresponding to anyDigest. | ||
// Returns "" if the uncompressed digest is unknown. | ||
// FIXME: Does this need to record TOC/compression type? | ||
UncompressedDigestForTOC(tocDigest digest.Digest) digest.Digest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The TOC digest is the checksum of the uncompressed JSON document, so I think the compression should not matter in this case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree we probably don’t need that right now (with GetTOCDigest
refusing to work on manifests which contain multiple TOC digest annotations, and presumably with the zstd / estargz code being unable to decompress the other one).
This comment is a looking a bit more into the future, for lookups in the other direction, where we will want to look up (UncompressedDigest → (compressed digest, TOC digest, algorithm)) and match that against “the user wants the destination to contain zstd:chunked” (i.e. reject estargz matches).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for lookups in the other direction,
That will be done in a separate data structure (an extension of RecordDigestCompressorName
: We need the full set of annotations for reuse of a TOC-compressed blob, so this simple mapping is not sufficient anyway. And the other structure does record the algorithm.
be098a2
to
b14f00b
Compare
d238714
to
6dae67d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to self: This is code-complete but I want to test it in practice.
2a542f7
to
9e3cace
Compare
To test: Before: # podman rmi alpine level1 level9
# rm -f /var/lib/containers/cache/blob-info-cache-v1.sqlite
# podman pull quay.io/libpod/alpine
# podman --log-level=debug push --compression-format zstd:chunked --compression-level 1 --force-compression quay.io/libpod/alpine localhost:50000/level1
## Even better would be to use two different destination registries, to be 100% certain the blobs are not reused
## (right now they are not reused, but we’ll fix that):
# podman --log-level=debug push --compression-format zstd:chunked --compression-level 9 --force-compression quay.io/libpod/alpine localhost:50000/level9
## Note the compressed digest, and TOC digest, values:
# skopeo inspect --raw docker://localhost:50000/level1 | jq .
# skopeo inspect --raw docker://localhost:50000/level9 | jq .
## No DigestTOCUncompressedPairs entries:
# sqlite3 /var/lib/containers/cache/blob-info-cache-v1.sqlite .dump
# podman rmi alpine level1 level9
## Triggers a partial pull: "Applying differ in …":
# podman --log-level=debug pull localhost:50000/level1
## Triggers a partial pull: "Applying differ in …"
# podman --log-level=debug pull localhost:50000/level9
## level1 and level9 have different image IDs:
# podman images
## Contains two copies of the layer, with the same expected-layer-diffid
# jq . < /var/lib/containers/storage/overlay-layers/layers.json After:
|
72db3f5
to
714108a
Compare
Podman tests are now passing in containers/podman#23348 (with containers/podman#23379 for Podman-side updates). |
2dcf236
to
1a14d38
Compare
@mtrmac is it ready to review? |
@giuseppe yes, please review. containers/podman#23348 shows Podman tests passing. This should generally make the layer reuse on pulls comparable with non-chunked. See also the specific test case above. |
Signed-off-by: Miloslav Trmač <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a nit (probably not even worth a repush), otherwise LGTM
great work!
storage/storage_dest.go
Outdated
// Externally, a layer is identified either by (compressed) digest, or by TOC digest | ||
// (and we assume the TOC digest also uniquely identifies the contents, i.e. there aren’t two | ||
// different formats/ways to parse a single TOC); internally, we use uncompressed digest (“DiffID”) or a TOC digest. | ||
// We may or may not know the relantionships between these three values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo in "relantionships"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Fixed (and rebased).
We are already calling m.LayerInfos() anyway, so there is ~no extra cost. And using LayerInfos means we don't need to worry about reversing the order of layers, and we will have access to the layer index, allowing us to acccess the indexTo* fields in the future. Should not change behavior. Signed-off-by: Miloslav Trmač <[email protected]>
- Don't claim that we only use compressed digests. - Explicitly document that we assume TOC digests to be unambiguous - Actually define the term "DiffID". - Be more precise in computeID about the criteria being layer identity, not where we pull the layer from. Should not change behavior. Signed-off-by: Miloslav Trmač <[email protected]>
Some errors are severe enough that just logging and continuing is not really worthwhile. Signed-off-by: Miloslav Trmač <[email protected]>
…tyDataLocked Currrently we "only" have indexToTOCDigest and blobDiffIDs, but we will make this more complex. Centralizing the consumption of these fields into trustedLayerIdentityDataLocked ensure that all consumers interpret the data exactly consistently (and it also allows us to use a single "trusted" variable instead of 2/3 individual ones). Should not change behavior. Signed-off-by: Miloslav Trmač <[email protected]>
The new code is not called, so it should not change behavior (apart from extending the BoltDB/SQLite schema). Signed-off-by: Miloslav Trmač <[email protected]>
…storage by DiffID If we can, prefer identifying layers by DiffID, because multiple TOCs can map to the same DiffID; and because it maximizes reuse with non-TOC layers. For now, the new situation is unreachable. Signed-off-by: Miloslav Trmač <[email protected]>
We will add one more instance of this, so share the code. Should not change behavior (it does remove one unreachable code path). Signed-off-by: Miloslav Trmač <[email protected]>
… is known - Multiple TOC values might correspond to a single DiffID (e.g. if different compression levels are used); try to share them all, identified by DiffID (so that we also reuse with non-TOC pulls). - LayersByTOCDigest only uses a single TOC digest per layer; BlobInfoCache allows multiple matches, matches layers which have been since deleted, and potentially matches TOC digests which we have created by pushing but haven't pulled yet. - On reuse, we can now use DiffID-based layer identities even if the reuse was TOC~driven. Signed-off-by: Miloslav Trmač <[email protected]>
…yers Signed-off-by: Miloslav Trmač <[email protected]>
…ayers - Rely on it instead of triggering the "untrusted DiffID" logic - Also propagate it to storage Signed-off-by: Miloslav Trmač <[email protected]>
Signed-off-by: Miloslav Trmač <[email protected]>
LGTM |
A single DiffID may map to multiple TOC digest values. Record that in
BlobInfoCache
, and use it for layer reuse.Also prefer reusing even TOC-matched layers by DiffID, when available.
@giuseppe I’d appreciate a preliminary review of the new logic; see individual commits.
Draft: TheBlobInfoCache
implementations don’t actually store/record any data yet — so this is obviously completely untested.