Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix c/storage destination with partial pulls #2288

Merged
merged 29 commits into from
Feb 14, 2024

Commits on Feb 13, 2024

  1. dest: propagate layer index to PutBlobPartial

    Signed-off-by: Giuseppe Scrivano <[email protected]>
    giuseppe authored and mtrmac committed Feb 13, 2024
    Configuration menu
    Copy the full SHA
    11630be View commit details
    Browse the repository at this point in the history
  2. storage, dest: clarify when TOCDigest is used

    This update introduces an enhancement in the blob handling mechanism,
    specifically by separating the TOC digest from the
    uncompressed/compressed digest.
    
    Follow-up for: containers#1080.
    
    Signed-off-by: Giuseppe Scrivano <[email protected]>
    giuseppe authored and mtrmac committed Feb 13, 2024
    Configuration menu
    Copy the full SHA
    6da4441 View commit details
    Browse the repository at this point in the history
  3. Remove an unnecessary variable

    Should not change behavior.
    
    Signed-off-by: Miloslav Trmač <[email protected]>
    mtrmac committed Feb 13, 2024
    Configuration menu
    Copy the full SHA
    ca9c3e5 View commit details
    Browse the repository at this point in the history
  4. Don't use the same DriverWithDifferOutput more than once

    ApplyDiffFromStagingDirectory Rename()s diffOutput.Target
    to move it into the destination layer, so the diffOutput is
    not reusable.
    
    That might mean that we re-pull (partially?) the same partial
    layer again... but the same layer used more than once in an image
    should be rare, or at least very small.
    
    Signed-off-by: Miloslav Trmač <[email protected]>
    mtrmac committed Feb 13, 2024
    Configuration menu
    Copy the full SHA
    b6062e1 View commit details
    Browse the repository at this point in the history
  5. Introduce private.PutBlobPartialOptions

    This will allow us to name the more obscure parameters,
    and to change their names/semantics without having to update
    the 4 trivial implementations.
    
    Should not change behavior.
    
    Signed-off-by: Miloslav Trmač <[email protected]>
    mtrmac committed Feb 13, 2024
    Configuration menu
    Copy the full SHA
    5df5391 View commit details
    Browse the repository at this point in the history
  6. Make the index option in PutBlobPartialOptions mandatory

    ... because it is always available, and this allows us to remove
    a condition.
    
    Also rename it to LayerIndex, and make the Cache option first,
    for consistency with other private.*Options types.
    
    Should not change behavior.
    
    Signed-off-by: Miloslav Trmač <[email protected]>
    mtrmac committed Feb 13, 2024
    Configuration menu
    Copy the full SHA
    981360c View commit details
    Browse the repository at this point in the history
  7. Remove UploadedBlob.TOCDigest

    It has no users at all; and transports should not be in the
    business in specifically managing that value, compression/decompression/
    updates belong in transport-independent code.
    
    Should not change behavior.
    
    Signed-off-by: Miloslav Trmač <[email protected]>
    mtrmac committed Feb 13, 2024
    Configuration menu
    Copy the full SHA
    7fdc3b5 View commit details
    Browse the repository at this point in the history
  8. Replace ReusedBlob.TOCDigest with MatchedByTOCDigest

    The caller must already have provided options.TOCDigest, so
    we don't really need to return a value; the UI only needs
    a boolean.
    
    Also, document, again, that the non-TOC digest is a mandatory field.
    
    Should not change behavior.
    
    Signed-off-by: Miloslav Trmač <[email protected]>
    mtrmac committed Feb 13, 2024
    Configuration menu
    Copy the full SHA
    1b0e6be View commit details
    Browse the repository at this point in the history
  9. Rename indexToTocDigest to indexToTOCDigest

    ... to follow Go conventions a bit more closely.
    
    Also add a comment about the general trust design of
    layer storage/lookup/reuse mechanisms.
    
    Should not change behavior.
    
    Signed-off-by: Miloslav Trmač <[email protected]>
    mtrmac committed Feb 13, 2024
    Configuration menu
    Copy the full SHA
    d17c78f View commit details
    Browse the repository at this point in the history
  10. Don't use diffOutputs in getLayerID

    - It is unnecessary: We now only set diffOutputs
      in PutBlobPartial, and that also sets indexToTOCDigest
    - It is incorrect: it is indexing by tocDigest, but it is
      (currently) set by compressedDigest
    
    So, right now, it can only hurt things.
    
    Signed-off-by: Miloslav Trmač <[email protected]>
    mtrmac committed Feb 13, 2024
    Configuration menu
    Copy the full SHA
    83a035d View commit details
    Browse the repository at this point in the history
  11. Fix computation of image IDs

    - Take TOC digests into account even if we are converting from OCI
      to another format
    - Compute the image ID based on whether we _used_ the TOC, not whether
      it just exists.
    - Also don't forget to include the config digest in the ID computation...
    
    Signed-off-by: Miloslav Trmač <[email protected]>
    mtrmac committed Feb 13, 2024
    Configuration menu
    Copy the full SHA
    649f3a5 View commit details
    Browse the repository at this point in the history
  12. Remove lookups by TOC digest on the fallback code in commitImage

    The fallback code should never be invoked at all, in principle;
    and it is only reachable by internal callers; we can fix them instead
    of having the fallback code.
    
    So, remove it, making the tocDigest field unnecessary.
    
    Signed-off-by: Miloslav Trmač <[email protected]>
    mtrmac committed Feb 13, 2024
    Configuration menu
    Copy the full SHA
    e8b6ba0 View commit details
    Browse the repository at this point in the history
  13. Remove a no-longer-used addedLayerInfo.tocDigest field

    Now we have one less user of diffOutputs, making it easier to change it.
    
    Should not change behavior.
    
    Signed-off-by: Miloslav Trmač <[email protected]>
    mtrmac committed Feb 13, 2024
    Configuration menu
    Copy the full SHA
    597dbc6 View commit details
    Browse the repository at this point in the history
  14. Make diffOutputs indexed by layer ID

    Every DriverWithDifferOutput can only be used once;
    so we must not index it by digest.
    
    This also happens to fix indexing it by an untrusted compressed digest.
    
    Signed-off-by: Miloslav Trmač <[email protected]>
    mtrmac committed Feb 13, 2024
    Configuration menu
    Copy the full SHA
    fddf6f4 View commit details
    Browse the repository at this point in the history
  15. Clarify naming around layerID

    - Fix incorrect uncompressedDigest parameter name; it is actually the
      manifest-originated probably-compressed digest
    - Rename the function to not suggest it returns a storage.Layer.ID value
    - Use the same parameter order as commitLayer
    
    Signed-off-by: Miloslav Trmač <[email protected]>
    mtrmac committed Feb 13, 2024
    Configuration menu
    Copy the full SHA
    f55d2db View commit details
    Browse the repository at this point in the history
  16. Fix computation of layer IDs

    - If the first layer uses a TOC, don't use a non-hex string
      as the layer ID for c/storage
    - Fix a panic on "".Hex() if we trigger the fallback "layer never
      seen" path
    
    Signed-off-by: Miloslav Trmač <[email protected]>
    mtrmac committed Feb 13, 2024
    Configuration menu
    Copy the full SHA
    41119d7 View commit details
    Browse the repository at this point in the history
  17. Split createNewLayer from commitLayer

    ... to separate the concerns a bit.
    
    Now we have the updates of indexToStorageID closer together.
    
    Signed-off-by: Miloslav Trmač <[email protected]>
    mtrmac committed Feb 13, 2024
    Configuration menu
    Copy the full SHA
    992e548 View commit details
    Browse the repository at this point in the history
  18. Remove a redundant condition

    Signed-off-by: Miloslav Trmač <[email protected]>
    mtrmac committed Feb 13, 2024
    Configuration menu
    Copy the full SHA
    0097f83 View commit details
    Browse the repository at this point in the history
  19. Support pulling and pushing fully-consumed partial layers

    ... and identify them using UncompressedDigest, not TOCDigest
    
    On pushes, also use the trusted UncompressedDigest if available
    instead of preferring the untrusted value when a TOC digest
    is present.
    
    Signed-off-by: Miloslav Trmač <[email protected]>
    mtrmac committed Feb 13, 2024
    Configuration menu
    Copy the full SHA
    a3222f6 View commit details
    Browse the repository at this point in the history
  20. Don't set filenames/fileSizes in PutBlobPartial

    - Setting filenames to "" is clearly useless
    - It is unnecessary: We set diffOutputs, so filenames are
      not consumed in commitLayer
    - The data from diffOutputs can only be used once, so setting
      filenames to affect TryReusingBlob... doesn't help
    
    Signed-off-by: Miloslav Trmač <[email protected]>
    mtrmac committed Feb 13, 2024
    Configuration menu
    Copy the full SHA
    30646b5 View commit details
    Browse the repository at this point in the history
  21. Use a known uncompressed digest directly instead of reading it from a…

    … Layer
    
    If we get the layer using LayersByUncompressedDigest, that value should always
    match.
    
    Using the value we have directly is trivially faster, and more importantly
    we don't have to worry at all about Layer.UncompressedDigest being unset
    in that location, making maintenance easier.
    
    Should not change behavior.
    
    Signed-off-by: Miloslav Trmač <[email protected]>
    mtrmac committed Feb 13, 2024
    Configuration menu
    Copy the full SHA
    8a74ec9 View commit details
    Browse the repository at this point in the history
  22. Move layer queue data out of the per-layer fields

    Should not change behavior.
    
    Signed-off-by: Miloslav Trmač <[email protected]>
    mtrmac committed Feb 13, 2024
    Configuration menu
    Copy the full SHA
    1f3b20c View commit details
    Browse the repository at this point in the history
  23. Fix reuse of existing layers twice in the same image

    - When we extract a layer, allow reusing it only by the DiffID, not
      by the compressed digest; we don't have the compressed data, and
      reusing by compressed digest would result (via PutLayer LayerOptions.OriginalDigest)
      in a layer with an compressed CompressedDigest value, but an uncompressed
      CompressedSize value.
    
      Reuse by DiffID is quite a bit less likely to lead to a match
      in TryReusingBlob, probably causing us to find the reused layer and having to
      extract it again.
    
      We could improve on this by recording more data; for now, let's just assume
      that images which reuse the same compressed layer twice are pretty rare, and
      prefer simpler code.
    
    - On the positive side, record the item in fileSizes, so that we actually do
      find the layer in TryReusing, and not happen to reuse the file purely
      by accident.
    
    Signed-off-by: Miloslav Trmač <[email protected]>
    mtrmac committed Feb 13, 2024
    Configuration menu
    Copy the full SHA
    5567453 View commit details
    Browse the repository at this point in the history
  24. Hold a Layer object when extracting its contents during commit

    That will allow us to read more data out of it.
    
    Signed-off-by: Miloslav Trmač <[email protected]>
    mtrmac committed Feb 13, 2024
    Configuration menu
    Copy the full SHA
    fbd8474 View commit details
    Browse the repository at this point in the history
  25. Don't require a diffID if blobAdditionalLayer is set

    We don't need it for anything, so shorten the scope.
    
    Should not change behavior, diffID is actually set on that
    path anyway.
    
    Signed-off-by: Miloslav Trmač <[email protected]>
    mtrmac committed Feb 13, 2024
    Configuration menu
    Copy the full SHA
    2db6b8e View commit details
    Browse the repository at this point in the history
  26. Document/consolidate layer identification

    - Ensure layers have an ID on every path before commitLayer,
      and it is consistently set before making data available.
    - Remove possibly misleading "for completeness" comments, ensuring identification
      is a basic responsibility.
    
    Signed-off-by: Miloslav Trmač <[email protected]>
    mtrmac committed Feb 13, 2024
    Configuration menu
    Copy the full SHA
    4b9bf31 View commit details
    Browse the repository at this point in the history
  27. Document origins of layer data

    Should not change behavior.
    
    Signed-off-by: Miloslav Trmač <[email protected]>
    mtrmac committed Feb 13, 2024
    Configuration menu
    Copy the full SHA
    5b14e3d View commit details
    Browse the repository at this point in the history
  28. Fix a mismatch in CompressedDigest/CompressedSize on layer reuse

    When reusing contents from another layer, don't set
    - CompressedDigest = compressed digest (provided by us)
    - CompressedSize = uncompressed size (computed by PutLayer)
    
    because it is inconsistent.
    
    A c/storage API extension would be required to do that.
    
    Signed-off-by: Miloslav Trmač <[email protected]>
    mtrmac committed Feb 13, 2024
    Configuration menu
    Copy the full SHA
    846520d View commit details
    Browse the repository at this point in the history
  29. Actually make reusing layers found by TOC work

    Look up the layer by TOC; and don't abort when diffID is not set.
    
    We could, instead, look up the layers only in tryReusingBlobAsPending,
    and record the layer metadata at that time.  That would be simpler,
    but it would also widen a race with concurrent image pulls/deletions:
    the current code can find one layer (ChainID) to reuse, and when that layer
    is deleted, it can find some other layer (ChainID) to actually consume.
    
    The time between tryReusingBlobAsPending and createNewLayer can be fairly
    significant, so opening a ~deterministic race singificantly more might
    lead to reproducible issues.
    
    Even if anyone encountering such issues has fundamental workflow problems
    that should be fixed; it is our tools that would look bad first.
    
    Signed-off-by: Miloslav Trmač <[email protected]>
    mtrmac committed Feb 13, 2024
    Configuration menu
    Copy the full SHA
    e7a8eba View commit details
    Browse the repository at this point in the history