
docker, BlobInfoCache: try to reuse blobs from set of all known compressed blobs when pushing across registries #1645

Merged

Conversation

@flouthoc (Contributor) commented Sep 1, 2022

It seems we try to reuse blobs only for the specified registry; however,
we can have valid known compressed digests across registries as well.
This PR attempts to use those by taking the following steps:

  • CandidateLocations2 now processes all known blobs and appends them
    to the returned candidates at the lowest priority. As a result, when
    TryReusingBlob processes these candidates and the blobs filtered by
    the Opaque set by the transport fail to match, an attempt is made
    against all known blobs (ones which do not belong to the current
    registry).

  • Increase the sample set of potential blob reuse to all known
    compressed digests, including ones which do not belong to the
    current registry.

  • If a blob is found, match it against the registry we are attempting
    to push to. If the blob is already there, consider it a CACHE HIT!
    and reply skipping the blob, since it is already there.
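The flow described above can be sketched in Go (a simplified, hypothetical model; `blobCandidate`, `candidateLocations`, and `reuseBlob` are illustrative names, not the real containers/image API):

```go
package main

import "fmt"

// blobCandidate is a hypothetical, simplified stand-in for a cache candidate.
type blobCandidate struct {
	digest        string
	locationKnown bool // false for cross-registry, location-less entries
}

// candidateLocations returns same-registry candidates first, then appends all
// other known compressed digests at the lowest priority, as described above.
func candidateLocations(sameRegistry, allKnown []blobCandidate) []blobCandidate {
	out := append([]blobCandidate{}, sameRegistry...)
	return append(out, allKnown...) // tried only after the same-registry ones fail
}

// reuseBlob walks candidates in priority order; a location-less entry is
// checked directly against the destination registry.
func reuseBlob(cands []blobCandidate, existsAtDestination func(string) bool) (string, bool) {
	for _, c := range cands {
		if existsAtDestination(c.digest) {
			return c.digest, true // CACHE HIT: the blob upload can be skipped
		}
	}
	return "", false
}

func main() {
	same := []blobCandidate{{digest: "sha256:aaa", locationKnown: true}}
	all := []blobCandidate{{digest: "sha256:bbb", locationKnown: false}}
	cands := candidateLocations(same, all)
	// Only the cross-registry digest already exists at the destination.
	digest, hit := reuseBlob(cands, func(d string) bool { return d == "sha256:bbb" })
	fmt.Println(digest, hit) // sha256:bbb true
}
```

The key point is the ordering: same-registry candidates are always tried first, so the cross-registry fallback never slows down the common case.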


How to verify this?

```console
$ skopeo copy docker://registry.fedoraproject.org/fedora-minimal docker://quay.io/fl/test:some-tag
$ buildah pull registry.fedoraproject.org/fedora-minimal
$ buildah tag registry.fedoraproject.org/fedora-minimal quay.io/fl/test
$ buildah push quay.io/fl/test
Getting image source signatures
Copying blob a3497ca15bbf skipped: already exists
Copying config f7e02de757 done
Writing manifest to image destination
Storing signatures
```

Alternative to: #1461

@flouthoc (Contributor, Author) commented Sep 1, 2022

@mtrmac The following patch works for me and, unlike #1461, works without breaking/cleaning the existing written blob-info-cache-v1.boltdb.

However, since CandidateLocations2 now returns all blobs with lower priority, I think some tests need to be retrofitted, but I'll do that once you give the patch a quick look and verify it's on the right track.

@mtrmac (Collaborator) left a comment

  • I think we really need to have an explicit concept of “BICLocationReference not set” (and not create those entries for the v1 callers). Then we can avoid the O(BICLocationReference) loop and adding all the O(BICLocationReference) individual entries.
    • A downside of doing just that, which I didn’t originally realize, is that we would not have any timestamp for the location-less entries.
      • We might just live with that, there are not likely to be that many digests of the same blob floating around. I think that’s good enough for now, with an explicit comment.
      • Alternatively, it would help to replace the existing knownLocationsBucket organized transport→scope→digest→(location, timestamp), with a new version organized transport→digest→scope→(location, timestamp); that way we would only need to iterate through the scopes (=registries) that contain the digest, not all scopes, scaling hopefully much better.
  • It seems to me this could basically be a small addition to the appendReplacementCandidates function, adding one more BICLocationReference-unknown entry, and then teaching …Prioritize… to handle that.
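The bucket reorganization suggested above can be illustrated with a toy map-based layout (not the real BoltDB schema; the names below are made up): keying by transport→digest→scope means a digest lookup only iterates the scopes (registries) that actually contain that digest.

```go
package main

import "fmt"

// digestFirst models a transport→digest→scope→location-reference layout,
// replacing the existing transport→scope→digest organization.
var digestFirst = map[string]map[string]map[string]string{
	"docker": {
		"sha256:aaa": {
			"quay.io":                    "fl/test",
			"registry.fedoraproject.org": "fedora-minimal",
		},
		"sha256:bbb": {"quay.io": "fl/other"},
	},
}

// locationsForDigest touches only the scopes containing the digest, instead
// of scanning every scope known to the cache.
func locationsForDigest(transport, digest string) map[string]string {
	return digestFirst[transport][digest]
}

func main() {
	fmt.Println(len(locationsForDigest("docker", "sha256:aaa"))) // 2
	fmt.Println(len(locationsForDigest("docker", "sha256:bbb"))) // 1
}
```

Under this keying, the cost of a lookup scales with the number of registries that hold the blob rather than with every scope ever recorded.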

@mtrmac (Collaborator) commented Sep 1, 2022

 * We might just live with that, there are not likely to be _that_ many digests of the same blob floating around.

Right now my cache doesn’t have a single digestByUncompressed record with more than one entry. That’s probably not representative, but really, would there likely be more than 3 different gzip implementations? This might become more relevant with widespread, but not overwhelming, use of Zstd.

@flouthoc flouthoc force-pushed the CandidateLocations2-process-all-blobs branch 2 times, most recently from 67235a1 to 0b5a3be Compare September 2, 2022 09:46
@flouthoc (Contributor, Author) commented Sep 2, 2022

 * We might just live with that, there are not likely to be _that_ many digests of the same blob floating around.

Right now my cache doesn’t have a single digestByUncompressed record with more than one entry. That’s probably not representative, but really, would there likely be more than 3 different gzip implementations? This might become more relevant with widespread, but not overwhelming, use of Zstd.

@mtrmac I did not completely understand this comment. Is it more of a note for the future, or a change you are expecting in this PR? Since it looks like a different scope, I'm assuming it is a note for the future.

@mtrmac (Collaborator) commented Sep 2, 2022

 * We might just live with that, there are not likely to be _that_ many digests of the same blob floating around.

Right now my cache doesn’t have a single digestByUncompressed record with more than one entry. …

@mtrmac I did not completely understand this comment.

This is my argument for why we might not need to collect timestamps to sort the location-less candidates accurately, at least in this version.

@flouthoc flouthoc force-pushed the CandidateLocations2-process-all-blobs branch 2 times, most recently from 0580587 to 600a8b9 Compare September 5, 2022 10:05
@flouthoc (Contributor, Author) commented Sep 5, 2022

All changes addressed except modifying the heuristic check, which I'll push soon.

@flouthoc flouthoc force-pushed the CandidateLocations2-process-all-blobs branch from 600a8b9 to 0fcbf47 Compare September 7, 2022 09:30
@flouthoc flouthoc force-pushed the CandidateLocations2-process-all-blobs branch 4 times, most recently from 822867a to 26a55a9 Compare September 8, 2022 07:15
@mtrmac (Collaborator) left a comment

The BoltDB comments mostly apply to memory equally.

@flouthoc flouthoc force-pushed the CandidateLocations2-process-all-blobs branch 2 times, most recently from 78bdd13 to e2a1ceb Compare September 9, 2022 07:36
@flouthoc flouthoc force-pushed the CandidateLocations2-process-all-blobs branch 2 times, most recently from 6c2de11 to 7c41d67 Compare September 12, 2022 07:35
@flouthoc flouthoc force-pushed the CandidateLocations2-process-all-blobs branch from f2fb916 to ca111f7 Compare September 21, 2023 10:04
@mtrmac (Collaborator) commented Sep 21, 2023

Yeah, I think it would be best not to worry about the new without-from feature in this PR. This is quite enough work for one PR already, and we can add that feature later as implementations appear; let’s just keep accurate comments in TryReusingBlob… to help future maintainers.

@flouthoc flouthoc force-pushed the CandidateLocations2-process-all-blobs branch from ca111f7 to 92cdb4f Compare September 22, 2023 09:28
@flouthoc (Contributor, Author) commented Sep 22, 2023

@mtrmac Done; I addressed your comments as well as some older ones. I'll add new tests once the overall PR looks good to you (there are already some tests in prioritize_test.go; integration tests must be added on the Podman side).

@flouthoc flouthoc force-pushed the CandidateLocations2-process-all-blobs branch from 92cdb4f to d99cfa9 Compare September 22, 2023 11:05
@mtrmac (Collaborator) left a comment

This looks reasonable overall.

I think the one major missing part is expanding the tests in pkg/blobinfocache/internal/test, so that the new code in the storage backends is tested. The logic is not quite trivial, so I’d prefer to have good test coverage.

@flouthoc flouthoc force-pushed the CandidateLocations2-process-all-blobs branch 4 times, most recently from 4d3fe76 to a7cd83c Compare October 18, 2023 10:13
@flouthoc flouthoc requested a review from mtrmac October 18, 2023 10:14
@flouthoc (Contributor, Author) commented:

@mtrmac PTAL

@flouthoc (Contributor, Author) commented:

@mtrmac PTAL

@mtrmac Friendly ping :)

@mtrmac (Collaborator) left a comment

Thanks!

Hopefully the last round…

…oss registries

It seems we try to reuse blobs only for the specified registry; however,
we can have valid known compressed digests across registries as well.
This PR attempts to use those by taking the following steps:

* `CandidateLocations2` now processes all known blobs and appends them
  to the returned candidates at the lowest priority. As a result, when
  `TryReusingBlob` processes these candidates and the blobs filtered by
  the `Opaque` set by the `transport` fail to match, an attempt is made
  against all known blobs (ones which do not belong to the current
  registry).

* Increase the sample set of potential blob reuse to all known
  compressed digests, including ones which do not belong to the
  current registry.

* If a blob is found, match it against the registry we are attempting
  to push to. If the blob is already there, consider it a `CACHE HIT!`
  and reply skipping the blob, since it is already there.

How to verify this?

* Remove all images with `buildah rmi --all` (needed so all new blobs can
  be tagged again in a common bucket)
* Remove any previous `blob-info-cache`:

```console
rm /home/<user>/.local/share/containers/cache/blob-info-cache-v1.boltdb
```

```console
$ skopeo copy docker://registry.fedoraproject.org/fedora-minimal docker://quay.io/fl/test:some-tag
$ buildah pull registry.fedoraproject.org/fedora-minimal
$ buildah tag registry.fedoraproject.org/fedora-minimal quay.io/fl/test
$ buildah push quay.io/fl/test
```

```console
Getting image source signatures
Copying blob a3497ca15bbf skipped: already exists
Copying config f7e02de757 done
Writing manifest to image destination
Storing signatures
```

Signed-off-by: Aditya R <[email protected]>
@flouthoc flouthoc force-pushed the CandidateLocations2-process-all-blobs branch from a7cd83c to 539f9ee Compare November 1, 2023 05:49
@flouthoc (Contributor, Author) commented Nov 1, 2023

@mtrmac PTAL again.

@mtrmac (Collaborator) left a comment

Thanks!

@mtrmac mtrmac merged commit 6c186a9 into containers:main Nov 1, 2023
9 checks passed
flouthoc added a commit to flouthoc/image that referenced this pull request Nov 1, 2023
The following check is not needed and can be removed. Follow-up fix from
containers#1645 (comment)

Signed-off-by: Aditya R <[email protected]>
@TomSweeneyRedHat (Member) commented:

Nice work on this @flouthoc and @mtrmac

flouthoc added a commit to flouthoc/buildah that referenced this pull request Nov 10, 2023
It seems we try to reuse blobs only for the specified registry; however,
we can have valid known compressed digests across registries as well.
This PR attempts to use those by taking the following steps:

* `CandidateLocations2` now processes all known blobs and appends them
  to the returned candidates at the lowest priority. As a result, when
  `TryReusingBlob` processes these candidates and the blobs filtered by
  the `Opaque` set by the `transport` fail to match, an attempt is made
  against all known blobs (ones which do not belong to the current
  registry).

* Increase the sample set of potential blob reuse to all known
  compressed digests, including ones which do not belong to the
  current registry.

* If a blob is found, match it against the registry we are attempting
  to push to. If the blob is already there, consider it a `CACHE HIT!`
  and reply skipping the blob, since it is already there.

----

```console
$ skopeo copy docker://registry.fedoraproject.org/fedora-minimal docker://quay.io/fl/test:some-tag
$ buildah pull registry.fedoraproject.org/fedora-minimal
$ buildah tag registry.fedoraproject.org/fedora-minimal quay.io/fl/test
$ buildah push quay.io/fl/test
```

```console
Getting image source signatures
Copying blob a3497ca15bbf skipped: already exists
Copying config f7e02de757 done
Writing manifest to image destination
Storing signatures
```

Testing: containers/image#1645

Signed-off-by: Aditya R <[email protected]>
flouthoc added a commit to flouthoc/buildah that referenced this pull request Nov 10, 2023
flouthoc added a commit to flouthoc/buildah that referenced this pull request Nov 11, 2023
flouthoc added a commit to flouthoc/buildah that referenced this pull request Nov 14, 2023
flouthoc added a commit to flouthoc/buildah that referenced this pull request Nov 17, 2023
flouthoc added a commit to flouthoc/buildah that referenced this pull request Nov 17, 2023
Labels
kind/feature A request for, or a PR adding, new functionality