Provide layers referenced by an image stream as layers subresource #19969

Merged: 4 commits merged into openshift:master on Jul 13, 2018

Conversation

@smarterclayton (Contributor) commented Jun 12, 2018

Adds a new GET endpoint to an image stream as a `layers` subresource that
returns an array of every layer referenced by the image stream and the tags
and images that include them.

The subresource is fed by a store-driven informer that caches and indexes
only the layers. Clients get a 500 retry error if the cache has not
initialized yet (the client will silently retry).

Turns the registry access check for a given layer into an O(1) check
instead of O(N), where N is the number of images in the image stream.

This is not a general purpose layer cache, but lays the foundation for it.
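
As a rough sketch of the idea (hypothetical names built on client-go's cache.Indexer, not the PR's actual code): index each cached image record under every layer digest it references, so the access check for a blob becomes a single index lookup instead of a scan over all images.

```go
// Hypothetical sketch of an informer-backed layer index; the type and
// index names are illustrative only.
package main

import (
	"fmt"

	"k8s.io/client-go/tools/cache"
)

// imageLayers is a pared-down record cached per image: only the data
// needed to answer layer-access questions, not the whole Image object.
type imageLayers struct {
	name   string   // image digest, e.g. "sha256:..."
	layers []string // layer blob digests, base to top
}

const layerIndexName = "layers"

func newLayerIndex() cache.Indexer {
	return cache.NewIndexer(
		func(obj interface{}) (string, error) { return obj.(*imageLayers).name, nil },
		cache.Indexers{
			// Index every image under each of its layer digests so a
			// blob-access check is one ByIndex call: O(1) in the number
			// of images instead of scanning them all.
			layerIndexName: func(obj interface{}) ([]string, error) {
				return obj.(*imageLayers).layers, nil
			},
		},
	)
}

func main() {
	idx := newLayerIndex()
	idx.Add(&imageLayers{name: "sha256:img1", layers: []string{"sha256:layerA", "sha256:layerB"}})

	// Does any cached image contain layerA? An empty result means no.
	hits, _ := idx.ByIndex(layerIndexName, "sha256:layerA")
	fmt.Println(len(hits) > 0) // true
}
```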

Depends on openshift/api#56

Detected by https://bugzilla.redhat.com/show_bug.cgi?id=1589994

@bparees

@openshift-ci-robot openshift-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jun 12, 2018
var _ metav1.Object = &ImageLayers{}

func (l *ImageLayers) GetObjectKind() schema.ObjectKind { return &metav1.TypeMeta{} }
func (l *ImageLayers) DeepCopyObject() runtime.Object {
Contributor: why not generate this?

Author: Probably could, although I'm hesitant to involve the stack there.


var _ metav1.Object = &ImageLayers{}

func (l *ImageLayers) GetObjectKind() schema.ObjectKind { return &metav1.TypeMeta{} }
Contributor: looks like we are forcing ImageLayers to be a runtime.Object. At the very least this needs a very good reason documented in a comment above.

Contributor: @smarterclayton why not make this an actual API object?

Author: This is a minimal cache. The cache doesn't actually need runtime.Object, except that there are bugs in the cache; I'm going to correct those, which is why there is a comment. Informers are supposed to depend on list-watch (which requires a resource version), but the cache key is arbitrary.
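
For context, the shape under discussion is roughly the following: a cache-only type that satisfies metav1.Object and runtime.Object with stubs, because the informer machinery demands the interfaces even though the object is never served. A sketch only: the two assertions and the two method signatures come from the quoted diff, while the field layout and stub bodies are illustrative.

```go
package imagestream

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/runtime/schema"
)

// ImageLayers is an internal cache record, never served over the API;
// the field layout here is hypothetical.
type ImageLayers struct {
	metav1.ObjectMeta
	Layers []string
}

// Compile-time assertions matching the quoted diff.
var _ metav1.Object = &ImageLayers{}
var _ runtime.Object = &ImageLayers{}

// The cache machinery insists on runtime.Object even though this type
// is never serialized, so both methods are minimal.
func (l *ImageLayers) GetObjectKind() schema.ObjectKind { return &metav1.TypeMeta{} }
func (l *ImageLayers) DeepCopyObject() runtime.Object {
	copied := *l
	copied.ObjectMeta = *l.ObjectMeta.DeepCopy()
	copied.Layers = append([]string(nil), l.Layers...)
	return &copied
}
```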

@mfojtik (Contributor) commented Jun 12, 2018:

@smarterclayton for "UPSTREAM: openshift/api: Add an image layers API", aren't you supposed to bump the external repo (openshift/api) first and then just run ./hack/update-deps? /cc @deads2k

@smarterclayton (Author) commented Jun 12, 2018 via email

@smarterclayton smarterclayton changed the title Provide layers referenced by an image stream as layers subresource WIP - 3.11 - Provide layers referenced by an image stream as layers subresource Jun 12, 2018
@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 12, 2018
}
return &imageapi.ImageLayer{
Name: image.DockerImageMetadata.ID,
MediaType: image.DockerImageManifestMediaType,
Contributor: The function name is misleading, and the return value too.

For v2 manifests, DockerImageMetadata.ID contains a digest of the image configuration. It differs from the manifest.

The MediaType for this blob is a tough question, but it definitely shouldn't be DockerImageManifestMediaType.

Contributor: As LayerSize is not filled in here, it shouldn't be used anywhere.

Author: I'll probably say we just mark layer size as optional in the API. It's not fundamental for this call (although I would prefer it be correct). Is there a reason we didn't record it on the image object? Or did we just lose that info when we moved back to storing this in the registry? We have it when we post a new image to the registry, correct?

You're right that this should be the configuration mimetype.

Author: Probably can make LayerSize a pointer and have it be nil if unknown.

Author: Updated to make layer size a pointer and nil if unknown.
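
A minimal sketch of that pattern, where nil means "size unknown" rather than zero (a standalone illustration with made-up names; the real field lands in the API types discussed later in this review):

```go
package main

import "fmt"

// layerInfo mirrors the discussed shape: a nil size means the value
// was never recorded rather than being zero bytes.
type layerInfo struct {
	size      *int64
	mediaType string
}

func describe(l layerInfo) string {
	if l.size == nil {
		return fmt.Sprintf("%s (size unknown)", l.mediaType)
	}
	return fmt.Sprintf("%s (%d bytes)", l.mediaType, *l.size)
}

func main() {
	n := int64(2048)
	fmt.Println(describe(layerInfo{size: &n, mediaType: "application/vnd.docker.image.rootfs.diff.tar.gzip"}))
	fmt.Println(describe(layerInfo{mediaType: "application/vnd.docker.container.image.v1+json"}))
}
```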

for i, image := range list.Items {
out.Items[i] = &ImageLayers{
Name: image.Name,
Layers: image.DockerImageLayers,
Contributor: These layers have a different order in different images. The image API reorders them and adds imageapi.DockerImageLayersOrderAnnotation, but this code drops that information. After that, these layers are copied as is, but without an annotation indicating the order.

Also, there is no imageapi.ManagedByOpenShiftAnnotation for the image, so the registry has no way to understand this.

Author: I'll reorder them before returning.

With regard to pullthrough off, I think we should probably drop support for disabling it. We will depend on it for secured images (when registry access for Red Hat is made protected), and local resolution depends on it. Is there a good use case for continuing to support it?

Contributor:

> Is there a good use case for continuing to support it?

You tell me :) We have always had the option to turn pullthrough on and off. Because of this, the registry's code is more complex than it could be, because we always need to remember the type of the image and sometimes pretend that we do not know about it.

If you think that we can drop it, then let's do it. Please open an issue to remove it.

But right now, without imageapi.ManagedByOpenShiftAnnotation, pullthrough will be broken.

Contributor: Unfortunately, we cannot completely remove imageapi.ManagedByOpenShiftAnnotation.

Author: Let me chat with Ben - if we think we're dependent on pullthrough now for protected image access, then I think it's time to remove the support for pullthrough: false. Will respond back today.

Do we need the annotation for anything else specific to the use of this call (in the place where we check whether an image stream has "access" to a layer)?

A long time ago we talked about changing the meaning of the annotation (or replacing it with a new annotation) to:

> The person who imported this image metadata has proven they have access to each of the layers in the image they imported.

A background controller (the mirror controller) would not set the annotation until it had either pulled all layers from the source (in the background) or otherwise verified that the source had those layers (checked that the source can serve the contents). Before the annotation was set, only pullthrough would work, and users would be blocked from tagging the image into another image stream. Once the annotation was set, the image was effectively managed (we've imported the image and the metadata). The downside of this is that it forces mirroring, which can be expensive (today, mirroring is mostly controlled by whether you set ReferencePolicy, in practice).

Contributor:

> Let me chat with Ben - if we think we're dependent on pullthrough now for protected image access, then I think it's time to remove the support for pullthrough: false.

OCP will be, Origin won't be. Perhaps that's sufficient, though.

(Technically, someone could also set up an OCP cluster that gets protected images from elsewhere, like an internal mirror, so it is possible to run an OCP cluster without pullthrough.)

But I'm certainly a fan of simplifying.

Author: I think we closed on this, right? Dropping pullthrough: false support?

Contributor: Yeah, it's being removed here: openshift/image-registry#103

I think it's currently stuck on a test that needs the docker socket, for which I have a PR open to mark it as local.

@smarterclayton (Author): /test launch-gcp

(2 similar comments from @smarterclayton followed.)

@smarterclayton (Author): e2e test should be passing; initial comments addressed. Will be updating the registry PR soon.

@smarterclayton smarterclayton changed the title WIP - 3.11 - Provide layers referenced by an image stream as layers subresource Provide layers referenced by an image stream as layers subresource Jun 25, 2018
// may have zero layers.
// +optional
Layers []string `json:"layers" protobuf:"bytes,1,rep,name=layers"`
// manifest, if set, is the blob that contains the image manifest. Some images do
@liggitt (Contributor) commented Jul 9, 2018: this is the SHA of the manifest ("name of the blob containing the image manifest"), not the blob itself, right?

Author: The manifest is a blob, and the digest / name for both is the same (the SHA of the contents).
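
In other words, the manifest is stored as a blob and is addressed by the sha256 of its own bytes, so the manifest digest and the blob name are the same string. A quick self-contained illustration (the manifest bytes here are made up):

```go
// A manifest is itself a blob: its name in the blob store is the
// sha256 digest of its raw bytes.
package main

import (
	"crypto/sha256"
	"fmt"
)

func main() {
	manifest := []byte(`{"schemaVersion":2,"mediaType":"application/vnd.docker.distribution.manifest.v2+json"}`)
	digest := fmt.Sprintf("sha256:%x", sha256.Sum256(manifest))
	// The same value serves as the manifest reference and the blob name.
	fmt.Println(digest)
}
```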

type ImageBlobReferences struct {
// layers is the list of blobs that compose this image, from base layer to top layer.
// All layers referenced by this array will be defined in the blobs map. Some images
// may have zero layers.
Contributor: is it valid to have zero layers and no manifest?

Contributor: talked offline: yes, this could be a scratch image with an inline manifest.

// optional if the necessary information about size is not available.
LayerSize *int64 `json:"size" protobuf:"varint,1,opt,name=size"`
// MediaType of the referenced object.
MediaType string `json:"mediaType" protobuf:"bytes,2,opt,name=mediaType"`
Contributor: is this always known?

Author: Yes, we always know what this value is in the Image object as part of that API (since 3.5 or 3.6).
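
Piecing together the fragments quoted in this review, the new types look roughly like this. This is a reconstruction for orientation, not the verbatim openshift/api source; in particular, the Blobs/Images map layout on ImageStreamLayers is an assumption about how the quoted pieces fit together.

```go
package v1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// ImageStreamLayers describes the layers and blobs referenced by an
// image stream, served via the layers subresource.
type ImageStreamLayers struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`
	// blobs maps a blob name (digest) to its metadata.
	Blobs map[string]ImageLayerData `json:"blobs"`
	// images maps an image name to the blobs it references.
	Images map[string]ImageBlobReferences `json:"images"`
}

// ImageBlobReferences describes the blobs associated with an image.
type ImageBlobReferences struct {
	// layers is the list of blobs that compose this image, from base
	// layer to top layer. All layers referenced by this array will be
	// defined in the blobs map. Some images may have zero layers.
	// +optional
	Layers []string `json:"layers" protobuf:"bytes,1,rep,name=layers"`
	// manifest, if set, is the blob that contains the image manifest.
	// Some images do not have a separate manifest blob (for example,
	// a scratch image with an inline manifest, per the thread above).
	// +optional
	Manifest *string `json:"manifest" protobuf:"bytes,2,opt,name=manifest"`
}

// ImageLayerData contains metadata about an image layer.
type ImageLayerData struct {
	// size of the layer in bytes; nil when the size is not available,
	// per the earlier thread on making LayerSize optional.
	LayerSize *int64 `json:"size" protobuf:"varint,1,opt,name=size"`
	// mediaType of the referenced object.
	MediaType string `json:"mediaType" protobuf:"bytes,2,opt,name=mediaType"`
}
```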

@liggitt (Contributor) commented Jul 9, 2018:

> Any other API comments? Updating that PR and will put the label on it otherwise

a couple of questions, no other concerns from me

@bparees (Contributor) left a comment: seems like there are some unaddressed comments in this PR also.

Layers(name string) (*v1.ImageStreamLayers, error)
}

// Bind applies the provided binding to the named pod in the current namespace (binding.Namespace is ignored).
Contributor: godoc is wrong

Author: The upstreams are only for testing; they'll be pulled in via a bump.

Contributor: I didn't look closely to see if you pulled this in from upstream. As long as it's correct upstream, then, I guess.

var oc *exutil.CLI
var ns []string

g.AfterEach(func() {
Contributor: wait for the builder service account in a BeforeEach

Author: don't builds wait for the builder service account?

Author: I don't try to access anything before the builder service account is available.

Contributor:

> don't builds wait for the builder service account?

Well, I should have said "wait for the builder service account's secrets" (the waitforserviceaccount helper does both).

If the SA exists but the secrets don't, the build will fail (because it can't pull the base image or push the app image). We do have some logic in the build controller now that waits for build secrets in certain cases:

https://github.com/openshift/origin/blob/master/pkg/build/controller/build/build_controller.go#L525-L543

Given that, I'm actually not sure why we saw some of the pull issues we saw earlier, unless those were only happening on deployments, not builds.
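
For reference, the suggested pattern looks roughly like this in origin's extended test style. A sketch only: WaitForBuilderAccount is the helper the reviewer refers to as waiting for both the SA and its secrets, but treat the exact signatures and the test name as illustrative.

```go
package extended

import (
	g "github.com/onsi/ginkgo"
	o "github.com/onsi/gomega"

	exutil "github.com/openshift/origin/test/extended/util"
)

var _ = g.Describe("[Feature:ImageLayers] layers subresource", func() {
	defer g.GinkgoRecover()

	oc := exutil.NewCLI("imagestream-layers", exutil.KubeConfigPath())

	g.BeforeEach(func() {
		// Wait for the builder service account AND its secrets; if the
		// secrets are missing, a build fails because it cannot pull the
		// base image or push the app image.
		err := exutil.WaitForBuilderAccount(oc.KubeClient().CoreV1().ServiceAccounts(oc.Namespace()))
		o.Expect(err).NotTo(o.HaveOccurred())
	})
})
```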

@smarterclayton (Author): Updated on top of an API bump (I think I finally fixed glide to not be horribad).

@deads2k (Contributor) commented Jul 11, 2018:

> Updated on top of an API bump (i think i finally fixed glide to not be horribad)

Any other changes? I made it through my review before and it LGTM.

@smarterclayton (Author): Nothing new. Ben's comments were resolved.

@smarterclayton (Author): /retest

@deads2k (Contributor) commented Jul 11, 2018: /lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 11, 2018
@openshift-ci-robot:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, smarterclayton

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [deads2k,smarterclayton]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@smarterclayton (Author): /retest

(4 similar /retest comments followed, alternating between @danwinship and @smarterclayton.)

@openshift-merge-robot openshift-merge-robot merged commit 4ad16ad into openshift:master Jul 13, 2018
@openshift-ci-robot:

@smarterclayton: The following tests failed, say /retest to rerun them all:

Test name | Commit | Rerun command
ci/openshift-jenkins/gcp | e9536dd | /test gcp
ci/openshift-jenkins/extended_image_ecosystem | aac71d5 | /test extended_image_ecosystem

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
