-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
image-pruning: dereference ImageStreamTags #16821
image-pruning: dereference ImageStreamTags #16821
Conversation
pkg/image/prune/prune.go
Outdated
return err | ||
} | ||
if err := p.addBuildsToGraph(options.Builds); err != nil { | ||
// TODO: either make the other methods error out on image reference parsing or ignore this error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@legionus what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'd prefer we report a warning if we can't parse an image reference, but don't prevent pruning. I think we had this conversation before, but my view is that if you don't have valid reference, we shouldn't care what you were trying to reference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'd prefer we report a warning if we can't parse an image reference, but don't prevent pruning. I think we had this conversation before, but my view is that if you don't have valid reference, we shouldn't care what you were trying to reference.
I agree. Changed to warning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree.
There can be two cases:
-
Docker change their policy on references (for example, they may allow the
+
symbol in tags). And we can change our policy on tags as well. In this case the old version ofoc
will fail to parse this reference, and it will happily remove images with such references ("graceful fail", yep). -
Someone able to create a resource with an invalid image reference. This state is bad enough to start investigations. We shouldn't ignore bugs in our code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if it's not a reference we can parse, then there can be no pods using it, by definition, no?
Whether that's a bug in our code or not doesn't really matter, if we can't parse it, it's not a reference.
And while i agree that the ability to create invalid references is bad(if it were to happen), the fact is the code paths are different (validation vs resolution) and pruning is not the place to reconcile/debug that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pruning is a client-side operation and it may have an outdated parser.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whether that's a bug in our code or not doesn't really matter, if we can't parse it, it's not a reference.
@bparees But if you can't parse reference then you don't add a dependency and you can remove what is used right now.
Is not that a problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@miminar It's NAK for me.
pkg/image/prune/prune.go
Outdated
@@ -455,18 +475,36 @@ func addPodSpecToGraph(g graph.Graph, spec *kapi.PodSpec, predecessor gonum.Node | |||
} | |||
|
|||
if len(ref.ID) == 0 { | |||
ref = ref.DockerClientDefaults() | |||
if p.registryURL.Host != ref.RegistryURL().Host { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very fragile IMHO. There are many possible registry hosts that can be used to fetch images from the integrated registry. Currently, this considers just the one determined by the algorithm or passed with --registry-url
, which may be totally wrong.
Shall we collect all the possible registry hosts from all the images when we build the graph and match the url against the collected set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how would you build the set of possible registry hosts that represent the internal registry?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Collecting all hosts from all the images is not enough. In general we can't know which DockerImageReference uses the internal registry. One can pull images through the router.
Even now I can use three names for the integrated registry: 172.30.1.1:5000, docker-registry.default.svc and docker-registry.default.svc.cluster.local. But only one can be specified in p.registryURL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, that's why i'm asking how you could build the set of all possible hosts that represent the registry? it doesn't seem possible (or at least reasonably possible) since anyone can create a service or route that points to it and then use that.
So i'm trying to understand what you're proposing doing when you say:
Shall we collect all the possible registry hosts from all the images when we build the graph and match the url against the collected set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of the images pushed to the integrated registry will have $OPENSHIFT_DEFAULT_REGISTRY
host name in their dockerImageReference
attribute. All of them will be managed (and having managed annotation). There may be even some older managed images having some obsolete registry address. These addresses are most probably being used inside a cluster in deployments, pods, daemonsets, etc.
For the external ones, it's much harder. All we have (sometimes) is the optional --registry-url
flag.
What I had in mind is iterating over all managed images and get a set of internal addresses, add the registry-url
and docker-registry.default.svc.cluster.local
and compare image references against this set.
pkg/image/prune/prune.go
Outdated
glog.V(4).Infof("%q has no image ID", container.Image) | ||
node := p.g.Find(imagegraph.ImageStreamTagNodeName(makeISTagWithReference(&ref))) | ||
if node == nil { | ||
glog.V(4).Infof("no image stream tag found for %q - skipping", container.Image, ref.RegistryURL().Host, p.registryURL.Host) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this won't pass verification.
pkg/image/prune/prune.go
Outdated
tagUpdated = true | ||
continue | ||
} | ||
glog.V(4).Infof("Image stream tag %s:%s revision %d - keeping because of age", streamName, tag, i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it isn't kept.
Can you extract this part into func tagEventIsPrunable(tagEvent image.TagEvent, g, prunableImageNodes, keepYoungerThan) (ok bool, reason string)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it isn't kept.
Good catch, extracted.
pkg/image/prune/prune.go
Outdated
func (p *pruner) buildGraph(options PrunerOptions) error { | ||
p.g = graph.New() | ||
|
||
p.addImagesToGraph(options.Images, p.algorithm) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
p.algorithm is a part of pruner
and can be removed from arguments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
p.algorithm is a part of pruner and can be removed from arguments.
Removed.
@@ -351,28 +366,33 @@ func addImageStreamsToGraph(g graph.Graph, streams *imageapi.ImageStreamList, li | |||
} | |||
} | |||
|
|||
if i == 0 { | |||
glog.V(4).Infof("Adding edge (kind=%s) from %q to %q", kind, istNode.UniqueName(), imageNode.UniqueName()) | |||
p.g.AddEdge(istNode, imageNode, kind) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this?
In order to dereference imagestreamtag to image, there must be some kind of mapping between the two. It could be regular map[istagname]image
. But since we already have the graph, we can utilize it. The fact is that ImageStreamTagNode
takes unnecessarily too much memory - we could make the istnode a lot smaller by making it contain just a single string (istagname). Or would you prefer regular map
?
3fc7551
to
52d9c44
Compare
pkg/image/prune/prune.go
Outdated
age := metav1.Now().Sub(pod.CreationTimestamp.Time) | ||
if age >= algorithm.keepYoungerThan { | ||
glog.V(4).Infof("Pod %s is not running nor pending and age exceeds keepYoungerThan (%v) - skipping", getName(pod), age) | ||
if !pod.CreationTimestamp.Time.After(algorithm.keepYoungerThan) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would not have expected --keep-younger-than to apply to pod ages, just image ages. is this well documented?
--keep-younger-than=1h0m0s: Specify the minimum age of an image for it to be considered a candidate for pruning.
I understand the motivation, but i think the behavior would surprise people. I would prefer that we either said "stopped pods don't count as image references" or "prune your stopped pods before pruning your images" instead of this approach of overloading the keep-younger-than arg to "ignore some stopped pods, but count others"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's poorly documented here. Our official docs say it better:
Do not prune any image that is younger than relative to the current time. Do not prune any image that is referenced by any other object that is younger than relative to the current time. (default 60m)
I'll fix it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't prune an image which is referenced by any other object, should we?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't prune an image which is referenced by any other object, should we?
@dmage that's not what we've been doing. I agree that it would be the right thing to do to maintain the integrity. However, we don't have an easy to use pruning solution for all the objects that can reference images.
If we decide to keep the images referenced by any other object regardless of age, let's do it in some other PR. This PR makes it already harder to delete referenced images.
pkg/image/prune/prune.go
Outdated
if age >= algorithm.keepYoungerThan { | ||
glog.V(4).Infof("Pod %s is not running nor pending and age exceeds keepYoungerThan (%v) - skipping", getName(pod), age) | ||
if !pod.CreationTimestamp.Time.After(algorithm.keepYoungerThan) { | ||
glog.V(4).Infof("Pod %s is neither running nor pending and is too old", getName(pod)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's not clear (out of context) what this message means. Update it to something like "Ignoring pod %s for image reference counting because it is not running/pending and is older than %age"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rewritten.
glog.V(4).Infof("Checking tag event %d with image %q", i, tagEvent.Image) | ||
|
||
if ok, reason := tagEventIsPrunable(tagEvent, g, prunableImageNodes, keepYoungerThan); ok { | ||
glog.V(4).Infof("Image stream tag %s:%s revision %d - removing because %s", streamName, tag, i, reason) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
print the namespace for the imagestream too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The namespace is already part of the streamName
: https://github.com/openshift/origin/blob/4fc5ca3/pkg/image/prune/prune.go#L1077
glog.V(4).Infof("Image stream tag %s:%s revision %d - removing because %s", streamName, tag, i, reason) | ||
tagUpdated = true | ||
} else { | ||
glog.V(4).Infof("Image stream tag %s:%s revision %d - keeping because %s", streamName, tag, i, reason) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
print the IS namespace too.
pkg/apps/graph/nodes/nodes.go
Outdated
) | ||
|
||
// EnsureDaemonSetNode adds the provided deployment config to the graph if it does not exist |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/deployment config/daemon set/
pkg/apps/graph/nodes/nodes.go
Outdated
@@ -92,3 +92,30 @@ func FindOrCreateSyntheticDeploymentConfigNode(g osgraph.MutableUniqueGraph, dc | |||
}, | |||
).(*DeploymentConfigNode) | |||
} | |||
|
|||
// EnsureReplicaSetNode adds the provided deployment config to the graph if it does not exist |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/deployment config/replicaset/
pkg/image/prune/prune.go
Outdated
@@ -455,18 +475,36 @@ func addPodSpecToGraph(g graph.Graph, spec *kapi.PodSpec, predecessor gonum.Node | |||
} | |||
|
|||
if len(ref.ID) == 0 { | |||
ref = ref.DockerClientDefaults() | |||
if p.registryURL.Host != ref.RegistryURL().Host { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how would you build the set of possible registry hosts that represent the internal registry?
pkg/image/prune/prune.go
Outdated
@@ -455,18 +475,36 @@ func addPodSpecToGraph(g graph.Graph, spec *kapi.PodSpec, predecessor gonum.Node | |||
} | |||
|
|||
if len(ref.ID) == 0 { | |||
ref = ref.DockerClientDefaults() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add a comment here explaining what is happening.
My understanding is something like:
If the pod container spec image reference does not contain a digest(e.g. it i just "docker.io/foo/bar:latest"), attempt to determine an image digest by looking for imagestreamtags that point to the image referenced by this pod container spec, and create a reference between the pod and that image to prevent the image from being pruned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commented.
/unassign @knobunc |
52d9c44
to
b89cba8
Compare
b89cba8
to
99e3193
Compare
99e3193
to
d75b1dc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NAK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NAK
/hold |
Update: The point is to error out on invalid image reference. All errors should be collected and reported during the building of graph. Reason: consider a more benevolent reference parser (e.g. allowing ~~I'm reworking it. ~~ |
c1ae9c6
to
4267f2a
Compare
All parsing errors are now collected and reported during the graph build.
|
cc695ab
to
4d5a231
Compare
/test extended_image_registry |
/hold cancel |
pkg/image/prune/prune_test.go
Outdated
var ( | ||
config1 = "sha256:2b8fd9751c4c0f5dd266fcae00707e67a2545ef34f9a29354585f93dac906749" | ||
config2 = "sha256:8ddc19f16526912237dd8af81971d5e4dd0587907234be2b83e249518d5b673f" | ||
_ "github.com/openshift/origin/pkg/apps/apis/apps/install" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it for getRef?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'll put a comment there
Update: commented.
pkg/oc/admin/prune/images.go
Outdated
if closed && err != nil { | ||
return nil, nil, err | ||
} | ||
case <-time.After(time.Second): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need this timeout?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code is inspiried by
origin/pkg/oc/cli/cmd/version.go
Line 95 in 50389c0
func (o VersionOptions) RunVersion() error { |
oc version
asks the master for its version with some default timeout.
This call will be much faster than most of the other calls asking for object lists. On the other hand, I don't want to further block error reporting once we know we have failed and master is busy.
Update: changed it to get the timeout from client config
Create strong references to images for each pod/bc/dc/etc that uses <host>/<repo>:tag reference. Resolves bug 1498604 Signed-off-by: Michal Minář <[email protected]>
4d5a231
to
1cda0bb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/approve
but I want one more opinion for lgtm :)
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: bparees, dmage, miminar The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing |
/retest Please review the full test history for this PR and help us cut down flakes. |
Automatic merge from submit-queue. |
Did this start flaking in master?
https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/17035/test_pull_request_origin_unit/5039/
On Oct 31, 2017, at 2:58 AM, OpenShift Merge Robot <[email protected]> wrote:
Merged #16821 <#16821>.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#16821 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/ABG_p5EoZWfroSswPhSOr_VnBtT1g0XPks5sxsUAgaJpZM4P25ZY>
.
|
Yes, sorry about that. Here's a fix: #17105 |
Create strong references to images for each pod/bc/dc/etc that uses
<host>/<repo>:tag
reference.Resolves bz#1498604 and https://bugzilla.redhat.com/show_bug.cgi?id=1386917
Images can manually removed using
oc delete
. Image stream tags having references to these images become obsolete - we may delete them.To be sure that we don't remove reference to image that has just been created (and we don't know about it), make sure to honor
--keep-younger-than
.