
Garbage collector will delete objects when wrong GVK is specified in ownerRefs #14646

Closed · tnozicka opened this issue Jun 14, 2017 · 9 comments
Labels: component/kubernetes, kind/bug, lifecycle/frozen, lifecycle/stale, priority/P2

tnozicka (Contributor) commented Jun 14, 2017

Follow-up to #14582.

The upstream garbage collector already checks for and ignores objects it doesn't know about.

What happens in OpenShift: if you specify an invalid resource, it works the same way. But if you specify a valid OpenShift resource with an apiVersion that omits the group, and the GC has a cold cache, it will delete the dependent as an orphan. That's because the RESTMapper won't raise an error, but the dynamic client hits a URL under /api instead of /oapi and gets a 404, as if the resource didn't exist, and thus deletes the "orphan". This causes data loss. Even if we fix all the controllers to create valid controllerRefs and test them properly, a user can still manually (or with some automation of their own) create an ownerReference, and such an object would be incorrectly deleted, resulting in data loss.
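To make the failure mode concrete, here is a minimal sketch (names and UID are illustrative, not from this issue) of an ownerReference of the problematic shape: the apiVersion names a version but no API group, so a GC with a cold cache resolves the owner under /api, gets a 404, and deletes the dependent as an orphan.

```go
package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
)

// badOwnerRef builds an ownerReference whose apiVersion carries no API
// group. The GC's dynamic client then looks the owner up under /api (the
// Kubernetes core group) instead of /oapi, gets a 404 as if the owner
// didn't exist, and deletes the dependent. Name and UID are illustrative.
func badOwnerRef(uid types.UID) metav1.OwnerReference {
	return metav1.OwnerReference{
		APIVersion: "v1",               // missing group: looked up under /api
		Kind:       "DeploymentConfig", // actually served under /oapi
		Name:       "my-dc",
		UID:        uid,
	}
}

func main() {
	fmt.Printf("%+v\n", badOwnerRef(types.UID("2f7c63cd-illustrative")))
}
```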

It seems like we could switch the RESTMapper used by the garbage collector to fix it. (See @deads2k's comment.)

deads2k (Contributor) commented Jun 14, 2017

> It seems like we could switch the RESTMapper used by the garbage collector to fix it. (See @deads2k's comment.)

To be clear, the fix in this case means to fail harder and earlier, not actually tolerate such owner references.
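For illustration, a minimal sketch of that direction, assuming the apimachinery RESTMapper interface (this is not the actual GC change): resolve the ownerReference's apiVersion/kind through the mapper up front and return an error when it cannot be resolved, rather than letting a 404 from the wrong URL be read as a missing owner.

```go
package gcsketch // hypothetical package name

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
)

// resolveOwner fails hard and early: if the ownerReference's group/version/
// kind cannot be mapped to a known resource, surface an error instead of
// letting the GC treat the dependent as an orphan and delete it.
func resolveOwner(mapper meta.RESTMapper, ref metav1.OwnerReference) (*meta.RESTMapping, error) {
	gv, err := schema.ParseGroupVersion(ref.APIVersion)
	if err != nil {
		return nil, fmt.Errorf("ownerReference has malformed apiVersion %q: %v", ref.APIVersion, err)
	}
	mapping, err := mapper.RESTMapping(gv.WithKind(ref.Kind).GroupKind(), gv.Version)
	if err != nil {
		return nil, fmt.Errorf("cannot resolve ownerReference %s/%s %q: %v", ref.APIVersion, ref.Kind, ref.Name, err)
	}
	return mapping, nil
}
```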

tnozicka added the kind/bug label on Jun 15, 2017
tnozicka (Contributor, Author) commented:

Agreed. (To expand on that: it doesn't fail with an error at all at this point, which is the biggest issue; it just deletes data incorrectly.)

tnozicka (Contributor, Author) commented:

cc: @mfojtik

deads2k (Contributor) commented Jun 16, 2017

I don't think this is a blocker for 3.6. The ownerRefs are managed in code; we just have to set them correctly.

tnozicka (Contributor, Author) commented:

@deads2k For our objects, yes. But in the DC<-RC<-Pod chain we set only one controllerRef each, on the Pod and the RC. Users could also set other owner references manually to achieve cascade deletion, or have scripts/controllers of their own in place to do that.

ControllerRef is a special kind of ownerRef (and a singleton) that we control, so we can make sure it is set up correctly. OwnerRefs in general can be set by anyone, I think. Say you want to tie the lifetime of your DC to an IS, or to another DC, or ... a user can do that. I am aware that the probability of someone doing that is quite low, but it still results in data loss.
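For illustration, a minimal sketch (the helper name and the API group shown are assumptions, not from this issue) of the kind of plain, non-controller ownerReference a user could set by hand to tie one object's lifetime to another's:

```go
package ownerrefsketch // hypothetical package name

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// userOwnerRef is a plain ownerReference of the kind anyone can set by
// hand to get cascade deletion. Unlike a controllerRef, Controller is left
// unset, and nothing validates that APIVersion/Kind are spelled correctly,
// which is exactly what makes the data-loss scenario above reachable.
func userOwnerRef(owner metav1.Object, apiVersion, kind string) metav1.OwnerReference {
	return metav1.OwnerReference{
		APIVersion: apiVersion, // e.g. "apps.openshift.io/v1" (assumed)
		Kind:       kind,       // e.g. "DeploymentConfig"
		Name:       owner.GetName(),
		UID:        owner.GetUID(),
	}
}
```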

deads2k (Contributor) commented Jun 16, 2017

> ControllerRef is a special kind of ownerRef (and a singleton) that we control, so we can make sure it is set up correctly. OwnerRefs in general can be set by anyone, I think. Say you want to tie the lifetime of your DC to an IS, or to another DC, or ... a user can do that. I am aware that the probability of someone doing that is quite low, but it still results in data loss.

I'm not going to claim it's ideal, but performing an advanced operation incorrectly (and there is a very narrow path to doing it correctly) and getting behavior consistent with every other mistake (which also results in deletion) ought not to block a release.

tnozicka (Contributor, Author) commented Jun 16, 2017

@deads2k I am fine with letting it slip 1.6.0; I just wanted to point out that it's not only about our controllers.

openshift-bot (Contributor) commented:

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now, please do so with /close.

/lifecycle stale

openshift-ci-robot added the lifecycle/stale label on Feb 12, 2018
tnozicka (Contributor, Author) commented:

/lifecycle frozen

openshift-ci-robot added the lifecycle/frozen label on Feb 12, 2018