-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
deploy: set ownerRef from RC to grouped API version #14582
Conversation
@tnozicka @smarterclayton do we need migration for RC that have already legacy API group set or the refManager will patch them again with the correct ref? |
[test] |
To confirm this works:
Now the GC client gets the DC successfully. |
@bparees fyi, i guess the dynamic client will be broken for all ownerRefs that build* creates? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Evaluated for origin test up to 0824d22 |
lgtm [merge] |
@mfojtik let's try to think of a way to make sure that these cause errors, not disasters. Seems like using a RESTMapper without our |
@smarterclayton i think this was 85% of the friday's problem, the rest 15% was quorum read which we should address as well. |
continuous-integration/openshift-jenkins/merge Waiting: You are in the build queue at position: 5 |
Evaluated for origin merge up to 0824d22 |
@jupierce FYI |
I don't think we need migration for this as this was not in any released version, yet. (Also this should kind of migrate itself over time as those old RCs get deleted by mistake new RCs gets created properly, with history loss. This should happen only on start up [because of caches] not causing almost any downtime since containers were not really started yet anyways.) |
continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pull_request_origin/2129/) (Base Commit: c09c601) |
Don't we have to do this for builds and everything else? |
@smarterclayton @bparees was already warned |
Force merging to clear the queue, does not conflict with currently merging PR. |
@smarterclayton @sttts this should fix the deployer gone missing problem (I believe).
Some explanation:
When the GC cache is cold, the GC will try to use a live GET using the dynamic client to check if the resource specified in the ownerRef exists. In this case the RC is pointing to a DC. However, the dynamic client use wrong API prefix (/api instead of /oapi) it this will return 404 and the GC thinks that the DC is gone and mark the RC for deletion. The RC is deleted and since the deployer pod has an ownerRef pointing to this RC the pod is also terminated and deleted.
When the GC caches warm up, it will see the DC (UUID) and so it will not do live GET lookup and keep the RC untouched, allowing successfull rollout. I guess we were just lucky in our CI/CD and we warmed the cache fast enough to not hit this (the ipfailover flaked)...
What this patch does is to force API group version for the DC, which the dynamic client should handle (i can confirm it returns 200).
Fixes: #13995