Check if machineDeployment queue exists before using it #504

Merged

Conversation

alvaroaleman
Member

Resolves #501

What this PR does / why we need it:

Avoids NPEs when starting the machineDeployment controller in a cluster that has pre-existing machineSets

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #501

Special notes for your reviewer:

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

Release note:

NONE

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Sep 19, 2018
@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Sep 19, 2018
@alvaroaleman
Member Author

/assign @roberthbailey

@xmudrii
Member

xmudrii commented Sep 19, 2018

/ok-to-test

@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Sep 19, 2018
if _, queueExists := c.informers.WorkerQueues["MachineDeployment"]; !queueExists {
	glog.V(2).Infof("MachineDeployment queue does not exist, requing after %v", requeueAfterWhenQueueAbsent)
	time.Sleep(requeueAfterWhenQueueAbsent)
	c.enqueue(d)

Shouldn't this error out after multiple attempts or a timeout?

Member Author

This can only happen during controller startup, because Init is called before the workqueue is created:

if i, ok := ci.(sharedinformers.LegacyControllerInit); ok {
	i.Init(config, si, c.LookupAndReconcile)
} else if i, ok := ci.(sharedinformers.ControllerInit); ok {
	i.Init(&sharedinformers.ControllerInitArgumentsImpl{si, config, c.LookupAndReconcile})
}
c.controller = uc
queue.Reconcile = c.reconcile
if c.Informers.WorkerQueues == nil {
	c.Informers.WorkerQueues = map[string]*controller.QueueWorker{}
}
c.Informers.WorkerQueues["MachineDeployment"] = queue

However, this controller's Init already registers event handlers for machineSets, which will subsequently try to put their owning machineDeployments into the not-yet-existing queue:

arguments.GetSharedInformers().Factory.Cluster().V1alpha1().MachineSets().Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
	AddFunc:    c.addMachineSet,
	UpdateFunc: c.updateMachineSet,
	DeleteFunc: c.deleteMachineSet,
})

Unfortunately, just creating the queue before calling Init is not possible, because all of that happens in generated code.

Since this should only happen during startup, within a very small timeframe, I think it's okay like this. WDYT?
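
For illustration, the guard described above, combined with the bounded retry the reviewer asked about, might look roughly like the sketch below. This is not the code merged in this PR: queueWorker, machineDeploymentController, tryEnqueue, enqueueWithRetry and the 10ms value of requeueAfterWhenQueueAbsent are all hypothetical stand-ins, and the real change simply sleeps and calls c.enqueue(d) again without an attempt limit.

```go
package main

import (
	"fmt"
	"time"
)

// queueWorker is a hypothetical stand-in for controller.QueueWorker.
type queueWorker struct {
	keys []string
}

// machineDeploymentController is a hypothetical, stripped-down controller
// that only carries the WorkerQueues map discussed in the comment above.
type machineDeploymentController struct {
	workerQueues map[string]*queueWorker
}

// requeueAfterWhenQueueAbsent reuses the name from the diff; the value is assumed.
const requeueAfterWhenQueueAbsent = 10 * time.Millisecond

// tryEnqueue adds key to the MachineDeployment queue if it is registered.
// Reading from a nil map is safe in Go, so the lookup also covers the case
// where the map itself has not been created yet.
func (c *machineDeploymentController) tryEnqueue(key string) bool {
	q, queueExists := c.workerQueues["MachineDeployment"]
	if !queueExists {
		return false
	}
	q.keys = append(q.keys, key)
	return true
}

// enqueueWithRetry waits for the queue to appear and gives up with an error
// after maxAttempts, instead of retrying forever.
func (c *machineDeploymentController) enqueueWithRetry(key string, maxAttempts int) error {
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if c.tryEnqueue(key) {
			return nil
		}
		time.Sleep(requeueAfterWhenQueueAbsent)
	}
	return fmt.Errorf("MachineDeployment queue absent after %d attempts", maxAttempts)
}

func main() {
	c := &machineDeploymentController{}

	// Startup window: event handlers fire before the queue is registered.
	if err := c.enqueueWithRetry("default/md-1", 2); err != nil {
		fmt.Println("before registration:", err)
	}

	// The generated code registers the queue after Init has returned.
	c.workerQueues = map[string]*queueWorker{"MachineDeployment": &queueWorker{}}
	if err := c.enqueueWithRetry("default/md-1", 2); err == nil {
		fmt.Println("after registration: enqueued", c.workerQueues["MachineDeployment"].keys)
	}
}
```

Whether a limit like maxAttempts is worth having depends on how long the startup window can realistically last; if the queue is always registered shortly after Init, an unbounded requeue is simpler.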

Contributor

@pwittrock - is this something that gets fixed with the kubebuilder change?

@alvaroaleman, I see, thanks.

Member Author

@roberthbailey Are you fine with merging this to have the machineDeployment controller working again?

Contributor

Yes, sorry for the delay.

@roberthbailey
Contributor

/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 25, 2018
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alvaroaleman, roberthbailey

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 25, 2018
@k8s-ci-robot k8s-ci-robot merged commit d99b763 into kubernetes-sigs:master Sep 25, 2018
@alvaroaleman alvaroaleman deleted the fix-machinedeployment-npe branch September 25, 2018 07:46
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

machineDeployment controller races+panics on startup when a machineSet exists
5 participants