
Conversation

orainxiong

  1. Optimize leader election from per-PV to per-instance;
  2. Use the informer cache rather than talking to the API server directly (a rough sketch of the informer wiring is shown below).
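A minimal sketch of the informer wiring behind item 2, assuming a standalone controller process; the kubeconfig path, resync period, and object names below are illustrative, not the library's actual construction code:

package main

import (
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a clientset the usual way (illustrative; the library takes an
	// already-constructed clientset from the caller).
	config, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig")
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	// One shared informer factory per controller instance; lookups through
	// the listers below hit the local cache instead of the API server.
	factory := informers.NewSharedInformerFactory(client, 15*time.Second)
	pvLister := factory.Core().V1().PersistentVolumes().Lister()
	claimLister := factory.Core().V1().PersistentVolumeClaims().Lister()

	stopCh := make(chan struct{})
	factory.Start(stopCh)
	factory.WaitForCacheSync(stopCh)

	// Cached reads; no request is sent to the API server here.
	_, _ = pvLister.Get("some-pv")
	_, _ = claimLister.PersistentVolumeClaims("default").Get("some-claim")
}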

@orainxiong (Author)

When 100 PVCs are created at the same time, the CSI external-provisioner hammers the kube-apiserver with requests and gets throttled, causing all sorts of issues.

More details: kubernetes-csi/external-provisioner#68

pvName := ctrl.getProvisionedVolumeNameForClaim(claim)
volume, err := ctrl.client.CoreV1().PersistentVolumes().Get(pvName, metav1.GetOptions{})
Contributor

IMO we should keep this. It is copied from upstream https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/volume/persistentvolume/pv_controller.go#L1390. The idea is to deliberately bypass the cache in case a PV was already created: we are trading a kube API Get to avoid an unnecessary storage backend Provision, which is usually worse.
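For illustration, a rough sketch of the cache-first lookup with the deliberate live Get fallback described above; the volumeLister field and the helper name are assumptions, not necessarily what the library exposes:

// Assumed imports:
//   apierrs "k8s.io/apimachinery/pkg/api/errors"
//   metav1  "k8s.io/apimachinery/pkg/apis/meta/v1"
// Assumed field: ctrl.volumeLister is a PersistentVolumeLister fed by the
// shared informer (hypothetical name).
func (ctrl *ProvisionController) volumeAlreadyProvisioned(pvName string) (bool, error) {
	// Cheap path: consult the informer cache, no API round trip.
	if _, err := ctrl.volumeLister.Get(pvName); err == nil {
		return true, nil
	} else if !apierrs.IsNotFound(err) {
		return false, err
	}
	// Cache miss (or stale cache): confirm with a live Get, trading one
	// kube API call for a potentially expensive storage backend Provision.
	_, err := ctrl.client.CoreV1().PersistentVolumes().Get(pvName, metav1.GetOptions{})
	if err == nil {
		return true, nil
	}
	if apierrs.IsNotFound(err) {
		return false, nil
	}
	return false, err
}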

Contributor

Actually, I need to think more about this. This Get is premised on the cache being stale, and I am not sure we have the same stale-cache problem that upstream has.

Contributor

nvm, sorry for the noise, I am okay with this change. I cannot think of a scenario where we need this; we are no longer using goroutinemap like upstream does, either.

@@ -1118,10 +1035,25 @@ func (ctrl *ProvisionController) deleteVolumeOperation(volume *v1.PersistentVolu
// Our check does not have to be as sophisticated as PV controller's, we can
// trust that the PV controller has set the PV to Released/Failed and it's
// ours to delete
newVolume, err := ctrl.client.CoreV1().PersistentVolumes().Get(volume.Name, metav1.GetOptions{})
Contributor

Unlike the Get above, I am okay with removing this one.

@@ -716,75 +698,6 @@ func (ctrl *ProvisionController) shouldDelete(volume *v1.PersistentVolume) bool
return true
}

// lockProvisionClaimOperation wraps provisionClaimOperation. In case other
Contributor

The issue with removing it without a replacement is that we would be trading kube API abuse for storage backend API abuse whenever somebody runs more than one provisioner instance (which is far too easy, since they can simply change the deployment replica count from 1 to 2). That is, instead of racing to lock a PVC and spamming the kube API server, the instances would race to talk to the storage backend and then race to create a PV. So we need a replacement, e.g. per-storageclass leader election, if we want to remove this.
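For what it's worth, a coarser per-provisioner-instance election could look roughly like the sketch below, using client-go's leaderelection package; the Lease-based lock, the lock name/namespace, the identity, and the timings are all illustrative (and Lease locks are a newer client-go construct than the ConfigMap/Endpoints locks available when this discussion took place):

package provisioner // illustrative package name

import (
	"context"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

// runWithLeaderElection blocks; only the elected instance runs the
// provisioning loop, so concurrent replicas no longer race per PVC.
func runWithLeaderElection(client kubernetes.Interface, run func(ctx context.Context)) {
	id, _ := os.Hostname() // illustrative identity

	lock := &resourcelock.LeaseLock{
		LeaseMeta: metav1.ObjectMeta{
			Name:      "my-provisioner", // hypothetical lock name
			Namespace: "kube-system",    // hypothetical namespace
		},
		Client:     client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{Identity: id},
	}

	leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
		Lock:            lock,
		LeaseDuration:   15 * time.Second,
		RenewDeadline:   10 * time.Second,
		RetryPeriod:     2 * time.Second,
		ReleaseOnCancel: true,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: run,
			// Simplest safe reaction to losing the lease: exit and let the
			// Deployment restart the process so it can campaign again.
			OnStoppedLeading: func() { os.Exit(0) },
		},
	})
}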

Also, if we get rid of this leader election, there is a lot more code that can be removed, which I will be happy to be rid of, but I will do that myself in a subsequent PR and I would appreciate a review!

Author

If I understand correctly, there are two ways to implement leader election:

  1. Pull the leader election logic out of external-storage and have external-provisioner implement it on its own. That keeps the library logic simpler, but the disadvantage is the impact on the current external-provisioner.
  2. Change the granularity of the lock from per-PVC to per-class to avoid the race condition.

I have no idea which one is better. BTW, I would be happy to review if possible.

Contributor

Option 1 will be much more difficult, since there are many other provisioners besides the CSI external-provisioner depending on external-storage, and each of them would be burdened with copying the same implementation, even if that implementation is simple. Option 2 is better, I think. The more opaque all of this is to library consumers, the better. I don't like the idea of our little controller depending on some configmap for maintaining leader state, but I also don't want to overload storage classes for that purpose like we are overloading PVCs at the moment. I will have more time to think/work on this in the coming days.

Author

Agreed. From the implementation perspective, external-storage could copy the kind of leader election that existing operators already use.

If there are any new ideas, please let me know. Many thanks.

Contributor

Hi, in which scenario is the current per-PVC lock bad? I like the per-PVC lock idea because I can simply deploy more provisioners to scale; each provisioner can work independently.

Contributor

I am not convinced client-side throttling is even a problem anymore after we introduced PVC work queues, since Provision typically takes a lot of time. But I am never opposed to adding more constructor functions.
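For context, the PVC work queue referred to here is client-go's rate-limited workqueue; a minimal sketch of the pattern, with the queue name, rate limiter, and worker loop chosen for illustration rather than copied from the library:

package provisioner // illustrative package name

import (
	"k8s.io/client-go/util/workqueue"
)

// Claims observed by the informer are added to a rate-limited queue; a
// fixed pool of workers drains it, so a burst of 100 PVCs does not turn
// into 100 simultaneous API or Provision calls.
func newClaimQueue() workqueue.RateLimitingInterface {
	return workqueue.NewNamedRateLimitingQueue(
		workqueue.DefaultControllerRateLimiter(), "claims")
}

func runWorker(queue workqueue.RateLimitingInterface, provision func(key string) error) {
	for {
		key, shutdown := queue.Get()
		if shutdown {
			return
		}
		if err := provision(key.(string)); err != nil {
			queue.AddRateLimited(key) // retry later with backoff
		} else {
			queue.Forget(key) // clear the item's backoff history
		}
		queue.Done(key)
	}
}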

I've opened #847 to discuss operator principles in general.

I think we need to bring this up with more people, e.g. in the next sig-storage meeting if there's time; otherwise I will ramble on indecisively forever. We should have this resolved AT LATEST before the 1.12 release, IMO.

Contributor

I'll also write an email to the sig-storage Google group next week.


@wongma7 any chance you can join us to discuss this issue?

Contributor

@vladimirvivien yes, where?


@wongma7
If you can, join the next CSI meeting (Wednesday 10am PST) for a quick report on the stuff you are working on (leader election/informers). Here are the links:

VC on Zoom: https://zoom.us/j/614261834

Notes and agenda doc: https://docs.google.com/document/d/1-WmRYvqw1FREcD1jmZAOjC0jX6Gop8FMOzsdqXHFoT4/edit?usp=sharing
