
add autoscaler for kube-state-metrics #200

Merged
merged 2 commits into from
Aug 8, 2017

Conversation

WIZARD-CXY

@WIZARD-CXY WIZARD-CXY commented Aug 3, 2017

For #124
(provide a deployment manifest that scales with cluster size using the pod nanny)
This setup is tested on my GKE cluster; the kube-state-metrics pod now has two containers, kube-state-metrics and pod-nanny:

➜  kube-state-metrics git:(addautoscaler) ✗ kubectl get po -n kube-system
NAME                                                  READY     STATUS    RESTARTS   AGE
event-exporter-v0.1.4-4272745813-m1sqp                2/2       Running   0          6d
fluentd-gcp-v2.0-290kq                                2/2       Running   0          6d
fluentd-gcp-v2.0-7wmrd                                2/2       Running   0          6d
fluentd-gcp-v2.0-qtr9r                                2/2       Running   0          6d
heapster-v1.4.0-2764992688-s3q6v                      2/2       Running   0          6d
kube-dns-1413379277-387r0                             3/3       Running   0          6d
kube-dns-1413379277-z057j                             3/3       Running   0          6d
kube-dns-autoscaler-3880103346-73r5g                  1/1       Running   0          6d
kube-proxy-gke-cluster-1-default-pool-d10ff02d-cft7   1/1       Running   0          6d
kube-proxy-gke-cluster-1-default-pool-d10ff02d-mmfk   1/1       Running   0          6d
kube-proxy-gke-cluster-1-default-pool-d10ff02d-x4pg   1/1       Running   0          6d
kube-state-metrics-1585740284-w6lnr                   2/2       Running   0          1m
kubernetes-dashboard-1962351010-km5wg                 1/1       Running   0          6d
l7-default-backend-2954409777-rrwkq                   1/1       Running   0          6d

The cpu and memory parameters can be adjusted too.
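For reference, the nanny sidecar added to the deployment looks roughly like this. This is a sketch, not the exact manifest from this PR: the image tag and the flag values shown are illustrative, but the flags themselves are the addon-resizer's documented ones.

```yaml
# Sketch of the pod-nanny (addon-resizer) sidecar; values are illustrative.
- name: pod-nanny
  image: gcr.io/google_containers/addon-resizer:1.0
  resources:
    limits:
      cpu: 100m
      memory: 30Mi
    requests:
      cpu: 100m
      memory: 30Mi
  env:
  # The nanny discovers its own pod via the downward API.
  - name: MY_POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
  - name: MY_POD_NAMESPACE
    valueFrom:
      fieldRef:
        fieldPath: metadata.namespace
  command:
  - /pod_nanny
  - --container=kube-state-metrics
  - --cpu=100m          # base CPU request
  - --extra-cpu=1m      # additional CPU per node
  - --memory=30Mi       # base memory request
  - --extra-memory=2Mi  # additional memory per node
  - --deployment=kube-state-metrics
```

The `--cpu`/`--extra-cpu` and `--memory`/`--extra-memory` pairs are what make the resources scale linearly with node count.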



@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 3, 2017
@WIZARD-CXY WIZARD-CXY force-pushed the autoscaler branch 2 times, most recently from 89a11dd to 68cf172 Compare August 3, 2017 06:34
@WIZARD-CXY
Author

I think, since the nanny scales resources linearly with the number of nodes in the cluster, I can use a small starting resource for the kube-state-metrics container. Here I use the resource request as the base resource requirement in the nanny.
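The linear scaling mentioned above can be sketched as a toy model (illustrative only, not the nanny's actual code; the function name is made up, mirroring its `--cpu`/`--extra-cpu` style flags):

```python
# Toy model of the addon-resizer's linear estimate: a base amount plus a
# per-node increment. Illustrative only; not the nanny's real implementation.
def estimate_millicores(base: int, extra_per_node: int, nodes: int) -> int:
    """Resource estimate in millicores for a cluster with `nodes` nodes."""
    return base + extra_per_node * nodes

# A 3-node cluster with a 100m base and 1m extra per node:
print(estimate_millicores(100, 1, 3))  # 103
```

This matches the behavior observed later in this thread, where the resizer bumped the CPU request from 100m to 103m on a 3-node cluster.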

@brancz
Member

brancz commented Aug 3, 2017

Having never used the nanny pod, this might be incorrect, but I'd expect it to change the deployment manifest, meaning it needs RBAC roles for that, no?

@brancz
Member

brancz commented Aug 3, 2017

I looked at the addon-resizer code and from what I can tell it needs:

  • nodes: list, watch
  • deployments: get, update
  • pods: get

What that basically means is that we need to add the last two to our RBAC roles.
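Concretely, those extra rules might look roughly like this (a sketch based on the list above; at the time, deployments lived in the `extensions` API group, and the exact placement depends on the final manifests):

```yaml
# Sketch of the additional RBAC rules implied by the list above.
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get"]
- apiGroups: ["extensions"]
  resources: ["deployments"]
  verbs: ["get", "update"]
```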

@WIZARD-CXY
Author

Good point, but I didn't add the last two and it still works... Also, the example in https://github.com/kubernetes/autoscaler/tree/master/addon-resizer
is a simple deployment without any other RBAC config. Maybe there is some trick.

@brancz
Member

brancz commented Aug 3, 2017

Depending on the ServiceAccount you're using, you might have given your pod cluster-admin permissions, as it's in the kube-system namespace.

@piosz @loburm are you aware of the RBAC roles the addon-resizer needs? From looking at the code it doesn't seem to do anything special, so I'd expect it to require them.

@WIZARD-CXY
Author

@brancz I started a new namespace and deployed the pod nanny there; it complains that it can't get pods or update deployments due to limited permissions. So maybe it is because kube-system is a special namespace?

@brancz
Member

brancz commented Aug 3, 2017

Yes that was my suspicion. Would you care to adapt? 🙂

@WIZARD-CXY
Author

WIZARD-CXY commented Aug 3, 2017

@brancz
Member

brancz commented Aug 3, 2017

Yes, but also get for deployments, as currently it's just list/watch.

@WIZARD-CXY
Author

Yeah, exactly what I wanted to say, because of this function:
https://github.com/kubernetes/autoscaler/blob/master/addon-resizer/nanny/kubernetes_client.go#L69

@WIZARD-CXY
Author

Changed according to your suggestion, PTAL @brancz. Tested OK in my cluster.

(screenshot attached)

Member

@brancz brancz left a comment


small nit, then lgtm

  - deployments
  verbs: ["list", "watch", "get", "update"]
- apiGroups: ["extensions"]
  resources:
  - daemonsets
  - deployments
Member


this can be removed as it's redundant with the above rule

Author


Sorry, will do.

@WIZARD-CXY
Author

Fixed, PTAL.

@WIZARD-CXY
Author

I read the doc https://kubernetes.io/docs/admin/authorization/rbac/ and found that "Default RBAC policies grant scoped permissions to control-plane components, nodes, and controllers, but grant no permissions to service accounts outside the kube-system namespace (beyond discovery permissions given to all authenticated users)."
Maybe service accounts in the kube-system namespace are privileged, just like our kube-state-metrics ServiceAccount in kube-system.

@WIZARD-CXY
Author

Maybe @piosz @loburm can give an answer?

@brancz
Member

brancz commented Aug 3, 2017

We actually don't run kube-state-metrics in the kube-system namespace ourselves, so we don't really have that problem there, or rather, that is exactly why we do run into this very problem. Yes, kube-system is a special namespace, and that's partly why we chose it as the example namespace for this component: we can always assume it's there. If you want to, we can change the example manifests to deploy to the default namespace instead; maybe that makes the need for RBAC clearer. wdyt?

@WIZARD-CXY
Author

I think it should be in kube-system; no need to change, IMHO.

@brancz
Member

brancz commented Aug 3, 2017

Agreed, but it's still up to the user, so RBAC roles should be properly documented.

@piosz
Member

piosz commented Aug 4, 2017

@kubernetes/sig-api-machinery-pr-reviews for rbac stuff

@piosz
Member

piosz commented Aug 4, 2017

Also, we already use the pod nanny with Heapster. I'm not sure where we can find the RBAC rules for it.

FYI @kubernetes/sig-autoscaling-misc

  - deployments
  verbs: ["list", "watch", "get", "update"]
Member


Letting this update any replicaset and daemonset in the system doesn't seem appropriate. I'd expect those permissions to be limited to the namespace and the specific replicaset/daemonset it is responsible for. Also, the update permission belongs in a separate role specific to the nanny, since someone who is not using the nanny should not need to give their metrics pod write permission.

Author


Replicasets and daemonsets are only given list and watch permission, not update.

Member


Ah, sorry, misread the diff. The same comment applies to deployments regarding the update permission: it should apply just to this particular deployment in this namespace.

Member


In that case we need to separate list/watch from get and update: list/watch needs to cover all namespaces so we can collect the objects we produce metrics for, while get and update only need to apply to that specific namespace.

Member


In that case we need to separate list/watch from get and update: list/watch needs to cover all namespaces so we can collect the objects we produce metrics for, while get and update only need to apply to that specific namespace.

Exactly: in a Role in the kube-system namespace, bound with a RoleBinding. The rule should also specify resourceNames: ["kube-state-metrics"] to only allow modifying the relevant deployment.
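A minimal sketch of what that namespaced Role and RoleBinding could look like (names are illustrative, and the apiVersion assumes the v1beta1 RBAC API of 1.7-era clusters):

```yaml
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: Role
metadata:
  name: kube-state-metrics-resizer   # illustrative name
  namespace: kube-system
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get"]
- apiGroups: ["extensions"]
  resources: ["deployments"]
  # resourceNames restricts get/update to this one deployment.
  resourceNames: ["kube-state-metrics"]
  verbs: ["get", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: RoleBinding
metadata:
  name: kube-state-metrics
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: kube-state-metrics-resizer
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: kube-system
```

Because it is a Role rather than a ClusterRole, the write permissions never extend beyond the kube-system namespace.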

Author


good point

Author


But I think maybe that's a little too tight; from the code we can see that the pod nanny only gets and updates the specific deployment we point it at. I think we can trust it.

Member

@brancz brancz Aug 4, 2017


That's less about not trusting the code/binary of the addon-resizer than about an attacker being able to modify arbitrary deployments (aka the principle of least privilege). Which, as a side note, makes a great example of why we should not run kube-state-metrics in the kube-system namespace.

@WIZARD-CXY
Author

So I will submit another commit to fix the RBAC issue.

@WIZARD-CXY
Author

@liggitt @brancz I added a Role and RoleBinding according to your suggestions. PTAL.

Member

@liggitt liggitt left a comment


Suggestion on patch. LGTM otherwise

  resources:
  - deployments
  resourceNames: ["kube-state-metrics"]
  verbs: ["get", "update"]
Member


Might want to allow patch as well… not sure how the nanny updates


Author


Just using get and update.

@WIZARD-CXY
Author

@liggitt another question: if I already have a role granting "list pods" permission, does that mean I also have "get a specific pod" permission?

@WIZARD-CXY
Author

WIZARD-CXY commented Aug 7, 2017

@brancz I found that we actually don't need to add "get pods" permission even if we deploy kube-state-metrics entirely in a fresh new namespace. Like below, I deployed it in the monitor namespace:


➜  kube-state-metrics git:(addautoscaler) ✗ kubectl get po -n monitor
NAME                                  READY     STATUS    RESTARTS   AGE
kube-state-metrics-2057856429-k0t4d   2/2       Running   0          1m

Tested on my gke (k8s version 1.7.2)

@brancz
Member

brancz commented Aug 7, 2017

I would expect this to cause errors at runtime; did you have a look at the logs of the addon-resizer? The actions should be distinct. We should also trigger the case where autoscaling kicks in, to see that it actually works properly.

@WIZARD-CXY
Author

I looked at the log of the addon-resizer; it didn't complain. Below is its log.
(screenshot of the addon-resizer log)
The addon-resizer actually changed the kube-state-metrics resource configuration from 100m CPU to 103m and from 30Mi memory to 36Mi.
(screenshot of the updated resources)

You can test my PR in your environment and see if anything changes. My env is GKE, k8s 1.7.2.

@brancz
Member

brancz commented Aug 7, 2017

@ericchiang @liggitt is it intended that both the get and list RBAC actions are allowed when only one of them is specified?

@liggitt
Member

liggitt commented Aug 7, 2017

@ericchiang @liggitt is it intended that both the get and list RBAC actions are allowed when only one of them is specified?

That is not the case. The clusterrole already grants list pods to the service account, so anything running with that service account would have list permission. If the resizer is doing a get on a pod, either RBAC (or another configured authorizer) is allowing an explicit get. Did you set up your GKE cluster with legacy ABAC permissions disabled?

@WIZARD-CXY
Author

@liggitt no extra config, I just used the default settings.

@WIZARD-CXY
Author

Do you know how to get the apiserver start configuration on GKE? I can't find it anywhere; maybe Google hides it...

@liggitt
Member

liggitt commented Aug 7, 2017

Do you know how to get apiserver start configuration on gke? I can't find it anywhere, maybe google hide it...

I don't think it's accessible. Did you start your cluster passing --no-enable-legacy-authorization to gcloud? If not, I think you still have the legacy ABAC policy in place that makes every service account a cluster admin.
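If you want to check or disable legacy authorization yourself, gcloud exposes it along these lines (commands sketched from the gcloud CLI of that era; the cluster names and zone are placeholders, so verify the flags against your gcloud version):

```shell
# Check whether legacy ABAC is still enabled on an existing cluster
gcloud container clusters describe cluster-1 --zone us-central1-a \
  --format='value(legacyAbac.enabled)'

# Create a cluster with legacy authorization disabled, so RBAC is authoritative
gcloud container clusters create rbac-test --no-enable-legacy-authorization
```

With legacy authorization enabled (the GKE default at the time), RBAC denials are masked, which is why the missing permissions didn't surface in testing.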

@WIZARD-CXY
Author

WIZARD-CXY commented Aug 8, 2017

@liggitt I checked the configuration; it is "Legacy authorization: Enabled". The default value is enabled, as you can see below.

(screenshot of the GKE cluster configuration)

@WIZARD-CXY
Author

WIZARD-CXY commented Aug 8, 2017

Finally got these authorization things straight, thanks to @brancz @liggitt. Added a "get pods" rule to the pod-nanny Role. PTAL.

@brancz
Member

brancz commented Aug 8, 2017

Happy we could figure it out, lgtm now 👍 . I'll hold off on a final lgtm from @liggitt though.

@liggitt
Member

liggitt commented Aug 8, 2017

seems fine as a starting point. as a test, I'd recommend deploying the addon-resizer in its own pod with its own service account, and seeing if the role is sufficient.

@WIZARD-CXY
Author

@liggitt If we deploy the addon-resizer in its own pod with its own SA, we also need a ClusterRole with list-nodes permission, because it must get the cluster's node count to decide whether to scale. Tested on my cluster with this. Right now it uses the kube-state-metrics clusterrole to work.

@liggitt
Member

liggitt commented Aug 8, 2017

we also need a ClusterRole with list-nodes permission, because it must get the cluster's node count to decide whether to scale. Tested on my cluster with this. Right now it uses the kube-state-metrics clusterrole to work.

that's worth noting, but given the Role for the addon-resizer is pretty specific to kube-state-metrics, granting the kube-state-metrics clusterrole to the resizer wouldn't be terrible. The important thing is to keep the deployment mutation permissions separate, so that someone running without the addon-resizer isn't required to give their metrics pod permission to modify the deployment. This PR does that, so I think it's fine as-is.

Member

@brancz brancz left a comment


small nits on the naming

@@ -30,4 +30,30 @@ spec:
        limits:
          memory: 50Mi
          cpu: 200m
      - name: pod-nanny
Member


As the container image is called addon-resizer, let's call the container the same.

kind: Role
metadata:
  namespace: kube-system
  name: pod-nanny
Member


Same as above, I'd like to change this to something more expressive of what it is, maybe something like: s/pod-nanny/kube-state-metrics-resizer/

roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pod-nanny
Member


Due to my suggestion to change the name of this, we need to adapt here as well.

@WIZARD-CXY
Author

Fixed, @brancz, PTAL.

@WIZARD-CXY
Author

Tested OK on my cluster.

@brancz
Member

brancz commented Aug 8, 2017

Great thanks! I think we can move forward and get 1.0 out!

@brancz brancz merged commit d0febc4 into kubernetes:master Aug 8, 2017
@andyxning andyxning added this to the 1.0 milestone Sep 13, 2017
@andyxning andyxning mentioned this pull request Sep 13, 2017
while1malloc0 pushed a commit to while1malloc0/kube-state-metrics that referenced this pull request Jul 2, 2018
add autoscaler for kube-state-metrics