-
Notifications
You must be signed in to change notification settings - Fork 717
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[2.5] Backport Monitoring V2 and all dependencies #1316
Conversation
Ran the following script to move packages/ back to dev-v2.6 with corresponding changes to the package.yaml. ```bash #!/bin/bash if [[ $(git branch --show-current) != "dev-v2.6" ]]; then echo "Need to be on dev-v2.6 branch to continue"; fi if [ -n "$(git status --porcelain)" ]; then echo "Git needs to be clean to run this script" exit 1 fi pkgs=(rancher-grafana rancher-kube-state-metrics rancher-monitoring rancher-node-exporter rancher-prometheus-adapter rancher-pushprox rancher-windows-exporter) mkdir -p new_packages for pkg in ${pkgs[@]}; do # Remove version yq e -i 'del(.version)' packages/${pkg}/package.yaml; # Add back in packageVersion and set to 0 yq e -i '.packageVersion = 0' packages/${pkg}/package.yaml; # Move to another directory that is untracked by dev-v2.5 cp -R packages/${pkg} new_packages; done git checkout -- . git checkout dev-v2.5 # Remove initial packages for pkg in ${pkgs[@]}; do rm -rf packages/${pkg} mkdir -p packages/${pkg} cp -R new_packages/${pkg} packages done rm -rf new_packages ```
Since we needed to add a brand new chart, here is the diff. ```diff diff -uNr charts/rancher-pushprox/rancher-pushprox/0.1.300/Chart.yaml charts/rancher-pushprox/rancher-pushprox/0.1.400/Chart.yaml --- charts/rancher-pushprox/rancher-pushprox/0.1.300/Chart.yaml 2021-06-29 11:15:58.000000000 -0700 +++ charts/rancher-pushprox/rancher-pushprox/0.1.400/Chart.yaml 2021-06-29 11:30:12.000000000 -0700 @@ -10,4 +10,4 @@ clients. name: rancher-pushprox type: application -version: 0.1.300 +version: 0.1.400 diff -uNr charts/rancher-pushprox/rancher-pushprox/0.1.300/README.md charts/rancher-pushprox/rancher-pushprox/0.1.400/README.md --- charts/rancher-pushprox/rancher-pushprox/0.1.300/README.md 2021-06-29 11:15:58.000000000 -0700 +++ charts/rancher-pushprox/rancher-pushprox/0.1.400/README.md 2021-06-29 11:30:12.000000000 -0700 @@ -24,11 +24,13 @@ | ----- | ----------- | ------ | | `component` | The component that is being monitored | `kube-etcd` | `metricsPort` | The port on the host that contains the metrics you want to scrape (e.g. `http://<HOST_IP>:<metricsPort>/metrics`) | `2379` | +| `namespaceOverride` | The namespace to install the chart | `""` #### Optional | Parameter | Description | Default | | ----- | ----------- | ------ | | `serviceMonitor.enabled` | Deploys a [Prometheus Operator](https://github.com/coreos/prometheus-operator/blob/master/Documentation/api.md#servicemonitor) ServiceMonitor CR that is configured to scrape metrics on the hosts that the clients are deployed on via the proxy. Also deploys a Service that points to all pods with the expected client name that exposes the `metricsPort` selected | `true` | +| `serviceMonitor.endpoints` | A list of endpoints that will be added to the ServiceMonitor based on the [Endpoint spec](https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/api.md#endpoint) | `[{port: metrics}]` | | `clients.enabled` | Deploys a DaemonSet of clients that are each capable of scraping endpoints on the hostNetwork it is deployed on | `true` | | `clients.port` | The port where the client will publish PushProx client-specific metrics. If deploying multiple clients onto the same node, the clients should not have conflicting ports | `9369` | | `clients.proxyUrl` | Overrides the default proxyUrl setting of `http://pushprox-{{ .Values.component }}-proxy.{{ . Release.Namespace }}.svc.cluster.local:{{ .Values.proxy.port }}"` with the `proxyUrl` specified | `""` | @@ -40,6 +42,10 @@ | `clients.https.certFile` | The path to the TLS cert file located within `clients.https.certDir`. Required and only used if `clients.https.enabled` is set | `""` | | `clients.https.keyFile` | The path to the TLS key file located within `clients.https.certDir`. Required and only used if `clients.https.enabled` is set | `""` | | `clients.https.caCertFile` | The path to the TLS cacert file located within `clients.https.certDir`. Required and only used if `clients.https.enabled` is set | `""` | +| `clients.rbac.additionalRules` | Additional permissions to provide to the ServiceAccount bound to the client. This can be used to provide additional permissions for the client to scrape metrics from the k8s API. Only enabled if clients.https.enabled and clients.https.useServiceAccountCredentials are true | `[]` | +| `clients.deployment.enabled` | Deploys the client as a Deployment (generally used if the underlying hostNetwork Pod that is being scraped is managed by a Deployment) | `false` | +| `clients.deployment.replicas` | The number of pods the Deployment has, it should match the number of pod the hostNetwork Deployment has. Required and only used if `client.deployment.enable` is set | `0` | +| `clients.deployment.affinity` | The affinity rules that allocate the pod to the node in which the hostNetwork Deployment's pods run. Required and only used if `client.deployment.enable` is set | `{}` | | `clients.resources` | Set resource limits and requests for the client container | `{}` | | `clients.nodeSelector` | Select which nodes to deploy the clients on | `{}` | | `clients.tolerations` | Specify tolerations for clients | `[]` | diff -uNr charts/rancher-pushprox/rancher-pushprox/0.1.300/templates/_helpers.tpl charts/rancher-pushprox/rancher-pushprox/0.1.400/templates/_helpers.tpl --- charts/rancher-pushprox/rancher-pushprox/0.1.300/templates/_helpers.tpl 2021-06-29 11:15:58.000000000 -0700 +++ charts/rancher-pushprox/rancher-pushprox/0.1.400/templates/_helpers.tpl 2021-06-29 11:30:12.000000000 -0700 @@ -49,7 +49,7 @@ {{- if .Values.clients.proxyUrl -}} {{ printf "%s" .Values.clients.proxyUrl }} {{- else -}} -{{ printf "http://%s.%s.svc:%d" (include "pushProxy.proxy.name" .) .Release.Namespace (int .Values.proxy.port) }} +{{ printf "http://%s.%s.svc:%d" (include "pushProxy.proxy.name" .) (include "pushprox.namespace" .) (int .Values.proxy.port) }} {{- end -}}{{- end -}} # Client @@ -84,4 +84,21 @@ app: {{ template "pushprox.serviceMonitor.name" . }} release: {{ .Release.Name | quote }} {{ template "pushProxy.commonLabels" . }} +{{- end -}} + +{{- define "pushProxy.serviceMonitor.endpoints" -}} +{{- $proxyURL := (include "pushProxy.proxyUrl" .) -}} +{{- $useHTTPS := .Values.clients.https.enabled -}} +{{- $endpoints := .Values.serviceMonitor.endpoints }} +{{- range $endpoints }} +{{- $_ := set . "proxyUrl" $proxyURL }} +{{- if $useHTTPS -}} +{{- if (hasKey . "params") }} +{{- $_ := set (get . "params") "_scheme" (list "https") }} +{{- else }} +{{- $_ := set . "params" (dict "_scheme" (list "https")) }} +{{- end }} +{{- end }} +{{- end }} +{{- toYaml $endpoints }} {{- end -}} \ No newline at end of file diff -uNr charts/rancher-pushprox/rancher-pushprox/0.1.300/templates/pushprox-clients-rbac.yaml charts/rancher-pushprox/rancher-pushprox/0.1.400/templates/pushprox-clients-rbac.yaml --- charts/rancher-pushprox/rancher-pushprox/0.1.300/templates/pushprox-clients-rbac.yaml 2021-06-29 11:15:58.000000000 -0700 +++ charts/rancher-pushprox/rancher-pushprox/0.1.400/templates/pushprox-clients-rbac.yaml 2021-06-29 11:30:12.000000000 -0700 @@ -13,6 +13,9 @@ {{- if and .Values.clients.https.enabled .Values.clients.https.useServiceAccountCredentials }} - nonResourceURLs: ["/metrics"] verbs: ["get"] +{{- if .Values.clients.rbac.additionalRules }} +{{ toYaml .Values.clients.rbac.additionalRules }} +{{- end }} {{- end }} --- apiVersion: rbac.authorization.k8s.io/v1 @@ -27,20 +30,20 @@ subjects: - kind: ServiceAccount name: {{ template "pushProxy.client.name" . }} - namespace: {{ .Release.Namespace }} + namespace: {{ include "pushprox.namespace" . }} --- apiVersion: v1 kind: ServiceAccount metadata: name: {{ template "pushProxy.client.name" . }} - namespace: {{ .Release.Namespace }} + namespace: {{ include "pushprox.namespace" . }} labels: {{ include "pushProxy.client.labels" . | nindent 4 }} --- apiVersion: policy/v1beta1 kind: PodSecurityPolicy metadata: name: {{ template "pushProxy.client.name" . }} - namespace: {{ .Release.Namespace }} + namespace: {{ include "pushprox.namespace" . }} labels: {{ include "pushProxy.client.labels" . | nindent 4 }} spec: privileged: false diff -uNr charts/rancher-pushprox/rancher-pushprox/0.1.300/templates/pushprox-clients.yaml charts/rancher-pushprox/rancher-pushprox/0.1.400/templates/pushprox-clients.yaml --- charts/rancher-pushprox/rancher-pushprox/0.1.300/templates/pushprox-clients.yaml 2021-06-29 11:15:58.000000000 -0700 +++ charts/rancher-pushprox/rancher-pushprox/0.1.400/templates/pushprox-clients.yaml 2021-06-29 11:30:12.000000000 -0700 @@ -1,18 +1,28 @@ {{- if .Values.clients }}{{- if .Values.clients.enabled }} apiVersion: apps/v1 +{{- if .Values.clients.deployment.enabled }} +kind: Deployment +{{- else }} kind: DaemonSet +{{- end }} metadata: name: {{ template "pushProxy.client.name" . }} namespace: {{ template "pushprox.namespace" . }} labels: {{ include "pushProxy.client.labels" . | nindent 4 }} pushprox-exporter: "client" spec: + {{- if .Values.clients.deployment.enabled }} + replicas: {{ .Values.clients.deployment.replicas }} + {{- end }} selector: matchLabels: {{ include "pushProxy.client.labels" . | nindent 6 }} template: metadata: labels: {{ include "pushProxy.client.labels" . | nindent 8 }} spec: + {{- if .Values.clients.affinity }} + affinity: {{ toYaml .Values.clients.affinity | nindent 8 }} + {{- end }} nodeSelector: {{ include "linux-node-selector" . | nindent 8 }} {{- if .Values.clients.nodeSelector }} {{ toYaml .Values.clients.nodeSelector | indent 8 }} diff -uNr charts/rancher-pushprox/rancher-pushprox/0.1.300/templates/pushprox-proxy-rbac.yaml charts/rancher-pushprox/rancher-pushprox/0.1.400/templates/pushprox-proxy-rbac.yaml --- charts/rancher-pushprox/rancher-pushprox/0.1.300/templates/pushprox-proxy-rbac.yaml 2021-06-29 11:15:58.000000000 -0700 +++ charts/rancher-pushprox/rancher-pushprox/0.1.400/templates/pushprox-proxy-rbac.yaml 2021-06-29 11:30:12.000000000 -0700 @@ -23,20 +23,20 @@ subjects: - kind: ServiceAccount name: {{ template "pushProxy.proxy.name" . }} - namespace: {{ .Release.Namespace }} + namespace: {{ include "pushprox.namespace" . }} --- apiVersion: v1 kind: ServiceAccount metadata: name: {{ template "pushProxy.proxy.name" . }} - namespace: {{ .Release.Namespace }} + namespace: {{ include "pushprox.namespace" . }} labels: {{ include "pushProxy.proxy.labels" . | nindent 4 }} --- apiVersion: policy/v1beta1 kind: PodSecurityPolicy metadata: name: {{ template "pushProxy.proxy.name" . }} - namespace: {{ .Release.Namespace }} + namespace: {{ include "pushprox.namespace" . }} labels: {{ include "pushProxy.proxy.labels" . | nindent 4 }} spec: privileged: false diff -uNr charts/rancher-pushprox/rancher-pushprox/0.1.300/templates/pushprox-servicemonitor.yaml charts/rancher-pushprox/rancher-pushprox/0.1.400/templates/pushprox-servicemonitor.yaml --- charts/rancher-pushprox/rancher-pushprox/0.1.300/templates/pushprox-servicemonitor.yaml 2021-06-29 11:15:58.000000000 -0700 +++ charts/rancher-pushprox/rancher-pushprox/0.1.400/templates/pushprox-servicemonitor.yaml 2021-06-29 11:30:12.000000000 -0700 @@ -6,13 +6,7 @@ namespace: {{ template "pushprox.namespace" . }} labels: {{ include "pushProxy.serviceMonitor.labels" . | nindent 4 }} spec: - endpoints: - - port: metrics - proxyUrl: {{ template "pushProxy.proxyUrl" . }} - {{- if .Values.clients.https.enabled }} - params: - _scheme: [https] - {{- end }} + endpoints: {{include "pushProxy.serviceMonitor.endpoints" . | nindent 4 }} jobLabel: component podTargetLabels: - component diff -uNr charts/rancher-pushprox/rancher-pushprox/0.1.300/values.yaml charts/rancher-pushprox/rancher-pushprox/0.1.400/values.yaml --- charts/rancher-pushprox/rancher-pushprox/0.1.300/values.yaml 2021-06-29 11:15:58.000000000 -0700 +++ charts/rancher-pushprox/rancher-pushprox/0.1.400/values.yaml 2021-06-29 11:30:12.000000000 -0700 @@ -16,6 +16,8 @@ cattle: systemDefaultRegistry: "" +namespaceOverride: "" + # The component that is being monitored (i.e. etcd) component: "component" @@ -23,8 +25,13 @@ metricsPort: 2739 # Configure ServiceMonitor that monitors metrics from the metricsPort endpoint -serviceMonitor: +serviceMonitor: enabled: true + # A list of endpoints that will be added to the ServiceMonitor based on the Endpoint spec + # Source: https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/api.md#endpoint + # By default, proxyUrl and params._scheme will be overridden based on other values + endpoints: + - port: metrics clients: enabled: true @@ -52,22 +59,40 @@ keyFile: "" caCertFile: "" + rbac: + # Additional permissions to provide to the ServiceAccount bound to the client + # This can be used to provide additional permissions for the client to scrape metrics from the k8s API + # Only enabled if clients.https.enabled and clients.https.useServiceAccountCredentials are true + additionalRules: [] + # Resource limits resources: {} # Options to select all nodes to deploy client DaemonSet on nodeSelector: {} tolerations: [] + affinity: {} image: repository: rancher/pushprox-client - tag: v0.1.0-rancher1-client + tag: v0.1.0-rancher2-client command: ["pushprox-client"] copyCertsImage: repository: rancher/mirrored-library-busybox tag: 1.31.1 + # The default intention of rancher-pushprox clients is to scrape hostNetwork metrics across all nodes. + # This can be used to scrape internal Kubernetes components or DaemonSets of hostNetwork Pods in + # situations where a cloud provider firewall prevents Pod-To-Host communication but not Pod-To-Pod. + # However, if the underlying hostNetwork Pod that is being scraped is managed by a Deployment, + # this advanced option enables users to deploy the client as a Deployment instead of a DaemonSet. + # If a user deploys this feature and the underlying Deployment's number of replicas changes, the user will + # be responsible for upgrading this chart accordingly to the right number of replicas. + deployment: + enabled: false + replicas: 0 + proxy: enabled: true # The port through which PushProx clients will communicate to the proxy @@ -82,5 +107,5 @@ image: repository: rancher/pushprox-proxy - tag: v0.1.0-rancher1-proxy + tag: v0.1.0-rancher2-proxy command: ["pushprox-proxy"] \ No newline at end of file ```
0965879
to
d6c9bf3
Compare
@@ -1540,8 +1540,8 @@ entries: | |||
- https://github.com/grafana/grafana | |||
type: application | |||
urls: | |||
- assets/rancher-grafana/rancher-grafana-6.6.402+up6.6.4.tgz | |||
version: 6.6.402+up6.6.4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
urls: | ||
- assets/rancher-kube-state-metrics/rancher-kube-state-metrics-2.13.101+up2.13.1.tgz | ||
version: 2.13.101+up2.13.1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dependencies: | ||
- condition: grafana.enabled | ||
name: grafana | ||
repository: file://./charts/grafana | ||
- condition: hardenedKubelet.enabled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These assets were added as part of rancher/rancher#33420
- condition: rke2IngressNginx.enabled | ||
name: rke2IngressNginx | ||
repository: file://./charts/rke2IngressNginx |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These assets were added as part of rancher/rancher#32606
- condition: rkeIngressNginx.enabled | ||
name: rkeIngressNginx | ||
repository: file://./charts/rkeIngressNginx |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These assets were added as part of rancher/rancher#32606
runAsNonRoot: false | ||
runAsUser: 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
securityContext and PSP changes are associated with #1314
@@ -1415,7 +1415,7 @@ | |||
"multi": false, | |||
"name": "ingress", | |||
"options": [], | |||
"query": "label_values(nginx_ingress_controller_requests{namespace=~\"$namespace\",controller_class=~\"$controller_class\",controller=~\"$controller\"}, ingress) ", | |||
"query": "label_values(nginx_ingress_controller_requests{namespace=~\"$namespace\",controller_class=~\"$controller_class\",controller_pod=~\"$controller\"}, ingress) ", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Came from rebasing with the latest upstream dashboards
@@ -481,7 +481,7 @@ | |||
"steppedLine": false, | |||
"targets": [ | |||
{ | |||
"expr": "sum by (path) (rate(nginx_ingress_controller_request_duration_seconds_count{\n ingress = \"$ingress\",\n status =~ \"[4-5].*\"\n}[1m])) / sum by (path) (rate(nginx_ingress_controller_request_duration_seconds_count{\n ingress = \"$ingress\",\n}[1m]))", | |||
"expr": "sum by (path) (rate(nginx_ingress_controller_request_duration_seconds_count{\n ingress =~ \"$ingress\",\n status =~ \"[4-5].*\"\n}[1m])) / sum by (path) (rate(nginx_ingress_controller_request_duration_seconds_count{\n ingress =~ \"$ingress\",\n}[1m]))", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Came from rebasing with the latest upstream dashboards
@@ -565,25 +565,25 @@ | |||
"steppedLine": false, | |||
"targets": [ | |||
{ | |||
"expr": "sum(rate(node_network_receive_errs_total{device!~\"lo|veth.*|docker.*|flannel.*|cali.*|cbr.*\"}[$__rate_interval])) by (instance) OR sum(rate(windows_net_packets_received_errors{nic!~'.*isatap.*|.*VPN.*|.*Pseudo.*|.*tunneling.*'}[$__rate_interval])) by (instance)", | |||
"expr": "sum(rate(node_network_receive_errs_total{device!~\"lo|veth.*|docker.*|flannel.*|cali.*|cbr.*\"}[$__rate_interval])) by (instance) OR sum(rate(windows_net_packets_received_errors_total{nic!~'.*isatap.*|.*VPN.*|.*Pseudo.*|.*tunneling.*'}[$__rate_interval])) by (instance)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All changes that came from breaking changes in windows_exporter. See 00db29b
@@ -470,7 +470,7 @@ | |||
"tableColumn": "", | |||
"targets": [ | |||
{ | |||
"expr": "sum(kube_node_status_allocatable_cpu_cores{})", | |||
"expr": "sum(kube_node_status_allocatable_cpu_cores{}) OR sum(kube_node_status_allocatable{resource=\"cpu\",unit=\"core\"})", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Came from breaking changes to kube-state-metrics when we migrated to a new major line (2.x.x)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM !
@aiyengar2 It is super clear to follow changes with those separated commits and explanations. Thank you for doing it 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm! I also agree with @jiaqiluo - the separate commits really helped!
First commit runs a script that backports all the charts as is from dev-v2.6 to dev-v2.5. Nothing was modified for dev-v2.5 specifically, so all these changes have been approved in other PRs.
Then I ran
git checkout -b bump_monitoring
.Finally, for the last two commits I took the following manual actions:
I then ran make charts and committed each chart's changes and the index.yaml/assets changes independently.
I chose to keep them as separate commits to make it easier to review.
Related Issues: rancher/rancher#33351, rancher/rancher#33420, rancher/rancher#32606, rancher/rancher#33015, rancher/rancher#33014