Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[2.5] Backport Monitoring V2 and all dependencies #1316

Merged
merged 9 commits into from
Jul 1, 2021

Conversation

aiyengar2
Copy link
Contributor

@aiyengar2 aiyengar2 commented Jun 29, 2021

First commit runs a script that backports all the charts as is from dev-v2.6 to dev-v2.5. Nothing was modified for dev-v2.5 specifically, so all these changes have been approved in other PRs.

Then I ran git checkout -b bump_monitoring.

Finally, for the last two commits I took the following manual actions:

  • Deleted all the old Monitoring + Monitoring dependencies assets
  • Deleted all Monitoring + Monitoring dependencies assets index.yaml entries

I then ran make charts and committed each chart's changes and the index.yaml/assets changes independently.

I chose to keep them as separate commits to make it easier to review.

Related Issues: rancher/rancher#33351, rancher/rancher#33420, rancher/rancher#32606, rancher/rancher#33015, rancher/rancher#33014

Arvind Iyengar added 9 commits June 29, 2021 11:20
Ran the following script to move packages/ back to dev-v2.6 with corresponding changes to the package.yaml.

```bash
#!/bin/bash

if [[ $(git branch --show-current) != "dev-v2.6" ]]; then
    echo "Need to be on dev-v2.6 branch to continue";
fi

if [ -n "$(git status --porcelain)" ]; then
    echo "Git needs to be clean to run this script"
    exit 1
fi

pkgs=(rancher-grafana rancher-kube-state-metrics rancher-monitoring rancher-node-exporter rancher-prometheus-adapter rancher-pushprox rancher-windows-exporter)

mkdir -p new_packages
for pkg in ${pkgs[@]}; do
    # Remove version
    yq e -i 'del(.version)' packages/${pkg}/package.yaml;
    # Add back in packageVersion and set to 0
    yq e -i '.packageVersion = 0' packages/${pkg}/package.yaml;
    # Move to another directory that is untracked by dev-v2.5
    cp -R packages/${pkg} new_packages;
done

git checkout -- .
git checkout dev-v2.5

# Remove initial packages
for pkg in ${pkgs[@]}; do
    rm -rf packages/${pkg}
    mkdir -p packages/${pkg}
    cp -R new_packages/${pkg} packages
done

rm -rf new_packages
```
Since we needed to add a brand new chart, here is the diff.

```diff
diff -uNr charts/rancher-pushprox/rancher-pushprox/0.1.300/Chart.yaml charts/rancher-pushprox/rancher-pushprox/0.1.400/Chart.yaml
--- charts/rancher-pushprox/rancher-pushprox/0.1.300/Chart.yaml	2021-06-29 11:15:58.000000000 -0700
+++ charts/rancher-pushprox/rancher-pushprox/0.1.400/Chart.yaml	2021-06-29 11:30:12.000000000 -0700
@@ -10,4 +10,4 @@
   clients.
 name: rancher-pushprox
 type: application
-version: 0.1.300
+version: 0.1.400
diff -uNr charts/rancher-pushprox/rancher-pushprox/0.1.300/README.md charts/rancher-pushprox/rancher-pushprox/0.1.400/README.md
--- charts/rancher-pushprox/rancher-pushprox/0.1.300/README.md	2021-06-29 11:15:58.000000000 -0700
+++ charts/rancher-pushprox/rancher-pushprox/0.1.400/README.md	2021-06-29 11:30:12.000000000 -0700
@@ -24,11 +24,13 @@
 | ----- | ----------- | ------ |
 | `component` | The component that is being monitored | `kube-etcd`
 | `metricsPort` | The port on the host that contains the metrics you want to scrape (e.g. `http://<HOST_IP>:<metricsPort>/metrics`) | `2379` |
+| `namespaceOverride` | The namespace to install the chart | `""`

 #### Optional
 | Parameter | Description | Default |
 | ----- | ----------- | ------ |
 | `serviceMonitor.enabled` | Deploys a [Prometheus Operator](https://github.com/coreos/prometheus-operator/blob/master/Documentation/api.md#servicemonitor) ServiceMonitor CR that is configured to scrape metrics on the hosts that the clients are deployed on via the proxy. Also deploys a Service that points to all pods with the expected client name that exposes the `metricsPort` selected | `true` |
+| `serviceMonitor.endpoints` | A list of endpoints that will be added to the ServiceMonitor based on the [Endpoint spec](https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/api.md#endpoint) | `[{port: metrics}]` |
 | `clients.enabled` | Deploys a DaemonSet of clients that are each capable of scraping endpoints on the hostNetwork it is deployed on | `true` |
 | `clients.port` |  The port where the client will publish PushProx client-specific metrics. If deploying multiple clients onto the same node, the clients should not have conflicting ports | `9369` |
 | `clients.proxyUrl` | Overrides the default proxyUrl setting of `http://pushprox-{{ .Values.component }}-proxy.{{ . Release.Namespace }}.svc.cluster.local:{{ .Values.proxy.port }}"` with the `proxyUrl` specified | `""` |
@@ -40,6 +42,10 @@
 | `clients.https.certFile` | The path to the TLS cert file located within `clients.https.certDir`. Required and only used if `clients.https.enabled` is set | `""` |
 | `clients.https.keyFile` | The path to the TLS key file located within `clients.https.certDir`. Required and only used if `clients.https.enabled` is set | `""` |
 | `clients.https.caCertFile` | The path to the TLS cacert file located within `clients.https.certDir`. Required and only used if `clients.https.enabled` is set | `""` |
+| `clients.rbac.additionalRules` | Additional permissions to provide to the ServiceAccount bound to the client. This can be used to provide additional permissions for the client to scrape metrics from the k8s API. Only enabled if clients.https.enabled and clients.https.useServiceAccountCredentials are true | `[]` |
+| `clients.deployment.enabled` | Deploys the client as a Deployment (generally used if the underlying hostNetwork Pod that is being scraped is managed by a Deployment) | `false` |
+| `clients.deployment.replicas` | The number of pods the Deployment has, it should match the number of pod the hostNetwork Deployment has. Required and only used if `client.deployment.enable` is set | `0` |
+| `clients.deployment.affinity` | The affinity rules that allocate the pod to the node in which the hostNetwork Deployment's pods run. Required and only used if `client.deployment.enable` is set | `{}` |
 | `clients.resources` | Set resource limits and requests for the client container | `{}` |
 | `clients.nodeSelector` | Select which nodes to deploy the clients on | `{}` |
 | `clients.tolerations` | Specify tolerations for clients | `[]` |
diff -uNr charts/rancher-pushprox/rancher-pushprox/0.1.300/templates/_helpers.tpl charts/rancher-pushprox/rancher-pushprox/0.1.400/templates/_helpers.tpl
--- charts/rancher-pushprox/rancher-pushprox/0.1.300/templates/_helpers.tpl	2021-06-29 11:15:58.000000000 -0700
+++ charts/rancher-pushprox/rancher-pushprox/0.1.400/templates/_helpers.tpl	2021-06-29 11:30:12.000000000 -0700
@@ -49,7 +49,7 @@
 {{- if .Values.clients.proxyUrl -}}
 {{ printf "%s" .Values.clients.proxyUrl }}
 {{- else -}}
-{{ printf "http://%s.%s.svc:%d" (include "pushProxy.proxy.name" .) .Release.Namespace (int .Values.proxy.port) }}
+{{ printf "http://%s.%s.svc:%d" (include "pushProxy.proxy.name" .) (include "pushprox.namespace" .) (int .Values.proxy.port) }}
 {{- end -}}{{- end -}}

 # Client
@@ -84,4 +84,21 @@
 app: {{ template "pushprox.serviceMonitor.name" . }}
 release: {{ .Release.Name | quote }}
 {{ template "pushProxy.commonLabels" . }}
+{{- end -}}
+
+{{- define "pushProxy.serviceMonitor.endpoints" -}}
+{{- $proxyURL := (include "pushProxy.proxyUrl" .) -}}
+{{- $useHTTPS := .Values.clients.https.enabled -}}
+{{- $endpoints := .Values.serviceMonitor.endpoints }}
+{{- range $endpoints }}
+{{- $_ := set . "proxyUrl" $proxyURL }}
+{{- if $useHTTPS -}}
+{{- if (hasKey . "params") }}
+{{- $_ := set (get . "params") "_scheme" (list "https") }}
+{{- else }}
+{{- $_ := set . "params" (dict "_scheme" (list "https")) }}
+{{- end }}
+{{- end }}
+{{- end }}
+{{- toYaml $endpoints }}
 {{- end -}}
\ No newline at end of file
diff -uNr charts/rancher-pushprox/rancher-pushprox/0.1.300/templates/pushprox-clients-rbac.yaml charts/rancher-pushprox/rancher-pushprox/0.1.400/templates/pushprox-clients-rbac.yaml
--- charts/rancher-pushprox/rancher-pushprox/0.1.300/templates/pushprox-clients-rbac.yaml	2021-06-29 11:15:58.000000000 -0700
+++ charts/rancher-pushprox/rancher-pushprox/0.1.400/templates/pushprox-clients-rbac.yaml	2021-06-29 11:30:12.000000000 -0700
@@ -13,6 +13,9 @@
 {{- if and .Values.clients.https.enabled .Values.clients.https.useServiceAccountCredentials }}
 - nonResourceURLs: ["/metrics"]
   verbs: ["get"]
+{{- if .Values.clients.rbac.additionalRules }}
+{{ toYaml .Values.clients.rbac.additionalRules }}
+{{- end }}
 {{- end }}
 ---
 apiVersion: rbac.authorization.k8s.io/v1
@@ -27,20 +30,20 @@
 subjects:
   - kind: ServiceAccount
     name: {{ template "pushProxy.client.name" . }}
-    namespace: {{ .Release.Namespace }}
+    namespace: {{ include "pushprox.namespace" . }}
 ---
 apiVersion: v1
 kind: ServiceAccount
 metadata:
   name: {{ template "pushProxy.client.name" . }}
-  namespace: {{ .Release.Namespace }}
+  namespace: {{ include "pushprox.namespace" . }}
   labels: {{ include "pushProxy.client.labels" . | nindent 4 }}
 ---
 apiVersion: policy/v1beta1
 kind: PodSecurityPolicy
 metadata:
   name: {{ template "pushProxy.client.name" . }}
-  namespace: {{ .Release.Namespace }}
+  namespace: {{ include "pushprox.namespace" . }}
   labels: {{ include "pushProxy.client.labels" . | nindent 4 }}
 spec:
   privileged: false
diff -uNr charts/rancher-pushprox/rancher-pushprox/0.1.300/templates/pushprox-clients.yaml charts/rancher-pushprox/rancher-pushprox/0.1.400/templates/pushprox-clients.yaml
--- charts/rancher-pushprox/rancher-pushprox/0.1.300/templates/pushprox-clients.yaml	2021-06-29 11:15:58.000000000 -0700
+++ charts/rancher-pushprox/rancher-pushprox/0.1.400/templates/pushprox-clients.yaml	2021-06-29 11:30:12.000000000 -0700
@@ -1,18 +1,28 @@
 {{- if .Values.clients }}{{- if .Values.clients.enabled }}
 apiVersion: apps/v1
+{{- if .Values.clients.deployment.enabled }}
+kind: Deployment
+{{- else }}
 kind: DaemonSet
+{{- end }}
 metadata:
   name: {{ template "pushProxy.client.name" . }}
   namespace: {{ template "pushprox.namespace" . }}
   labels: {{ include "pushProxy.client.labels" . | nindent 4 }}
     pushprox-exporter: "client"
 spec:
+  {{- if .Values.clients.deployment.enabled }}
+  replicas: {{ .Values.clients.deployment.replicas }}
+  {{- end }}
   selector:
     matchLabels: {{ include "pushProxy.client.labels" . | nindent 6 }}
   template:
     metadata:
       labels: {{ include "pushProxy.client.labels" . | nindent 8 }}
     spec:
+      {{- if .Values.clients.affinity }}
+      affinity: {{ toYaml .Values.clients.affinity | nindent 8 }}
+      {{- end }}
       nodeSelector: {{ include "linux-node-selector" . | nindent 8 }}
 {{- if .Values.clients.nodeSelector }}
 {{ toYaml .Values.clients.nodeSelector | indent 8 }}
diff -uNr charts/rancher-pushprox/rancher-pushprox/0.1.300/templates/pushprox-proxy-rbac.yaml charts/rancher-pushprox/rancher-pushprox/0.1.400/templates/pushprox-proxy-rbac.yaml
--- charts/rancher-pushprox/rancher-pushprox/0.1.300/templates/pushprox-proxy-rbac.yaml	2021-06-29 11:15:58.000000000 -0700
+++ charts/rancher-pushprox/rancher-pushprox/0.1.400/templates/pushprox-proxy-rbac.yaml	2021-06-29 11:30:12.000000000 -0700
@@ -23,20 +23,20 @@
 subjects:
   - kind: ServiceAccount
     name: {{ template "pushProxy.proxy.name" . }}
-    namespace: {{ .Release.Namespace }}
+    namespace: {{ include "pushprox.namespace" . }}
 ---
 apiVersion: v1
 kind: ServiceAccount
 metadata:
   name: {{ template "pushProxy.proxy.name" . }}
-  namespace: {{ .Release.Namespace }}
+  namespace: {{ include "pushprox.namespace" . }}
   labels: {{ include "pushProxy.proxy.labels" . | nindent 4 }}
 ---
 apiVersion: policy/v1beta1
 kind: PodSecurityPolicy
 metadata:
   name: {{ template "pushProxy.proxy.name" . }}
-  namespace: {{ .Release.Namespace }}
+  namespace: {{ include "pushprox.namespace" . }}
   labels: {{ include "pushProxy.proxy.labels" . | nindent 4 }}
 spec:
   privileged: false
diff -uNr charts/rancher-pushprox/rancher-pushprox/0.1.300/templates/pushprox-servicemonitor.yaml charts/rancher-pushprox/rancher-pushprox/0.1.400/templates/pushprox-servicemonitor.yaml
--- charts/rancher-pushprox/rancher-pushprox/0.1.300/templates/pushprox-servicemonitor.yaml	2021-06-29 11:15:58.000000000 -0700
+++ charts/rancher-pushprox/rancher-pushprox/0.1.400/templates/pushprox-servicemonitor.yaml	2021-06-29 11:30:12.000000000 -0700
@@ -6,13 +6,7 @@
   namespace: {{ template "pushprox.namespace" . }}
   labels: {{ include "pushProxy.serviceMonitor.labels" . | nindent 4 }}
 spec:
-  endpoints:
-  - port: metrics
-    proxyUrl: {{ template "pushProxy.proxyUrl" . }}
-    {{- if .Values.clients.https.enabled }}
-    params:
-      _scheme: [https]
-    {{- end }}
+  endpoints: {{include "pushProxy.serviceMonitor.endpoints" . | nindent 4 }}
   jobLabel: component
   podTargetLabels:
   - component
diff -uNr charts/rancher-pushprox/rancher-pushprox/0.1.300/values.yaml charts/rancher-pushprox/rancher-pushprox/0.1.400/values.yaml
--- charts/rancher-pushprox/rancher-pushprox/0.1.300/values.yaml	2021-06-29 11:15:58.000000000 -0700
+++ charts/rancher-pushprox/rancher-pushprox/0.1.400/values.yaml	2021-06-29 11:30:12.000000000 -0700
@@ -16,6 +16,8 @@
   cattle:
     systemDefaultRegistry: ""

+namespaceOverride: ""
+
 # The component that is being monitored (i.e. etcd)
 component: "component"

@@ -23,8 +25,13 @@
 metricsPort: 2739

 # Configure ServiceMonitor that monitors metrics from the metricsPort endpoint
-serviceMonitor:
+serviceMonitor:
   enabled: true
+  # A list of endpoints that will be added to the ServiceMonitor based on the Endpoint spec
+  # Source: https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/api.md#endpoint
+  # By default, proxyUrl and params._scheme will be overridden based on other values
+  endpoints:
+  - port: metrics

 clients:
   enabled: true
@@ -52,22 +59,40 @@
     keyFile: ""
     caCertFile: ""

+  rbac:
+    # Additional permissions to provide to the ServiceAccount bound to the client
+    # This can be used to provide additional permissions for the client to scrape metrics from the k8s API
+    # Only enabled if clients.https.enabled and clients.https.useServiceAccountCredentials are true
+    additionalRules: []
+
   # Resource limits
   resources: {}

   # Options to select all nodes to deploy client DaemonSet on
   nodeSelector: {}
   tolerations: []
+  affinity: {}

   image:
     repository: rancher/pushprox-client
-    tag: v0.1.0-rancher1-client
+    tag: v0.1.0-rancher2-client
   command: ["pushprox-client"]

   copyCertsImage:
     repository: rancher/mirrored-library-busybox
     tag: 1.31.1

+  # The default intention of rancher-pushprox clients is to scrape hostNetwork metrics across all nodes.
+  # This can be used to scrape internal Kubernetes components or DaemonSets of hostNetwork Pods in
+  # situations where a cloud provider firewall prevents Pod-To-Host communication but not Pod-To-Pod.
+  # However, if the underlying hostNetwork Pod that is being scraped is managed by a Deployment,
+  # this advanced option enables users to deploy the client as a Deployment instead of a DaemonSet.
+  # If a user deploys this feature and the underlying Deployment's number of replicas changes, the user will
+  # be responsible for upgrading this chart accordingly to the right number of replicas.
+  deployment:
+    enabled: false
+    replicas: 0
+
 proxy:
   enabled: true
   # The port through which PushProx clients will communicate to the proxy
@@ -82,5 +107,5 @@

   image:
     repository: rancher/pushprox-proxy
-    tag: v0.1.0-rancher1-proxy
+    tag: v0.1.0-rancher2-proxy
   command: ["pushprox-proxy"]
\ No newline at end of file
```
@@ -1540,8 +1540,8 @@ entries:
- https://github.com/grafana/grafana
type: application
urls:
- assets/rancher-grafana/rancher-grafana-6.6.402+up6.6.4.tgz
version: 6.6.402+up6.6.4
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

urls:
- assets/rancher-kube-state-metrics/rancher-kube-state-metrics-2.13.101+up2.13.1.tgz
version: 2.13.101+up2.13.1
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dependencies:
- condition: grafana.enabled
name: grafana
repository: file://./charts/grafana
- condition: hardenedKubelet.enabled
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These assets were added as part of rancher/rancher#33420

Comment on lines +2517 to +2519
- condition: rke2IngressNginx.enabled
name: rke2IngressNginx
repository: file://./charts/rke2IngressNginx
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These assets were added as part of rancher/rancher#32606

Comment on lines +2532 to +2534
- condition: rkeIngressNginx.enabled
name: rkeIngressNginx
repository: file://./charts/rkeIngressNginx
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These assets were added as part of rancher/rancher#32606

Comment on lines +96 to +97
runAsNonRoot: false
runAsUser: 0
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

securityContext and PSP changes are associated with #1314

@@ -1415,7 +1415,7 @@
"multi": false,
"name": "ingress",
"options": [],
"query": "label_values(nginx_ingress_controller_requests{namespace=~\"$namespace\",controller_class=~\"$controller_class\",controller=~\"$controller\"}, ingress) ",
"query": "label_values(nginx_ingress_controller_requests{namespace=~\"$namespace\",controller_class=~\"$controller_class\",controller_pod=~\"$controller\"}, ingress) ",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Came from rebasing with the latest upstream dashboards

@@ -481,7 +481,7 @@
"steppedLine": false,
"targets": [
{
"expr": "sum by (path) (rate(nginx_ingress_controller_request_duration_seconds_count{\n ingress = \"$ingress\",\n status =~ \"[4-5].*\"\n}[1m])) / sum by (path) (rate(nginx_ingress_controller_request_duration_seconds_count{\n ingress = \"$ingress\",\n}[1m]))",
"expr": "sum by (path) (rate(nginx_ingress_controller_request_duration_seconds_count{\n ingress =~ \"$ingress\",\n status =~ \"[4-5].*\"\n}[1m])) / sum by (path) (rate(nginx_ingress_controller_request_duration_seconds_count{\n ingress =~ \"$ingress\",\n}[1m]))",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Came from rebasing with the latest upstream dashboards

@@ -565,25 +565,25 @@
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(node_network_receive_errs_total{device!~\"lo|veth.*|docker.*|flannel.*|cali.*|cbr.*\"}[$__rate_interval])) by (instance) OR sum(rate(windows_net_packets_received_errors{nic!~'.*isatap.*|.*VPN.*|.*Pseudo.*|.*tunneling.*'}[$__rate_interval])) by (instance)",
"expr": "sum(rate(node_network_receive_errs_total{device!~\"lo|veth.*|docker.*|flannel.*|cali.*|cbr.*\"}[$__rate_interval])) by (instance) OR sum(rate(windows_net_packets_received_errors_total{nic!~'.*isatap.*|.*VPN.*|.*Pseudo.*|.*tunneling.*'}[$__rate_interval])) by (instance)",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All changes that came from breaking changes in windows_exporter. See 00db29b

@@ -470,7 +470,7 @@
"tableColumn": "",
"targets": [
{
"expr": "sum(kube_node_status_allocatable_cpu_cores{})",
"expr": "sum(kube_node_status_allocatable_cpu_cores{}) OR sum(kube_node_status_allocatable{resource=\"cpu\",unit=\"core\"})",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Came from breaking changes to kube-state-metrics when we migrated to a new major line (2.x.x)

@aiyengar2 aiyengar2 changed the title Backport Monitoring V2 and all dependencies [2.5] Backport Monitoring V2 and all dependencies Jun 29, 2021
@aiyengar2 aiyengar2 marked this pull request as ready for review June 29, 2021 19:08
Copy link
Member

@jiaqiluo jiaqiluo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM !

@aiyengar2 It is super clear to follow changes with those separated commits and explanations. Thank you for doing it 😄

Copy link
Contributor

@brendarearden brendarearden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm! I also agree with @jiaqiluo - the separate commits really helped!

@aiyengar2 aiyengar2 merged commit 0653adc into rancher:dev-v2.5 Jul 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants