
new nodeSelectors are not working when label value contains "true" #700

Closed
khassel opened this issue Feb 2, 2022 · 9 comments


khassel commented Feb 2, 2022

Trying to use the new nodeSelectors with the Helm chart setup, providing the following values via a YAML file:

tridentControllerPluginNodeSelector:
  node-role.kubernetes.io/worker: "true"

tridentNodePluginNodeSelector:
  node-role.kubernetes.io/worker: "true"

With this setup the corresponding pods are not created, and in the operator logs you find messages like:

time="2022-02-02T14:15:59Z" level=error msg="Object creation failed." 
err="DaemonSet in version \"v1\" cannot be handled as a DaemonSet: v1.DaemonSet.Spec: v1.DaemonSetSpec.Template: v1.PodTemplateSpec.Spec: v1.PodSpec.NodeSelector: ReadString: 
expects \" or n, but found t, error found in #10 byte of ...|/worker\":true},\"serv|..., 
bigger context ...|.io/os\":\"linux\",\"node-role.kubernetes.io/worker\":true},\"serviceAccount\":\"trident-csi\",\"tolerations\":|..." yamlDocument="---\napiVersion: apps/v1\nkind: DaemonSet\nmetadata:\n  name: trident-csi\n  labels:\n    app: node.csi.trident.netapp.io\n    
k8s_version: v1.21.5\n    
trident_version: v22.01.0\n

Using other labels works, e.g.

tridentControllerPluginNodeSelector:
  node-role.kubernetes.io/mytype: "worker"

tridentNodePluginNodeSelector:
  node-role.kubernetes.io/mytype: "worker"

So I think the value string "true" is not handled correctly when the operator creates the k8s resources; it looks like "true" becomes true (missing quotes), which would explain the error messages.

The only workaround is to use labels whose value is not "true", but this needs extra configuration because you cannot use the default Rancher labels such as node-role.kubernetes.io/worker: "true" or node-role.kubernetes.io/etcd: "true".

Environment

  • Trident version: v22.01.0
  • Trident installation flags used: helm setup with above custom yaml
  • Container runtime: Docker 20.10.12
  • Kubernetes version: 1.21.5
  • Kubernetes orchestrator: Rancher v2.6.2
  • Kubernetes enabled feature gates: -
  • OS: CentOS 7
  • NetApp backend types: ONTAP SAN
khassel added the bug label Feb 2, 2022

ameade commented Feb 2, 2022

Hey, thanks for submitting this issue.

I've taken a look, and it makes sense why this is happening: the YAML parser is interpreting "true" as a boolean and not a string. There is logic that can be added in the Trident operator to avoid this case.

A workaround would be to add additional quotes around booleans, such as "'true'" and "'false'".
Example:
tridentNodePluginNodeSelector:
  node-role.kubernetes.io/worker: "'true'"
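
For illustration, a minimal Go sketch (using gopkg.in/yaml.v3; not the operator's actual code) of why the quoting matters: once the value is re-emitted without quotes, a generic YAML unmarshal resolves it as a boolean, and the extra single quotes are what keep it a string.

package main

import (
	"fmt"

	"gopkg.in/yaml.v3"
)

func main() {
	var sel struct {
		NodeSelector map[string]interface{} `yaml:"nodeSelector"`
	}

	// Value re-emitted without quotes: YAML resolves it as a boolean.
	_ = yaml.Unmarshal([]byte("nodeSelector:\n  node-role.kubernetes.io/worker: true\n"), &sel)
	v := sel.NodeSelector["node-role.kubernetes.io/worker"]
	fmt.Printf("%v (%T)\n", v, v) // true (bool)

	// With the extra single quotes the value survives as a string.
	_ = yaml.Unmarshal([]byte("nodeSelector:\n  node-role.kubernetes.io/worker: 'true'\n"), &sel)
	v = sel.NodeSelector["node-role.kubernetes.io/worker"]
	fmt.Printf("%v (%T)\n", v, v) // true (string)
}

Since a pod spec's nodeSelector is a map[string]string, the boolean then fails to unmarshal, which matches the DaemonSet error above.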


khassel commented Feb 2, 2022

Thank you for the quick feedback; I tested the workaround and it works.


ameade commented Feb 2, 2022

I should note that this also seems to occur if the value is an integer string, e.g. "10".
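
The same sketch applies to numeric strings (the label name here is hypothetical, just for illustration):

package main

import (
	"fmt"

	"gopkg.in/yaml.v3"
)

func main() {
	var m map[string]interface{}

	// An unquoted 10 resolves to an int...
	_ = yaml.Unmarshal([]byte("example.io/priority: 10"), &m)
	fmt.Printf("%T\n", m["example.io/priority"]) // int

	// ...while '10' survives as a string.
	_ = yaml.Unmarshal([]byte("example.io/priority: '10'"), &m)
	fmt.Printf("%T\n", m["example.io/priority"]) // string
}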


khassel commented Feb 2, 2022

Next note: the workaround is not needed (and does not work) for nodeSelector (which makes sense, because that is the normal Helm setup for the operator), so a working YAML is e.g.:

nodeSelector:
  node-role.kubernetes.io/etcd: "true"
  node-role.kubernetes.io/controlplane: "true"

tridentControllerPluginNodeSelector:
  node-role.kubernetes.io/etcd: "'true'"
  node-role.kubernetes.io/controlplane: "'true'"

tridentNodePluginNodeSelector:
  node-role.kubernetes.io/worker: "'true'"


ameade commented Feb 2, 2022

Good point, thank you. We'll work to make it more consistent.
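
As a rough idea of what such a fix could look like (a sketch only, not the actual Trident change): if the operator marshals the selector map with a YAML library instead of splicing raw values into a text template, ambiguous strings like "true" or "10" get quoted automatically.

package main

import (
	"fmt"

	"gopkg.in/yaml.v3"
)

// renderNodeSelector is a hypothetical helper: marshalling the map lets the
// YAML library quote values that would otherwise parse as a bool or an int.
func renderNodeSelector(sel map[string]string) (string, error) {
	out, err := yaml.Marshal(map[string]map[string]string{"nodeSelector": sel})
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	doc, _ := renderNodeSelector(map[string]string{
		"kubernetes.io/os":               "linux",
		"node-role.kubernetes.io/worker": "true",
	})
	fmt.Print(doc)
	// nodeSelector:
	//     kubernetes.io/os: linux
	//     node-role.kubernetes.io/worker: "true"
}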

gnarl added the tracked label Feb 2, 2022

gnarl commented Jul 26, 2022

This issue is fixed with commit a53cf69 and will be included in the Trident 22.07 release.


moonek commented Jun 28, 2023

This issue reproduces on 23.04.0.
It seems the value is still not handled correctly now that the operator renders a nodeAffinity instead of a nodeSelector.

  • trident version
$ kubectl get deploy -n trident -oyaml | grep image:
          image: netapp/trident-operator:23.04.0
  • TridentOrchestrator manifests
apiVersion: trident.netapp.io/v1
kind: TridentOrchestrator
metadata:
  name: trident
spec:
  debug: false
  namespace: trident
  silenceAutosupport: false
  controllerPluginNodeSelector:
    node-role.kubernetes.io/test: "true"
  • trident-operator pod log
time="2023-06-28T09:58:58Z" level=debug msg="Object not created, waiting." err="Deployment in version \"v1\" cannot be handled as a Deployment: json: cannot unmarshal bool into Go struct field NodeSelectorRequirement.spec.template.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms.matchExpressions.values of type string" increment=4.781876405s requestID=96cc73b1-9ea5-4421-a1dd-2cdac39848dc requestSource=Unknown workflow="k8s_client=trace_api" yamlDocument="---\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n  name: trident-controller\n  labels:\n    app: controller.csi.trident.netapp.io\n    k8s_version: v1.25.10\n    trident_version: v23.04.0\n    kubectl.kubernetes.io/default-container: trident-main\n  ownerReferences:\n  - apiVersion: trident.netapp.io/v1\n    controller: true\n    kind: TridentOrchestrator\n    name: trident\n    uid: 3a58a015-da5f-4150-a51b-e13ae06b4aa3\nspec:\n  replicas: 1\n  strategy:\n    type: Recreate\n  selector:\n    matchLabels:\n      app: controller.csi.trident.netapp.io\n  template:\n    metadata:\n      labels:\n        app: controller.csi.trident.netapp.io\n    spec:\n      serviceAccount: trident-controller\n      containers:\n      - name: trident-main\n        image: netapp/trident:23.04.0\n        imagePullPolicy: IfNotPresent\n        ports:\n        - containerPort: 8443\n        - containerPort: 8001\n        command:\n        - /trident_orchestrator\n        args:\n        - \"--crd_persistence\"\n        - \"--k8s_pod\"\n        - \"--https_rest\"\n        - \"--https_port=8443\"\n        - \"--csi_node_name=$(KUBE_NODE_NAME)\"\n        - \"--csi_endpoint=$(CSI_ENDPOINT)\"\n        - \"--csi_role=controller\"\n        - \"--log_format=text\"\n        - \"--log_level=info\"\n        - \"--log_workflows=\"\n        - \"--log_layers=\"\n        - \"--disable_audit_log=true\"\n        - \"--address=127.0.0.1\"\n        - \"--http_request_timeout=90s\"\n        - \"--enable_force_detach=false\"\n        - \"--metrics\"\n        #- -debug\n        livenessProbe:\n          exec:\n            command:\n            - tridentctl\n            - -s\n            - \"127.0.0.1:8000\"\n            - version\n          failureThreshold: 2\n          initialDelaySeconds: 120\n          periodSeconds: 120\n          timeoutSeconds: 90\n        env:\n        - name: KUBE_NODE_NAME\n          valueFrom:\n            fieldRef:\n              apiVersion: v1\n              fieldPath: spec.nodeName\n        - name: CSI_ENDPOINT\n          value: unix://plugin/csi.sock\n        - name: TRIDENT_SERVER\n          value: \"127.0.0.1:8000\"\n        volumeMounts:\n        - name: socket-dir\n          mountPath: /plugin\n        - name: certs\n          mountPath: /certs\n          readOnly: true\n      - name: trident-autosupport\n        image: docker.io/netapp/trident-autosupport:23.04\n        imagePullPolicy: IfNotPresent\n        command:\n        - /usr/local/bin/trident-autosupport\n        args:\n        - \"--k8s-pod\"\n        - \"--log-format=text\"\n        - \"--trident-silence-collector=false\"\n        \n        \n        \n        \n        #- -debug\n        resources:\n          limits:\n            memory: 1Gi\n        volumeMounts:\n        - name: asup-dir\n          mountPath: /asup\n      - name: csi-provisioner\n        image: registry.k8s.io/sig-storage/csi-provisioner:v3.4.1\n        imagePullPolicy: IfNotPresent\n        args:\n        - \"--v=2\"\n        - \"--timeout=600s\"\n        - 
\"--csi-address=$(ADDRESS)\"\n        - \"--retry-interval-start=8s\"\n        - \"--retry-interval-max=30s\"\n        \n        env:\n        - name: ADDRESS\n          value: /var/lib/csi/sockets/pluginproxy/csi.sock\n        volumeMounts:\n        - name: socket-dir\n          mountPath: /var/lib/csi/sockets/pluginproxy/\n      - name: csi-attacher\n        image: registry.k8s.io/sig-storage/csi-attacher:v4.2.0\n        imagePullPolicy: IfNotPresent\n        args:\n        - \"--v=2\"\n        - \"--timeout=60s\"\n        - \"--retry-interval-start=10s\"\n        - \"--csi-address=$(ADDRESS)\"\n        env:\n        - name: ADDRESS\n          value: /var/lib/csi/sockets/pluginproxy/csi.sock\n        volumeMounts:\n        - name: socket-dir\n          mountPath: /var/lib/csi/sockets/pluginproxy/\n      - name: csi-resizer\n        image: registry.k8s.io/sig-storage/csi-resizer:v1.7.0\n        imagePullPolicy: IfNotPresent\n        args:\n        - \"--v=2\"\n        - \"--timeout=300s\"\n        - \"--csi-address=$(ADDRESS)\"\n        env:\n        - name: ADDRESS\n          value: /var/lib/csi/sockets/pluginproxy/csi.sock\n        volumeMounts:\n        - name: socket-dir\n          mountPath: /var/lib/csi/sockets/pluginproxy/\n      - name: csi-snapshotter\n        image: registry.k8s.io/sig-storage/csi-snapshotter:v3.0.3\n        imagePullPolicy: IfNotPresent\n        args:\n        - \"--v=2\"\n        - \"--timeout=300s\"\n        - \"--csi-address=$(ADDRESS)\"\n        env:\n        - name: ADDRESS\n          value: /var/lib/csi/sockets/pluginproxy/csi.sock\n        volumeMounts:\n        - name: socket-dir\n          mountPath: /var/lib/csi/sockets/pluginproxy/\n      affinity:\n        nodeAffinity:\n          requiredDuringSchedulingIgnoredDuringExecution:\n            nodeSelectorTerms:\n              - matchExpressions:\n                  - key: kubernetes.io/arch\n                    operator: In\n                    values:\n                    - arm64\n                    - amd64\n                  - key: kubernetes.io/os\n                    operator: In\n                    values:\n                    - linux\n                  - key: node-role.kubernetes.io/test\n                    operator: In\n                    values:\n                    - true\n      tolerations: []\n      volumes:\n      - name: socket-dir\n        emptyDir:\n      - name: certs\n        projected:\n          sources:\n          - secret:\n              name: trident-csi\n          - secret:\n              name: trident-encryption-keys\n      - name: asup-dir\n        emptyDir:\n          medium: \"\"\n          sizeLimit: 1Gi\n"
  • this part is wrong
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: node-role.kubernetes.io/test
            operator: In
            values:
            - true # must be "true"
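
For what it's worth, a reduced Go sketch (assuming the k8s.io/api/core/v1 types and sigs.k8s.io/yaml, which the error message points at; not the operator's actual code) reproduces the failure, since NodeSelectorRequirement.Values is a []string:

package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	"sigs.k8s.io/yaml"
)

func main() {
	// Reduced version of the rendered affinity block, with the value left unquoted.
	doc := []byte(`
nodeAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
    nodeSelectorTerms:
    - matchExpressions:
      - key: node-role.kubernetes.io/test
        operator: In
        values:
        - true
`)

	var affinity v1.Affinity
	err := yaml.Unmarshal(doc, &affinity)
	fmt.Println(err)
	// error unmarshaling JSON: ... cannot unmarshal bool into Go struct field
	// NodeSelectorRequirement...values of type string
}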

please reopen this issue.

radhika-pr commented

The issue still exists in version 23.07.0 too. Please reopen the issue.

tridentControllerPluginNodeSelector:
  node.kubernetes.io/controller: "true"

Log:
Failed to install Trident; err: failed to create the Trident Deployment; failed to create or patch Trident deployment; could not patch Trident deployment; "" is invalid: patch: Invalid value: .........{\"key\":\"node.kubernetes.io/controller\",\"operator\":\"In\",\"values\":[true]}]}]}}},


smunirk commented Feb 13, 2024

The issue still persists on Trident 23.10.

With example values:
tridentNodePluginNodeSelector: {node-role.kubernetes.io/worker: "true"}

And the workaround fixes the issue as well:
tridentNodePluginNodeSelector: {node-role.kubernetes.io/worker: "'true'"}
