Pod can't be started with sysctls custom settings #11962

Open
yaroslav-nakonechnikov opened this issue Sep 10, 2024 · 15 comments
Labels
kind/support Categorizes issue or PR as a support question. needs-priority needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments

@yaroslav-nakonechnikov

Hello,

What happened:

I'm getting the following warnings, which prevent the nginx pod from starting:

  Warning  FailedCreatePodSandBox  37m (x3 over 40m)      kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: write /proc/sys/net/ipv4/ip_local_port_range: invalid argument: unknown
  Warning  FailedCreatePodSandBox  2m14s (x163 over 42m)  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: write /proc/sys/net/core/somaxconn: invalid argument: unknown

The deployment was done via Terraform's helm_release resource:

...
      "controller.podSecurityContext.sysctls[1].name"  = "net.core.somaxconn"
      "controller.podSecurityContext.sysctls[1].value" = "\"32768\""
      "controller.podSecurityContext.sysctls[0].name"  = "net.ipv4.ip_local_port_range"
      "controller.podSecurityContext.sysctls[0].value" = "\"1024 65000\""
      "sysctls.net\\.core\\.somaxconn"                 = "32768"
      "sysctls.net\\.ipv4\\.ip_local_port_range"       = "1024 65000"
...

The values are rendered like this:

USER-SUPPLIED VALUES:
controller:
  admissionWebhooks:
    patch:
      image:
        image: prj-eks-42718-ingress-kube-webhook-certgen
        registry: SOMEID.dkr.ecr.eu-central-1.amazonaws.com
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: karpenter.sh/nodepool
            operator: In
            values:
            - prj-eks-42718-ingress
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app.kubernetes.io/component
              operator: In
              values:
              - controller
          topologyKey: topology.kubernetes.io/zone
        weight: 100
      - podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app.kubernetes.io/component
              operator: In
              values:
              - controller
          topologyKey: topology.kubernetes.io/hostname
        weight: 99
  config:
    allow-snippet-annotations: true
    enable-opentelemetry: true
    log-format-escape-json: true
    log-format-escape-none: true
    log-format-stream: '{"module":"log_stream","src_ip":"$remote_addr","timestamp":"$time_local","protocol":"$protocol","status":"$status","bytes_out":$bytes_sent,"bytes_in":$bytes_received,"session_time":"$session_time","upstream_addr":"$upstream_addr","upstream_bytes_out":"$upstream_bytes_sent","upstream_bytes_in":"$upstream_bytes_received","upstream_connect_time":"$upstream_connect_time","proxy_upstream_name":"$proxy_upstream_name"}'
    log-format-upstream: '{"module":"upstreamlog","src_ip":"$remote_addr", "username":"$remote_user","timestamp":"$time_local",
      "request":"$request", "status":"$status", "bytes_sent":"$body_bytes_sent", "http_referer":"$http_referer",
      "http_user_agent":"$http_user_agent", "req_len":$request_length, "req_time":"$request_time","proxy_upstream_name":"$proxy_upstream_name",
      "proxy_alternative_upstream_name":"$proxy_alternative_upstream_name", "upstream_addr":"$upstream_addr",
      "upstream_response_length":"$upstream_response_length","upstream_response_time":"$upstream_response_time",
      "upstream_status":"$upstream_status", "req_id":"$req_id", "service_name":"$service_name"}'
    retry-non-idempotent: true
  extraVolumeMounts:
  - mountPath: /mnt/indexer
    name: indexer
    readOnly: true
  - mountPath: /mnt/ingress
    name: ingress
    readOnly: true
  extraVolumes:
  - name: indexer
    secret:
      secretName: indexer
  - name: ingress
    secret:
      secretName: ingress
  image:
    image: prj-eks-42718-ingress-controller
    registry: SOMEID.dkr.ecr.eu-central-1.amazonaws.com
  ingressClassResource:
    default: true
  kind: Deployment
  opentelemetry:
    enabled: true
    image:
      image: prj-eks-42718-ingress-opentelemetry-1.25.3
      registry: SOMEID.dkr.ecr.eu-central-1.amazonaws.com
  podSecurityContext:
    sysctls:
    - name: net.ipv4.ip_local_port_range
      value: '"1024 65000"'
    - name: net.core.somaxconn
      value: '"32768"'
  port: '{"https":443}'
  resources:
    requests:
      cpu: 128m
      memory: 512Mi
  service:
    enableHttp: false
    type: ClusterIP
  tolerations:
  - effect: NoSchedule
    key: function
    operator: Equal
    value: ingress
  topologySpreadConstraints:
  - labelSelector:
      matchExpressions:
      - key: app.kubernetes.io/component
        operator: In
        values:
        - controller
    maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
defaultBackend:
  image:
    image: prj-eks-42718-ingress-defaultbackend-amd64
    registry: SOMEID.dkr.ecr.eu-central-1.amazonaws.com
sysctls:
  net.core.somaxconn: 32768
  net.ipv4.ip_local_port_range: 1024 65000
tcp:
  "8089": ingress/ingress-nginx-controller:443
  "9999": ingress/ingress-nginx-controller:443

As far as I can see, extra quote characters are somehow being passed there:

 podSecurityContext:
    sysctls:
    - name: net.ipv4.ip_local_port_range
      value: '"1024 65000"'
    - name: net.core.somaxconn
      value: '"32768"'

But if I write the following:

...
      "controller.podSecurityContext.sysctls[1].name"  = "net.core.somaxconn"
      "controller.podSecurityContext.sysctls[1].value" = "32768"
      "controller.podSecurityContext.sysctls[0].name"  = "net.ipv4.ip_local_port_range"
      "controller.podSecurityContext.sysctls[0].value" = "1024 65000"
      "sysctls.net\\.core\\.somaxconn"                 = "32768"
      "sysctls.net\\.ipv4\\.ip_local_port_range"       = "1024 65000"
...

it fails at the apply stage with:

Error: failed to replace object: Deployment in version "v1" cannot be handled as a Deployment: json: cannot unmarshal number into Go struct field Sysctl.spec.template.spec.securityContext.sysctls.value of type string

  with helm_release.ingress_nginx,
  on ingress-nginx.tf line 71, in resource "helm_release" "ingress_nginx":
  71: resource "helm_release" "ingress_nginx" {

Why? How can I provide these values so that it works?

What you expected to happen:

Simple notation works without issues.

NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version.):
installed with chart 4.10.4

@yaroslav-nakonechnikov yaroslav-nakonechnikov added the kind/bug Categorizes issue or PR as related to a bug. label Sep 10, 2024
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Sep 10, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@longwuyuan
Contributor

Does it work if you do not customize PodSecurityContext?

@yaroslav-nakonechnikov
Author

yaroslav-nakonechnikov commented Sep 10, 2024

Yes, it works perfectly.

And if I edit the deployment with kubectl edit deployment -n ingress ingress-nginx-controller so that it looks like this:

$ kubectl get deployment -n ingress ingress-nginx-controller -o yaml | grep securityContext -A 5
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            add:
            - NET_BIND_SERVICE
            drop:
--
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          readOnlyRootFilesystem: true
--
      securityContext:
        sysctls:
        - name: net.ipv4.ip_local_port_range
          value: 1024 65000
        - name: net.core.somaxconn
          value: "32768"

it starts fine:

ingress-nginx-controller-db89d67bd-mpfkn:/etc/nginx$ sysctl net. | grep max
net.core.somaxconn = 32768

@longwuyuan
Contributor

For at least 2 sysctl arguments, the error message reports an unknown value:

range: invalid argument: unknown

So this is not a bug but a misconfiguration of sysctl arguments.

That kind of config is not in the controller code, as it is just passed from the template to the Kubernetes API.

/remove-kind bug

I think you should try those sysctl commands manually and see what fits.

@k8s-ci-robot k8s-ci-robot added needs-kind Indicates a PR lacks a `kind/foo` label and requires one. and removed kind/bug Categorizes issue or PR as related to a bug. labels Sep 10, 2024
@longwuyuan
Contributor

/kind support

@k8s-ci-robot k8s-ci-robot added kind/support Categorizes issue or PR as a support question. and removed needs-kind Indicates a PR lacks a `kind/foo` label and requires one. labels Sep 10, 2024
@yaroslav-nakonechnikov
Author

But it works manually.
I know that it is sometimes hard to pass certain values from HCL, and passing a custom log_format looks extremely weird.
But for sysctls I tried several notations, and none of them work.

The workaround of modifying the deployment after helm_release works without problems.

P.S. The KEDA add-on has almost the same problem, but I will report it later, as it is not so critical.

@longwuyuan
Contributor

The field description mentions unsupported sysctls. Have you checked:

% k explain pod.spec.securityContext.sysctls
KIND:       Pod
VERSION:    v1

FIELD: sysctls <[]Sysctl>


DESCRIPTION:
    Sysctls hold a list of namespaced sysctls used for the pod. Pods with
    unsupported sysctls (by the container runtime) might fail to launch. Note
    that this field cannot be set when spec.os.name is windows.
    Sysctl defines a kernel parameter to be set
    
FIELDS:
  name  <string> -required-
    Name of a property to set

  value <string> -required-
    Value of a property to set

@yaroslav-nakonechnikov
Author

@longwuyuan if I update the deployment manually (or even with Terraform) after the initial helm install, it starts to work as expected. I've read about unsupported sysctl parameters, but this is a different problem.

@longwuyuan
Contributor

Then it's a parsing problem. Have you experimented with how the string is quoted?

@yaroslav-nakonechnikov
Author

Yes, I've tried the following versions:
"controller.podSecurityContext.sysctls[1].value" = 32768
"controller.podSecurityContext.sysctls[1].value" = "32768"
"controller.podSecurityContext.sysctls[1].value" = ""32768""
"controller.podSecurityContext.sysctls[1].value" = "'32768'"
"controller.podSecurityContext.sysctls[1].value" = '32768'

None of them work.

@longwuyuan
Contributor

Reduce the upper port number to 60000 and try again.

@longwuyuan
Contributor

Try:

      "sysctls.net\\.core\\.somaxconn"                 = "30000"
      "sysctls.net\\.ipv4\\.ip_local_port_range"       = "1024 60000"

@longwuyuan
Contributor

Or maybe:

      "controller.podSecurityContext.sysctls[1].name"  = "net.core.somaxconn"
      "controller.podSecurityContext.sysctls[1].value" = 32768
      "controller.podSecurityContext.sysctls[0].name"  = "net.ipv4.ip_local_port_range"
      "controller.podSecurityContext.sysctls[0].value" = "1024 65000"

I am not sure how to solve this, but I am sure it is not the controller code, as these keys and values are passed straight from the rendered template to the kube-apiserver. You can enable debug and check the JSON payload.

@yaroslav-nakonechnikov
Author

yaroslav-nakonechnikov commented Sep 11, 2024

I also tried it outside of the dynamic set block:

  set {
    name  = "controller.podSecurityContext.sysctls[0].value"
    value = "32768"
    type  = "auto"
  }

and

  set {
    name  = "controller.podSecurityContext.sysctls[0].value"
    value = 32768
    type  = "auto"
  }

gives: Error: failed to replace object: Deployment in version "v1" cannot be handled as a Deployment: json: cannot unmarshal number into Go struct field Sysctl.spec.template.spec.securityContext.sysctls.value of type string
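
A possible explanation, based on the error above: with type = "auto", Helm infers the type of 32768 and renders it as a number, while the Kubernetes API requires sysctl values to be strings. The Terraform Helm provider's set block also accepts type = "string", which should keep the value a string without embedding literal quote characters in it. The following is a minimal, untested sketch that assumes the provider 2.x block syntax used in this thread. The release name, namespace, and repository URL are assumptions that the issue does not state.

```hcl
# Sketch only: name, namespace, and repository are assumed, not taken from the issue.
resource "helm_release" "ingress_nginx" {
  name       = "ingress-nginx"
  namespace  = "ingress"
  repository = "https://kubernetes.github.io/ingress-nginx"
  chart      = "ingress-nginx"
  version    = "4.10.4"

  set {
    name  = "controller.podSecurityContext.sysctls[0].name"
    value = "net.ipv4.ip_local_port_range"
  }

  # type = "string" asks Helm not to guess a type for this value.
  set {
    name  = "controller.podSecurityContext.sysctls[0].value"
    value = "1024 65000"
    type  = "string"
  }

  set {
    name  = "controller.podSecurityContext.sysctls[1].name"
    value = "net.core.somaxconn"
  }

  # Without type = "string", 32768 is inferred as a number and the API rejects it.
  set {
    name  = "controller.podSecurityContext.sysctls[1].value"
    value = "32768"
    type  = "string"
  }
}
```

If this behaves as expected, the rendered manifest should contain value: "32768" rather than value: '"32768"', matching the manually edited deployment that started fine.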

  set {
    name  = "controller.podSecurityContext.sysctls"
    value = "[\\{\"name\":\"net.core.somaxconn\"\\,\"value\":\"32768\"\\}\\,\\{\"name\":\"net.ipv4.ip_local_port_range\"\\,\"value\":\"1024 65000\"\\}]"
    type  = "auto"
  }

and

  set {
    name  = "controller.podSecurityContext.sysctls[0]"
    value = "\\{\"name\":\"net.core.somaxconn\"\\,\"value\":\"32768\"\\}"
    type  = "auto"
  }

gives Error: failed to replace object: Deployment in version "v1" cannot be handled as a Deployment: json: cannot unmarshal string into Go struct field PodSecurityContext.spec.template.spec.securityContext.sysctls of type []v1.Sysctl
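
The last two attempts appear to fail because the escaped JSON text reaches the chart as one string instead of a list of sysctl objects. One way to avoid the set parsing rules altogether is to pass the list as structured YAML through the values argument of helm_release. The following is a minimal, untested sketch. The release name, namespace, and repository URL are again assumptions, and it relies on Terraform's yamlencode quoting strings such as "32768" so that they are not read back as numbers.

```hcl
# Sketch only: name, namespace, and repository are assumed, not taken from the issue.
resource "helm_release" "ingress_nginx" {
  name       = "ingress-nginx"
  namespace  = "ingress"
  repository = "https://kubernetes.github.io/ingress-nginx"
  chart      = "ingress-nginx"
  version    = "4.10.4"

  # Structured values are merged like a values.yaml file, so no set escaping is involved.
  values = [
    yamlencode({
      controller = {
        podSecurityContext = {
          sysctls = [
            { name = "net.ipv4.ip_local_port_range", value = "1024 65000" },
            { name = "net.core.somaxconn", value = "32768" },
          ]
        }
      }
    })
  ]
}
```

Both sysctl values are Terraform strings here, so the rendered deployment should carry them as strings, which is the type the Sysctl field requires according to the error messages above.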

@longwuyuan
Contributor

Please come talk on Kubernetes Slack as there are not many resources here.

The error message is proof that this is about parsing and variable interpolation. I think that this works without Terraform or ArgoCD-type tools, so it's not a problem with the controller. An expert in these tools will have to comment on how to inject an int instead of a string, and so on.

3 participants