UDP TX drops every 10min when rate exceeds 15-20Kbps #2338

Closed
liggetm opened this issue Apr 12, 2018 · 7 comments
Comments

@liggetm

liggetm commented Apr 12, 2018

NGINX Ingress controller version:
0.12.0
(from quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.12.0)

Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.4", GitCommit:"7243c69eb523aa4377bce883e7c0dd76b84709a1", GitTreeState:"clean", BuildDate:"2017-03-07T23:53:09Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.2", GitCommit:"269f928217957e7126dc87e6adfa82242bfe5b1e", GitTreeState:"clean", BuildDate:"2017-07-03T15:31:10Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration:
    Bare metal (on an HP Elitedesk 800 G3, i7, 32GB, 250GB SSD)

  • OS (e.g. from /etc/os-release):
    CentOS Atomic Host 1803

# atomic host status
State: idle
Deployments:
* centos-atomic-host:centos-atomic-host/7/x86_64/standard
                   Version: 7.1803 (2018-04-03 12:35:38)
                    Commit: cbb9dbf9c8697e9254f481fff8f399d6808cecbed0fa6cc24e659d2f50e05a3e
              GPGSignature: Valid signature by 64E3E7558572B59A319452AAF17E745691BA8335
# cat /etc/redhat-release
CentOS Linux release 7.4.1708 (Core)

What happened:
UDP TX traffic from the ingress controller drops every 10 minutes for approximately 90 seconds, even though UDP RX traffic remains constant. This appears to happen at UDP rates above 15-20 Kbps.

What you expected to happen:
No drops in traffic and no significant difference between RX and TX rates when using UDP, regardless of rate.

How to reproduce it (as minimally and precisely as possible):
Configure a valid upstream UDP host, then configure the ingress controller with a hostPort that points to the upstream host via a Kubernetes service. My config snippets:

apiVersion: v1
kind: ConfigMap
metadata:
  name: my-ingress-udp
data:
  "50000": "default/my-udp-svc:50000"

---

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: my-ingress-udp
spec:
  template:
    metadata:
      labels:
        my-app: my-ingress-udp
    spec:
      containers:
      - image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.12.0
        name: nginx-ingress-lb
        readinessProbe:
          httpGet:
            path: /healthz
            port: 10254
            scheme: HTTP
        livenessProbe:
          httpGet:
            path: /healthz
            port: 10254
            scheme: HTTP
          initialDelaySeconds: 10
          timeoutSeconds: 1
        # use downward API
        env:
          - name: POD_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: POD_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
        ports:
        - containerPort: 50000
          hostPort: 50000
          protocol: UDP
        args:
        - /nginx-ingress-controller
        - --default-backend-service=$(POD_NAMESPACE)/my-backend-svc
        - --udp-services-configmap=$(POD_NAMESPACE)/my-ingress-udp

---

apiVersion: v1
kind: Service
metadata:
  name: my-udp-svc
spec:
  ports:
  - port: 50000
    name: telemetry
    protocol: UDP
    targetPort: telemetry
  selector:
    my-app: my-udp

---

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: my-udp
spec:
  replicas: 1
  template:
    metadata:
      labels:
        my-app: my-udp
    spec:
      containers:
      - name: my-udp
        image: registry:5000/my-udp-server:latest
        ports:
        - containerPort: 50000
          name: telemetry
          protocol: UDP

---

apiVersion: v1
kind: Service
metadata:
  name: my-backend-svc
spec:
  ports:
  - port: 80
    targetPort: 8080
  selector:
    my-app: my-ui

---

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: my-ui
spec:
  replicas: 1
  template:
    metadata:
      labels:
        my-app: my-ui
    spec:
      containers:
      - name: my-ui
        image: registry:5000/my-ui:latest
        ports:
        - containerPort: 8080
          protocol: TCP

Anything else we need to know:
I've uploaded an image from Grafana showing the TX drops over the course of an hour: https://imagebin.ca/v/3y7wU7cilhwF

@aledbf
Member

aledbf commented Apr 12, 2018

@liggetm please check the pod logs, searching for "reloading" and comparing the timestamps against the drops in traffic, to see if that's the issue.
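
For reference, something like the following should surface any reload events in the controller logs (the label selector matches the DaemonSet above; <ingress pod> is whatever name kubectl reports):

# list the controller pods, then search their logs for config reloads
kubectl get pods -l my-app=my-ingress-udp
kubectl logs <ingress pod> | grep -i reloading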

@liggetm
Author

liggetm commented Apr 12, 2018

@aledbf thanks for coming back to me. I didn't see any reloading at the same timestamps, but I did see alerts when I increased the logging verbosity.

16384 worker_connections are not enough while connecting to upstream, udp client

Looking at the documentation, it appears the default is 16384 worker_connections per worker process (with 1 worker process per CPU). I'll try increasing worker_connections, but I don't fully understand what it means in relation to UDP, given that it's connectionless.

@aledbf
Member

aledbf commented Apr 13, 2018

@liggetm please check the generated nginx.conf for the value of worker_rlimit_nofile.
You can do this with kubectl exec <ingress pod> cat /etc/nginx/nginx.conf

You can adjust worker_connections through the controller's configuration ConfigMap by setting max-worker-connections: XX.
This value cannot be higher than worker_rlimit_nofile.
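
For example, assuming the controller reads its configuration from a ConfigMap passed via --configmap (that flag isn't shown in the DaemonSet above, so the name below is only an illustration):

apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-configuration   # must match the name given to --configmap
data:
  max-worker-connections: "65536"   # keep at or below worker_rlimit_nofile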

@liggetm
Author

liggetm commented Apr 17, 2018

Thanks @aledbf - my config shows worker_rlimit_nofile 201874; - after setting max-worker-connections: 65536 the traffic issue appears to be resolved.
When you say that max-worker-connections cannot exceed worker_rlimit_nofile, do you mean the total number of worker connections (i.e. worker_processes * worker_connections)?

@aledbf
Member

aledbf commented Apr 17, 2018

do you mean the total number of worker connections (ie worker_processes * worker_connections)?

Yes.

Edit: worker_rlimit_nofile is per worker process

@aledbf
Member

aledbf commented Apr 17, 2018

Can we close this?

@liggetm
Author

liggetm commented Apr 17, 2018

Yes, thanks @aledbf for all your help!

liggetm closed this as completed Apr 17, 2018