Firewall rule not updated properly with NEG if service uses name in targetPort or does not name its port #703

pdecat · 2019-03-26T23:59:19Z

Under some conditions, the k8s-fw-l7--<uid> firewall rule managed by the ingress-gce controller does not add the target pod's port to its list of allowed ports.
When this happens, health checks do not reach the pods and all requests fail with HTTP 502 errors.

For example, with the following service configuration:

apiVersion: v1
kind: Service
metadata:
  name: test
  namespace: default
  annotations:
    cloud.google.com/neg: '{"ingress": true}'
spec:
  ports:
  - nodePort: 30742
    port: 80
    protocol: TCP
    targetPort: http
  selector:
    app: test
  sessionAffinity: None
  type: NodePort

The pods selected by this service have a single container that declares a port named http, with httpGet liveness and readiness probes referencing that port:

apiVersion: v1
kind: Pod
metadata:
  labels:
    app: test
  name: test
  namespace: default
spec:
  containers:
  - image: nginx:latest
    name: nginx
    ports:
    - containerPort: 80
      name: http
      protocol: TCP
    livenessProbe:
      httpGet:
        path: /healthz
        port: http
        scheme: HTTP
    readinessProbe:
      httpGet:
        path: /healthz
        port: http
        scheme: HTTP
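In Kubernetes, a string targetPort refers to a container port by name, so with the manifests above `targetPort: http` should resolve to containerPort 80 on each selected pod. That resolved port is the one the firewall rule has to allow for NEG endpoints. The following is a minimal sketch of that resolution, assuming the manifests are loaded as plain dicts mirroring the YAML. The function name `resolve_target_port` is hypothetical, and this is not how ingress-gce implements it.

```python
from typing import Optional, Union


def resolve_target_port(service_port: dict, pod: dict) -> Optional[int]:
    """Resolve a Service port's targetPort to a numeric port on one pod.

    Hypothetical helper for illustration only; not the ingress-gce code.
    """
    # When targetPort is omitted, Kubernetes uses the Service's port value.
    target: Union[int, str] = service_port.get("targetPort", service_port["port"])
    if isinstance(target, int):
        # A numeric targetPort is used as-is.
        return target
    # A string targetPort names a port declared by one of the pod's containers.
    for container in pod["spec"]["containers"]:
        for port in container.get("ports", []):
            if port.get("name") == target:
                return port["containerPort"]
    # No container on this pod declares a port with that name.
    return None


# The Service port and Pod from this issue, as plain dicts.
service_port = {"nodePort": 30742, "port": 80, "protocol": "TCP", "targetPort": "http"}
pod = {"spec": {"containers": [{"name": "nginx", "ports": [
    {"containerPort": 80, "name": "http", "protocol": "TCP"}]}]}}

print(resolve_target_port(service_port, pod))  # 80
```

The lookup is done per pod because a named port may map to different numbers on different pods behind the same Service. A numeric targetPort skips the lookup entirely, which is consistent with the second workaround below.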

And FWIW, the ingress:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: test
  namespace: default
  annotations:
    kubernetes.io/ingress.class: "gce"
    kubernetes.io/ingress.allow-http: "true"
spec:
  backend:
    serviceName: test
    servicePort: 80

I've identified two workarounds for now:

Adding a name to the service port:

@@ -11,7 +11,8 @@ metadata:
     cloud.google.com/neg: '{"ingress": true}'
 spec:
   ports:
-  - nodePort: 30742
+  - name: http
+    nodePort: 30742
     port: 80
     protocol: TCP
     targetPort: http

Using the port number in targetPort instead of the port name:

@@ -14,7 +14,7 @@ spec:
   - nodePort: 30742
     port: 80
     protocol: TCP
-    targetPort: http
+    targetPort: 80
   selector:
     app: test
   sessionAffinity: None

When either change is applied on its own, the corresponding port is added to the firewall rule almost instantly, health checks reach the pods, and all requests return HTTP 200.

Reverting either change restores the original situation: the port is missing from the firewall rule, health checks fail, and requests return HTTP 502 errors.
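Since either workaround alone fixes the problem, the failing combination observed here is a Service port with no name together with a named (string) targetPort, on a Service that opts into NEGs for Ingress. To find other Services that may be affected, a minimal audit sketch could flag exactly that combination. It assumes Service manifests are available as dicts (for example, parsed from `kubectl get service -o json`). The helper names `uses_ingress_neg` and `at_risk_ports` are hypothetical, and the check only reflects the conditions reported in this issue, not a confirmed root cause.

```python
import json

NEG_ANNOTATION = "cloud.google.com/neg"


def uses_ingress_neg(service: dict) -> bool:
    """True if the Service opts into NEGs for Ingress via the annotation."""
    raw = service.get("metadata", {}).get("annotations", {}).get(NEG_ANNOTATION)
    if not raw:
        return False
    try:
        return bool(json.loads(raw).get("ingress"))
    except (ValueError, AttributeError):
        # Malformed JSON, or JSON that is not an object.
        return False


def at_risk_ports(service: dict) -> list:
    """Return Service ports matching the combination reported in this issue:
    no port name and a named (string) targetPort. Hypothetical audit check."""
    if not uses_ingress_neg(service):
        return []
    return [
        p for p in service.get("spec", {}).get("ports", [])
        if not p.get("name") and isinstance(p.get("targetPort"), str)
    ]


# The original Service from this issue, as a dict.
service = {
    "metadata": {"annotations": {NEG_ANNOTATION: '{"ingress": true}'}},
    "spec": {"ports": [{"nodePort": 30742, "port": 80, "protocol": "TCP",
                        "targetPort": "http"}]},
}
print(len(at_risk_ports(service)))  # 1
```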

Tested on GKE master version 1.11.7-gke.12, which should be running ingress-gce v1.4.3 according to https://github.com/kubernetes/ingress-gce/blob/master/README.md#gke-version-mapping.
I have not yet checked whether the issue persists with ingress-gce v1.5.0 on GKE 1.12.5-gke.10+.

Having access to the logs of the GKE-managed ingress-gce controller would greatly help when troubleshooting this kind of error. I did not hit this issue in our preproduction environment because the same port was already allowed by another service that named its port.
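Without controller logs, one way to confirm the symptom is to compare the ports the pods actually serve on with the ports the firewall rule allows. The sketch below assumes the rule has been dumped as JSON (for example with `gcloud compute firewall-rules describe <rule> --format=json`), which follows the Compute Engine firewall resource shape with `allowed[].IPProtocol` and `allowed[].ports`. The helpers `allowed_tcp_ports` and `missing_ports` are hypothetical, and the rule body in the usage example is illustrative, not captured from the affected cluster.

```python
from typing import Iterator, Set, Tuple


def allowed_tcp_ports(rule: dict) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (low, high) TCP port ranges allowed by a firewall rule."""
    for entry in rule.get("allowed", []):
        proto = entry.get("IPProtocol", "").lower()
        if proto not in ("tcp", "all"):
            continue
        ports = entry.get("ports")
        if not ports:
            # No port list means every port of this protocol is allowed.
            yield (0, 65535)
            continue
        for spec in ports:
            # Entries are either a single port ("80") or a range ("8000-8100").
            low, _, high = spec.partition("-")
            yield (int(low), int(high or low))


def missing_ports(rule: dict, expected: Set[int]) -> Set[int]:
    """Return the expected TCP ports that the rule does not allow."""
    ranges = list(allowed_tcp_ports(rule))
    return {p for p in expected if not any(lo <= p <= hi for lo, hi in ranges)}


# Illustrative rule that only allows the NodePort, not the pod's container port.
rule = {"allowed": [{"IPProtocol": "tcp", "ports": ["30742"]}]}
print(missing_ports(rule, {80}))  # {80}
```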

PS: I've learned from reading the GCE ingress controller code that NEGs do not require services to be of type NodePort, but I'm still in the process of converting ingresses to container-native load balancing by adding the cloud.google.com/neg: '{"ingress": true}' annotation. I'll convert those services back to ClusterIP once done.

I believe this issue should be referenced by #583.


rramkumar1 · 2019-03-27T15:00:12Z

/assign @freehan

strideynet · 2019-05-14T14:33:50Z

It appears this also affects the creation of endpoints in network endpoint groups: they were only created when targetPort was a number rather than a name.

fejta-bot · 2019-08-12T14:46:19Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

fejta-bot · 2019-09-11T15:45:13Z

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

fejta-bot · 2019-10-11T16:30:20Z

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

k8s-ci-robot · 2019-10-11T16:30:28Z

@fejta-bot: Closing this issue.


pdecat changed the title ~~Firewall rule not updated properly with NEG if service with undefined port name and~~ Firewall rule not updated properly with NEG if service uses name in targetPort or does not name its port Mar 27, 2019

k8s-ci-robot assigned freehan Mar 27, 2019

k8s-ci-robot added the lifecycle/stale label Aug 12, 2019

k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Sep 11, 2019

k8s-ci-robot closed this as completed Oct 11, 2019
