proxy-read-timeout annotations getting ignored after v1.11.1 upgrade from 1.10.1 #11850

Closed
varunthakur2480 opened this issue Aug 22, 2024 · 7 comments
Labels
kind/support Categorizes issue or PR as a support question. needs-priority needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. triage/needs-information Indicates an issue needs more information in order to work on it.

Comments

@varunthakur2480

What happened:

2024/08/22 10:51:43 [error] 41#41: *3030 upstream timed out (110: Operation timed out) while reading response header from upstream, client: 10.124.70.10, server: xxxx-gateway-xxxx.l7.dev2.xx.gcp.xxx.net, request: "POST /rbs.gbm.xxx.web_service_core.gateway.structured_document.MdxStructuredDocumentService/QueryViewPaginated HTTP/2.0", upstream: "grpc://100.71.1.170:5000", host: "xxx-gateway-xxx.l7.dev2.xxx.gcp.xxx.net:443"

Application logs - https://sxxxxxl/KkgESm6Lp3cdbWJDA
Retrying client request due to: [Status(StatusCode="Unknown", Detail="Stream removed", DebugException="Grpc.Core.Internal.CoreErrorDetailException: {"created":"@1724323903.302000000","description":"Error received from peer ipv4:10.124.66.63:443","file":"......\src\core\lib\surface\call.cc","file_line":953,"grpc_message":"Stream removed","grpc_status":2}")]. Retry number [1/10]

What you expected to happen:

Client should not have timed out

It looks like something changed between v1.10.1 and v1.11.1, after which these Ingress annotations are no longer being honoured:
nginx.ingress.kubernetes.io/ssl-redirect: "true"
nginx.ingress.kubernetes.io/proxy-body-size: "500m"
nginx.ingress.kubernetes.io/proxy-buffer-size: "16k"
nginx.ingress.kubernetes.io/proxy-connect-timeout: 600s
nginx.ingress.kubernetes.io/proxy-read-timeout: 600s

NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version.): 1.11.1

Kubernetes version (use kubectl version): 1.28

Environment:

  • Cloud provider or hardware configuration: GCP

  • OS (e.g. from /etc/os-release): Container-Optimized OS

  • Kernel (e.g. uname -a): 6.1.85

  • Install tools:

    • Please mention how/where was the cluster created like kubeadm/kops/minikube/kind etc. Terraform + kustomization + helm
  • Basic cluster related info:

    • kubectl version Client Version: v1.29.3
      Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
      Server Version: v1.29.7-gke.100800
    • kubectl get nodes -o wide
    • NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
      gke-xxx-xxx-xxx-6-n2-16-2023071905-0e2b7d00-y2lc Ready 2d22h v1.29.7-gke.1008000 10.124.64.160 Container-Optimized OS from Google 6.1.85+ containerd://1.7.15
  • How was the ingress-nginx-controller installed:

    • helm template --values values.yaml --namespace nwm-ingress-nginx --version $chart_version ingress-nginx ingress-nginx/ingress-nginx > manifests.yaml
  • An additional ConfigMap has been added to work around the issue "Alpine 3.17 images causes SSL Error 'unsafe legacy renegotiation disabled'" (dotnet/dotnet-docker#4332); the config is up to date with Alpine 3.20

  • Current State of the controller:

    • kubectl describe ingressclasses
      Name: nginx
      Labels: app.kubernetes.io/component=controller
      app.kubernetes.io/instance=ingress-nginx
      app.kubernetes.io/managed-by=Helm
      app.kubernetes.io/name=ingress-nginx
      app.kubernetes.io/part-of=ingress-nginx
      app.kubernetes.io/version=1.10.1
      helm.sh/chart=ingress-nginx-4.10.1
      kustomize.toolkit.fluxcd.io/name=gke-cluster-services
      kustomize.toolkit.fluxcd.io/namespace=ddd-flux-system
      Annotations: ingressclass.kubernetes.io/is-default-class: true
      nwm.io/contact: *[email protected]
      Controller: k8s.io/ingress-nginx
      Events:
  • Current state of ingress object, if applicable:

    • kubectl -n <appnamespace> get all,ing -o wide
    • kubectl -n <appnamespace> describe ing <ingressname>
    • If applicable, your complete and exact curl/grpcurl command (redacted if required) and the response to the curl/grpcurl command with the -v flag
  • Others:

    • Any other related information like ;
      • copy/paste of the snippet (if applicable)
      • kubectl describe ... of any custom configmap(s) created and in use
      • Any other related information that may help

How to reproduce this issue:
Deploy an Ingress with the following annotations:
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
    meta.helm.sh/release-name: mdx
    meta.helm.sh/release-namespace: dev2-e2-tst1-mdx-mdx-demo2
    nginx.ingress.kubernetes.io/backend-protocol: GRPC
    nginx.ingress.kubernetes.io/limit-connections: "1000"
    nginx.ingress.kubernetes.io/proxy-body-size: 500m
    nginx.ingress.kubernetes.io/proxy-buffer-size: 16k
    nginx.ingress.kubernetes.io/proxy-connect-timeout: 600s
    nginx.ingress.kubernetes.io/proxy-next-upstream-timeout: 600s
    nginx.ingress.kubernetes.io/proxy-read-timeout: 600s
    nginx.ingress.kubernetes.io/proxy-send-timeout: 600s

Then run regression tasks and long-running queries.

Anything else we need to know:

No issues are reported in version 1.10.1, whereas 1.11.1 consistently times out at 60 seconds.

@varunthakur2480 varunthakur2480 added the kind/bug Categorizes issue or PR as related to a bug. label Aug 22, 2024
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Aug 22, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@longwuyuan
Contributor

The information you have provided is incomplete as most of the important questions from the template are not answered.

The little information you did provide cannot be used to analyze the problem, because no reader would be able to recreate your environment or the tests you performed. (The information is also not formatted in markdown.)

You can help out by answering the questions asked in the new bug report template. You could then add complete, detailed, precise, real-use-as-is information from your tests, like the output of the kubectl describe command for all the resources related to the issue, from your test cluster. Add the logs and the curl command with -v, including the curl response, so that a reader can copy/paste it all from your tests.

Once triaging results in the data available here showing the bug details, we can re-apply the bug label here.

You can also check the changelog and release notes for relevance to your use-case.

/remove-kind bug
/kind support
/triage needs-information

@k8s-ci-robot k8s-ci-robot added kind/support Categorizes issue or PR as a support question. triage/needs-information Indicates an issue needs more information in order to work on it. and removed kind/bug Categorizes issue or PR as related to a bug. labels Aug 22, 2024
@longwuyuan
Contributor

This PR is related to the grpc timeouts #11258

@Anddd7
Contributor

Anddd7 commented Aug 26, 2024

Hi @varunthakur2480 , could you try to get

  • nginx.conf: k exec -it nginx-controller-z2hp4 -- cat /etc/nginx/nginx.conf > nginx.conf (mask any credential urls)
  • nginx log: k logs nginx-controller-z2hp4 --tail 100, after timeout
  • grpcbin test: use the same ingress config and deploy a test server grpcbin to verify the connection
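
Once nginx.conf has been dumped, one quick way to see which timeouts were actually rendered is to scan it for the timeout directives. Below is a minimal Python sketch, not an official tool. It assumes the rendered config contains directives such as proxy_read_timeout or grpc_read_timeout (standard NGINX directives that default to 60s when unset); find_timeouts is a hypothetical helper name and is not part of ingress-nginx.

```python
import re

# Standard NGINX timeout directives. Each one defaults to 60s when it is
# not set explicitly, which would match a timeout after exactly 60 seconds.
DIRECTIVES = (
    "proxy_read_timeout",
    "proxy_send_timeout",
    "grpc_read_timeout",
    "grpc_send_timeout",
)

# Match "<directive> <value>;" at the start of a line (after indentation).
# Commented-out lines do not match because "#" precedes the directive name.
PATTERN = re.compile(
    r"^\s*(%s)\s+([^;]+);" % "|".join(DIRECTIVES),
    re.MULTILINE,
)


def find_timeouts(conf_text):
    """Return (directive, value) pairs in the order they appear."""
    return [(m.group(1), m.group(2).strip()) for m in PATTERN.finditer(conf_text)]
```

To use it, read the dumped file and print the result, e.g. `print(find_timeouts(open("nginx.conf").read()))`. If the annotation asked for 600 but only 60s values show up for the affected location, the annotation was probably not applied.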

@varunthakur2480
Author

varunthakur2480 commented Aug 27, 2024

this seems to be sorted after we changed

nginx.ingress.kubernetes.io/ssl-redirect: "true"
nginx.ingress.kubernetes.io/proxy-body-size: "500m"
nginx.ingress.kubernetes.io/proxy-buffer-size: "16k"
nginx.ingress.kubernetes.io/proxy-connect-timeout: 600s
nginx.ingress.kubernetes.io/proxy-read-timeout: 600s

to
nginx.ingress.kubernetes.io/ssl-redirect: "true"
nginx.ingress.kubernetes.io/proxy-body-size: "500m"
nginx.ingress.kubernetes.io/proxy-buffer-size: "16k"
nginx.ingress.kubernetes.io/proxy-connect-timeout: "600"
nginx.ingress.kubernetes.io/proxy-read-timeout: "600"
We removed the "s" suffix and added quotes, as per the documentation here: https://github.com/kubernetes/ingress-nginx/blob/main/docs/user-guide/nginx-configuration/annotations.md

See the tip there: Annotation keys and values can only be strings. Other types, such as boolean or numeric values must be quoted, i.e. "true", "false", "100".

It is still worth highlighting that the old annotations (unquoted and with the "s" suffix) still work in the older version.
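
For anyone hitting the same thing, the change above can be sketched as a small Python check to run over an Ingress's annotations before upgrading. This is only an illustration. It assumes the timeout annotations should be bare integer strings (seconds), as in the working config above; normalize_timeouts and the key list are hypothetical names for this sketch, not anything shipped with ingress-nginx.

```python
import re

PREFIX = "nginx.ingress.kubernetes.io/"

# Timeout annotations that appear in this issue (assumed list, extend as needed).
TIMEOUT_KEYS = (
    "proxy-connect-timeout",
    "proxy-read-timeout",
    "proxy-send-timeout",
    "proxy-next-upstream-timeout",
)


def normalize_timeouts(annotations):
    """Return (fixed_annotations, changed_keys).

    Timeout values such as "600s" or the number 600 become the string "600".
    Other annotations are copied through untouched.
    """
    fixed = dict(annotations)
    changed = []
    for key, value in annotations.items():
        if not key.startswith(PREFIX) or key[len(PREFIX):] not in TIMEOUT_KEYS:
            continue
        # Accept digits with an optional trailing "s"; reject anything else.
        match = re.fullmatch(r"(\d+)s?", str(value).strip())
        if match is None:
            raise ValueError("unexpected timeout value for %s: %r" % (key, value))
        if value != match.group(1):
            fixed[key] = match.group(1)
            changed.append(key)
    return fixed, changed
```

Calling `normalize_timeouts(ingress_annotations)` on a dict loaded from the manifest returns the cleaned annotations plus the list of keys that needed changing, which makes it easy to spot manifests still using the old "600s" style.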

@longwuyuan
Contributor

Closing the issue as it seems resolved
/close

@k8s-ci-robot
Contributor

@longwuyuan: Closing this issue.

In response to this:

Closing the issue as it seems resolved
/close


4 participants