helm: Envoy sidecar shutting down too early causes requests to fail #650
Comments
Hey @nflaig!! Thanks so much for bringing this to our attention. Also, the attention to detail as well as the suggestions in the issue are really appreciated!! The team is stretched a little thin at the moment and we might not be in a position to solve this in the next release, but we will try and prioritize it for the one after. Will keep this issue updated with the status of the fix! Thanks again!!
Having the same problem. Maybe a more flexible solution would be calling something like … Edit: using the latest helm chart 0.32.1.
Hi @nflaig, doesn't terminationDrainDuration solve the problem, or am I missing something?
Hi @Samjin, it does not solve the problem, as it still prevents new connections.
@Samjin only …
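For context, a minimal sketch of how terminationDrainDuration is typically set in Istio, via a pod annotation; as the reply above notes, it only bounds the drain period and does not keep envoy accepting new connections:

```yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    # Bound how long istio-proxy drains before exiting; it still stops
    # accepting new connections as soon as draining starts.
    proxy.istio.io/config: |
      terminationDrainDuration: 30s
```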
@ryan4yin Let me know if I understand this correctly: while the pod IP is being removed from iptables, istio-proxy and your service have already received SIGTERM and stopped accepting any new requests. This causes problems because requests are still coming in through the still-routable pod IP.
@Samjin Correct, that's what this issue describes.
I'm seeing a similar issue, but using the AWS ALB Controller. We also make extensive use of long polling, so we often have HTTP requests that take 27s to return. There's also a small lag between a pod being marked as terminating and the ALB being fully updated to stop sending new requests to the pod. At the moment the pod is marked terminating, though, preStop fires on my container and the envoy sidecar immediately shuts down. Being able to set a preStop on the envoy container as well would resolve this problem entirely.
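A sketch of what that could look like; container names and durations are illustrative assumptions, sized to cover the ALB deregistration lag plus the 27s long polls:

```yaml
spec:
  containers:
    - name: app
      lifecycle:
        preStop:
          exec:
            # Keep serving long-poll requests while the ALB deregisters the pod.
            command: ["/bin/sh", "-c", "sleep 30"]
    - name: envoy-sidecar
      lifecycle:
        preStop:
          exec:
            # Keep the proxy up slightly longer than the app it fronts.
            command: ["/bin/sh", "-c", "sleep 35"]
```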
@hamishforbes, what did you configure on the ALB ingress controller side to make it work with Consul Connect? I'm not able to send requests to a service that's under Consul Connect by accessing it through the ALB.
We're seeing similar issues with release … In our case, the … As we're using …
We have a similar issue where our app container has draining configured. On termination it starts draining requests, with a max time of e.g. 3 minutes. But since the sidecar has no corresponding config, it shuts down immediately. The problem is that when a draining request makes a connection to the sidecar during this period, it receives a "connection refused" error because the sidecar has already stopped, resulting in failures and noise in the logs.
Already being discussed: #536
Closing, as the pod shutdown use case of the sidecar lifecycle should now be addressed by #2233. Please open a new issue if you still have issues.
Hey guys,
During a rolling update (pod termination) we are getting 502 errors in nginx for upstream requests. This happens because the load balancer in k8s still sends requests to the terminating pod, since endpoint deregistration happens asynchronously, see kubernetes/kubernetes#43576.
For nginx itself, I was able to resolve this race condition by adding a simple sleep to the preStop hook of the container, but the problem is not fully resolved since the envoy sidecar is still shutting down too early, which causes requests to fail.
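For reference, a minimal sketch of that workaround; the 10-second duration is an assumption and just needs to outlast the asynchronous endpoint deregistration:

```yaml
lifecycle:
  preStop:
    exec:
      # Keep the container alive briefly so in-flight requests finish
      # while the endpoint is removed from the load balancer.
      command: ["/bin/sh", "-c", "sleep 10"]
```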
Current behavior
Nginx returns 502 errors because the upstream requests are sent through the envoy sidecar proxy, which is already shutting down due to the SIGTERM sent by k8s.

Expected behavior
The envoy sidecar proxy should handle requests of the proxied service as long as the service is still running, e.g. if the service does some cleanup and still needs to send data while terminating, or, in the case of nginx, if further requests are still being routed to the pod.
Suggestion
In the envoy preStop command, a delay could be added to prevent instantly sending the SIGTERM signal to the container. Since a hardcoded sleep duration seems too static, maybe it would make sense to add something like the check sketched below. It would delay the shutdown of envoy until there are no more TCP listeners, i.e. the proxied service is no longer running and it is safe to also shut down envoy without causing further requests to fail.
Another option could be to allow the user of the helm chart to customize the envoy preStop hook directly, e.g. via a chart value like the hypothetical sketch below.
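A purely hypothetical values.yaml shape for such an option (not an existing consul-helm setting):

```yaml
connectInject:
  envoySidecar:
    # Hypothetical key: pass a user-defined lifecycle hook through to the
    # injected envoy sidecar container.
    lifecycle:
      preStop:
        exec:
          command: ["/bin/sh", "-c", "sleep 10"]
```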
Environment details
consul-k8s version: 0.24.0
consul-helm version: 0.30.0
consul version: 1.9.3
envoy version: 1.16.0

Related