Provide a way to shutdown the process from another container #828
Seems that there was a proposal for Kubernetes-native functionality for this, but it will be reworked and there is no timeline for it: kubernetes/enhancements#753. I think an endpoint could be a good workaround in the meantime from the user's perspective.
I'm not entirely convinced an endpoint makes the most sense here. This problem was discussed in #128, and #206 added the -term_timeout flag. You could start the proxy with this flag, sleep for some amount of time for your application to start up, and then send a SIGTERM to the proxy. It'll exit gracefully when the active connections hit 0 (or ungracefully when the term_timeout is reached). Additionally, this method was suggested as a workaround in the original issue. It essentially uses a sentry file to tell a script when to kill the proxy. Both seem like equal (or better?) options compared to an endpoint until k8s adds better support.
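Roughly, the first approach looks like this (a sketch only; the instance name, port, timeout, and sleep duration are illustrative):

#!/bin/sh
# Start the proxy with a generous term_timeout.
/cloud_sql_proxy -instances=my-project:us-central1:my-instance=tcp:5432 -term_timeout=300s &
PROXY_PID=$!
# Give the application container time to start up and open its connections.
sleep 30
# Ask the proxy to shut down; it exits once active connections hit 0,
# or ungracefully when term_timeout is reached.
kill -TERM "$PROXY_PID"
wait "$PROXY_PID"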
@villesau Does the -term_timeout approach described above work for your use case?
There's some prior art here with Linkerd: linkerd/linkerd2-proxy#811. The correct answer is definitely Kubernetes handling this natively. In every other case it's a workaround.
@arbourd Can you clarify this? To be clear, -term_timeout is an upper bound: after the proxy receives a SIGTERM, it exits as soon as active connections hit 0, or when the timeout elapses, whichever comes first.
Understood. Some of our jobs may run for many hours, while others (like migrations) complete reasonably quickly; the migration task is far, far more common. It feels strange to set a 5h -term_timeout.
I just ran into this while trying to use the proxy for Postgres migrations. What do you all think about something like an explicit shutdown endpoint?
If we were to support this, I think following Linkerd would be the way to go.
Yes, I would also love to see this implemented. My use case is documented in #1115, but briefly: it is for Apache Airflow deployed on Kubernetes using the official Helm chart, where the sidecar needs to be killed semi-manually after the main app has finished its work with the database.
Another option would be to follow Istio with its /quitquitquit endpoint.
Having an explicit shutdown endpoint (like /quitquitquit) would be very helpful here.
This is not the behavior I am currently seeing with the proxy sidecar integration in the pega-helm-charts installer subchart. After the installer container completes, checking the cloud-sql-proxy logs, there are just as many
To be clear, setting -term_timeout isn't enough on its own; the proxy still needs to receive a SIGTERM before it starts waiting for connections to close. A workaround proposed elsewhere (I'm not finding the comment at the moment) is to start the proxy in the alpine or buster container like so:
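(A minimal sketch of that wrapper, assuming a sentry file at /tmp/pod/terminated on a volume shared with the main container; the instance name and flag values are illustrative.)

#!/bin/sh
# Run the proxy in the background so this wrapper can watch for the sentry file.
/cloud_sql_proxy -instances=my-project:us-central1:my-instance=tcp:5432 -term_timeout=30s &
PROXY_PID=$!
# The main container touches this file when its work is done.
until [ -f /tmp/pod/terminated ]; do
  sleep 1
done
# Ask the proxy to shut down gracefully, then wait for it to exit.
kill -TERM "$PROXY_PID"
wait "$PROXY_PID"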
The idea here is that the alpine and buster images include a shell, so the proxy can run under a small wrapper that waits for the main container to finish and then sends the proxy a SIGTERM.
Ah, thanks @enocom. It seems I was missing the part about actually sending the proxy a SIGTERM.
For completeness, here's a blog post that describes how to use a PID file to accomplish the same. In plain bash the approach translates to:

$ /cloud_sql_proxy -instances=my-project:us-central1:my-instance=tcp:3306 \
    -credential_file=/secrets/cloudsql/credentials.json &
$ CHILD_PID=$!
$ (while true; do
    if [[ -f "/tmp/pod/terminated" ]]; then
      kill $CHILD_PID
      echo "Killed $CHILD_PID because the main container terminated."
    fi
    sleep 1
  done) &
$ wait $CHILD_PID
$ if [[ -f "/tmp/pod/terminated" ]]; then
    echo "Job completed. Exiting..."
    exit 0
  fi
I think an endpoint like /quitquitquit is the right approach here. Relying on timeouts could result in flaky behavior, because it assumes that past a certain point in time, once connections drop to 0, the proxy can exit. I believe that comes from assuming that the other container only uses one connection, and that it is reasonable to try to predict how long database operations will take to complete or how long the other containers themselves will take to start. IMO those are not reasonable assumptions; therefore, I think it needs to be stated that using timeouts to enter the exiting state is not an acceptable solution for this issue. However, using a timeout to wait for connections to close after the proxy has been explicitly told to shut down seems perfectly fine.
Adding a suggestion from @taylorchu from here:
Random thought: we could add support for SIGQUIT or similar to have the proxy shut down. That might sidestep the problem of securing an HTTP endpoint.
The current plan is to add a localhost-only /quitquitquit endpoint.
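As a rough sketch of how a Job container might use it, assuming the endpoint ends up on a localhost-only admin port (9091 here) and accepts a plain POST; the port and the migration command are illustrative:

#!/bin/sh
# Run the actual work, then ask the proxy sidecar to exit so the Job can complete.
./run-migrations.sh
curl -s -X POST http://localhost:9091/quitquitquit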
I found the quitquitquit endpoint is not very scalable, and it could easily create race conditions. For example,
Now imagine the count is not 2 but >= 3, and you have many apps like this. All this additional, complex code gets added because we run things on k8s; the application is bloated because of the "environment" you run it on. I think this issue is more of a k8s problem and therefore out of scope.
It's definitely a K8s problem. The idea here is to ease the pain for people running jobs or init containers. And we'll make it opt-in with a CLI flag, so by default the behavior will be off. What's a keystone container? I don't know that phrase and don't find much when searching for it.
We'll be releasing this in the next version of v2, currently scheduled for February 21.
@enocom Question about this: will you also provide a way to have the proxy ignore SIGTERM, or is that already possible? Consider the following case:
It doesn't matter if there is an endpoint the app can hit if the proxy is getting a SIGTERM from k8s; there needs to be a way to keep the proxy running. I could use the entrypoint hacks I've seen around GitHub to catch SIGTERM in the entrypoint and ignore it, but it would be nice to support this in the proxy process itself! Maybe another endpoint the app could call to cancel shutdown? Or will the quitquitquit endpoint support setting a hard timeout, so the app could force the proxy to stay alive for X seconds after getting a SIGTERM?
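For reference, the entrypoint hack I mean usually looks something like this (a sketch only; the instance name is illustrative):

#!/bin/sh
# Kubernetes only signals PID 1, so trapping SIGTERM here keeps the proxy alive;
# something else (e.g. the quitquitquit endpoint) then has to stop it.
trap 'echo "Ignoring SIGTERM"' TERM
/cloud_sql_proxy -instances=my-project:us-central1:my-instance=tcp:5432 &
PROXY_PID=$!
# wait returns early when the trap fires, so loop until the proxy itself exits.
while kill -0 "$PROXY_PID" 2>/dev/null; do
  wait "$PROXY_PID"
done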
Take a look at the -term_timeout flag (--max-sigterm-delay in v2).
So I guess I was misunderstanding how that worked. I thought using it meant the proxy would stay alive only while there are pending connections; I'm actually using that switch currently. Your comment suggests that when the proxy gets a SIGTERM it keeps running for the full duration, so I'm wrong in thinking it exits as soon as connections close? @enocom I'm actually using v1 and -term_timeout.
If you have open connections, the proxy will stay alive until the duration elapses. Otherwise it will shut down early. In either case, let's pull this discussion into a new feature request and we can explore whether there's something that needs to be changed here or elsewhere.
In case someone is struggling with the same problem, I've come up with something that works for me.
The -term_timeout flag doesn't work, and I'm going to have to insist on the fact that it does not work. It works locally on my Mac, yes, but it does not work in the gcr.io/cloudsql-docker/gce-proxy container, where it absolutely should.
Let's talk through the details here: #1742.
@dastanko's answer helped me a lot in setting up a post-install/post-upgrade Helm hook for a Kubernetes Job that runs Django migrations.
@GuillaumeCisco Does that change your opinion on the need for #1640?
No, it is a totally different thing.
The Cloud SQL Proxy and pgbouncer communicate over connections that stay open, so waiting for connections to close never triggers. I'm still hacking around trying to resolve this issue. I did what you suggested by creating a connection pool with pgbouncer, but I don't think it is part of the solution; it just moves the issue somewhere else. In my opinion, the easiest way could be simply to declare a "main" container in a pod, so that only this one gets the SIGTERM signal and shuts down gracefully, and then the other sidecar containers shut down. Another hacky way I thought about is to run the sidecar containers with a
I'm still scratching my head trying to understand how to address this issue.
"sidecar using Kubernetes-style init containers" |
Adding the Cloud SQL Proxy as an init container with restartPolicy=Always did the trick. Thanks.
Feature Description
It is a common practice to run database migrations in an init container and wait for the migration job to finish. However, since the proxy never shuts down, the init script stalls. It would be good to provide a way to shut down the proxy gracefully, for example an endpoint that other containers can call when they are finished.
Alternatives Considered
Not sure, really. If Kubernetes had a native way to shut down the whole pod and deliver the signal to the container, that could work too. But I think an endpoint would be easier and more straightforward.
Additional Context
An example tutorial on how to run migrations with init containers, kubernetes jobs and k8s-wait-for: https://andrewlock.net/deploying-asp-net-core-applications-to-kubernetes-part-8-running-database-migrations-using-jobs-and-init-containers/