Add command for use in "PostStart" hook that delays until proxy has started #1117

Closed
adriangb opened this issue Feb 26, 2022 · 20 comments · Fixed by #2041
Assignees
Labels
priority: p1 — Important issue which blocks shipping the next release. Will be fixed prior to next release.
type: feature request — ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@adriangb

It is a well-known issue that there is no way to sequence container startups in Kubernetes (e.g. kubernetes/kubernetes#65502). As far as I know, this is the best solution we currently have: istio/istio#11130 (comment)

This would be made a lot better if the images provided some wait-until-proxy-is-up-and-running.sh script (especially since the default image lacks sleep, curl, etc.) so that users can easily force Kubernetes to wait until the proxy is up before it starts the next container. It could be as simple as the proxy creating a file when it finishes starting up.
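A minimal sketch of the file-based idea might look like the following. Everything here is an illustrative assumption: the proxy does not actually write such a marker file, and the default path and poll interval are invented for the example.

```shell
#!/bin/sh
# Hypothetical helper for the proposal above: the proxy would touch a marker
# file once it finishes starting, and this function blocks until that file
# appears. The default path and 100 ms poll interval are assumptions.
wait_for_ready_file() {
  ready_file="${1:-/tmp/cloudsql-proxy-ready}"
  until [ -f "$ready_file" ]; do
    sleep 0.1  # poll until the proxy drops its marker file
  done
}
```

A postStart hook could then simply invoke `wait_for_ready_file` to hold back the next container.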

@adriangb adriangb added the type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. label Feb 26, 2022
@kurtisvg
Contributor

I'm not sure what value a script in the proxy image would have, as you'd probably want something in your application image.

Have you considered using the HTTP health-checks? I think you could use something like this as a wrapper in your application container:

until curl --output /dev/null --silent --head --fail http://127.0.0.1:8090/readiness; do
    printf '.'
    sleep 0.1 # 100 ms
done

# start your app here

@adriangb
Author

adriangb commented Feb 28, 2022

I'm not sure what value a script in the proxy image would have

For the proposed solution, this has to run from the proxy's container. The idea is to use a lifecycle hook to delay startup of the application container until the proxy is ready (see istio/istio#11130 (comment) for more details; this is exactly what Istio does so that the application container doesn't start until the service mesh sidecar is ready).

One could sidestep all of this and run a wait from the application container like you suggest, but I'd rather not couple my application to an implementation detail of one of its execution environments (in general, but also in my particular case this is being run cross-cloud, so it's not even communicating with Cloud SQL Proxy half of the time).

So basically what I'm asking for is a way to wait until the proxy is healthy from within the proxy container, ideally without curl so that one can use the distroless image.

@kurtisvg kurtisvg changed the title Add a wait-until-proxy-is-up-and-running.sh script Add command for use in "PostStart" hook that delays until proxy has started Feb 28, 2022
@kurtisvg
Contributor

Ok, I think I understand now. I'm not really sure how good of an idea it is to depend on the synchronous order of containers, though - feels a bit like an implementation detail.

It looks like this could probably be an HTTP endpoint, so maybe something like /waituntilready that doesn't return until the proxy is "ready"? There might be a bit of a race condition if the health check server isn't mounted when the HTTP request is sent. Otherwise it would need to be a different execution flow that just polls the readiness endpoint and terminates when it goes green.

@adriangb
Author

I agree that it's an implementation detail. But it has to be dealt with somewhere. Since the thing introducing the implementation detail is the CloudSQL Proxy and the sidecar pattern (we wouldn't have to wait for something to start up if we were connecting to the DB directly) to me it seems most natural to handle that implementation detail via the proxy as well instead of the application. But that's just my opinion.

@kurtisvg kurtisvg added the priority: p2 Moderately-important priority. Fix may not be included in next release. label Feb 28, 2022
@adriangb
Author

Re-reading this I realized that by:

I'm not really sure how good of an idea it is to depend on the synchronous order of containers, though - feels a bit like an implementation detail.

You were talking about a different implementation detail than in my response.

I agree with you, that's an implementation detail of Kubernetes. But seeing that Istio uses it for exactly the same purpose, and that there is currently no better solution (at least that I know of) it seems like it's safe to rely on that until a more official pattern comes about.

@enocom enocom removed their assignment Mar 28, 2022
@enocom
Member

enocom commented Mar 28, 2022

We're currently working to get v2 shipped. Once that's done, we'll revisit this and adjust the priority as needed.

@enocom
Member

enocom commented Apr 7, 2022

Another option would be to use https://github.com/karlkfi/kubexit to support dependency ordering.

@alamothe

alamothe commented Aug 4, 2022

For the record, buster image does contain sleep and I ended up using sleep 5. All other options seem overly complex.

@enocom
Member

enocom commented Feb 1, 2023

We've recently taken a cue from Istio and added a /quitquitquit endpoint (see #1624).

If we wanted to avoid a solution that required curl, I guess we'd add a wait command that tried to connect to http://localhost:15021/healthz/ready or similar, matching Istio again.

@benweint

This would be really helpful for running tools that are CloudSQL-proxy unaware (for example, mysqldump) in Kubernetes with a CloudSQL proxy sidecar. In this case, I want to block mysqldump (or whatever other tool) from running until the proxy has gotten far enough along in its startup sequence that an attempt to use it won't fail due to lack of initialization.

For the case of mysqldump specifically, the official MySQL image has both bash and curl, so a curl-based solution would be OK. (FWIW, the official postgres image lacks curl, though.)

Is there even an existing HTTP endpoint exposed by the CloudSQL proxy today that would be usable for this purpose? The README section on the Localhost Admin Server doesn't mention one.

@enocom
Member

enocom commented Jul 20, 2023

Do you think our HTTP healthchecks would fit the bill? We have startup, liveness, and readiness. In particular the startup probe might be the right one to try.

@benweint

Do you think our HTTP healthchecks would fit the bill? We have startup, liveness, and readiness. In particular the startup probe might be the right one to try.

Ah, I had missed the docs for them! Yes, it does seem like these would probably work (at least if curl or similar is available in the application image).

@michaelbannister

if curl or similar is available in the application image

This wouldn't be ideal: unless the application already needs curl, it would add to the attack surface of the image, and if a container were compromised it could give an attacker a lot more capability than it has currently.

Istio images contain a pilot-agent binary which has specific commands, including a `wait` command – internally this just polls the readiness endpoint of the proxy (source) up until a (configurable) timeout, and exits with an error if the timeout is reached.

This seems to work quite nicely…

@enocom
Member

enocom commented Jul 25, 2023

That's a nice idea and pretty cheap to implement as well.

@enocom enocom added priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. and removed priority: p2 Moderately-important priority. Fix may not be included in next release. labels Jul 25, 2023
@Stono

Stono commented Oct 24, 2023

👋 As a heavy cloudsql-proxy user (hundreds of proxied workloads in a high-churn environment), I thought I'd share my piece.
We have a lifecycle.postStart hook which runs a wait-for-cloudsql-running.sh script which we bake into our cloudsql-proxy image. It's super crude, and just has a simple loop waiting for the port to be listening. It works, but it saddens me how our kubernetes estate is littered with these sorts of scripts to try and crudely implement ordering.
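A minimal version of such a port-polling script might look like this. The function name, default timeout, and one-second poll interval are assumptions for illustration; the `/dev/tcp` pseudo-path is a bash feature that avoids shipping nc or curl in the image.

```shell
#!/usr/bin/env bash
# Hypothetical wait-for-cloudsql-running.sh: poll until a local TCP port
# accepts connections, failing after a timeout. bash's /dev/tcp redirection
# is used so the image needs no extra networking tools.
wait_for_port() {
  port="$1"
  timeout="${2:-30}"
  elapsed=0
  # Each attempt opens (and immediately closes) a connection on fd 3.
  until (exec 3<>"/dev/tcp/127.0.0.1/${port}") 2>/dev/null; do
    elapsed=$((elapsed + 1))
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "timed out waiting for port ${port}" >&2
      return 1
    fi
    sleep 1
  done
}
```

A postStart hook would then call e.g. `wait_for_port 5432 30` with whatever port the proxy is configured to listen on.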

We also use istio, which as mentioned has the pilot-agent wait; which is much cleaner in our eyes (and would mean we don't need to bake our own image with scripts in it). So a similar pattern would be nice.

It's worth remembering that with Kubernetes 1.28, native sidecar containers are being introduced, which "solves" this problem officially.

@enocom
Member

enocom commented Oct 24, 2023

Thanks @Stono -- even with the support for sidecars coming, I'd guess that a wait command will still be useful in other situations. We have a bunch of stuff higher on the priority list, but we'll try to squeeze this in soon.

@enocom
Member

enocom commented Oct 27, 2023

For reference, this is a good overview of the post start hook approach: https://medium.com/@marko.luksa/delaying-application-start-until-sidecar-is-ready-2ec2d21a7b74

@GergelyKalmar

I have a related problem: I'd like to use Docker Compose's health checks to bring up the proxy before starting dependent services, but I can't seem to ping the endpoints from within the container image.

It would be awesome if there was a health check defined in the cloud-sql-proxy Dockerfile itself (see https://docs.docker.com/engine/reference/builder/#healthcheck).

@enocom
Member

enocom commented Oct 30, 2023

@GergelyKalmar We'd be happy to investigate and support that (assuming no major technical issues). Would you mind opening a separate feature request for that work?

@ohaibbq

ohaibbq commented Nov 16, 2023

I was able to implement the postStart hook as suggested above by using the alpine image with this script:

sidecar = k8s.V1Container(
    name="cloud-sql-proxy",
    image="gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.7.2-alpine",
    # ....
    lifecycle=k8s.V1Lifecycle(
        post_start=k8s.V1LifecycleHandler(
            _exec=k8s.V1ExecAction(command=[
                "sh",
                "-c",
                f"""until [ "$(wget --server-response 'http://0.0.0.0:{CLOUD_SQL_HTTP_PORT}/startup' -O - 2>&1 | grep -c 'HTTP/1.1 200 OK')" -eq 1 ]; do
                  echo "Waiting for Cloud SQL Proxy to be ready";
                  sleep 10;
              done"""
            ])
        )
    )
)

enocom added a commit that referenced this issue Nov 21, 2023
To help ensure the Proxy is up and ready, this commit adds a wait
command with an optional `--max` flag to set the maximum time to wait.

By default when invoking this command:

./cloud-sql-proxy wait

The Proxy will wait up to the maximum time for the /startup endpoint to
respond. This command requires that the Proxy be started in another
process with the HTTP health check enabled. If an alternate health
check port or address is used, as in:

./cloud-sql-proxy <INSTANCE_CONNECTION_NAME> --http-address 0.0.0.0 \
  --http-port 9191

Then the wait command must also be told to use the same custom values:

./cloud-sql-proxy wait --http-address 0.0.0.0 \
  --http-port 9191

By default the wait command will wait 30 seconds. To alter this value,
use:

./cloud-sql-proxy wait --max 10s

Fixes #1117