Queue Proxy health checks incompatible with non-HTTP/2 applications #15432

Open
braunsonm opened this issue Jul 31, 2024 · 14 comments · May be fixed by #15436
Labels
area/networking, kind/bug

Comments

@braunsonm

braunsonm commented Jul 31, 2024

/area networking

What version of Knative?

1.15.0

Expected Behavior

Legacy applications may have undefined behavior when HTTP/2 upgrade requests are made. Knative should gracefully handle those errors and downgrade the health check attempt to HTTP/1 or HTTP/1.1.

Actual Behavior

Applications which do not support HTTP/2 will not handle the upgrade request properly. In our case, a legacy application returns a 500 when an OPTIONS request is sent to upgrade the connection. Knative fails the entire health check because of this, even though the same check over HTTP/1 or HTTP/1.1 would properly return a 200.

Steps to Reproduce the Problem

  1. Create an application which does not support HTTP/2 or returns a 500 on the OPTIONS request
  2. Notice that Knative will start failing the health checks and the pod will be killed

Additional Context

It is not within the Kubernetes spec that an application must support HTTP/2 or that it should expect an OPTIONS call to its health/liveness probes. Only GET is part of the contract, which the Queue Proxy does not follow.

I believe the logic is flawed in the queue proxy's HTTP probes here.

return maxProto, fmt.Errorf("HTTP probe did not respond Ready, got status code: %d", res.StatusCode)

When an error occurs during the upgrade, maxProto should be set to 1 and Knative should stop trying to make HTTP/2 requests. Currently because of this line, HTTP/2 will be retried indefinitely and HTTP/1 will never be attempted.

braunsonm added the kind/bug label Jul 31, 2024
braunsonm changed the title from "Queue Proxy does not gracefully handle applications which do not support HTTP/2" to "Queue Proxy health checks incompatible with anything but HTTP/2" Jul 31, 2024
braunsonm changed the title from "Queue Proxy health checks incompatible with anything but HTTP/2" to "Queue Proxy health checks incompatible with non-HTTP/2 applications" Jul 31, 2024
@dprotaso
Member

I'm confused: what's making HTTP/2 requests? Knative health checks are HTTP/1.

@braunsonm
Author

braunsonm commented Jul 31, 2024

@dprotaso I can see requests being made from the queue-proxy to the user-container and attempting to upgrade to HTTP/2 during the readiness probes.

I believe the code I linked above is the logic the queue-proxy uses to perform the HTTP/2 upgrade for these probes. This happens when the feature gate for auto-detecting HTTP/2 is set to true.

@dprotaso
Member

Oh, interesting. I didn't realize this was added. The h2c upgrade is deprecated: https://datatracker.ietf.org/doc/html/rfc9113#section-3.1

We should probably always do HTTP/1 unless the user has specified h2c, OR we change the detection to use h2c prior knowledge.

@dprotaso
Member

You don't have an example app where this breaks?

@braunsonm
Author

braunsonm commented Jul 31, 2024

I agree that probes should have always been HTTP/1 to match what would be expected from Kubernetes. But if you want this to remain so you can tell if an app supports HTTP/2 or not, then I would suggest at least gracefully failing if the HTTP/2 check fails (fallback to HTTP/1).

Unfortunately, I don't have a sample that I could share, but I think it should be reproducible with an app that throws a 500 whenever an OPTIONS request is made (i.e., the upgrade request).

@skonto
Contributor

skonto commented Aug 1, 2024

Hi @braunsonm, thanks for reporting this.

> This happens when the feature gate for auto-detecting HTTP2 is set to true

Would it work if you turn this off for now or is this something that fails in other scenarios?

@braunsonm
Author

> Would it work if you turn this off for now or is this something that fails in other scenarios?

It does work if it is set to false, but that does mean other applications deployed on Knative can no longer benefit from HTTP/2 which is unfortunate.

@skonto
Contributor

skonto commented Aug 1, 2024

> but that does mean other applications deployed on Knative can no longer benefit from HTTP/2 which is unfortunate.

That autodetect feature was never completed. So if the app is using HTTP/2, do you mean that the QP is not going to use it with autodetect off? What do you mean by apps on Knative not being able to benefit from HTTP/2? Could you elaborate?

@braunsonm
Author

> What do you mean apps on Knative cant benefit from HTTP/2, could you elaborate?

I was under the impression that the HTTP/2 autodetection feature was required for HTTP/2 to be used between the activator and ksvcs. Is that not true?

@skonto
Contributor

skonto commented Aug 1, 2024

This has to do with probes here. We do support HTTP/2 without setting that auto-detect property, which, by the way, is not finished as a feature (check our gRPC tests, for example). Also see here for what happens when you turn that on: https://github.com/knative/serving/blob/main/pkg/queue/readiness/probe.go#L233-L242.
We only try the upgrade if maxProto = 0; see https://github.com/knative/serving/blob/main/pkg/queue/health/probe.go#L163
cc @dprotaso in case he has more to add on the background of this feature.

@dprotaso
Member

dprotaso commented Aug 1, 2024

Right now, supporting HTTP/2 requires people to set the containerPort name to h2c.

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: grpc-ping
  namespace: default
spec:
  template:
    spec:
      containers:
      - image: docker.io/{username}/grpc-ping-go
        ports:
          - name: h2c
            containerPort: 8080

The feature has an issue here: #4283. The idea is to detect the protocol without the labelling.

@braunsonm
Author

I see. We use func, which doesn't support naming the port, so that's why the autodetection was going to be required for us.

@skonto
Contributor

skonto commented Oct 24, 2024

@braunsonm is this something functions could help with instead? Do you mind opening an issue there too?

@braunsonm
Author

No, it is not. @skonto, this is broken in functions because of the flawed implementation in serving.
