
Offloaded HTTPS: Redirect HTTP port traffic to HTTPS #1811

Conversation


@consideRatio consideRatio commented Oct 6, 2020

Summary

In #1190 the question was raised whether this Helm chart could take care of HTTP -> HTTPS redirects when proxy.https.type=offload. I concluded that this is possible in theory, but would require some changes: currently we mash all incoming traffic on ports 80 and 443 onto port 8000 on the CHP pod, and then we can't rely on the ports to know what to redirect.

We don't have CI tests for this, and I have never worked with a proxy.https.type=offload deployment, so I'd really love for someone with experience of such deployments to contribute a review.

Closes #1190.

Exploratory notes

I'm not sure if we can do this redirect. All traffic to the application goes through the proxy-public k8s Service, and if we have configured offloading of the HTTPS part, the proxy-public Service routes to the pod named proxy, which runs the ConfigurableHTTPProxy (CHP) NodeJS server outside of and separate from the hub pod. So perhaps CHP can be configured to send a 301 redirect for HTTP traffic.

In the CHP changelog I found an indication that this would be possible.

In the CHP readme I found confirmation that this is possible:

  --redirect-port <redirect-port>    Redirect HTTP requests on this port to the server on HTTPS
  --redirect-to <port>               Redirect HTTP requests from --redirect-port to this port
  --auto-rewrite                     Rewrite the Location header host/port in redirect responses
  --protocol-rewrite <proto>         Rewrite the Location header protocol in redirect responses to the specified protocol

In the configuration of the CHP pod I found that the command-line flags are determined like this:

- name: chp
  image: {{ .Values.proxy.chp.image.name }}:{{ .Values.proxy.chp.image.tag }}
  command:
    - configurable-http-proxy
    - "--ip=::"
    - "--api-ip=::"
    - --api-port=8001
    - --default-target=http://hub:$(HUB_SERVICE_PORT)
    - --error-target=http://hub:$(HUB_SERVICE_PORT)/hub/error
    {{- if $manualHTTPS }}
    - --port=8443
    - --redirect-port=8000
    - --redirect-to=443
    - --ssl-key=/etc/chp/tls/tls.key
    - --ssl-cert=/etc/chp/tls/tls.crt
    {{- else if $manualHTTPSwithsecret }}
    - --port=8443
    - --redirect-port=8000
    - --redirect-to=443
    - --ssl-key=/etc/chp/tls/{{ .Values.proxy.https.secret.key }}
    - --ssl-cert=/etc/chp/tls/{{ .Values.proxy.https.secret.crt }}
    {{- else }}
    - --port=8000
    {{- end }}
    {{- if .Values.debug.enabled }}
    - --log-level=debug
    {{- end }}

But there is a crux, now that I think of it: all traffic arriving at the CHP server is either original HTTP, or originally HTTPS that has been decrypted into HTTP. For a port-based redirect to work, these two kinds of traffic must arrive on different ports; otherwise this logic would redirect even the HTTPS traffic that has already been decrypted.

Networking investigation

How does traffic flow to the CHP server within Kubernetes?

We need to consider the proxy-public Service, and the Pod, assuming proxy.https.type=offload.

  1. the Service's port and targetPort mappings, routing traffic to the pod
  2. the Pod's container port definitions, which name the Pod->Container port mappings

1 - proxy-public Service -> CHP Pod

ports:
  {{- if $HTTPS }}
  - name: https
    port: 443
    # When HTTPS termination is handled outside our helm chart, pass traffic
    # coming in via this Service's port 443 to the targeted pod's port meant
    # for HTTP traffic.
    {{- if $offloadHTTPS }}
    targetPort: http
    {{- else }}
    targetPort: https
    {{- end }}
    {{- with .Values.proxy.service.nodePorts.https }}
    nodePort: {{ . }}
    {{- end }}
  {{- end }}
  - name: http
    port: 80
    targetPort: http
    {{- with .Values.proxy.service.nodePorts.http }}
    nodePort: {{ . }}
    {{- end }}

2 - CHP Pod -> CHP container

ports:
  {{- if or $manualHTTPS $manualHTTPSwithsecret }}
  - name: https
    containerPort: 8443
  {{- end }}
  - name: http
    containerPort: 8000
  - name: api
    containerPort: 8001

Conclusion

When the chart is configured with proxy.https.type=offload, traffic arriving at the Helm chart's entrypoint, the proxy-public service, on either port 443 or port 80 is routed to the port named http, which is port 8000 on the CHP pod. Due to this, we cannot configure CHP with flags to redirect HTTP to HTTPS, because port-wise all traffic looks the same.

Solution idea - PR needed

The gist is to ensure that, when proxy.https.type=offload, decrypted HTTPS traffic and original HTTP traffic take separate network routes: the decrypted HTTPS traffic flows on ports 443/8443, and the original HTTP traffic flows on ports 80/8000.

  • Open the https named port, 8443, on the CHP pod.
  • In the proxy-public service, stop routing both https and http port traffic to the same CHP pod port: let https/443 map to https/8443, and http/80 map to http/8000.
  • Configure CHP via command-line flags to redirect traffic on http/8000 to the same location but with the HTTPS scheme (sketched below).
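
A rough sketch of how the three bullets above could land in the templates. The CHP flag branch matches the diff hunk quoted further down; the Service and Pod tweaks are my untested assumptions:

    # CHP Pod template: open the https named port also when offloading
    ports:
      {{- if or $manualHTTPS $manualHTTPSwithsecret $offloadHTTPS }}
      - name: https
        containerPort: 8443
      {{- end }}

    # proxy-public Service template: let https/443 target the https pod
    # port even when offloading, instead of collapsing it onto http
    - name: https
      port: 443
      targetPort: https

    # CHP container flags: listen for decrypted HTTPS on 8443, and
    # redirect original HTTP arriving on 8000 to the public HTTPS port
    {{- else if $offloadHTTPS }}
    - --port=8443
    - --redirect-port=8000
    - --redirect-to=443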

This change is a bit sensitive, because we don't have a CI system to verify we don't break functionality for those with proxy.https.type=offload configurations. I think the change makes a lot of sense, but since it is theory all the way rather than based on practical experience, it would be really good to have feedback from those with offload deployments already running.

Review considerations

Assuming proxy.https.type=offload, what happens if...

  • Another pod sends traffic to the internal service IP and $PROXY_PUBLIC_SERVICE_PORT? I think nothing would change: instead of going through the k8s Service's port 443 to the pod's port 8000, traffic goes to the pod's port 8443.
  • Another pod sends traffic to the internal service IP and $PROXY_PUBLIC_SERVICE_HTTP_PORT? I think this means a redirect is added. Assuming the sender tried to send HTTPS traffic, it would fail anyway, because the actual TLS termination is further out; so unless CHP is configured to know its public URL, it won't be able to redirect to itself properly when used internally. I think this is okay.
  • What was the original intent of the proxy.https.type=offload configuration? What does it do? I think the answer is that it simply accepts HTTP traffic on another Service port (443). But what was that meant to accomplish? Is it a workaround for logic outside the Helm chart, or for logic within it?

    {{- else if $offloadHTTPS }}
    - --port=8443
    - --redirect-port=8000
    - --redirect-to=443

@consideRatio consideRatio Oct 6, 2020


What flags should be used here? I figure the goal is really to send back a redirect whose Location header points to the same URL as the original request, but with an https:// scheme.

Flags at our disposal, according to the CHP readme:

  --redirect-port <redirect-port>    Redirect HTTP requests on this port to the server on HTTPS
  --redirect-to <port>               Redirect HTTP requests from --redirect-port to this port
  --auto-rewrite                     Rewrite the Location header host/port in redirect responses
  --protocol-rewrite <proto>         Rewrite the Location header protocol in redirect responses to the specified protocol


manics commented Oct 6, 2020

I don't have any experience with Z2JH's built-in https, I've always used a cluster ingress controller configured with HTTPS.

https://zero-to-jupyterhub.readthedocs.io/en/latest/administrator/security.html#off-loading-ssl-to-a-load-balancer
How does proxy.https.type: offload differ from proxy.https.enabled: false with https enforced at an ingress?
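
For reference, the configuration I'm comparing against would look roughly like this in the chart's values (a sketch using the chart's ingress options; the hostname is a placeholder):

    proxy:
      https:
        enabled: false
    ingress:
      enabled: true
      hosts:
        - hub.example.org
      tls:
        - hosts:
            - hub.example.org
          secretName: hub-example-tls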


consideRatio commented Oct 6, 2020

@manics thank you for your time considering this! ❤️ I'm generally in need of review atm so I love it! Thank you!

@manics that is a very, very good question that I forgot to raise, but I remember having raised it before and becoming quite frustrated with the situation. I believe the reason is specifically to be prepared for already-decrypted HTTPS traffic on 443, as that is the key logic I see implemented in the chart. To me, it sounds like a workaround that doesn't belong in the chart, hmm...


manics commented Oct 6, 2020

If unencrypted traffic is being sent from the external load balancer to port (8)443 on the proxy, that sounds like a bug in one of the components rather than something the Helm chart should have to work around. Does anyone have a reproducible example?


pcfens commented Oct 17, 2020

Hi everyone! I originally wrote the part for proxy.https.type: offload, so hopefully I can help out a little here. It's really only useful when you're off-loading SSL termination to a Kubernetes-managed load balancer managed by the Service object (most common on public clouds).

To listen on port 443 using a Kubernetes-managed cloud provider's load balancer, we have to define the port in the Service resource, but disabling HTTPS (proxy.https.enabled: false) blocks it from being created. It's worth noting that port 80 is always defined, and that the Service type LoadBalancer is the default value.

Setting proxy.https.type: offload creates listeners on ports 80 and 443, but sets the backend to an unencrypted port. It's up to the annotations to configure the load balancer correctly.
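
To make that concrete, here is a hedged sketch of such annotations for the in-tree AWS ELB integration; the annotation keys are the standard Kubernetes ones, while the certificate ARN and other values are placeholders:

    proxy:
      https:
        enabled: true
        type: offload
      service:
        annotations:
          # terminate TLS at the ELB with this ACM certificate
          service.beta.kubernetes.io/aws-load-balancer-ssl-cert: "arn:aws:acm:us-east-1:1234567890:certificate/abc"
          # only the 443 listener speaks TLS
          service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "443"
          # the ELB sends plain HTTP to the backend pods
          service.beta.kubernetes.io/aws-load-balancer-backend-protocol: "http"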

@manics - When you're using an ingress controller, offload and disabled are essentially the same thing. Rather than having the Service object create and set things up, you've set up an HTTPS listener and attached the certificate in your ingress controller configuration (my production install uses Traefik as an ingress controller, and we do exactly this).

We redirect HTTP to HTTPS in our ingress controller because we found it easier than modifying JupyterHub. Another added benefit of using an ingress controller is that we can run multiple applications through a single load balancer and save some money (we host multiple JupyterHub instances that way too).
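
With the widely used ingress-nginx controller, for comparison, the redirect is a single annotation (we use Traefik, but the idea is the same; all names and hosts below are hypothetical):

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: jupyterhub
      annotations:
        # ingress-nginx redirects HTTP to HTTPS for TLS-enabled hosts
        nginx.ingress.kubernetes.io/ssl-redirect: "true"
    spec:
      tls:
        - hosts:
            - hub.example.org
          secretName: hub-example-tls
      rules:
        - host: hub.example.org
          http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: proxy-public
                    port:
                      number: 80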


consideRatio commented Oct 17, 2020

Thank you for helping out @pcfens! ❤️

Currently I'm not managing to follow what you're saying, @pcfens =/

when you're off-loading SSL termination to a Kubernetes-managed load balancer managed by the Service object

  • "Kubernetes managed load balancer": does this part mean to say that there are pods in the k8s cluster that terminates TLS and routes traffic?
  • Kuberenetes managed load balancer managed by the service object: does this part mean to say the k8s Service resource and its annotations are hints for the TLS terminating pods on how to act etc?

Oh yikes... I feel like I just don't understand anything... I give up for now as I want to head to bed, but I hope to raise this question given that I fail to answer it myself: is this Helm chart still required to have this kind of k8s Service logic, pointing two separate Service ports at a single pod port, in order to let k8s clusters on AWS use AWS systems to manage TLS termination?


pcfens commented Oct 17, 2020

Let me try and clarify a little bit. I say Kubernetes managed load balancer to differentiate from a load balancer that you configure yourself and point at a Kubernetes cluster. In this case it's any Service object/resource where spec.type is set to LoadBalancer.
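
In YAML terms, that is simply a Service like this (a minimal, hypothetical example that the cloud integration reacts to by provisioning an external load balancer):

    apiVersion: v1
    kind: Service
    metadata:
      name: proxy-public
    spec:
      type: LoadBalancer   # this is what triggers the cloud integration
      ports:
        - name: http
          port: 80
          targetPort: http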

Annotations are used to attach metadata to objects. Since the Service object doesn't have any Amazon (or other cloud service) specific fields where you'd set things like SSL termination, logging, etc., the authors of most integrations chose to use annotations for the extra config.

The pattern you describe where there are pods that terminate SSL and route traffic is what Ingress controllers do when combined with the Ingress resource.

The current service logic (or perhaps some simplified version of it) is required to support SSL termination on any cloud load balancer without an Ingress controller, not just on Amazon.


consideRatio commented Oct 18, 2020

Thank you for taking the time to help me understand, @pcfens. I'm still not quite there, but getting closer.

I need to process this further, and I'll write some notes in this comment while I'm doing it.

Relevant links

Terminology

  • ALB - application load balancer

Rubber duck

So, traffic arrives at an LB outside the k8s cluster, whose setup is triggered by the k8s Service of type: LoadBalancer because there is an external load balancer provider. The general part of a k8s Service of type LoadBalancer means that nodePorts will be created to reach the pods etc., but otherwise it is also a standard ClusterIP Service. The key here is that there is some glue between the ELB functionality, which is k8s-unaware, and the pods, and that glue interprets the k8s Service resource and its annotations.

I think the k8s-external LB has what AWS calls listeners. I see from an example that these can actually listen on 443, terminate TLS, and then send decrypted traffic to port 80. So my confusion here is why we need port 443 configured on the k8s Service as well as port 80, and not only port 80. But at the same time, all examples so far indicate this, and I've not seen any annotation describing something like that for AWS, allowing for example a k8s Service to only have port: 80, with an annotation declaring that we want the frontend port to be 443 but mapped to the k8s Service port 80.

aws elb create-load-balancer-listeners --load-balancer-name my-load-balancer --listeners Protocol=HTTPS,LoadBalancerPort=443,InstanceProtocol=HTTP,InstancePort=80,SSLCertificateId=ARN

Another question in my mind: if we have an AWS LB configured through a k8s Service like this, does that mean there will be e2e HTTP traffic arriving at port 80, and HTTPS->HTTP traffic arriving at port 443?

Arrrgh... Ah well.. Off to bed now.


manics commented Oct 18, 2020

@pcfens Thanks for your additional explanation. Could you perhaps sketch out a diagram of encrypted and unencrypted data flows between the load-balancer, CHP, and JupyterHub, including the relevant ports on each component? I think that would help us understand what the requirements are when https: offload.

As you can tell there's some confusion here, and it'd be good to have a clear explanation we can use as a reference for future maintainers.


pcfens commented Oct 18, 2020

The link between the CHP and the JupyterHub service always runs over HTTP, so I'm going to ignore everything beyond the proxy.

Let's start with what things look like when proxy.service.type is set to LoadBalancer (the default). When you're running in a public cloud, Kubernetes creates and configures a load balancer for you, so we need to define every port that we want to listen on.

When HTTPS is enabled and you're terminating SSL on the CHP (not offloaded), we end up with things looking like this.

     80                 443
+--------------------------------+
|                                |
|       Cloud Load Balancer      |
|                                |
+-----+-------------------+------+
      |                   |
      |                   |
     http               https
+--------------------------------+
|                                |
|     Configurable HTTP Proxy    |
|                                |
+--------------------------------+

Setting proxy.https.type: offload creates a setup that looks like this. The chart assumes that the user will set proxy.service.annotations to values that work in their environment.

     80                 443
+--------------------------------+
|                                |
|       Cloud Load Balancer      |
|                                |
+--------------+-----------------+
               |
               |
             http
+--------------------------------+
|                                |
|     Configurable HTTP Proxy    |
|                                |
+--------------------------------+

In this case, the CHP can't tell whether the client is connected to the load balancer using HTTP or HTTPS without using headers, which the CHP can't do today.

To redirect HTTP to HTTPS when we offload SSL termination we need to modify the CHP to look at headers. The options used in this PR (redirect-port and redirect-to) work based on ports, so it should work fine in cases where we don't offload SSL.

@consideRatio - Can you help us understand your use case a little bit better? If we have a better idea of what you're trying to do I might be able to help with a patch.


manics commented Oct 19, 2020

Thanks for the diagram!

In this case, the CHP can't tell whether the client is connected to the load balancer using HTTP or HTTPS without using headers, which the CHP can't do today.

Is this something we could fix in CHP and avoid the complexity of the Helm chart workaround? CHP has some support for X-Forwarded headers; I've got an open PR to modify the behaviour slightly due to how Tornado (and therefore JupyterHub) handles the header: jupyterhub/configurable-http-proxy#267


pcfens commented Oct 19, 2020

Adding the ability to handle X-Forwarded-Port and/or X-Forwarded-Proto to CHP would be a good addition, but it won't remove the need for the ability to offload SSL termination. It does, however, make this PR usable with only minor changes.


manics commented Oct 19, 2020

#1813 adds the ability to specify additional CHP arguments, which I think could cover everything this PR does and more? If we can't figure out what the correct behaviour should be here, that could be an alternative?


pcfens commented Oct 19, 2020

Thanks for pointing that one out - I didn't realize there was already an option to do that. #1813 exposes the options used here, and more.

Since redirecting to HTTPS should be a common use case, we might want to add some documentation about the redirect flags to the HTTPS section of the docs.

@consideRatio

Ah hmmm, my interpretation is that the Cloud Load Balancer is network infrastructure external to k8s, which maps into k8s through a k8s Service whose NodePort and ClusterIP are configured automatically to work well with the externally set up LoadBalancer.

Interpretation evidence 1

[image: see the linked Kubernetes documentation]
Source: https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer/

Interpretation evidence 2

[image: see the linked Kubernetes documentation]
Source: https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer/#external-load-balancer-providers

Interpretation overview

This is how I think of it.

1: The AWS LB listens on what the k8s Service listens on, it seems,
   which is weird: if the external cloud LB is meant to terminate TLS,
   then the k8s Service should not receive any HTTPS traffic at all,
   only decrypted HTTPS traffic on a single port, in my mind.

     80                 443
      |                  |
+--------------------------------+
|                                |
|       Cloud Load Balancer      |
|                                |
+-----+-------------------+------+
      |                  |
2. Automatic node ports are configured on the k8s Service by the
   external load balancer provider, which observes k8s Service
   resources to automate the creation of a cloud-based external load
   balancer that sends traffic onwards to the k8s Service through
   node ports. The AWS external LB's logic seems to be to map the
   ports it listens on directly to the k8s Service ports, without
   mapping decrypted HTTPS traffic from port 443 to port 80 or
   similar, which forces two entries here
      |                  |
     80                 443
      |                  |
+--------------------------------+
|                                |
|          K8s Service           |
|                                |
+-----+-------------------+------+
      |                  |
3. We end up forced to define two ports on the k8s Service, both
   mapping to 8000, and must manage the port 80 traffic on the
   external LoadBalancer as well, even though we probably want that
   traffic to never enter and instead be redirected to HTTPS directly
   at the external load balancer.

     8000              8000
      |                  |
+--------------------------------+
|                                |
|  Configurable HTTP Proxy Pod   |
|                                |
+--------------------------------+

Conclusion

I think the AWS external load balancer doesn't provide sufficient configuration options to avoid mapping both port 80 and port 443 of the k8s proxy-public Service to the proxy pod, and that we are forced to support the proxy.https.type=offload setting because of this.

This seems like a shortcoming of the AWS LB, and I'm not sure whether other cloud providers have the same limitation, or whether they can be configured to only pass on decrypted HTTPS traffic by doing something like I describe below, which I think is a better configuration.

My wish for an external cloud LB

1: An external LB is configured through k8s Service annotations to
   automatically redirect HTTP requests on port 80 to HTTPS requests
   on port 443.

 80 (*instant redirect)  443
  |*                      |
+--------------------------------+
|                                |
|       Cloud Load Balancer      |
|                                |
+-----+-------------------+------+
                          |
     80 (HTTPS that has become HTTP go to port 80)
      |
+--------------------------------+
|                                |
|          K8s Service           |
|                                |
+-----+-------------------+------+
      |
     8000 (We remain blissfully unaware that HTTPS has been used)
      |
+--------------------------------+
|                                |
|  Configurable HTTP Proxy Pod   |
|                                |
+--------------------------------+

@consideRatio consideRatio marked this pull request as draft October 22, 2020 23:20
@consideRatio

We have discussed why the k8s Service has both an HTTP and an HTTPS port when proxy.https.type=offload is configured: an external load balancer could reasonably accept two ports and send the decrypted TLS traffic to port 80, but the AWS external LB doesn't do that.

I think the redirect this PR suggests may be problematic, though.

The issue is that TLS termination is done outside of the k8s cluster, so if we redirect to https://domain-unchanged:443, a cluster-local request will go wrong: anyone wanting to send HTTP directly to proxy-public would then need to send traffic to http://proxy-public:443, which is quite weird.

I think this PR makes some sense, but I'm open to closing it unless someone can verify it works properly and champions it. I mostly opened it to close #1190.

@consideRatio

Closing this PR as I don't see it getting merged.

Successfully merging this pull request may close these issues.

proxy.https.type=offload: Can CHP redirect requests originally being HTTP requests to HTTPS?