no ingress, nodePort network access between two deployments running on the same provider #1

Closed
andy108369 opened this issue Jan 20, 2023 · 3 comments
Labels: P2, repo/provider (Akash provider-services repo issues)

Comments

@andy108369

It appears that the Akash deployment network policy blocks ingress & nodePort access between two deployments running on the same provider.

They are accessible when the deployments are running on different providers.

ingress & nodePorts (global: true) are expected to be open even when deployments are running on the same provider.

andy108369 added the repo/provider Akash provider-services repo issues label Jan 20, 2023

andy108369 commented Feb 17, 2023

Hairpinning might be a solution here as @chainzero mentioned.

Update:
Hairpin mode allows a Kubernetes pod to access a service through the node's external IP (or the NodePort) even when the traffic originates from within the same node. This is particularly useful when a pod needs to access a service exposed on the same node but through the external IP or NodePort. This is in addition to the netpol, of course.
We may need to look into this deeper.
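For reference, hairpinMode is a kubelet setting rather than a netpol or kube-proxy one; a minimal sketch of where it is set, assuming a config-file based kubelet (the file path is just an example):

# /var/lib/kubelet/config.yaml (path differs per cluster setup)
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# how the bridge/veth is set up so a pod can reach itself through a Service VIP;
# valid values: promiscuous-bridge, hairpin-veth, none
hairpinMode: promiscuous-bridge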

An additional interesting point on what hairpin mode was meant to solve for K8s:
We should consider that the Linux kernel's hairpin_mode was previously known to potentially cause kernel panics; this was a recognized issue at the time.

A couple of interesting refs on CNI plugins disabling hairpin traffic:
projectcalico/calico#6186 (comment)
ray-project/ray#33030 (comment)
containernetworking/cni#476

Update 2:
In our case hairpin is already enabled (hairpinMode is a kubelet setting, exposed via the node's configz endpoint):

# kubectl proxy
# curl -s -X GET http://127.0.0.1:8001/api/v1/nodes/node1/proxy/configz | jq . | grep -i hairp
    "hairpinMode": "promiscuous-bridge",

andy108369 commented Feb 28, 2023

ingress access

This network policy is likely what breaks the communication:

https://github.com/akash-network/provider/blob/v0.2.1/cluster/kube/builder/netpol.go#L123-L138

$ kubectl -n $ns get netpol -o yaml
...
...
    - to:
      - ipBlock:
          cidr: 0.0.0.0/0
          except:
          - 10.0.0.0/8
          - 192.168.0.0/16
          - 172.16.0.0/12

The 10.0.0.0/8 exception overlaps kube_service_addresses: 10.233.0.0/18, kube_pods_subnet: 10.233.64.0/18 and calico_pool_cidr: 10.233.64.0/20, restricting communication between different deployments.

$ ipcalc 10.0.0.0/8
Address:   10.0.0.0             00001010. 00000000.00000000.00000000
Netmask:   255.0.0.0 = 8        11111111. 00000000.00000000.00000000
Wildcard:  0.255.255.255        00000000. 11111111.11111111.11111111
=>
Network:   10.0.0.0/8           00001010. 00000000.00000000.00000000
HostMin:   10.0.0.1             00001010. 00000000.00000000.00000001
HostMax:   10.255.255.254       00001010. 11111111.11111111.11111110
Broadcast: 10.255.255.255       00001010. 11111111.11111111.11111111
Hosts/Net: 16777214              Class A, Private Internet
kubespray$ git grep -E '^kube_pods_subnet:|^kube_service_addresses:|^calico_pool_cidr:' | column -t | sort -d -k2,2
roles/kubespray-defaults/defaults/main.yaml:kube_service_addresses:              10.233.0.0/18
inventory/sample/group_vars/k8s_cluster/k8s-cluster.yml:kube_service_addresses:  10.233.0.0/18
roles/kubespray-defaults/defaults/main.yaml:kube_pods_subnet:                    10.233.64.0/18
inventory/sample/group_vars/k8s_cluster/k8s-cluster.yml:kube_pods_subnet:        10.233.64.0/18
docs/calico.md:calico_pool_cidr:                                                 10.233.64.0/20

While this netpol is what keeps pods in different namespaces isolated from each other (you don't want someone poking your internal app's ports from their own deployment), the only feasible solution I see would be to allow ingress communication by permitting egress to ingress-nginx:

    - to:
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: ingress-nginx
        podSelector: {}

Quick patch:

$ns is the namespace of the deployment from which you want to enable access to the ingress resources served behind the ingress-nginx controller.

kubectl -n $ns patch netpol akash-deployment-restrictions --type=json -p='[{"op": "add", "path": "/spec/egress/-", "value":{"to":[{"namespaceSelector":{"matchLabels":{"kubernetes.io/metadata.name":"ingress-nginx"}},"podSelector":{}}]}}]'
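A quick sanity check that the patch took effect and that another deployment's ingress hostname becomes reachable (the deployment name and hostname below are placeholders, not taken from this issue):

kubectl -n $ns get netpol akash-deployment-restrictions -o jsonpath='{.spec.egress}' | jq .
# from a pod inside $ns, hit an ingress-nginx-served hostname of another deployment on the same provider
kubectl -n $ns exec deploy/<your-deployment> -- curl -sI http://<other-deployment-ingress-hostname>/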

nodePort access restriction issue within the same cluster

When accessing a nodePort service via the public IP from within the same Kubernetes cluster, you may encounter connectivity issues.

What's Happening:

When a Pod attempts to access a nodePort service using the cluster's public IP, the traffic behaves as follows:

  • The traffic leaves the Pod and is routed to the public IP, which Kubernetes treats as external traffic.
  • The traffic then "turns around" at the node's network interface, rerouting back to the nodePort. Since the public IP is viewed as external, Kubernetes allows the traffic back into the node.
  • However, because the traffic originates from within the cluster, it retains its internal source IP (e.g., 10.x.x.x for Oblivus or 172.16.x.x for Valdi providers).

The ingress rule in the akash-deployment-restrictions policy blocks this returning traffic because it sees the source IP as part of the internal range, which is normally restricted to prevent security risks.
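One way to confirm the preserved internal source IP is to capture on the node hosting the target pod while curling the public IP and nodePort from another deployment's pod (the nodePort value is a placeholder; assumes tcpdump is installed on the node):

# on the node hosting the target pod; 30080 is a hypothetical nodePort
tcpdump -ni any 'tcp port 30080' -c 10
# the incoming SYNs show a 10.x.x.x / 172.16.x.x source address rather than the public IP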

Example of the akash-deployment-restrictions ingress rule:

$ kubectl -n $ns get netpol akash-deployment-restrictions -o yaml 
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  generation: 1
  labels:
    akash.network: "true"
    akash.network/namespace: qczh7hdmtdtc3yugadabhv4tiv313fjam37f5mj6aaz4z
  name: akash-deployment-restrictions
  namespace: qczh7hdmtdtc3yugadabhv4tiv313fjam37f5mj6aaz4z
spec:
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          akash.network/namespace: qczh7hdmtdtc3yugadabhv4tiv313fjam37f5mj6aaz4z
  - ports:
    - port: 53
      protocol: UDP
    - port: 53
      protocol: TCP
    to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
      podSelector:
        matchLabels:
          k8s-app: kube-dns
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0
        except:
        - 10.0.0.0/8
        - 192.168.0.0/16
        - 172.16.0.0/12
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          akash.network/namespace: qczh7hdmtdtc3yugadabhv4tiv313fjam37f5mj6aaz4z
  - from:
    - namespaceSelector:
        matchLabels:
          app.kubernetes.io/name: ingress-nginx
      podSelector:
        matchLabels:
          app.kubernetes.io/name: ingress-nginx
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress

The akash-deployment-restrictions NetworkPolicy may block nodePort traffic due to an egress rule that prevents traffic to internal IP ranges. This rule can block traffic that "turns around" and reenters the cluster via a nodePort if the source IP falls within those internal ranges.

Potential Solution:

To enable nodePort access between two Akash deployments on the same provider, you would need to remove the - 10.0.0.0/8 line (or whichever entry matches the internal network range used for your K8s cluster; check with kubectl get nodes -o wide or calicoctl get nodes -o wide) from the to.ipBlock.except block (which currently applies to cidr: 0.0.0.0/0).
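For example, with the egress rules ordered exactly as in the policy above (the 0.0.0.0/0 ipBlock is the third egress rule and 10.0.0.0/8 its first except entry), a JSON patch along these lines would drop just that entry; the array indices are an assumption and must be verified against the live policy first:

kubectl -n $ns patch netpol akash-deployment-restrictions --type=json -p='[{"op": "remove", "path": "/spec/egress/2/to/0/ipBlock/except/0"}]'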

Community workaround

A provider operator is using the following network policy to get around this restriction on their provider by explicitly allowing their internal range for all deployments:

{{namespace}} is the user's Akash deployment namespace; they use this as a template in a script that iterates over the namespaces and applies this network policy to each of them.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-10-0-0-0-8
  namespace: {{namespace}}
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - ipBlock:
            cidr: 10.0.0.0/8
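A minimal sketch of such a script, assuming the template above is saved as allow-from-10-0-0-0-8.yaml and that the Akash deployment namespaces carry the akash.network=true label (adjust the selector if your provider labels them differently):

for ns in $(kubectl get ns -l akash.network=true -o jsonpath='{.items[*].metadata.name}'); do
  sed "s/{{namespace}}/$ns/" allow-from-10-0-0-0-8.yaml | kubectl apply -f -
done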

Extra: Asymmetric routing issue

NOTE: This isn't directly related to the akash-deployment-restrictions NetworkPolicy issue, but I wanted to mention it since I spent a considerable amount of time debugging it on one of the providers.

Another reason for this not to work, even when the network policy permits the traffic, is an asymmetric routing issue.

I verified this on Oblivus and confirmed that asymmetric routing is not an issue there. I tested it with this deployment: two Pods running on different nodes (see details here: https://gist.github.com/andy108369/2be9452dd0e9f2f292265bb335df2f58). I was able to access the nodePort both from the host and from a Pod in a different namespace. Additionally, there are no network policies, such as the akash-deployment-restrictions policy mentioned in point 1, set between these namespaces.

However, this is not working on the Valdi provider, likely because they do not enable this support on their router.

You can view the recording of this here (6 minutes long):
asciicast

Accessing the nodePort from within the same Kubernetes cluster can be problematic. This is primarily because intra-cluster communication typically relies on the ClusterIP, not the NodePort. When we attempt to access a NodePort using the node's IP within the cluster, it may not work as expected. Kubernetes generally routes traffic within the cluster via the ClusterIP, which bypasses the NodePort entirely. As a result, using the node's IP or external IP for internal cluster communication might fail, depending on the Kubernetes networking setup. This issue usually occurs due to asymmetric routing — specifically when the K8s [Calico] internal network is configured on a network that isn't NATted for WAN access. This causes traffic to be routed through a different network, as demonstrated in the asciinema recording above.
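A quick way to spot this kind of asymmetric routing on a node is to compare the interface and source address used for the default route with the addresses Kubernetes/Calico are actually bound to (assumes iproute2 on the node):

# which interface/source IP the node uses to reach the internet
ip route get 1.1.1.1
# addresses/interfaces present on the node, to compare against the K8s InternalIP
ip -br addr
kubectl get nodes -o wide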

Extra: Kubernetes can also struggle with hairpin NAT (loopback) scenarios, where a pod tries to access a service using the node's external IP. In these cases, the Pod might be unable to reach the NodePort service via the external IP, even though it’s accessible from outside the cluster. This depends on the router/network config (example https://stackoverflow.com/a/6059622 ).
But this is not our case, as hairpin is already enabled (kubelet hairpinMode):

# kubectl proxy
# curl -s -X GET http://127.0.0.1:8001/api/v1/nodes/node1/proxy/configz | jq . | grep -i hairp
    "hairpinMode": "promiscuous-bridge",

To avoid these issues, it's best to ensure that all internal communication between pods uses the ClusterIP instead of the NodePort. However, if we need to test NodePort access within the cluster, we need to consider the following steps:

  • Ensure networking config (NAT/router) supports hairpinning. (Not our case)
  • Ensure that the NetworkPolicy settings permit traffic on the NodePort from internal sources, as described above.
  • Access the NodePort service from within the cluster using the node's internal IP rather than the external/public IP (see the sketch after this list).
  • Redeploy Kubernetes over the internal network that is routed to the internet (WAN) to fix the asymmetric routing issue.
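For the internal-IP check in particular, a minimal sketch (the nodePort value and deployment name are placeholders):

NODE_IP=$(kubectl get nodes -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')
# run the curl from a pod in another deployment's namespace; 30080 is a hypothetical nodePort
kubectl -n $ns exec deploy/<your-deployment> -- curl -sI http://$NODE_IP:30080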

andy108369 commented Aug 20, 2024

I was unable to reproduce either the Ingress or nodePort issues when testing two separate SDLs running on different provider nodes. The tests included H100 and A100 Oblivus, as well as RTX4090 Evergreen providers.

The akash-deployment-restrictions network policy appears to be functioning correctly, except potentially when there is a specific network configuration issue on the provider's side, such as an asymmetric routing problem with the Valdi provider. Alternatively, the issues I encountered a year ago might have been due to bugs or configuration differences in older versions of Calico and Kubespray, as noted here.

For now, I will close this issue until it can be clearly reproduced under standard networking conditions without any special provider configurations that might contribute to the problem.

Regarding the nodePort issue previously encountered, I'm left wondering how changing the akash-deployment-restrictions resolved the problem at that time. It might have been related to a policy refresh triggered by updating the akash-deployment-restrictions network policy.

Alternatively, this issue could also be linked to the following Calico bug, which was fixed in version 3.27.0. The bug specifically addressed the nodePort service source NAT issue:

Fix nodeport service src NAT issue, we do need to src NAT nodeports when the service is not local

This fix is not included in Kubespray v2.24.2; it is only introduced with Kubespray v2.25.0.


For future reference

Next time, test two different network layouts when looking at this issue / network policy:

  • the public IP is set directly on the interface and K8s is configured over the internal network (either 10., 172.16., or 192.168.);
  • the public IP is NAT'ed to the same network the internal K8s network is configured on (either 10., 172.16., or 192.168.).

kube-proxy is configured in ipvs mode (the default in Kubespray v2.24.2).
Calico is configured in VXLAN mode (the default Kubespray v2.24.2 Calico config).
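A couple of hedged checks for confirming those defaults on a given cluster (resource names may differ depending on how kube-proxy and Calico were deployed):

# kube-proxy mode, as rendered into its ConfigMap
kubectl -n kube-system get cm kube-proxy -o yaml | grep -w mode
# Calico encapsulation on the default IP pool
kubectl get ippools.crd.projectcalico.org default-pool -o jsonpath='{.spec.vxlanMode}{"\n"}'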
