calico pod cross-node access failed with ipv6 #6877
@coutinhop Could you help?
@cyclinder I'll try to reproduce this later to investigate, but could you share some more details on your setup? Is this in a cloud or on-prem? How do your ippool and felixconfig yamls look? I assume the failing ping is to the ipv6 addresses, right? Does IPv4 ping work? (If you could share the yamls from your pods, that would be great too.)
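Something like the following should collect those (a sketch, assuming calicoctl is already configured to talk to the cluster datastore; `<pod-name>` is a placeholder):

```bash
# Gather the Calico and pod configs requested above.
calicoctl get ippools -o yaml                     # all IP pools
calicoctl get felixconfiguration default -o yaml  # felix settings
kubectl get pods -o wide                          # pod IPs and their nodes
kubectl get pod <pod-name> -o yaml                # full manifest of a test pod
```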
Thanks for looking at this @coutinhop @song-jiang! I built a k8s cluster via kubespray:

```
[root@master ~]# kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short. Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.4", GitCommit:"95ee5ab382d64cfe6c28967f36b53970b8374491", GitTreeState:"clean", BuildDate:"2022-08-17T18:54:23Z", GoVersion:"go1.18.5", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.4
Server Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.4", GitCommit:"95ee5ab382d64cfe6c28967f36b53970b8374491", GitTreeState:"clean", BuildDate:"2022-08-17T18:47:37Z", GoVersion:"go1.18.5", Compiler:"gc", Platform:"linux/amd64"}

[root@master ~]# calicoctl version
Client Version: v3.23.3
Git commit: 3a3559be1
Cluster Version: v3.23.3
Cluster Type: kubespray,kubeadm,kdd
```

calico ippool:

```
[root@master ~]# calicoctl get ippools -o yaml
apiVersion: projectcalico.org/v3
items:
- apiVersion: projectcalico.org/v3
  kind: IPPool
  metadata:
    creationTimestamp: "2022-09-26T03:42:51Z"
    name: default-pool
    resourceVersion: "689"
    uid: ca248dec-8973-4a1c-8b6e-db2841cba1b4
  spec:
    allowedUses:
    - Workload
    - Tunnel
    blockSize: 26
    cidr: 10.233.64.0/18
    ipipMode: Never
    natOutgoing: true
    nodeSelector: all()
    vxlanMode: Always
- apiVersion: projectcalico.org/v3
  kind: IPPool
  metadata:
    creationTimestamp: "2022-09-26T03:43:01Z"
    name: default-pool-ipv6
    resourceVersion: "554152"
    uid: 4dad544a-b517-40db-a4ef-b7a12646ffb4
  spec:
    allowedUses:
    - Workload
    - Tunnel
    blockSize: 122
    cidr: fd85:ee78:d8a6:8607::1:0/112
    ipipMode: Never
    nodeSelector: all()
    vxlanMode: CrossSubnet
```

I found that cross-node pods fail to ping6 each other, but ipv4 works.
I tried to change the ipv6 tunnel mode from CrossSubnet to Always and ran a cross-node ping test from a pod:

```
[root@master ~]# calicoctl patch ippools default-pool-ipv6 -p '{"spec": {"vxlanMode": "Always"}}'
Successfully patched 1 'IPPool' resource

[root@master ~]# kubectl exec -it test111-7c9f87b884-p5jkm sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
/ # ping 10.233.105.188
PING 10.233.105.188 (10.233.105.188): 56 data bytes
64 bytes from 10.233.105.188: seq=0 ttl=62 time=1.260 ms
^C
--- 10.233.105.188 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 1.260/1.260/1.260 ms
/ # ping fd85:ee78:d8a6:8607::1:ebbc
PING fd85:ee78:d8a6:8607::1:ebbc (fd85:ee78:d8a6:8607::1:ebbc): 56 data bytes
^C
--- fd85:ee78:d8a6:8607::1:ebbc ping statistics ---
12 packets transmitted, 0 packets received, 100% packet loss
```

Strange 🤔, I noticed that the ipv6 tunnel routing seems to be out of order:
calico-node logs keep complaining about failing to add a route:
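For anyone following along, those errors can be pulled out of the logs with something like this (a sketch, assuming the standard Calico DaemonSet in kube-system):

```bash
# Grep the route errors out of every calico-node pod; the namespace and
# label assume a standard manifest-based Calico install.
kubectl logs -n kube-system -l k8s-app=calico-node -c calico-node --tail=500 \
  | grep -i "failed to add route"
```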
Finally, here are the environment variables for calico-node, in which I don't see any problem:
Friendly ping :) @coutinhop, could you take a look at this?
@cyclinder sorry for the delay, I've been a bit busy lately... I'll try to look at this as soon as possible, but as a quick check, could you see if enabling natOutgoing for the IPv6 pool has any effect? (Maybe a long shot, but it was the only difference between v4 and v6 that I could spot in your configs.) I'll try to reproduce it later this week and investigate.
Oh no, sorry, and thank you for the reply! I tried patching the IPv6 pool (changing natOutgoing to true), but it still doesn't work.
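The patch was along these lines (same calicoctl pattern as the vxlanMode patch above):

```bash
# Enable NAT for outgoing traffic on the IPv6 pool.
calicoctl patch ippools default-pool-ipv6 -p '{"spec": {"natOutgoing": true}}'
```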
I think the root cause is that the ipv6 routing is not working properly:
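To show what I mean, here is roughly how the tunnel state can be inspected on a node (a sketch; `vxlan-v6.calico` is the usual name for Calico's IPv6 VXLAN device, so adjust if yours differs):

```bash
# Dump routes, neighbor entries, and link details for the v6 VXLAN device.
ip -6 route show dev vxlan-v6.calico   # tunnel routes to remote pod blocks
ip -6 neigh show dev vxlan-v6.calico   # neighbor entries for the tunnel
ip -d link show vxlan-v6.calico        # -d shows VXLAN details (VNI, port)
```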
Hi @coutinhop, any update here? If you have free time, please help me. Thanks :)
@cyclinder sorry for the delay, I'm still having trouble reproducing it, in addition to not being familiar with kubespray. By chance, I came across issue #6273, where they were having a similar problem (what kernel version are you running?). Also, would you have more basic instructions on how to get a setup similar to yours on kubespray? Thanks!
@coutinhop thanks for your reply! The kernel version is shown below:
I'm trying to figure out why adding tunnel routes fails. I added the tunnel route manually on the node, and netlink returned the same error: "No route to host".
At first, I suspected a problem with the neighbor table of the vxlan tunnel interface; two entries appear here that are not expected.
I manually deleted the wrong one (or restarted calico-node), leaving only the correct one, but trying to add the tunnel route again returned the same error.
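Roughly what that manual attempt looks like (the block CIDR and gateway here are placeholders; the `onlink` flag matches how Calico programs its VXLAN routes, as far as I can tell):

```bash
# Re-adding the tunnel route by hand; addresses are illustrative only.
# The gateway would be the remote node's vxlan-v6.calico address.
ip -6 route add fd85:ee78:d8a6:8607::1:ebc0/122 \
    via fd85:ee78:d8a6:8607::1:1 dev vxlan-v6.calico onlink
# On the 3.10 kernel this fails with: RTNETLINK answers: No route to host
```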
This is strange: I tried the same operation on a node with a 5.15 kernel and it works:
Other info:
Kernel 5.15:
Kernel 3.10:
I don't think there's anything special here. @coutinhop can you take a look? Looking forward to your reply! Thanks a lot!
@coutinhop Hi~ can you please take a look? Thank you for your help. |
@cyclinder sorry I haven't been having a lot of time to put into this... Can you confirm that this is: not working for kernel 3.10, and working for kernel 5.15? And if that is true, would upgrading to 5.15 be an acceptable fix/workaround? |
No worries! @coutinhop
Yes, I think it's a workaround, but it's not the best way to fix it. We should figure out why vxlan-ipv6 doesn't work on kernel 3.10. I suspect the root cause is that kernel 3.10 does not fully support ipv6 vxlan.
I think my suspicions were correct. I looked through the Linux source code and found that vxlan ipv6 is only supported from kernel 3.12 onwards. So we should document the kernel version compatibility of vxlan-ipv6.
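A quick way to check a node against that floor (a minimal sketch using `sort -V` for the version comparison):

```bash
# Does the running kernel meet the 3.12 minimum for IPv6 VXLAN?
required="3.12"
current="$(uname -r | cut -d- -f1)"
if [ "$(printf '%s\n%s\n' "$required" "$current" | sort -V | head -n1)" = "$required" ]; then
    echo "kernel $current: IPv6 VXLAN should be available"
else
    echo "kernel $current: too old for IPv6 VXLAN (needs >= $required)"
fi
```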
Thanks @cyclinder for getting to the bottom of this! Indeed, the 'TODO' is there up until v3.11, and it goes away from v3.12 onwards. I see you also pushed a docs PR, thanks!
@cyclinder Do you have time to help check my issue?
My kernel version is shown above, but I still meet the same problem as you.
@meizhuhanxiang It looks like you are not using ipv6 vxlan mode; can you show the output of `calicoctl get ippools -o wide`?
I am having the same problem as meizhuhanxiang. However, I have ipv6 vxlan enabled, NAT enabled, and kernel version 5.10+. Any ideas? One thing I suspect is that Calico's IP autodetection bound to the IPs of a bridge interface, but it works for IPv4, so I am not sure.
```
NAME   CIDR   NAT   IPIPMODE   VXLANMODE   DISABLED   DISABLEBGPEXPORT   SELECTOR
```
My issue still exists. However, I have another cluster where the Linux kernel is exactly the same as the machines in this cluster, and that cluster does not have this problem, so it should not be caused by the kernel version. |
Expected Behavior
Calico pod cross-node access works with ipv6, regardless of the tunnel mode (vxlan Always, CrossSubnet, or Never).
Current Behavior
Calico pod cross-node access fails with ipv6, regardless of the tunnel mode (vxlan Always, CrossSubnet, or Never).
Possible Solution
Steps to Reproduce (for bugs)
Context
Your Environment