pod cross-node access failed with ipv6 #7251

Closed
blue-troy opened this issue Jan 30, 2023 · 11 comments

Comments

@blue-troy
Contributor

blue-troy commented Jan 30, 2023

Expected Behavior

Cross-node pod access over IPv6 works in VXLAN mode when the kernel version is >= 3.12.

Current Behavior

Cross-node pod access over IPv6 does not work on CentOS 7.9 with kernel 4.18.0-1.el7.elrepo.x86_64.

Possible Solution

Steps to Reproduce (for bugs)

  1. Install k8s v1.24.3 with kubeadm on CentOS 7.9.
  2. Upgrade the kernel to 4.18.0-1.el7.elrepo.x86_64.
  3. Install Calico v3.25.0 in VXLAN mode.
  4. Pods on different nodes cannot reach each other over IPv6.

This is my Calico config YAML:
calico.txt

Context

The calico-node logs show:

2023-01-30 09:06:07.469 [WARNING][56] felix/route_table.go 640: Failed to sync routes to interface even after retries. Leaving it dirty, requiring a full sync. ifaceName="vxlan-v6.calico" ifaceRegex="^vxlan-v6.calico$" ipVersion=0x6 tableIndex=0
2023-01-30 09:06:07.469 [WARNING][56] felix/route_table.go 653: Some interfaces still out-of sync. ifaceRegex="^vxlan-v6.calico$" ipVersion=0x6 tableIndex=0
2023-01-30 09:06:07.469 [WARNING][56] felix/int_dataplane.go 2009: Failed to synchronize routing table, will retry...
2023-01-30 09:06:07.571 [INFO][56] felix/route_table.go 950: Deleting from expected targets cidr=2001:db8:42:ca:fe3b:9f62:b140:2180/122 ifaceName="vxlan-v6.calico" ifaceRegex="^vxlan-v6.calico$" ipVersion=0x6 tableIndex=0
2023-01-30 09:06:07.571 [INFO][56] felix/route_table.go 956: No pending target update, adding back in as an update cidr=2001:db8:42:ca:fe3b:9f62:b140:2180/122 ifaceName="vxlan-v6.calico" ifaceRegex="^vxlan-v6.calico$" ipVersion=0x6 tableIndex=0
2023-01-30 09:06:07.572 [INFO][56] felix/route_table.go 950: Deleting from expected targets cidr=2001:db8:42:ca:fe3b:9f62:b140:2180/122 ifaceName="vxlan-v6.calico" ifaceRegex="^vxlan-v6.calico$" ipVersion=0x6 tableIndex=0
2023-01-30 09:06:07.572 [INFO][56] felix/route_table.go 956: No pending target update, adding back in as an update cidr=2001:db8:42:ca:fe3b:9f62:b140:2180/122 ifaceName="vxlan-v6.calico" ifaceRegex="^vxlan-v6.calico$" ipVersion=0x6 tableIndex=0
2023-01-30 09:06:07.572 [WARNING][56] felix/route_table.go 757: Failed to add route error=invalid argument ifaceName="vxlan-v6.calico" ifaceRegex="^vxlan-v6.calico$" ipVersion=0x6 route={Ifindex: 12 Dst: 2001:db8:42:ca:fe3b:9f62:b140:2180/122 Src: <nil> Gw: 2001:db8:42:ca:fe3b:9f62:b140:2180 Flags: [onlink] Table: 0 Realm: 0} tableIndex=0
2023-01-30 09:06:07.572 [WARNING][56] felix/route_table.go 1227: Failed to access interface but it appears to be up error=netlink update operation failed ifaceName="vxlan-v6.calico" ifaceRegex="^vxlan-v6.calico$" ipVersion=0x6 link=&netlink.Vxlan{LinkAttrs:netlink.LinkAttrs{Index:12, MTU:1430, TxQLen:0, Name:"vxlan-v6.calico", HardwareAddr:net.HardwareAddr{0x66, 0xa9, 0x7b, 0xb6, 0x3d, 0x83}, Flags:0x13, RawFlags:0x11043, ParentIndex:0, MasterIndex:0, Namespace:interface {}(nil), Alias:"", Statistics:(*netlink.LinkStatistics)(0xc00097c6c0), Promisc:0, Allmulti:0, Multi:1, Xdp:(*netlink.LinkXdp)(0xc0008aac30), EncapType:"ether", Protinfo:(*netlink.Protinfo)(nil), OperState:0x0, PhysSwitchID:0, NetNsID:-1, NumTxQueues:1, NumRxQueues:1, GSOMaxSize:0xf53c, GSOMaxSegs:0xffff, GROMaxSize:0x0, Vfs:[]netlink.VfInfo(nil), Group:0x0, Slave:netlink.LinkSlave(nil)}, VxlanId:4096, VtepDevIndex:2, SrcAddr:net.IP{0x24, 0x7, 0xc0, 0x80, 0x8, 0x2, 0x11, 0xcd, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x4}, Group:net.IP(nil), TTL:0, TOS:0, Learning:false, Proxy:false, RSC:false, L2miss:false, L3miss:false, UDPCSum:true, UDP6ZeroCSumTx:false, UDP6ZeroCSumRx:false, NoAge:false, GBP:false, FlowBased:false, Age:300, Limit:0, Port:4789, PortLow:0, PortHigh:0} tableIndex=0
2023-01-30 09:06:07.572 [WARNING][56] felix/route_table.go 640: Failed to sync routes to interface even after retries. Leaving it dirty, requiring a full sync. ifaceName="vxlan-v6.calico" ifaceRegex="^vxlan-v6.calico$" ipVersion=0x6 tableIndex=0
2023-01-30 09:06:07.572 [WARNING][56] felix/route_table.go 653: Some interfaces still out-of sync. ifaceRegex="^vxlan-v6.calico$" ipVersion=0x6 tableIndex=0
2023-01-30 09:06:07.572 [WARNING][56] felix/int_dataplane.go 2009: Failed to synchronize routing table, will retry...
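The warning that matters in the log above is the "Failed to add route" line. As a rough sketch for pulling those entries out of a longer calico-node log, something like the following could be used. The regular expressions are an assumption based only on the lines quoted in this issue, and `failed_routes` is a hypothetical helper, not part of Calico.

```python
import re

# Patterns derived only from the Felix log lines quoted in this issue;
# other Felix versions may format these fields differently.
ROUTE_RE = re.compile(r"Dst: (?P<dst>\S+) Src: \S+ Gw: (?P<gw>\S+)")
IFACE_RE = re.compile(r'ifaceName="(?P<iface>[^"]+)"')
ERROR_RE = re.compile(r"error=(?P<error>.+?) ifaceName=")

# Shortened copy of the failing line from the log above.
SAMPLE = (
    '2023-01-30 09:06:07.572 [WARNING][56] felix/route_table.go 757: '
    'Failed to add route error=invalid argument ifaceName="vxlan-v6.calico" '
    'ifaceRegex="^vxlan-v6.calico$" ipVersion=0x6 '
    'route={Ifindex: 12 Dst: 2001:db8:42:ca:fe3b:9f62:b140:2180/122 Src: <nil> '
    'Gw: 2001:db8:42:ca:fe3b:9f62:b140:2180 Flags: [onlink] Table: 0 Realm: 0} '
    'tableIndex=0'
)


def failed_routes(lines):
    """Yield (iface, dst, gw, error) for each 'Failed to add route' line."""
    for line in lines:
        if "Failed to add route" not in line:
            continue
        route = ROUTE_RE.search(line)
        iface = IFACE_RE.search(line)
        error = ERROR_RE.search(line)
        if route and iface and error:
            yield (
                iface.group("iface"),
                route.group("dst"),
                route.group("gw"),
                error.group("error"),
            )


if __name__ == "__main__":
    for entry in failed_routes([SAMPLE]):
        print(entry)
```

In the quoted line the gateway is the first address of the /122 destination block itself, and the kernel rejects the route with "invalid argument".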

This is the FDB:

$ bridge fdb show dev vxlan-v6.calico
66:be:ee:ea:d0:7e dst 2407:c080:802:11cd::5 self permanent

neighbor:

$ ip neigh | grep vxlan-v6.calico
2001:db8:42:ca:fe3b:9f62:b140:2180 dev vxlan-v6.calico lladdr 66:be:ee:ea:d0:7e PERMANENT

vxlan device:

$ ip -d link show vxlan-v6.calico
12: vxlan-v6.calico: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1430 qdisc noqueue state UNKNOWN mode DEFAULT group default
    link/ether 66:a9:7b:b6:3d:83 brd ff:ff:ff:ff:ff:ff promiscuity 0
    vxlan id 4096 local 2407:c080:802:11cd::4 dev eth0 srcport 0 0 dstport 4789 nolearning ageing 300 udpcsum noudp6zerocsumtx noudp6zerocsumrx addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 62780 gso_max_segs 65535

The IPv6 routing table, which lacks a route for vxlan-v6.calico:

$ ip -6 r
unreachable ::/96 dev lo metric 1024 pref medium
unreachable ::ffff:0.0.0.0/96 dev lo metric 1024 pref medium
2001:db8:42:99:c534:b446:c218:74c6 dev calic956bfc5072 metric 1024 pref medium
blackhole 2001:db8:42:99:c534:b446:c218:74c0/122 dev lo proto 80 metric 1024 pref medium
unreachable 2002:a00::/24 dev lo metric 1024 pref medium
unreachable 2002:7f00::/24 dev lo metric 1024 pref medium
unreachable 2002:a9fe::/32 dev lo metric 1024 pref medium
unreachable 2002:ac10::/28 dev lo metric 1024 pref medium
unreachable 2002:c0a8::/32 dev lo metric 1024 pref medium
unreachable 2002:e000::/19 dev lo metric 1024 pref medium
2407:c080:802:11cd::/64 dev eth0 proto kernel metric 100 pref medium
unreachable 3ffe:ffff::/32 dev lo metric 1024 pref medium
fd9a:27c7:2272::/64 dev eth0 proto ra metric 100 pref medium
fd9a:27c7:2272::/48 via fe80::215:5dff:fe04:d50a dev eth0 proto ra metric 100 pref medium
fe80::/64 dev eth0 proto kernel metric 100 pref medium
fe80::/64 dev br-6e60a6cf1221 proto kernel metric 256 pref medium
fe80::/64 dev vetha513425 proto kernel metric 256 pref medium
fe80::/64 dev veth09a55be proto kernel metric 256 pref medium
fe80::/64 dev vxlan.calico proto kernel metric 256 pref medium
fe80::/64 dev calic956bfc5072 proto kernel metric 256 pref medium
default via 2407:c080:802:11cd::1 dev eth0 proto static metric 100 pref medium

Your Environment

  • Calico version v3.25.0
  • Orchestrator version (e.g. kubernetes, mesos, rkt): v1.24.3 by kubeadm
  • Operating System and version: Linux node4 4.18.16-1.el7.elrepo.x86_64 #1 SMP Sat Oct 20 12:52:50 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux CentOS Linux release 7.9.2009 (Core)
  • Link to your project (optional):
@blue-troy
Contributor Author

blue-troy commented Jan 30, 2023

I set up a node-to-node IPv6 VXLAN tunnel without Calico on the CentOS 7 4.18.0-1.el7.elrepo.x86_64 kernel, and it works fine, which means basic IPv6 VXLAN is functional on this kernel. But Calico's cross-node pod networking does not work.

Then I upgraded the kernel to CentOS Linux (5.0.0-1.el7.elrepo.x86_64) 7 (Core), and Calico works fine.

I created two new virtual machines running CentOS 8.2 with kernel 4.18.0-193.el8.x86_64 and tried to reproduce the bug, but Calico works fine there.

Maybe some patches that differ between kernel 4.18.0-193.el8.x86_64 and 4.18.0-1.el7.elrepo.x86_64 affect Calico.

@blue-troy
Contributor Author

blue-troy commented Jan 30, 2023

Do you have time to look at this issue, @cyclinder @coutinhop? It may be related to #6877 #6273 #7195.

@blue-troy
Contributor Author

@meizhuhanxiang are you seeing the same issue?

@cyclinder
Contributor

@blue-troy can you show the output of calicoctl get ippools -o wide ? and can you try to add tunnel route manually like #6877 (comment) ?

@blue-troy
Contributor Author

blue-troy commented Feb 1, 2023

@blue-troy can you show the output of calicoctl get ippools -o wide ? and can you try to add tunnel route manually like #6877 (comment) ?

ippools:

$ ./calicoctl get ippools -o wide
NAME                  CIDR               NAT     IPIPMODE   VXLANMODE   DISABLED   DISABLEBGPEXPORT   SELECTOR
default-ipv4-ippool   10.244.0.0/16      true    Never      Always      false      false              all()
default-ipv6-ippool   2001:db8:42::/56   false   Never      Always      false      false              all()

VXLANTunnel:

$ ./calicoctl get nodes -o yaml | grep ipv6VXLANTunnelAddr
    ipv6VXLANTunnelAddr: 2001:db8:42:99:c534:b446:c218:74c0
    ipv6VXLANTunnelAddr: 2001:db8:42:ca:fe3b:9f62:b140:2180
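As a quick sanity check on the addresses above, here is a minimal sketch using Python's standard `ipaddress` module. It assumes the /122 prefix length seen in the Felix route logs in this issue; `tunnel_block` is a hypothetical helper name.

```python
import ipaddress

# Values copied from the calicoctl output above.
POOL = ipaddress.ip_network("2001:db8:42::/56")  # default-ipv6-ippool
TUNNEL_ADDRS = [
    "2001:db8:42:99:c534:b446:c218:74c0",
    "2001:db8:42:ca:fe3b:9f62:b140:2180",
]
# Assumption: /122 is the prefix length that appears in the Felix route logs.
BLOCK_PREFIX = 122


def tunnel_block(addr, prefix=BLOCK_PREFIX):
    """Return the /prefix network that contains addr."""
    return ipaddress.ip_network(f"{addr}/{prefix}", strict=False)


if __name__ == "__main__":
    for addr in TUNNEL_ADDRS:
        ip = ipaddress.ip_address(addr)
        block = tunnel_block(addr)
        print(addr)
        print("  inside pool:", ip in POOL)
        print("  block:", block)
        print("  is first address of block:", ip == block.network_address)
```

Both tunnel addresses fall inside the pool, and each is the first address of its own /122 block. That matches the route Felix tries to install, where the gateway equals the start of the destination block.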

The route is missing, so we tried to add it manually, and that failed:

$ ip n | grep vxlan-v6
2001:db8:42:ca:fe3b:9f62:b140:2180 dev vxlan-v6.calico lladdr 66:be:ee:ea:d0:7e PERMANENT

$ ip -6 r add 2001:db8:42:ca:fe3b:9f62:b140:2180/122 via 2001:db8:42:ca:fe3b:9f62:b140:2180 dev vxlan-v6.calico
RTNETLINK answers: No route to host

After upgrading the kernel to 5.0.0-1.el7.elrepo.x86_64, the problem is fixed.

@cyclinder
Contributor

As far as I know, the kernel version of CentOS 7.9 is 3.10. Did you upgrade it to 4.18? I ran some tests on CentOS 8 (kernel version 4.18) and it works, so I'm not sure whether running 4.18 on CentOS 7.9 is the issue.

@blue-troy
Contributor Author

To support Calico IPv6 VXLAN, kernel >= 3.12 alone may not be enough. If we compiled a 4.18 kernel from https://github.com/torvalds/linux rather than from https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-8, IPv6 VXLAN in Calico on CentOS 8 might still have this issue. The CentOS kernel carries patches that the torvalds kernel does not have.

I think the deciding factor is the kernel patches, not the OS version. The 4.18.0-193.el8.x86_64 kernel is different from the 4.18.0-1.el7.elrepo.x86_64 kernel.

@cyclinder
Contributor

Did you compile the kernel from the torvalds tree? If so, could you try compiling the kernel from centos-stream-8? That would verify your point.

@blue-troy
Contributor Author

I found the CentOS 7 4.18.0 kernel source at https://vault.centos.org/7.9.2009/updates/Source/SPackages/kernel-4.18.0-348.20.1.el7.src.rpm, but I tried to build it and it is hard for me to do.

@blue-troy
Contributor Author

blue-troy commented Feb 3, 2023

@cyclinder @coutinhop I ran a new test. As you know, CentOS 8 with its default Red Hat kernel 4.18.0-193.el8.x86_64 works with Calico VXLAN IPv6. I then built and ran the torvalds 4.18.0 kernel on the same machine, and the errors occurred. So can we say that kernel >= v3.12, even the torvalds 4.18.0 kernel, is not enough for Calico VXLAN IPv6? #6877 (comment)

Since kernel 5.0 works with Calico VXLAN IPv6, the problem was fixed somewhere between 4.18.0 and 5.0 in the torvalds Linux kernel.

@blue-troy
Contributor Author

blue-troy commented Feb 3, 2023

The torvalds Linux kernel 4.19.0 does not work with Calico VXLAN IPv6, but 4.19.1 works fine. Looking at the commits between 4.19.0 and 4.19.1, the commit "net/ipv6: Allow onlink routes to have a device mismatch if it is the default route" appears to be the one that makes the difference.
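For anyone who wants to check a node quickly, here is a rough sketch that compares a `uname -r` string against the boundary found in this thread. It rests on assumptions: only the mainline results reported here (4.18.0 and 4.19.0 fail, 4.19.1 and 5.0.0 work) are encoded. Vendor kernels such as 4.18.0-193.el8 may carry backports, and point releases of older stable series were not all tested, so treat the result as a hint only. Both function names are hypothetical.

```python
import platform
import re

# Assumption from this thread only: mainline 4.19.1 is the first version
# reported to work. Vendor kernels with backports are not covered by this.
FIRST_WORKING_MAINLINE = (4, 19, 1)


def parse_release(release):
    """Extract (major, minor, patch) from a `uname -r` style string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())


def mainline_likely_affected(release):
    """True if a mainline kernel predates the first version seen working."""
    return parse_release(release) < FIRST_WORKING_MAINLINE


if __name__ == "__main__":
    release = platform.release()
    print(release, "likely affected:", mainline_likely_affected(release))
```

For example, `mainline_likely_affected("4.18.0-1.el7.elrepo.x86_64")` is true while `mainline_likely_affected("5.0.0-1.el7.elrepo.x86_64")` is false, matching the results reported above.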
