Interfaces with IPAM IPv6 addresses also pick up SLAAC addresses #160

Closed

NeilW opened this issue Jun 12, 2018 · 7 comments

NeilW commented Jun 12, 2018

The IPAM system appears to be top-down in nature, in that the interface plugin assigns the addresses returned by the IPAM plugin. However, the interfaces it creates don't have IPv6 autoconfiguration switched off, which can result in an interface picking up a bottom-up SLAAC address in addition to the IPAM-allocated one if it is on a network where other devices and interfaces are using SLAAC.

With a CNI config of:

{
  "cniVersion": "0.3.0",
  "name": "mynet",
  "type": "ipvlan",
  "master": "ens3",
  "ipam": {
    "type": "host-local",
    "ranges": [
      [
        {
          "subnet": "2a02:1348:178:7112:24:19ff:fee1:c44a/64"
        }
      ]
    ]
  }
}

on Kubernetes I get:

ubuntu@srv-xp1mv:~$ sudo nsenter -t 19235 -n 
root@srv-xp1mv:~# ip addr show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default 
    link/ether 02:24:19:e1:c4:4a brd ff:ff:ff:ff:ff:ff
    inet6 2a02:1348:178:7112:224:1900:1e1:c44a/64 scope global dynamic mngtmpaddr 
       valid_lft 3448sec preferred_lft 3448sec
    inet6 2a02:1348:178:7112:24:19ff:fee1:c44f/64 scope global 
       valid_lft forever preferred_lft forever
    inet6 fe80::224:1900:1e1:c44a/64 scope link 
       valid_lft forever preferred_lft forever

Interfaces with IPAM IPv6 addressing should probably set /proc/sys/net/ipv6/conf/<int>/autoconf and /proc/sys/net/ipv6/conf/<int>/accept_ra appropriately.

(Perhaps accept_ra should be switched off if routes are specified and switched on if not; similarly, autoconf should be switched off if ranges are specified and switched on if not.)
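In the meantime, something along these lines inside the container's network namespace should stop further SLAAC configuration (a sketch only; the PID, interface name and address are taken from the example above, and an address that was already learned has to be removed by hand):

sudo nsenter -t 19235 -n sysctl -w net.ipv6.conf.eth0.accept_ra=0
sudo nsenter -t 19235 -n sysctl -w net.ipv6.conf.eth0.autoconf=0
# drop the SLAAC address that was already configured
sudo nsenter -t 19235 -n ip -6 addr del 2a02:1348:178:7112:224:1900:1e1:c44a/64 dev eth0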


squeed commented Jun 12, 2018

I agree, we should probably disable accept_ra on the interface. The question is: should we do this in the IPAM plugin or in ptp?


NeilW commented Jun 12, 2018

I wouldn't want to disable accept_ra unless there was a route specified.

A lot depends on whether there is any plan to support SLAAC in this set of plugins, and on how you are going to support the GET proposal with IPv6 - does it include local and temporary addressing, for example, and discovered routes?


zhanggbj commented Sep 9, 2021

I got a similar issue: the 2nd network interface got an extra IPv6 address.
I disabled autoconf and accept_ra on the worker node, but it doesn't help.

net.ipv6.conf.all.autoconf=0 
net.ipv6.conf.default.autoconf=0
net.ipv6.conf.all.accept_ra=0 
net.ipv6.conf.default.accept_ra=0 

Here are more details.

apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: macvlan-conf-1
spec:
  config: '{
      "cniVersion": "0.3.0",
      "name": "macvlan-conf-1",
      "type": "macvlan",
      "master": "eth0",
      "mode": "bridge",
      "ipam": {
        "type": "whereabouts",
        "range": "2001::0/116",
        "gateway": "2001::f:1",
        "range_start": "2001::f:10",
        "range_end": "2001::f:20"
      }
    }'

  • Pod network interfaces:
kubectl exec -it pod0-bridge -- ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
3: eth0@if945: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1430 qdisc noqueue state UP group default
    link/ether ae:ae:b8:74:38:6c brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fd00:100:64:1::3ac/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::acae:b8ff:fe74:386c/64 scope link
       valid_lft forever preferred_lft forever
4: net1@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 56:7a:5e:e2:32:32 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 2620:124:6020:c202:547a:5eff:fee2:3232/64 scope global mngtmpaddr dynamic
       valid_lft 2591881sec preferred_lft 604681sec
    inet6 2001::f:10/116 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::547a:5eff:fee2:3232/64 scope link
       valid_lft forever preferred_lft forever

2001::f:10/116 is the expected address.
2620:124:6020:c202:547a:5eff:fee2:3232/64 is an extra SLAAC address.
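Those host-level sysctls only affect the node's own network namespace; the pod's network namespace keeps a separate copy of net.ipv6.conf.*, and those per-namespace values are what govern SLAAC on net1. A rough way to check what the pod actually sees, assuming the image provides cat:

kubectl exec -it pod0-bridge -- cat /proc/sys/net/ipv6/conf/net1/accept_ra
kubectl exec -it pod0-bridge -- cat /proc/sys/net/ipv6/conf/net1/autoconf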

zhanggbj commented

Even when I disabled autoconf and accept_ra in the pod config below, it still got an extra IP address.

@squeed @NeilW any thoughts? Thanks!

apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: macvlan-conf-1
spec:
  config: '{
      "cniVersion": "0.3.0",
      "name": "macvlan-conf-1",
      "plugins": [
        {
          "type": "macvlan",
          "master": "eth0",
          "mode": "bridge",
          "ipam": {
            "type": "whereabouts",
            "range": "2001::0/116",
            "gateway": "2001::f:1",
            "range_start": "2001::f:10",
            "range_end": "2001::f:20"
          }
        },
        {
           "type": "tuning",
           "sysctl": {
              "net.ipv6.conf.all.autoconf": "0",
              "net.ipv6.conf.default.autoconf": "0",
              "net.ipv6.conf.all.accept_ra": "0",
              "net.ipv6.conf.default.accept_ra": "0"
         }
        }
      ]
    }'


akunszt commented Jun 16, 2023

This is still an issue for us. I think there is a race between arriving RA packets and the moment the tuning plugin applies the sysctl settings. Sometimes it works, sometimes it doesn't.

This is my theory (I am not very familiar with CNI or Go so take this with a grain of salt):

  1. The bridge plugin creates the veth pair.
  2. It disables accept_ra on the host side.
  3. It attaches/enslaves the host side to the bridge.
  4. The tuning plugin changes the sysctls on the container side. It uses information from the bridge plugin, so it cannot run earlier.

If an RA packet arrives between steps 3 and 4 then the container will have an extra IPv6 address and default gateway.
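A rough way to check whether this is what happened to a given container (a sketch; $PID is a placeholder for the container's PID, which has to be looked up separately):

nsenter -t "$PID" -n ip -6 addr show dev eth0    # a SLAAC address is flagged "dynamic mngtmpaddr"
nsenter -t "$PID" -n ip -6 route show default    # an RA-learned default route is flagged "proto ra"
nsenter -t "$PID" -n sysctl net.ipv6.conf.eth0.accept_ra net.ipv6.conf.eth0.autoconf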

We use this configuration (the variables are filled in by another mechanism):

{
        "cniVersion": "1.0.0",
        "name": "test",
        "plugins": [
                {
                        "type": "bridge",
                        "bridge": "pod",
                        "ipam": {
                                "type": "host-local",
                                "ranges": [
                                        [
                                                {
                                                        "subnet": "$IPV4_RANGE",
                                                        "rangeStart": "$IPV4_RANGE_START",
                                                        "rangeEnd": "$IPV4_RANGE_END",
                                                        "gateway": "$IPV4_GATEWAY"
                                                }
                                        ],
                                        [
                                                {
                                                        "subnet": "$IPV6_RANGE",
                                                        "rangeStart": "$IPV6_RANGE_START",
                                                        "rangeEnd": "$IPV6_RANGE_END",
                                                        "gateway": "$IPV6_GATEWAY"
                                                }
                                        ]
                                ],
                                "routes": [
                                        { "dst": "0.0.0.0/0" },
                                        { "dst": "::/0" }
                                ]
                        }
                },
                {
                        "type": "tuning",
                        "sysctl": {
                                "net.ipv6.conf.all.accept_ra": "0",
                                "net.ipv6.conf.all.autoconf": "0",
                                "net.ipv6.conf.default.accept_ra": "0",
                                "net.ipv6.conf.default.autoconf": "0",
                                "net.ipv6.conf.eth0.accept_ra": "0",
                                "net.ipv6.conf.eth0.autoconf": "0"
                        }
                }
        ]
}


mccv1r0 commented Jun 16, 2023

> If an RA packet arrives between steps 3 and 4 then the container will have an extra IPv6 address and default gateway.

Is the bridge in layer 2 mode by any chance?

In layer 3 mode (the default), an RA sent to the "pod" bridge should terminate there. The router has no visibility into anything on the bridge. To reach anything on that subnet, the router needs a route defined that uses the "pod" bridge (or NAT via the IPv6 address of the "pod" bridge).

Thus the RA shouldn't propagate to containers on the bridge; that's an entirely different subnet ("broadcast domain").

Is something on your node sending the RAs? They have to come from somewhere. If nothing sends RAs, there shouldn't be a need for tuning inside the containers.
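One way to answer that is to capture RAs on the node (a sketch; "pod" is the bridge name from the config above, ICMPv6 type 134 is a router advertisement, and the source MAC/link-local address identifies the sender):

tcpdump -vni pod 'icmp6 and ip6[40] == 134'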


akunszt commented Jun 16, 2023

It is a Layer 2 bridge. I have a separate physical network for pods and I assign IP addresses from that network. I don't need or want to do any Layer 3 processing on the host node. I have RAdvD servers on that network advertising prefixes and default routes for other reasons.

I created a small PR (#910). It is working for me. It adds an enableSlaac parameter and turns off accept_ra on the container side based on its value. For now it is just the code; if you think it is a good approach, I'll extend the PR with test cases, documentation changes, etc.
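For reference, usage would look roughly like this (a sketch only, based on the bridge config above; the exact placement and default are whatever the PR ends up with):

{
        "type": "bridge",
        "bridge": "pod",
        "enableSlaac": false,
        "ipam": { ... }
}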
