Virtual eXtensible Local Area Network (VXLAN)

Amit Cohen edited this page Dec 29, 2021 · 9 revisions
Table of Contents
  1. Introduction
  2. Use with VLAN-unaware Bridges
    1. Decapsulation
    2. Encapsulation
      1. Unicast Forwarding
      2. Flooding
    3. Bridge MAC Address
  3. Use with VLAN-aware Bridges
  4. VXLAN Learning
  5. VXLAN Flags
  6. Neighbour Suppression
  7. VXLAN Routing
    1. Asymmetric Routing
    2. Symmetric Routing
  8. Q-in-VNI
  9. Features and Limitations
  10. Further Resources

Introduction

Virtual eXtensible Local Area Network (VXLAN) enables the encapsulation of Ethernet frames inside UDP packets with a designated UDP destination port (4789). VXLAN allows users to overlay L2 networks on top of existing L3 networks. In the data center, it is commonly used to stretch an L2 network across multiple racks.
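Because the whole Ethernet frame travels inside a UDP packet, every overlay packet grows by the size of the added headers. The following is a rough sketch of that arithmetic, assuming the standard header sizes (inner Ethernet 14 bytes, VXLAN 8, UDP 8, outer IPv4 20 or outer IPv6 40) and an untagged inner frame. The function name is illustrative only.

```shell
#!/bin/sh
# Largest overlay MTU that fits in a given underlay MTU without fragmentation.
# Assumed header sizes: inner Ethernet 14, VXLAN 8, UDP 8, IPv4 20 / IPv6 40.
vxlan_overlay_mtu() {
	underlay_mtu=$1
	family=${2:-4}		# 4 = IPv4 underlay, 6 = IPv6 underlay

	case "$family" in
	4) outer_ip=20 ;;
	6) outer_ip=40 ;;
	*) echo "unknown address family: $family" >&2; return 1 ;;
	esac

	echo $((underlay_mtu - outer_ip - 8 - 8 - 14))
}

vxlan_overlay_mtu 1500 4	# prints 1450
vxlan_overlay_mtu 9000 6	# prints 8930
```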

Initial VXLAN support appeared in kernel 3.7. Since kernel 4.20 it is possible to offload the VXLAN forwarding plane to the Spectrum ASIC. Starting from kernel 5.17 VXLAN with IPv6 underlay is also supported.

Use with VLAN-unaware Bridges

The VXLAN data path can be split into two parts: decapsulation and encapsulation.

Decapsulation

Decapsulation occurs when the switch receives a VXLAN-encapsulated packet whose underlay destination IP matches the IP address of the local VTEP. This address, which the VXLAN device uses as its source IP (local), is usually assigned to the loopback device:

$ ip -d link show dev vx10010
3972: vx10010: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br0 state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether de:49:10:b4:e3:79 brd ff:ff:ff:ff:ff:ff promiscuity 1 minmtu 68 maxmtu 65535
    vxlan id 10 local 192.0.2.1 srcport 0 0 dstport 4789 tos inherit ttl 10 ageing 300 noudpcsum noudp6zerocsumtx noudp6zerocsumrx
...

$ ip address add 192.0.2.1/32 dev lo
$ ip route show 192.0.2.1 table local
local 192.0.2.1 dev lo proto kernel scope host src 192.0.2.1 offload

The local route is marked as offloaded since VXLAN-encapsulated packets that hit it are decapsulated by the hardware and forwarded in the overlay network.

Note: The same local route cannot be used to decapsulate IP-in-IP and VXLAN packets at the same time.

Encapsulation

Encapsulation takes place when the switch decides to forward a packet to a VXLAN tunnel. This can happen either due to an FDB entry pointing to a VXLAN device or due to the packet being flooded by the enslaving bridge.

Unicast Forwarding

For a packet to be forwarded to a single remote VTEP, FDB entries need to be configured in the FDB tables of both the bridge and the VXLAN device:

$ bridge fdb add 00:11:22:33:44:55 dev vx10010 self master static \
	dst 198.51.100.1
$ bridge fdb show brport vx10010
00:11:22:33:44:55 offload master br0 static
...
00:11:22:33:44:55 dst 198.51.100.1 self offload static

The self keyword will add the entry to the VXLAN FDB, whereas the master keyword will add the entry to the bridge FDB.

Note that both entries are squashed into one {MAC, VLAN/VNI} -> IP entry in the hardware. Therefore, if either entry is removed, the hardware entry is removed as well, and the remaining software entry loses its offload mark since it is no longer offloaded:

$ bridge fdb del 00:11:22:33:44:55 dev vx10010 master
$ bridge fdb show brport vx10010
00:11:22:33:44:55 dst 198.51.100.1 self static

$ bridge fdb add 00:11:22:33:44:55 dev vx10010 master static
$ bridge fdb show brport vx10010
00:11:22:33:44:55 offload master br0 static
...
00:11:22:33:44:55 dst 198.51.100.1 self offload static
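
Since the offload mark depends on both entries being present, it can be handy to check them together. The helper below is a sketch and not part of iproute2: it reads the output of bridge fdb show on standard input and reports whether a MAC address has both an offloaded master entry and an offloaded self entry. The name fdb_offload_state is made up for this example.

```shell
#!/bin/sh
# Usage: bridge fdb show brport vx10010 | fdb_offload_state 00:11:22:33:44:55
fdb_offload_state() {
	awk -v mac="$1" '
		# Only consider lines for the requested MAC that carry the offload flag
		$1 == mac && / offload( |$)/ {
			if (/ master /) master = 1
			if (/ self( |$)/) self = 1
		}
		END {
			state = (master && self) ? "offloaded" : "not offloaded"
			print state
		}
	'
}
```

Fed with the two listings shown above, it prints "offloaded" for the first and "not offloaded" for the listing taken after the master entry was deleted.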

Flooding

When a packet does not match an FDB entry it is flooded to all the local ports enslaved to the bridge as well as to all the configured remote VTEPs. To add a new remote VTEP to the VXLAN device, use the all-zeroes MAC address:

$ bridge fdb append 00:00:00:00:00:00 dev vx10010 self static \
	dst 198.51.100.1
$ bridge fdb append 00:00:00:00:00:00 dev vx10010 self static \
	dst 198.51.100.64
$ bridge fdb show brport vx10010
...
00:00:00:00:00:00 dst 198.51.100.1 self offload static
00:00:00:00:00:00 dst 198.51.100.64 self offload static
...

In the above example, a flooded packet will be replicated twice in the hardware and routed to both remote VTEPs.
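With many remote VTEPs, the flood list is easier to maintain when it is generated. A minimal sketch, assuming the list of remote VTEP addresses is known in advance; the helper only prints the commands so that they can be reviewed before being piped to a shell. The function name is hypothetical.

```shell
#!/bin/sh
# Print one all-zeroes FDB append command per remote VTEP.
# Usage: her_flood_cmds <vxlan device> <remote VTEP IP>...
her_flood_cmds() {
	dev=$1
	shift
	for vtep in "$@"; do
		echo "bridge fdb append 00:00:00:00:00:00 dev $dev self static dst $vtep"
	done
}

her_flood_cmds vx10010 198.51.100.1 198.51.100.64
```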

Bridge MAC Address

By default, the MAC address of the bridge is inherited from the bridge slave with the lowest MAC address. If this is the VXLAN device - whose MAC is randomly generated by default - it might not be possible to create a router interface on top of the bridge. That is because all the router interfaces must have the same MSBs in their MAC address.

$ ip link set dev swp3 master br0

# MAC address of the bridge is inherited from swp3
$ ip -br link show dev swp3
swp3             DOWN           7c:fe:90:ff:27:d1 <BROADCAST,MULTICAST>
$ ip -br link show dev br0
br0              DOWN           7c:fe:90:ff:27:d1 <BROADCAST,MULTICAST>

# MAC address of the bridge is inherited from VXLAN device
$ ip -br link show dev vx10010
vx10010          UNKNOWN        7a:04:ef:cd:59:5f <BROADCAST,MULTICAST,UP,LOWER_UP>
$ ip link set dev vx10010 master br0
$ ip -br link show dev br0
br0              DOWN           7a:04:ef:cd:59:5f <BROADCAST,MULTICAST>

# IP address cannot be assigned to the bridge device
$ ip address add 10.1.1.1/24 dev br0
Error: mlxsw_spectrum: All router interface MAC addresses must have the same prefix.

To prevent the Linux bridge from inheriting a MAC address, explicitly set its MAC address to that of one of the physical interfaces:

$ ip link set dev br0 address 7c:fe:90:ff:27:d1
$ ip address add 10.1.1.1/24 dev br0
$ echo $?
0
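
The MAC address does not have to be copied by hand. The sketch below reads it from the address attribute that the kernel exposes for each netdev under /sys/class/net and prints the matching command. The function names and the SYSFS_NET override (useful only for testing) are assumptions made for this example.

```shell
#!/bin/sh
# Read a netdev's MAC address from sysfs.
port_mac() {
	cat "${SYSFS_NET:-/sys/class/net}/$1/address"
}

# Print the command that pins the bridge MAC to that of a member port.
# Usage: pin_bridge_mac_cmd <bridge> <port>
pin_bridge_mac_cmd() {
	bridge=$1
	port=$2
	mac=$(port_mac "$port") || return 1
	echo "ip link set dev $bridge address $mac"
}

# On a live system: pin_bridge_mac_cmd br0 swp3 | sh
```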

Note: Only a single VXLAN device can be enslaved to a VLAN-unaware bridge.

Use with VLAN-aware Bridges

When using VLAN-aware bridges, multiple VXLAN devices can be enslaved to the bridge. The mapping between a VLAN and a VNI is performed by configuring the VLAN as PVID and egress untagged on the bridge slave corresponding to the VXLAN device:

$ ip link add name br0 type bridge vlan_filtering 1 vlan_default_pvid 0 \
	mcast_snooping 0

$ ip link set dev swp3 master br0
$ bridge vlan add vid 10 dev swp3 pvid untagged

$ ip link set dev vx10010 master br0
$ bridge vlan add vid 10 dev vx10010 pvid untagged

A VXLAN tunnel that is not mapped to a VLAN will not be offloaded.

The same VLAN cannot be mapped to multiple VNIs:

$ ip link set dev vx10020 master br0
$ bridge vlan add vid 10 dev vx10020 pvid untagged
RTNETLINK answers: Invalid argument
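
A planned mapping can be checked for this conflict before it is applied. The following sketch reads one "VID VNI" pair per line and reports any VLAN that appears with more than one VNI. It is an offline check under that assumed input format, with an illustrative function name, and does not query the kernel.

```shell
#!/bin/sh
# Input on stdin: one "VID VNI" pair per line.
# Returns non-zero if a VLAN is mapped to more than one VNI.
check_vlan_vni_map() {
	awk '
		NF < 2 { next }
		($1 in vni) && vni[$1] != $2 {
			printf "VLAN %s mapped to VNIs %s and %s\n", $1, vni[$1], $2
			bad = 1
			next
		}
		{ vni[$1] = $2 }
		END { exit bad + 0 }
	'
}

# Example: printf '10 10\n10 20\n' | check_vlan_vni_map
```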

When configuring FDB entries in a VLAN-aware bridge, the vlan keyword should be used to specify the VLAN that the FDB entry will be programmed with:

$ bridge fdb add 00:11:22:33:44:55 dev vx10010 self master static \
	dst 198.51.100.2 vlan 10

VXLAN Learning

When VXLAN learning ("snooping") is enabled, the VXLAN device's FDB is populated based on decapsulated packets. Whenever a packet is decapsulated, a new {MAC, VNI} -> IP FDB entry is created from the packet's overlay source MAC, VNI and underlay source IP. In case the entry already exists, it is refreshed.

To enable VXLAN learning:

$ ip link add name vx10010 up type vxlan id 10 noudpcsum tos inherit \
        ttl 10 local 192.0.2.1 dstport 4789 learning
$ ip link set dev vx10010 master br0

Learning also needs to be enabled on the bridge slave. That is the case by default, but if it has been turned off, it needs to be enabled again:

$ ip link set dev vx10010 type bridge_slave learning on

The bridge and the VXLAN drivers are notified on each FDB entry learned by the hardware. These entries will be marked as offloaded and externally learned:

$ bridge fdb show brport vx10010
00:11:22:33:44:55 vlan 10 extern_learn offload master br0
...
00:11:22:33:44:55 dst 198.51.100.1 self extern_learn offload

The default ageing time is 5 minutes, but can be changed as follows:

$ ip link set dev br0 type bridge ageing_time 1000

The above will set the ageing time to 10 seconds, since the value is given in units of 1/100 of a second.
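
The example implies that ageing_time is expressed in hundredths of a second (1000 yields 10 seconds). A small sketch of the conversion, with an illustrative function name:

```shell
#!/bin/sh
# Convert seconds to the value passed to "type bridge ageing_time".
ageing_time_value() {
	echo $(($1 * 100))
}

ageing_time_value 10	# prints 1000
ageing_time_value 300	# prints 30000, the 5-minute default
```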

VXLAN Flags

Spectrum ASICs use a zero UDP checksum for encapsulated packets and do not validate the checksum of decapsulated packets.

On the other hand, the Linux VXLAN implementation calculates the UDP checksum for encapsulated packets and in some cases validates the checksum of decapsulated packets.

To align Linux and ASIC behavior, the flags below should be used.

IPv4 VXLAN:
  • noudpcsum - the UDP checksum is not calculated for packets transmitted over IPv4

IPv6 VXLAN:
  • udp6zerocsumtx - the UDP checksum is not calculated for packets transmitted over IPv6

  • udp6zerocsumrx - incoming UDP packets over IPv6 with a zero checksum field are not dropped. This is relevant only for IPv6, since RFC 2460 states that "IPv6 receivers must discard UDP packets containing a zero checksum"
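
The choice of flags depends only on the underlay address family. In ip-link, udp6zerocsumtx skips the checksum calculation on transmit and udp6zerocsumrx accepts zero checksums on receive, so those are the forms that match the ASIC behaviour. The helper below is a sketch with a made-up name that returns the flags to append when creating the device.

```shell
#!/bin/sh
# Print the checksum flags that align a VXLAN device with the ASIC.
# Usage: vxlan_csum_flags 4|6
vxlan_csum_flags() {
	case "$1" in
	4) echo "noudpcsum" ;;
	6) echo "udp6zerocsumtx udp6zerocsumrx" ;;
	*) echo "unknown underlay family: $1" >&2; return 1 ;;
	esac
}

# Example with a placeholder IPv6 address; the substitution is left
# unquoted on purpose so that it expands to two separate arguments:
#   ip link add name vx10010 up type vxlan id 10 local 2001:db8::1 \
#   	dstport 4789 tos inherit ttl 10 nolearning $(vxlan_csum_flags 6)
```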

Neighbour Suppression

Neighbour suppression allows the Linux bridge to answer IPv4 ARP requests and IPv6 neighbour discovery messages on behalf of remote hosts. It reduces the number of packets a VTEP needs to flood.

Upon the reception of such a packet the Linux bridge will try to find a corresponding neighbour entry on the bridge device itself or on a VLAN interface configured on top of the bridge (based on the packet's VLAN tag). Assuming an entry was found, the Linux bridge will look up the resulting MAC in its FDB. If the FDB entry points to an interface with neighbour suppression enabled, the Linux bridge will reply on behalf of the remote host.

To enable neighbour suppression on the VXLAN device:

$ ip link set dev vx10010 type bridge_slave neigh_suppress on

Note: Suppression of IPv6 Neighbour Discovery packets is currently not supported.

VXLAN Routing

VXLAN routing allows hosts in different overlay networks to communicate with each other. Two popular models for VXLAN routing are distributed asymmetric routing and distributed symmetric routing. Instead of designating special VTEPs to perform routing - as in the centralized routing model - each VTEP performs routing for the hosts connected to it.

In the asymmetric model, the ingress VTEP performs the routing and the egress VTEP only performs bridging. In the symmetric model, routing occurs at both VTEPs.

While the interface configuration of the symmetric model is more involved, it scales better than the asymmetric model. In the symmetric model, each remote host consumes one route entry and each VTEP consumes a neighbour and an FDB entry. In the asymmetric model, each remote host consumes one neighbour entry and one FDB entry and each VTEP consumes a single route entry. In addition, when using the symmetric model, a VTEP does not have to be a member of every VNI it needs to communicate with.
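
To make the scaling difference concrete, the per-model entry counts described above can be written out as a small calculation. This is only a restatement of those counts for a given number of remote hosts and remote VTEPs; the function name is illustrative.

```shell
#!/bin/sh
# Table entries consumed on one VTEP, following the counts given above.
# Usage: overlay_entries symmetric|asymmetric <remote hosts> <remote VTEPs>
overlay_entries() {
	model=$1
	hosts=$2
	vteps=$3
	case "$model" in
	symmetric)  echo "routes=$hosts neighbours=$vteps fdb=$vteps" ;;
	asymmetric) echo "routes=$vteps neighbours=$hosts fdb=$hosts" ;;
	*) echo "unknown model: $model" >&2; return 1 ;;
	esac
}

overlay_entries symmetric 1000 10	# routes=1000 neighbours=10 fdb=10
overlay_entries asymmetric 1000 10	# routes=10 neighbours=1000 fdb=1000
```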

To allow VM mobility between different VTEPs, it is recommended to configure an anycast gateway on each VTEP. With an anycast gateway, a VM can be moved to a different VTEP without changing its default gateway configuration. In the examples below the anycast gateway is implemented using a macvlan device.

Asymmetric Routing

The following example illustrates the interface configuration on a switch connected to two hosts and a spine. The switch acts as a VTEP and performs routing between VNIs 1000 and 2000 that belong to a single tenant (VRF). The IP addresses 10.1.1.1 and 10.1.2.1 serve as the gateway IPs for the overlay networks corresponding to VNIs 1000 and 2000, respectively.

     + Host 1                                     + Host 2
     |                                            |
+----|--------------------------------------------|-------------------------+
|    |                                            |                         |
| +--|--------------------------------------------|-----------------------+ |
| |  + swp1                         br1           + swp2                  | |
| |    vid 10 pvid untagged                         vid 20 pvid untagged  | |
| |                                                                       | |
| |  + vx10                                       + vx20                  | |
| |    local 10.0.0.1                               local 10.0.0.1        | |
| |    remote 10.0.0.2                              remote 10.0.0.2       | |
| |    id 1000                                      id 2000               | |
| |    dstport 4789                                 dstport 4789          | |
| |    vid 10 pvid untagged                         vid 20 pvid untagged  | |
| |                                                                       | |
| +-----------------------------------+-----------------------------------+ |
|                                     |                                     |
| +-----------------------------------|-----------------------------------+ |
| |                                   |                                   | |
| |  +--------------------------------+--------------------------------+  | |
| |  |                                                                 |  | |
| |  + vlan10                                                   vlan20 +  | |
| |  | 10.1.1.11/24                                       10.1.2.11/24 |  | |
| |  |                                                                 |  | |
| |  + vlan10-v (macvlan)                           vlan20-v (macvlan) +  | |
| |    10.1.1.1/24                                         10.1.2.1/24    | |
| |    00:00:5e:00:01:01                             00:00:5e:00:01:01    | |
| |                               vrf-green                               | |
| +-----------------------------------------------------------------------+ |
|                                                                           |
|    + swp3                                       + lo                      |
|    | 192.0.2.1/24                                 10.0.0.1/32             |
+----|----------------------------------------------------------------------+
     |
     + Spine

The following commands were used:

# asymmetric routing - interface configuration

ip link add name br1 type bridge vlan_filtering 1 vlan_default_pvid 0 \
	mcast_snooping 0

# Make sure the bridge uses the MAC address of the local port and not
# that of the VXLAN device
ip link set dev br1 address <swp1's MAC address>
ip link set dev br1 up

ip link set dev swp3 up
ip address add dev swp3 192.0.2.1/24
ip route add 10.0.0.2/32 nexthop via 192.0.2.2

ip link add name vx10 type vxlan id 1000		\
	local 10.0.0.1 remote 10.0.0.2 dstport 4789	\
	nolearning noudpcsum tos inherit ttl 100
ip link set dev vx10 up

ip link set dev vx10 master br1
bridge vlan add vid 10 dev vx10 pvid untagged

ip link add name vx20 type vxlan id 2000		\
	local 10.0.0.1 remote 10.0.0.2 dstport 4789	\
	nolearning noudpcsum tos inherit ttl 100
ip link set dev vx20 up

ip link set dev vx20 master br1
bridge vlan add vid 20 dev vx20 pvid untagged

ip link set dev swp1 master br1
ip link set dev swp1 up
bridge vlan add vid 10 dev swp1 pvid untagged

ip link set dev swp2 master br1
ip link set dev swp2 up
bridge vlan add vid 20 dev swp2 pvid untagged

ip address add 10.0.0.1/32 dev lo

# Create tenant VRF
ip link add dev vrf-green up type vrf table 10
ip -4 route add table 10 unreachable default metric 4278198272
ip -6 route add table 10 unreachable default metric 4278198272
ip -4 rule add pref 32765 table local
ip -4 rule del pref 0
ip -6 rule add pref 32765 table local
ip -6 rule del pref 0

# Create SVIs
ip link add link br1 name vlan10 up master vrf-green type vlan id 10
ip address add 10.1.1.11/24 dev vlan10
ip link add link vlan10 name vlan10-v up master vrf-green \
	address 00:00:5e:00:01:01 type macvlan mode private
ip address add 10.1.1.1/24 dev vlan10-v metric 1024

ip link add link br1 name vlan20 up master vrf-green type vlan id 20
ip address add 10.1.2.11/24 dev vlan20
ip link add link vlan20 name vlan20-v up master vrf-green \
	address 00:00:5e:00:01:01 type macvlan mode private
ip address add 10.1.2.1/24 dev vlan20-v metric 1024

bridge vlan add vid 10 dev br1 self
bridge vlan add vid 20 dev br1 self

bridge fdb add 00:00:5e:00:01:01 dev br1 self local vlan 10
bridge fdb add 00:00:5e:00:01:01 dev br1 self local vlan 20

# Disable rp_filter and enable arp_ignore to make sure ARPs for the
# anycast IP are answered with the anycast MAC
sysctl -w net.ipv4.conf.all.rp_filter=0
sysctl -w net.ipv4.conf.vlan10-v.rp_filter=0
sysctl -w net.ipv4.conf.vlan20-v.rp_filter=0
sysctl -w net.ipv4.conf.all.arp_ignore=1

Overlay Configuration Example

The configuration in the previous section only includes the interface configuration and omits the overlay configuration, as this is usually performed by a routing daemon. Static configuration of the overlay router can be performed as follows:

bridge fdb add $host_mac dev vx10 self master extern_learn static \
	dst $vtep_ip vlan 10
ip neigh add $host_ip lladdr $host_mac nud noarp dev vlan10 extern_learn

Where $vtep_ip is the IP of the remote VTEP and {$host_mac, $host_ip} are the MAC and IP of a remote host connected to that VTEP, in VLAN 10.

The full example is available here.
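
When several remote hosts need to be configured statically, the two commands above can be generated from a table. The sketch below assumes an input of one "HOST_MAC HOST_IP VTEP_IP" triple per line and prints the commands rather than running them. The function name and input format are made up for this example.

```shell
#!/bin/sh
# Usage: asym_overlay_cmds <vxlan device> <vid> <SVI> < hosts.txt
asym_overlay_cmds() {
	vxlan_dev=$1
	vid=$2
	svi=$3
	while read -r host_mac host_ip vtep_ip; do
		[ -n "$vtep_ip" ] || continue	# skip blank or incomplete lines
		echo "bridge fdb add $host_mac dev $vxlan_dev self master extern_learn static dst $vtep_ip vlan $vid"
		echo "ip neigh add $host_ip lladdr $host_mac nud noarp dev $svi extern_learn"
	done
}

# Example:
#   echo "00:11:22:33:44:55 10.1.1.102 10.0.0.2" | asym_overlay_cmds vx10 10 vlan10
```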

Symmetric Routing

The interface configuration in the symmetric model is similar to the asymmetric model. The main difference is the addition of an L3 VNI, which is used for routed traffic in both directions: from and to the VTEP.

     + Host 1                                     + Host 2
     |                                            |
+----|--------------------------------------------|-------------------------+
|    |                                            |                         |
| +--|--------------------------------------------|-----------------------+ |
| |  + swp1                         br1           + swp2                  | |
| |    vid 10 pvid untagged                         vid 20 pvid untagged  | |
| |                                                                       | |
| |  + vx10                                       + vx20                  | |
| |    local 10.0.0.1                               local 10.0.0.1        | |
| |    remote 10.0.0.2                              remote 10.0.0.2       | |
| |    id 1010                                      id 1020               | |
| |    dstport 4789                                 dstport 4789          | |
| |    vid 10 pvid untagged                         vid 20 pvid untagged  | |
| |                                                                       | |
| |                             + vx4001                                  | |
| |                               local 10.0.0.1                          | |
| |                               remote 10.0.0.2                         | |
| |                               id 104001                               | |
| |                               dstport 4789                            | |
| |                               vid 4001 pvid untagged                  | |
| |                                                                       | |
| +-----------------------------------+-----------------------------------+ |
|                                     |                                     |
| +-----------------------------------|-----------------------------------+ |
| |                                   |                                   | |
| |  +--------------------------------+--------------------------------+  | |
| |  |                                |                                |  | |
| |  + vlan10                         |                         vlan20 +  | |
| |  | 10.1.1.11/24                   |                   10.1.2.11/24 |  | |
| |  |                                |                                |  | |
| |  + vlan10-v (macvlan)             +             vlan20-v (macvlan) +  | |
| |    10.1.1.1/24                vlan4001                 10.1.2.1/24    | |
| |    00:00:5e:00:01:01                             00:00:5e:00:01:01    | |
| |                               vrf-green                               | |
| +-----------------------------------------------------------------------+ |
|                                                                           |
|    + swp3                                       + lo                      |
|    | 192.0.2.1/24                                 10.0.0.1/32             |
+----|----------------------------------------------------------------------+
     |
     + Spine

The following commands were used:

# symmetric routing - interface configuration

ip link add name br1 type bridge vlan_filtering 1 vlan_default_pvid 0 \
	mcast_snooping 0

# Make sure the bridge uses the MAC address of the local port and not
# that of the VXLAN device
ip link set dev br1 address <swp1's MAC address>
ip link set dev br1 up

ip link set dev swp3 up
ip address add dev swp3 192.0.2.1/24
ip route add 10.0.0.2/32 nexthop via 192.0.2.2

ip link add name vx10 type vxlan id 1010		\
	local 10.0.0.1 remote 10.0.0.2 dstport 4789	\
	nolearning noudpcsum tos inherit ttl 100
ip link set dev vx10 up

ip link set dev vx10 master br1
bridge vlan add vid 10 dev vx10 pvid untagged

ip link add name vx20 type vxlan id 1020		\
	local 10.0.0.1 remote 10.0.0.2 dstport 4789	\
	nolearning noudpcsum tos inherit ttl 100
ip link set dev vx20 up

ip link set dev vx20 master br1
bridge vlan add vid 20 dev vx20 pvid untagged

ip link set dev swp1 master br1
ip link set dev swp1 up
bridge vlan add vid 10 dev swp1 pvid untagged

ip link set dev swp2 master br1
ip link set dev swp2 up
bridge vlan add vid 20 dev swp2 pvid untagged

ip link add name vx4001 type vxlan id 104001		\
	local 10.0.0.1 dstport 4789			\
	nolearning noudpcsum tos inherit ttl 100
ip link set dev vx4001 up

ip link set dev vx4001 master br1
bridge vlan add vid 4001 dev vx4001 pvid untagged

ip address add 10.0.0.1/32 dev lo

# Create tenant VRF
ip link add dev vrf-green up type vrf table 10
ip -4 route add table 10 unreachable default metric 4278198272
ip -6 route add table 10 unreachable default metric 4278198272
ip -4 rule add pref 32765 table local
ip -4 rule del pref 0
ip -6 rule add pref 32765 table local
ip -6 rule del pref 0

# Create SVIs
ip link add link br1 name vlan10 up master vrf-green type vlan id 10
ip address add 10.1.1.11/24 dev vlan10
ip link add link vlan10 name vlan10-v up master vrf-green \
	address 00:00:5e:00:01:01 type macvlan mode private
ip address add 10.1.1.1/24 dev vlan10-v metric 1024

ip link add link br1 name vlan20 up master vrf-green type vlan id 20
ip address add 10.1.2.11/24 dev vlan20
ip link add link vlan20 name vlan20-v up master vrf-green \
	address 00:00:5e:00:01:01 type macvlan mode private
ip address add 10.1.2.1/24 dev vlan20-v metric 1024

ip link add link br1 name vlan4001 up master vrf-green \
	type vlan id 4001

bridge vlan add vid 10 dev br1 self
bridge vlan add vid 20 dev br1 self
bridge vlan add vid 4001 dev br1 self

bridge fdb add 00:00:5e:00:01:01 dev br1 self local vlan 10
bridge fdb add 00:00:5e:00:01:01 dev br1 self local vlan 20

# Disable rp_filter and enable arp_ignore to make sure ARPs for the
# anycast IP are answered with the anycast MAC
sysctl -w net.ipv4.conf.all.rp_filter=0
sysctl -w net.ipv4.conf.vlan10-v.rp_filter=0
sysctl -w net.ipv4.conf.vlan20-v.rp_filter=0
sysctl -w net.ipv4.conf.all.arp_ignore=1

Overlay Configuration Example

The configuration in the previous section only includes the interface configuration and omits the overlay configuration, as this is usually performed by a routing daemon. Static configuration of the overlay router can be performed as follows:

bridge fdb add $mac dev vx4001 self master extern_learn static \
	dst $vtep_ip vlan 4001
ip neigh add $vtep_ip lladdr $mac nud noarp dev vlan4001 extern_learn
ip route add $host_ip/32 vrf vrf-green nexthop via $vtep_ip \
	dev vlan4001 onlink

Where $vtep_ip is the IP of the remote VTEP, $mac is the MAC of the VLAN interface corresponding to the L3 VNI on the remote VTEP and $host_ip is an IP of a remote host connected to that VTEP.

The full example is available here.
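
In the symmetric model the FDB and neighbour entries are per remote VTEP, while the route is per remote host. The sketch below reflects that: it assumes an input of one "HOST_IP VTEP_IP ROUTER_MAC" triple per line, emits the FDB and neighbour commands once per VTEP and a route command for every host. Device names are taken from the example above; the function name and input format are assumptions.

```shell
#!/bin/sh
# Usage: sym_overlay_cmds < hosts.txt
sym_overlay_cmds() {
	awk '
		NF < 3 { next }
		# First time a VTEP is seen: FDB and neighbour entry for the L3 VNI
		!($2 in seen) {
			seen[$2] = 1
			printf "bridge fdb add %s dev vx4001 self master extern_learn static dst %s vlan 4001\n", $3, $2
			printf "ip neigh add %s lladdr %s nud noarp dev vlan4001 extern_learn\n", $2, $3
		}
		# Every host: one /32 route via the remote VTEP
		{ printf "ip route add %s/32 vrf vrf-green nexthop via %s dev vlan4001 onlink\n", $1, $2 }
	'
}
```

Two hosts behind the same VTEP therefore produce four commands: one FDB entry, one neighbour entry and two routes.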

Note: Symmetric routing is not supported on revision A0 of the Spectrum-1 ASIC. Refer to this section for instructions on how to determine the ASIC revision.

Q-in-VNI

A traditional VXLAN bridge takes traffic from a VLAN, strips the VLAN tag, encapsulates the result in IP/UDP/VXLAN headers, and routes it. Q-in-VNI works the same way, except that the popped VLAN tag is the service tag (sVLAN) rather than the customer tag (cVLAN). As a result, the packet inside the VXLAN encapsulation is still tagged with the cVLAN.

In Linux, Q-in-VNI bridging is achieved by adding the VXLAN device to an 802.1ad bridge instead of an 802.1q bridge.
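
A minimal sketch of such a setup follows. It assumes the bridge type option vlan_protocol 802.1ad from ip-link and otherwise mirrors the VLAN-aware bridge commands used earlier on this page. The device names, the VLAN ID and the DRY_RUN convention are placeholders chosen for this example. With DRY_RUN set, the commands are only printed.

```shell
#!/bin/sh
# Prefix commands with "echo" when DRY_RUN is set, so they can be reviewed.
run() { ${DRY_RUN:+echo} "$@"; }

# Usage: setup_qinvni_bridge <bridge> <vxlan device> <port> <sVLAN ID>
setup_qinvni_bridge() {
	br=$1
	vxlan_dev=$2
	port=$3
	svid=$4

	# 802.1ad bridge: the tag that is matched and popped is the service tag
	run ip link add name "$br" type bridge vlan_filtering 1 \
		vlan_protocol 802.1ad vlan_default_pvid 0 mcast_snooping 0

	run ip link set dev "$vxlan_dev" master "$br"
	run bridge vlan add vid "$svid" dev "$vxlan_dev" pvid untagged

	run ip link set dev "$port" master "$br"
	run bridge vlan add vid "$svid" dev "$port" pvid untagged
}

# Example: DRY_RUN=1 setup_qinvni_bridge br0 vx10010 swp3 10
```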

Limitations

  • No support for VxLAN with an 802.1ad bridge for Spectrum-1

  • SVIs on top of an 802.1ad bridge are forbidden

  • No support for neighbour suppression and VxLAN routing

Linux 5.11

  • Dual VxLAN bridge is not supported, i.e., VxLAN with an 802.1ad bridge and VxLAN with an 802.1d bridge cannot coexist

Features and Limitations

Features by Version

Kernel Version   Features
4.20             Support for VXLAN with VLAN-unaware bridges
5.0              Support for VXLAN with VLAN-aware bridges, support for VXLAN routing
5.1              Spectrum-2 support, FDB vetoing
5.11             Q-in-VNI Spectrum-2 support, no dual VXLAN bridge
5.13             Q-in-VNI Spectrum-2 support, VXLAN with an 802.1ad bridge and VXLAN with an 802.1d bridge can coexist
5.17             Support for VXLAN with IPv6 underlay

Limitations

  • The bridge to which the VXLAN device is enslaved must have multicast snooping disabled. This means that packets with a multicast destination MAC are treated as broadcast and flooded

  • Only head-end-replication (HER) flooding is supported. Flooding packets to a multicast IP address in the underlay network is not supported

  • A source IP must be specified for the VXLAN tunnel

  • VXLAN can be used only in the default VRF (i.e., main table)

  • TOS must be inherited from the overlay packet. If the overlay packet is not an IP packet, 0 is used

  • A static TTL must be used. TTL cannot be inherited from the overlay packet

  • UDP checksum must be disabled on the VXLAN tunnel

  • The ASIC supports a single VXLAN tunnel endpoint (VTEP). Therefore, all the offloaded VXLAN tunnels must share the following properties: TTL, learning, UDP destination port, source IP

  • Runtime configuration change of a VXLAN tunnel is currently not supported while it is enslaved to a bridge. The device needs to be unlinked from the bridge and enslaved again for changes to take effect. Alternatively, it can be cycled down-up

  • A bridge with VXLAN device(s) enslaved will not be offloaded unless a physical port (or its upper) is also enslaved to the bridge

  • Learning and neighbour suppression are not supported with IPv6 underlay

  • VXLAN routing over IPv6 underlay is not supported in Spectrum-1

Up to and including Linux 5.16:
  • Only IPv4 underlay is supported

Valid configuration example:

$ ip link add name br0 type bridge mcast_snooping 0
$ ip link set dev swp3 master br0
$ ip link add name vx10010 up type vxlan id 10 noudpcsum tos inherit \
	ttl 10 local 192.0.2.1 dstport 4789
$ ip link set dev vx10010 master br0

In case of an invalid configuration (e.g., TTL set to inherit), an error message will be emitted:

$ ip link add name br0 type bridge mcast_snooping 0
$ ip link set dev swp3 master br0
$ ip link add name vx10010 up type vxlan id 10 noudpcsum tos inherit \
	ttl inherit local 192.0.2.1 dstport 4789
$ ip link set dev vx10010 master br0
Error: mlxsw_spectrum: VxLAN: TTL must not be configured to inherit.
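
Some of the limitations above can be checked before the device is created. The following sketch inspects the options intended for an IPv4-underlay VXLAN device and reports which of four requirements are missing: a source IP, inherited TOS, a static TTL and a disabled UDP checksum. The function name and messages are illustrative and are not the driver's own. The driver remains the authority on what is accepted.

```shell
#!/bin/sh
# Usage: check_vxlan_opts <options that follow "type vxlan">
check_vxlan_opts() {
	have_local=0 have_tos=0 have_ttl=0 have_nocsum=0 prev=
	for arg in "$@"; do
		case "$prev" in
		local) have_local=1 ;;
		tos) [ "$arg" = inherit ] && have_tos=1 ;;
		ttl)
			case "$arg" in
			''|*[!0-9]*) ;;		# not a number, e.g. "inherit"
			*) have_ttl=1 ;;
			esac
			;;
		esac
		[ "$arg" = noudpcsum ] && have_nocsum=1
		prev=$arg
	done

	rc=0
	[ "$have_local" = 1 ] || { echo "missing: local <source IP>"; rc=1; }
	[ "$have_tos" = 1 ] || { echo "missing: tos inherit"; rc=1; }
	[ "$have_ttl" = 1 ] || { echo "missing: static ttl"; rc=1; }
	[ "$have_nocsum" = 1 ] || { echo "missing: noudpcsum"; rc=1; }
	return $rc
}

# Example:
#   check_vxlan_opts id 10 noudpcsum tos inherit ttl inherit local 192.0.2.1 dstport 4789
```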

Further Resources

  1. man ip-link
  2. ip link help vxlan
  3. RFC 7348
  4. VXLAN & Linux