Extending EVPN and VXLAN to the Host


Overview

VXLAN provides a highly scalable, standards-based approach for constructing L2 overlays on top of routed networks. It is defined in RFC 7348, and encapsulates the original host Ethernet frame in a UDP + IP + Ethernet frame. BGP EVPN (RFC 7432, with RFC 8365 covering its application to VXLAN) is a standards-based control protocol used to efficiently discover other endpoints (VTEPs) and distribute reachability information (MAC addresses).
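
For reference, the resulting encapsulation stack per RFC 7348 is shown below (the VXLAN header itself is 8 bytes and carries the 24-bit VNI; the well-known UDP destination port is 4789):

[ Outer Ethernet | Outer IP | Outer UDP (dst 4789) | VXLAN header (24-bit VNI) | Original Ethernet frame ]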

This post assumes the reader is already familiar with the configuration and operation of EVPN and VXLAN on Arista switches.

Goals

The use case here is the extension of an L2 overlay south of the TOR/leaf switches and onto a server itself, allowing VLAN-backed overlays and server-native overlays to interoperate seamlessly. This allows one or more servers to participate in a far larger number of overlays than the 4094 VLANs supported by a standard TOR/leaf switch (the 24-bit VNI space allows for roughly 16 million segments).

Topology

  • Spine-1
    • Lo0: 1.1.1.1/32
  • Spine-2
    • Lo0: 1.1.1.2/32
  • Leaf-1
    • Shared VTEP – Lo1: 2.2.2.200/32
  • Leaf-1a
    • Lo0: 2.2.2.1/32
  • Leaf-1b
    • Lo0: 2.2.2.2/32
  • Leaf-2
    • Shared VTEP – Lo1: 2.2.2.201/32
  • Leaf-2a
    • Lo0: 2.2.2.3/32
  • Leaf-2b
    • Lo0: 2.2.2.4/32
  • Host
    • Lo0: 3.3.3.1/32

Software Versions

  • EOS: 4.24.0F
  • Linux: Linux debian 4.19.0-6-amd64
  • FRR: 7.5.1

Configuration

Only the most relevant portions of configuration are shown for each section, with occasional notes pertaining to the setup.

Spine Configuration (route server)

The spines act as route servers in this setup, and will automatically accept incoming BGP sessions based on the source IP and ASN. In this case we accept EVPN sessions from the 2.2.2.0/24 subnet (used by the leaf switches) and the 3.3.3.0/24 subnet (used by our special host).

As this is a route server, we leave the next-hop unchanged for EVPN routes.

peer-filter SERVER-PF
   10 match as-range 66000-66999 result accept
!
router bgp 65000
   router-id 1.1.1.1
   no bgp default ipv4-unicast
   distance bgp 20 200 200
   maximum-paths 8 ecmp 16
   bgp listen range 2.2.2.0/24 peer-group OVERLAY peer-filter UNDERLAY-PF
   bgp listen range 3.3.3.0/24 peer-group OVERLAY peer-filter SERVER-PF
   bgp listen range 10.10.10.0/24 peer-group UNDERLAY peer-filter UNDERLAY-PF
   neighbor OVERLAY peer group
   neighbor OVERLAY ebgp-multihop 3
   neighbor OVERLAY send-community
   neighbor OVERLAY maximum-routes 0
   neighbor UNDERLAY peer group
   neighbor UNDERLAY send-community
   neighbor UNDERLAY maximum-routes 12000
   !
   address-family evpn
      bgp next-hop-unchanged
      neighbor OVERLAY activate
   !
   address-family ipv4
      neighbor UNDERLAY activate
      network 1.1.1.1/32
!

Leaf-1a/b Configuration

Leaf-1a and Leaf-1b are almost identical in configuration. Here we allow BGP unnumbered connections from the servers to the leaf/TOR switches, and accept BGP sessions from the server – this lets the server advertise its VTEP loopback and peer with the spines without any link IP addresses being configured.

!
interface Ethernet1
   no switchport
   ipv6 enable
   ipv6 nd ra interval msec 5000
!
ip routing ipv6 interfaces
!
ipv6 unicast-routing
!
peer-filter Server-Underlay-PF
   10 match as-range 66000-66999 result accept
!
router bgp 65001
   router-id 2.2.2.1
   no bgp default ipv4-unicast
   distance bgp 20 200 200
   maximum-paths 8 ecmp 16
   bgp listen range fe80::/10 peer-group SERVER peer-filter Server-Underlay-PF
...
   neighbor SERVER peer group
   neighbor SERVER send-community
   neighbor SERVER maximum-routes 12000
...
   address-family ipv4
      neighbor SERVER activate
      neighbor SERVER next-hop address-family ipv6 originate
...
!
end

The leaf switches also have VXLAN and EVPN configuration to allow for a VLAN-based overlay towards the other server, on Port-Channel2, as described in the topology above.

interface Port-Channel2
   switchport access vlan 10
   mlag 2
!
interface Vxlan1
   vxlan source-interface Loopback1
   vxlan udp-port 4789
   vxlan vlan 10 vni 12345
!
router bgp 65001
   router-id 2.2.2.1
...
   neighbor OVERLAY peer group
   neighbor OVERLAY remote-as 65000
   neighbor OVERLAY update-source Loopback0
   neighbor OVERLAY ebgp-multihop 3
   neighbor OVERLAY send-community
   neighbor OVERLAY maximum-routes 0
...
   neighbor SPINES peer group
   neighbor SPINES remote-as 65000
   neighbor SPINES send-community
   neighbor SPINES maximum-routes 12000
   neighbor 1.1.1.1 peer group OVERLAY
   neighbor 1.1.1.2 peer group OVERLAY
   neighbor 10.10.10.2 peer group SPINES
   neighbor 10.10.10.6 peer group SPINES
   !
   vlan 10
      rd 2.2.2.1:10
      route-target both 1:1
      redistribute learned
   !
   address-family evpn
      neighbor OVERLAY activate
   !
   address-family ipv4
      neighbor SERVER activate
      neighbor SERVER next-hop address-family ipv6 originate
      neighbor SPINES activate
      network 2.2.2.1/32
      network 2.2.2.200/32
      network 2.2.2.255/32
   !
   end

Host Configuration

In a typical setup we would expect multiple VMs/services on the host, each connecting to the different overlays it requires.

There are two parts to the host configuration: the Linux side (managing the Linux bridges in this case), and the routing protocol stack side (FRR in this case).

Linux

First we create the VXLAN interface for our VNI (12345) and bind it to the server loopback address (3.3.3.1/32). MAC address learning is disabled on this interface, as we rely on BGP/EVPN/FRR to provide that functionality.

Then we create a bridge, attach the VXLAN interface to it, disable STP, and bring both interfaces up.

ip link add vxlan12345 type vxlan id 12345 dstport 4789 local 3.3.3.1 nolearning
brctl addbr br12345
brctl addif br12345 vxlan12345
brctl stp br12345 off
ip link set up dev br12345
ip link set up dev vxlan12345
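
Note that brctl (from the legacy bridge-utils package) is deprecated on most modern distributions; the same setup can be expressed purely with iproute2. A minimal equivalent sketch:

ip link add vxlan12345 type vxlan id 12345 dstport 4789 local 3.3.3.1 nolearning
ip link add br12345 type bridge stp_state 0
ip link set dev vxlan12345 master br12345
ip link set up dev br12345
ip link set up dev vxlan12345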

FRR

FRR provides the BGP peering to the spines/route servers, advertises the locally configured VNIs, and imports/exports our MAC-VRF routes.

interface eth0
 ipv6 nd ra-interval 10
 no ipv6 nd suppress-ra
!
interface eth1
 ipv6 nd ra-interval 10
 no ipv6 nd suppress-ra
!
interface lo
 ip address 3.3.3.1/32
!
router bgp 66000
 bgp router-id 3.3.3.1
 no bgp ebgp-requires-policy
 no bgp network import-check
 neighbor fabric peer-group
 neighbor fabric remote-as external
 neighbor overlay peer-group
 neighbor overlay remote-as 65000
 neighbor overlay ebgp-multihop 255
 neighbor overlay disable-connected-check
 neighbor overlay update-source 3.3.3.1
 neighbor eth0 interface peer-group fabric
 neighbor eth1 interface peer-group fabric
 neighbor 1.1.1.1 peer-group overlay
 neighbor 1.1.1.2 peer-group overlay
!
address-family ipv4 unicast
 network 3.3.3.1/32
 no neighbor overlay activate
 maximum-paths 8
exit-address-family
!
address-family l2vpn evpn
 neighbor overlay activate
 advertise-all-vni
 vni 12345
  route-target import 1:1
  route-target export 1:1
 exit-vni
 advertise-svi-ip
exit-address-family
!

Testing

Normally you would be running VMs on the host, each connecting to the VXLAN bridges it needs connectivity to. For a basic test, network namespaces can be used:

ip link add veth0 type veth peer name veth1
ip link set dev veth0 master br12345
ip link set veth0 up
ip netns add container1
ip link set dev veth1 netns container1
ip netns exec container1 ip link set lo up
ip netns exec container1 ip link set veth1 name eth0
ip netns exec container1 ip addr add 55.55.55.55/24 dev eth0
ip netns exec container1 ip link set eth0 up

 

And we can ping the remote hosts from inside the network namespace:

ip netns exec container1 ping 55.55.55.2
ip netns exec container1 ping 55.55.55.3
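
If the overlay is working end to end, the namespace MAC should also be visible on the spines as an EVPN type-2 (MAC/IP) route originated by 3.3.3.1 (EOS syntax; output omitted):

Spine-1#show bgp evpn route-type mac-ip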

Troubleshooting

Checking the EVPN peering and routes on the spines/route-servers

Spine-1#show bgp evpn sum
BGP summary information for VRF default
Router identifier 1.1.1.1, local AS number 65000
Neighbor Status Codes: m - Under maintenance
Neighbor  V  AS     MsgRcvd  MsgSent  InQ  OutQ  Up/Down   State  PfxRcd  PfxAcc
2.2.2.1   4  65001    30837    30926    0     0  17d19h    Estab       3       3
2.2.2.2   4  65001    30842    30952    0     0  17d18h    Estab       3       3
2.2.2.3   4  65002    31017    31044    0     0  18d00h    Estab       3       3
2.2.2.4   4  65002    30915    31017    0     0  17d23h    Estab       3       3
3.3.3.1   4  66000      324      360    0     0  04:47:47  Estab       2       2
Spine-1#show bgp evpn detail
...
BGP routing table entry for imet 3.3.3.1, Route Distinguisher: 3.3.3.1:2
 Paths: 1 available
 66000
  3.3.3.1 from 3.3.3.1 (3.3.3.1)
    Origin IGP, metric -, localpref 100, weight 0, valid, external, best
    Extended Community: Route-Target-AS:1:1 TunnelEncap:tunnelTypeVxlan
    VNI: 12345
    PMSI Tunnel: Ingress Replication, MPLS Label: 12345, Leaf Information Required: false, Tunnel ID: 3.3.3.1

Checking FRR

debian# show interface vxlan12345
Interface vxlan12345 is up, line protocol is up
Link ups: 1 last: 2021/04/12 06:01:50.57
Link downs: 2 last: 2021/04/12 06:01:20.97
vrf: default
index 7 metric 0 mtu 1500 speed 0
flags: <UP,BROADCAST,RUNNING,MULTICAST>
Type: Unknown
HWaddr: 2e:d6:da:cd:ed:eb
inet6 fe80::2cd6:daff:fecd:edeb/64
Interface Type Vxlan
Interface Slave Type Bridge
VxLAN Id 12345 VTEP IP: 3.3.3.1 Access VLAN Id 1

Master interface: br12345

debian# show evpn mac vni 12345
Number of MACs (local and remote) known for this VNI: 3
Flags: N=sync-neighs, I=local-inactive, P=peer-active, X=peer-proxy
MAC                Type    Flags  Intf/Remote ES/VTEP  VLAN  Seq #'s
00:50:00:00:09:00  remote         2.2.2.200                  0/0
2e:d6:da:cd:ed:eb  local          br12345              1     0/0
00:50:00:00:08:00  remote         2.2.2.201                  0/0
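
Other FRR commands that can be useful when troubleshooting (FRR 7.x syntax; output omitted):

show evpn vni 12345
show bgp l2vpn evpn summary
show bgp l2vpn evpn route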

Checking that FRR has correctly programmed the information into the kernel. Note the all-zero MAC entries pointing at the remote VTEPs – these form the ingress-replication flood list used for BUM traffic.

root@debian:~# bridge fdb show dev vxlan12345
00:50:00:00:08:00 vlan 1 extern_learn master br12345
00:50:00:00:08:00 extern_learn master br12345
00:50:00:00:09:00 vlan 1 extern_learn master br12345
00:50:00:00:09:00 extern_learn master br12345
2e:d6:da:cd:ed:eb vlan 1 master br12345 permanent
2e:d6:da:cd:ed:eb master br12345 permanent
00:00:00:00:00:00 dst 2.2.2.200 self permanent
00:00:00:00:00:00 dst 2.2.2.255 self permanent
00:00:00:00:00:00 dst 2.2.2.201 self permanent
00:50:00:00:08:00 dst 2.2.2.201 self extern_learn
00:50:00:00:09:00 dst 2.2.2.200 self extern_learn

Caveats

Depending on the kernel version, NIC driver, and NIC vendor/model, common hardware offload options may not be available for the payload traffic, which can impact CPU utilisation and overall performance. Some examples are:

  • TCP segmentation and reassembly
  • Checksum offload

Likewise, the server's Linux/driver/NIC stack may or may not be capable of offloading the VXLAN encapsulation/decapsulation operations themselves.
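
One way to check what a given NIC/driver combination supports is ethtool – a quick sketch (the interface name is an example, and exact feature names vary by driver):

ethtool -k eth0 | egrep 'segmentation|checksum|tnl'

Features such as tx-udp_tnl-segmentation indicate that the NIC can perform TCP segmentation offload inside the VXLAN/UDP encapsulation.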

 

Alternative Configuration Options

  • This example uses the native Linux bridge; Open vSwitch (OVS) could also be used for the dataplane on the server – see the sketch after this list
  • IPv4 addressing between the server and the Leaf-1a/b pair could also be used
  • The EVPN sessions could peer with the TOR/leaf pair, rather than the spines/route servers directly
  • The use of FRR in this example should not be taken as a specific recommendation – there are a number of excellent Linux BGP implementations available that provide equivalent functionality
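
As a rough illustration of the OVS option above (a minimal static sketch only – the names mirror this example, the remote VTEP is hard-coded, and wiring OVS into the EVPN-learned forwarding state is beyond the scope of this post):

ovs-vsctl add-br br12345
ovs-vsctl add-port br12345 vxlan12345 -- set interface vxlan12345 \
    type=vxlan options:key=12345 options:local_ip=3.3.3.1 options:remote_ip=2.2.2.200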
