VXLAN Without Controller for Network Virtualization with Arista physical VTEPs

 

1) Introduction

This article assumes an understanding of VXLAN concepts. It aims to guide the design and implementation of network virtualization with VXLAN using physical VTEPs. This controller-less design provides Layer2 communication across a Layer3 network for any Layer2 Ethernet device. It is intended for network teams that do not yet have a network virtualisation controller or cloud management platform (CMP), but want to benefit from the advantages of VXLAN now. Without a network controller, the virtual switches do not participate natively in the VXLAN overlay; they are configured the traditional way, as standard Layer2 switches. The Arista VTEPs accept native (untagged), trunked (802.1Q header) or Q-in-Q traffic as overlay traffic. Traffic can therefore come from:

  • physical hosts (baremetal servers)
  • physical services appliances (access or trunking) such as firewalls, routers, load-balancers
  • virtualisation servers (VMware ESXi, Linux KVM, Microsoft Hyper-V, etc.), usually with 802.1Q-tagged traffic. vSwitches are configured to trunk VLANs on their uplinks (802.1Q)
  • any other device: the only requirement is that it runs Ethernet.

  This controller-less solution does not rely on network virtualisation controllers such as VMware NSX, Nuage, OpenStack Neutron, PLUMgrid, or others. The disadvantage of a controller-less implementation is that it manages only the physical VTEPs, not the virtual machines or virtual switches. In contrast, a cloud management platform would typically, in addition to controlling the Arista physical VTEPs, provision the following to achieve service chaining: virtual machines, storage, virtual switches, security, IP addressing, etc. The benefit of a controller-less implementation is easier access to network virtualisation for network teams:

  • even if you don’t have a controller yet
  • even if the controller design/implementation is not ready yet (or there are internal challenges in getting a full end-to-end solution)
  • it is future-proof: since it is based on standards, it can integrate with whichever network controller or cloud management platform you might later use (no lock-in)
  • you might never want a controller for the host virtualization side (for example: DC interconnect, hosting, etc)

  A controller-less implementation is therefore not suitable if you want to achieve full service chaining and build a cloud, but it is very suitable for a network team to achieve scalable and open network virtualization, whether it evolves to a cloud or not.    

2) Design – Fundamentals

  Assuming a Layer3 Leaf-Spine topology, there are several configuration elements relevant to network virtualization. For more details on Layer3 Leaf-Spine, see the design guide (https://www.arista.com/assets/data/pdf/DesignGuides/Arista-Universal-Cloud-Network-Design.pdf). The following sections detail what you need to design and implement for VXLAN:

2.1) VLAN

Layer2 connectivity to hosts/appliances connecting to the VTEP

  • Layer2 ports may be configured either as access (untagged/ native), trunk (802.1Q header), or Q-in-Q
  • VLANs are either assigned manually or automatically.
    • Automation options include VMtracer (vCenter configures VLANs), scripts (auto-allocation, auto-configuration with lookup) and DevOps tools (Ansible, Chef, Puppet, SaltStack, etc.)
    • VLANs must match those used by the other resources attached to the same Layer2 segment.
    • With physical hosts/appliances, which are mostly static in the network, VLANs are often configured manually, but this can easily be automated from a central source of truth, for example with a script based on MAC lookup.
  • VLANs are mapped to VNIs in a one-to-one manner, although with Q-in-Q it is possible to preserve and carry the C-tag inside VXLAN.

Example for manual configuration (not needed with VMtracer, scripts or DevOps automation)

!
vlan 100
interface ethernet1
  switchport access vlan 100
!
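
The MAC-lookup automation mentioned above could be sketched as follows. This is only an illustration: the `MAC_TO_VLAN` table, its contents and the `access_port_config` helper are hypothetical, and a real deployment would read the inventory from its own source of truth and push the result with its own tooling.

```python
# Hypothetical central inventory: host MAC address -> VLAN ID.
# The values are illustrative only.
MAC_TO_VLAN = {
    "001c.aaaa.aaaa": 100,
    "001c.bbbb.bbbb": 200,
}


def access_port_config(interface: str, mac: str) -> str:
    """Return access-port configuration lines for the host seen on `interface`.

    Raises KeyError if the MAC address is not in the inventory, so that
    unknown hosts are never silently placed in a VLAN.
    """
    vlan = MAC_TO_VLAN[mac]
    return "\n".join([
        f"vlan {vlan}",
        f"interface {interface}",
        f"  switchport access vlan {vlan}",
    ])


if __name__ == "__main__":
    print(access_port_config("ethernet1", "001c.aaaa.aaaa"))
```

The generated lines have the same shape as the manual example; only the VLAN number comes from the lookup.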

 

2.2) VTI  – VXLAN interface IP address.

For VTEP redundancy (active-active), a pair of VTEPs would share the same anycast IP.

  • this address is routable in the underlay and reachable by all other VTEPs.
  • it is normally configured manually by the networking team as part of the underlay configuration, but this can be automated with configuration templates and resource pools

Example:

!
interface loopback 1
  ip address 1.1.1.1/32
interface vxlan1
  vxlan source-interface loopback 1
!

 

2.3) VNI – VXLAN Network identifier

  • represents the Layer2 overlay segment ID. The VNI field is 24 bits wide, which allows about 16.7 million segments.
  • it is globally significant. In contrast, VLANs are only locally significant to a switch (or redundant pair).
  • a VNI maps to a VLAN ID; for example VLAN 100 <==> VNI 10100
    • Such mapping may vary from switch to switch, as it could be created automatically by controllers/scripts. For controller-less implementations, and especially manual ones, the mapping tends to be kept consistent globally for ease of troubleshooting. This is not required, however, since only VNIs are globally significant. VTEP1 could be configured with VLAN 100 <==> VNI 10100, while VTEP4 could be configured with VLAN 456 <==> VNI 10100, and Layer2 connectivity between the two would still be achieved.

Example (only needed if you configure it manually; it can be automated the same way as VLANs)

!
interface vxlan1
  vxlan vlan 100 vni 10100
!
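
The difference between locally significant VLANs and globally significant VNIs can be illustrated with a small model. This is a sketch only: the `VLAN_TO_VNI` table reuses the VTEP1/VTEP4 values from the text, and the `same_segment` helper is a hypothetical name introduced for the illustration.

```python
VNI_BITS = 24               # width of the VNI field in the VXLAN header
MAX_VNIS = 2 ** VNI_BITS    # number of distinct VNI values

# Per-VTEP VLAN -> VNI mappings, using the values from the text.
VLAN_TO_VNI = {
    "VTEP1": {100: 10100},
    "VTEP4": {456: 10100},
}


def same_segment(vtep_a: str, vlan_a: int, vtep_b: str, vlan_b: int) -> bool:
    """Two (VTEP, VLAN) pairs share a Layer2 segment if they map to the same VNI."""
    vni_a = VLAN_TO_VNI[vtep_a].get(vlan_a)
    vni_b = VLAN_TO_VNI[vtep_b].get(vlan_b)
    return vni_a is not None and vni_a == vni_b


if __name__ == "__main__":
    print(MAX_VNIS)
    # Different VLAN IDs, same VNI: the hosts are in the same overlay segment.
    print(same_segment("VTEP1", 100, "VTEP4", 456))
    # Same VLAN ID, but VLAN 100 has no VNI on VTEP4: no shared segment.
    print(same_segment("VTEP1", 100, "VTEP4", 100))
```

Only the VNI is compared; the VLAN ID never leaves the switch (or redundant pair) on which it is configured.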

 

2.4) Unicast Replication for  B.U.M. traffic

Unicast Head-End Replication (HER) is an important part of all VXLAN implementations. Since Broadcast, Unknown Unicast and Multicast (BUM) traffic has an unknown or unknowable Layer2 destination, a VTEP relies on a list of remote VTEPs to which it sends BUM traffic. This list is called a flood list, by analogy with the flooding mechanism of standard bridging.

  • The HER flood-list may be configured manually. MAC reachability then relies on flow-based MAC learning from the data plane
  • CloudVision eXchange (CVX) can automatically synchronize the VTEPs and the required HER flood-lists. MAC reachability is also synchronized between Arista VTEPs.
  • a manual HER flood-list is suitable for small-scale deployments (few VTEPs, DC interconnect)
  • CVX is recommended for medium to large scale, since it makes most of the manual configuration superfluous.

Manual Example:

!
interface vxlan1
  vxlan vlan 100 flood vtep 2.2.2.2 3.3.3.3 4.4.4.4
!
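
When the manual method is used, the flood-list lines are a natural candidate for generation from an inventory. The sketch below assumes a hypothetical `VTEP_VLANS` inventory (VTI address to the set of VLANs configured on that VTEP) and assumes the same VLAN ID is used for a given segment on every VTEP; the `flood_list_lines` helper is an illustrative name, not an Arista tool.

```python
# Hypothetical inventory: VTI address -> VLANs present on that VTEP.
# Assumes a globally consistent VLAN-to-VNI mapping.
VTEP_VLANS = {
    "1.1.1.1": {100, 200},
    "2.2.2.2": {100, 200},
    "3.3.3.3": {200},
    "4.4.4.4": {100, 200},
}


def flood_list_lines(local_vti: str, inventory: dict) -> list:
    """Build the flood-list configuration lines for one VTEP.

    For each local VLAN, the flood list contains every other VTEP that
    also has that VLAN. The local VTEP is never listed in its own list.
    """
    lines = ["interface vxlan1"]
    for vlan in sorted(inventory[local_vti]):
        # Plain string sort: good enough for display in this sketch.
        peers = sorted(
            vti for vti, vlans in inventory.items()
            if vti != local_vti and vlan in vlans
        )
        if peers:
            lines.append(f"  vxlan vlan {vlan} flood vtep {' '.join(peers)}")
    return lines


if __name__ == "__main__":
    print("\n".join(flood_list_lines("1.1.1.1", VTEP_VLANS)))
```

Every VTEP needs its own list, so the function has to be run once per VTEP whenever the inventory changes.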

  CVX example:

!
interface vxlan1
  vxlan controller-client
!

   

3) Ethernet Bridging Fundamentals also matter with VXLAN

In this section we review how switching (Ethernet bridging) handles BUM traffic and MAC learning. The purpose is to relate these same mechanisms to VXLAN later. Note: in the example that follows, we assume some manual configuration on the switch, assigning the ports Eth1-3 to VLAN 2.

3.1) Silent Layer2 network

In a completely silent network, several hosts might be connected to the same broadcast domain (VLAN / Layer2 segment). The switch knows which ports participate in the Layer2 domain (by static or dynamic configuration), but it does not know where the different hosts reside, because it has not yet learned any MAC addresses.

3.2) Flooding of Unknown Layer2 Destinations, and MAC learning

The switch learns the source MAC address of every Layer2 frame it receives, in the corresponding VLAN. In this example, the switch learns Host A’s MAC address 001c.aaaa.aaaa in VLAN 2 and maps it to port Eth1. When forwarding frames destined to MAC A, the switch can then select the correct port in VLAN 2: Eth1.

However, the switch still has no information for reaching MAC B, the address of Host B: 001c.bbbb.bbbb. From the switch’s perspective it is an unknown destination. The switch therefore floods the frame out of all the ports that participate in the Layer2 domain, except the port the frame was received on: here, Eth2 and Eth3.

If Host B replies (for example as part of an ARP exchange), the switch learns Host B’s source MAC address. The switch can then forward the Ethernet frame from Host B to Host A directly, as all the relevant addresses are known.

The Host C MAC address 001c.cccc.cccc remains unknown to the switch until Host C sends an Ethernet frame with its MAC address as source. If Host A sends a frame to MAC C, the same flooding occurs again, onto ports Eth2 and Eth3.

Similar behaviour applies to the other types of traffic whose destinations are permanently unknown: multicast and broadcast. In the case of multicast, however, IGMP snooping optimises the default flooding behaviour.
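
The learn-then-forward-or-flood behaviour described in this section can be modelled in a few lines. This is a simplified sketch (no MAC ageing, no IGMP snooping), and the `LearningSwitch` class is a hypothetical name used only for illustration.

```python
class LearningSwitch:
    """Minimal model of transparent bridging in one or more VLANs."""

    def __init__(self, ports_by_vlan):
        self.ports_by_vlan = ports_by_vlan   # VLAN ID -> list of member ports
        self.mac_table = {}                  # (VLAN ID, MAC) -> port

    def receive(self, vlan, in_port, src_mac, dst_mac):
        """Learn the source MAC, then return the list of egress ports."""
        self.mac_table[(vlan, src_mac)] = in_port
        out_port = self.mac_table.get((vlan, dst_mac))
        if out_port is None:
            # Unknown destination: flood to every VLAN port except the ingress.
            return [p for p in self.ports_by_vlan[vlan] if p != in_port]
        # Known destination: forward, unless it sits behind the ingress port.
        return [] if out_port == in_port else [out_port]


if __name__ == "__main__":
    mac_a, mac_b, mac_c = "001c.aaaa.aaaa", "001c.bbbb.bbbb", "001c.cccc.cccc"
    switch = LearningSwitch({2: ["Eth1", "Eth2", "Eth3"]})
    print(switch.receive(2, "Eth1", mac_a, mac_b))  # A -> B, B unknown: flooded
    print(switch.receive(2, "Eth2", mac_b, mac_a))  # B -> A, A known: forwarded
    print(switch.receive(2, "Eth1", mac_a, mac_c))  # A -> C, C unknown: flooded
```

A broadcast address is never seen as a source, so it is never learned and is always flooded by the same code path.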

4) BUM traffic with VXLAN

  With VXLAN, BUM traffic still exists and still needs to be sent to the unknown destination(s) in the Layer2 domain. As discussed in the fundamentals section about unicast replication (HER), there are two ways to populate the unicast HER flood-list: manually (in the CLI), or automatically with CloudVision (CVX with the VXLAN service).

In this example, Host A sends a frame destined to Host D, but MAC D is unknown to VTEP1. VTEP1 therefore follows the flooding behaviour expected for BUM traffic and replicates the frame to the VTEP IP addresses listed in its flood-list: VTEP2 and VTEP3.

VXLAN does not modify the normal flooding behaviour of the switch, so VTEP1 also floods the BUM traffic to its local ports in the Layer2 domain: Host B receives the flooded frame. The remote VTEPs (VTEP2 and VTEP3) each receive a VXLAN unicast packet that encapsulates the flooded frame. Note that each VXLAN packet is destined to a specific, unique VTI: one packet is destined to VTI2, the other to VTI3.

The remote VTEPs know about the originator’s MAC address. Depending on the VXLAN implementation chosen (manual or CVX VXLAN service), this dynamic MAC address knowledge comes either from data-plane learning or from automatic provisioning.
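
Head-end replication of one BUM frame can be sketched as a function that returns the two sets of copies a VTEP produces. The port names and VTI addresses below are assumptions chosen to match the example (Host A on Eth1, Host B on Eth2, remote VTIs 2.2.2.2 and 3.3.3.3), and `replicate_bum` is a hypothetical helper name.

```python
def replicate_bum(local_ports, ingress_port, flood_list):
    """Return (local egress ports, remote VTIs) for one BUM frame.

    The frame arrives on a local port. It is flooded to the other local
    ports of the Layer2 domain, and one VXLAN unicast copy is sent to
    each remote VTI in the flood list.
    """
    local_copies = [port for port in local_ports if port != ingress_port]
    remote_copies = list(flood_list)   # one unicast packet per remote VTI
    return local_copies, remote_copies


if __name__ == "__main__":
    local, remote = replicate_bum(
        local_ports=["Eth1", "Eth2"],       # assumed: Host A on Eth1, Host B on Eth2
        ingress_port="Eth1",
        flood_list=["2.2.2.2", "3.3.3.3"],  # assumed VTIs of VTEP2 and VTEP3
    )
    print(local)    # local flooding reaches Host B
    print(remote)   # one VXLAN unicast packet per remote VTEP
```

The number of VXLAN packets sent for each BUM frame equals the length of the flood list, which is why the list should only contain VTEPs that participate in the segment.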

5) MAC addresses knowledge for VXLAN

  In this section we continue the previous example to depict the discovery of MAC addresses. The VTEPs 2 and 3 would know about MAC A because either:

  • With data-plane learning: the receiving VTEP learns the inner source MAC of every VXLAN packet it receives. The source MAC is mapped to the remote VTEP that originated the VXLAN packet.
    • For example, VTEPs 2 and 3 learn that the inner frame’s source MAC A is behind VTI 1
  • With the CloudVision VXLAN service, data-plane learning is optional: CVX synchronizes each MAC address as soon as a switch learns it from a local port.
    • For example, VTEP1 learns MAC A from its local port Eth1. This information is automatically and immediately synchronized by the CVX VXLAN service to VTEP2 and VTEP3
    • With the CVX VXLAN service, data-plane learning is not necessary but can be activated if desired. If you are unsure, leave the default (data-plane learning inactive)
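
Both approaches end with the same entry in the remote VTEP's table; they differ in when and how the entry appears. The following sketch models that difference. It is a conceptual illustration only and does not reflect how CVX is implemented; the `Vtep` class, the `sync_local_mac` function and the addresses are hypothetical.

```python
class Vtep:
    """Minimal model of a VTEP's remote MAC table."""

    def __init__(self, name, vti):
        self.name = name
        self.vti = vti
        self.remote_macs = {}   # (VNI, MAC) -> VTI of the remote VTEP

    def learn_from_packet(self, vni, outer_src_ip, inner_src_mac):
        """Data-plane learning: map the inner source MAC to the outer source IP."""
        self.remote_macs[(vni, inner_src_mac)] = outer_src_ip


def sync_local_mac(origin, peers, vni, mac):
    """Controller-style synchronization: push a locally learned MAC to all peers."""
    for peer in peers:
        peer.remote_macs[(vni, mac)] = origin.vti


if __name__ == "__main__":
    mac_a = "001c.aaaa.aaaa"
    vtep1 = Vtep("VTEP1", "1.1.1.1")
    vtep2 = Vtep("VTEP2", "2.2.2.2")
    vtep3 = Vtep("VTEP3", "3.3.3.3")

    # Data-plane learning: VTEP2 learns only once a packet from VTEP1 arrives.
    vtep2.learn_from_packet(10100, outer_src_ip="1.1.1.1", inner_src_mac=mac_a)

    # Synchronization: VTEP3 gets the entry before receiving any traffic.
    sync_local_mac(vtep1, [vtep3], 10100, mac_a)

    print(vtep2.remote_macs == vtep3.remote_macs)
```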

     

6) VXLAN implementation differences

  Several VXLAN implementations are possible; this article focuses on the controller-less ones. While both the manual and CVX methods provide MAC address knowledge, there are differences to consider for your deployment.

a) Configuration simplicity

As a reminder from the fundamentals section, below are the configuration differences between the manual flood-list and the CVX method.

Manual Example:

interface vxlan 1
  vxlan vlan 100 flood vtep 2.2.2.2 3.3.3.3 4.4.4.4   <-- List must include all VTIs participating in the Layer2 segment.

  CVX example:

interface vxlan 1
  vxlan controller-client   <-- CVX manages the flood-list automatically
management cvx
  server host 172.16.0.100      <-- IP address of the EOS VM with CVX

  Both methods employ flood-lists for BUM traffic, but the complexity of implementing and troubleshooting the manual method increases quickly with scale. The implementation aspect can easily be automated, as detailed for example in this article: https://eos.arista.com/script-example-automating-vxlan-deployments-with-eapi/

The manual/static method suits small-scale deployments well, for example a few VTEPs in a DCI (data centre interconnect), a lab, or a small network. CVX reduces complexity to a minimum for initial provisioning, ongoing changes and troubleshooting. It requires setting up an EOS VM for CVX, but the resulting configuration and operational simplicity is often preferred. The CVX method suits medium-to-large scale deployments well, with or without SDN controllers/orchestrators.

b) Data-plane flow learning (Manual) versus dynamic pre-provisioning (CVX)

The manual flood-list configuration also relies on data-plane, flow-based learning of source MAC addresses, like traditional bridging. This means that during initial turn-up there might be a lot of unicast replication for new flows. This is not usually a problem, because it happens only for the initial frames (while the destination is unknown), but it might still be undesirable. CVX with the VXLAN service propagates the MAC information as soon as a VTEP learns a locally attached MAC address, so remote VTEPs usually know it before any flow is initiated. This prevents most of the unicast replication caused by unknown unicasts (since the addresses are pre-learnt). Note that CVX allows using data-plane flow-based learning instead of MAC synchronization, if desired.
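
To put a rough number on how the manual method grows with scale (point a above), the sketch below counts flood-list entries under a worst-case assumption: every VLAN is present on every VTEP, so each VTEP lists all other VTEPs for each VLAN. The function names are hypothetical and the figures are simple arithmetic, not measurements.

```python
def manual_flood_entries(num_vteps: int, num_vlans: int) -> int:
    """Total flood-list entries across the fabric (worst case).

    Each of the N VTEPs lists the other N - 1 VTEPs for each VLAN.
    """
    return num_vteps * (num_vteps - 1) * num_vlans


def vteps_touched_when_adding_one(num_existing_vteps: int) -> int:
    """Existing VTEPs whose flood lists must be edited when one VTEP joins."""
    return num_existing_vteps


if __name__ == "__main__":
    for vteps, vlans in [(4, 2), (10, 50), (40, 100)]:
        print(vteps, vlans, manual_flood_entries(vteps, vlans))
```

The entry count grows with the square of the number of VTEPs, which is consistent with the recommendation to reserve the manual method for small deployments or to automate it.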

7) Complete configuration examples

 

7.1) VXLAN service with CVX

  CVX on EOS VM

!
config
  cvx
    no shutdown
    service vxlan
      no shutdown
!

  Configuration on VTEPs:

!
vlan 100
vlan 200
!
interface Ethernet 1
  switchport access vlan 100
!
interface Ethernet 2
  switchport mode trunk
  switchport trunk allowed vlan 100,200
!
interface loopback 1
 ip address 1.1.1.1/32
!
interface Vxlan 1
  vxlan source-interface loopback 1
  vxlan controller-client
  vxlan vlan 100 vni 10100
  vxlan vlan 200 vni 10200
!
management cvx
  server host x.x.x.x
  no shutdown
!

 

7.2) VXLAN without CVX

Configuration on VTEPs:

!
vlan 100
vlan 200
!
interface Ethernet 1
  switchport access vlan 100
!
interface Ethernet 2
  switchport mode trunk
  switchport trunk allowed vlan 100,200
!
interface loopback 1
 ip address 1.1.1.1/32
!
interface Vxlan 1
  vxlan source-interface loopback 1
  vxlan vlan 100 vni 10100
  vxlan vlan 200 vni 10200
  vxlan vlan 100 flood vtep 2.2.2.2 4.4.4.4
  vxlan vlan 200 flood vtep 2.2.2.2 3.3.3.3 4.4.4.4
!