Tech Note: Centralized vs. Distributed VXLAN Routing with EVPN
Over the past few years, EVPN VXLAN has become an increasingly popular overlay architecture among customers, primarily in data-center Layer 3 leaf-spine (L3LS) fabrics. With this popularity, numerous deployment topologies and configuration options have emerged. This article reflects our observations, based on real-world deployment experience, on one such choice: centralized vs. distributed gateways.
When deploying EVPN VXLAN integrated routing and bridging (IRB), both VXLAN bridging and VXLAN routing are required concurrently on the switch. This capability is also commonly referred to as an EVPN VXLAN gateway. There are two general architectural choices for where to place these gateways: 1) on each and every leaf switch, known as distributed VXLAN routing; or 2) on a dedicated set of gateways, known as centralized VXLAN routing. Both deployments exist in customer networks today.
Let’s compare the learning behavior and traffic flow of these two architectures in a little more detail. One obvious choice is to use every leaf switch as an “L3 VXLAN gateway”. These TOR-leafs learn local MAC addresses and, because they are also the Layer 3 gateways (L3 GWs) for local VLANs, they also learn the ARP/ND information for the “local” hosts. The TOR-leafs can then advertise these MAC and ARP/ND entries to remote TOR-leafs via the BGP EVPN control plane. This allows all the TOR-leafs to discover all the “remote” hosts without flooding the overlay with ARP/ND requests, a behavior also known as ARP suppression.
Figure 1: Distributed Gateway Model
As illustrated in Figure 1 above, with distributed gateways inter-subnet traffic can be routed locally, and this local routing capability can be extended to Layer 3 multi-tenancy topologies where the TOR-leafs also have multiple routing domains configured using Layer 3 BGP EVPN instances.
The alternative approach is to adopt the “centralized routing” model. In this approach, host Layer 2 traffic is bridged to the centralized L3 VXLAN GWs, which perform inter-subnet (VXLAN) routing and also route to external prefixes. All the L3 VXLAN gateways use the same “any-cast virtual IP” and “any-cast gateway MAC address”, so traffic is load-balanced across them, as illustrated in Figures 2a and 2b.
Figure 2a: Centralized Gateway, Dedicated MLAG Pair
Figure 2b: Centralized Gateway, Utilizing Spines
Given the choice of centralized vs. distributed gateways, two questions arise: What are the pros and cons of each approach? When should one be chosen over the other?
As previously mentioned, in the distributed gateway model, local inter-subnet traffic can be routed at the local ToR-Leaf. Compare this to the centralized gateway model where even traffic routed within the same rack has to go over the Spine, then to the Centralized Gateway. Not only does this increase the latency, it also requires all inter-subnet traffic to go over the L3LS fabric, increasing the bandwidth requirements on the L3LS fabric.
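The path difference described above can be made concrete with a small sketch. The Python below models a hypothetical fabric as an ordered list of devices; the names (leaf1, spine1, gw1) and the assumption that the gateway sits on a dedicated pair behind the spine, as in Figure 2a, are illustrative only and not taken from any specific deployment.

```python
# Illustrative model only: device names and the topology are assumptions.

def routed_path(src_leaf, dst_leaf, model, spine="spine1", gateway="gw1"):
    """Return the devices an inter-subnet packet traverses, in order."""
    if model == "distributed":
        # The ingress leaf routes the packet itself.
        if src_leaf == dst_leaf:
            return [src_leaf]
        return [src_leaf, spine, dst_leaf]
    if model == "centralized":
        # The ingress leaf only bridges; routing happens at the gateway,
        # after which the packet is bridged back across the fabric.
        return [src_leaf, spine, gateway, spine, dst_leaf]
    raise ValueError(f"unknown model: {model}")


def fabric_links(path):
    """Number of fabric links crossed along a path."""
    return len(path) - 1


same_rack_distributed = routed_path("leaf1", "leaf1", "distributed")
same_rack_centralized = routed_path("leaf1", "leaf1", "centralized")

print(fabric_links(same_rack_distributed))   # 0
print(fabric_links(same_rack_centralized))   # 4
```

In this simplified model, same-rack inter-subnet traffic never leaves the leaf in the distributed case, while the centralized case crosses four fabric links. When the spines themselves act as gateways (Figure 2b), the count would be lower, but the traffic still has to leave the rack.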
On a related point, in the distributed gateway model, where there is a local IRB on each TOR-leaf, the TOR-leafs exchange ARP routes in BGP EVPN. With this ARP/ND information, the TOR-leafs can always perform ARP/ND proxy and hence avoid sending BUM traffic into the IP fabric (underlay). Compare this to the TOR-leafs in the “centralized routing model”, which do not have IRB interfaces configured; they therefore do not learn, and hence do not exchange, ARP/ND information. The downside is that the BGP EVPN control plane is no longer used to exchange ARP/ND information, so there is no ARP/ND proxy support and no way to suppress ARP/ND flooding in the overlay in a centralized routing architecture. Furthermore, all ARP entries must be learnt on the centralized GWs, increasing the demands on hardware resources, control-plane scale, and forwarding performance on these GW platforms.
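The ARP/ND proxy behavior contrasted above can be sketched as a lookup against bindings learned from the control plane. This is a deliberately simplified model, not EOS behavior: the Leaf class, its method names, and the addresses (taken from documentation ranges) are all hypothetical.

```python
# Simplified sketch; class, method names and addresses are hypothetical.

class Leaf:
    def __init__(self, name, has_irb):
        self.name = name
        self.has_irb = has_irb
        self.arp_table = {}  # IP -> MAC bindings learned via BGP EVPN

    def install_evpn_binding(self, ip, mac):
        # Per the text, only a leaf with a local IRB holds ARP/ND bindings.
        if self.has_irb:
            self.arp_table[ip] = mac

    def handle_arp_request(self, target_ip):
        # Answer locally when the binding is known, otherwise flood.
        mac = self.arp_table.get(target_ip)
        if mac is not None:
            return ("proxy-reply", mac)
        return ("flood", None)


distributed = Leaf("leaf1", has_irb=True)    # distributed gateway model
centralized = Leaf("leaf1", has_irb=False)   # bridging-only leaf

for leaf in (distributed, centralized):
    leaf.install_evpn_binding("192.0.2.10", "00:00:5e:00:53:01")

print(distributed.handle_arp_request("192.0.2.10"))
print(centralized.handle_arp_request("192.0.2.10"))
```

The leaf with an IRB answers the request from its local table, whereas the bridging-only leaf has no binding to consult and must flood the request into the overlay.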
Another advantage of the distributed gateway architecture is that the ToR-Leafs, by having full routing capability, can minimize, or at least optimize, consumption of the limited Layer 2 resources such as MAC and ARP HW tables. Instead they can use the larger L3 routing tables to program reachability entries for remote hosts in the form of host routes. Compare this to the centralized gateway architecture where all ARP and MAC entries must be learnt and programmed in HW by the L3 gateways.
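A rough back-of-the-envelope comparison illustrates the point. The fabric size below (40 leaves, 500 hosts per leaf) is an invented example, and the model assumes the best case in which a distributed leaf reaches every remote host purely through a host route; real table usage depends on the platform and on which VLANs are stretched to which leaves.

```python
# Back-of-the-envelope sketch; fabric size and best-case model are assumptions.

def table_usage(leaves, hosts_per_leaf):
    """Estimate per-device entries: one MAC and one ARP entry per L2-learned host."""
    total_hosts = leaves * hosts_per_leaf
    remote_hosts = total_hosts - hosts_per_leaf
    return {
        "distributed_leaf": {
            "mac_arp_entries": hosts_per_leaf,  # local hosts only
            "host_routes": remote_hosts,        # remote hosts held in the L3 table
        },
        "centralized_gateway": {
            "mac_arp_entries": total_hosts,     # every host in the fabric
            "host_routes": 0,
        },
    }


usage = table_usage(leaves=40, hosts_per_leaf=500)
for device, entries in usage.items():
    print(device, entries)
```

Under these assumptions each distributed leaf holds 500 MAC/ARP entries and 19,500 host routes, while each centralized gateway must hold MAC and ARP entries for all 20,000 hosts.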
All of the previous points could lead the reader to determine that a distributed gateway design is always the best option; however there are some upsides to the centralized GW approach.
The centralized gateway design is arguably very simple: EVPN is only used for MAC advertisements, and all routing is implemented in a “known central location”, allowing inter-VLAN or inter-tenant communication to be more easily controlled and traced. In addition, the required capabilities of the ToR-leafs are modest, as they only need to support EVPN VXLAN bridging.
Centralized VXLAN routing also works well when the majority of the traffic is North-South (N-S), or when a centralized services environment is required to control traffic between customer segments, for example via a firewall. As illustrated in Figures 3a and 3b below, if the traffic flow is mostly N-S, fewer hops are required to egress the overlay in the centralized model. Additionally, any services can also be located centrally, reducing unnecessary hops or source-routing requirements, as traffic is routed via the service complex before being routed externally. Furthermore, the ability to scale out beyond two centralized gateways is a benefit for high-bandwidth environments, as shown in Figure 3b.
Figure 3a: Centralized GW IP Multi-tenancy. MLAG Pair
Figure 3b: Centralized GW IP Multi-tenancy. Utilizing Spines
Finally, because each gateway independently learns the MAC addresses from the ToR-leafs and resolves the remote ARP bindings, when any of the any-cast gateways goes “out-of-service”, the underlay detects the failure and quickly re-converges. This avoids a “single point of failure”, and customer traffic is not “blackholed” by the failed gateway.
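The load-balancing and failover behavior of the any-cast gateways can be approximated with a hash over the flow. The sketch below uses a SHA-256 hash with a modulo as a stand-in for underlay ECMP; actual hardware hashing differs, and the gateway names and flow tuples are invented for illustration.

```python
# Stand-in for underlay ECMP; gateway names and flows are invented examples.
import hashlib


def pick_gateway(flow, gateways):
    """Choose one in-service gateway for a flow by hashing its 5-tuple."""
    if not gateways:
        raise ValueError("no gateways in service")
    digest = hashlib.sha256(repr(flow).encode()).digest()
    return gateways[int.from_bytes(digest[:4], "big") % len(gateways)]


gateways = ["gw1", "gw2", "gw3", "gw4"]

# (source IP, destination IP, protocol, source port, destination port)
flows = [(f"10.0.1.{i}", f"10.0.2.{i}", 6, 40000 + i, 443) for i in range(1, 9)]

before = {flow: pick_gateway(flow, gateways) for flow in flows}

# gw2 goes out of service: the underlay withdraws it from the any-cast set.
surviving = [gw for gw in gateways if gw != "gw2"]
after = {flow: pick_gateway(flow, surviving) for flow in flows}

for flow in flows:
    print(flow[0], before[flow], "->", after[flow])
```

Because every gateway holds the same state, any surviving gateway can route any flow, so every flow lands on an in-service gateway after the failure. A plain modulo hash such as this one may also move flows that were not on the failed gateway; it is used here only to keep the sketch short.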
In summary, the choice of where to implement the EVPN VXLAN gateway largely depends on the customer’s application requirements, the application traffic patterns, and which model the customer’s operational teams are most comfortable supporting. In our experience, given most customers’ traffic patterns, flexibility and scale requirements, a symmetric IRB distributed gateway design is most often chosen. This is mainly because of the potential ARP/MAC scale issues on the centralized GW, and because the ‘split’ Layer 2 and Layer 3 EVPN design of the centralized gateway model is seen as overly complex and restrictive. That being said, at Arista we’re constantly striving to give customers choice and to provide the optimal solution on a wide selection of platforms. To this end, Arista EOS supports all options: VXLAN symmetric IRB, asymmetric IRB routing, full Layer 3 multi-tenancy, and both centralized and distributed routing functionality on all platforms and form factors. Given the ubiquitous platform choice and variety of configuration options, design discussions with customers often center on what the best design is for their requirements, rather than on trying to make one design work for all. This is why customers see Arista as such a compelling choice when deploying EVPN VXLAN.
For further reading please refer to the following document: https://eos.arista.com/eos-4-20-6f/evpn-centralized-anycast-gateway/