Hi, I understand how VXLAN direct routing works from the VXLAN perspective.
The only thing I cannot wrap my head around is how traffic flows for a ping to 18.104.22.168 from OUTSIDE DATACENTER -> SPINE (eBGP) -> multiple different LEAF MLAG VXLAN PAIRS with different VTEPs (ex: 22.214.171.124 VTEP 1, 126.96.36.199 VTEP 2) -> server 188.8.131.52 connected to the VTEP 1 pair. vARP on all pairs advertises 184.108.40.206 as the default gateway.
Once the traffic comes into the spine, how does the spine forward it to reach the server's IP address? Does it go through one of the VTEP IPs, and once it hits that VTEP, does the VTEP flood the ping to all other VTEP members? Does the VTEP with the local connection then forward it to the server?
The spine must have a route to reach the server's subnet. In this scenario, traffic coming from outside the datacenter -> spine -> leaf -> server will not be VXLAN encapsulated. Hence the spine should have a route to reach the server, and that route will typically point towards the leaf/VTEP where the server is connected. This is called naked routing.
However, the return traffic from the server would be VXLAN encapsulated, as per the design.
OK, this scenario is pretty real world. In the spine routing table, all of the leafs are equally load balanced for the server's subnet, so how does the spine know to forward to the correct leaf pair?
Let's say this was the routing table on the spine for that server subnet. All of the leaf MLAG domains are configured with vARP as 220.127.116.11, and all are dual-homed to each spine. How would the spine "know" where the server is sitting if, for argument's sake, it was sitting on this leaf: 18.104.22.168, Ethernet4/2/1?
B E 22.214.171.124/24 [20/0] via 126.96.36.199, Ethernet3/1/1
The Spine will see an N-way ECMP route to the server subnet in question, 'N' being the number of Leaf VTEPs serving the server subnet with an anycast gateway. The inbound traffic to 188.8.131.52 can hash to any of the Leafs due to ECMP. If it hashes to the "wrong" Leaf (highly likely when N is large), that Leaf routes the packet into the server's VLAN and VxLAN bridges it to the right VTEP where the host lives. Essentially, the data path for inbound traffic will be sub-optimal, traversing the Spine twice:
Outside Data Center –> Spine (Route) –> VTEP (Route + VxLAN bridge Encap) –> Spine (Route/Underlay) –> Target VTEP (VxLAN bridge Decap) –> Host
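To make the ECMP behaviour above concrete, here is a minimal Python sketch. It is only an illustration: the leaf names, the addresses (taken from the RFC 5737 documentation ranges) and the SHA-256 flow hash are assumptions, not what a real spine ASIC does.

```python
import hashlib

# Hypothetical topology: four leaf MLAG pairs all advertise the server subnet.
LEAF_PAIRS = ["leaf-pair-1", "leaf-pair-2", "leaf-pair-3", "leaf-pair-4"]
# Hypothetical host table: which leaf pair the server is physically attached to.
HOST_LOCATION = {"192.0.2.10": "leaf-pair-1"}


def ecmp_pick(flow, next_hops):
    """Pick one next hop per flow with a deterministic hash (illustrative only)."""
    digest = hashlib.sha256(repr(flow).encode()).digest()
    return next_hops[int.from_bytes(digest[:4], "big") % len(next_hops)]


def inbound_path(flow):
    """Return the hops an inbound flow takes without host route injection."""
    dst_ip = flow[1]
    first_leaf = ecmp_pick(flow, LEAF_PAIRS)
    target = HOST_LOCATION[dst_ip]
    if first_leaf == target:
        # ECMP happened to land on the leaf pair that owns the host.
        return ["outside", "spine", target, "host"]
    # "Wrong" leaf: it routes, VXLAN-encapsulates, and sends the packet
    # back through the spine (underlay) to the target VTEP.
    return ["outside", "spine", first_leaf, "spine", target, "host"]


if __name__ == "__main__":
    # Flow tuple: (src IP, dst IP, protocol, src port, dst port)
    for i in range(8):
        flow = (f"198.51.100.{i}", "192.0.2.10", 6, 40000 + i, 443)
        print(" -> ".join(inbound_path(flow)))
```

With N leaf pairs and an even hash, roughly 1 in N flows lands directly on the right pair; the rest take the longer path.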
You can avoid this by configuring the VTEPs to inject host routes based on the local ARP cache. This feature is supported starting with EOS 4.20.1F. More details on this feature: https://eos.arista.com/eos-4-20-1f/hostinject/
Note that you’ll need to use this feature with caution. Depending on the number of hosts/subnets in your network, host route injection can possibly cause scale concerns with the underlay carrying all the /32 host routes. The data path with host route injection:
Outside Data Center –> Spine (Naked Route) –> Target VTEP (Route) –> Host
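The reason host route injection fixes the path is plain longest-prefix match on the spine: a /32 for the host beats the /24 ECMP route. A rough sketch using Python's standard `ipaddress` module follows; the prefixes and leaf names are hypothetical, and the RIB is just a dictionary rather than anything EOS-specific.

```python
import ipaddress


def lookup(rib, dst):
    """Return the next hops of the longest prefix in rib that contains dst."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, hops) for net, hops in rib.items() if addr in net]
    if not matches:
        return None
    # Longest-prefix match: the most specific route wins.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# Hypothetical spine RIB: every leaf pair advertises the anycast-gateway subnet.
rib = {
    ipaddress.ip_network("192.0.2.0/24"): ["leaf-pair-1", "leaf-pair-2", "leaf-pair-3"],
}
print(lookup(rib, "192.0.2.10"))  # all three leaf pairs (ECMP)

# leaf-pair-1 sees the host in its local ARP cache and injects a /32.
rib[ipaddress.ip_network("192.0.2.10/32")] = ["leaf-pair-1"]
print(lookup(rib, "192.0.2.10"))  # ['leaf-pair-1']
print(lookup(rib, "192.0.2.11"))  # still ECMP: no host route for this address
```

The scale caveat shows up here too: the dictionary gains one entry per host, which is exactly the /32 growth the underlay has to carry.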
A more elegant design for DC external connectivity is through a pair of Service Leafs which are also VxLAN VTEPs. With this approach, the data path for inbound traffic will be:
Outside Data Center –> Service Leaf (Route + VxLAN bridge Encap) –> Target VTEP (VxLAN bridge Decap) –> Host
In this design, you don’t have to inject host routes and there are no scale concerns due to /32 routes.
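As a rough illustration of why the service leaf needs no /32 routes: it routes the packet into the server's VNI and then uses its own ARP and MAC-to-VTEP tables to pick the exact target VTEP. The sketch below assumes hypothetical table contents, a hypothetical VNI number and documentation-range addresses; it is not EOS code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VxlanFrame:
    outer_dst_vtep: str  # underlay destination: the target VTEP
    vni: int
    inner_dst_mac: str
    inner_dst_ip: str


# Hypothetical state on the service leaf.
SUBNET_VNI = 10010
ARP_TABLE = {"192.0.2.10": "00:00:5e:00:53:0a"}               # host IP -> MAC
MAC_TO_VTEP = {(10010, "00:00:5e:00:53:0a"): "203.0.113.1"}   # (VNI, MAC) -> VTEP


def service_leaf_forward(dst_ip):
    """Route into the server VNI, then VXLAN-bridge to the VTEP owning the MAC."""
    mac = ARP_TABLE[dst_ip]                # routing step: resolve the host MAC
    vtep = MAC_TO_VTEP[(SUBNET_VNI, mac)]  # bridging step: find the remote VTEP
    return VxlanFrame(vtep, SUBNET_VNI, mac, dst_ip)


if __name__ == "__main__":
    print(service_leaf_forward("192.0.2.10"))
```

Because the outer destination is already the correct VTEP, the underlay only ever needs routes to the VTEP addresses, not to individual hosts.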
Thank you for the helpful feedback! Would you know if the service leaf design you mentioned is a documented design option?
Yes, it’s our standard design recommendation for services and external connectivity. You can find some useful information in the L3LS design guide here:
If you'd like to have a detailed design discussion/review, please consult your Arista SE or engage with the Arista Professional Services team.