I have a few questions centered around the EVPN configuration and whether certain things are possible with EOS. It’s been a year or so since I last played around with EOS in a lab or production environment; I’m glad to finally see EVPN rolled out. So bear with me here.
Drawing is attached. This is all virtual, running Cumulus' VX image and the vEOS image on my FreeBSD server. The images all run fine, interconnect fine, form BGP sessions with one another, etc. So the "virtual" side of this is all good. The first two spine nodes are Cumulus. The second two spines are vEOS. The leaf nodes are all Cumulus. Basically I'm trying to wedge vEOS into this already-running Cumulus lab.
Specifically, I'm trying to build what I think the industry has named a "Spline" (I dislike most of these silly names). Basically: a spine that can route, switch, and participate in VXLAN. You can see from the drawing that my servers are all in VLAN 100. I want the spines to route for VLAN 100, so that any ingress packets from the AGG router up north, destined for the servers, will get sent to the appropriate leaf node without hair-pinning. Meaning I need this on the spine:
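Something along these lines (the address is illustrative; each spine has its own unique VLAN 100 IP out of the servers' 172.16.0.0/24 subnet):

vlan 100
!
interface Vlan100
   ip address 172.16.0.11/24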
Obviously this interface won’t come up without VXLAN running, since I don’t have any Access or Trunk ports on VLAN 100 on the spine. So:
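Something like this as well (the loopback number and its address are illustrative; VNI 10100 is the one used throughout this lab):

interface Loopback0
   ip address 10.100.0.11/32
!
interface Vxlan1
   vxlan source-interface Loopback0
   vxlan udp-port 4789
   vxlan vlan 100 vni 10100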
Here are the appropriate snippets from BGP. The Ethernet interfaces are all configured out of the 10.0.6.0/24 range, and the loopbacks are all in 10.100.0.0/24. We have BGP and EVPN:
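Roughly this shape on the vEOS spines (a sketch; the neighbor address, taken from the 10.0.6.0/24 point-to-point range, and the leaf ASN are illustrative):

service routing protocols model multi-agent
!
router bgp 65200
   neighbor 10.0.6.1 remote-as 65201
   neighbor 10.0.6.1 send-community extended
   !
   address-family evpn
      neighbor 10.0.6.1 activate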
2. Even with the VXLAN1 interface up, none of the server ARPs are being installed. And I can’t ping the servers either because there’s no contiguous L2 between the vEOS spines and the leaf nodes.
C        172.16.0.0/24 is directly connected, Vlan100

--- 172.16.0.101 ping statistics ---

eos-spine01#show bgp evpn vni 10100
         Network                Next Hop            Metric  LocPref  Weight  Path
Is what I’m trying to do even possible with vEOS? Can a Spine be intelligent when it comes to the L2 on the attached leaf nodes? Obviously with unicast VXLAN configuration it’ll work just fine. But I’m trying to make this work with EVPN.
And as a follow-up: I apologize for the very bad formatting in my post. If it gets confusing, ask.
Some sanity checks:
1. Have you enabled multi-agent BGP to support EVPN?
2. I don't see a MAC-VRF defined under "router bgp 65200" on the Spine (see the sketch after this list). Not sure if you pasted your complete BGP configs. Most likely you're missing it, and this could be the reason the flood list is not being auto-populated.
3. With EVPN as the VxLAN control plane, you do not have to statically configure the flood list; it is automatically built using Type-3 IMET routes. What do you see under “show bgp evpn route-type imet vni 10100”?
4. With EVPN + VxLAN, the Overlay anycast gateway on EOS is only supported with the "ip address virtual" anycast option. Check this article for the difference between VARP and "ip address virtual" anycast options.
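For point 2, the MAC-VRF on an EOS Spine is defined under the BGP config; roughly this shape (a sketch; the RD and RT values are illustrative):

router bgp 65200
   vlan 100
      rd 10.100.0.11:100
      route-target both 65200:100
      ! advertise locally learned MACs into EVPN
      redistribute learned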
The design described should be supported in an all-EOS environment. You can refer to this link for details, including config examples. Note that it's supported only starting with EOS 4.20.6F.
Some general comments:
I see you have only one IP under the Overlay SVI. What’s the gateway IP on the servers in your current setup?
Do you have the same IP on all 4 Spines/Splines? If you have the same IP, the MAC address corresponding to the ARP entry on the servers is dictated by the timing of the ARP responses received from each of the 4 Overlay gateways. Since the IP is the same, all 4 Spines would respond to ARPs for the gateway IP (each with its own MAC), and this could cause "ARP flap" on the hosts the first time. I believe the response that arrives last would be cached, and the server would always use that gateway. So Server-1 may use Spine-3, Server-2 may use Spine-4, etc. Things will not be deterministic with regard to Overlay routing from South to North.
If you had the same vIP (anycast Overlay gateway) on all Spines, it would have to be tied to a common virtual MAC and you'd need to define a vVTEP. Here are some specifics around this on EOS:
1. Overlay anycast gateway is tied to a vMAC and we have the notion of a virtual VTEP which is the secondary IP on the VTI. This vVTEP IP (also called VARP VTEP IP) is shared by all the routing VTEPs.
2. The vMAC on the bridging-only VTEPs (Leafs in your topology) is learned behind the vVTEP. Essentially vVTEP avoids MAC flaps on the bridging-only VTEPs.
3. vVTEP is a hard requirement in designs involving bridging-only (no SVI) VTEPs.
In this design, all 4 Spines (Overlay anycast gateways) can equally route the packets North-bound. In your topology, each Leaf will learn vVTEP as a 4-way ECMP from each of the 4 Spines. This IP will be used as the destination IP in the outer packet post VxLAN encap from Leaf to Spine. The load distribution is dictated by flow-based hashing (ECMP) from Leaf layer to Spine layer.
Refer to this link for additional details on vVTEP.
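As a rough sketch of the vVTEP pieces on EOS (addresses and the vMAC are illustrative; the shared vVTEP address is typically added as a secondary IP on the VXLAN source loopback of every routing VTEP, and the anycast SVIs defined with "ip address virtual" resolve to the vMAC):

interface Loopback0
   ip address 10.100.0.11/32
   ! shared VARP VTEP / vVTEP address, identical on all routing VTEPs
   ip address 10.100.0.100/32 secondary
!
ip virtual-router mac-address 00:aa:aa:aa:aa:aa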
I can’t speak for the implementation specifics on Cumulus but all these things can play a big role in interop.
I wonder what your real use case is for running a mix of vendors at the same layer (Spine/Spline)? Possibly a migration where you'll have this topology as an interim one? The reason I ask is the nuances around the anycast gateway implementation: the design in question is not one of the "standard" inter-subnet forwarding use cases [Symmetric or Asymmetric IRB] with Overlay functions happening on both the Leaf and the Spine layer.
Thanks for the detailed answer and reply. Let me try to fill in some of the blanks as I’ve been continually learning and hacking my way through this.
Here’s the entire BGP snippet from the Arista spine:
The spine shouldn’t/doesn’t need an Anycast IP address because it’s not serving as the default route for VLAN 100. It’s only acting as an ingress point for the VLAN. Each spine has a different VLAN 100 IP address. The leaf nodes are doing the default routing for the VLAN and have the Anycast IP. The servers are all pointed at 172.16.0.1 for their default. All of that is working just fine when the Aristas aren’t in the mix. When they are, the ingress routing to VLAN 100 from upstream doesn’t work in 50% of the cases because the AGG router (see: the diagram) chooses the Aristas, which have a VLAN100 interface up, but no L2 path to the VLAN.
I’ve also changed the Vxlan1 interface config:
And at this point: the Vlan100 interface is up, the Vxlan1 interface is up, and the spine sees VTEPs (which are 4 Cumulus leaf nodes). For instance:
Those IPs are the 4 leaf node loopbacks.
The EVPN part is working. Again, refer back to the diagram. I went to leaf nodes 01 and 02, and disabled their respective uplinks to spines01 and 02. This means they only had uplinks to the Aristas. Then, from server01, I tried pinging server02 and it worked fine. Further, server01 had good ARP knowledge of server02. So the spines are properly routing EVPN the way they’re supposed to.
It’s the IP ingress I’m also trying to solve, and that’s what’s puzzling me. I’m certain it’s doable, and I’m likely just missing something simple.
Also let me answer this:
"I wonder what your real use case is for running a mix of vendors at the same layer..."
Learning. Literally, I’m learning. This whole network is a bunch of VMs running on my FreeBSD server and I’m doing this all to keep my brain firing while I’m out of work. The last VXLAN work I did in production with Arista was before you guys launched EVPN and just after you finally added line cards to the 7500-series that could do VXLAN in hardware. So it’s been a little while.
And yes, I did enable multi-agent BGP to support EVPN. Sorry I forgot to include that in my previous replies.
It appears you are doing something smart to optimize the data path for South-bound traffic :)
With EVPN, the Overlay gateway is currently only supported with the "ip address virtual" option. I understand the Spine is not your gateway but only acts as an Overlay entry point into the fabric. Can you try configuring the IP as:
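Something along these lines (the address is illustrative; it could be the existing 172.16.0.1 gateway or another IP shared by the Spines):

interface Vlan100
   ip address virtual 172.16.0.1/24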
...and also define a vMAC:
ip virtual-router mac-address 00:aa:aa:aa:aa:aa
An alternate option to avoid tromboning with EOS is to enable host route injection on the Leaf switches based on the ARP cache. I do not know if there's something similar on Cumulus. That way you don't need any Overlay function on the Spines, but the price you pay is host-route scale, with /32 routes injected into the underlay.
I know you are only trying things out from a learning perspective. In a real DC fabric, I'd recommend introducing a pair of Service Leafs and concentrating all the services there. Things would look cleaner and be operationally simpler with this design. I have described it briefly in this previous post.
Test post to see if the forums are repaired; I haven’t been able to add in a reply because of an error.
OK, forums repaired. Ignore the last post.
Thanks for the reply. A few things:
1. Yes, I'm trying to optimize southbound traffic without the need for the service leaf layer. I'm adamantly opposed to the service leaf concept because it, by definition, makes your entire network less reliable. I can go through the math to prove it if you'd like; just know it's an inarguable fact: the more devices you put in the path of your packet, the less reliable your network is (with independent failures, five in-path devices at 99.9% availability each leave only 0.999^5, about 99.5%, end to end, and every extra device shrinks that product). So, as network architects, it behooves us to remove devices where we can and where it makes sense. Not add them!
2. Using the “virtual router” IP and MAC didn’t make a difference. The issue at hand here is that VXLAN information isn’t being exchanged. Specifically the MAC addresses aren’t getting from the leaf nodes to the Arista spines, such that the spine has L2 (and therefore L3) knowledge of the server. Ultimately, I think that’s the problem I need to figure out: how to get VXLAN to inter-op between Cumulus leaf nodes and Arista Splines. The EVPN part is working quite well, however.
3. Yes, the host route idea works fine. Cumulus has a similar concept. I was trying to avoid doing that because, as you point out, it’s a less efficient use of FIB space.
4. I replaced one of my Cumulus leaf nodes with an Arista leaf node in my virtual lab. The Arista spine and leaf nodes instantly synced up and began exchanging EVPN and VXLAN data. And the spine was able to ping the server directly connected to the Arista leaf, no problem. The ARP entry is installed, etc. All known via interfaces Vlan100 and Vxlan1.
After performing that last exercise, I’ve come to the conclusion that there’s an inter-op challenge between the two operating systems. At least there appears to be. What that inter-op issue is, is still a mystery.
Regarding the Service Leaf comment: I understand where you are coming from, but there are several benefits to a UCN L3LS design with the Services Leaf approach, keeping the Spine/Core layer simple and "dumb", depending on how you look at it. I do not want to deviate from the topic of interest on this thread. You can consult with your Arista SE if you'd like to chat more on this.
Hi, there is a known issue with vEOS and Cumulus VX in the VXLAN forwarding plane, which is resolved in the 4.20.7M release and later. Can you confirm what vEOS image you are running?
In the design, how is the Agg node connected to the 4 spine nodes? Is it layer-2 connected, or is it P-to-P layer-3 links to each Spine, with the Spine nodes routing traffic into VLAN 100 and VXLAN-bridging directly to the appropriate leaf node? Another approach, which would scale better, would be to use a symmetric IRB model where an IP-VRF is used to connect the overlay subnets on the Leaf and Spine nodes, with the Spine learning the 172.16.0.0/24 subnet (type-5) and any type-2 routes via the IP-VRF, thus allowing the Spine to VXLAN-route the traffic directly to the servers without any traffic tromboning.
If the symmetric IRB model is not appropriate and the vEOS switches are running the correct config, can you send a copy of the EVPN RIB type-2, 3, and 5 routes ("show bgp evpn route xxxx detail") and the output of "show ip route detail" on an Arista Spine, a Cumulus Spine, and a Cumulus Leaf node, along with the MAC table on the switches? Rather than post the information, can you send it as an attached file? From the output above, the type-2 routes are being learnt on the Arista Spine; are these not being inserted into the MAC and ARP tables? e.g.
eos-spine01#show bgp evpn vni 10100
Network Next Hop Metric LocPref Weight Path
Hi Alex –
Three files attached that should hopefully show you what you need to know. Let me know if more is needed. To answer a couple of other questions:
– The agg router is connected via simple IPv4 P2P ints, with EBGP peering on those ints. No L2 happening there at all. It’s just a way into and out of the virtual network.
– I was trying to do this without complicating the configs with VRFs. Is that still the suggested method? So I create a separate VRF for the data plane and put the VLAN100 interface into it? What are your suggestions there?
Hi JVP, I didn't receive the files; can you resend them? Thanks.
Alex – I wonder if that’s because someone marked my reply as “spam”? Odd. I’ve attached them to this message. Hopefully they make it through.
The attachments are not making it through. What is the file format of your attachments?
They’re ASCII .txt files.
Hi JVP, I received the files, thanks.
Looking at the Arista Spine config and the EVPN routes, the Arista spine is correctly learning the Type-2 EVPN routes from each of the Cumulus leaf switches:
Type-2 Route (MAC + IP) for Server-1 from the Cumulus leaf-1 (?) - 10.100.0.3
BGP routing table entry for mac-ip 589c.fc00.5e43 172.16.0.101, Route Distinguisher: 10.100.0.3:2
Type-2 Route (MAC + IP) for Server-2 from the Cumulus leaf-2 - 10.100.0.4
BGP routing table entry for mac-ip 589c.fc0d.3123 172.16.0.102, Route Distinguisher: 10.100.0.4:2
Type-2 Route (MAC + IP) for Server-3 from the Cumulus leaf-3 - 10.100.0.5
BGP routing table entry for mac-ip 589c.fc08.d42e 172.16.0.103, Route Distinguisher: 65200:100
However, although the routes are being correctly learnt on the Arista Spine, they are not all being imported into the MAC FIB because of the Route-target config:
- Leaf-1 has an RT of 65201:10100 -> is this correct?
- The MAC-VRF config on the Arista Spine is configured with an import/export RT of 65200:100
Can you configure a consistent RT across the Leaf and Spine switches for VLAN 100, based on the VNI/VLAN rather than the AS number, or configure multiple import rules on the Arista Spine and Cumulus switches, i.e.:
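For example, something like this on the EOS side would import the Cumulus auto-derived RT alongside the local one (a sketch; 65200:100 and 65201:10100 are the values from your outputs, the RD is illustrative, and each additional leaf ASN would need its own import line):

router bgp 65200
   vlan 100
      rd 10.100.0.11:100
      route-target export 65200:100
      route-target import 65200:100
      ! import the RT the Cumulus leaf auto-derives (ASN:VNI)
      route-target import 65201:10100
      redistribute learned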
For the actual configuration, yes, I would recommend creating an IP-VRF in the EVPN instance, with the Spine and Leaf switches members of the IP-VRF along with their own local subnets. For example, Spine-1 would be connected to the Agg switch in VLAN 11 / Subnet-11 and the IP-VRF. Traffic to VLAN 11 from the Agg switch would then be VXLAN-routed from the Spine into the IP-VRF and across the fabric to the Cumulus leaf where the host resides. For example, on an Arista Spine:
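A rough sketch of what that could look like (the VRF name, VNI 50001, and addresses are illustrative; the VRF syntax differs slightly by EOS release, e.g. "vrf definition"/"vrf forwarding" on older code versus "vrf instance"/"vrf" on newer):

vrf definition TENANT-A
!
ip routing vrf TENANT-A
!
interface Vlan11
   ! SVI toward the Agg switch, placed in the IP-VRF
   vrf forwarding TENANT-A
   ip address 192.0.2.1/24
!
interface Vxlan1
   vxlan source-interface Loopback0
   ! L3 VNI shared by all VTEPs that are members of the IP-VRF
   vxlan vrf TENANT-A vni 50001
!
router bgp 65200
   vrf TENANT-A
      rd 10.100.0.11:1
      route-target import evpn 1:50001
      route-target export evpn 1:50001
      redistribute connected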
Let’s try a .tar file with the 3 text files inside of it. I have a sinking feeling it’s the “moderation” that’s doing this.
Hey Alex –
Thanks for working through this with me. It was, in fact, the route target configuration on the EOS devices. Simply changing them over as you suggested instantly made everything work perfectly. The Spines have full ARP knowledge of all of the servers on VLAN100 now, which is precisely what I was after.
I’ve marked this as resolved. Thanks!