VxLAN Basic Troubleshooting Guide
This document provides basic, generic troubleshooting steps for customers who encounter a VxLAN issue in their network.
Troubleshooting VxLAN involves a few steps, which are described in the following sections. The reference topology used in this document includes VxLAN configuration, with Server-1, Server-2 and Server-3 as the hosts that obtain connectivity over a VxLAN tunnel. The troubleshooting steps are split into routing and bridging scenarios to cover the different cases that can occur.
III. Generic Configurations to Be Checked
A. On the VTEPs, check for the following configuration:
#show run sec vxlan
interface Vxlan1
   vxlan source-interface Loopback1
   vxlan udp-port 4789
   vxlan vlan 10 vni 100
   vxlan vlan 20 vni 200
   vxlan flood vtep 126.96.36.199 188.8.131.52
B. On the 7280E, 7280R and 7500R platforms, the VxLAN-routing TCAM profile must be enabled to perform VxLAN routing. Configure it with the following command (whenever an SVI is configured, make sure recirculation is also configured):
hardware tcam profile vxlan-routing
Note: When the 'vxlan-routing' TCAM profile is configured, support for other features that need TCAM resources (for example PBR, QoS and ACLs) may be limited. Check this link for the full list of limitations.
C. On 7050X, 7060CX and 7260QX series switches (except the 7050X2, 7260CX-64 and 7300X), a recirculation channel must be created to perform VxLAN routing. Refer to the following link for further details: https://eos.arista.com/eos-4-15-2f/vxlan-routing/
NOTE: For VxLAN routing on an MLAG VTEP, configure a separate Recirc-Channel on each MLAG peer.
D. For MLAG peers, both switches of the leaf pair should have identical VxLAN configurations.
E. The virtual IP addresses on all routing VTEPs should be configured with the same virtual router MAC address. Please refer to the following for further details.
F. In a network where a virtual IP and a virtual router MAC address are used in the overlay VLANs, it is recommended to also use a virtual VTEP (a secondary loopback IP). This secondary VTEP IP address should be the same on all routing VTEPs and should not be configured on the bridging VTEPs. When using head-end replication (HER), make sure this virtual VTEP IP is in the flood list on the bridging VTEPs.
Refer to this article for a detailed explanation of why a virtual VTEP IP is needed in a VxLAN-based data center:
G. Make sure the Vxlan1 interface is up on BOTH MLAG peers.
H. Ensure the SVI is not shut down on both MLAG switches.
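I. Check the MTU for both the bridging and routing scenarios.
VTEPs do not fragment VxLAN-encapsulated packets, so make sure the MTU configured on the underlay interfaces is large enough for encapsulated packets to egress without fragmentation. VxLAN encapsulation adds 50 bytes (outer Ethernet, outer IPv4, UDP and VxLAN headers) to each original frame. For example, servers using a 1500-byte MTU need an underlay IP MTU of at least 1550 bytes.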
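The MTU arithmetic can be sketched in Python as a planning aid. This sketch makes several assumptions. The outer IPv4 header carries no options, with IPv6 available as an option. Any outer 802.1Q tag is ignored because it sits below the IP MTU. The inner frame is untagged unless `inner_tagged` is set. The function names are hypothetical and this is not an Arista tool.

```python
"""Sketch: minimum underlay IP MTU needed to carry VxLAN traffic unfragmented."""

INNER_ETHERNET_HEADER = 14  # original (inner) Ethernet header, untagged
INNER_DOT1Q_TAG = 4         # only if the inner frame keeps its VLAN tag
VXLAN_HEADER = 8
UDP_HEADER = 8
OUTER_IPV4_HEADER = 20      # assumes no IP options
OUTER_IPV6_HEADER = 40


def required_underlay_ip_mtu(overlay_ip_mtu: int,
                             ipv6_underlay: bool = False,
                             inner_tagged: bool = False) -> int:
    """Smallest underlay IP MTU that fits an overlay IP packet of
    overlay_ip_mtu bytes once it is VxLAN-encapsulated."""
    outer_ip = OUTER_IPV6_HEADER if ipv6_underlay else OUTER_IPV4_HEADER
    inner_frame = overlay_ip_mtu + INNER_ETHERNET_HEADER
    if inner_tagged:
        inner_frame += INNER_DOT1Q_TAG
    # The outer IP packet carries: outer IP + UDP + VxLAN + inner frame.
    return outer_ip + UDP_HEADER + VXLAN_HEADER + inner_frame


def fits_without_fragmentation(overlay_ip_mtu: int, underlay_ip_mtu: int,
                               ipv6_underlay: bool = False) -> bool:
    """True if the underlay MTU is large enough for full-size overlay packets."""
    return required_underlay_ip_mtu(overlay_ip_mtu, ipv6_underlay) <= underlay_ip_mtu
```

For example, `required_underlay_ip_mtu(9000)` gives 9050, so jumbo-frame servers need an underlay IP MTU of at least 9050 bytes on every hop between the VTEPs.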
IV. Scenario-Specific Troubleshooting
Using the reference topology described above, this section gives troubleshooting steps for specific scenarios.
Ping failing from Server-2 to Server-3 (End to End, Bridging in Vlan 20)
1. On VTEP-2 and VTEP-3, check whether the MAC address of Server-2 has been learned. This confirms whether VTEP-2 is receiving packets from Server-2.
#show mac address-table address H.H.H   [H.H.H is the MAC address of Server-2 or Server-3]
2. If the MAC address of Server-2 is not learned on VTEP-3, the VxLAN tunnel is not working properly. In that case, perform the following steps:
a) Check that the VLAN-to-VNI mapping is consistent across the two VTEPs.
#show run sec vxlan
   vxlan vlan 3901 vni 13901
b) Check that the loopback IPs of VTEP-2 and VTEP-3 are present in each other's VxLAN flood list.
#show run sec vxlan
   vxlan flood vtep 184.108.40.206 220.127.116.11   [Output on VTEP-2]
c) From VTEP-2, ping 18.104.22.168 sourced from the VTEP-2 loopback IP to verify bidirectional VTEP-to-VTEP connectivity.
In an MLAG setup, a ping to a remote VTEP loopback IP that is sourced from the VxLAN loopback IP may succeed from only one of the MLAG peers. Both peers share the same VxLAN loopback IP, so the spine has an ECMP path to that IP and the return traffic (ICMP echo reply) may hash to either peer. A reply that hashes to the peer that did not send the ping never reaches the sender.
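The checks in steps 1, 2a and 2b can be partly scripted against CLI text already copied from the switches. The Python sketch below does three things. It converts a server MAC address from colon notation to the dotted H.H.H form used by `show mac address-table address`. It compares the `vxlan vlan X vni Y` lines between two VTEPs. It checks whether a remote VTEP IP appears on the `vxlan flood vtep` lines. The function names and the approach of pasting `show run sec vxlan` output into Python are assumptions for illustration. The sketch does not connect to any device and only handles single-VLAN `vxlan vlan` lines.

```python
"""Sketch: offline checks on `show run sec vxlan` text captured from VTEPs."""
import ipaddress
import re

# Single-VLAN mapping lines such as "   vxlan vlan 20 vni 200".
VNI_LINE = re.compile(r"^[ \t]*vxlan vlan (\d+) vni (\d+)[ \t\r]*$", re.MULTILINE)
# Flood-list lines such as "   vxlan flood vtep 192.0.2.1 192.0.2.3".
FLOOD_LINE = re.compile(r"^[ \t]*vxlan flood vtep (.+)$", re.MULTILINE)


def to_eos_mac(mac: str) -> str:
    """Convert 00:1c:73:aa:bb:cc (or 00-1c-73-...) to 001c.73aa.bbcc."""
    digits = re.sub(r"[^0-9a-fA-F]", "", mac).lower()
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    return ".".join(digits[i:i + 4] for i in range(0, 12, 4))


def vlan_vni_map(config: str) -> dict:
    """Return {vlan: vni} parsed from 'vxlan vlan X vni Y' lines."""
    return {int(vlan): int(vni) for vlan, vni in VNI_LINE.findall(config)}


def vni_mismatches(config_a: str, config_b: str) -> dict:
    """Return {vlan: (vni_on_a, vni_on_b)} for VLANs whose mapping differs.

    None marks a VLAN that is mapped on only one of the two VTEPs.
    """
    a, b = vlan_vni_map(config_a), vlan_vni_map(config_b)
    return {vlan: (a.get(vlan), b.get(vlan))
            for vlan in sorted(a.keys() | b.keys())
            if a.get(vlan) != b.get(vlan)}


def flood_list(config: str) -> set:
    """Collect every IP address listed on 'vxlan flood vtep' lines."""
    ips = set()
    for entries in FLOOD_LINE.findall(config):
        for token in entries.split():
            try:
                ips.add(ipaddress.ip_address(token))
            except ValueError:
                continue  # ignore any token that is not an IP address
    return ips


def in_flood_list(config: str, remote_vtep_ip: str) -> bool:
    """True if remote_vtep_ip appears in the VTEP's flood list."""
    return ipaddress.ip_address(remote_vtep_ip) in flood_list(config)
```

For example, `vni_mismatches(vtep2_text, vtep3_text)` returns an empty dict when both VTEPs map every VLAN to the same VNI. Any entry it returns points directly at the VLAN to fix.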
Ping failing from Server-1 to Server-3 (VxLAN Routing)
After completing all the configuration checks in section III, follow the steps below to troubleshoot a VxLAN routing ping failure further.
1. On VTEP-1, check whether ARP for Server-3 is resolved.
#show ip arp A.B.C.D   [A.B.C.D is the IP address of Server-3]
2. If ARP for Server-3 is not resolved on VTEP-1:
To check whether the ARP requests are reaching Server-3, check on VTEP-3 whether the counters “vxlan encapsulated arps” and “arp request decaped and sent” are incrementing.
(NOTE: These counters increment for all VxLAN ARPs. If other VxLAN ARP traffic is arriving at VTEP-3, it is difficult to distinguish it from the ARPs generated by the source VTEP, VTEP-1 in this case.)
#show vxlan counters varp | grep -i "arp request decaped and sent"
arp request decaped and sent : 0
#show vxlan counters varp | grep -i "vxlan encapsulated arps"
vxlan encapsulated arps : 0
3. Check that the virtual MAC address (configured on the routing VTEPs) is not configured on the bridging VTEP (VTEP-3).
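Step 2 depends on whether the VARP counters are incrementing. A practical approach is to save two snapshots of `show vxlan counters varp` on VTEP-3, taken a few seconds apart while the ping runs, and compare them. The Python sketch below assumes the output lines use the `name : value` form shown in step 2. The function names are hypothetical. As the note in step 2 says, a positive delta only shows that some VxLAN ARPs arrived, not necessarily the ones from VTEP-1.

```python
"""Sketch: compare two snapshots of `show vxlan counters varp` from VTEP-3."""
import re

# Matches lines such as "arp request decaped and sent : 0".
COUNTER_LINE = re.compile(r"^\s*(.+?)\s*:\s*(\d+)\s*$")

WATCHED = ("vxlan encapsulated arps", "arp request decaped and sent")


def parse_counters(text: str) -> dict:
    """Return {counter name (lowercase): value} for every 'name : value' line."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_LINE.match(line)
        if match:
            counters[match.group(1).lower()] = int(match.group(2))
    return counters


def counter_deltas(before: str, after: str, names=WATCHED) -> dict:
    """Increase of each watched counter between the two snapshots."""
    old, new = parse_counters(before), parse_counters(after)
    return {name: new.get(name, 0) - old.get(name, 0) for name in names}


def arp_reaching_vtep(before: str, after: str) -> bool:
    """True if every watched counter increased between the snapshots."""
    return all(delta > 0 for delta in counter_deltas(before, after).values())
```

If both counters stay flat while VTEP-1 is trying to resolve Server-3, the ARP requests are not reaching VTEP-3. In that case, move on to the captures on VTEP-1 and its uplinks.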
Collect the following outputs:
- Take a packet capture on the relevant interface/NIC to see whether the ARP request is generated and whether an ARP reply is received from Server-3.
- “show interface vxlan 1” from the VTEPs
- On VTEP-1, collect “tcpdump int vlan <vlan num>” to see whether the ARP request from Server-1 is received.
- On Server-1, take a packet capture to check whether it is sending traffic with the virtual MAC as the destination MAC (in the case of VxLAN routing).
- On VTEP-1, collect “tcpdump int vlan <vlan num> filter arp” to verify whether ARP requests are being generated.
Note: Make sure traffic is actively flowing while these captures are taken.
On VTEP-1, you can also run tcpdump on the uplink interface (the interface that connects to the core/spine and has a route to the other VTEPs) to check whether the ARP is being VxLAN-encapsulated. If there are multiple uplinks (ECMP), capture on each path in turn.
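In the uplink capture, a VxLAN-encapsulated ARP is a UDP packet to the configured `vxlan udp-port` (4789 in section III). Its payload is an 8-byte VxLAN header followed by the original Ethernet frame with EtherType 0x0806. The Python sketch below decodes such a UDP payload, for example bytes exported from the capture, and reports the VNI and inner EtherType. The header layout follows RFC 7348. The function names, and the assumption that you already have the UDP payload bytes in hand, are illustrative.

```python
"""Sketch: decode the UDP payload of a VxLAN packet (RFC 7348 layout)."""
import struct

VXLAN_HEADER_LEN = 8
VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field is valid
ETHERTYPE_ARP = 0x0806
ETHERTYPE_DOT1Q = 0x8100


def decode_vxlan_payload(payload: bytes):
    """Return (vni, inner_ethertype) for a VxLAN UDP payload.

    VxLAN header: flags(1) reserved(3) VNI(3) reserved(1), followed by the
    inner Ethernet frame: dst MAC(6) src MAC(6) EtherType(2) ...
    """
    if len(payload) < VXLAN_HEADER_LEN + 14:
        raise ValueError("payload too short for VxLAN + inner Ethernet header")
    if not payload[0] & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VxLAN I flag not set")
    vni = int.from_bytes(payload[4:7], "big")
    inner = payload[VXLAN_HEADER_LEN:]
    (ethertype,) = struct.unpack("!H", inner[12:14])
    if ethertype == ETHERTYPE_DOT1Q and len(inner) >= 18:
        # Skip an inner 802.1Q tag if one was carried.
        (ethertype,) = struct.unpack("!H", inner[16:18])
    return vni, ethertype


def is_encapsulated_arp(payload: bytes, expected_vni: int) -> bool:
    """True if the payload carries an ARP frame in the expected VNI."""
    vni, ethertype = decode_vxlan_payload(payload)
    return vni == expected_vni and ethertype == ETHERTYPE_ARP
```

With the configuration from section III, an ARP for a host in VLAN 20 should appear on the uplink with VNI 200. An ARP seen on the VLAN capture but never on any uplink in that VNI suggests the problem is on VTEP-1 itself.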
For both bridging and routing issues, collect the following outputs in addition to the above:
show tech-support | gzip > /mnt/flash/SR_NUMBER-show-tech-$HOSTNAME-$(date +%m_%d.%H%M).gz
show agent log | gzip > /mnt/flash/SR_NUMBER-show-agentlog-$HOSTNAME-$(date +%m_%d.%H%M).gz
show agent qtrace | gzip > /mnt/flash/SR_NUMBER-show-agentqt-$HOSTNAME-$(date +%m_%d.%H%M).gz
show logging system | gzip > /mnt/flash/SR_NUMBER-show-logsys-$HOSTNAME-$(date +%m_%d.%H%M).gz
bash sudo tar -cvf - /mnt/flash/schedule/tech-support/* > /mnt/flash/SR_NUMBER-history-tech-$HOSTNAME-$(date +%m_%d.%H%M).tar
For 7050X, 7060X, 7250X and 7300X series platforms, also collect: “#show platform trident l3 shadow routes host”
** Please feel free to reach out to email@example.com for any assistance or queries.