Posted on June 16, 2021 2:54 am
 |  Asked by Omkar Dhargalkar
 |  214 views
RESOLVED

Hello Folks,

I am trying to simulate an L3LS topology with two leaf PODs and two spine switches on GNS3 using vEOS (EOS-4.25.3.1M).
AS for spine switches: 65000
AS for leaf switches: 65055
I have created eBGP between the leaf and spine switches and iBGP between the MLAG peers of each POD. Somehow, I could not get EVPN to work. On all the leaf switches, the show bgp evpn summary command shows that the spine switches are identified as neighbors; however, the BGP state remains in 'Connect'. For now, I just want server01 (10.96.10.1), connected to POD1, to successfully talk to server02 (10.96.10.2), connected to POD2.

Following is the output from leaf1p1 regarding the EVPN status:

leaf1p1#show bgp evpn summary
BGP summary information for VRF default
Router identifier 100.82.0.11, local AS number 65055
Neighbor Status Codes: m - Under maintenance
Description Neighbor V AS MsgRcvd MsgSent InQ OutQ Up/Down State PfxRcd PfxAcc
spine1 100.82.0.1 4 65000 0 0 0 0 2d06h Connect
spine2 100.82.0.2 4 65000 0 0 0 0 2d06h Connect

leaf1p1#show vxlan flood vtep vlan 10
VXLAN Flood VTEP Table
--------------------------------------------------------------------------------

VLANS Ip Address
----------------------------- ------------------------------------------------

I have attached the running-configs as well as some command outputs to help you understand the topology. Moreover, I have also provided a network diagram for your reference.

Please review my config and help me identify the issue in it.

Posted by Aniket Bhowmick
Answered on June 16, 2021 4:39 am

Hi Omkar

Thank you for posting your query on the EOS forum!

Based on the running config, we found the following misconfigurations, which explain why the EVPN sessions stay stuck in the Connect state (i.e. the TCP session to the neighbor never establishes):

1. On both spines, you specified the following commands under router bgp:

  • bgp listen range 100.82.1.0/27 peer-group EVPN peer-filter PF-EVPN
  • bgp listen range 100.82.3.0/27 peer-group LEAF peer-filter PF-LEAF

2. The above "bgp listen range" commands don't cover the Loopback0 subnets of Leaf1p1 and Leaf2p1; Loopback0 is the EVPN source interface on both leafs:

  • Leaf1p1- Loopback0:  100.82.0.11/32       <---- on Leaf1p1
  • Leaf2p1-Loopback0: 100.82.0.21/32        <---- on Leaf2p1
  • neighbor EVPN update-source Loopback0    <-- on both Leaf

3. You need to add another "bgp listen range" on the spines to cover the Loopback0 subnets of both leafs (a sketch follows below this list).

4. Spine-2 doesn't have the following config, which is present on Spine-1:

  • peer-filter PF-EVPN
    10 match as-range 65055 result accept
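
For reference, a minimal sketch of the fix, assuming the leaf Loopback0 addresses (100.82.0.11 and 100.82.0.21) fall inside 100.82.0.0/27; adjust the prefix to your actual addressing plan. On both spines, under "router bgp 65000":

  bgp listen range 100.82.0.0/27 peer-group EVPN peer-filter PF-EVPN

On Spine-2, also add the "peer-filter PF-EVPN" block shown above so that it matches Spine-1.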

Regards,

Aniket

 

Answered on June 16, 2021 5:24 am

Hi Omkar,

Thanks for reaching out.

In addition to what Aniket mentioned above, please find the observations below:

From the running-config logs, we can see that the MLAG VTEPs have the same Loopback1 IP address (100.82.1.1 on the p1 peers and 100.82.1.2 on the p2 peers) in this vEOS setup. The Loopback1 IP address is used as the source interface for VXLAN encapsulation on all the leaf devices:

interface Vxlan1
   vxlan source-interface Loopback1

 

In a vEOS setup it is not recommended to use the same loopback IP as the VXLAN source on MLAG peers, because by default the kernel (software forwarding) will drop an incoming packet whose source address matches one of the local interface addresses on the device.

So let's consider this scenario:

a) Say one of the uplinks fails (assume the MLAG1 --> spine1 link fails) and a packet needs to be routed to the spine via the peer, i.e. the MLAG2 device. MLAG2 will simply drop the packet instead of forwarding it: MLAG2 sees the packet arriving from the MLAG1 device with source IP (SIP) 100.82.1.1, and since that IP is configured on a local interface of MLAG2 itself (Lo1 IP 100.82.1.1), the kernel refuses to accept such packets by default and drops them instead of forwarding them to the spine.

This is a limitation only in vEOS. On hardware platforms it works fine even with the same IP on both MLAG VTEPs (in fact, the recommended best practice on hardware platforms is to use the same VTEP IP on both MLAG peers).

Workaround:

a) You can configure a unique Loopback1 IP on each MLAG peer in the vEOS setup to avoid the traffic-loss situation (a sketch follows after the procedure below),

(or)

b) If you wish to keep the same Loopback1 IP on both MLAG peers, you can override this kernel behaviour on a per-interface basis.

Follow this procedure:

switch(config)#bash

Arista Networks EOS shell

[admin@switch ~]$
[admin@switch ~]$ cat /proc/sys/net/ipv4/conf/vlan10/accept_local

0

[admin@switch ~]$ sudo su
bash-4.2# sudo echo 1 > /proc/sys/net/ipv4/conf/vlan10/accept_local
bash-4.2# cat /proc/sys/net/ipv4/conf/vlan10/accept_local

1

bash-4.2# exit

In this scenario it appears the VXLAN packets come in on vlan10, so adjust the interface name accordingly; you can also get the list of interfaces with the 'ls -l /proc/sys/net/ipv4/conf' command.
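
For option (a), a minimal sketch with hypothetical unique addresses (pick unused /32s from your own plan), e.g. on the pod-1 peers:

leaf1p1:
interface Loopback1
   ip address 100.82.1.11/32

leaf2p1:
interface Loopback1
   ip address 100.82.1.12/32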

 

You can also discuss this and get a recommendation from your Arista SE regarding this issue.

Thanks,

Bhavana.

Posted by Omkar Dhargalkar
Answered on June 17, 2021 3:02 am

Thanks Aniket for spotting those config errors. EVPN has now been established successfully.

However, VXLAN is not working between POD 1 and POD 2. One possible reason is that the VTEP address 100.82.1.1 in POD 1 cannot ping the VTEP address 100.82.1.2 in POD 2. I guess there is no IP connectivity between Loopback1 of POD 1 and POD 2, as shown below:

leaf1p1#ping 100.82.1.2 source 100.82.1.1
PING 100.82.1.2 (100.82.1.2) from 100.82.1.1 : 72(100) bytes of data.
ping: sendmsg: Network is unreachable
ping: sendmsg: Network is unreachable
ping: sendmsg: Network is unreachable
ping: sendmsg: Network is unreachable
ping: sendmsg: Network is unreachable

--- 100.82.1.2 ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 48ms

I guess the underlay BGP in POD 1 and POD 2 is not advertising the loopback addresses to each other.

leaf1p1#sh ip route 100.82.1.2

VRF: default
Codes: C - connected, S - static, K - kernel,
O - OSPF, IA - OSPF inter area, E1 - OSPF external type 1,
E2 - OSPF external type 2, N1 - OSPF NSSA external type 1,
N2 - OSPF NSSA external type2, B - BGP, B I - iBGP, B E - eBGP,
R - RIP, I L1 - IS-IS level 1, I L2 - IS-IS level 2,
O3 - OSPFv3, A B - BGP Aggregate, A O - OSPF Summary,
NG - Nexthop Group Static Route, V - VXLAN Control Service,
DH - DHCP client installed default route, M - Martian,
DP - Dynamic Policy Route, L - VRF Leaked,
RC - Route Cache Route

Gateway of last resort is not set

leaf1p1#sh ip route vrf prod

VRF: prod
Codes: C - connected, S - static, K - kernel,
O - OSPF, IA - OSPF inter area, E1 - OSPF external type 1,
E2 - OSPF external type 2, N1 - OSPF NSSA external type 1,
N2 - OSPF NSSA external type2, B - BGP, B I - iBGP, B E - eBGP,
R - RIP, I L1 - IS-IS level 1, I L2 - IS-IS level 2,
O3 - OSPFv3, A B - BGP Aggregate, A O - OSPF Summary,
NG - Nexthop Group Static Route, V - VXLAN Control Service,
DH - DHCP client installed default route, M - Martian,
DP - Dynamic Policy Route, L - VRF Leaked,
RC - Route Cache Route

Gateway of last resort is not set

C 10.96.10.0/24 is directly connected, Vlan10

Can you please take a look at the config and let me know what the issue is?

 

Thanks,

Omkar

 

Posted by Aniket Bhowmick
Answered on June 17, 2021 5:00 am

Hi Omkar

I didn't find any obvious misconfiguration that would explain the missing Loopback1 routes on both pods.

To debug further, can you send us the outputs below from all four leafs and both spines:

  • show ip bgp 100.82.1.1/32 detail
  • show ip bgp 100.82.1.2/32 detail

Then collect the following outputs from all the leafs, leaf1p1/leaf2p1 and leaf1p2/leaf2p2 (run the commands for both spine eBGP neighbour IPs):

  • show ip bgp neighbors <ebgp-spine-neighbour-ip> advertised-routes
  • show ip bgp neighbors <ebgp-spine-neighbour-ip> received-routes

Collect the following outputs from both spines, Spine1 and Spine2 (run the commands for every leaf eBGP neighbour IP):

  • show ip bgp neighbors <ebgp-leaf-neighbour-ip> advertised-routes
  • show ip bgp neighbors <ebgp-leaf-neighbour-ip> received-routes
  • show ip route 100.82.1.1
  • show ip route 100.82.1.2
  • show kernel ip route | grep 100.82.1.1
  • show kernel ip route | grep 100.82.1.2

After you collect all the above outputs, can you try removing the following command (under "router bgp 65000") from Spine1 and Spine2 and see if the issue resolves:

  • update wait-install

^ The reason I am asking you to remove this: with "update wait-install", Spine1/2 will advertise a prefix (Lo1) only once it is installed in the kernel/software. If for any reason it is not installed, the spine will not forward the prefix to the leafs (it waits for the install). If that is the case, we should see the routes on the leaf switches after removing this command from the spines (though not necessarily that ping would work).
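
For reference, a minimal sketch of the removal on each spine (the prompts are illustrative):

spine1(config)#router bgp 65000
spine1(config-router-bgp)#no update wait-install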

Regards,

Aniket

Posted by Omkar Dhargalkar
Answered on June 18, 2021 3:20 am

Aniket,

Thanks for all your help! As you suggested, I got rid of the update wait-install command on all the switches.

After that, I can ping server02 from server01, and it seems that EVPN has started working.

leaf1p1#show vxlan flood vtep vlan 10
VXLAN Flood VTEP Table
--------------------------------------------------------------------------------

VLANS Ip Address
----------------------------- ------------------------------------------------
10 100.82.1.2
leaf1p1#show vxlan address-table vlan 10
Vxlan Mac Address Table
----------------------------------------------------------------------

VLAN Mac Address Type Prt VTEP Moves Last Move
---- ----------- ---- --- ---- ----- ---------
10 0c84.6d22.f8d9 EVPN Vx1 100.82.1.2 1 0:03:51 ago

All the VTEPs can now be found on the POD1 and POD2 switches, as you can see from the attached text file.

However, I still could not ping 100.82.1.2 from leaf1p1 using source 100.82.1.1, nor could I ping 100.82.1.1 from leaf1p2 using source 100.82.1.2.

On the other hand, these pings work from the second switch of each POD, as below:

leaf2p1#ping 100.82.1.2 source 100.82.1.1
PING 100.82.1.2 (100.82.1.2) from 100.82.1.1 : 72(100) bytes of data.
80 bytes from 100.82.1.2: icmp_seq=1 ttl=63 time=7.89 ms
80 bytes from 100.82.1.2: icmp_seq=2 ttl=63 time=7.29 ms
80 bytes from 100.82.1.2: icmp_seq=3 ttl=63 time=6.98 ms
80 bytes from 100.82.1.2: icmp_seq=4 ttl=63 time=6.99 ms
80 bytes from 100.82.1.2: icmp_seq=5 ttl=63 time=6.35 ms

--- 100.82.1.2 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 35ms
rtt min/avg/max/mdev = 6.352/7.104/7.897/0.501 ms, ipg/ewma 8.887/7.467 ms
leaf2p1#

leaf2p2#ping 100.82.1.1 source 100.82.1.2
PING 100.82.1.1 (100.82.1.1) from 100.82.1.2 : 72(100) bytes of data.
80 bytes from 100.82.1.1: icmp_seq=1 ttl=63 time=7.91 ms
80 bytes from 100.82.1.1: icmp_seq=2 ttl=63 time=8.23 ms
80 bytes from 100.82.1.1: icmp_seq=3 ttl=63 time=8.80 ms
80 bytes from 100.82.1.1: icmp_seq=4 ttl=63 time=7.21 ms
80 bytes from 100.82.1.1: icmp_seq=5 ttl=63 time=7.16 ms

--- 100.82.1.1 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 38ms
rtt min/avg/max/mdev = 7.160/7.865/8.800/0.621 ms, ipg/ewma 9.659/7.855 ms
leaf2p2#

Can you please find the reason, and perhaps a trick to resolve this odd behavior?

Thanks,

Omkar

Posted by Aniket Bhowmick
Answered on June 18, 2021 5:54 am

Hi Omkar

Good to know the issue is resolved.

Regarding the following: "However, I still could not ping 100.82.1.2 from leaf1p1 using source 100.82.1.1, nor could I ping 100.82.1.1 from leaf1p2 using source 100.82.1.2."

This is expected behaviour with MLAG + VXLAN. The reason is that you are using the same loopback IP (Loopback1) within each pod:

Loopback1 IP on pod 1 (leaf1p1/leaf2p1): 100.82.1.1/32

Loopback1 IP on pod 2 (leaf1p2/leaf2p2): 100.82.1.2/32

When you initiate a ping (destined to 100.82.1.2) from leaf1p1 with source 100.82.1.1 (which is also the IP on leaf2p1), leaf2p1 is the device that receives all the ICMP replies. The spine has two routes (ECMP) for 100.82.1.1, one pointing to leaf1p1 and one pointing to leaf2p1, and because the ICMP reply packets always produce the same hash value, the replies always land on leaf2p1. That's why the ping works when you initiate it from leaf2p1: it always receives the ICMP replies. The fields in the ICMP reply packets are identical whether you initiate the ping from leaf1p1 or leaf2p1 (due to the shared Loopback1 IP), hence the same hash value and the same chosen path.

The same concept applies to the second pod.
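
If you wish to verify this, the spine's routing table should show two equal-cost next hops for each shared VTEP address (commands only; outputs omitted here):

spine1#show ip route 100.82.1.1
spine1#show ip route 100.82.1.2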

Please note, there are absolutely no drops of data-plane packets (end-to-end communication) due to this; it is completely expected behaviour.

One more point: note that vEOS has a limitation with using the same VXLAN loopback IP in MLAG. Please review the answer provided by Bhavana above regarding this.

Regards,

Aniket

 

Posted by Omkar Dhargalkar
Answered on June 18, 2021 6:15 pm

Aniket,

Thank you for all your help, as well as the clear explanation of the issue.

Bhavana, I would like to keep Lo1 with the same IP address on the switches within a POD in this topology, to mimic a real-world scenario. Hence, I tried to make the changes you suggested in bash to set accept_local. However, I could not locate the vlan10 interface under conf, as you can see below:

[admin@leaf1p1 ~]$ cat /proc/sys/net/ipv4/conf/vlan10/accept_local
cat: /proc/sys/net/ipv4/conf/vlan10/accept_local: No such file or directory
[admin@leaf1p1 ~]$ cat /proc/sys/net/ipv4/conf/vlan10/
cat: /proc/sys/net/ipv4/conf/vlan10/: No such file or directory
[admin@leaf1p1 ~]$ cat /proc/sys/net/ipv4/conf
cat: /proc/sys/net/ipv4/conf: Is a directory
[admin@leaf1p1 ~]$ cd /proc/sys/net/ipv4/conf/
[admin@leaf1p1 conf]$ ls -l
total 0
dr-xr-xr-x 1 root root 0 Jun 18 18:00 all
dr-xr-xr-x 1 root root 0 Jun 18 18:00 cpu
dr-xr-xr-x 1 root root 0 Jun 18 18:00 default
dr-xr-xr-x 1 root root 0 Jun 18 18:00 dummy0
dr-xr-xr-x 1 root root 0 Jun 17 00:28 et1
dr-xr-xr-x 1 root root 0 Jun 17 00:28 et10
dr-xr-xr-x 1 root root 0 Jun 17 00:28 et11
dr-xr-xr-x 1 root root 0 Jun 17 00:28 et12
dr-xr-xr-x 1 root root 0 Jun 17 00:28 et2
dr-xr-xr-x 1 root root 0 Jun 17 00:28 et3
dr-xr-xr-x 1 root root 0 Jun 17 00:28 et4
dr-xr-xr-x 1 root root 0 Jun 17 00:28 et5
dr-xr-xr-x 1 root root 0 Jun 17 00:28 et6
dr-xr-xr-x 1 root root 0 Jun 17 00:28 et7
dr-xr-xr-x 1 root root 0 Jun 17 00:28 et8
dr-xr-xr-x 1 root root 0 Jun 17 00:28 et9
dr-xr-xr-x 1 root root 0 Jun 18 18:00 fabric
dr-xr-xr-x 1 root root 0 Jun 18 18:00 fwd0
dr-xr-xr-x 1 root root 0 Jun 17 00:25 lo
dr-xr-xr-x 1 root root 0 Jun 17 00:28 lo0
dr-xr-xr-x 1 root root 0 Jun 17 00:28 lo1
dr-xr-xr-x 1 root root 0 Jun 17 00:25 ma1
dr-xr-xr-x 1 root root 0 Jun 18 18:00 mlag_hb
dr-xr-xr-x 1 root root 0 Jun 17 00:31 pet10
dr-xr-xr-x 1 root root 0 Jun 17 00:31 pet4
dr-xr-xr-x 1 root root 0 Jun 17 00:31 pet5
dr-xr-xr-x 1 root root 0 Jun 17 00:31 pet6
dr-xr-xr-x 1 root root 0 Jun 17 00:31 pet7
dr-xr-xr-x 1 root root 0 Jun 17 00:31 pet8
dr-xr-xr-x 1 root root 0 Jun 17 00:31 pet9
dr-xr-xr-x 1 root root 0 Jun 17 00:28 po1
dr-xr-xr-x 1 root root 0 Jun 17 00:28 po10
dr-xr-xr-x 1 root root 0 Jun 17 00:31 ppo10
dr-xr-xr-x 1 root root 0 Jun 18 18:00 stp_stable
dr-xr-xr-x 1 root root 0 Jun 18 18:00 t0x7
dr-xr-xr-x 1 root root 0 Jun 18 18:00 t0x8
dr-xr-xr-x 1 root root 0 Jun 18 18:00 txraw
dr-xr-xr-x 1 root root 0 Jun 17 00:28 vlan4094
dr-xr-xr-x 1 root root 0 Jun 18 18:00 vmnicet1
dr-xr-xr-x 1 root root 0 Jun 18 18:00 vmnicet10
dr-xr-xr-x 1 root root 0 Jun 18 18:00 vmnicet11
dr-xr-xr-x 1 root root 0 Jun 18 18:00 vmnicet12
dr-xr-xr-x 1 root root 0 Jun 18 18:00 vmnicet2
dr-xr-xr-x 1 root root 0 Jun 18 18:00 vmnicet3
dr-xr-xr-x 1 root root 0 Jun 18 18:00 vmnicet4
dr-xr-xr-x 1 root root 0 Jun 18 18:00 vmnicet5
dr-xr-xr-x 1 root root 0 Jun 18 18:00 vmnicet6
dr-xr-xr-x 1 root root 0 Jun 18 18:00 vmnicet7
dr-xr-xr-x 1 root root 0 Jun 18 18:00 vmnicet8
dr-xr-xr-x 1 root root 0 Jun 18 18:00 vmnicet9
dr-xr-xr-x 1 root root 0 Jun 18 18:00 vx1
dr-xr-xr-x 1 root root 0 Jun 18 18:00 vxlan
[admin@leaf1p1 conf]$

Can you please suggest what my next steps should be? Also, do I need to do this for every VLAN I introduce across both PODs?

Thanks,

Omkar Dhargalkar

Answered on June 21, 2021 10:55 am

Hi Omkar,

The reason vlan10 is not listed under the /proc/sys/net/ipv4/conf/ directory is that there is no SVI for that VLAN, i.e. vlan10.

I have tested this in the lab: when I don't configure an SVI (int vlan 10) for VLAN 10, the interface does not appear under the /proc/sys/net/ipv4/conf/ directory.

As soon as I configured an SVI for VLAN 10, I could see the interface listed in the directory, as below:

switch(config)#int vlan 10

switch(config-if-Vl10)#bash

[admin@switch ~]$ cd /proc/sys/net/ipv4/conf/
[admin@switch conf]$ ls -l
total 0

dr-xr-xr-x 1 root root 0 Jun 21 03:36 txraw
dr-xr-xr-x 1 root root 0 Jun 21 03:37 vlan10
dr-xr-xr-x 1 root root 0 Jun 21 03:36 vxlan

[admin@switch conf]$ cat /proc/sys/net/ipv4/conf/vlan10/accept_local
0
[admin@switch conf]$ sudo su
bash-4.3# sudo echo 1 > /proc/sys/net/ipv4/conf/vlan10/accept_local
bash-4.3# cat /proc/sys/net/ipv4/conf/vlan10/accept_local
1

bash-4.3# exit

There is no need to assign an IP address to the SVI; just configuring "int vlan <x>" is enough for the interface to be listed under the conf directory.

Also, yes, we need to configure this for all the VLANs that are stretched across the pods in Vxlan1, i.e. whichever VLANs have a VNI defined under interface Vxlan1 (a sketch for doing this in one pass follows below).
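
If many VLANs are involved, a small sketch from the EOS bash shell can set accept_local on all of them in one pass; the vlan* glob and the prompts here are assumptions, so adjust them to match your SVI names:

[admin@switch ~]$ sudo su
bash-4.3# for f in /proc/sys/net/ipv4/conf/vlan*/accept_local; do echo 1 > "$f"; done
bash-4.3# grep . /proc/sys/net/ipv4/conf/vlan*/accept_local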

 

Thanks,

Bhavana.
