Posted on August 31, 2021 9:46 pm | Asked by Ismail Kalolwala | 187 views

I have built a lab with a single spine and two leaf switches, which are interconnected over an MLAG peer link.

Initially everything works as expected. Before I get to the problem statement, here is some background:

Loopback0 on the Spine is 10.254.5.1/32, Loopback0 on Leaf1 is 10.254.5.2/32, and Loopback0 on Leaf2 is 10.254.5.3/32.

On the Spine, "show ip bgp 10.254.5.2" and "show ip bgp 10.254.5.3" return the following:

SPINE#show ip bgp 10.254.5.2
BGP routing table information for VRF default
Router identifier 10.254.5.1, local AS number 65001
BGP routing table entry for 10.254.5.2/32
Paths: 1 available
65002
172.25.1.2 from 172.25.1.2 (10.254.5.2)
Origin IGP, metric 0, localpref 100, IGP metric 0, weight 0, tag 0
Received 00:12:32 ago, valid, external, best
Rx SAFI: Unicast
SPINE#show ip bgp 10.254.5.3
BGP routing table information for VRF default
Router identifier 10.254.5.1, local AS number 65001
BGP routing table entry for 10.254.5.3/32
Paths: 1 available
65002
172.25.1.6 from 172.25.1.6 (10.254.5.3)
Origin IGP, metric 0, localpref 100, IGP metric 0, weight 0, tag 0
Received 00:13:07 ago, valid, external, best
Rx SAFI: Unicast
SPINE#

When the BGP session between Leaf1 and the Spine fails, failover is immediate. However, once the link is restored, BGP on the Spine still selects Leaf2 as the best path to 10.254.5.2/32 (Leaf1's Loopback0). Only after I run "clear ip bgp *" on the switch do I see the initial outcome again.

I have tried various things without success, so any help is appreciated. Note that I am only concerned with the underlay here, not the overlay.

Answered on September 1, 2021 3:02 am

Hi Ismail,

Thanks for reaching out.

The "show ip bgp" output from the Spine looks good.

On the Spine, the path to Leaf1 Loopback0 (10.254.5.2/32) is via Leaf1 and the path to Leaf2 Loopback0 (10.254.5.3/32) is via Leaf2, which is expected.

Could you please confirm the following:

1) Are you referring to the path from Leaf1 to 10.254.5.1/32 (Spine Loopback0) still going via Leaf2 after failover and link restore, or to the path from the Spine to 10.254.5.2/32 (Leaf1 Loopback0) still going via Leaf2 after failover and link restore?

2) Is there an iBGP session between Leaf1 and Leaf2?

3) Could you please provide the output of "show ip bgp <prefix> detail" and "show ip route" from all three devices while the issue is present?

4) Are you testing on vEOS or on a hardware platform, and which EOS version are these devices running?

5) On which device are you issuing "clear ip bgp *"? In other words, which device is the problematic one?
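
For reference, the outputs requested in questions 3 and 4 could be gathered with commands along these lines. This is only a sketch: the prefixes are the Loopback0 addresses from the original post, and which prefix to query on each device is an example choice.

```text
! On the Spine: both leaf loopbacks, the full routing table, and the EOS version
SPINE#show ip bgp 10.254.5.2/32 detail
SPINE#show ip bgp 10.254.5.3/32 detail
SPINE#show ip route
SPINE#show version
! On Leaf1: the Spine loopback, the routing table, and the EOS version
LEAF1#show ip bgp 10.254.5.1/32 detail
LEAF1#show ip route
LEAF1#show version
! On Leaf2: Leaf1's loopback, the routing table, and the EOS version
LEAF2#show ip bgp 10.254.5.2/32 detail
LEAF2#show ip route
LEAF2#show version
```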

Thanks,

Bhavana.

Posted by Ismail Kalolwala
Answered on September 1, 2021 4:04 pm

Please find the configurations of the Spine, Leaf1, and Leaf2 below for reference:

Spine configuration

SPINE#show running-config
! Command: show running-config
! device: SPINE (vEOS-lab, EOS-4.26.2F)
!
! boot system flash:/vEOS-lab.swi
!
no aaa root
!
transceiver qsfp default-mode 4x10G
!
service routing protocols model multi-agent
!
no logging console
!
hostname SPINE
!
spanning-tree mode mstp
!
interface Ethernet1
description LEAF1 Eth1 - Eth1
mtu 9214
no switchport
ip address 172.25.1.1/30
!
interface Ethernet2
description LEAF2 Eth2 - Eth2
mtu 9214
no switchport
ip address 172.25.1.5/30
!
interface Ethernet3
!
interface Ethernet4
!
interface Ethernet5
!
interface Ethernet6
!
interface Ethernet7
!
interface Ethernet8
!
interface Loopback0
description EVPN PEERING
ip address 10.254.5.1/32
!
interface Management1
!
ip virtual-router mac-address 00:1c:73:00:00:11
!
ip routing
!
ip prefix-list LEAF1-LOOPBACK0 seq 10 permit 10.254.5.2/32 le 32
ip prefix-list LEAF2-LOOPBACK0 seq 10 permit 10.254.5.3/32 le 32
ip prefix-list LOOPBACK seq 10 permit 10.254.5.0/24 eq 32
ip prefix-list UNDERLAY seq 10 permit 172.25.0.0/23 le 31
!
route-map LEAF1 permit 10
match ip address prefix-list LEAF1-LOOPBACK0
set local-preference 150
!
route-map LEAF1 permit 20
!
route-map LEAF2 permit 10
match ip address prefix-list LEAF2-LOOPBACK0
set local-preference 150
!
route-map LEAF2 permit 20
!
route-map REDISTRIBUTE permit 10
match ip address prefix-list LOOPBACK
!
route-map REDISTRIBUTE permit 20
match ip address prefix-list UNDERLAY
!
peer-filter LEAF-AS-RANGE
10 match as-range 65001-65199 result accept
!
router bgp 65001
router-id 10.254.5.1
update wait-for-convergence
no bgp default ipv4-unicast
distance bgp 20 200 200
maximum-paths 16 ecmp 16
neighbor UNDERLAY peer group
neighbor UNDERLAY allowas-in 2
neighbor 172.25.1.2 remote-as 65002
neighbor 172.25.1.2 allowas-in 3
neighbor 172.25.1.2 password 7 Vpb+UYG489sWwumJurS1GA==
neighbor 172.25.1.2 send-community
neighbor 172.25.1.2 maximum-routes 12000
neighbor 172.25.1.6 remote-as 65002
neighbor 172.25.1.6 allowas-in 3
neighbor 172.25.1.6 password 7 7XJ44VyPId47gE9sMOpe8Q==
neighbor 172.25.1.6 send-community
neighbor 172.25.1.6 maximum-routes 12000
redistribute connected
!
address-family ipv4
neighbor 172.25.1.2 activate
neighbor 172.25.1.6 activate
!
end

Leaf1 configuration

LEAF1#terminal length 0
Pagination disabled.
LEAF1#show running-config
! Command: show running-config
! device: LEAF1 (vEOS-lab, EOS-4.26.2F)
!
! boot system flash:/vEOS-lab.swi
!
no aaa root
!
transceiver qsfp default-mode 4x10G
!
service routing protocols model multi-agent
!
no logging console
!
hostname LEAF1
!
spanning-tree mode mstp
no spanning-tree vlan-id 4093-4094
!
vlan 4093
name MLAG-iBGP-Peering
!
vlan 4094
name MLAG
trunk group MLAG
!
interface Port-Channel2000
description MLAG
load-interval 1
mtu 9214
switchport mode trunk
switchport trunk group MLAG
no spanning-tree portfast auto
spanning-tree portfast network
!
interface Ethernet1
description SPINE ETH1 - ETH1
mtu 9214
no switchport
ip address 172.25.1.2/30
!
interface Ethernet2
!
interface Ethernet3
channel-group 2000 mode active
!
interface Ethernet4
!
interface Ethernet5
!
interface Ethernet6
!
interface Ethernet7
!
interface Ethernet8
!
interface Loopback0
description EVPN PEERING
ip address 10.254.5.2/32
!
interface Management1
!
interface Vlan4093
description MLAG iBGP Peering
ip address 172.25.1.9/30
!
interface Vlan4094
description MLAG
ip address 172.25.1.13/30
!
ip virtual-router mac-address 00:1c:73:00:00:99
!
ip routing
!
mlag configuration
domain-id MLAG
local-interface Vlan4094
peer-address 172.25.1.14
peer-link Port-Channel2000
reload-delay 500
!
route-map MLAG permit 10
match source-protocol bgp
set local-preference 50
!
router bgp 65002
router-id 10.254.5.2
update wait-for-convergence
no bgp default ipv4-unicast
distance bgp 20 200 200
bgp cluster-id 10.254.5.2
maximum-paths 16 ecmp 16
neighbor MLAG peer group
neighbor MLAG remote-as 65002
neighbor MLAG next-hop-self
neighbor MLAG route-map MLAG in
neighbor MLAG password 7 FcLZx4Xg6ekGc/WqJqzbGQ==
neighbor MLAG send-community
neighbor MLAG maximum-routes 12000
neighbor UNDERLAY peer group
neighbor UNDERLAY remote-as 65001
neighbor UNDERLAY allowas-in 2
neighbor UNDERLAY password 7 5bOvpL966XhXM/op6L6sWg==
neighbor UNDERLAY send-community
neighbor UNDERLAY maximum-routes 12000
neighbor 172.25.1.1 peer group UNDERLAY
neighbor 172.25.1.10 peer group MLAG
redistribute connected
!
address-family ipv4
neighbor MLAG activate
neighbor UNDERLAY activate
!
end
LEAF1#

Leaf2 configuration

LEAF2#show running-config
! Command: show running-config
! device: LEAF2 (vEOS-lab, EOS-4.26.2F)
!
! boot system flash:/vEOS-lab.swi
!
no aaa root
!
transceiver qsfp default-mode 4x10G
!
service routing protocols model multi-agent
!
no logging console
!
hostname LEAF2
!
spanning-tree mode mstp
no spanning-tree vlan-id 4093-4094
!
vlan 4093
name MLAG-iBGP-Peering
!
vlan 4094
name MLAG
trunk group MLAG
!
interface Port-Channel2000
description MLAG
switchport mode trunk
switchport trunk group MLAG
no spanning-tree portfast auto
spanning-tree portfast network
!
interface Ethernet1
!
interface Ethernet2
description SPINE ETH2 - ETH2
mtu 9214
no switchport
ip address 172.25.1.6/30
!
interface Ethernet3
description MLAG
channel-group 2000 mode passive
!
interface Ethernet4
!
interface Ethernet5
!
interface Ethernet6
!
interface Ethernet7
!
interface Ethernet8
!
interface Loopback0
description EVPN PEERING
ip address 10.254.5.3/32
!
interface Management1
!
interface Vlan4093
description MLAG iBGP Peering
ip address 172.25.1.10/30
!
interface Vlan4094
description MLAG
ip address 172.25.1.14/30
!
ip virtual-router mac-address 00:1c:73:00:00:99
!
ip routing
!
mlag configuration
domain-id MLAG
local-interface Vlan4094
peer-address 172.25.1.13
peer-link Port-Channel2000
reload-delay 500
!
route-map MLAG permit 10
set local-preference 50
!
router bgp 65002
router-id 10.254.5.3
update wait-for-convergence
no bgp default ipv4-unicast
distance bgp 20 200 200
bgp cluster-id 10.254.5.2
maximum-paths 16 ecmp 16
neighbor MLAG peer group
neighbor MLAG remote-as 65002
neighbor MLAG next-hop-self
neighbor MLAG route-map MLAG in
neighbor MLAG password 7 FcLZx4Xg6ekGc/WqJqzbGQ==
neighbor MLAG send-community
neighbor MLAG maximum-routes 12000
neighbor UNDERLAY peer group
neighbor UNDERLAY remote-as 65001
neighbor UNDERLAY allowas-in 2
neighbor UNDERLAY password 7 5bOvpL966XhXM/op6L6sWg==
neighbor UNDERLAY send-community
neighbor UNDERLAY maximum-routes 12000
neighbor 172.25.1.5 peer group UNDERLAY
neighbor 172.25.1.9 peer group MLAG
redistribute connected
!
address-family ipv4
neighbor MLAG activate
neighbor UNDERLAY activate
!
end

BGP peering details from the Spine

SPINE#show ip bgp summary
BGP summary information for VRF default
Router identifier 10.254.5.1, local AS number 65001
Neighbor Status Codes: m - Under maintenance
Neighbor V AS MsgRcvd MsgSent InQ OutQ Up/Down State PfxRcd PfxAcc
172.25.1.2 4 65002 1878 1848 0 0 03:45:09 Estab 4 4
172.25.1.6 4 65002 1806 1801 0 0 21:58:05 Estab 4 4

BGP peering details from Leaf1

LEAF1#show ip bgp summary
BGP summary information for VRF default
Router identifier 10.254.5.2, local AS number 65002
Neighbor Status Codes: m - Under maintenance
Neighbor V AS MsgRcvd MsgSent InQ OutQ Up/Down State PfxRcd PfxAcc
172.25.1.1 4 65001 1842 1870 0 0 03:45:22 Estab 6 6
172.25.1.10 4 65002 1799 1808 0 0 21:58:17 Estab 6 6

BGP peering details from Leaf2

LEAF2#show ip bgp summary
BGP summary information for VRF default
Router identifier 10.254.5.3, local AS number 65002
Neighbor Status Codes: m - Under maintenance
Neighbor V AS MsgRcvd MsgSent InQ OutQ Up/Down State PfxRcd PfxAcc
172.25.1.5 4 65001 1790 1795 0 0 21:58:36 Estab 4 4
172.25.1.9 4 65002 1801 1794 0 0 21:58:36 Estab 6 6

Problem statement:

Before any link or BGP peering failure, the Spine shows the following:

SPINE#show ip bgp 10.254.5.2
BGP routing table information for VRF default
Router identifier 10.254.5.1, local AS number 65001
BGP routing table entry for 10.254.5.2/32
Paths: 1 available
65002
172.25.1.2 from 172.25.1.2 (10.254.5.2)
Origin IGP, metric 0, localpref 100, IGP metric 0, weight 0, tag 0
Received 03:46:18 ago, valid, external, best
Rx SAFI: Unicast
SPINE#show ip bgp 10.254.5.3
BGP routing table information for VRF default
Router identifier 10.254.5.1, local AS number 65001
BGP routing table entry for 10.254.5.3/32
Paths: 1 available
65002
172.25.1.6 from 172.25.1.6 (10.254.5.3)
Origin IGP, metric 0, localpref 100, IGP metric 0, weight 0, tag 0
Received 21:59:14 ago, valid, external, best
Rx SAFI: Unicast

When the BGP session between Leaf1 and the Spine is shut down, failover works correctly and the routing table on Leaf1 looks like this:

LEAF1(config)#router bgp 65002
LEAF1(config-router-bgp)#neighbor UNDERLAY shutdown
LEAF1(config-router-bgp)#
LEAF1#show ip bgp summary
BGP summary information for VRF default
Router identifier 10.254.5.2, local AS number 65002
Neighbor Status Codes: m - Under maintenance
Neighbor V AS MsgRcvd MsgSent InQ OutQ Up/Down State PfxRcd PfxAcc
172.25.1.1 4 65001 1845 1874 0 0 00:00:02 Idle(Admin)
172.25.1.10 4 65002 1802 1811 0 0 21:59:59 Estab 6 6
LEAF1#show ip route

VRF: default
Codes: C - connected, S - static, K - kernel,
O - OSPF, IA - OSPF inter area, E1 - OSPF external type 1,
E2 - OSPF external type 2, N1 - OSPF NSSA external type 1,
N2 - OSPF NSSA external type2, B - BGP, B I - iBGP, B E - eBGP,
R - RIP, I L1 - IS-IS level 1, I L2 - IS-IS level 2,
O3 - OSPFv3, A B - BGP Aggregate, A O - OSPF Summary,
NG - Nexthop Group Static Route, V - VXLAN Control Service,
DH - DHCP client installed default route, M - Martian,
DP - Dynamic Policy Route, L - VRF Leaked,
G - gRIBI, RC - Route Cache Route

Gateway of last resort is not set

B I 10.254.5.1/32 [200/0] via 172.25.1.10, Vlan4093
C 10.254.5.2/32 is directly connected, Loopback0
B I 10.254.5.3/32 [200/0] via 172.25.1.10, Vlan4093
C 172.25.1.0/30 is directly connected, Ethernet1
B I 172.25.1.4/30 [200/0] via 172.25.1.10, Vlan4093
C 172.25.1.8/30 is directly connected, Vlan4093
C 172.25.1.12/30 is directly connected, Vlan4094

LEAF1#

SPINE#show ip bgp 10.254.5.2
BGP routing table information for VRF default
Router identifier 10.254.5.1, local AS number 65001
BGP routing table entry for 10.254.5.2/32
Paths: 1 available
65002
172.25.1.6 from 172.25.1.6 (10.254.5.3)
Origin IGP, metric 0, localpref 100, IGP metric 0, weight 0, tag 0
Received 00:00:21 ago, valid, external, best
Rx SAFI: Unicast
SPINE#show ip bgp 10.254.5.3
BGP routing table information for VRF default
Router identifier 10.254.5.1, local AS number 65001
BGP routing table entry for 10.254.5.3/32
Paths: 1 available
65002
172.25.1.6 from 172.25.1.6 (10.254.5.3)
Origin IGP, metric 0, localpref 100, IGP metric 0, weight 0, tag 0
Received 22:00:22 ago, valid, external, best
Rx SAFI: Unicast

Now the BGP session is restored:

LEAF1(config)#router bgp 65002
LEAF1(config-router-bgp)#no neighbor UNDERLAY shutdown
LEAF1(config-router-bgp)#show ip bgp summary
BGP summary information for VRF default
Router identifier 10.254.5.2, local AS number 65002
Neighbor Status Codes: m - Under maintenance
Neighbor V AS MsgRcvd MsgSent InQ OutQ Up/Down State PfxRcd PfxAcc
172.25.1.1 4 65001 1854 1887 0 28 00:00:01 Estab 0 0
172.25.1.10 4 65002 1802 1811 0 0 22:00:45 Estab 6 6
LEAF1(config-router-bgp)#show ip route

VRF: default
Codes: C - connected, S - static, K - kernel,
O - OSPF, IA - OSPF inter area, E1 - OSPF external type 1,
E2 - OSPF external type 2, N1 - OSPF NSSA external type 1,
N2 - OSPF NSSA external type2, B - BGP, B I - iBGP, B E - eBGP,
R - RIP, I L1 - IS-IS level 1, I L2 - IS-IS level 2,
O3 - OSPFv3, A B - BGP Aggregate, A O - OSPF Summary,
NG - Nexthop Group Static Route, V - VXLAN Control Service,
DH - DHCP client installed default route, M - Martian,
DP - Dynamic Policy Route, L - VRF Leaked,
G - gRIBI, RC - Route Cache Route

Gateway of last resort is not set

B E 10.254.5.1/32 [20/0] via 172.25.1.1, Ethernet1
C 10.254.5.2/32 is directly connected, Loopback0
B E 10.254.5.3/32 [20/0] via 172.25.1.1, Ethernet1
C 172.25.1.0/30 is directly connected, Ethernet1
B E 172.25.1.4/30 [20/0] via 172.25.1.1, Ethernet1
C 172.25.1.8/30 is directly connected, Vlan4093
C 172.25.1.12/30 is directly connected, Vlan4094

The problem seen on the Spine even after the session is restored:

SPINE#show ip bgp 10.254.5.2
BGP routing table information for VRF default
Router identifier 10.254.5.1, local AS number 65001
BGP routing table entry for 10.254.5.2/32
Paths: 2 available
65002
172.25.1.6 from 172.25.1.6 (10.254.5.3)
Origin IGP, metric 0, localpref 100, IGP metric 0, weight 0, tag 0
Received 00:01:15 ago, valid, external, ECMP head, ECMP, best, ECMP contributor
Rx SAFI: Unicast
65002
172.25.1.2 from 172.25.1.2 (10.254.5.2)
Origin IGP, metric 0, localpref 100, IGP metric 0, weight 0, tag 0
Received 00:00:28 ago, valid, external, ECMP, ECMP contributor
Rx SAFI: Unicast
SPINE#show ip bgp 10.254.5.3
BGP routing table information for VRF default
Router identifier 10.254.5.1, local AS number 65001
BGP routing table entry for 10.254.5.3/32
Paths: 1 available
65002
172.25.1.6 from 172.25.1.6 (10.254.5.3)
Origin IGP, metric 0, localpref 100, IGP metric 0, weight 0, tag 0
Received 22:01:14 ago, valid, external, best
Rx SAFI: Unicast
SPINE#

SPINE#traceroute 10.254.5.2
traceroute to 10.254.5.2 (10.254.5.2), 30 hops max, 60 byte packets
1 172.25.1.6 (172.25.1.6) 91.621 ms 42.945 ms 49.965 ms
2 10.254.5.2 (10.254.5.2) 52.004 ms 55.875 ms 61.395 ms
SPINE#traceroute 10.254.5.3
traceroute to 10.254.5.3 (10.254.5.3), 30 hops max, 60 byte packets
1 10.254.5.3 (10.254.5.3) 40.275 ms 48.182 ms 49.795 ms

My expectation is that once the BGP session is restored, traceroute to 10.254.5.2 should go directly to Leaf1 instead of via Leaf2.

Please let me know your thoughts.

Posted by Keerthi Bharathi
Answered on September 2, 2021 12:48 am

Hello Ismail, 

From the configurations on the Spine and Leaf devices, we see that "maximum-paths 16 ecmp 16" is configured under router bgp.

Let's take a look at the "show ip bgp" output before the issue occurs:

On the Spine:

SPINE#show ip bgp 10.254.5.2
BGP routing table information for VRF default
Router identifier 10.254.5.1, local AS number 65001
BGP routing table entry for 10.254.5.2/32
Paths: 1 available
65002
172.25.1.2 from 172.25.1.2 (10.254.5.2)
Origin IGP, metric 0, localpref 100, IGP metric 0, weight 0, tag 0
Received 03:46:18 ago, valid, external, best
Rx SAFI: Unicast

We see that the Spine has just one path for 10.254.5.2/32, received from Leaf1. On Leaf2, we notice that a local preference of 50 is set for routes received from the MLAG peer:

route-map MLAG permit 10
set local-preference 50

So if we check the routes on Leaf2, we see that the best path to 10.254.5.2/32 is through the eBGP session with the Spine and not through the iBGP session with Leaf1:

From my lab setup: 

LEAF2#sh ip bgp 10.254.5.2
BGP routing table information for VRF default
Router identifier 10.254.5.3, local AS number 65002
BGP routing table entry for 10.254.5.2/32
 Paths: 2 available
  65001 65002
    172.25.1.5 from 172.25.1.5 (10.254.5.1)
      Origin IGP, metric 0, localpref 100, IGP metric 1, weight 0, received 00:01:34 ago, valid, external, best
      Rx SAFI: Unicast
  Local
    172.25.1.9 from 172.25.1.9 (10.254.5.2)
      Origin IGP, metric 0, localpref 50, IGP metric 1, weight 0, received 00:01:34 ago, valid, internal
      Rx SAFI: Unicast

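Because the iBGP path from Leaf1 has a local preference of 50 while the eBGP path from the Spine has the default of 100, Leaf2 selects the path learned from the Spine as best. (The eBGP path is accepted even though its AS path contains Leaf2's own AS 65002 because of "allowas-in 2" on the UNDERLAY peer group.) Since Leaf2's best path for the prefix is the one learned from the Spine, it does not advertise the prefix back to the Spine.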

Now, when the session between Leaf1 and the Spine is shut down, Leaf2 no longer receives the prefix from the Spine, so its best path becomes the one received from its MLAG peer and it starts advertising the prefix to the Spine. The Spine, on receiving this, installs it in its RIB, as seen in your output.
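
One way to observe this, sketched here with the addresses from your lab, is to compare Leaf2's best path for 10.254.5.2/32 and what Leaf2 advertises to the Spine (172.25.1.5) before and after Leaf1's UNDERLAY session is shut down. Based on the explanation above, you would expect Leaf2's best path to move from the eBGP path via 172.25.1.5 to the iBGP path via 172.25.1.9 (localpref 50), and 10.254.5.2/32 to appear in Leaf2's advertised routes only after the shutdown.

```text
! On Leaf2: which path is best for Leaf1's Loopback0?
LEAF2#show ip bgp 10.254.5.2/32
! On Leaf2: what is currently advertised to the Spine?
LEAF2#show ip bgp neighbors 172.25.1.5 advertised-routes
! On the Spine: which paths are received for Leaf1's Loopback0?
SPINE#show ip bgp 10.254.5.2/32
```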

After the session between Leaf1 and the Spine is brought back up, Leaf1 advertises 10.254.5.2/32 again, so the Spine now receives two paths for the prefix and runs best-path selection over them. Since all the compared attributes are equal and ECMP is configured, both paths are selected as an ECMP group. Since no tie-breaker is configured, the path learned first becomes the best path (the ECMP head), and that is the path the Spine advertises to its peers.

SPINE#show ip bgp 10.254.5.2
BGP routing table information for VRF default
Router identifier 10.254.5.1, local AS number 65001
BGP routing table entry for 10.254.5.2/32
Paths: 2 available
65002
172.25.1.6 from 172.25.1.6 (10.254.5.3)
Origin IGP, metric 0, localpref 100, IGP metric 0, weight 0, tag 0
Received 00:01:15 ago, valid, external, ECMP head, ECMP, best, ECMP contributor
Rx SAFI: Unicast
65002
172.25.1.2 from 172.25.1.2 (10.254.5.2)
Origin IGP, metric 0, localpref 100, IGP metric 0, weight 0, tag 0
Received 00:00:28 ago, valid, external, ECMP, ECMP contributor
Rx SAFI: Unicast

Since there are two ECMP paths to the same prefix, the path taken by the traceroute depends on the hashing done by the Spine. In this case, the Spine hashes it to Leaf2 instead of Leaf1.

I also noticed that although route-maps for the prefixes learned from Leaf1 and Leaf2 (LEAF1 and LEAF2) are configured on the Spine, they are not applied to any neighbor. If you apply them, the prefix received directly from Leaf1 gets a local preference of 150, so it wins best-path selection outright and is no longer part of an ECMP group with the path via Leaf2.
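
A minimal sketch of applying those route-maps inbound on the Spine, using the neighbor addresses and route-map names from your configuration. With this in place, 10.254.5.2/32 from 172.25.1.2 and 10.254.5.3/32 from 172.25.1.6 should carry localpref 150, while the same prefixes learned through the other leaf fall through to sequence 20 and keep the default of 100. "show ip bgp 10.254.5.2" on the Spine should then show the path from 172.25.1.2 as the only best path.

```text
router bgp 65001
   ! Prefer Leaf1's Loopback0 when it is learned directly from Leaf1
   neighbor 172.25.1.2 route-map LEAF1 in
   ! Prefer Leaf2's Loopback0 when it is learned directly from Leaf2
   neighbor 172.25.1.6 route-map LEAF2 in
```

If the new local preference does not show up right away, a soft inbound refresh of the two sessions may be needed. Note that the LEAF1-LOOPBACK0 and LEAF2-LOOPBACK0 prefix lists only match the Loopback0 /32s, so any other prefix that both leaves advertise (for example a separate VTEP loopback) would still be load-shared.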

Hope this helps.

Posted by Ismail Kalolwala
Answered on September 2, 2021 4:21 am

Hi Bhavana,

Following up on your response, here are a couple of questions:

  1. If the failed link / BGP session has recovered but the Spine is still using Leaf2 as a transit to reach Leaf1, wouldn't the VXLAN traffic also follow that same path, from the Spine to Leaf2 and then from Leaf2 to Leaf1?
  2. Is it appropriate for traffic from the Spine to Leaf1 to flow via Leaf2?
  3. Can you advise what needs to be configured as a BGP best-path tie-breaker? I tried several, but none of them worked.
  4. My understanding is that, within the ECMP group, only the elected best path would be used to forward the VXLAN data. Please correct me if I am wrong.
  5. One more observation: after the failed BGP session has recovered, traffic from the Spine to Leaf1 goes via Leaf2, but traffic from Leaf1 to the Spine goes directly, so ingress and egress take different paths. Is this normal?
  6. The traffic path described in point 5 increases the traffic on the MLAG link, which is never meant to carry it unless the primary link to the Spine fails.

Thanks

Regards,

Ismail Kalolwala

Posted by Keerthi Bharathi
Answered on September 13, 2021 2:52 am

Hello Ismail,

Please find my answers inline:

Following up on your response, here are a couple of questions:

1. If the failed link / BGP session has recovered but the Spine is still using Leaf2 as a transit to reach Leaf1, wouldn't the VXLAN traffic also follow that same path, from the Spine to Leaf2 and then from Leaf2 to Leaf1?

[Answer] The outer destination IP of the VXLAN traffic would be Leaf1's VTEP IP. If we check the Spine's route to Leaf1's VTEP IP, we see two ECMP paths, one via Leaf2 and one via Leaf1, so VXLAN traffic can go to either leaf.

From my lab:

SPINE#sh ip route 10.254.5.2
VRF: default
Codes: C - connected, S - static, K - kernel,
       O - OSPF, IA - OSPF inter area, E1 - OSPF external type 1,
       E2 - OSPF external type 2, N1 - OSPF NSSA external type 1,
       N2 - OSPF NSSA external type2, B - BGP, B I - iBGP, B E - eBGP,
       R - RIP, I L1 - IS-IS level 1, I L2 - IS-IS level 2,
       O3 - OSPFv3, A B - BGP Aggregate, A O - OSPF Summary,
       NG - Nexthop Group Static Route, V - VXLAN Control Service,
       DH - DHCP client installed default route, M - Martian,
       DP - Dynamic Policy Route, L - VRF Leaked,
       RC - Route Cache Route

 B E      10.254.5.2/32 [20/0] via 172.25.1.2, Ethernet1
                               via 172.25.1.6, Ethernet2
SPINE#sh ip bgp 10.254.5.2
BGP routing table information for VRF default
Router identifier 10.254.5.1, local AS number 65001
BGP routing table entry for 10.254.5.2/32
 Paths: 2 available
  65002
    172.25.1.6 from 172.25.1.6 (10.254.5.3)
      Origin IGP, metric 0, localpref 100, IGP metric 1, weight 0, received 00:02:53 ago, valid, external, ECMP head, ECMP, best, ECMP contributor
      Rx SAFI: Unicast
  65002
    172.25.1.2 from 172.25.1.2 (10.254.5.2)
      Origin IGP, metric 0, localpref 100, IGP metric 1, weight 0, received 00:02:36 ago, valid, external, ECMP, ECMP contributor
      Rx SAFI: Unicast
SPINE#

2. Is it appropriate for traffic from the Spine to Leaf1 to flow via Leaf2?

[Answer] Traffic will not always take this path, since the Spine has ECMP paths to the VTEP IP address and each flow is hashed to one of them.

3. Can you advise what needs to be configured as a BGP best-path tie-breaker? I tried several, but none of them worked.

[Answer] A BGP tie-breaker only determines which of the ECMP paths becomes the ECMP head (the best path). This matters only when the Spine advertises the prefix to its neighboring BGP peers; it does not change which paths are installed for forwarding.

4. My understanding is that, within the ECMP group, only the elected best path would be used to forward the VXLAN data. Please correct me if I am wrong.

[Answer] Within the ECMP group, the elected best path is used to advertise the prefix to BGP peers. BGP acts as the underlay for VXLAN: when the Spine receives a VXLAN packet whose outer destination IP is Leaf1's VTEP IP, it checks its routing table to find the next hop. Because of ECMP, both paths are installed in the routing table.

5. One more observation: after the failed BGP session has recovered, traffic from the Spine to Leaf1 goes via Leaf2, but traffic from Leaf1 to the Spine goes directly, so ingress and egress take different paths. Is this normal?

[Answer] Since the Spine has two paths to Leaf1, a packet can be hashed to either next hop (Leaf1 directly, or Leaf2). In the other direction, Leaf1's route to the remote VTEP points at the Spine (the next hop in "show ip route" is the Spine's address), so Leaf1 sends the packet directly to the Spine. Hence this asymmetry is expected.

6. The traffic path described in point 5 increases the traffic on the MLAG link, which is never meant to carry it unless the primary link to the Spine fails.

[Answer] Since these VTEPs are MLAG peers, the recommended configuration is for both peers to use the same VTEP IP address.

Ref: https://eos.arista.com/vxlan-with-mlag-configuration-guide/

For VXLAN routing: https://eos.arista.com/vxlan-routing-with-mlag/
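
As a rough illustration of that recommendation, each MLAG peer would carry an identical VTEP loopback that the VXLAN interface uses as its source. The interface name Loopback1 and the address 10.254.6.1/32 are hypothetical. With "redistribute connected" already configured on both leaves, the Spine would learn this /32 from both leaves and could load-share to either one, which is harmless because both leaves can terminate the tunnel.

```text
! Identical on LEAF1 and LEAF2 (hypothetical shared VTEP address)
interface Loopback1
   description VTEP (shared by MLAG peers)
   ip address 10.254.6.1/32
!
! VXLAN tunnels are sourced from the shared loopback
interface Vxlan1
   vxlan source-interface Loopback1
   vxlan udp-port 4789
```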

Also, since these are MLAG peers, the end hosts that originate the traffic can be dual-homed to both MLAG peers, so Leaf2 would not need to send the packet to Leaf1 across the peer link and can deliver it directly to the end host.
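
For illustration, a host is dual-homed to both MLAG peers through an MLAG port-channel configured the same way on Leaf1 and Leaf2. The VLAN, port-channel number, MLAG ID, and member port are hypothetical (Ethernet4 happens to be unused on both leaves in the posted configurations), and with "mode active" the host would also need to run LACP across its two uplinks.

```text
! Identical on LEAF1 and LEAF2 (hypothetical host-facing VLAN and ports)
vlan 10
!
interface Port-Channel10
   description HOST1 (dual-homed)
   switchport access vlan 10
   ! Same MLAG ID on both peers joins the two port-channels into one logical link
   mlag 10
!
interface Ethernet4
   description HOST1
   channel-group 10 mode active
```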

Hope this helps.
