Posted on November 18, 2019 2:49 pm | Asked by Stefano Sasso | 238 views
RESOLVED

Hello,
I am testing a VXLAN EVPN solution with Type-5 routes on vEOS 4.23.0.1, but I am having some issues with packet forwarding.

The routes are propagated correctly, but when I ping from a device connected to one leaf to a device connected to another leaf, the ping fails.
I also tried pinging from the vEOS switch itself with a forced source IP, but that fails too.

LEAF-2

LEAF-2#sh ip route vrf gold

VRF: gold
Codes: C - connected, S - static, K - kernel,
O - OSPF, IA - OSPF inter area, E1 - OSPF external type 1,
E2 - OSPF external type 2, N1 - OSPF NSSA external type 1,
N2 - OSPF NSSA external type2, B - BGP, B I - iBGP, B E - eBGP,
R - RIP, I L1 - IS-IS level 1, I L2 - IS-IS level 2,
O3 - OSPFv3, A B - BGP Aggregate, A O - OSPF Summary,
NG - Nexthop Group Static Route, V - VXLAN Control Service,
DH - DHCP client installed default route, M - Martian,
DP - Dynamic Policy Route, L - VRF Leaked

Gateway of last resort:
B E 0.0.0.0/0 [20/0] via 10.34.34.34, Vlan34

C 10.7.7.2/32 is directly connected, Loopback7
B E 10.7.7.3/32 [200/0] via VTEP 10.0.255.12 VNI 100001 router-mac 0c:e8:3c:1b:36:e7
C 10.34.34.0/24 is directly connected, Vlan34
B E 10.78.78.78/32 [200/0] via VTEP 10.0.255.12 VNI 100001 router-mac 0c:e8:3c:1b:36:e7
B E 10.78.78.0/24 [200/0] via VTEP 10.0.255.12 VNI 100001 router-mac 0c:e8:3c:1b:36:e7
B E 10.100.0.0/16 [20/0] via 10.34.34.34, Vlan34

LEAF-2#sh bgp evpn summary
BGP summary information for VRF default
Router identifier 10.0.250.12, local AS number 65001
Neighbor Status Codes: m - Under maintenance
Neighbor V AS MsgRcvd MsgSent InQ OutQ Up/Down State PfxRcd PfxAcc
10.0.250.1 4 65000 19 23 0 0 00:12:39 Estab 6 6

LEAF-2#sh bgp evpn route-type ip-prefix ipv4
BGP routing table information for VRF default
Router identifier 10.0.250.12, local AS number 65001
Route status codes: s - suppressed, * - valid, > - active, # - not installed, E - ECMP head, e - ECMP
S - Stale, c - Contributing to ECMP, b - backup
% - Pending BGP convergence
Origin codes: i - IGP, e - EGP, ? - incomplete
AS Path Attributes: Or-ID - Originator ID, C-LST - Cluster List, LL Nexthop - Link Local Nexthop

Network Next Hop Metric LocPref Weight Path
* > RD: 10.0.250.12:1 ip-prefix 0.0.0.0/0
- - 100 0 65530 i
* > RD: 10.0.250.12:1 ip-prefix 10.7.7.2/32
- - - 0 i
* > RD: 10.0.250.13:1 ip-prefix 10.7.7.3/32
10.0.255.12 - 100 0 65000 65002 i
* > RD: 10.0.250.12:1 ip-prefix 10.34.34.0/24
- - - 0 i
* > RD: 10.0.250.13:1 ip-prefix 10.78.78.0/24
10.0.255.12 - 100 0 65000 65002 i
* > RD: 10.0.250.12:1 ip-prefix 10.100.0.0/16
- - 100 0 65530 i

LEAF-2#ping 10.0.255.12 source 10.0.255.11
PING 10.0.255.12 (10.0.255.12) from 10.0.255.11 : 72(100) bytes of data.
80 bytes from 10.0.255.12: icmp_seq=1 ttl=63 time=8.27 ms
80 bytes from 10.0.255.12: icmp_seq=2 ttl=63 time=7.52 ms
80 bytes from 10.0.255.12: icmp_seq=3 ttl=63 time=7.28 ms
80 bytes from 10.0.255.12: icmp_seq=4 ttl=63 time=8.67 ms
80 bytes from 10.0.255.12: icmp_seq=5 ttl=63 time=8.58 ms

--- 10.0.255.12 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 35ms
rtt min/avg/max/mdev = 7.280/8.065/8.671/0.572 ms, ipg/ewma 8.958/8.198 ms

LEAF-3

LEAF-3#sh ip route vrf gold

VRF: gold
Codes: C - connected, S - static, K - kernel,
O - OSPF, IA - OSPF inter area, E1 - OSPF external type 1,
E2 - OSPF external type 2, N1 - OSPF NSSA external type 1,
N2 - OSPF NSSA external type2, B - BGP, B I - iBGP, B E - eBGP,
R - RIP, I L1 - IS-IS level 1, I L2 - IS-IS level 2,
O3 - OSPFv3, A B - BGP Aggregate, A O - OSPF Summary,
NG - Nexthop Group Static Route, V - VXLAN Control Service,
DH - DHCP client installed default route, M - Martian,
DP - Dynamic Policy Route, L - VRF Leaked

Gateway of last resort:
B E 0.0.0.0/0 [200/0] via VTEP 10.0.255.11 VNI 100001 router-mac 0c:e8:3c:d8:63:22

B E 10.7.7.2/32 [200/0] via VTEP 10.0.255.11 VNI 100001 router-mac 0c:e8:3c:d8:63:22
C 10.7.7.3/32 is directly connected, Loopback7
B E 10.34.34.34/32 [200/0] via VTEP 10.0.255.11 VNI 100001 router-mac 0c:e8:3c:d8:63:22
B E 10.34.34.0/24 [200/0] via VTEP 10.0.255.11 VNI 100001 router-mac 0c:e8:3c:d8:63:22
C 10.78.78.0/24 is directly connected, Vlan78
B E 10.100.0.0/16 [200/0] via VTEP 10.0.255.11 VNI 100001 router-mac 0c:e8:3c:d8:63:22

LEAF-3#sh bgp evpn summary
BGP summary information for VRF default
Router identifier 10.0.250.13, local AS number 65002
Neighbor Status Codes: m - Under maintenance
Neighbor V AS MsgRcvd MsgSent InQ OutQ Up/Down State PfxRcd PfxAcc
10.0.250.1 4 65000 22 21 0 0 00:12:08 Estab 8 8

LEAF-3#sh bgp evpn route-type ip-prefix ipv4
BGP routing table information for VRF default
Router identifier 10.0.250.13, local AS number 65002
Route status codes: s - suppressed, * - valid, > - active, # - not installed, E - ECMP head, e - ECMP
S - Stale, c - Contributing to ECMP, b - backup
% - Pending BGP convergence
Origin codes: i - IGP, e - EGP, ? - incomplete
AS Path Attributes: Or-ID - Originator ID, C-LST - Cluster List, LL Nexthop - Link Local Nexthop

Network Next Hop Metric LocPref Weight Path
* > RD: 10.0.250.12:1 ip-prefix 0.0.0.0/0
10.0.255.11 - 100 0 65000 65001 65530 i
* > RD: 10.0.250.12:1 ip-prefix 10.7.7.2/32
10.0.255.11 - 100 0 65000 65001 i
* > RD: 10.0.250.13:1 ip-prefix 10.7.7.3/32
- - - 0 i
* > RD: 10.0.250.12:1 ip-prefix 10.34.34.0/24
10.0.255.11 - 100 0 65000 65001 i
* > RD: 10.0.250.13:1 ip-prefix 10.78.78.0/24
- - - 0 i
* > RD: 10.0.250.12:1 ip-prefix 10.100.0.0/16
10.0.255.11 - 100 0 65000 65001 65530 i

Host connected to LEAF-2
10.78.78.78 is the host connected to LEAF-3; 10.7.7.2 is a loopback on LEAF-2 in the same VRF, and 10.7.7.3 is a loopback on LEAF-3 in the same VRF.

[admin@H34] > ping count=2 10.78.78.78
SEQ HOST SIZE TTL TIME STATUS
0 10.78.78.78 timeout
1 10.78.78.78 timeout
sent=2 received=0 packet-loss=100%

[admin@H34] > ping count=2 10.7.7.2
SEQ HOST SIZE TTL TIME STATUS
0 10.7.7.2 56 64 4ms
1 10.7.7.2 56 64 4ms
sent=2 received=2 packet-loss=0% min-rtt=4ms avg-rtt=4ms max-rtt=4ms

[admin@H34] > ping count=2 10.7.7.3
SEQ HOST SIZE TTL TIME STATUS
0 10.7.7.3 timeout
1 10.7.7.3 timeout
sent=2 received=0 packet-loss=100%

As you can see, the host can reach its own leaf (LEAF-2) at layer 3, but nothing at the other site.
I captured the traffic leaving LEAF-2, but apart from BGP and some other layer-2 traffic, nothing else is there.
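For reference, the lookup LEAF-2 should perform for these destinations can be sketched in Python as a longest-prefix match over the VRF gold table shown above. The table is transcribed by hand with next hops kept as plain text labels, and the names LEAF2_GOLD and lookup are illustrative, not an EOS API. For both 10.78.78.78 and 10.7.7.3 the most specific route is the EVPN route via VTEP 10.0.255.12, VNI 100001, which is consistent with the routes being correct while no encapsulated traffic appears in the capture.

```python
import ipaddress

# VRF gold on LEAF-2, transcribed by hand from "sh ip route vrf gold" above.
# Next hops are plain text labels; this is not an EOS data structure.
LEAF2_GOLD = {
    "0.0.0.0/0": "10.34.34.34 (Vlan34)",
    "10.7.7.2/32": "connected (Loopback7)",
    "10.7.7.3/32": "VTEP 10.0.255.12 VNI 100001",
    "10.34.34.0/24": "connected (Vlan34)",
    "10.78.78.78/32": "VTEP 10.0.255.12 VNI 100001",
    "10.78.78.0/24": "VTEP 10.0.255.12 VNI 100001",
    "10.100.0.0/16": "10.34.34.34 (Vlan34)",
}


def lookup(table, destination):
    """Longest-prefix match: return (prefix, next_hop) of the most specific covering route."""
    dst = ipaddress.ip_address(destination)
    matches = [ipaddress.ip_network(p) for p in table if dst in ipaddress.ip_network(p)]
    if not matches:
        return None  # no covering route (cannot happen here because of 0.0.0.0/0)
    best = max(matches, key=lambda net: net.prefixlen)
    return str(best), table[str(best)]


if __name__ == "__main__":
    for host in ("10.78.78.78", "10.7.7.3", "10.7.7.2"):
        print(host, "->", lookup(LEAF2_GOLD, host))
```

The /32 host route wins over the covering 10.78.78.0/24 and the default route, so traffic to 10.78.78.78 is expected to be VXLAN-encapsulated towards 10.0.255.12 rather than sent out Vlan34.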

Any advice?
thanks
Stefano

Posted by Alexis Dacquay
Answered on November 18, 2019 3:23 pm

Hi Stefano,
Do you have a diagram and the full configs?
What is the config on the leaves, and do you have a spine?
Have you set the spine's BGP EVPN peers to "next-hop unchanged"? Otherwise, with eBGP the spine rewrites the next hop to itself when it passes the EVPN routes on, and the data plane will forward traffic to the spine rather than to the intended remote VTEPs.
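To apply this check to the "show bgp evpn route-type ip-prefix ipv4" outputs in the question, here is a minimal Python sketch meant to run off-box on copied text. It assumes the next hop of each received route is the first field on the line after the "RD: ... ip-prefix" line (as in the pasted output), and it treats 10.0.250.1 as the spine's peering address; that role is inferred from the "show bgp evpn summary" outputs, not from the configs. The function names are hypothetical.

```python
from typing import List, Tuple

# Assumption (inferred from "show bgp evpn summary" above): the spine peers from 10.0.250.1.
SPINE_PEER = "10.0.250.1"


def type5_next_hops(output: str) -> List[Tuple[str, str]]:
    """Return (prefix, next_hop) pairs from 'show bgp evpn route-type ip-prefix ipv4' text.

    Locally originated routes show '-' as the next hop and are skipped; an en dash
    is accepted too, in case the text was copied from a web page.
    """
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    pairs = []
    for i, line in enumerate(lines):
        if "RD:" in line and "ip-prefix" in line and i + 1 < len(lines):
            prefix = line.split("ip-prefix", 1)[1].strip()
            next_hop = lines[i + 1].split()[0]
            if next_hop not in ("-", "\u2013"):
                pairs.append((prefix, next_hop))
    return pairs


def routes_via_spine(output: str) -> List[str]:
    """Prefixes whose next hop is the spine instead of a remote VTEP."""
    return [prefix for prefix, nh in type5_next_hops(output) if nh == SPINE_PEER]


if __name__ == "__main__":
    # Two entries transcribed from LEAF-3's output in the question.
    leaf3 = """
* > RD: 10.0.250.12:1 ip-prefix 10.34.34.0/24
10.0.255.11 - 100 0 65000 65001 i
* > RD: 10.0.250.13:1 ip-prefix 10.78.78.0/24
- - - 0 i
"""
    print(type5_next_hops(leaf3))   # [('10.34.34.0/24', '10.0.255.11')]
    print(routes_via_spine(leaf3))  # []
```

In the posted outputs the received routes point at 10.0.255.11 and 10.0.255.12 rather than at the spine, so by this check the next hops are already preserved end to end.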

Please share the output of "show int vxlan 1" so we can see how your data plane is configured.

Regards,
Alexis

Posted by Vikram
Answered on November 18, 2019 6:55 pm

Hi Stefano,

Could you please post a diagram, as well as the output of the following commands from all your devices where applicable:

show run
show ip route vrf all
show bgp ipv4 unicast summ vrf all
show bgp evpn summary
show bgp ipv4 unicast vrf all
show bgp evpn

Q: Is the host connected to Leaf2 another EOS-based device acting as a host?

Q: Based on the output you have posted so far, it seems you have a BGP session between Leaf2 and your host. Is that correct?

Q: Are you learning the 10.78.78.78/32 or the 10.7.7.3/32 prefix on the host connected to Leaf2?

Thanks

Posted by Stefano Sasso
Answered on November 19, 2019 3:32 pm

Hi all, thanks for the answers.
Attached you can find the diagram, config files, and outputs.

To answer quickly: yes, I have a spine, and it is configured with next-hop unchanged. By the way, I don't see any VXLAN traffic leaving the spine itself.
I removed the "external" BGP sessions to simplify troubleshooting; even before that, every simulated host (MikroTik CHR instances) had a static route to its leaf.

By the way, one strange thing I noticed:

LEAF-2#ping vrf gold 10.7.7.3 source 10.7.7.2

PING 10.7.7.3 (10.7.7.3) from 10.7.7.2 : 72(100) bytes of data.
ping: sendmsg: Invalid argument
ping: sendmsg: Invalid argument
ping: sendmsg: Invalid argument
ping: sendmsg: Invalid argument
ping: sendmsg: Invalid argument
--- 10.7.7.3 ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 41ms

thanks
Stefano

Posted by Edmund
Answered on November 19, 2019 8:03 pm

Hi Stefano:

Can you go into bash and run "ls -l /var/log/agents/EvpnrtrEncap*"?

If you see many files listed like this, you may be running into a known issue I came across recently:

-rw-rw-rw- 1 root root 6352 Nov 18 17:33 EvpnrtrEncap-1500
-rw-rw-rw- 1 root root 6352 Nov 18 17:33 EvpnrtrEncap-1540
-rw-rw-rw- 1 root root 6352 Nov 18 17:33 EvpnrtrEncap-1598
-rw-rw-rw- 1 root root 6352 Nov 18 17:33 EvpnrtrEncap-1630
-rw-rw-rw- 1 root root 6352 Nov 18 17:33 EvpnrtrEncap-2068
-rw-rw-rw- 1 root root 6352 Nov 18 17:33 EvpnrtrEncap-2072
-rw-rw-rw- 1 root root 6352 Nov 18 17:33 EvpnrtrEncap-2074
-rw-rw-rw- 1 root root 6352 Nov 18 17:33 EvpnrtrEncap-2076
-rw-rw-rw- 1 root root 6352 Nov 18 17:35 EvpnrtrEncap-2163
-rw-rw-rw- 1 root root 6352 Nov 18 17:35 EvpnrtrEncap-2165
-rw-rw-rw- 1 root root 6352 Nov 18 17:35 EvpnrtrEncap-2167
-rw-rw-rw- 1 root root 6352 Nov 18 17:35 EvpnrtrEncap-2169
-rw-rw-rw- 1 root root 6354 Nov 18 17:37 EvpnrtrEncap-2303
-rw-rw-rw- 1 root root 6354 Nov 18 17:37 EvpnrtrEncap-2305
-rw-rw-rw- 1 root root 6354 Nov 18 17:37 EvpnrtrEncap-2307

If you see a lot of these files, the EvpnrtrEncap agent is restarting frequently, which indicates you are hitting the bug I suspect. This agent handles EVPN encapsulation on vEOS. Please open a TAC case if you see this happening.
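If you would rather script this check than eyeball the listing, here is a minimal Python sketch. It assumes, following the explanation above, that each /var/log/agents/EvpnrtrEncap-<n> file corresponds to one agent restart (the numeric suffix looks like a process ID, but that is an inference), and it simply counts the files and groups them by the minute they were last modified, in UTC. The helper names are hypothetical, there is no official threshold for "many", and it assumes a Python interpreter is available in the switch's bash shell (the code avoids Python-3-only syntax for that reason).

```python
import glob
import os
import time
from collections import Counter

# Path pattern from the check above; each match is assumed to be one agent restart.
PATTERN = "/var/log/agents/EvpnrtrEncap-*"


def bursts_by_minute(mtimes):
    """Group modification times (epoch seconds) by 'YYYY-mm-dd HH:MM' in UTC."""
    return Counter(time.strftime("%Y-%m-%d %H:%M", time.gmtime(t)) for t in mtimes)


def scan(pattern=PATTERN):
    """Return the modification time of every file matching the pattern."""
    return [os.path.getmtime(path) for path in glob.glob(pattern)]


if __name__ == "__main__":
    mtimes = scan()
    print("EvpnrtrEncap restart logs: %d" % len(mtimes))
    for minute, count in sorted(bursts_by_minute(mtimes).items()):
        print("%s  %d" % (minute, count))
```

Applied to the listing Stefano posts later in this thread (his switch clock is UTC), this would report 16 files in four groups of four, at 09:36, 09:38, 09:40 and 09:42, i.e. a burst of restarts roughly every two minutes.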

Best Regards,
Ed

Posted by Stefano Sasso
Answered on November 20, 2019 9:49 am

Hi Ed,
yes, I can confirm the behavior you suspected:
[admin@LEAF-2 agents]$ ls -lsha EvpnrtrEncap-*
8.0K -rw-rw-rw- 1 root root 6.3K Nov 20 09:36 EvpnrtrEncap-2942
8.0K -rw-rw-rw- 1 root root 6.3K Nov 20 09:36 EvpnrtrEncap-3016
8.0K -rw-rw-rw- 1 root root 6.3K Nov 20 09:36 EvpnrtrEncap-3056
8.0K -rw-rw-rw- 1 root root 6.3K Nov 20 09:36 EvpnrtrEncap-3094
8.0K -rw-rw-rw- 1 root root 6.3K Nov 20 09:38 EvpnrtrEncap-3706
8.0K -rw-rw-rw- 1 root root 6.3K Nov 20 09:38 EvpnrtrEncap-3708
8.0K -rw-rw-rw- 1 root root 6.3K Nov 20 09:38 EvpnrtrEncap-3719
8.0K -rw-rw-rw- 1 root root 6.3K Nov 20 09:38 EvpnrtrEncap-3721
8.0K -rw-rw-rw- 1 root root 6.3K Nov 20 09:40 EvpnrtrEncap-3842
8.0K -rw-rw-rw- 1 root root 6.3K Nov 20 09:40 EvpnrtrEncap-3844
8.0K -rw-rw-rw- 1 root root 6.3K Nov 20 09:40 EvpnrtrEncap-3852
8.0K -rw-rw-rw- 1 root root 6.3K Nov 20 09:40 EvpnrtrEncap-3854
8.0K -rw-rw-rw- 1 root root 6.3K Nov 20 09:42 EvpnrtrEncap-3898
8.0K -rw-rw-rw- 1 root root 6.3K Nov 20 09:42 EvpnrtrEncap-3900
8.0K -rw-rw-rw- 1 root root 6.3K Nov 20 09:42 EvpnrtrEncap-3902
8.0K -rw-rw-rw- 1 root root 6.3K Nov 20 09:42 EvpnrtrEncap-3904
[admin@LEAF-2 agents]$ date
Wed Nov 20 09:43:36 UTC 2019

Can I open a TAC case even if I don't have a valid support license? I was using vEOS to evaluate EOS before deciding whether to buy the hardware...

thanks,
stefano

Posted by Alexis Dacquay
Answered on November 20, 2019 11:28 am

Stefano,

I didn't spot anything wrong with your configurations, although I did not test them.

On the leaves, you may want to deactivate the EVPN peers in the IPv4 address family, where they are activated by default:

address-family ipv4
   no neighbor evpn activate   <=== new

It doesn't cause any problem; it is just good practice to keep your "show" command outputs clear.

Posted by Stefano Sasso
Answered on November 20, 2019 3:25 pm

Quick update:
I opened a TAC case, and they confirmed it is a bug present only in 4.23.
With 4.22 everything is working fine.

thanks guys!

Posted by Tom McCormack
Answered on November 29, 2019 3:37 am

Thank you, Stefano!!!

I've been pulling my hair out for the last few days trying to figure out the EXACT same issue.

As you outlined, I've just downgraded to 4.22.2.1F and it worked immediately.

Rgds
Tom
