Posted on August 8, 2016 5:40 am
 |  Asked by Christopher Woodfield

I’m running vEOS (4.17.0F) under a KVM hypervisor alongside an instance of Cumulus VX 3.0.1. I’m attempting to bring up a simple BGP session between the two devices across a virbr virtual network, but AFAICT the vEOS instance is failing to ACK any SYN packets to TCP/179 coming from the Cumulus instance. Every 30 seconds the vEOS instance also sends its own SYN to TCP/179 on the Cumulus instance, which responds with a SYN-ACK…which vEOS doesn’t acknowledge either.

I’m not seeing any oddities WRT the BGP configs or basic reachability; the fact that I do see SYNs flowing in both directions gives me confidence the underlying connectivity is fine.
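One crude way to narrow down this kind of symptom from a third host is a plain TCP connect to port 179, to distinguish "SYN silently dropped" from "RST returned." This is only a rough check (BGP daemons typically refuse or immediately close connections from unconfigured peers, which still proves the stack ACKs SYNs); a minimal sketch, with the host address as a placeholder:

```python
import socket

def probe_tcp(host: str, port: int = 179, timeout: float = 3.0) -> str:
    """Attempt a TCP connect and classify the outcome.

    'open'      -> full handshake completed (the peer ACKed our SYN)
    'refused'   -> RST came back (stack is responding, port just closed to us)
    'filtered'  -> no answer at all (SYN dropped, as in the symptom above)
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect((host, port))
        return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "filtered"
    except OSError:
        return "unreachable"
    finally:
        s.close()

# e.g. probe_tcp("10.10.24.0")  # address from the configs below
```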

The vEOS BGP config looks like so (10.10.0.0 is another vEOS, 10.10.24.0 is the Cumulus device):

router bgp 4200000101
   maximum-paths 128
   neighbor CORE-V4 peer-group
   neighbor CORE-V4 send-community
   neighbor CORE-V4 maximum-routes 12000
   neighbor 10.10.0.0 peer-group CSFS-V4
   neighbor 10.10.0.0 remote-as 4200000000
   no neighbor 10.10.0.0 shutdown
   neighbor 10.10.24.0 peer-group CORE-V4
   neighbor 10.10.24.0 remote-as 4200000003
   no neighbor 10.10.24.0 shutdown

The quagga config on the Cumulus device looks like so:

router bgp 4200000003
bgp router-id 10.0.0.3
neighbor CORE-V4 peer-group
neighbor CORE-V4 capability dynamic
neighbor 10.10.24.1 remote-as 4200000101
neighbor 10.10.24.1 peer-group CORE-V4
!
address-family ipv4 unicast
neighbor CORE-V4 activate
exit-address-family

A PCAP of the resulting BGP SYNs (and occasional SYN-ACKs) is attached. Has anyone run into anything similar?

Posted by Vikram
Answered on August 8, 2016 5:52 am

Hi Christopher,

Could you please include a show tech from the vEOS side so we can look into this further? In addition, could you paste the output of "show lldp neighbors" directly into the post? Just as an FYI, I tried this out quickly with both VMs (vEOS and Cumulus VX 3.0.1, on ESXi) and it works fine for me. Here’s the configuration and CLI command output from both sides for your reference. Thx

Cumulus VX 3.0.1

cumulus# sh run zebra 
hostname zebra
log file /var/log/quagga/zebra.log
!
!
interface eth0
 link-detect
 ipv6 nd suppress-ra
!
interface lo
 link-detect
!
interface swp1
 link-detect
 ipv6 nd suppress-ra
!
interface swp2
 link-detect
!
interface swp3
 link-detect
!
ip forwarding
ipv6 forwarding
!
!
line vty
!
cumulus# sh run bgpd 
hostname bgpd
log file /var/log/quagga/bgpd.log
log timestamp precision 6
!
!
router bgp 4200000003
 bgp router-id 10.0.0.3
 neighbor CORE-V4 peer-group
 neighbor CORE-V4 capability dynamic
 neighbor 10.2.1.0 remote-as 4200000101
 neighbor 10.2.1.0 peer-group CORE-V4
 !
 address-family ipv4 unicast
  neighbor CORE-V4 activate
 exit-address-family
!
line vty
!
cumulus# 
cumulus# sh int swp1
Interface swp1 is up, line protocol is up
  Link ups:       0    last: (never)
  Link downs:     0    last: (never)
  PTM status: disabled
  vrf: Default-IP-Routing-Table
  index 3 metric 0 mtu 1500 
  flags: <UP,BROADCAST,RUNNING,MULTICAST>
  HWaddr: 00:0c:29:5c:d1:c5
  inet 10.2.1.1/31
  inet6 fe80::20c:29ff:fe5c:d1c5/64
cumulus# 
cumulus# show ip bgp summ
BGP router identifier 10.0.0.3, local AS number 4200000003 vrf-id 0
BGP table version 1
RIB entries 1, using 120 bytes of memory
Peers 1, using 16 KiB of memory
Peer groups 1, using 56 bytes of memory

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.2.1.0        4 4200000101     526     527        0    0    0 00:03:47        1

Total number of neighbors 1
cumulus# sh ip bgp
BGP table version is 1, local router ID is 10.0.0.3
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
              i internal, r RIB-failure, S Stale, R Removed
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 10.0.0.1/32      10.2.1.0                               0 4200000101 i

Total number of prefixes 1
cumulus# 

vEOS 4.17.0F

Leaf3.01:48:21#show ver | in Arista|image
Arista vEOS
Software image version: 4.17.0F
Leaf3.01:48:23#sh run int eth2
interface Ethernet2
   no switchport
   ip address 10.2.1.0/31
Leaf3.01:48:25#sh run | b router bgp
router bgp 4200000101
   neighbor CORE-V4 peer-group
   neighbor CORE-V4 send-community
   neighbor CORE-V4 maximum-routes 12000 
   neighbor 10.2.1.1 peer-group CORE-V4
   neighbor 10.2.1.1 remote-as 4200000003
   no neighbor 10.2.1.1 shutdown
   network 10.0.0.1/32
!
Leaf3.01:48:29#sh ip bgp summ
BGP summary information for VRF default
Router identifier 192.168.10.9, local AS number 4200000101
Neighbor Status Codes: m – Under maintenance
  Neighbor         V  AS           MsgRcvd   MsgSent  InQ OutQ  Up/Down State  PfxRcd PfxAcc
  10.2.1.1         4  4200000003       479       488    0    0 00:12:36 Estab  0      0
Leaf3.01:48:35#
Leaf3.01:48:38#sh ip bgp ne
BGP neighbor is 10.2.1.1, remote AS 4200000003, external link
  BGP version 4, remote router ID 10.0.0.3, VRF default
  Inherits configuration from and member of peer-group CORE-V4
  Negotiated BGP version 4
  Last read 00:00:02, last write 00:00:02
  Hold time is 9, keepalive interval is 3 seconds
  Configured hold time is 180, keepalive interval is 60 seconds
  Connect timer is inactive
  Idle-restart timer is inactive
  BGP state is Established, up for 00:12:41
  Number of transitions to established: 3
  Last state was OpenConfirm
  Last event was RecvKeepAlive
  Last sent notification:Cease/administrative reset, Last time 00:12:43
  Last rcvd notification:Cease/administrative shutdown, Last time 00:17:37
  Last rcvd socket-error:received unexpected EOF, Last time 00:17:13, First time 00:17:33, Repeats 5
  Neighbor Capabilities:
    Multiprotocol IPv4 Unicast: advertised and received and negotiated
    Four Octet ASN: advertised and received
    Route Refresh: advertised and received and negotiated
    Send End-of-RIB messages: advertised and received and negotiated
    Dynamic Capabilities: received
    Additional-paths Receive:
      IPv4 Unicast: advertised and received
  Restart timer is inactive
  End of rib timer is inactive
  Message statistics:
    InQ depth is 0
    OutQ depth is 0
                         Sent      Rcvd
    Opens:                  9         3
    Notifications:          1         1
    Updates:                4         4
    Keepalives:           476       473
    Route-Refresh:          0         0
    Total messages:       490       481
  Prefix statistics:
                         Sent      Rcvd
    IPv4 Unicast:           1         0
    IPv6 Unicast:           0         0
  Inbound updates dropped by reason:
    AS path loop detection: 1
    Enforced First AS: 0
    Malformed MPBGP routes: 0
    Originator ID matches local router ID: 0
    Nexthop matches local IP address: 1
    Unexpected IPv6 nexthop for IPv4 routes: 0
    Nexthop invalid for single hop eBGP: 0
  Inbound paths dropped by reason:
    IPv4 labeled-unicast NLRIs dropped due to excessive labels: 0
  Outbound paths dropped by reason:
    IPv4 local address not available: 0
    IPv6 local address not available: 0
Local AS is 4200000101, local router ID 192.168.10.9
TTL is 1
Local TCP address is 10.2.1.0, local port is 179
Remote TCP address is 10.2.1.1, remote port is 44612
Auto-Local-Addr is disabled
TCP Socket Information:
  TCP state is ESTABLISHED
  Recv-Q: 0/32768
  Send-Q: 0/32768
  Outgoing Maximum Segment Size (MSS): 1448
  Total Number of TCP retransmissions: 0
  Options:
    Timestamps enabled: yes
    Selective Acknowledgments enabled: yes
    Window Scale enabled: yes
    Explicit Congestion Notification (ECN) enabled: no
  Socket Statistics:
    Window Scale (wscale): 8,7
    Retransmission Timeout (rto): 216.0ms
    Round-trip Time (rtt/rtvar): 17.5ms/3.0ms
    Delayed Ack Timeout (ato): 40.0ms
    Congestion Window (cwnd): 10
    TCP Throughput: 6.62 Mbps
    Advertised Recv Window (rcv_space): 14480
 
Leaf3.01:48:42#
Answered on August 9, 2016 3:54 am

Hi,

The output of show tech is attached. LLDP is below – Arista side first:

closlab-cs1#show lldp neighbors
Last table change time   : 0:00:54 ago
Number of table inserts  : 1
Number of table deletes  : 0
Number of table drops    : 0
Number of table age-outs : 0
Port       Neighbor Device ID               Neighbor Port ID           TTL
Et4        cumulus                          swp1                       120


lab@cumulus:~$ sudo lldpcli show neighbors
-------------------------------------------------------------------------------
LLDP neighbors:
-------------------------------------------------------------------------------
Interface:    swp1, via: LLDP, RID: 1, Time: 0 day, 00:03:56
  Chassis:     
    ChassisID:    mac 52:54:00:82:87:37
    SysName:      closlab-cs1
    SysDescr:     Arista Networks EOS version 4.17.0F running on an Arista Networks vEOS
    MgmtIP:       10.0.0.10
    Capability:   Bridge, on
    Capability:   Router, on
  Port:        
    PortID:       ifname Ethernet4
    PortDescr:    Ethernet1:FS4
-------------------------------------------------------------------------------
lab@cumulus:~$

Posted by Vikram
Answered on August 19, 2016 6:19 am

Hi, sorry for the delayed response. Everything looks OK from a config perspective. The only thing that seemed a bit off is that the show tech doesn’t list any LLDP neighbors at all, while the output in your post above shows only the Cumulus device. Is there a particular reason no other devices appear there? I ask because, per your initial post, there are other virtual switches in the environment: your show tech indicates an established BGP peering with 10.0.0.0, which is another vEOS device, so I would expect to see it in the LLDP output as well.

I was hoping to verify the underlying connectivity between your virtual devices on the hypervisor via LLDP, hence the LLDP-related queries. Based on the information provided the underlying connectivity seems fine; I was just curious about the anomalies.

Could you please try 2-byte ASNs between the two virtual switches and verify whether that works? Alternatively, if you have existing Arista switches in your environment, please feel free to open a TAC case so we can look into this further. Thx
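For reference, a 2-byte-ASN variant of the same peering might look like the following (65101 and 65003 are placeholder private ASNs, not values from the original configs):

```
! vEOS side
router bgp 65101
   neighbor 10.10.24.0 remote-as 65003
!
! Cumulus (quagga) side
router bgp 65003
 neighbor 10.10.24.1 remote-as 65101
```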

Answered on February 20, 2017 10:58 pm

Revisited this a while later, admittedly – and solved the issue. It turns out the TCP checksums were incorrect, which Wireshark (which I’d been using for the earlier troubleshooting) doesn’t validate by default. The invalid checksums, in turn, were due to the Cumulus VX image having TCP checksum offload enabled by default on its virtio interface. Disabling the offload there resulted in correct checksums and successful BGP peer establishment.

Mailing list thread on the subject that pointed me to the fix: https://www.redhat.com/archives/libvirt-users/2016-March/msg00035.html
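For the curious, the symptom is easy to see mechanically: with checksum offload enabled, the guest emits the packet before the (virtual) NIC fills in the final checksum, so any receiver that actually validates it rejects the segment. A minimal sketch of the RFC 1071 ones’-complement Internet checksum that TCP/IP uses (the sample bytes below are the well-known IPv4-header test vector, not from this capture):

```python
def inet_checksum(data: bytes) -> int:
    """RFC 1071 ones'-complement Internet checksum (as used by IP/TCP/UDP)."""
    if len(data) % 2:          # pad odd-length input with a zero byte
        data += b"\x00"
    total = sum(int.from_bytes(data[i:i + 2], "big")
                for i in range(0, len(data), 2))
    while total >> 16:         # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# Classic IPv4 header example with the checksum field (bytes 10-11) zeroed:
hdr_no_cksum = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
cksum = inet_checksum(hdr_no_cksum)    # 0xb861 for this header

# A receiver validates by summing the header *including* the checksum field;
# a correct packet folds to zero.
hdr_ok = hdr_no_cksum[:10] + cksum.to_bytes(2, "big") + hdr_no_cksum[12:]
assert inet_checksum(hdr_ok) == 0      # valid on the wire

# With offload, the field is still unfilled when the packet leaves the guest,
# so validation fails - exactly what the BGP peer was seeing.
assert inet_checksum(hdr_no_cksum) != 0
```

This is also why the capture looked clean in Wireshark: it skips checksum validation by default precisely because offload makes locally captured packets appear "broken."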
