Basic BGP Troubleshooting

 
 

Objective

This document outlines common BGP issues and the commands used to troubleshoot them.

I. Neighborship

Unlike most interior routing protocols, which discover neighbors using multicast, BGP exchanges unicast messages over a TCP session. For this reason, please make sure the neighbor’s IP address is reachable.

For issues with BGP neighborship, check the neighborship state in the output of ‘show ip bgp summary vrf all’.

R1#show ip bgp summary vrf all
BGP summary information for VRF default
Router identifier 1.1.1.1, local AS number 100
Neighbor Status Codes: m - Under maintenance
Neighbor   V AS MsgRcvd MsgSent InQ OutQ   Up/Down State PfxRcd PfxAcc
12.12.12.2 4 200 16        15    0   0     00:01:05 Estab  3      3
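
When many neighbors are configured, the summary output can be scanned for peers that are not in the Estab state. The following is a minimal sketch, assuming the row layout matches the sample output in this document; the function name and the column index are illustrative and not part of any Arista tooling.

```python
import ipaddress

# Position of the "State" column in a neighbor row (assumption: rows look
# like the sample output in this document, with no extra columns).
STATE_COLUMN = 8

def non_established_neighbors(summary_text):
    """Return (neighbor, state) pairs for peers whose state is not Estab."""
    problems = []
    for line in summary_text.splitlines():
        fields = line.split()
        if len(fields) <= STATE_COLUMN:
            continue
        try:
            ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # banner or header line, not a neighbor row
        state = fields[STATE_COLUMN]
        if not state.startswith("Estab"):
            problems.append((fields[0], state))
    return problems

SAMPLE = """\
Neighbor   V AS MsgRcvd MsgSent InQ OutQ   Up/Down State PfxRcd PfxAcc
12.12.12.2 4 200 16        15    0   0     00:01:05 Estab  3      3
2.2.2.2    4 200 0         0     0   0     00:00:51 Idle(NoIf)
"""

print(non_established_neighbors(SAMPLE))
```

Any neighbor this returns can then be checked against the states described below.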

The final state of a BGP neighbor is expected to be Established. Keep the following pointers in mind if the BGP peering is stuck in an intermediate state or is flapping:

a. Idle(NoIf)

This means that there is no interface to initiate the TCP session.

R1#show ip bgp summary vrf all
BGP summary information for VRF default
Router identifier 1.1.1.1, local AS number 100
Neighbor Status Codes: m - Under maintenance
Neighbor V AS MsgRcvd MsgSent InQ OutQ Up/Down State PfxRcd PfxAcc
2.2.2.2  4 200 0        0      0    0  00:00:51 Idle(NoIf)

This is usually seen in the following scenarios:

  • The physical interface is down.
  • An eBGP neighborship is being formed between loopback IPs, or between routers that are multiple hops apart, and ‘ebgp-multihop’ is not configured for the neighbor. This command disables the ‘directly connected’ check for the eBGP neighbor and raises the TTL of eBGP packets (default is 1).

Syntax: Under ‘router bgp’, neighbor <ip-address> ebgp-multihop

(config-router-bgp)#neighbor 2.2.2.2 ebgp-multihop

b. Idle(MaxPath)

If the number of routes received from the peer exceeds the ‘maximum-routes’ limit configured for that neighbor, the peer will be stuck in this state.

R1#show ip bgp summary vrf all
BGP summary information for VRF default
Router identifier 1.1.1.1, local AS number 100
Neighbor Status Codes: m - Under maintenance
Neighbor   V AS MsgRcvd MsgSent InQ OutQ Up/Down  State  PfxRcd PfxAcc
12.12.12.2 4 200 8        10     0   0   00:00:08 Idle(MaxPath)

You can check the number of routes that will be accepted from a neighbor in the BGP configuration:

router bgp 100
neighbor 12.12.12.2 remote-as 200
neighbor 12.12.12.2 maximum-routes 300

Please raise the ‘maximum-routes’ value configured for the neighbor, and then either set the idle-restart-timer for the neighbor or reset the BGP connection with ‘clear ip bgp’.

neighbor 12.12.12.2 idle-restart-timer <60-4294967295> 
Time (in seconds) before restarting, after going to idle state

Note: ‘clear ip bgp *’ will reset all the IPv4 and IPv6 peering sessions.

c. The neighborship state is flapping between Connect and Active:

In this case, the issue is generally due to the TCP 3-way handshake not completing successfully.

BGP operates over TCP port 179. Ensure that the port is open on the BGP peers and that no access-lists are blocking the BGP peer’s IP address or port number.

The following commands can be used to verify these conditions:

  • Telnet: Telnet to the neighbor’s IP address on TCP port 179.
Syntax: telnet <neighbor-ip> 179

R1#telnet 12.12.12.2 179
Trying 12.12.12.2...
Connected to 12.12.12.2.
Escape character is 'off'.

  • netstat: This Linux command shows whether the port is open on the peers.
Syntax: From bash, issue ‘netstat -ant’ and grep for port 179.

[admin@R1 ~]$ netstat -ant | grep 179
tcp  0 0 0.0.0.0:179      0.0.0.0:*      LISTEN
tcp  0 0 12.12.12.1:51265 12.12.12.2:179 ESTABLISHED
tcp6 0 0 :::179           :::*           LISTEN
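
The telnet test above can also be scripted from any host that has Python available. The following is a minimal sketch using only the standard library; the function name is illustrative. Note that a BGP speaker may refuse or reset connections from addresses that are not configured as peers, so a failure from a third host does not by itself prove that the port is blocked.

```python
import socket

def tcp_port_open(host, port=179, timeout=3.0):
    """Return True if a TCP three-way handshake to host:port completes."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False

if __name__ == "__main__":
    # 12.12.12.2 is the neighbor address used in this document's examples.
    print(tcp_port_open("12.12.12.2"))
```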

d. Stuck in Active state

This is usually seen due to BGP misconfiguration:

  •  Mismatch in the BGP neighbor’s AS number in the configuration.

The reason can be seen in the ‘show ip bgp neighbors’ output.

R1#show ip bgp neighbors 12.12.12.2
BGP neighbor is 12.12.12.2, remote AS 100, internal link
BGP version 4, remote router ID 0.0.0.0, VRF default
.
.
BGP state is Active
Peering failure hint: Open Message Error/bad AS number
Last sent notification:Open Message Error/bad AS number, Last time 00:00:04, First time 00:00:18, Repeats 4
.
<..>

Please configure the right ‘remote-as’ number for the neighbor.

neighbor 12.12.12.2 remote-as 200

  •  Duplicate router IDs for iBGP neighborship.

R1#show ip bgp neighbors 12.12.12.2
BGP neighbor is 12.12.12.2, remote AS 100, internal link
BGP version 4, remote router ID 0.0.0.0, VRF default
.
.
BGP state is Active
Peering failure hint: Open Message Error/bad BGP ID
.
.
Last sent notification:Open Message Error/bad BGP ID, Last time 00:00:01, First time 00:00:25, Repeats 6
<..>

This message means that both peering routers have the same ‘router-id’ assigned to them. For any iBGP peering, if both routers have the same router-id, the connection is terminated with a BAD BGP ID notification. To overcome this, please ensure that the ‘router-id’ configured on each device is unique.

Note: This router-id check has been relaxed from 4.21.3F onwards and the neighborship will be established even with the same router-id.
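
The two failure hints shown above can be pulled out of the ‘show ip bgp neighbors’ output programmatically. The following is a minimal sketch; the function name and the suggestion text are illustrative, and only the two hints covered in this section are mapped.

```python
import re

HINT_RE = re.compile(r"^Peering failure hint:\s*(.+)$", re.MULTILINE)

# Only the hints discussed in this section; other hints fall through.
SUGGESTIONS = {
    "bad AS number": "check the 'remote-as' configured for the neighbor",
    "bad BGP ID": "check for duplicate router IDs on the two peers",
}

def explain_peering_failure(neighbor_output):
    """Return (hint, suggestion) from the output, or None if no hint is present."""
    match = HINT_RE.search(neighbor_output)
    if match is None:
        return None
    hint = match.group(1).strip()
    for key, advice in SUGGESTIONS.items():
        if key in hint:
            return hint, advice
    return hint, "no suggestion in this sketch"

SAMPLE = """\
BGP neighbor is 12.12.12.2, remote AS 100, internal link
BGP state is Active
Peering failure hint: Open Message Error/bad AS number
"""

print(explain_peering_failure(SAMPLE))
```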

 

tcpdump is a useful utility for debugging such cases, as it shows the packets originated from or destined to the CPU. A packet capture can be collected using the following commands:

From CLI:

#tcpdump interface ethernet 1/1 verbose filter host 12.12.12.2 and port 179

tcpdump: listening on et1_1, link-type EN10MB (Ethernet), capture size 262144 bytes
06:56:38.285165 44:4c:a8:c6:5c:03 > 28:99:3a:da:b1:1a, ethertype IPv4 (0x0800), length 74: (tos 0xc0, ttl 64, id 53825, offset 0, flags [DF], proto TCP (6), length 60)
12.12.12.2.50114 > 12.12.12.1.bgp: Flags [S], seq 3032203684, win 29200, options [mss 1460,sackOK,TS val 429134821 ecr 0,nop,wscale 7], length 0
06:56:38.285208 28:99:3a:da:b1:1a > 44:4c:a8:c6:5c:03, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 255, id 0, offset 0, flags [DF], proto TCP (6), length 60)
12.12.12.1.bgp > 12.12.12.2.50114: Flags [S.], seq 2467440916, ack 3032203685, win 28960, options [mss 1460,sackOK,TS val 143763724 ecr 429134821,nop,wscale 7], length 0
06:56:38.285329 44:4c:a8:c6:5c:03 > 28:99:3a:da:b1:1a, ethertype IPv4 (0x0800), length 66: (tos 0xc0, ttl 64, id 53826, offset 0, flags [DF], proto TCP (6), length 52)
12.12.12.2.50114 > 12.12.12.1.bgp: Flags [.], seq 1, ack 1, win 229, options [nop,nop,TS val 429134821 ecr 143763724], length 0
06:56:38.286383 44:4c:a8:c6:5c:03 > 28:99:3a:da:b1:1a, ethertype IPv4 (0x0800), length 121: (tos 0xc0, ttl 64, id 53827, offset 0, flags [DF], proto TCP (6), length 107)
12.12.12.2.50114 > 12.12.12.1.bgp: Flags [P.], seq 1:56, ack 1, win 229, options [nop,nop,TS val 429134822 ecr 143763724], length 55: BGP
Open Message (1), length: 55
Version 4, my AS 100, Holdtime 180s, ID 2.2.2.2
Optional parameters, length: 26
Option Capabilities Advertisement (2), length: 24
Graceful Restart (64), length: 2
Restart Flags: [R], Restart Time 300s
0x0000: 812c
Route Refresh (2), length: 0
Multiprotocol Extensions (1), length: 4
AFI IPv4 (1), SAFI Unicast (1)
0x0000: 0001 0001
32-Bit AS Number (65), length: 4
4 Byte AS 100
0x0000: 0000 0064
Multiple Paths (69), length: 4
AFI IPv4 (1), SAFI Unicast (1), Send/Receive: Receive
0x0000: 0001 0101

From bash:
$ tcpdump -nevvi et1 host 12.12.12.2 and port 179

Note: If the neighborship is configured in a non-default VRF, please make sure the namespace is changed appropriately while using ‘tcpdump’ from bash. For instance,

R1#show vrf
Maximum number of vrfs allowed: 1023
Vrf RD Protocols State Interfaces
------------- --------- --------------- ------------------- -----------

vrf-red 2:2 ipv4,ipv6 v4:routing, Ethernet1/1
                      v6:no routing
[admin@R1 ~]$ sudo ip netns exec ns-vrf-red tcpdump -nevvi et1_1 host 12.12.12.2 and port 179
tcpdump: listening on et1_1, link-type EN10MB (Ethernet), capture size 262144 bytes
03:26:03.937766 44:4c:a8:c6:5c:03 > 28:99:3a:da:b1:1a, ethertype IPv4 (0x0800), length 87: (tos 0xc0, ttl 1, id 60471, offset 0, flags [DF], proto TCP (6), length 73)
12.12.12.2.bgp > 12.12.12.1.38498: Flags [P.], seq 3487058549:3487058570, ack 2484704575, win 227, options [nop,nop,TS val 469176234 ecr 183802914], length 21: BGP

 

II. Route Advertisement/Reception

The default behavior of BGP is to advertise the routes which are installed in the routing table.

To advertise a route in BGP, use the ‘network’ command.

Syntax: Under ‘router bgp’, network x.x.x.x/y. Please make sure the route x.x.x.x/y is present in the routing table.

a. Route reception issue

If routes advertised from a BGP neighbor are not seen in the output of ‘show ip bgp’ on the receiving peer:

  • Confirm that the route is actually being advertised by the advertising peer with ‘show ip bgp neighbors <ip> advertised-routes’.
  • Check whether an inbound route-map is configured for the neighbor. If so, check whether any of its clauses block the route from entering the BGP table, and if they do, please configure a ‘permit’ rule for the route.

You can check the routes received from a neighbor using ‘show ip bgp neighbors <ip> received-routes’:

R1#show ip bgp neighbors 12.12.12.2 received-routes
BGP routing table information for VRF default
Router identifier 1.1.1.1, local AS number 100
Route status codes: s - suppressed, * - valid, > - active, # - not installed, E - ECMP head, e - ECMP
S - Stale, c - Contributing to ECMP, b - backup, L - labeled-unicast
Origin codes: i - IGP, e - EGP, ? - incomplete
AS Path Attributes: Or-ID - Originator ID, C-LST - Cluster List, LL Nexthop - Link Local Nexthop

     Network       Next Hop   Metric  LocPref Weight Path
* > 3.3.3.0/24     12.12.12.2  -       -      -      200 ?
* > 77.77.77.77/32 12.12.12.2  -       -      -      200 ?

b. Route advertisement issue

For advertisement issues, check whether any clauses in the outbound route-maps for the neighbor block the route from being advertised. If so, please configure a ‘permit’ rule for the route.

You can check the routes advertised to a neighbor using ‘show ip bgp neighbors <ip> advertised-routes’:

R2#show ip bgp neighbors 12.12.12.1 advertised-routes
BGP routing table information for VRF default
Router identifier 2.2.2.2, local AS number 200
Route status codes: s - suppressed, * - valid, > - active, # - not installed, E - ECMP head, e - ECMP
S - Stale, c - Contributing to ECMP, b - backup, L - labeled-unicast, q - Queued for advertisement
% - Pending BGP convergence
Origin codes: i - IGP, e - EGP, ? - incomplete
AS Path Attributes: Or-ID - Originator ID, C-LST - Cluster List, LL Nexthop - Link Local Nexthop

    Network       Next Hop    Metric LocPref Weight Path
* > 77.77.77.7/32 12.12.12.2    -       -      0    200 ?
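
To narrow down where a route is lost, the ‘advertised-routes’ output from the sender can be compared with the ‘received-routes’ output on the receiver. The following is a minimal sketch, assuming IPv4 prefixes and route rows that begin with ‘*’ as in the outputs above; the function names are illustrative.

```python
import re

PREFIX_RE = re.compile(r"(\d{1,3}(?:\.\d{1,3}){3}/\d{1,2})")

def prefixes_in_table(show_output):
    """Collect the prefixes from route rows (rows that start with '*')."""
    found = set()
    for line in show_output.splitlines():
        if not line.lstrip().startswith("*"):
            continue  # legend or header line
        match = PREFIX_RE.search(line)
        if match:
            found.add(match.group(1))
    return found

def missing_on_receiver(advertised_output, received_output):
    """Prefixes the sender advertises that the receiver does not list."""
    return sorted(prefixes_in_table(advertised_output)
                  - prefixes_in_table(received_output))
```

A prefix reported as missing points to an issue between the two peers (for example the AS path loop check described below) rather than to the sender’s outbound policy.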

c. AS path loop

BGP’s loop prevention mechanism prevents a prefix from being inserted into the BGP table if the receiving peer detects its own AS in the AS path for that prefix. In such cases, by default, the routes will not be seen even in the output of ‘show ip bgp neighbors <ip> received-routes’.

However, the number of prefixes discarded due to this behavior can be seen in the output of ‘show ip bgp neighbors’:

R1#show ip bgp neighbors 12.12.12.2
BGP neighbor is 12.12.12.2, remote AS 200, external link
BGP version 4, remote router ID 2.2.2.2, VRF default
Negotiated BGP version 4
.
.
Inbound updates dropped by reason:
AS path loop detection: 1
Enforced First AS: 0
.
<..>

These drops can be avoided by:

  • Removing the receiving router’s AS in the AS path for the prefix.
  • Configuring ‘allowas-in’ for the neighbor. This relaxes the loop detection rule in BGP.

Note: Please consider the consequences before enabling this command, as it may cause routing loops.
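
The loop check described above can be modelled in a few lines. The following is a simplified sketch, not the EOS implementation; the ‘allowed_occurrences’ parameter is a hypothetical stand-in for a relaxed loop check and its exact semantics on a given platform are an assumption.

```python
def passes_loop_check(as_path, local_as, allowed_occurrences=0):
    """Return True if an update with this AS path would be accepted.

    With the default of 0, any occurrence of local_as in the path causes
    the update to be dropped. A higher value tolerates that many
    occurrences (assumed behaviour of a relaxed check).
    """
    return list(as_path).count(local_as) <= allowed_occurrences

# R1 (AS 100) receives a prefix whose path already contains AS 100.
print(passes_loop_check([200, 100, 300], local_as=100))
print(passes_loop_check([200, 100, 300], local_as=100, allowed_occurrences=1))
```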

III. Route Installation

Once it is confirmed that the NLRI (prefix) in question is correctly advertised by the peer(s) and received by the other peer, the next step is to confirm whether the NLRI is installed as a route in the routing table (output of “show ip route”).

Case 1: The prefix received from one peer is preferred over the same prefix received from another peer.

The “show ip bgp” command lists all the prefixes received/advertised that are part of the BGP routing table. For a specific prefix, we can check the path parameters (next hop, metric, origin, etc.); the entry marked as best is used to install the BGP route in the routing table.

R2#show ip bgp 5.5.5.5/32
BGP routing table information for VRF default
Router identifier 2.2.2.2, local AS number 200
BGP routing table entry for 5.5.5.5/32
Paths: 2 available
100
12.12.12.1 from 12.12.12.1 (1.1.1.1)
Origin IGP, metric 0, localpref 100, IGP metric 1, weight 0, received 00:14:09 ago, valid, external, not installed (better AD route present)
Rx SAFI: Unicast
300
23.23.23.3 from 23.23.23.3 (3.3.3.3)
Origin IGP, metric 0, localpref 100, IGP metric 1, weight 0, received 00:14:04 ago, valid, external
Rx SAFI: Unicast

A more detailed command is “show ip bgp x.x.x.x/y detail”. Its output lists the reason why the other paths were not preferred (based on the BGP Best Path Selection Algorithm).

R2#show ip bgp 5.5.5.5/32 detail
BGP routing table information for VRF default
Router identifier 2.2.2.2, local AS number 200
Route status: [a.b.c.d] - Route is queued for advertisement to peer.
BGP routing table entry for 5.5.5.5/32
Paths: 2 available
100
12.12.12.1 from 12.12.12.1 (1.1.1.1)
Origin IGP, metric 0, localpref 100, IGP metric 1, weight 0, received 01:19:29 ago, valid, external, best
Rx SAFI: Unicast
300
23.23.23.3 from 23.23.23.3 (3.3.3.3)
Origin IGP, metric 0, localpref 100, IGP metric 1, weight 0, received 01:19:24 ago, valid, external
Rx SAFI: Unicast
Not best: Router ID
Advertised to 1 peers:
23.23.23.3
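
The “Not best: Router ID” reason above is the result of a step-by-step comparison in which every earlier attribute was a tie. The following is a simplified sketch of such a comparison using the commonly documented ordering; it is not the exact EOS algorithm (for example, it compares MED across all paths and omits the eBGP-over-iBGP step), and the class and function names are illustrative.

```python
import ipaddress
from dataclasses import dataclass

ORIGIN_RANK = {"IGP": 0, "EGP": 1, "incomplete": 2}
STEP_NAMES = ("Weight", "Local preference", "AS path length",
              "Origin", "MED", "IGP metric", "Router ID")

@dataclass
class Path:
    next_hop: str
    router_id: str
    as_path: tuple
    weight: int = 0
    local_pref: int = 100
    origin: str = "IGP"
    med: int = 0
    igp_metric: int = 0

    def sort_key(self):
        # Lower tuple wins; attributes where higher is better are negated.
        return (-self.weight, -self.local_pref, len(self.as_path),
                ORIGIN_RANK[self.origin], self.med, self.igp_metric,
                int(ipaddress.ip_address(self.router_id)))

def best_path(paths):
    """Return the path that wins the simplified comparison."""
    return min(paths, key=Path.sort_key)

def not_best_reason(best, other):
    """Name the first step at which 'other' differs from 'best'."""
    for name, b, o in zip(STEP_NAMES, best.sort_key(), other.sort_key()):
        if b != o:
            return name
    return None

# The two paths for 5.5.5.5/32 from the output above.
p1 = Path("12.12.12.1", "1.1.1.1", (100,), igp_metric=1)
p2 = Path("23.23.23.3", "3.3.3.3", (300,), igp_metric=1)
print(best_path([p1, p2]).next_hop, not_best_reason(p1, p2))
```

With all other attributes equal, the path from the peer with the lower router ID (1.1.1.1) wins in this model, which is consistent with the output above.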

Case 2: Route for the prefix is installed from a routing protocol other than BGP

When contending routes for a network are present from multiple routing protocols, the route with the better (lower) administrative distance (AD) is installed in the routing table. For instance, the route learnt from OSPF is preferred over the one learnt from BGP. In such a scenario, the routing table will show:

R2#show ip route 5.5.5.5/32

VRF: default
Codes: C - connected, S - static, K - kernel,
O - OSPF, IA - OSPF inter area, E1 - OSPF external type 1,
E2 - OSPF external type 2, N1 - OSPF NSSA external type 1,
N2 - OSPF NSSA external type2, B I - iBGP, B E - eBGP,
R - RIP, I L1 - IS-IS level 1, I L2 - IS-IS level 2,
O3 - OSPFv3, A B - BGP Aggregate, A O - OSPF Summary,
NG - Nexthop Group Static Route, V - VXLAN Control Service,
DH - DHCP client installed default route, M - Martian

O E2 5.5.5.5/32 [110/1] via 45.45.45.1, Ethernet38

Since none of the BGP table entries for the prefix are installed, we see “#” for the entry:

R2(config)#show ip bgp
BGP routing table information for VRF default
Router identifier 2.2.2.2, local AS number 200
Route status codes: s - suppressed, * - valid, > - active, # - not installed, E - ECMP head, e - ECMP
S - Stale, c - Contributing to ECMP, b - backup, L - labeled-unicast
Origin codes: i - IGP, e - EGP, ? - incomplete
AS Path Attributes: Or-ID - Originator ID, C-LST - Cluster List, LL Nexthop - Link Local Nexthop

    Network    Next Hop     Metric LocPref Weight Path
* # 5.5.5.5/32 12.12.12.1     0      100      0   100 i
*   5.5.5.5/32 23.23.23.3     0      100      0   300 i
* > 6.6.6.6/32     -          0       0       -       i

Since the route is not installed from BGP, the detail output shows the reason “better AD route present”:

R2#show ip bgp 5.5.5.5/32 detail
BGP routing table information for VRF default
Router identifier 2.2.2.2, local AS number 200
Route status: [a.b.c.d] - Route is queued for advertisement to peer.
BGP routing table entry for 5.5.5.5/32
Paths: 2 available
100
12.12.12.1 from 12.12.12.1 (1.1.1.1)
Origin IGP, metric 0, localpref 100, IGP metric 1, weight 0, received 00:58:24 ago, valid, external, not installed (better AD route present)
Rx SAFI: Unicast
300
23.23.23.3 from 23.23.23.3 (3.3.3.3)
Origin IGP, metric 0, localpref 100, IGP metric 1, weight 0, received 00:58:19 ago, valid, external
Rx SAFI: Unicast
Not best: Router ID
Not advertised to any peer

Note: ‘advertise-inactive’ causes the best BGP route to be advertised to BGP neighbors even if a route with a better AD is present for the same prefix in the routing table.
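
The administrative distance comparison in Case 2 can be sketched as follows. The OSPF value of 110 is taken from the ‘show ip route’ output above; the other values in the table are assumed defaults and should be verified against the running configuration of the device. The function name is illustrative.

```python
# Assumed default administrative distances; only the OSPF value (110) is
# taken from this document's output. Verify the rest on your platform.
ASSUMED_AD = {"connected": 0, "static": 1, "ospf": 110, "ebgp": 200, "ibgp": 200}

def installed_source(candidates, ad_table=ASSUMED_AD):
    """Return the protocol whose route would be installed (lowest AD wins)."""
    return min(candidates, key=lambda proto: ad_table[proto])

# 5.5.5.5/32 is offered by both OSPF and eBGP in Case 2.
print(installed_source(["ebgp", "ospf"]))
```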

Case 3: No route to the next hop

If the next hop for a prefix in the BGP routing table is not reachable, the route will not be installed in the routing table. Such a prefix is marked invalid, as shown below:

R3#show ip bgp 7.7.7.7/32 detail
BGP routing table information for VRF default
Router identifier 1.1.1.1, local AS number 100
BGP routing table entry for 7.7.7.7/32
Paths: 1 available
200 300
23.23.23.3 from 12.12.12.2 (2.2.2.2)
Origin IGP, metric -, localpref 100, weight 0, invalid, external

To confirm, check for a route to the next hop in the routing table.
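
The next-hop check amounts to a longest-prefix-match lookup of the next-hop address in the routing table. The following is a minimal sketch using the Python standard library; the routing table contents are hypothetical, and real next-hop resolution may apply further rules (for example, recursion or exclusion of certain route types) that are not modelled here.

```python
import ipaddress

def resolve_next_hop(next_hop, routing_table):
    """Return the longest matching prefix for next_hop, or None if unresolved."""
    addr = ipaddress.ip_address(next_hop)
    networks = (ipaddress.ip_network(prefix) for prefix in routing_table)
    matches = [net for net in networks if addr in net]
    if not matches:
        return None
    return str(max(matches, key=lambda net: net.prefixlen))

# Hypothetical routing table that has no route covering 23.23.23.3.
table = ["12.12.12.0/24", "1.1.1.1/32"]
print(resolve_next_hop("23.23.23.3", table))
print(resolve_next_hop("12.12.12.2", table))
```

A result of None for the BGP next hop corresponds to the ‘invalid’ path shown above.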

IV. Log collection

For issues related to BGP, please collect the following data:

  • The output of the following commands:
# show tech-support ribd vrf all | no-more
# show tech-support bgp | no-more         // If multi-agent is configured.
# show agent logs | no-more
# show agent qtrace | no-more
  • Packet captures

Please note that Arista supports two models in which BGP is implemented: Ribd and ArBGP (multi-agent). To check whether ArBGP is in use, look for “service routing protocols model multi-agent” in the running-config.

If the issues aren’t resolved even after performing the above checks, please engage Arista TAC by sending an email to support@arista.com.
