High CPU on FHR or RP due to PIM


Introduction

High CPU due to PIM processes is not always a bug; it may be caused by either a misconfiguration or a routing issue.  For the purposes of this document, we will focus primarily on network misconfiguration.

Overview

Network-wide choppy video, broken music-on-hold streams, or loudspeaker issues are commonly caused by multicast problems in the network.  If the issue is network wide and not isolated to one area of the network, the next place to look is high CPU on the First Hop Routers (FHR) and/or the Rendezvous Point (RP).  When the CPU is high due to PIM processes, there are two common triggers.  One is routing inconsistencies, in which case you would see other network issues in conjunction.  The other is a misconfiguration of the network.

While in the majority of multicast cases you start from a single receiver and work toward the RP and Source, for network-wide problems you start from the Source and work toward the RP.  As a first step, confirm your IPs: Source, RP, and a test Receiver.  Once you have defined these IPs, you have narrowed the scope of your troubleshooting and are ready to start.

Background

The multicast registration process allows multicast streams to be advertised to the root of the multicast tree. This is explained in the following article.

https://www.arista.com/en/um-eos/eos-protocol-independent-multicast

The main responsibilities of the RP and FHR are summarized below.  Understanding these steps is important both to see why a broken registration process causes high CPU and to identify which step in the process is broken, and therefore what needs to be fixed.

  1. The source sends traffic to the FHR.
  2. The FHR encapsulates the multicast traffic in a unicast Register packet to the RP, dynamically forming a GRE tunnel known as the PIM tunnel.
  3. The RP receives the Register packet, decapsulates it, and forms (S,G) state with the FHR.
  4. The RP then generates an (S,G) join towards the FHR.
  5. The FHR receives the (S,G) join, adds the interface towards the RP to the Outgoing Interface List (OIL), and starts sending native multicast traffic towards the RP.
  6. Once the native multicast traffic arrives, the RP sends a Register-Stop to the FHR, tearing down the PIM tunnel.
  7. Once the FHR receives the Register-Stop, it sends periodic NULL Registers to the RP to maintain state.
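The state created by the steps above can be spot-checked from the CLI on both devices. A minimal sketch, using the source and group from this article (1.1.1.1 sending to 239.10.10.10); exact output varies by EOS release:

```shell
! On the FHR: confirm (S,G) state was created when the source started sending
FHR# show ip mroute 239.10.10.10

! On the RP: confirm the Register arrived and (S,G) state was formed
RP# show ip mroute 239.10.10.10
```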

When looking at a tcpdump of the PIM registration, you can see the source and destination IPs of the outer (tunnel) header.  Further down in the packet you can see the actual source IP and the actual multicast group.

12:18:57.561616 28:99:3a:26:d4:4f > 98:5d:82:c1:83:ff, ethertype IPv4 (0x0800), length 88: (tos 0x0, ttl 255, id 54936, offset 0, flags [DF], proto PIM (103), length 74)
    192.168.15.1 > 9.9.9.9: PIMv2, length 54  
        Register, cksum 0xdeff (correct), Flags [ none ]
        (tos 0x0, ttl 63, id 0, offset 0, flags [none], proto UDP (17), length 46)
    1.1.1.1.50001 > 239.10.10.10.50001: UDP, length 18
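A capture like the one above can be taken from the RP's bash shell. This is a sketch; the interface name et5 is an assumption and should be replaced with the interface facing the FHR:

```shell
# From EOS, drop to bash and capture PIM packets (IP protocol 103)
# on the interface facing the FHR. "et5" is a placeholder name.
bash tcpdump -nevvv -i et5 pim
```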

It is essential to understand that the registration process leverages software forwarding (CPU cycles) via a dynamic GRE tunnel. Forwarding these packets requires additional encapsulation by the FHR and decapsulation by the RP.
Traffic is not forwarded fully in hardware until the FHR receives the Register-Stop from the RP.  As the Register-Stop is a PIM packet, you must ensure that there is a PIM path between the RP and the FHR.  If there is not a PIM path, the stream will be stuck in Register and processed by the CPU rather than in hardware.
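The PIM path can be verified hop by hop from the RP back toward the FHR. A sketch of the checks to run on each device in the path; interface names and neighbor addresses will differ in your network:

```shell
! Confirm a PIM neighbor is present on the interface toward the next hop
SW# show ip pim neighbor

! Confirm PIM is actually enabled on the expected L3 interfaces
SW# show ip pim interface
```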

For more information regarding how to troubleshoot packets at the CPU, please refer to the following article:

https://eos.arista.com/troubleshooting-multicast-packets-to-cpu/

As you can see in the diagram above, PIM is only configured on three out of four interfaces between the FHR and the RP.  The “PIM-Transit” router does not have PIM configured on its uplink port to the RP.  Without a complete PIM path from the RP to the FHR, the RP cannot complete RPF (reverse path forwarding), cut over to the native path that forwards in hardware, or tell the FHR to tear down the PIM tunnel by sending a Register-Stop.
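In the scenario described above, the fix is to enable PIM sparse-mode on the “PIM-Transit” router's uplink to the RP. A sketch, assuming that uplink is Ethernet1 (the interface name is an assumption):

```shell
PIM-Transit(config)# interface Ethernet1
PIM-Transit(config-if-Et1)# pim ipv4 sparse-mode
```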

You can confirm that state has been formed between the FHR and the RP (the FHR notified the RP that there is a server, 1.1.1.1, for group 239.10.10.10) as seen below.  You can also see the incoming interface on which traffic from the source is received; an outgoing interface would also be listed if there were receivers downstream.  In this example, there are no interested receivers and thus no outgoing interface.

RP#show ip mroute 239.10.10.10
        ***snip***
        239.10.10.10
        1.1.1.1, 0:29:51, flags: SLP
           Incoming interface: Vlan100

The most common reason for high CPU on the FHR is being “stuck in register”.  The most frequent cause of this is the absence of PIM sparse-mode enabled interfaces along the unicast path between the FHR and the RP.  The second most common cause is a routing inconsistency.  Routing inconsistencies can result in the Register-Stop packet being lost in the network, possibly due to asymmetric routing, where the path from the FHR to the RP is a PIM path but the path from the RP back to the FHR is not.

But what happens when that Register-Stop, which is a PIM packet, does not make it from the RP to the FHR?  In such a scenario the PIM tunnel remains up, all traffic for that group is still forwarded via the CPU, and CPU usage climbs.  What is the solution?

Troubleshooting

1. Confirm that the CPU is high on both the FHR and RP:  

------------- show processes top once -------------
 
top - 15:39:59 up 244 days,  7:12,  0 users,  load average: 1.44, 1.42, 1.44
Tasks: 353 total,   1 running, 352 sleeping,   0 stopped,   0 zombie
%Cpu(s):  6.6 us,  1.2 sy,  0.0 ni, 91.4 id,  0.0 wa,  0.4 hi,  0.4 si,  0.0 st
KiB Mem:   8171400 total,  7931960 used,   239440 free,   355096 buffers
KiB Swap:        0 total,        0 used,        0 free,  2661712 cached
 
  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
 3672 root      20   0  547m 131m 106m S  60.4  1.6 134:39.68 PimReg
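To quickly isolate PIM-related processes in a busy process list, the output can be filtered through grep; a sketch (process names vary by EOS release):

```shell
SW# show processes top once | grep -i pim
```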


2. Check CoPP counters for drops towards the CPU:
#show cpu counter queue | nz
--------------------------------------------------------------------------------
                                 Linecard0/0
--------------------------------------------------------------------------------
Queue                                          Counter/pkts*          Drops/pkts
---------------                          ------------------- -------------------
Sflow                                             2958474981                   0
Other                                               54720792              333599
TTL1                                                  179880                6728
L3 Slow Path                                         1738536                3107
ARP                                               1751402615                   0
Glean                                             1242257317                2071
Multicast Miss                                       2229869               99390
IGMP                                               150295880                   0
Multicast LL                                       390181238                  55

 

3. Confirm on the FHR that the group is hashing to the proper RP:

show ip pim rp-hash  239.10.10.10
RP 9.9.9.9           <--- confirmed RP
  PIM v2 Hash Values:
  RP: 9.9.9.9
    Uptime: 14d16h, Expires: never, Priority: 0, HashMaskLen: 30, HashMaskValue: 561587137, Override: False
    Hash Algorithm: Default
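If the group hashes to an unexpected RP, check the RP mappings the router has learned. A sketch; the static RP configuration syntax shown is an assumption and varies by EOS release (older releases use `ip pim rp-address` instead):

```shell
! Show the RPs known to this router and their group ranges
FHR# show ip pim rp

! A static RP mapping for 9.9.9.9 in newer EOS syntax:
router pim sparse-mode
   ipv4
      rp address 9.9.9.9
```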

 

4. Trace the route path back from the RP to the FHR and ensure that each L3 interface has PIM sparse-mode configured.

interface Ethernet5
   no switchport
   ip address 192.168.X.X/31
   pim ipv4 sparse-mode
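Once PIM sparse-mode is enabled on every hop, the RP should be able to deliver the Register-Stop and the FHR should cut over to hardware forwarding. A sketch of the follow-up checks, using the addresses from this article:

```shell
! CPU usage from the PimReg process should drop back to normal
FHR# show processes top once

! The (S,G) entry should no longer be register-encapsulating toward the RP
FHR# show ip mroute 239.10.10.10
```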


If the issue is still seen, collect the outputs below and reach out to Arista TAC by sending an email to support@arista.com.

CLI commands:
show ip mroute (group ip)
show ip pim rp-hash (group ip)
SW# show tech-support all | gzip > /mnt/flash/show-tech-$HOSTNAME-$(date +%m_%d.%H%M).log.gz