Troubleshooting based on Control Plane Policing (CoPP) for 7500R, 7280R, 7020R Platforms

 
 

Objective

This article explains a few important CoPP queues on Arista platforms (7280 series, 7020 series, and 7500 series) that handle various control-plane packets, and how to troubleshoot based on them.

CoPP (Control Plane Policing) uses a set of queues, or buffers, in Arista switches to handle control-plane packets. Different CoPP queues handle different kinds of control-plane packets, and each has its own significance.

The control plane policing (CoPP) feature increases security on the switch by protecting the CPU from unnecessary or DoS traffic and by giving priority to important control-plane traffic. It also segregates control-plane packets into their respective queues, which helps the CPU manage those packets better.

When should we perform CoPP-based troubleshooting?

CoPP queues should be checked for issues related mainly to packet drops, latency, or a control-plane session (such as BGP, OSPF, or LACP) that is flapping or not coming up. As a good practice, also check CoPP as part of a general health check of the switch.

Command to be used:

The only command required is: “show cpu counters queue”.

The idea is to check whether there are drops in any CoPP queue, which can hint at what is going on in the network and help root-cause the issue. Drops in a CoPP queue mean that the queue is congested by a large amount of traffic that uses it and cannot accommodate further packets of the same type.
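Because a single reading of the counters cannot tell old drops from ongoing ones, it helps to compare two readings taken some time apart. The following is a minimal Python sketch of that comparison. It assumes the drop counts have already been collected into dictionaries keyed by queue name; the function name and the sample numbers are hypothetical, and it does not parse the actual command output.

```python
from typing import Dict


def incrementing_drops(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Return the queues whose drop counter grew between two snapshots."""
    deltas = {}
    for queue, drops_now in after.items():
        delta = drops_now - before.get(queue, 0)
        if delta > 0:
            deltas[queue] = delta
    return deltas


# Hypothetical drop counts read manually from two runs of the command.
first = {"CoppSystemBgp": 0, "CoppSystemL3DstMiss": 120, "CoppSystemLacp": 0}
second = {"CoppSystemBgp": 0, "CoppSystemL3DstMiss": 4520, "CoppSystemLacp": 0}

print(incrementing_drops(first, second))
```

Only queues with a positive delta are reported, so a queue with a large but static drop count from an earlier event is ignored.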

Various CoPP queues on these platforms:

Some CoPP queue names are self-explanatory. These include:

CoppSystemLacp --> This queue is used by LACP packets.
CoppSystemLldp --> This queue is used by LLDP packets.
CoppSystemBgp --> This queue is used by BGP packets.
CoppSystemBfd --> This queue is used by BFD packets.
CoppSystemBpdu --> This queue is used by STP packets (BPDUs).
CoppSystemOspfIsis --> This queue is used by OSPF and ISIS packets.

If you see drops in any of the above queues, that queue is congested, which will cause issues in the respective protocol. For example, drops in “CoppSystemBgp” mean that a large number of BGP packets are ingressing the switch and congesting the queue. In that case, legitimate BGP packets (keepalives, updates) may get dropped because of the congestion in the CoppSystemBgp queue. Check why an abnormally large number of packets is entering the queue.

Now let’s discuss some CoPP queues that are not as straightforward as the earlier ones:
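The queue-to-protocol relationship in the list above can be written as a simple lookup table. This Python sketch is only an illustration; the function name is hypothetical, and the mapping contains nothing beyond the queues listed in this article.

```python
# Queue-to-protocol mapping, copied from the list of queues in this article.
QUEUE_PROTOCOLS = {
    "CoppSystemLacp": ["LACP"],
    "CoppSystemLldp": ["LLDP"],
    "CoppSystemBgp": ["BGP"],
    "CoppSystemBfd": ["BFD"],
    "CoppSystemBpdu": ["STP"],
    "CoppSystemOspfIsis": ["OSPF", "ISIS"],
}


def protocols_at_risk(dropping_queues):
    """Return the sorted protocols whose queues are reporting drops."""
    at_risk = set()
    for queue in dropping_queues:
        # Queues not in the table (for example CoppSystemL3DstMiss) are skipped.
        at_risk.update(QUEUE_PROTOCOLS.get(queue, []))
    return sorted(at_risk)


print(protocols_at_risk(["CoppSystemBgp", "CoppSystemOspfIsis"]))
```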

1. CoppSystemL3DstMiss –

Consider a directly connected network:

This is how the above route would look in hardware (platform specific):

#show platform fap ip route

CoppSystemL3DstMiss is the next-hop for the subnet 10.10.10.0/24.
Now suppose the switch receives a packet with destination IP 10.10.10.1. The switch first checks whether it has an ARP entry in hardware for 10.10.10.1.
An ARP entry in hardware appears as a host route, 10.10.10.1/32, in “show platform fap ip route”:

#show platform fap ip route

If there is no ARP entry, the packet hits the 10.10.10.0/24 route, which has CoppSystemL3DstMiss as its next-hop. The packet is then sent to the CPU (via the CoppSystemL3DstMiss queue) for ARP resolution.

In conclusion: CoppSystemL3DstMiss is used for sending data packets to the CPU for ARP resolution of the destination IP.
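The lookup behaviour described above is a longest-prefix match: a /32 host route (the ARP entry) wins over the /24 connected route when it exists. The Python sketch below models this with the standard ipaddress module. It is a simplified software model, not the hardware implementation; the function name is hypothetical, and "Ethernet1" is an assumed placeholder for whatever next-hop a resolved host entry would actually show.

```python
import ipaddress


def hw_lookup(table, dst_ip):
    """Longest-prefix match: return the next-hop of the most specific route."""
    addr = ipaddress.ip_address(dst_ip)
    best_net, best_hop = None, None
    for prefix, next_hop in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_hop = net, next_hop
    return best_hop


# Connected subnet only: no ARP entry for the host yet.
table = {"10.10.10.0/24": "CoppSystemL3DstMiss"}
print(hw_lookup(table, "10.10.10.1"))  # falls to the /24, so it goes to the CPU

# After ARP resolves, a /32 host route exists ("Ethernet1" is a placeholder).
table["10.10.10.1/32"] = "Ethernet1"
print(hw_lookup(table, "10.10.10.1"))  # the /32 now wins
print(hw_lookup(table, "10.10.10.2"))  # still unresolved, still goes to the CPU
```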

Q) What would it mean if we see drops in this queue?

A) It means the switch is receiving packets whose destination IP belongs to a directly connected subnet, but ARP resolution is failing for those destinations, so all those packets end up going to the CPU. This can happen when traffic is sent to hosts in the VLAN that don’t exist (or are down) and therefore never send an ARP reply.
Taking the above example: if we send packets to host 10.10.10.2 (route via vlan10) and the host is not active, the packets reach the CPU for ARP resolution. Since there is no ARP response, the traffic keeps going to the CPU and congests the CoppSystemL3DstMiss queue. Other application traffic that requires ARP resolution may then get dropped, causing traffic loss.

Q) Is it possible to check what traffic is going to the CPU for ARP resolution while the queue is being utilised?

A) Yes. First find the incoming interface on which the packets are received, which is shown in the output of:

“show cpu counters queue | grep -i coppsysteml3dstmiss”

In the above output, drops are incrementing for CoppSystemL3DstMiss and the associated interface is Et2/4. This is the interface on which the traffic arrives before going to the CPU for ARP resolution.

Now take a tcpdump on this interface to see the traffic:

“tcpdump interface ethernet2/4”

The data packets appear on Et2/4 in tcpdump because they are all going to the CPU for ARP resolution. Normally (when ARP is resolved) they would not be visible.

You can also see the ARP requests generated by the switch:

Another thing to check when the drop counter for this queue is incrementing is whether the “vxlan-routing” TCAM profile is configured when VXLAN routing is in use. The profile should not be the default. This is the expected output:

Command to configure it: “hardware tcam profile vxlan-routing”

If you are using VXLAN routing and the TCAM profile is not “vxlan-routing”, ARP entries will not be programmed in hardware and traffic will end up going to the CPU. That will cause latency in the network.

If EVPN is used to exchange MAC-IP or IP-prefix routes between VTEPs and the “vxlan-routing” TCAM profile is not configured, the result is 100% loss, because none of the EVPN routes will be programmed in hardware.

Note: This applies ONLY to the 7280R/7280R2 and 7500R/7500R2 platforms.

2. CoppSystemL2Ucast –

If ARP for a host is not resolving, check that there are no drops in this queue. This queue accepts all L2 unicast packets (such as ARP replies) whose destination MAC address is the switch’s system MAC address or virtual MAC.
ARP reply packets are typically seen in this queue, so it must not be congested if all ARP replies are to reach the CPU.

3. CoppSystemIpUcast –

If you cannot ping, SSH, or telnet to an IP that is configured on an Arista switch, check this queue for drops. Any packet received on the switch whose destination IP is an IP configured on the switch (SVI, loopback, or interface) is sent to the CPU through this queue. All packets destined to the switch go to the CPU for further processing.

Example: below is a directly connected IP:

The route in hardware:

#show platform fap ip route

So if the switch receives any packet with destination IP 100.100.100.2, it will go to the CPU via CoppSystemIpUcast.

4. CoppSystemL2Bcast –

This queue is used to trap all Layer 2 broadcast packets to the CPU, which includes ARP requests. If the queue is oversubscribed, ARP requests sent by hosts to their gateway (the Arista switch where CoppSystemL2Bcast is congested) may get dropped, and the hosts may not be able to resolve the ARP of their gateway.

5. CoppSystemL3LpmOver –

There are two possible situations:

Either the switch has a default route, like this:

Gateway of last resort:
S 0.0.0.0/0 [1/0] via 100.100.100.1, Ethernet3/1

In the above case, this is how the default route would look in hardware:

In the above output, next-hop is Eth3/1.

Or the switch has no default route:

In that situation, the switch programs its own default route in hardware, with “CoppSystemL3LpmOver” as the next-hop:

The reason for programming this default route in hardware is that, if the switch receives a packet for which there is no route at all in hardware (including a default route with a valid next-hop such as a physical interface), the switch uses its own system-defined default route (next-hop CoppSystemL3LpmOver) so that the packet is sent to the CPU for software routing. The packet is sent to the CPU because a route may not be programmed in hardware (for example, because hardware capacity is full) yet still be present in software; sending the packet to software avoids dropping it.
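The two situations above can be modelled in a few lines of Python: if no default route is configured, a system default pointing at CoppSystemL3LpmOver is added, and any destination without a more specific route falls through to it. This is a simplified sketch using the standard ipaddress module, not the actual programming logic; the function names are hypothetical, and 192.0.2.50 is just a documentation address standing in for an unknown destination.

```python
import ipaddress

SYSTEM_DEFAULT_HOP = "CoppSystemL3LpmOver"


def build_hw_table(routes):
    """Copy the routes and add the system default if none is configured."""
    table = dict(routes)
    table.setdefault("0.0.0.0/0", SYSTEM_DEFAULT_HOP)
    return table


def hw_lookup(table, dst_ip):
    """Longest-prefix match over the modelled hardware table."""
    addr = ipaddress.ip_address(dst_ip)
    best_net, best_hop = None, None
    for prefix, next_hop in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_hop = net, next_hop
    return best_hop


# Situation 1: a static default route via Ethernet3/1 is configured.
with_default = build_hw_table({"0.0.0.0/0": "Ethernet3/1"})
print(hw_lookup(with_default, "192.0.2.50"))

# Situation 2: no default route, so the system default is used.
without_default = build_hw_table({"10.10.10.0/24": "CoppSystemL3DstMiss"})
print(hw_lookup(without_default, "192.0.2.50"))
```

In the second case the lookup returns CoppSystemL3LpmOver, which corresponds to the packet being sent to the CPU for software routing.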

Q) What would it mean if we see drops in this queue?

A) It would mean that the switch is receiving packets for which there is no route in hardware, so hardware is sending those packets to the CPU on the assumption that the route may be present in software and the packets can be routed there. If there are drops in this queue, check whether all routes are programmed in hardware. Also check whether the hardware tables (LEM and LPM) are full, in which case some routes may not get programmed in hardware.

Another point to note: if there is latency in the network, check whether there are incrementing drops in this queue as well as in CoppSystemL3DstMiss.

If there is latency (or intermittent drops) in the network:

  • Check for drops in “CoppSystemL3DstMiss”: if the drops are incrementing, see the section covering “CoppSystemL3DstMiss” in this article for further details.
  • Check for drops in “CoppSystemL3LpmOver”: if the drops are incrementing, see the section covering “CoppSystemL3LpmOver” in this article for further details.
  • To check CoPP drops, the command is “show cpu counters queue | nz”.
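The symptom-to-queue guidance given throughout this article can be summarised as a small lookup. The Python sketch below is only an illustration: the symptom keys and the function name are hypothetical labels, and the drop deltas are assumed to have been worked out by hand from two runs of the command.

```python
# Symptom-to-queue mapping, summarising the sections of this article.
SYMPTOM_QUEUES = {
    "latency": ["CoppSystemL3DstMiss", "CoppSystemL3LpmOver"],
    "arp_not_resolving": ["CoppSystemL2Ucast"],
    "cannot_reach_switch_ip": ["CoppSystemIpUcast"],
    "hosts_cannot_resolve_gateway": ["CoppSystemL2Bcast"],
}


def queues_to_inspect(symptom, drop_deltas):
    """Return the queues tied to a symptom that show incrementing drops."""
    candidates = SYMPTOM_QUEUES.get(symptom, [])
    return [queue for queue in candidates if drop_deltas.get(queue, 0) > 0]


# Hypothetical increase in drops between two readings.
deltas = {"CoppSystemL3DstMiss": 4400, "CoppSystemL3LpmOver": 0}
print(queues_to_inspect("latency", deltas))
```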