Troubleshooting Egress Queue drops on 7280/7500 devices

 
 

Aggregate VoQ drops on 7280/7500 devices

On 7280/7500 devices, the platform architecture uses Virtual Output Queuing (VoQ) between the ingress and egress chips to forward known unicast traffic.

Whenever a packet is to be transmitted, the ingress chip requests credit from the egress chip. Once the credit is granted, the packet is dequeued to the egress chip. While packets are awaiting credit, they are held in the ingress chip buffers, in the Virtual Output Queue (VoQ) for the corresponding egress port.
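The credit mechanism described above can be illustrated with a small simulation. This is only a toy sketch and not how the hardware is implemented: it assumes credits are counted in packets, that a fixed number of credits is granted per round, and that the VoQ holds a fixed number of packets. The names VirtualOutputQueue and simulate are hypothetical.

```python
from collections import deque


class VirtualOutputQueue:
    """Toy ingress-side queue for a single egress port (hypothetical model)."""

    def __init__(self, capacity):
        self.capacity = capacity      # assumed limit, counted in packets
        self.packets = deque()
        self.dropped = 0              # stands in for the VoQ DropPkts counter

    def enqueue(self, pkt):
        # A packet that finds the VoQ full is dropped on the ingress side.
        if len(self.packets) >= self.capacity:
            self.dropped += 1
            return False
        self.packets.append(pkt)
        return True

    def dequeue_with_credits(self, credits):
        # Packets leave the VoQ only while granted credit remains.
        sent = []
        while credits > 0 and self.packets:
            sent.append(self.packets.popleft())
            credits -= 1
        return sent


def simulate(arrivals_per_round, credits_per_round, capacity, rounds):
    """Return (delivered, dropped_at_voq, still_queued) after all rounds."""
    voq = VirtualOutputQueue(capacity)
    delivered = 0
    for r in range(rounds):
        for i in range(arrivals_per_round):
            voq.enqueue((r, i))
        delivered += len(voq.dequeue_with_credits(credits_per_round))
    return delivered, voq.dropped, len(voq.packets)


if __name__ == "__main__":
    # Offered load (3 per round) exceeds the granted credit (2 per round),
    # so the VoQ fills and the excess is dropped on the ingress side.
    print(simulate(arrivals_per_round=3, credits_per_round=2,
                   capacity=4, rounds=5))
```

In this model, sustained congestion on the egress port shows up as drops in the VoQ rather than at the egress side, which matches where we expect drops for known unicast traffic.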

Accordingly, in the output of “show interfaces counters queue detail” on these devices, we see two sections:

switch#show interfaces counters queue detail
Aggregate VoQ Counters 
Egress                      Traffic               Pkts             Octets           DropPkts         DropOctets
Port                        Class
Et3/1/1                         TC0                  0                  0                  0                  0
Et3/1/1                         TC1                  0                  0                  0                  0
…
……
Egress Queue Counters
Port               Traffic   DropPrec   DestType        OutEnqPkts      OutEnqOctets       OutDropPkts     OutDropOctets
                   Class
Et4/1/1                TC0      DP0-3         UC                 0                 0                 0                 0
……

Drops in the “Aggregate VoQ Counters” section occur at the ingress chip VoQ for the respective egress port. Drops in the “Egress Queue Counters” section are packets dropped at the egress chip for the respective egress port.

For known unicast traffic, in the event of congestion, we typically expect to see drops on the VoQ buffers. For troubleshooting congestion on these devices, please refer to the following article:
https://eos.arista.com/how-to-troubleshoot-congestion/

Egress Queue drops on 7280/7500 devices

On most 7280/7500 devices, by default, we use fabric/egress replication mode to forward BUM (Broadcast, Unknown unicast, Multicast) traffic. In this mode, the VoQ credit mechanism is not used: packets are enqueued directly to the egress chip, so any drops for BUM traffic are counted in the “Egress Queue Counters” only. As a result, high volumes of BUM traffic are more likely to cause egress queue contention and potential packet loss. In the following output, the drops on Et5 appear only in the multicast (MC) row of the Egress Queue Counters:

switch#show interfaces counters queue detail
Aggregate VoQ Counters
Egress                     Traffic               Pkts             Octets           DropPkts         DropOctets
Port                       Class  
Et5                            TC6                153             155720                  0                  0
Et5                            TC7               4113             220989                  0                  0

Egress Queue Counters
Port                 Traffic   DropPrec   DestType        OutEnqPkts      OutEnqOctets       OutDropPkts     OutDropOctets
                     Class
Et5                      TC1      DP0-3         MC         664484064      909472606828            171704         235014904
Et5                      TC7      DP0-3         UC              4120            221353                 0                 0
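To spot which section is reporting drops, the output can be parsed with a short script. The following is a minimal sketch that assumes the column layout shown in this article (six fields per VoQ row, eight per egress queue row); other EOS releases or platforms may format the output differently. The function names are hypothetical.

```python
SAMPLE = """\
Aggregate VoQ Counters
Egress                     Traffic               Pkts             Octets           DropPkts         DropOctets
Port                       Class
Et5                            TC6                153             155720                  0                  0
Et5                            TC7               4113             220989                  0                  0

Egress Queue Counters
Port                 Traffic   DropPrec   DestType        OutEnqPkts      OutEnqOctets       OutDropPkts     OutDropOctets
                     Class
Et5                      TC1      DP0-3         MC         664484064      909472606828            171704         235014904
Et5                      TC7      DP0-3         UC              4120            221353                 0                 0
"""

# Number of whitespace-separated fields expected in a data row, per section.
FIELDS_PER_ROW = {"voq": 6, "egress": 8}


def parse_queue_counters(text):
    """Return one dict per counter row, tagged with the section it came from."""
    rows = []
    section = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Aggregate VoQ Counters"):
            section = "voq"
            continue
        if stripped.startswith("Egress Queue Counters"):
            section = "egress"
            continue
        fields = stripped.split()
        if section is None or len(fields) != FIELDS_PER_ROW[section]:
            continue
        # Header lines have the right field count but no numeric counters.
        if not all(f.isdigit() for f in fields[-4:]):
            continue
        pkts, octets, drop_pkts, drop_octets = (int(f) for f in fields[-4:])
        rows.append({
            "section": section,
            "port": fields[0],
            "traffic_class": fields[1],
            "dest_type": fields[3] if section == "egress" else None,
            "pkts": pkts,
            "octets": octets,
            "drop_pkts": drop_pkts,
            "drop_octets": drop_octets,
        })
    return rows


def rows_with_drops(rows):
    """Keep only the rows whose drop packet counter is non-zero."""
    return [r for r in rows if r["drop_pkts"] > 0]


if __name__ == "__main__":
    for row in rows_with_drops(parse_queue_counters(SAMPLE)):
        print(row["section"], row["port"], row["traffic_class"],
              row["dest_type"], row["drop_pkts"])
```

For the sample above, the only row with drops is the MC row in the egress section, which is the pattern we expect when BUM traffic is congesting the egress queue.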

Solution

When egress drops are reported, the most likely cause is BUM traffic.
In such cases, if the volume of BUM traffic (such as multicast) is expected, one option is to change the replication mode to “ingress only”. With egress replication, we rely on the fabric and egress chip to replicate the packet. With ingress-only replication, the ingress chip itself creates one copy of the packet for every interested egress port. Each copy is enqueued on the VoQ of its egress port and sent over the fabric to the egress chip once the requested credit is granted. This ensures that the egress chips are not overwhelmed with more traffic than they can handle. With ingress-only replication enabled, congestion drops for BUM traffic are also reported in the Aggregate VoQ Counters, as they are for known unicast traffic.
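The difference between the two modes can be summarized in a small lookup sketch. It only restates the behavior described in this article (where the copy is made, which queue it waits in, and which counter section reports congestion drops); the function name and dictionary keys are hypothetical and it does not model the hardware.

```python
def plan_replication(mode, member_ports):
    """Describe, per egress port, how one BUM packet is handled in each mode."""
    if mode == "ingress":
        # Ingress-only: one copy per member port is made on the ingress chip
        # and waits in that port's VoQ until credit is granted.
        template = {
            "copied_on": "ingress chip",
            "queued_in": "VoQ",
            "credit_scheduled": True,
            "drops_counted_in": "Aggregate VoQ Counters",
        }
    elif mode == "fabric-egress":
        # Fabric/egress: replication is left to the fabric and egress chip,
        # and the VoQ credit mechanism is not used.
        template = {
            "copied_on": "fabric/egress chip",
            "queued_in": "egress queue",
            "credit_scheduled": False,
            "drops_counted_in": "Egress Queue Counters",
        }
    else:
        raise ValueError(f"unknown replication mode: {mode!r}")
    return [dict(template, port=port) for port in member_ports]


if __name__ == "__main__":
    # Port names here are examples only.
    for entry in plan_replication("ingress", ["Ethernet9", "Ethernet10"]):
        print(entry)
```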

Configuration

To change the replication mode on the switch to ingress only, you can use the following command:

switch(config)#platform sand multicast replication default ingress

To revert to fabric/egress replication, you can use the following command:

switch(config)#platform sand multicast replication default fabric-egress
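If the change needs to be applied to several switches, the same commands could be sent through eAPI. The sketch below only builds the JSON-RPC request body and does not contact a switch. It assumes eAPI is enabled and that the usual runCmds request shape applies; the function name and the request id are hypothetical, so please verify the payload against the eAPI documentation for your EOS release before using it.

```python
import json

# The two modes whose configuration commands appear in this article.
VALID_MODES = ("ingress", "fabric-egress")


def build_replication_request(mode):
    """Build an eAPI runCmds request body that sets the replication mode."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    return {
        "jsonrpc": "2.0",
        "method": "runCmds",
        "params": {
            "version": 1,
            "cmds": [
                "enable",
                "configure",
                f"platform sand multicast replication default {mode}",
            ],
            "format": "json",
        },
        "id": "set-replication-mode",  # arbitrary identifier for this sketch
    }


if __name__ == "__main__":
    print(json.dumps(build_replication_request("ingress"), indent=2))
```

The resulting body would typically be sent as an authenticated HTTPS POST to the switch's command API endpoint.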

Validation

We can validate the current replication mode on the switch using the following command:
(Note that “20” is the VLAN ID in this scenario)

switch...23:19:26#show platform fap multicast-chain 20
Jericho0.0
Configured global replication mode: ingress only (default)
MulticastId: 20 current replication mode: ingress only
Using unicast buffers
Ingress replication enabled in IRR_IRDB
Ingress Membership from: Ingress chain in IRR_MCDB
mcId   Type      QueueId/    IntfName    outlif
                 SysPortId
20     SysPort   91          Ethernet9   20

Egress Membership from: Egress chain is empty
…

switch...23:21:44(config)#show platform fap multicast-chain 20
Jericho0.0
Configured global replication mode: fabric/egress
MulticastId: 20 current replication mode: fabric/egress
Port-channel load-balance: disabled
Using unicast buffers
Ingress replication enabled in IRR_IRDB
Ingress Membership from: Ingress chain in IRR_MCDB
mcId   Type      QueueId/    IntfName    outlif
                 SysPortId
20     Queue     0                       20

Mesh Replication Membership from: Fabric bitmap in FDT_IPT_MESH_MC
mcId Type
20 Core1

Egress Membership from: Egress chain is empty
…
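The replication mode can also be checked programmatically from this output. The following is a minimal sketch that assumes the two “replication mode” lines appear exactly as in the outputs above; the function name is hypothetical and the wording may differ on other releases.

```python
import re

GLOBAL_MODE_RE = re.compile(
    r"^Configured global replication mode:(.+)$", re.MULTILINE)
GROUP_MODE_RE = re.compile(
    r"^MulticastId:\s*(\d+)\s+current replication mode:(.+)$", re.MULTILINE)

# Abridged from the fabric/egress output shown in this article.
SAMPLE = """\
Jericho0.0
Configured global replication mode: fabric/egress
MulticastId: 20 current replication mode: fabric/egress
Port-channel load-balance: disabled
"""


def parse_replication_modes(text):
    """Return the global mode and a {multicast id: mode} mapping."""
    global_match = GLOBAL_MODE_RE.search(text)
    groups = {
        int(mc_id): mode.strip()
        for mc_id, mode in GROUP_MODE_RE.findall(text)
    }
    return {
        "global_mode": global_match.group(1).strip() if global_match else None,
        "groups": groups,
    }


if __name__ == "__main__":
    result = parse_replication_modes(SAMPLE)
    print(result["global_mode"], result["groups"])
```

A check such as `mode.startswith("ingress only")` on each parsed value can then confirm whether a given multicast ID has picked up the new mode.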

If the Egress Queue drops continue to increment even after changing the replication mode, please contact Arista Support for further investigation.

Additional References

https://eos.arista.com/eos-4-21-3f/multicast-ingress-replication-filter/

https://eos.arista.com/eos-4-22-1f/maximize-full-multicast-buffer-usage-on-jericho/ 
