Introduction to Managing EOS Devices – Troubleshooting

 
 

Note: This article is part of the Introduction to Managing EOS Devices series:

https://eos.arista.com/introduction-to-managing-eos-devices/ 

 

 

3) Troubleshooting

The following monitoring tools provide information on Arista EOS for all platforms:

  • Event-Monitor
  • Control-plane TCPdump
  • Tracing (debug)
  • Show Tech Support
  • Log consolidation
Platform specific:

  • Data-plane TCPdump (7150S series)

 

 

3.1) Event Monitor

Event Monitor is part of a suite of tools called Advanced Event Management (AEM). The goal of AEM is to improve both reactive and proactive management functions, enabling the network to scale while maintaining visibility of its various components.

Event Monitor moves away from traditional “point in time” monitoring by collecting and storing critical information about ARP, MAC and route changes in a local database, which can be queried either via show commands or directly via SQLite. This enables a network manager to go back in time and replay network changes.

Event Monitor is enabled by default on all EOS devices.

switch#show event-monitor ?
 arp           Monitor ARP table events
 igmpsnooping  Monitor IGMP snooping table events
 mac           Monitor MAC table events
 mroute        Monitor mroute table events
 route         Monitor routing events
 sqlite        enter a sqlite statement


switch#show event-monitor route
2014-06-19 20:35:44|127.0.0.0/8|kernel|0|1|added|0
2014-06-19 20:35:44|0.0.0.0/8|kernel|0|1|added|1
2014-06-19 20:35:44|192.168.0.0/32|receiveBcast|0|1|added|2
2014-06-19 20:35:44|127.0.0.1/32|kernel|0|1|added|3
2014-06-19 20:35:44|192.168.3.255/32|receiveBcast|0|1|added|4
2014-06-19 20:35:44|192.168.1.217/32|receive|0|1|added|5
2014-06-19 20:35:44|192.168.0.0/22|connected|1|0|added|6


switch#show event-monitor sqlite select * from route WHERE route.time='2014-06-19 20:50:49';
2014-06-19 20:50:49|100.0.0.0/32||||removed|17
2014-06-19 20:50:49|100.0.0.1/32||||removed|18
2014-06-19 20:50:49|100.0.0.255/32||||removed|19
2014-06-19 20:50:49|100.0.0.0/24||||removed|20
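
The pipe-delimited output above is easy to post-process once it has been copied off the switch. The following is a minimal Python sketch, assuming the seven-field layout shown in the route examples. The field names (time, prefix, protocol, action, counter) are labels chosen here for illustration and are not taken from EOS documentation; the fourth and fifth fields are left unnamed because their meaning is not stated above.

```python
from typing import List, NamedTuple


class RouteEvent(NamedTuple):
    # Field names are illustrative assumptions, not official EOS column names.
    time: str
    prefix: str
    protocol: str
    action: str
    counter: int


def parse_route_events(text: str) -> List[RouteEvent]:
    """Parse 'show event-monitor route' style lines into RouteEvent records."""
    events = []
    for line in text.splitlines():
        fields = line.strip().split("|")
        # Skip prompts, blank lines and anything not in the 7-field layout.
        if len(fields) != 7 or not fields[6].isdigit():
            continue
        time, prefix, protocol, _unnamed4, _unnamed5, action, counter = fields
        events.append(RouteEvent(time, prefix, protocol, action, int(counter)))
    return events


SAMPLE = """\
2014-06-19 20:35:44|192.168.0.0/22|connected|1|0|added|6
2014-06-19 20:50:49|100.0.0.0/24||||removed|20
"""

# List the prefixes that were removed.
removed = [event.prefix for event in parse_route_events(SAMPLE) if event.action == "removed"]
print(removed)
```

Filtering on the action field in this way gives a quick list of withdrawn prefixes without writing an SQLite statement.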

 

3.2) Using TCPDump to Monitor Control Plane Traffic

The Linux TCPDump utility is packaged with EOS, allowing fast and efficient monitoring of control-plane (CPU-bound) traffic. TCPDump provides ready access to L2/L3 protocols and any other traffic destined for the switch itself, without the need to SPAN interfaces.

TCPDump is supported natively from the bash shell or from EOS CLI (version 4.10 onwards).

Before running TCPDump, identify which interface carries the type of traffic you want to capture:

Interface Type                       TCPDump will capture
-----------------------------------  ------------------------------------------------
L2 Standalone Interface              L2 generated packets: LLDP, STP etc.
L2 Port-channel Interface            L2 port-channel global packets: STP etc.
L2 Port-channel Member               L2 member interface specific packets: LACP, LLDP
L3 Interface (Routed port or SVI)    L3 generated traffic: ICMP, OSPF Hellos etc.

 

Note: Packets such as STP which are relevant to the whole port-channel would not be seen on a TCPDump of a member interface.

 

3.2.1) Running TCPDump natively in EOS

The utility is executed using the native EOS command ‘tcpdump’ with a mandatory interface argument, followed by optional arguments such as a capture filter or an output file.

Note : TCPDump will run with -e (capture Ethernet headers) by default.

For example, to capture LLDP frames on interface Management1 (ma1), the following command would be used:

7150S#tcpdump interface Management1 filter ether proto 0x88cc
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ma1, link-type EN10MB (Ethernet), capture size 65535 bytes

11:33:47.750573 00:1c:73:00:44:d5 (oui Arista Networks) > 01:80:c2:00:00:0e (oui Unknown), ethertype LLDP (0x88cc), length 187: LLDP, length 173: s7151.lab.local

Note : The full interface name (including case) must be used to set the source.  The filter argument refers to a capture-filter, so display-filter arguments will not be accepted.
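
When captures are scripted, the command line can be assembled and checked before it is sent to the switch. The sketch below is a hypothetical Python helper, not part of EOS: it only reproduces the "tcpdump interface <name> filter <expression>" form shown above and rejects short names such as ma1. The list of accepted name prefixes is an illustrative assumption and is not exhaustive.

```python
from typing import Optional

# Illustrative, non-exhaustive list of full interface name prefixes (assumption).
FULL_NAME_PREFIXES = ("Ethernet", "Management", "Port-Channel", "Vlan")


def eos_tcpdump_command(interface: str, capture_filter: Optional[str] = None) -> str:
    """Build an EOS CLI tcpdump command in the form used in the example above."""
    if not interface.startswith(FULL_NAME_PREFIXES):
        raise ValueError(
            f"use the full, case-sensitive interface name, not {interface!r}"
        )
    command = f"tcpdump interface {interface}"
    if capture_filter:
        # The filter is a capture filter (libpcap syntax), not a display filter.
        command += f" filter {capture_filter}"
    return command


print(eos_tcpdump_command("Management1", "ether proto 0x88cc"))
```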

 

3.2.2) Running TCPDump from Bash

To capture control-plane traffic on an interface with TCPDump, first find the Linux name of the interface (note that L2, L3 and Management interfaces are listed individually):

switch#bash ifconfig

et1       Link encap:Ethernet  HWaddr 00:1C:73:00:44:D6
         UP BROADCAST MULTICAST  MTU:9214  Metric:1
         RX packets:0 errors:0 dropped:0 overruns:0 frame:0
         TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:1000
         RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

et2       Link encap:Ethernet  HWaddr 00:1C:73:00:44:D6
         UP BROADCAST MULTICAST  MTU:9214  Metric:1
         RX packets:0 errors:0 dropped:0 overruns:0 frame:0
         TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:1000
         RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
[...]

ma1       Link encap:Ethernet  HWaddr 00:1C:73:00:44:D5
         inet addr:192.168.1.202  Bcast:255.255.255.255  Mask:255.255.252.0
         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
         RX packets:2658 errors:0 dropped:0 overruns:0 frame:0
         TX packets:1579 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:1000
         RX bytes:394599 (385.3 KiB)  TX bytes:322439 (314.8 KiB)
         Interrupt:21
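
If the ifconfig output is long, the kernel interface names can be extracted programmatically. The following Python sketch assumes the classic ifconfig layout shown above ("Link encap:" on the header line, "inet addr:" for IPv4). The function name kernel_interfaces is hypothetical.

```python
import re
from typing import Dict, Optional


def kernel_interfaces(ifconfig_output: str) -> Dict[str, Optional[str]]:
    """Map each kernel interface name to its IPv4 address (None if unset)."""
    interfaces: Dict[str, Optional[str]] = {}
    current = None
    for line in ifconfig_output.splitlines():
        header = re.match(r"^(\S+)\s+Link encap:", line)
        if header:
            # A new interface stanza starts on an unindented line.
            current = header.group(1)
            interfaces[current] = None
            continue
        addr = re.search(r"inet addr:(\S+)", line)
        if addr and current is not None:
            interfaces[current] = addr.group(1)
    return interfaces


SAMPLE = """\
et1       Link encap:Ethernet  HWaddr 00:1C:73:00:44:D6
         UP BROADCAST MULTICAST  MTU:9214  Metric:1

ma1       Link encap:Ethernet  HWaddr 00:1C:73:00:44:D5
         inet addr:192.168.1.202  Bcast:255.255.255.255  Mask:255.255.252.0
"""

print(kernel_interfaces(SAMPLE))
```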

Next, run the utility, passing the required interface and optionally a standard filter along with any other advanced arguments:

switch#bash tcpdump -i et11 stp
tcpdump: WARNING: et11: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on et11, link-type EN10MB (Ethernet), capture size 65535 bytes

11:55:39.244615 00:1c:73:00:44:e1 (oui Arista Networks) > 01:80:c2:00:00:00 (oui Unknown), 802.3, length 119: LLC, dsap STP (0x42) Individual, ssap STP (0x42) Command, ctrl 0x03: STP 802.1s, Rapid STP, CIST Flags [Proposal, Agreement], length 102

 

3.3) Tracing Processes with EOS

EOS provides operators with extensive troubleshooting tools to help debug control plane and protocol layer interactions through built-in tracing, optionally delivering live debug output to the CLI. To configure tracing, first review the available agent processes:

switch#show agent names
Aaa
Acl
Adt7462
Adt7483
Adt7483-system
AgentMonitor
AltaLanz
Arp
Bfd
Capi
Cdp
[...]

 

Having selected an agent to trace, review the available trace facilities for that process:

switch#show trace Rib | grep Ospf*
Trace facility settings for agent Rib is
-----------------------------------------------
Rib::Ospf            enabled  ............
Rib::Ospf3           enabled  ............

 

By default, all output generated by the tracing facilities is sent to the log file of the agent being traced (/var/log/agents/<AgentName>-<ProcessID>), for example /var/log/agents/Rib-1527. The agent log files are rotated automatically, which protects against excessive memory consumption. This is the recommended way to run tracing from 4.11.1 onwards.

If, however, you want to keep the tracing output separate from the agent logs, you can nominate a temporary file to store the tracing output (on a per-agent basis, in /tmp).  This file is not rotated automatically, making it useful for extended tracing that would otherwise fill the agent log.
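
Because the process ID in the file name changes whenever an agent restarts, it can help to look the file up rather than type it. The Python sketch below is an illustration only: it assumes file names of the form seen in the example (agent name, a hyphen, then a numeric process ID, as in Rib-1527) and skips anything else. The function name agent_log_files is hypothetical.

```python
from pathlib import Path
from typing import List


def agent_log_files(agent: str, log_dir: str = "/var/log/agents") -> List[Path]:
    """Return files named <agent>-<pid> in log_dir, most recently modified first."""
    candidates = [
        path
        for path in Path(log_dir).glob(f"{agent}-*")
        # Keep only names whose suffix after the hyphen is a numeric PID.
        if path.name[len(agent) + 1:].isdigit()
    ]
    return sorted(candidates, key=lambda path: path.stat().st_mtime, reverse=True)


# On a switch this would list entries such as /var/log/agents/Rib-1527.
for log_file in agent_log_files("Rib"):
    print(log_file)
```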

switch(config)#trace Rib filename OSPF.trace

The above file is stored in RAM, so it will not persist following a reload.  If the output contains data that should be referred to later, copy it to flash or to an external tftp/ftp/scp server.  It is also advisable to delete the original copy from memory afterwards.

switch#bash cp /tmp/OSPF.trace /mnt/flash/OSPF.trace
switch#bash rm /tmp/OSPF.trace

 

NOTE: If tracing to a nominated file, disable all traces once tracing has been completed; otherwise they will continue to log to that file and consume memory.

Finally, enable tracing for each required facility (or * for all facilities) and select the levels. For common troubleshooting purposes, the first three or four levels (e.g. 0 to 3) should suffice. For very deep detail, you may choose “all”.

switch(config)#trace rib enable Rib::Ospf1::Hello levels 0 1 2 3

Once tracing is active, either run ‘trace monitor’ to output live process trace information to the CLI or, for larger captures, use ‘bash more /var/log/agents/<agent>-<pid>’ or ‘bash more /tmp/<selected filename>’.  Working from bash lets you apply Linux filters to the output file.

switch#bash tail -n 30 /var/log/agents/Rib-2001
21:22:50.548829 OSPF RECV: 10.0.0.2 -> 224.0.0.5: Version 2, Type Hello (1), Length 44 ret 0
21:22:50.548907   Router ID 10.0.0.2, Area 0.0.0.0, Authentication <None> (0)
21:22:50.548933   Authentication data: 00000000 00000000
21:22:50.548960   Mask 255.255.255.128, Options <E> (2), Priority 1, Neighbours 0
21:22:50.548985   Intervals: Hello 10s, Dead Router 40s, Designated Router 10.0.0.2, Backup 0.0.0.0
21:22:50.549195 OSPF: invalid HELLO packet from 10.0.0.2: Invalid Mask  (9)
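
Once a trace file has been copied off the switch, the same kind of filtering can be done offline. The following Python sketch behaves roughly like a case-insensitive grep over trace lines; the function name matching_trace_lines is hypothetical, and the sample lines are taken from the output above.

```python
import re
from typing import Iterable, List


def matching_trace_lines(lines: Iterable[str], pattern: str) -> List[str]:
    """Return the lines that match pattern, ignoring case."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [line.rstrip("\n") for line in lines if regex.search(line)]


SAMPLE = [
    "21:22:50.548829 OSPF RECV: 10.0.0.2 -> 224.0.0.5: Version 2, Type Hello (1), Length 44 ret 0",
    "21:22:50.548960   Mask 255.255.255.128, Options <E> (2), Priority 1, Neighbours 0",
    "21:22:50.549195 OSPF: invalid HELLO packet from 10.0.0.2: Invalid Mask  (9)",
]

# Show only the lines reporting a rejected packet.
for line in matching_trace_lines(SAMPLE, r"invalid"):
    print(line)
```

The function accepts any iterable of lines, so an open file object can be passed in place of the sample list.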

 

3.4) Log Collection

On occasion it may be necessary to collect the contents of the agent logs for TAC. The simplest way to group all the logs together on flash is:

switch#bash cat /var/log/agents/* > /mnt/flash/agents.log
switch#dir flash:agents.log
Directory of flash:/agents.log

      -rwx       79896           Mar 18 11:26  agents.log

1761558528 bytes total (248496128 bytes free)
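
A plain cat loses track of which agent produced which lines. As one possible alternative, the Python sketch below concatenates the files in the same way but writes a header line before each one. It is an illustration under stated assumptions rather than an Arista-provided tool: the function name collect_agent_logs and the header format are hypothetical, and the output file should sit outside the log directory so that it is not read back into itself.

```python
from pathlib import Path


def collect_agent_logs(log_dir: str, output_file: str) -> int:
    """Concatenate every regular file in log_dir into output_file.

    Returns the number of files written. Each file is preceded by a
    '===== <name> =====' header line (format chosen for illustration).
    """
    count = 0
    with open(output_file, "wb") as out:
        for path in sorted(Path(log_dir).iterdir()):
            if not path.is_file():
                continue
            out.write(f"===== {path.name} =====\n".encode())
            out.write(path.read_bytes())
            out.write(b"\n")
            count += 1
    return count


# On a switch this might be called as:
# collect_agent_logs("/var/log/agents", "/mnt/flash/agents.log")
```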

 

Exactly as with regular CLI commands, shell commands may be added to aliases for easy repetition:

switch(config)#alias getlogs bash cat /var/log/agents/* > /mnt/flash/aliasagents.log
switch#getlogs

 

Verification:

switch#dir flash:aliasagents.log
Directory of flash:/aliasagents.log

      -rwx       80372           Mar 18 11:28  aliasagents.log

1761558528 bytes total (248414208 bytes free)

 

An example script for automating log collection can be found on EOS Central :

https://eos.arista.com/wiki/index.php/EOSTroubleshooting:logGrab

 

3.5) Show tech-support

For non-interactive capture, which avoids the prompts to press a key to scroll down, either set “terminal length 0” (infinite) or use “show tech-support | no-more”.

 

3.6) Platform Specific (7150S) – Using TCPDump to Monitor Data-Plane Traffic

3.6.1) Configuring mirroring to the CPU

The Advanced Mirroring functionality on the 7150 series switches can mirror data-plane traffic to the CPU. Because this traffic is forwarded in hardware, its internal path would normally never cross the control-plane. The mirrored traffic is exposed in the control-plane through a mirror interface, which can be listened to by the software of your choice, for example TCPdump.

 

To enable mirroring to the CPU, specify cpu as the destination in your session, for example:

7150(config)#monitor session test-session source Et2
7150(config)#monitor session test-session destination cpu

 

The control-plane is protected by CoPP, so an overloaded mirroring session towards the CPU only results in lost mirrored packets. To avoid missing the packets you care about, filter for the interesting traffic with an ACL applied to your mirroring session.

7150S(config)#ip access-list ACL-MIRROR-TO-CPU
7150S(config-acl-ACL-MIRROR-TO-CPU)#permit tcp any 10.0.1.0/24 eq www ssh https
7150S(config)#monitor session MIRROR-CPU ip access-group ACL-MIRROR-TO-CPU
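
When the same mirroring setup is needed on several switches, the command list can be generated rather than typed. The Python sketch below only assembles the commands shown in this section into one list, using a single session name throughout. The function name mirror_to_cpu_config is hypothetical, and the explicit "exit" after the ACL rules is an assumption added here to leave ACL configuration mode before the monitor commands.

```python
from typing import List


def mirror_to_cpu_config(session: str, source: str, acl: str, acl_rules: List[str]) -> List[str]:
    """Return the EOS configuration commands for an ACL-filtered mirror-to-CPU session."""
    lines = [f"ip access-list {acl}"]
    lines += acl_rules
    lines.append("exit")  # assumption: leave ACL mode explicitly
    lines += [
        f"monitor session {session} source {source}",
        f"monitor session {session} destination cpu",
        f"monitor session {session} ip access-group {acl}",
    ]
    return lines


commands = mirror_to_cpu_config(
    session="MIRROR-CPU",
    source="Et2",
    acl="ACL-MIRROR-TO-CPU",
    acl_rules=["permit tcp any 10.0.1.0/24 eq www ssh https"],
)
print("\n".join(commands))
```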

 

Verify your mirroring settings:

7150S#show monitor session
Session MIRROR-CPU
------------------------
Source Ports:
 Both:        Et2(Acl:ACL-MIRROR-TO-CPU)   ← Mirror ACL granularity is per source interface

Destination Ports:               
   Cpu :  active (mirror0)   ← interface mirrorX (where X=[0-3]) can be used from the kernel bash shell
   Et1 :  active             ← CPU as a destination can coexist along Eth or Po destinations

ip access-group: ACL-MIRROR-TO-CPU

 

You may capture traffic directly from the EOS CLI or from the kernel Bash shell. The following examples employ TCPdump, but from the kernel you could potentially run any application of your choice.

 

3.6.2) Running TCPDump for data-plane traffic natively in EOS

EOS TCPdump was described in section 3.2.1. While it can capture control-plane traffic on any interface, capturing data-plane traffic relies on the mirroring (monitor) session:

7150S#tcpdump ?
 file           Set the output file
 filecount      Specify the number of output files
 filter         Set the filtering expression
 interface      Select an interface to monitor (default=fabric)
 max-file-size  Specify the maximum size of output file
 monitor        Select a monitor session
 packet-count   Limit number of packets to capture
 queue-monitor  Monitor queue length
 size           Set the maximum number of bytes to dump per packet
 verbose        Enable verbose mode

 

This TCPdump will run on the mirroring/monitor session previously configured. You may use auto-complete for the session name.

7150S#tcpdump monitor M?    ← contextual list of the configured session. Press TAB to auto-complete
MIRROR-CPU  WORD

7150S#tcpdump monitor MIRROR-CPU

23:20:30.666829 00:50:56:99:fe:47 (oui Unknown) > 00:1c:73:85:bd:61 (oui Arista Networks), ethertype 802.1Q (0x8100), length 152: vlan 101, p 0, ethertype IPv4, 10.10.101.201.58504 > 10.10.200.101.4789: VXLAN, flags [I] (0x08), vni 10003
00:50:56:99:11:19 (oui Unknown) > 00:50:56:99:77:52 (oui Unknown), ethertype IPv4 (0x0800), length 98: 192.168.1.100 > 192.168.1.200: ICMP echo reply, id 45575, seq 6, length 64

 

The above TCPdump output presents traffic between host A and host B, not destined to the switch’s control-plane, purely forwarded in hardware by the network processor. The traffic was mirrored in hardware, forwarded towards the CPU, and exposed to the software.

This is an extremely fast and convenient way to troubleshoot.

Note: the amount of mirrored traffic from the data-plane to the control-plane is restricted by CoPP to 400Mb/s by default. This can be changed if required, taking into account the potential load on internal links and the CPU. Refer to the CoPP configuration for more details. It is recommended to apply ACLs to filter for interesting traffic.
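
To get a feel for what the default limit means in packets, the rate can be divided by the frame size. The Python sketch below is a back-of-the-envelope calculation only: it assumes 1 Mb = 10^6 bits, ignores preamble and inter-frame gap, and does not model how CoPP actually meters traffic.

```python
# 400 Mb/s default limit, taking 1 Mb = 10**6 bits (assumption).
COPP_MIRROR_LIMIT_BPS = 400_000_000


def max_mirrored_pps(frame_bytes: int, limit_bps: int = COPP_MIRROR_LIMIT_BPS) -> int:
    """Approximate packets per second that fit under the limit for one frame size."""
    return limit_bps // (frame_bytes * 8)


for size in (64, 1500):
    print(f"{size}-byte frames: about {max_mirrored_pps(size)} packets/s")
```

Under these assumptions the limit corresponds to roughly 781,250 packets per second at 64 bytes and about 33,333 at 1500 bytes, which is one reason to narrow the mirrored traffic with an ACL.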

 

3.6.3) Running TCPDump for data-plane traffic from Bash

To TCPDump data-plane traffic from Bash, first determine through which kernel interface the mirrored traffic is being exposed. It will be one of mirror0, mirror1, mirror2 or mirror3.

The command “show monitor session” provides this information:

7150S#show monitor session
Session MIRROR-CPU
------------------------
[...]

Destination Ports:               
   Cpu :  active (mirror0)
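
In a script, the kernel interface name can be pulled out of the "show monitor session" text with a regular expression. The Python sketch below assumes the "Cpu :  active (mirrorN)" line format shown above; the function name cpu_mirror_interface is hypothetical.

```python
import re
from typing import Optional


def cpu_mirror_interface(show_output: str) -> Optional[str]:
    """Return the kernel mirror interface (e.g. 'mirror0') or None if absent."""
    match = re.search(r"Cpu\s*:\s*active\s*\((mirror\d+)\)", show_output)
    return match.group(1) if match else None


SAMPLE = """\
Session MIRROR-CPU
------------------------
Destination Ports:
   Cpu :  active (mirror0)
"""

print(cpu_mirror_interface(SAMPLE))
```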

 

Next, run TCPdump listening to this interface:

7150S#bash tcpdump -i mirror0
tcpdump: WARNING: mirror0: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on mirror0, link-type EN10MB (Ethernet), capture size 65535 bytes

23:20:30.666829 00:50:56:99:fe:47 (oui Unknown) > 00:1c:73:85:bd:61 (oui Arista Networks), ethertype 802.1Q (0x8100), length 152: vlan 101, p 0, ethertype IPv4, 10.10.101.201.58504 > 10.10.200.101.4789: VXLAN, flags [I] (0x08), vni 10003

00:50:56:99:11:19 (oui Unknown) > 00:50:56:99:77:52 (oui Unknown), ethertype IPv4 (0x0800), length 98: 192.168.1.100 > 192.168.1.200: ICMP echo reply, id 45575, seq 6, length 64

 

 
