Note: This article is part of the Introduction to Managing EOS Devices series:
- 3) Troubleshooting
- 3.1) Event Monitor
- 3.2) Using TCPDump to Monitor Control Plane Traffic
- 3.3) Tracing Processes with EOS
- 3.4) Log Collection
- 3.5) Show tech-support
- 3.6) Platform Specific (7150S) – Using TCPDump to Monitor Data-Plane Traffic
The following monitoring tools provide information on Arista EOS for all platforms:
3.1) Event Monitor
Event Monitor is part of a suite of tools called Advanced Event Management (AEM). The goal of AEM is to improve both reactive and proactive management functions, enabling the network to scale while maintaining visibility of its various components.
Event Monitoring moves away from traditional “point in time” monitoring by collecting and storing critical information about ARP, MAC and route changes in a local database, all of which can be queried either via show commands or directly via SQLite. This enables a network manager to go back in time and replay network changes.
Event Monitor is enabled by default on all EOS devices.
switch#show event-monitor ?
  arp           Monitor ARP table events
  igmpsnooping  Monitor IGMP snooping table events
  mac           Monitor MAC table events
  mroute        Monitor mroute table events
  route         Monitor routing events
  sqlite        enter a sqlite statement

switch#show event-monitor route
2014-06-19 20:35:44|127.0.0.0/8|kernel|0|1|added|0
2014-06-19 20:35:44|0.0.0.0/8|kernel|0|1|added|1
2014-06-19 20:35:44|192.168.0.0/32|receiveBcast|0|1|added|2
2014-06-19 20:35:44|127.0.0.1/32|kernel|0|1|added|3
2014-06-19 20:35:44|192.168.3.255/32|receiveBcast|0|1|added|4
2014-06-19 20:35:44|192.168.1.217/32|receive|0|1|added|5
2014-06-19 20:35:44|192.168.0.0/22|connected|1|0|added|6

switch#show event-monitor sqlite select * from route WHERE route.time='2014-06-19 20:50:49';
2014-06-19 20:50:49|184.108.40.206/32||||removed|17
2014-06-19 20:50:49|220.127.116.11/32||||removed|18
2014-06-19 20:50:49|18.104.22.168/32||||removed|19
2014-06-19 20:50:49|22.214.171.124/24||||removed|20
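When the output above is saved off-box, the pipe-delimited rows are easy to post-process. The following is a minimal Python sketch, not an Arista tool. The field names in `RouteEvent` are hypothetical labels chosen for illustration, since the output does not name its columns, and the two numeric columns are left uninterpreted.

```python
from typing import List, NamedTuple


class RouteEvent(NamedTuple):
    # Field names are assumptions made for this sketch; the CLI output
    # shown above does not label its columns.
    time: str
    prefix: str
    protocol: str
    col4: str   # meaning not stated in the article
    col5: str   # meaning not stated in the article
    action: str
    seq: int


def parse_route_events(text: str) -> List[RouteEvent]:
    """Parse 'show event-monitor route' rows; skip anything that is not a 7-field row."""
    events = []
    for line in text.splitlines():
        parts = line.strip().split("|")
        if len(parts) != 7 or not parts[6].isdigit():
            continue  # prompt lines, blank lines, unexpected rows
        events.append(RouteEvent(*parts[:6], int(parts[6])))
    return events


# Two rows copied from the output above.
sample = """\
switch#show event-monitor route
2014-06-19 20:35:44|192.168.0.0/22|connected|1|0|added|6
2014-06-19 20:50:49|184.108.40.206/32||||removed|17
"""

removed = [e.prefix for e in parse_route_events(sample) if e.action == "removed"]
print(removed)
```

Filtering on `action` or `time` in this way mirrors what the `sqlite` form of the command does on the switch itself.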
3.2) Using TCPDump to Monitor Control Plane Traffic
The Linux TCPDump utility is packaged with EOS, allowing fast and efficient monitoring of control-plane (CPU-bound) traffic. TCPDump provides ready access to L2/L3 protocols and any other traffic destined for the switch itself, without the need to SPAN interfaces.
TCPDump is supported natively from the bash shell or from EOS CLI (version 4.10 onwards).
Before running TCPDump, identify which interface carries the type of traffic you want to capture:
| Interface Type | TCPDump will capture |
|---|---|
| L2 Standalone Interface | L2 Generated packets; LLDP, STP etc. |
| L2 Port-channel Interface | L2 Port-channel global packets, STP etc. |
| L2 Port-channel Member | L2 Member interface specific packets; LACP, LLDP |
| L3 Interface (Routed port or SVI) | L3 Generated traffic, ICMP, OSPF Hellos etc. |
Note: Packets such as STP which are relevant to the whole port-channel would not be seen on a TCPDump of a member interface.
3.2.1) Running TCPDump natively in EOS
The utility is executed using the native EOS command ‘tcpdump’, together with a mandatory interface argument, followed by optional arguments such as a capture filter or an output file.
Note: TCPDump will run with -e (capture Ethernet headers) by default.
For example, to run a capture on interface ma1 for LLDP frames, the following command would be used:
7150S#tcpdump interface Management1 filter ether proto 0x88cc
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ma1, link-type EN10MB (Ethernet), capture size 65535 bytes
11:33:47.750573 00:1c:73:00:44:d5 (oui Arista Networks) > 01:80:c2:00:00:0e (oui Unknown), ethertype LLDP (0x88cc), length 187: LLDP, length 173: s7151.lab.local
Note: The full interface name (including case) must be used to set the source. The filter argument is a capture filter, so display-filter expressions will not be accepted.
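For scripted captures it can help to assemble the command string in one place. The sketch below is a hypothetical Python helper that only uses the `interface` and `filter` keywords shown in the example above. It does not validate the filter expression and it is not part of EOS.

```python
from typing import Optional


def eos_tcpdump_command(interface: str, capture_filter: Optional[str] = None) -> str:
    """Build an EOS CLI tcpdump command from a full interface name and an optional capture filter."""
    if not interface:
        raise ValueError("a full interface name such as 'Management1' is required")
    command = f"tcpdump interface {interface}"
    if capture_filter:
        # The expression must be a capture filter, as noted above.
        command += f" filter {capture_filter}"
    return command


print(eos_tcpdump_command("Management1", "ether proto 0x88cc"))
```

The resulting string matches the LLDP example and could be passed to whatever mechanism you already use to send commands to the switch.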
3.2.2) Running TCPDump from Bash
To capture control-plane traffic on an interface from Bash, first find the Linux name for the interface (note that L2, L3 and Management interfaces are listed individually):
switch#bash ifconfig
et1       Link encap:Ethernet  HWaddr 00:1C:73:00:44:D6
          UP BROADCAST MULTICAST  MTU:9214  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

et2       Link encap:Ethernet  HWaddr 00:1C:73:00:44:D6
          UP BROADCAST MULTICAST  MTU:9214  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
[...]
ma1       Link encap:Ethernet  HWaddr 00:1C:73:00:44:D5
          inet addr:192.168.1.202  Bcast:255.255.255.255  Mask:255.255.252.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2658 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1579 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:394599 (385.3 KiB)  TX bytes:322439 (314.8 KiB)
          Interrupt:21
Next, run the utility, passing the required interface and, optionally, a standard filter along with any other advanced arguments:
switch#bash tcpdump -i et11 stp
tcpdump: WARNING: et11: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on et11, link-type EN10MB (Ethernet), capture size 65535 bytes
11:55:39.244615 00:1c:73:00:44:e1 (oui Arista Networks) > 01:80:c2:00:00:00 (oui Unknown), 802.3, length 119: LLC, dsap STP (0x42) Individual, ssap STP (0x42) Command, ctrl 0x03: STP 802.1s, Rapid STP, CIST Flags [Proposal, Agreement], length 102
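The name lookup in the first step can be approximated in a script. This Python sketch assumes the prefix mapping implied by the outputs in this section (Management1 is listened to as ma1, and Ethernet ports appear as etN). Any other interface type is rejected rather than guessed, and the result should still be confirmed with `bash ifconfig`.

```python
import re

# Assumed mapping, inferred from the example outputs in this article only.
_KERNEL_PREFIX = {"Ethernet": "et", "Management": "ma"}


def linux_ifname(eos_name: str) -> str:
    """Translate a simple EOS interface name (e.g. 'Ethernet11') to its likely kernel name."""
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", eos_name)
    if not match or match.group(1) not in _KERNEL_PREFIX:
        raise ValueError(f"no known mapping for {eos_name!r}; check 'bash ifconfig'")
    return _KERNEL_PREFIX[match.group(1)] + match.group(2)


print(linux_ifname("Ethernet11"))
print(linux_ifname("Management1"))
```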
3.3) Tracing Processes with EOS
EOS provides operators with extensive troubleshooting tools to help debug control plane and protocol layer interactions through built-in tracing, optionally delivering live debug output to the CLI. To configure tracing, first review the available agent processes:
switch#show agent names
Aaa
Acl
Adt7462
Adt7483
Adt7483-system
AgentMonitor
AltaLanz
Arp
Bfd
Capi
Cdp
[...]
Having selected an agent to trace, review the available trace facilities for that process:
switch#show trace Rib | grep Ospf*
Trace facility settings for agent Rib is
-----------------------------------------------
Rib::Ospf             enabled ............
Rib::Ospf3            enabled ............
By default, all logging generated by the tracing facilities is sent to the log file of the agent being traced (/var/log/agents/<AgentName>-<ProcessID>), for example /var/log/agents/Rib-1527. The agent log files incorporate an automatic log rotation function, which protects against excessive consumption of memory. This is the recommended way to execute tracing functions from 4.11.1 onwards.
If, however, you want to keep the tracing output and agent logs separate, you can nominate a temporary file to store the tracing output (on a per-agent basis, in /tmp). This file is not rotated automatically, making it useful for extended tracing that would otherwise fill the agent log.
switch(config)#trace Rib filename OSPF.trace
The above file is stored in RAM, so it will not persist following a reload. If the output contains data that should be referred back to later, copy it either to flash or to an external TFTP/FTP/SCP server. It is also advisable to delete the original copy from memory.
switch#bash cp /tmp/OSPF.trace /mnt/flash/OSPF.trace
switch#bash rm /tmp/OSPF.trace
NOTE: If tracing to a nominated location, make sure to disable all traces once tracing has been completed. Otherwise they will continue to log to the nominated file and continue to consume memory.
Finally, enable tracing for each required facility (or * for all facilities) and select the levels. For common troubleshooting purposes, the first three or four levels should suffice (e.g. 0 to 3). For very deep detail, you may choose “all”:
switch(config)#trace rib enable Rib::Ospf1::Hello levels 0 1 2 3
Once tracing is active, either run ‘trace monitor’ to output live process trace information to the CLI or, for larger captures, use ‘bash more /var/log/agents/<agent>-<pid>’ or ‘bash more /tmp/<selected filename>’. The latter approach lets you apply Linux filters to the output file.
switch#bash tail -n 30 /var/log/agents/Rib-2001
21:22:50.548829 OSPF RECV: 10.0.0.2 -> 126.96.36.199: Version 2, Type Hello (1), Length 44 ret 0
21:22:50.548907 Router ID 10.0.0.2, Area 0.0.0.0, Authentication <None> (0)
21:22:50.548933 Authentication data: 00000000 00000000
21:22:50.548960 Mask 255.255.255.128, Options <E> (2), Priority 1, Neighbours 0
21:22:50.548985 Intervals: Hello 10s, Dead Router 40s, Designated Router 10.0.0.2, Backup 0.0.0.0
21:22:50.549195 OSPF: invalid HELLO packet from 10.0.0.2: Invalid Mask (9)
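In a long trace file, the lines of interest are usually the ones reporting a rejected packet, such as the last line above. The following Python sketch pulls those out. The regular expression is modelled only on the single "invalid HELLO" line shown here, so treat the pattern as an assumption that may need adjusting for other message formats.

```python
import re
from typing import Iterable, List, Tuple

# Pattern derived from the one sample line in this article; other trace
# messages may be worded differently.
_INVALID = re.compile(r"^(\S+) OSPF: invalid (\S+) packet from (\S+): (.+)$")


def find_invalid_ospf(lines: Iterable[str]) -> List[Tuple[str, ...]]:
    """Return (timestamp, packet type, neighbor, reason) for each rejected OSPF packet."""
    hits = []
    for line in lines:
        match = _INVALID.match(line.strip())
        if match:
            hits.append(match.groups())
    return hits


trace = [
    "21:22:50.548960 Mask 255.255.255.128, Options <E> (2), Priority 1, Neighbours 0",
    "21:22:50.549195 OSPF: invalid HELLO packet from 10.0.0.2: Invalid Mask (9)",
]
for timestamp, packet_type, neighbor, reason in find_invalid_ospf(trace):
    print(timestamp, packet_type, neighbor, reason)
```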
3.4) Log Collection
On occasion it may be necessary to collect the contents of the agent logs for TAC. The simplest way to group all the logs together onto the flash is:
switch#bash cat /var/log/agents/* > /mnt/flash/agents.log
switch#dir flash:agents.log
Directory of flash:/agents.log
       -rwx       79896           Mar 18 11:26  agents.log
1761558528 bytes total (248496128 bytes free)
Exactly as with regular CLI commands, shell commands may be added to aliases for easy repetition:
switch(config)#alias getlogs bash cat /var/log/agents/* > /mnt/flash/aliasagents.log
switch#getlogs
switch#dir flash:aliasagents.log
Directory of flash:/aliasagents.log
       -rwx       80372           Mar 18 11:28  aliasagents.log
1761558528 bytes total (248414208 bytes free)
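The same concatenation can be written in Python where a script is preferred over a shell one-liner. This is a minimal sketch of the `cat ... > file` step above and not the EOS Central script. The function name is hypothetical, and the paths in the usage comment are the ones used in this section.

```python
import glob
import os
import shutil


def collect_logs(source_dir: str, destination: str) -> int:
    """Concatenate every regular file in source_dir into destination; return the file count."""
    # The file list is built before the destination is opened, in sorted
    # order, similar to how the shell expands '*'.
    paths = sorted(
        path for path in glob.glob(os.path.join(source_dir, "*"))
        if os.path.isfile(path)
    )
    with open(destination, "wb") as out:
        for path in paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
    return len(paths)


# Example call on a switch (paths taken from the commands above):
# collect_logs("/var/log/agents", "/mnt/flash/agents.log")
```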
An example script for automating log collection can be found on EOS Central.
3.5) Show tech-support
For non-interactive capture that avoids the prompts to press a key to scroll down, either set “terminal length 0” (infinite) or use “show tech-support | no-more”.
3.6) Platform Specific (7150S) – Using TCPDump to Monitor Data-Plane Traffic
3.6.1) Configuring mirroring to the CPU
The Advanced Mirroring functionality on the 7150 series switches can mirror to the CPU data-plane traffic that would normally never cross the control plane, since it is forwarded in hardware. This traffic is exposed to the control plane through a mirror interface, which can be listened to by the software of your choice, for example TCPDump.
To enable mirroring to the CPU, you must specify cpu as the destination in your session; for example:
7150(config)#monitor session test-session source Et2
7150(config)#monitor session test-session destination cpu
The control plane is protected by CoPP, so a mirroring session that overloads the path towards the CPU only results in lost mirrored packets. To avoid losing visibility of the packets you care about, mirror only the interesting traffic by applying an ACL to your mirroring sessions.
7150S(config)#ip access-list ACL-MIRROR-TO-CPU
7150S(config-acl-ACL-MIRROR-TO-CPU)#permit tcp any 10.0.1.0/24 eq www ssh https
7150S(config)#monitor session MIRROR-CPU ip access-group ACL-MIRROR-TO-CPU
Verify your mirroring settings:
7150S#show monitor session
Session MIRROR-CPU
------------------------
Source Ports:
  Both: Et2(Acl:ACL-MIRROR-TO-CPU)   ← Mirror ACL granularity is per source interface
Destination Ports:
  Cpu : active (mirror0)   ← inf mirror X (where X=[0-3]) can be used in kernel bash
  Et1 : active   ← CPU as a destination can coexist along Eth or Po destinations
ip access-group: ACL-MIRROR-TO-CPU
You may capture traffic directly from the EOS CLI or from the kernel Bash shell. The following examples use TCPDump, but from the kernel you could run potentially any application of your choice.
3.6.2) Running TCPDump for data-plane traffic natively in EOS
EOS TCPDump was detailed in the previous section. While it can be used for control-plane traffic on any interface, data-plane traffic is captured through the mirroring/monitor session:
7150S#tcpdump ?
  file           Set the output file
  filecount      Specify the number of output files
  filter         Set the filtering expression
  interface      Select an interface to monitor (default=fabric)
  max-file-size  Specify the maximum size of output file
  monitor        Select a monitor session
  packet-count   Limit number of packets to capture
  queue-monitor  Monitor queue length
  size           Set the maximum number of bytes to dump per packet
  verbose        Enable verbose mode
This TCPdump will run on the mirroring/monitor session previously configured. You may use auto-complete for the session name.
7150S#tcpdump monitor M?   ← contextual list of the configured session. Press TAB to auto-complete
  MIRROR-CPU  WORD
7150S#tcpdump monitor MIRROR-CPU
23:20:30.666829 00:50:56:99:fe:47 (oui Unknown) > 00:1c:73:85:bd:61 (oui Arista Networks), ethertype 802.1Q (0x8100), length 152: vlan 101, p 0, ethertype IPv4, 10.10.101.201.58504 > 10.10.200.101.4789: VXLAN, flags [I] (0x08), vni 10003
00:50:56:99:11:19 (oui Unknown) > 00:50:56:99:77:52 (oui Unknown), ethertype IPv4 (0x0800), length 98: 192.168.1.100 > 192.168.1.200: ICMP echo reply, id 45575, seq 6, length 64
The above TCPdump output presents traffic between host A and host B, not destined to the switch’s control-plane, purely forwarded in hardware by the network processor. The traffic was mirrored in hardware, forwarded towards the CPU, and exposed to the software.
This is an extremely fast and convenient way to troubleshoot.
Note: The amount of mirrored traffic from the data plane to the control plane is restricted by CoPP to 400Mb/s by default. This can be changed if required, with consideration for the potential load on internal links and the CPU. Refer to the CoPP configuration for more details. It is recommended to apply ACLs to filter interesting traffic.
3.6.3) Running TCPDump for data-plane traffic from Bash
To capture data-plane traffic from Bash, first determine which kernel interface exposes the mirrored traffic. It will be mirror0, mirror1, mirror2 or mirror3.
The command “show monitor session” provides this information:
7150S#show monitor session
Session MIRROR-CPU
------------------------
[...]
Destination Ports:
  Cpu : active (mirror0)
Next, run TCPDump listening on this interface:
7150S#bash tcpdump -i mirror0
tcpdump: WARNING: mirror0: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on mirror0, link-type EN10MB (Ethernet), capture size 65535 bytes
23:20:30.666829 00:50:56:99:fe:47 (oui Unknown) > 00:1c:73:85:bd:61 (oui Arista Networks), ethertype 802.1Q (0x8100), length 152: vlan 101, p 0, ethertype IPv4, 10.10.101.201.58504 > 10.10.200.101.4789: VXLAN, flags [I] (0x08), vni 10003
00:50:56:99:11:19 (oui Unknown) > 00:50:56:99:77:52 (oui Unknown), ethertype IPv4 (0x0800), length 98: 192.168.1.100 > 192.168.1.200: ICMP echo reply, id 45575, seq 6, length 64
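The interface lookup that starts this procedure can also be scripted. The Python sketch below extracts the mirror interface name from saved “show monitor session” text. The pattern is an assumption based solely on the "Cpu : active (mirror0)" line shown in this section, and the function name is hypothetical.

```python
import re
from typing import Optional


def cpu_mirror_interface(show_output: str) -> Optional[str]:
    """Return the kernel mirror interface (mirror0..mirror3) named in the output, if any."""
    match = re.search(r"Cpu\s*:\s*active\s*\((mirror[0-3])\)", show_output)
    return match.group(1) if match else None


output = """\
Session MIRROR-CPU
------------------------
Destination Ports:
  Cpu : active (mirror0)
"""
print(cpu_mirror_interface(output))
```

The returned name is what would be passed to `bash tcpdump -i` in the step above.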