Introduction
Arista Latency Analyzer (LANZ) is a technology that tracks and logs buffer congestion and latency in real time. The visibility LANZ provides into network hot spots and microburst oversubscription gives the network operator greater insight into when problems occur on the network and why. With LANZ you can determine when congestion happened, track its sources, and export real-time events to external applications. LANZ also shows the effect of packet buffering on an application and records packet drops during network congestion. It is an invaluable tool that allows proactive monitoring and visibility into a network, rather than the reactive approach of looking for dropped packets after slowness in an application or the overall network has been reported.
LANZ operates by setting threshold values on the interface and global buffer pools and then generating records for the start and end of events that cause those thresholds to be exceeded. Update records are also generated when buffer use exceeds a threshold for a prolonged period of time. Those records can then be viewed through a series of show commands on the CLI, sent to syslog, and/or streamed off the switch encoded in Google Protocol Buffers (GPB) format.
This article explains how to enable LANZ on Arista switches and highlights the differences in LANZ functionality across platforms.
1) Enabling Latency Analyzer
LANZ can be enabled on the switch with a single command:
# Enable LANZ globally
switch(config)#queue-monitor length
# Disable LANZ for interface Ethernet 1
switch(config-if-Et1)#no queue-monitor length
LANZ can be enabled for the global buffer on the 7150S switches with the following command:
# Enable LANZ for the global buffer
7150S(config)#queue-monitor length global-buffer
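For switches managed programmatically, the same configuration can also be pushed through Arista eAPI. The Python sketch below is only an illustration: it assumes eAPI has already been enabled on the switch (management api http-commands), that the jsonrpclib library is installed, and it uses placeholder credentials and hostname.

# Enabling LANZ via eAPI (Python, illustrative)
from jsonrpclib import Server

# Placeholder credentials and hostname; eAPI must already be enabled on the switch
switch = Server("https://admin:admin@switch.example.com/command-api")

# Mirror the CLI examples above: enable LANZ globally,
# then disable it on interface Ethernet1
switch.runCmds(1, ["enable",
                   "configure",
                   "queue-monitor length",
                   "interface Ethernet1",
                   "no queue-monitor length"])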
The architectural differences between the 7150S line of switches and the 7500E/7280SE provide slightly different visibility. On the 7150S, both a high and a low threshold can be configured, as shown in the next section. The 7150S is a shared-memory switch, meaning there is a single pool of memory allocated across all interfaces to provide packet buffering. During the serialization of packets, or when multiple interfaces receive traffic and attempt to send it to the same egress port, queuing begins to occur on that egress interface. Please see the diagram below.
The 7500E and 7280SE both utilize Virtual Output Queuing (VOQ). VOQ uses input side queuing, where a virtual queue exists for every egress port, to effectively eliminate Head of Line Blocking (HOLB) on egress. This allows for packets to be queued at the ingress port and requires LANZ to monitor buffer depth at the ingress port as opposed to the egress port as seen in the diagram below:
2) Setting LANZ Thresholds
The 7150S provides visibility into both the individual interface buffers and the global buffer. Packets buffered on the 7150S are held in fixed 160-byte segments; LANZ reports global buffer usage in these 160-byte segments and per-interface queue lengths in 480-byte segments.
# Update thresholds for the global buffer
7150S(config)#queue-monitor length global-buffer thresholds 1000 500
# Update thresholds for the interface buffers
7150S(config-if-Et1)#queue-monitor length thresholds 1000 500
7150S(config-if-Et2)#queue-monitor length thresholds 300 100
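Since the 7150S expresses per-interface thresholds in 480-byte segments and global-buffer thresholds in 160-byte segments, it can be useful to translate the configured values into bytes. The short Python sketch below does just that for the thresholds configured above; the constant and function names are illustrative only.

# Converting LANZ segment thresholds to bytes (Python, illustrative)
# Segment sizes reported by "show queue-monitor length status" on the 7150S
INTERFACE_SEGMENT_BYTES = 480   # per-interface queue length unit
GLOBAL_SEGMENT_BYTES = 160      # global buffer unit

def segments_to_bytes(segments, segment_size):
    """Convert a LANZ threshold expressed in segments to bytes."""
    return segments * segment_size

# Interface Et1 thresholds configured above: high=1000, low=500 segments
print(segments_to_bytes(1000, INTERFACE_SEGMENT_BYTES))  # 480000 bytes high
print(segments_to_bytes(500, INTERFACE_SEGMENT_BYTES))   # 240000 bytes low

# Global buffer thresholds configured above: high=1000, low=500 segments
print(segments_to_bytes(1000, GLOBAL_SEGMENT_BYTES))     # 160000 bytes high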
For a deeper understanding of how to fine-tune thresholds, see the EOS Central article LANZ Tuning.
The 7500E and 7280SE provide visibility into individual interface buffers only. The packets buffered on these interface queues are measured in bytes.
# Update thresholds for the interface buffers
7280SE(config-if-Et1)#queue-monitor length threshold 1000
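Because these platforms express the threshold directly in bytes, one way to pick a value is to work backwards from a latency budget: a queue of B bytes takes 8B/R seconds to drain on a link of R bits per second. The Python sketch below only illustrates that arithmetic; the 50 microsecond budget and 10 Gbps port speed are assumed example values, not recommendations.

# Deriving a byte threshold from a latency budget (Python, illustrative)
def threshold_bytes(latency_budget_us, link_speed_gbps):
    """Queue depth in bytes that the link drains in the given time budget."""
    link_speed_bps = link_speed_gbps * 1e9
    return int((latency_budget_us / 1e6) * link_speed_bps / 8)

# Example: flag congestion once queuing delay on a 10G port exceeds ~50 us
print(threshold_bytes(50, 10))   # 62500 bytes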
3) Viewing LANZ Output
All platforms support the ability to see whether LANZ is enabled or disabled, the current threshold levels, and other pertinent information for the device-specific LANZ configuration. In the output below, interfaces Et1 and Et2 show the adjusted thresholds from the commands above, while the remaining interfaces are set to default values.
# Viewing queue thresholds (7150S)
7150S#show queue-monitor length status
queue-monitor length enabled
queue-monitor length packet sampling is enabled
queue-monitor length update interval in micro seconds: 5000000
Mirror destination interface is Cpu

Global Buffer Monitoring
------------------------
Global buffer monitoring is enabled
Segment size in bytes : 160
Total buffers in segments : 36864
High threshold : 14415
Low threshold : 5766

Per-Interface Queue Length Monitoring
-------------------------------------
Queue length monitoring is enabled
Segment size in bytes : 480
Maximum queue length in segments : 4806
Port thresholds in segments:
Port        High threshold   Low threshold   Mirroring Enabled
Cpu                  11792           11792   True
Et1                   1000             500   True
Et2                    300             100   True
Et3                    512             256   True
Et4                    512             256   True
Et5                    512             256   True
Et6                    512             256   True
Et7                    512             256   True
-----truncated-----
# Viewing queue thresholds (7280SE/7500E)
7280SE(config-if-Et1)#show queue-monitor length status
queue-monitor length enabled
queue-monitor length packet sampling is disabled

Per-Interface Queue Length Monitoring
-------------------------------------
Queue length monitoring is enabled
Maximum queue length in bytes : 52428800
Port threshold in bytes:
Port        High threshold   Mirroring Enabled
Et1                   1000   False
Et2                5242880   False
Et3                5242880   False
Et4                5242880   False
Et5                5242880   False
Et6                5242880   False
Et7                5242880   False
Et8                5242880   False
Et9                5242880   False
Et10               5242880   False
Et11               5242880   False
Et12               5242880   False
-----truncated-----
All platforms also support the ability to show LANZ events through the CLI or syslog. By default, LANZ does not log events to syslog; logging must be enabled by configuring a time interval between syslog entries.
# Viewing LANZ events through the CLI (7150S)
7150S#show queue-monitor length
Report generated at 2015-03-10 22:57:04
E-End, U-Update, S-Start, TC-Traffic Class
GH-High, GU-Update, GL-Low
Segment size for E, U and S congestion records is 480 bytes
Segment size for GL, GU and GH congestion records is 160 bytes
* Max queue length during period of congestion
+ Period of congestion exceeded counter
--------------------------------------------------------------------------------
Type   Time                  Intf     Congestion   Queue        Time of Max
                             (TC)     duration     length       Queue length
                                      (usecs)      (segments)   relative to
                                                                congestion
                                                                start (usecs)
--------------------------------------------------------------------------------
E      0:00:03.48675 ago     Et1(1)   29           2*           0
S      0:00:03.48678 ago     Et1(1)   N/A          2            N/A
E      0:00:03.49949 ago     Et1(1)   29           2*           0
S      0:00:03.49952 ago     Et1(1)   N/A          2            N/A
E      0:00:03.50384 ago     Et1(1)   29           2*           0
S      0:00:03.50387 ago     Et1(1)   N/A          2            N/A
E      0:00:03.50826 ago     Et1(1)   29           2*           0
S      0:00:03.50829 ago     Et1(1)   N/A          2            N/A
E      0:00:03.51763 ago     Et1(1)   29           2*           0
S      0:00:03.51766 ago     Et1(1)   N/A          2            N/A
E      0:00:03.53011 ago     Et1(1)   29           2*           0
S      0:00:03.53014 ago     Et1(1)   N/A          2            N/A
-----truncated-----
# Viewing LANZ events through the CLI (7280SE/7500E)
7280SE#show queue-monitor length
Report generated at 2015-03-10 22:11:08
Time                  Interface   Queue      Duration   Traffic   Ingress
                                  Length                Class     Port-set
                                  (bytes)    (secs)
------------------------------------------------------------------------------------
0:03:37.06666 ago     Et50/1      272        1          7         Et25 -Et50/4
0:04:08.06666 ago     Et50/1      272        1          7         Et25 -Et50/4
0:04:37.06666 ago     Et50/1      272        1          7         Et25 -Et50/4
0:05:07.06666 ago     Et50/1      272        1          7         Et25 -Et50/4
0:05:38.06666 ago     Et50/1      272        1          7         Et25 -Et50/4
0:06:07.06666 ago     Et50/1      272        1          7         Et25 -Et50/4
0:06:37.06666 ago     Et50/1      272        1          7         Et25 -Et50/4
0:07:08.06666 ago     Et50/1      272        1          7         Et25 -Et50/4
0:07:37.06666 ago     Et50/1      272        1          7         Et25 -Et50/4
0:08:07.06666 ago     Et50/1      272        1          7         Et25 -Et50/4
0:08:38.06666 ago     Et50/1      272        1          7         Et25 -Et50/4
0:09:07.06666 ago     Et50/1      272        1          7         Et25 -Et50/4
0:09:37.06666 ago     Et50/1      272        1          7         Et25 -Et50/4
0:10:08.06666 ago     Et50/1      272        1          7         Et25 -Et50/4
0:10:37.06666 ago     Et50/1      272        1          7         Et25 -Et50/4
0:11:07.06666 ago     Et50/1      272        1          7         Et25 -Et50/4
-----truncated-----
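When these records are collected off-box (for example over eAPI in text mode), a quick way to summarize them is to count events per interface. The Python sketch below assumes the text of show queue-monitor length is already in a string and relies on the 7280SE/7500E column layout shown above (relative timestamp, the word "ago", then the interface); it is a rough illustration rather than a full parser.

# Summarizing congestion records per interface (Python, illustrative)
from collections import Counter

def congestion_events_per_interface(report_text):
    """Count congestion records per interface in 'show queue-monitor length'
    text output (7280SE/7500E layout: relative time, 'ago', interface, ...)."""
    counts = Counter()
    for line in report_text.splitlines():
        fields = line.split()
        # Data rows start with a relative timestamp followed by the word 'ago'
        if len(fields) >= 3 and fields[1] == "ago":
            counts[fields[2]] += 1
    return counts

sample = """0:03:37.06666 ago  Et50/1  272  1  7  Et25 -Et50/4
0:04:08.06666 ago  Et50/1  272  1  7  Et25 -Et50/4"""
print(congestion_events_per_interface(sample))   # Counter({'Et50/1': 2})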
# Viewing LANZ events in syslog
switch(config)#queue-monitor length log 300
switch(config-if-Et2)#show log | grep threshold
Oct 27 12:48:22 switch QUEUE_MONITOR-6-LENGTH_OVER_THRESHOLD: Interface Ethernet1 queue length is over threshold of 512, current length is 1024
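If these syslog messages are forwarded to a collector, they are straightforward to parse. The Python sketch below is based only on the message format shown above; the pattern and field names are illustrative.

# Parsing LANZ over-threshold syslog messages (Python, illustrative)
import re

# Pattern follows the LENGTH_OVER_THRESHOLD message format shown above
PATTERN = re.compile(
    r"QUEUE_MONITOR-6-LENGTH_OVER_THRESHOLD: "
    r"Interface (?P<intf>\S+) queue length is over threshold of "
    r"(?P<threshold>\d+), current length is (?P<length>\d+)"
)

def parse_lanz_syslog(line):
    """Return (interface, threshold, current_length) or None if no match."""
    m = PATTERN.search(line)
    if not m:
        return None
    return m.group("intf"), int(m.group("threshold")), int(m.group("length"))

line = ("Oct 27 12:48:22 switch QUEUE_MONITOR-6-LENGTH_OVER_THRESHOLD: "
        "Interface Ethernet1 queue length is over threshold of 512, "
        "current length is 1024")
print(parse_lanz_syslog(line))   # ('Ethernet1', 512, 1024)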
The 7150 platform provides the additional capabilities of viewing queue drops, high-threshold statistics, and the additional latency introduced by queue depth. You can also generate a CSV report listing the most recent 100,000 events.
# Viewing more detailed LANZ events
7150S(config)#show queue-monitor length ?
  Ethernet        Ethernet interface
  all             Display all the congestion records
  cpu             Cpu port(s)
  csv             CSV format, with oldest samples first
  drops           Queue drops information
  global-buffer   Display buffer usage
  limit           Limit samples displayed
  statistics      high threshold counts
  status          Display status
  tx-latency      Display queue tx-delay
  >               Redirect output to URL
  >>              Append redirected output to URL
  |               Output modifiers
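To pull that CSV report off the switch, one option is Arista eAPI, as in the earlier configuration sketch. The example below again assumes eAPI is enabled and the jsonrpclib library is installed, with placeholder credentials and hostname; it runs show queue-monitor length csv in text mode and writes the raw output to a local file.

# Collecting the LANZ CSV report via eAPI (Python, illustrative)
from jsonrpclib import Server

# Placeholder credentials and hostname; eAPI must be enabled on the switch
switch = Server("https://admin:admin@switch.example.com/command-api")

# Run the CSV report in text mode and save the raw output locally
response = switch.runCmds(1, ["show queue-monitor length csv"], "text")
csv_text = response[0]["output"]

with open("lanz_events.csv", "w") as f:
    f.write(csv_text)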
Additionally, the 7150 platform provides the ability to stream LANZ records to external devices via Google Protocol Buffers (GPB). The commands below configure the switch to listen on port 50001 for any GPB client that connects to receive the records.
# Enabling LANZ Streaming
7150S(config)#queue-monitor streaming
7150S(config-qm-streaming)#no shutdown
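As a starting point for a collector, the Python sketch below simply connects to the streaming port and reads the raw byte stream. Decoding the records requires classes generated from Arista's published LANZ .proto definition, which is beyond the scope of this article; the hostname and the assumption that the stream is carried over TCP on port 50001 are illustrative.

# Minimal LANZ streaming client (Python, illustrative)
import socket

LANZ_HOST = "switch.example.com"   # placeholder hostname
LANZ_PORT = 50001                  # streaming port referenced above (assumed TCP)

# Connect to the switch and read the raw GPB record stream.
# The records themselves must be decoded with classes generated from
# Arista's LANZ .proto file (not shown here).
sock = socket.create_connection((LANZ_HOST, LANZ_PORT))
try:
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            break
        print("received %d bytes of LANZ stream data" % len(chunk))
finally:
    sock.close()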
4) LANZ Traffic Sampling
Additionally, the 7150 platform can be configured to automatically send traffic experiencing congestion to either the CPU or an egress interface once a queue threshold has been crossed.
# Enable LANZ mirroring
7150S(config)#queue-monitor length mirror
# Configure mirror destination
7150S(config)#queue-monitor length mirror destination ?
  Cpu        Cpu port(s)
  Ethernet   Ethernet interface
This can be useful for exporting the congested traffic to a packet capture device or another analysis tool, or for sending it directly to the CPU of the switch for immediate inspection. To inspect the traffic on the switch itself, use the following command:
7150S(config)#tcpdump queue-monitor
Alternatively, you can view the same output from the bash shell:
7150S(config)#bash tcpdump -i lanz
The output below was generated using basic ping traffic, but it shows how this functionality can be used to obtain detailed visibility into buffered traffic on the switch itself or to send that traffic off to another capture device.
7150S(config)#tcpdump queue-monitor
tcpdump: WARNING: lanz: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lanz, link-type EN10MB (Ethernet), capture size 65535 bytes
23:01:17.794281 00:1c:73:00:44:d6 > 00:1c:73:74:32:7f, ethertype 802.1Q (0x8100), length 1138: vlan 1006, p 0, ethertype IPv4, 5.0.0.1 > 5.0.0.2: ip-proto-1
23:01:17.991120 00:1c:73:00:44:d6 > 00:1c:73:74:32:7f, ethertype 802.1Q (0x8100), length 1138: vlan 1006, p 0, ethertype IPv4, 5.0.0.1 > 5.0.0.2: ip-proto-1
23:01:18.091730 00:1c:73:00:44:d6 > 00:1c:73:74:32:7f, ethertype 802.1Q (0x8100), length 1138: vlan 1006, p 0, ethertype IPv4, 5.0.0.1 > 5.0.0.2: ip-proto-1
23:01:18.599131 00:1c:73:00:44:d6 > 00:1c:73:74:32:7f, ethertype 802.1Q (0x8100), length 1138: vlan 1006, p 0, ethertype IPv4, 5.0.0.1 > 5.0.0.2: ip-proto-1
23:01:18.838424 00:1c:73:00:44:d6 > 00:1c:73:74:32:7f, ethertype 802.1Q (0x8100), length 1138: vlan 1006, p 0, ethertype IPv4, 5.0.0.1 > 5.0.0.2: ip-proto-1
23:01:19.745172 00:1c:73:00:44:d6 > 00:1c:73:74:32:7f, ethertype 802.1Q (0x8100), length 1138: vlan 1006, p 0, ethertype IPv4, 5.0.0.1 > 5.0.0.2: ip-proto-1
23:01:19.792002 00:1c:73:00:44:d6 > 00:1c:73:74:32:7f, ethertype 802.1Q (0x8100), length 1138: vlan 1006, p 0, ethertype IPv4, 5.0.0.1 > 5.0.0.2: ip-proto-1
23:01:19.906370 00:1c:73:00:44:d6 > 00:1c:73:74:32:7f, ethertype 802.1Q (0x8100), length 1138: vlan 1006, p 0, ethertype IPv4, 5.0.0.1 > 5.0.0.2: ip-proto-1
-----truncated-----
5) LANZ lite (7500 and 7048T)
A lightweight LANZ capability is also available on first-generation 7500 modular and 7048 fixed-form switches. The granularity of event polling is limited to a single event per second, and, just as on the 7500E/7280SE switches, only a single threshold is configurable, with queue length measured in bytes.
The configuration for LANZ is identical to that of the other devices.
# Enable LANZ globally
7048(config)#queue-monitor length
# Update thresholds for the interface buffers
7048(config-if-Et1)#queue-monitor length threshold 1000
Due to limited hardware support on these platforms, it is not possible to monitor congestion events on all queues simultaneously as on the other systems, and because of the less frequent polling cycle only the largest congestion events are captured. Even so, this functionality still adds significant visibility into the network and its congestion events.