Troubleshooting congestion – Investigating and taking corrective steps

1) Introduction
Congestion might not be obvious: it can be discovered reactively in disastrous situations, or proactively by collecting statistics from equipment and investigating the symptoms shown by applications and systems. Deep buffers on switches are a blanket, effortless solution to the problem, but they might not be materially possible or justifiable everywhere on a network. This document discusses design considerations in case of congestion.

2) Measuring
The first step (which might seem obvious) towards understanding potential issues is to translate symptoms such as slow, unresponsive, or poor performance into measurable and baselined metrics...
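As a rough illustration of turning a symptom such as "slow" into a baselined metric, the Python sketch below compares a new measurement against the mean and standard deviation of earlier samples. The latency values, the three-standard-deviation rule, and the function names are assumptions made for this example; they do not come from the article.

```python
from statistics import mean, stdev


def baseline(samples):
    """Return (mean, standard deviation) of the historical samples."""
    return mean(samples), stdev(samples)


def is_anomalous(value, samples, k=3.0):
    """Flag a value more than k standard deviations above the baseline mean."""
    mu, sigma = baseline(samples)
    return value > mu + k * sigma


# Hypothetical application latency samples in milliseconds (illustrative only).
history_ms = [10.0, 11.0, 9.0, 10.0, 12.0, 10.0, 11.0, 9.0]

print(is_anomalous(10.5, history_ms))  # close to the baseline
print(is_anomalous(25.0, history_ms))  # far above the baseline
```

The same comparison could be applied to any counter collected from the equipment, provided a baseline was recorded while the network was behaving normally.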

VXLAN Without Controller for Network Virtualization with Arista physical VTEPs

1) Introduction
This article assumes an understanding of VXLAN concepts. It aims to guide the design and implementation of network virtualization with VXLAN, employing physical VTEPs. This controller-less design provides Layer2 communication across a Layer3 network for any Layer2 Ethernet device. This solution guide addresses network virtualization for network teams that might not yet have a network virtualization controller or cloud management platform (CMP), but want to benefit now from all the advantages of VXLAN. Without a network controller, the virtual switches will not participate natively in the VXLAN overlay setup; they would be configured the traditional way...

Hint – Naming ACLs for easier contextual help and auto-complete

You might like to name your ACLs with a prefix “ACL-” or similar, so that when you type a question mark (‘?’) or TAB for auto-complete, you automatically get the ACL names without having to remember them (often a cause of typos).

Example:

Arista(config)#show ip access-lists ?     <==== asking for ACL name <WORD>; not listing all the ACLs by default as there could be too many
  WORD     Access-list name
  summary  Access list summary
  >        Redirect output to URL
  >>       Append redirected output to URL
  |        Output modifiers
  <cr>
Arista(config)#show ip access-lists ACL?  <==== the contextual help now lists all the ACL...

Understanding Deduplication in Tap Aggregation (NPB)

1) What is deduplication?
Deduplication in the context of packet broker networks (Tap Aggregation) is the ability to detect duplicates of a packet, allowing only the first packet and dropping the other iterations of the same packet.

2) Hardware impacts the Deduplication performance
Deduplication, like many features, requires certain hardware characteristics to be supported by the silicon (network processor), which is the foundation of hardware packet processing and forwarding in networking/Ethernet equipment. The silicon allows matching packets, manipulating them, and making forwarding decisions in hardware.

2.1) Processing performance
The Arista switches are based on high performance network processors of different...
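The behaviour described in section 1 (forward the first copy, drop later copies) can be modelled in software as follows. This is only a conceptual sketch: the switches perform deduplication in silicon, and the SHA-256 digest, the time window, and the class name used here are assumptions chosen for illustration, not a description of the hardware.

```python
import hashlib


class Deduplicator:
    """Forward the first copy of a packet and drop copies seen within a time window."""

    def __init__(self, window_s=0.001):
        self.window_s = window_s  # hypothetical window, in seconds
        self._seen = {}           # packet digest -> time the first copy was seen

    def accept(self, packet: bytes, now: float) -> bool:
        """Return True if the packet should be forwarded, False if it is a duplicate."""
        # Forget digests that are older than the window.
        expired = [d for d, t in self._seen.items() if now - t > self.window_s]
        for digest in expired:
            del self._seen[digest]

        digest = hashlib.sha256(packet).digest()
        if digest in self._seen:
            return False          # duplicate inside the window: drop
        self._seen[digest] = now  # first copy: remember it and forward
        return True


dedup = Deduplicator(window_s=0.001)
print(dedup.accept(b"packet-A", now=0.0000))  # first copy
print(dedup.accept(b"packet-A", now=0.0005))  # duplicate inside the window
print(dedup.accept(b"packet-A", now=0.0100))  # same bytes, but outside the window
```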

Tap solutions for Arista Tap Aggregation – Network Packet Broker

Arista Tap Aggregators are agnostic to the taps capturing the light signal, although the optical budget should remain a careful consideration, as with any optical media. Below is a selection of tap vendors deployed by our customer base, in alphabetical order. Feel free to post a comment with your own favourite tap supplier if it is not listed here.

CableXpress: http://www.cablexpress.com/solutions/port-replication/
Comcraft – ProfiTAP: http://www.profitap.com/fiber-taps/
Corning Cable Systems – Pretium EDGE Tap module: http://catalog.corning.com/opcomm/en-US/catalog/MasterProduct.aspx?cid=pretium_EDGE_AO_module_web&pid=114264
Enlight Data: http://www.enlightdata.com/products.html
Garland Technology: http://www.garlandtechnology.com/products/network-taps
M2 Optics: http://www.m2optics.com/products/network-taps
Mimetrix: http://www.mimetrix.com/optical-taps.php
Tapics: http://www.tapics.us

Tip for Arista vEOS on VMware ESX 6

Note: This tip was discovered and shared by Sandy Breeze at Claranet.

Arista provides the EOS network operating system for test/lab virtual environments in the form of vEOS, either as a VMDK or as a SWI (software image to install on an existing vEOS). With the vEOS VMDK as currently provided, thin-provisioned to reduce the file size, ESX4 and 5 work fine, but upon booting the vEOS VM under ESX6, it reports “LZMA data is corrupt” and “system halted”, despite the image not being corrupted (you can verify the checksum). This issue may also manifest itself with an...

7150S NAT – Practical Guide – Source NAT – Dynamic

Introduction
This article presents Dynamic Source NAT, as part of a series of articles about Source NAT on the Arista 7150S with practical examples. It assumes an understanding of NAT and Source NAT. See the article Static Source NAT as the foundation for the present Dynamic Source NAT article.

The following topics are covered in this article: Dynamic Source NAT with Pool; Dynamic Source NAT Overload.

The following additional topics are covered in other articles: Static Source NAT; Source NAT – Baseline; Static Source NAT – Unicast and multicast with routed ports; Static Source NAT – with SVI; Static Source NAT + ACL...

MTP12 Cheat Sheet for QSFP 40G SR4 Optical Cabling

1) Overview
This document explains the optical connectivity involved in 40G optical QSFP for short reach (40GBASE-SR4) on multimode fibres. The standard specifies MPO12 (or MTP12) as the connector for the SR4 QSFP. This connector traditionally carries 12 fibres, but 40G only needs 8 (4 pairs) to carry the 4 parallel bidirectional paths. You might know that QSFPs can be programmed to operate as 4 x 10G.

2) QSFP to QSFP light path on MTP12 cables
Notice in the below QSFP 40G SR4 transceiver that the connector is not LC but an MPO/MTP12 receptacle. You may also notice the...
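To make the "8 of 12 fibres" point concrete, the Python sketch below maps the fibre positions of an MTP12 connector from one QSFP to the other. It rests on two assumptions that are not stated in the excerpt above and should be checked against the transceiver and cable datasheets: that positions 1-4 transmit, positions 9-12 receive and positions 5-8 are unused, and that the trunk is a Type-B cable that reverses positions end to end (position p arrives at position 13 - p).

```python
FIBRE_COUNT = 12
TX_POSITIONS = (1, 2, 3, 4)     # assumed transmit positions
UNUSED_POSITIONS = (5, 6, 7, 8)  # assumed dark fibres
RX_POSITIONS = (9, 10, 11, 12)  # assumed receive positions


def type_b_far_end(position):
    """Return the far-end position of a fibre on an assumed Type-B (reversing) trunk."""
    if not 1 <= position <= FIBRE_COUNT:
        raise ValueError("position must be between 1 and 12")
    return FIBRE_COUNT + 1 - position


# Each transmit fibre of the local QSFP should land on a receive position remotely.
light_paths = {tx: type_b_far_end(tx) for tx in TX_POSITIONS}

for tx, rx in light_paths.items():
    print(f"local Tx position {tx} -> remote Rx position {rx}")

print("fibres in use:", len(TX_POSITIONS) + len(RX_POSITIONS), "of", FIBRE_COUNT)
```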

LANZ – Tuning packet buffer monitoring thresholds – Gain the most adequate visibility to you

This article introduces LANZ briefly, and then concentrates on explaining how you may want to tune the thresholds. Threshold tuning allows you to have the right level of visibility for your environment.

1) LANZ Introduction
LANZ on the Arista 7150S and other platforms provides trigger-based micro-burst visibility. This guarantees capturing congestion events, even the shortest, in contrast to hit-and-miss polling mechanisms. For some other platform families whose hardware does not support trigger-based detection, the polling-based LANZ-lite alternative is available; it is still very useful, but simply not as accurate. Refer to the manual for LANZ differences.

LANZ generated...

sFlow Generation for Legacy Networks with Tap Aggregation (NPB / Matrix switch)

sFlow is a standard hardware sampling mechanism available on all the Arista platforms, providing rich statistical information on all ports. sFlow is available in Tap Aggregation mode, allowing use cases for Tap Aggregation beyond traffic analysis on analyzer tools: retro-fitting sFlow to legacy infrastructure; distributed analysis. This article focuses on retro-fitting sFlow to legacy infrastructure.

1) sFlow vs Netflow
sFlow is a sampling mechanism implemented in hardware: it is widely available on non-legacy platforms, and widely supported on collectors/monitoring software. sFlow requires minimal local processing, which contrasts with Netflow, which is very CPU-intensive, making Netflow poorly suited to any high performance...
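The idea behind packet sampling can be sketched in a few lines of Python: keep roughly one packet in N, then multiply the sampled counts by N to estimate the totals. This is a simplified software model, not the hardware implementation; the 1-in-100 rate, the traffic mix, and the function names are assumptions for illustration.

```python
import random


def sample_packets(packet_sizes, sampling_rate, seed=0):
    """Keep each packet with probability 1/sampling_rate (seeded for repeatability)."""
    rng = random.Random(seed)
    return [size for size in packet_sizes if rng.random() < 1.0 / sampling_rate]


def estimate_totals(samples, sampling_rate):
    """Scale the sampled packet and byte counts back up to estimated totals."""
    return len(samples) * sampling_rate, sum(samples) * sampling_rate


# Hypothetical traffic: 100,000 large packets and 100,000 small packets.
traffic = [1500] * 100_000 + [64] * 100_000
samples = sample_packets(traffic, sampling_rate=100)
est_packets, est_bytes = estimate_totals(samples, sampling_rate=100)

print(f"sampled {len(samples)} of {len(traffic)} packets")
print(f"estimated packets: {est_packets}, actual: {len(traffic)}")
print(f"estimated bytes:   {est_bytes}, actual: {sum(traffic)}")
```

The estimate carries a statistical error that shrinks as more samples are collected, which is why the device only has to export a small fraction of the traffic.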

DANZ – Tap Aggregation optics / transceivers selection

This article clarifies certain criteria that are important to consider in the design of a Network Packet Broker (NPB) aggregating traffic from various sources. For distance reasons, the main type of media used in tap aggregation is optical (multimode or single mode), so this article mainly focuses on these media.

1) Understanding Optical Budgets
Multiple factors contribute to the degradation of optical signals: fibre attenuation; insertion loss (e.g. connectors, patch panels and splices); fibre type mismatch (e.g. connecting 50/125MMF to 62.5/125MMF); over-bending of the fibre plant; intermediate passive devices (e.g. taps, attenuators or mode filters).

Media Type Approximate Loss...
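A loss budget is essentially a sum of the contributions listed above. The Python sketch below adds fibre attenuation, connector insertion loss, and the theoretical loss of a passive tap's split ratio, then compares the total with a power budget. Every numeric value (3.5 dB/km, 0.5 dB per connector, a 50/50 tap, a 7 dB budget) is a placeholder assumed for this example and must be replaced with figures from the relevant datasheets.

```python
import math


def split_loss_db(fraction):
    """Theoretical loss in dB for a tap leg receiving the given fraction of the light."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return -10 * math.log10(fraction)


def link_loss_db(length_km, fibre_db_per_km, connectors, connector_db, tap_fraction=None):
    """Sum fibre attenuation, connector insertion loss and optional tap split loss."""
    loss = length_km * fibre_db_per_km + connectors * connector_db
    if tap_fraction is not None:
        loss += split_loss_db(tap_fraction)
    return loss


# Placeholder values, for illustration only.
loss = link_loss_db(length_km=0.1, fibre_db_per_km=3.5,
                    connectors=2, connector_db=0.5, tap_fraction=0.5)
budget_db = 7.0  # hypothetical transceiver power budget

print(f"estimated loss: {loss:.2f} dB, remaining margin: {budget_db - loss:.2f} dB")
```

A real tap also adds some excess loss on top of the theoretical split loss, so the margin computed this way is an upper bound.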

DANZ Tap Aggregation – Basic settings – Before you start

Several Arista switches support the DANZ feature set for Tap Aggregation. The tap aggregation mode is a mere configuration (1-2 lines) that transforms a high performance L2/L3 switch into a Tap Aggregator (NPB). This mode requires certain considerations:

1) Tap aggregation – How to select the exclusive mode
The tap aggregation mode is exclusive to part of a switch or to the whole switch. Parts of the switch that are excluded from the Tap Aggregation mode can work either in full L2/L3 forwarding mode (normal switching mode), or in simple hub mode. The options available vary per platform, as per the below list....

Script example – Automating VXLAN deployments with EAPI

1) Introduction
This article briefly describes what is required to deploy overlay networks with VXLAN, but assumes a good understanding of the VXLAN fundamentals. To achieve such VXLAN deployments, multiple options exist, from simple but manual, to fully automated service chaining (orchestration) at the cost of also having to set up a Cloud Management Platform or a network virtualization controller. This article focuses on an easy option that strikes a good balance between simplicity of operation (automation) and simplicity of setup (script ready to go).

2) Working towards automation: it is an evolution
This article is not providing...
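For readers unfamiliar with eAPI, the Python sketch below builds the kind of JSON-RPC request body that eAPI accepts through its runCmds method. It is not the script from the article: the helper name, the VLAN and VNI numbers, and the choice of Loopback0 as source interface are hypothetical, and the request is only printed, not sent.

```python
import json


def build_eapi_request(cmds, request_id="1"):
    """Build a JSON-RPC 2.0 request body for the eAPI runCmds method."""
    return {
        "jsonrpc": "2.0",
        "method": "runCmds",
        "params": {"version": 1, "cmds": cmds, "format": "json"},
        "id": request_id,
    }


# Hypothetical VXLAN settings, for illustration only.
vxlan_cmds = [
    "enable",
    "configure",
    "interface Vxlan1",
    "vxlan source-interface Loopback0",
    "vxlan udp-port 4789",
    "vxlan vlan 10 vni 10010",
]

request_body = json.dumps(build_eapi_request(vxlan_cmds), indent=2)
print(request_body)
```

In a real deployment the body would be sent as an HTTP(S) POST to the switch's eAPI endpoint with valid credentials; that step is left out so the sketch runs without a switch.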

DANZ Tap Aggregation – Filtering on inner Q-in-Q header, and stripping outer header – At the same time

This article documents the ability of the Arista 7150S in Tap Aggregation mode to selectively filter on the inner Q-in-Q header and also strip the outer header on egress, effectively allowing a granular selection of which Q-tagged traffic the tools receive.

Let’s take some Q-in-Q traffic as an example:
Outer Q-header (Eth-type 0x88a8) – STAG – VLAN ID = 100
Inner Q-header (Eth-type 0x8100) – CTAG – VLAN ID = 101, 102

Packet capture example for this Q-in-Q traffic:

7150S(config)#bash sudo tcpdump -nni mirror0
[...]
22:23:44.040896 00:ab:00:00:02:23 > 00:1c:73:86:00:69, ethertype 802.1Q-QinQ (0x88a8), length 1020: vlan 100, p...
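The two operations described above (match on the inner VLAN ID, then remove the outer tag) can be illustrated on raw Ethernet frames in Python. This is a software model for understanding the frame layout only; on the switch the work is done in hardware through configuration, and the function names and the test frame are assumptions made for this sketch.

```python
import struct

TPID_STAG = 0x88A8  # outer service tag
TPID_CTAG = 0x8100  # inner customer tag


def build_qinq_frame(dst, src, outer_vid, inner_vid, payload=b""):
    """Build a test frame: MAC addresses, outer tag, inner tag, IPv4 EtherType, payload."""
    return (dst + src
            + struct.pack("!HH", TPID_STAG, outer_vid)
            + struct.pack("!HH", TPID_CTAG, inner_vid)
            + struct.pack("!H", 0x0800)
            + payload)


def filter_inner_and_strip_outer(frame, allowed_inner_vids):
    """Return the frame without its outer tag if the inner VLAN is allowed, else None."""
    if len(frame) < 20:
        return None
    outer_tpid, _outer_tci, inner_tpid, inner_tci = struct.unpack("!HHHH", frame[12:20])
    if outer_tpid != TPID_STAG or inner_tpid != TPID_CTAG:
        return None
    if (inner_tci & 0x0FFF) not in allowed_inner_vids:
        return None
    # Bytes 12-15 hold the outer tag; dropping them leaves a single-tagged frame.
    return frame[:12] + frame[16:]


dst = bytes.fromhex("001c73860069")
src = bytes.fromhex("00ab00000223")
frame = build_qinq_frame(dst, src, outer_vid=100, inner_vid=101, payload=b"data")

stripped = filter_inner_and_strip_outer(frame, allowed_inner_vids={101})
print(stripped.hex() if stripped else "dropped")
```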

7150S NAT – Practical Guide – Source NAT – Static

Introduction
This article presents Static Source NAT, as part of a series of articles about Source NAT on the Arista 7150S with practical examples.

The following topics are covered in this article: Source NAT – Baseline; Static Source NAT – Unicast and multicast with routed ports; Static Source NAT – with SVI; Static Source NAT + ACL Match; Static Source NAT + PAT.

The following additional topics are covered in other articles: Dynamic Source NAT with Pool; Dynamic Source NAT Overload; Static Twice NAT; Static Twice NAT – With SVI; Troubleshooting; Tuning NAT.

1) Source NAT –...

Tap Aggregation – Filtering with Port ACLs

1) Introduction
This article details the filtering of traffic across the Tap Aggregator by using port ACLs. The filters allow granular selection of Layer2, Layer3, and Layer4 traffic on a per-port basis. The following other features might also be of interest, but are out of scope for this article: VLAN membership filters; Traffic Steering.

2) Filtering Overview
The well-known MAC and IP access-list filtering is used to filter traffic in Tap Aggregation mode, just as it is in switching mode. The Layer2/3/4 ACLs can be applied: on Tap ports, ingress; on Tool ports, egress...

Introduction to Managing EOS Devices – EOS Tips for Power Users

Note: This article is part of the Introduction to Managing EOS Devices series: https://eos.arista.com/introduction-to-managing-eos-devices/

Annex B) EOS Tips for Power Users

B.1) CLI – Show Commands Redirections
EOS CLI supports the following “show” command redirections, by “|” (pipe):

  LINE      Filter command by common Linux tools such as grep/awk/sed/wc
  append    Append redirected output to URL
  begin     Begin with the line that matches
  exclude   Exclude lines that match
  include   Include lines that match
  no-more   Disable pagination for this command
  nz        Include only non-zero counters ← Hides line with all 0 numbers
  redirect  Redirect output to URL
  section   Include sections that match...

Introduction to Managing EOS Devices – Configuration Example

Note: This article is part of the Introduction to Managing EOS Devices series: https://eos.arista.com/introduction-to-managing-eos-devices/

Annex A – Configuration Example
This section provides an example of a switch configuration file. It gathers many of the management settings covered in the series Introduction to Managing EOS Devices, grouped here for convenience, but it is not exhaustive.

!
username <Username> secret <User_password>
no logging console
logging format timestamp high-resolution
logging facility local6
!
hostname <Hostname>
ip name-server <DNSHost_Address>
ip name-server <DNSHost_Address>
ip domain-name <Company_DomainName>
!
ntp source Management1
ntp server <NTPHost_Address-1> prefer
ntp server <NTPHost_Address-2>
!
snmp-server contact "Enterprise Network Operations xxx-xxx-xxxx"...

Introduction to Managing EOS Devices – Automation and Extensibility

Note: This article is part of the Introduction to Managing EOS Devices series: https://eos.arista.com/introduction-to-managing-eos-devices/

5) Automation and Extensibility
Arista EOS facilitates task automation, provisioning, and extending capabilities on the Arista switches. The following features are available on all the platforms: managing extensions and applications; AEM: Event Manager; AEM: CLI Scheduler.

5.1) Managing EOS Extensions
The simplest and most efficient way to make the most of the extensibility on which EOS is built is through the use of extensions. An extension is a pre-packaged optional feature or set of scripts in an RPM or SWIX format....

Introduction to Managing EOS Devices

Summary
Several mechanisms exist to manage and instrument Arista Networks’ devices, ranging from industry standard SNMP counters to more Arista EOS/platform-centric functionality and deep debugging capabilities. The following articles introduce some fundamental management activities: Setting up Management; Monitoring; Troubleshooting; Platform Specific Monitoring and Troubleshooting; Automation and Extensibility; Annex A – Configuration Example; Annex B – EOS Tips for Power Users.

This document serves to highlight the basic parameters required to automate monitoring of an Arista EOS based device, while providing a high level overview of additional, more advanced functionality for low level troubleshooting and application specific monitoring. Many of the topics...

Introduction to Managing EOS Devices – Troubleshooting

Note: This article is part of the Introduction to Managing EOS Devices series: https://eos.arista.com/introduction-to-managing-eos-devices/

3) Troubleshooting
The following monitoring tools provide information on Arista EOS for all platforms: Event-Monitor; Control-plane TCPdump; Tracing (debug); Show Tech Support; Log consolidation. Platform specific: Data-plane TCPdump (7150S series).

3.1) Event Monitor
Event Monitor is part of a suite of tools called Advanced Event Management (AEM). The goal of AEM is to improve both reactive and proactive management functions, enabling the network to scale while maintaining visibility of its various components. Event Monitoring moves away from traditional “point in time” monitoring, by...

Introduction to Managing EOS Devices – Platform Specific Monitoring and Troubleshooting

Note: This article is part of the Introduction to Managing EOS Devices series: https://eos.arista.com/introduction-to-managing-eos-devices/

4) Platform Specific Monitoring and Troubleshooting
Some of the monitoring/troubleshooting commands provide visibility into hardware-specific information. These commands vary by platform family, as the families are based on different processors: Broadcom Trident+/2, Intel Alta/FM6000, Broadcom Arad, etc.

4.1) 7050-7050X
The 7050 and 7050X platforms are based on the Broadcom Trident+/Trident2 network processor. They offer Trident-specific commands, located in the “show platform trident” command hierarchy. This level of information can be particularly useful during detailed troubleshooting, or for general resource monitoring (e.g. TCAM). Commonly used ones have been...

Introduction to Managing EOS Devices – Monitoring

Note: This article is part of the Introduction to Managing EOS Devices series: https://eos.arista.com/introduction-to-managing-eos-devices/

2) Monitoring
The following monitoring tools provide information on Arista EOS for all platforms: General System Health (CPU, Power, Temperature, etc.); Hardware counters (Interfaces, TCAM, etc.); System and process logging; Port mirroring (SPAN); Advanced Event Management (AEM); LANZ. Platform specific: Advanced Mirroring (7150S series); platform-specific “show” commands.

2.1) Using SNMP for monitoring
Besides CLI “show” commands, most of the monitoring information can be collected by SNMP. EOS natively provides the ability to walk and search local MIBs for specific OIDs. These OIDs can then...

Introduction to Managing EOS Devices – Setting up Management

Note: This article is part of the Introduction to Managing EOS Devices series: https://eos.arista.com/introduction-to-managing-eos-devices/

1) Setting Up Management
The following management tools are available on Arista EOS for all platforms: VRF-aware management; Telnet and SSH; Syslog and Console Logging; SNMP Versions 1 and 3; NTP; DNS; Local and remote user control (AAA); TACACS+, RADIUS; sFlow; XMPP; eAPI.

Note: in the following configuration examples, the commands in square brackets are optional: [optional]

1.1) VRF Aware Management
As of release 4.10.1, EOS supports the ability to constrain management functions to a VRF. This enables the user to separate management based functions...

Tap Aggregation – VLAN List Filtering

1) Introduction
A list of allowed VLANs simply specifies, under an interface in Tap Aggregation mode, which VLAN traffic is allowed. Removing VLANs from the allowed list means those VLANs are blocked. This allows filtering traffic in a flexible manner, directly from the interface command, without creating ACLs or steering policies. This article details how to configure VLAN lists and how to combine them to achieve multi-stage VLAN filtering.

2) Allowed VLAN List Definition
An allowed VLAN list is simply a definition of VLAN IDs. By default, all VLANs are allowed. The below commands illustrate the...
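The effect of combining allowed lists can be reasoned about as set intersection: a VLAN reaches the tool only if every stage on its path allows it. The Python sketch below models that reasoning. The list syntax handled here (comma-separated IDs and ranges), the function names, and the example lists are assumptions for illustration; this is not switch configuration.

```python
def parse_vlan_list(text):
    """Expand a list such as '100-102,200' into a set of VLAN IDs."""
    vlans = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = (int(x) for x in part.split("-", 1))
            vlans.update(range(first, last + 1))
        else:
            vlans.add(int(part))
    return vlans


def effective_vlans(*stages):
    """Return the VLANs allowed by every stage, starting from all VLANs (1-4094)."""
    allowed = set(range(1, 4095))
    for stage in stages:
        allowed &= parse_vlan_list(stage)
    return allowed


# Hypothetical two-stage example: a list on the tap side, another on the tool side.
print(sorted(effective_vlans("100-110", "105-120,200")))
```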

Understanding CPU Utilization

Introduction
This article explains the different values in the CPU utilization output, down to the per-thread CPU usage on Arista EOS. To this end, the topics are covered in the following sections: How to view the CPU usage; Understanding CPU Usage – Row 1; Understanding CPU Usage – Row 2; Understanding CPU Usage – Row 3; Understanding CPU Usage – Row 4; Understanding CPU Usage – Row 5; How to view CPU usage on a multi core switch?; Understanding CPU Processes.

1) How to view the CPU usage
To view the CPU usage, use the...

Troubleshooting High CPU Utilization

Introduction
This article aims to help you define what CPU load is on an Arista switch, recognise when the load is high, and troubleshoot high CPU utilization. We will cover different topics related to high CPU utilization:
How do I identify if my switch has a high CPU?
What is considered normal CPU % utilization?
What is an example of high CPU utilization?
How to identify average load on an Arista switch?
What is the difference between CPU load average and CPU utilization?
How to interpret the load numbers?
Can the load average be greater than...
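One common way to read Linux load averages, offered here as general background and not as the article's own method, is to divide the load average by the number of CPU cores: a sustained value above 1.0 per core suggests that tasks are waiting for CPU time. The Python sketch below applies that rule of thumb on any Linux host; the function name and the 1.0 reference point are assumptions for this example.

```python
import os


def load_per_core(load_average, cores):
    """Normalise a load average by the number of CPU cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load_average / cores


# os.getloadavg() is only available on Unix-like systems.
if hasattr(os, "getloadavg"):
    one_min, five_min, fifteen_min = os.getloadavg()
    cores = os.cpu_count() or 1
    for label, value in (("1 min", one_min), ("5 min", five_min), ("15 min", fifteen_min)):
        print(f"{label}: load {value:.2f}, per core {load_per_core(value, cores):.2f}")
```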

Timestamping Deep Dive – Frequent Questions and Tips on Integration

Introduction
Accurate packet timestamps are essential for network event correlation and performance analysis. The Arista 7150S provides hardware timestamping with nanosecond granularity and ≤10ns precision. Timestamping is applied in hardware on all packets, at line rate, in parallel. The timestamping format and implementation are detailed in this article: https://eos.arista.com/timestamping-on-the-7150-series/
The present article explains the internals of timestamping on the 7150S in more detail, and provides an overview of expected behaviours, as well as tips for integrating with your tooling environment.

1) How does Timestamping work?
Timestamping on the Arista 7150S is a function of the MAC...

Package RPMs into an Arista SWIX extension

Introduction
This article describes how you can make use of the Arista EOS native extensibility by preparing extensions for installation on your switches.

Overview
The Arista EOS is built on top of a standard Fedora Linux kernel that natively supports compiled applications packaged as RPMs. It is therefore very convenient for a dutiful administrator to find applications to load and install on the kernel, with all the diligence normally due to sourcing 3rd party software (no guarantee or responsibility from Arista). Arista has defined a package format for use with EOS called SWIX (Software image extension). It is...

Displaying Hardware Timestamps in Wireshark

Overview
The Arista 7150S enables highly accurate timestamps (3ns granularity, 10ns accuracy) to be applied to all traffic flowing through the switch. In this post we present how you can quickly display the hardware timestamps imposed by the Arista 7150S platform. This method does not need any special software to convert the frames. You simply need to run a live capture with Wireshark, or load a PCAP file. If you need more details on hardware timestamping on the Arista 7150S, please refer to this article: https://eos.arista.com/timestamping-on-the-7150-series/
Here is an example of the timestamps being displayed in hex as a column in...

Restarting EOS agents

Objective
This article shows an experiment which demonstrates what happens when an EOS agent is killed (either voluntarily, or as a result of a failure).

Preparation (optional)
For the purposes of this article, and for testing only, we take a few preparation steps that are not strictly necessary in a production environment:

1) Clear the logs to improve visibility (less noise in the logs)
Arista#clear logging

2) Configure high resolution logging timestamps
Arista(config)#logging format timestamp high-resolution

Killing an agent

1) Access the bash shell
Arista#bash

2) Select a process and find its PID (Process ID)
[admin@Arista ~]$ ps -ef |...

How to backup EOS configs to a remote server

This article describes how a switch can push its configuration to a remote server, either on demand or periodically.

Automating remote authentication using SSH keys

Generate public/private DSA key pair:

[root@Arista root]#ssh-keygen -t dsa
Enter file in which to save the key (/root/.ssh/id_dsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_dsa.
Your public key has been saved in /root/.ssh/id_dsa.pub.

Create an ssh config file for the (in this example) root user. Make sure the formatting is correct.

[root@Arista ~]#vi /root/.ssh/config
Host *
IdentityFile /root/.ssh/id_dsa

Copy the public key to the remote...