Why Java APIs and Industry-Standard CLIs are Different

In the past few years, the tech industry has watched with increasing concern as various entrenched participants have brandished copyright law as a weapon to stifle competition and innovation. Recently, we have been treated to yet another novel claim: that after over a decade of broad adoption, the industry-standard set of commands that a user types into a command line interface (or CLI) to configure a network device is subject to copyright. This startling claim raises many questions, but today I want to address one in particular: What effect, if any, does the recent decision in Oracle v. Google have...

PTP Best Master Clock Algorithm (BMCA)

Contents: Scope; BMCA; PTP port-states; BMCA- GM election order; BMCA- GM election attributes; Priority 1 and Priority 2; Configuration; Clock class; Clock Accuracy; Clock offsetScaledLogVariance; “show ptp clock”; PTP Domain; Announce Messages; L3 announce Messages; L2 announce Messages; “show ptp interface”; BMCA-GM election example; Topology; Configurations; Observations; Converged PTP topology; Slave/passive port-election

Scope: This article describes the “Best Master Clock Algorithm” (BMCA) and how it is carried out on Arista switches.

BMCA: BMCA is used to select a Grandmaster (GM) in a PTP domain. It is also used to decide the PTP port-states on Arista switches.

PTP port-states: Master: It provides timing to a downstream clock. PTP master ports send out announce messages. Slave: It retrieves timing from an...
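As a rough illustration of how a Grandmaster election over the attributes listed in the contents above might be expressed, here is a minimal Python sketch. It assumes the attribute order defined by IEEE 1588 (priority1, clock class, clock accuracy, offsetScaledLogVariance, priority2, then clock identity as a tie-breaker) and that the lower value wins at each step. The `ClockDataset` type, `best_master` function, and sample values are hypothetical and are not taken from the article or from EOS.

```python
from typing import NamedTuple


class ClockDataset(NamedTuple):
    """Hypothetical container for the attributes compared during GM election.

    Field order mirrors the assumed IEEE 1588 comparison order, so plain
    tuple comparison gives the election order (lower is better).
    """
    priority1: int
    clock_class: int
    clock_accuracy: int
    offset_scaled_log_variance: int
    priority2: int
    clock_identity: str  # fixed-width lowercase hex, so string order matches numeric order


def best_master(clocks):
    """Return the clock that wins the election under the stated assumptions."""
    return min(clocks)


# Illustrative values only.
clock_a = ClockDataset(128, 248, 0x30, 0xFFFF, 128, "001c73fffe000001")
clock_b = ClockDataset(128, 6, 0x21, 0x4E5D, 128, "001c73fffe000002")
clock_c = ClockDataset(127, 248, 0x30, 0xFFFF, 128, "001c73fffe000003")

gm = best_master([clock_a, clock_b, clock_c])
print(gm.clock_identity)
```

In this sketch `clock_c` is selected because its priority1 is lowest; if it is removed, `clock_b` wins on clock class, since priority1 is tied.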

Routing Context – Management VRF and Logs backup

Contents: I. Overview; II. Introduction; III. Steps to Identify the Management VRF and backup the logs; A. Identify the VRF; B. Switch Routing Context and backup logs; IV. Summary

I. Overview: This article describes routing context mode and how to use it to export logs and files from the device to a desktop machine or to a backup/storage server.

II. Introduction: In most networking infrastructures, the network devices are administered or accessed through a non-default VRF (the Management VRF). The non-default VRF will have access to the network orchestration tools, backup servers, and desktop machines, depending upon the network infrastructure policy...

Troubleshooting EVPN IRB with VXLAN

Contents: Overview; Introduction; Platform Compatibility; Topology; Asymmetric/Symmetric IRB; Pre-checks; Troubleshooting Scenarios; EVPN routes are not received or not advertised; Flood list not populated; Data-plane issue; Sub-optimal forwarding or High peer-link utilization; Limitations (Updated till EOS 4.21.6F); Data collection for Arista Support; Useful Resources

Overview: This article provides a brief introduction to EVPN IRB with VXLAN, along with basic debugging methods.

Introduction: Ethernet VPN (EVPN) is an extension of the MP-BGP protocol that introduces a new address family. EVPN is used as a control plane for VXLAN environments to exchange information such as MAC addresses and ARP bindings, along with the VTEP flood list. Additionally, IP prefixes can be exchanged in the overlay using...

Dot1q tagged LACPDU

Introduction: This document provides details on how 802.1Q tagged LACP packets are handled on Arista devices.

802.1Q tagged LACP PDUs: The LACP PDU frames were ingressing from other vendors into an Arista switch with an 802.1Q tag and designated as VLAN 0.

    10:03:58.521076 58:ac:78:f2:8c:05 > 01:80:c2:00:00:02, ethertype 802.1Q (0x8100), length 128: vlan 0, p 0, ethertype Slow Protocols, LACPv1, length 110
    10:03:59.421028 58:ac:78:f2:8c:05 > 01:80:c2:00:00:02, ethertype 802.1Q (0x8100), length 128: vlan 0, p 0, ethertype Slow Protocols, LACPv1, length 110

Natively, EOS discards tagged LACP PDUs as they are out of spec. These discards can be observed using the...

LACP Rate Fast

Contents: Introduction; How it works; CLI show commands; Wireshark output of LACPDU Flags; Recommendation

Introduction: The LACP rate fast feature is used to set the rate (once every second) at which LACP control packets are sent by the partner. At the normal rate, LACP packets are sent once every 30 seconds. This document describes the workflow of the LACP rate fast feature, including a packet capture and some recommendations/concerns for MLAG setups.

How it works: When LACP is synchronizing between two devices, LACP PDUs are sent at a rate of 1 per second until both sides are synchronized. Once this is complete they are sent at...

Troubleshooting sFlow

Contents: Overview; Introduction; Default Settings; Troubleshooting; 1. Not all ingress packets are sampled; 2. Higher CPU utilization; 3. sFlow traffic is not being sent to the collector

Overview: This document describes the basic checks that can be performed when troubleshooting sFlow.

Introduction: Arista switches provide an sFlow agent that samples only ingress traffic from all Ethernet and port-channel interfaces. This agent combines the interface counters and flow samples into sFlow datagrams that are sent to an sFlow collector. An sFlow collector is a server that runs software which analyzes and reports network traffic. Arista switches do not include sFlow collector software. The switch sends sFlow...

Console Troubleshooting Guide

Contents: Objective; Introduction; Initial Manual Provisioning; Console is not working post deployment; SSH and Console both are not accessible

Objective: This document outlines the common issues faced while using a console cable/server to access an Arista switch, and lists the troubleshooting steps to isolate issues with these connections.

Introduction: To access the device, we use either an SSH or a console connection. Normally, the console port is used for serial access to the switch, in the following cases:
• initial provisioning of the device manually (when the management ports are not assigned IP...

Supervisor replacement procedure

Contents: Objective; Verification prior supervisor replacement; Redundancy Protocol: Stateful Switchover (SSO); Redundancy Protocol: Route Processor Redundancy (RPR); Steps to be followed during replacement; Scenario 1. Chassis has only one supervisor, redundant supervisor is not present; 1. The active supervisor is partially functional and needs a replacement; 2. Switch is unresponsive via management or console, forwarding across the chassis is broken; Scenario 2. The chassis has two supervisor modules; 1. Active supervisor is malfunctioning and needs to be replaced; 2. The standby supervisor is malfunctioning and needs to be replaced

Objective: The aim of this document is to outline the procedure for replacing a supervisor in a modular chassis. Currently, Arista chassis...

Understanding subscription paths for Open-source Telemetry streaming

Contents: Introduction; Checking the paths; 1) Using TerminAttr; 2) Using the Telemetry browser in CVP; Metric Explorer; Metric path structuring; Interfaces counters; DOM values; QSFP RX Power on Modular boxes; How it looks like on the server side; Example 1 – Rx Power for QSFP on Fixed systems; Example 2 – Rx/Tx Power for SFPs on Fixed Systems; Example 3 – interface counters; Graph examples on Grafana; Data streamed from octsdb to Graphite; Data streamed from ocprometheus to Prometheus; Sample Configs; TerminAttr; Ocprometheus; Ockafka; Octsdb; Useful Links

Introduction: The purpose of this document is to explain how the subscription paths are constructed for our openconfig connector apps (ocprometheus, ockafka, octsdb, etc.) that communicate with TerminAttr and send telemetry data to 3rd...

Streaming EOS telemetry states to Prometheus

Contents: Introduction; Prerequisites; Installing Prometheus and Grafana; Installing Prometheus; Add new targets; Option 1; Option 2; Option 3; Installing Grafana; Installing and configuring ocprometheus; Git Clone the Arista GO library; Compile ocprometheus in GO; Option 1) Using the binary; Option 2) Install it as a swix; Flags cheat sheet; Sample EOS Configuration; No VRF and streaming to both CVP and Prometheus; No VRF and only Prometheus; VRF management and streaming to both CVP and Prometheus; VRF and only Prometheus; VRF and only Prometheus using authentication; PDP: VRF and only Prometheus; Creating dashboards in Grafana; How to graph data only for specific interface?; Pre-defined dashboard; Using rule records in Prometheus for EOS paths; Ocprometheus.yml on EOS; Prometheus.yml on the server; Kernel_rules.yml on the server; Troubleshooting tips; Checking Logs and troubleshooting; Common issues; Context_deadline_exceeded; Not seeing...

Resilient load-sharing using Nexthop Groups

Contents: Introduction; Nexthop Groups; Resiliency in Nexthop Groups; Even flow distribution; Adding Nexthops; Combining Addition of Nexthops and Even Flow Distribution; Hashing; Multi-switch design; Advertising VIP address; Summary

Introduction: Load-sharing of traffic flows towards a specific prefix in an L3 topology is usually achieved with Equal-Cost Multi-Path (ECMP) routing. With ECMP, multiple nexthops of equal preference are available for the prefix. Traffic is distributed across the different nexthops based on a hashing algorithm, and packets belonging to the same traffic flow are by default hashed to the same nexthop. A problem with ECMP is that if one of the nexthops is removed all flows are affected as a new hash...
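The disruption described in this excerpt can be illustrated with a small Python sketch. It assumes a simple "hash modulo number of nexthops" selection, which is only a stand-in for whatever hashing the hardware actually performs; the function names, nexthop labels, and flow tuples are hypothetical. The sketch counts flows that are moved to a different nexthop even though the nexthop they were using is still available.

```python
import zlib


def pick_nexthop(flow, nexthops):
    """Pick a nexthop by hashing the flow tuple modulo the nexthop count.

    Assumption: CRC32 plus modulo is used purely for illustration.
    """
    key = "|".join(str(field) for field in flow).encode()
    return nexthops[zlib.crc32(key) % len(nexthops)]


def moved_flows(flows, before, after):
    """Return flows that changed nexthop although their old nexthop survived."""
    return [
        f for f in flows
        if pick_nexthop(f, before) in after
        and pick_nexthop(f, before) != pick_nexthop(f, after)
    ]


# Hypothetical 5-tuples: (src IP, dst IP, protocol, src port, dst port).
flows = [("10.0.0.%d" % (i % 250), "192.0.2.10", 6, 40000 + i, 443)
         for i in range(1000)]

before = ["nh1", "nh2", "nh3", "nh4"]
after = ["nh1", "nh2", "nh4"]  # nh3 has been removed

disrupted = moved_flows(flows, before, after)
print(f"{len(disrupted)} of {len(flows)} flows moved although their nexthop survived")
```

Ideally only the flows that were using the removed nexthop would be redistributed, while flows on surviving nexthops stayed in place.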

PTP slave-passive port election

Contents: Scope; Slave-Passive port election order; Steps removed; Parent clock identity; Self Port-ID; Example; Topology; Configurations; Outputs; Election based on the “steps removed” value; Topology; Observations; Election based on the parent clock identity; Topology; Observations; Election based on the self port-ID; Topology; Observations

Scope: This article explains how the slave-passive port election for PTP is done on Arista switches.

Slave-Passive port election order: The following comparisons are made, in order, to decide whether a port should take the slave or passive state:
1. Steps removed
2. Parent clock identity
3. Self Port-ID

Steps removed: “Steps removed” is the number of hops separating a PTP clock from the GM. The port that has a lower “steps removed”...
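The three-step comparison order above can be sketched in Python as a sort on a composite key. This is a minimal sketch that assumes the lower value is preferred at every step and that the winning port becomes slave while the others become passive; the excerpt is truncated before it states this, so treat it as an assumption. The `Candidate` type, `elect` function, and sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical view of a port receiving timing from the same GM."""
    name: str
    steps_removed: int
    parent_clock_identity: str  # colon-separated hex
    port_id: int


def election_key(c):
    """Compare steps removed, then parent clock identity, then self port-ID."""
    parent = int(c.parent_clock_identity.replace(":", ""), 16)
    return (c.steps_removed, parent, c.port_id)


def elect(candidates):
    """Assumption: lowest key becomes slave, every other port becomes passive."""
    ranked = sorted(candidates, key=election_key)
    return {c.name: ("slave" if i == 0 else "passive")
            for i, c in enumerate(ranked)}


# Illustrative values only: equal steps removed, so parent identity decides.
ports = [
    Candidate("Ethernet1", 1, "00:1c:73:ff:fe:00:00:0a", 1),
    Candidate("Ethernet2", 1, "00:1c:73:ff:fe:00:00:0b", 2),
]
states = elect(ports)
print(states)
```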

cEOS-lab in GNS3

GNS3 is a great tool to visualize your (home-)lab environment and simulate all kinds of network topologies using different virtualization and isolation technologies. It has been widely used to create environments using vEOS-lab, but because vEOS-lab requires quite some resources (e.g. 2GB of RAM is required) the scale of these labs was often quite limited, especially on low-memory devices. Arista’s cEOS-lab is a new way of packaging the EOS-lab suite. Using the Docker container daemon, it is possible to use the kernel of the host machine and to only run the EOS processes that are required on the machine, making...

How to FTP/SCP/WinSCP

In this document we will look at tools for quickly uploading and downloading files between hosts and Arista switches. 1) SCP: On Linux or macOS, scp is a built-in CLI tool that can be invoked with the scp command. SCP, or secure copy, allows secure transfer of files between a local host and a remote host, or between two remote hosts. It uses the same authentication and security as the Secure Shell (SSH) protocol on which it is based. Before we look at the commands and examples, please make sure the steps given below are followed: ...

“Wait-for-warmup” command – To understand if an agent has initialized

Contents: Objective; Main use case; Details of the wfw command; Examples; a) The agent is up and running; b) The agent is shutdown; WFW command without the verbose option; Equivalent CLI command for wfw; WFW command behavior after terminating an EOS agent

Objective: The aim of this document is to convey the use case and details of the bash command wait-for-warmup (wfw). An equivalent CLI command exists and is described later in this article.

Main use case: Agents like the forwarding agent of the switch take some time to come up when terminated. The same is the case for the linecard and fabric module agents when...

Taking packet captures on Arista devices

Contents: Control-plane packet capture; Running tcpdump natively in EOS; Running tcpdump from bash; Data-plane packet capture; Monitor session; Running tcpdump for data-plane traffic natively in EOS; Running tcpdump for data-plane traffic from bash; Running tcpdump for data-plane traffic from bash for 7050/7060/7260 devices using mirror to GRE; Test setup; Configuration; Show commands; Viewing the packet capture on CLI; Viewing the packet capture on wireshark; To enable sflow globally; Configuring the agent source address; Configuring the polling interval; Configuring the sampling rate and sample contents; Enabling sflow; Show commands; Running tcpdump for data-plane traffic from bash for 7050/7060 devices using sflow; Configuration; Show commands; Viewing the packet capture on CLI; Limitations; References

Control-plane packet capture: TCPDUMP on physical ports and SVIs. This will help in capturing...

Troubleshooting Multicast

Contents: Overview; Preliminary Checks; Common Issues in Multicast; 1) Multicast Bridging; (i) Multicast traffic not received by the subscriber; (ii) Multicast traffic is flooded within a VLAN; 2) Multicast Routing; (i) Last-Hop Router (LHR); (a) (*,G) mroute is missing; (b) OIL is not populated for a (*,G); (ii) First-Hop Router (FHR); (a) (S,G) mroute is missing or the IIF for (S,G) has null populated; (b) OIL is stuck in Register on FHR; (iii) Rendez-vous Point (RP); (a) (*,G) mroute is missing; (b) (S,G) mroute is missing; (iv) Unresolved Mroutes; 3) Is Multicast traffic being software forwarded i.e. is traffic going to CPU?; Logs Collection

Overview: The aim of this article is to highlight common issues related to...

VxLAN troubleshooting guide

Contents: VxLAN Basic Troubleshooting Guide; I. Objective; II. Introduction; III. Topology; IV. Generic Configurations to be checked; IV. Scenario specific troubleshooting; VxLAN Bridging; VxLAN Routing

VxLAN Basic Troubleshooting Guide

I. Objective: Provide basic/generic troubleshooting steps to customers in case any VxLAN issue is encountered in their network.

II. Introduction: Troubleshooting VxLAN involves a few steps, described in the upcoming sections of this document. The topology referred to below includes VxLAN configurations with servers 1, 2 and 3 as the host devices, which obtain connectivity over a VxLAN tunnel. Troubleshooting steps are split into routing and bridging to cover the possible scenarios.

III. Topology

IV. Generic Configurations to be checked: A....

Basic BGP Troubleshooting

Contents: Objective; I. Neighborship; a. Idle (NoIf); b. Idle(MaxPath); c. The neighborship state is flapping between Connect and Active; d. Stuck in Active state; II. Route Advertisement/Reception; a. Route reception issue; b. Route advertisement issue; c. AS path loop; III. Route Installation; Case 1: The prefix received from one peer is preferred over the same prefix received from another peer; Case 2: Route for the prefix is installed from a routing protocol other than BGP; Case 3: No route to the next hop; VI. Logs collection

Objective: This document outlines common issues faced in BGP and the commands used to troubleshoot them.

I. Neighborship: BGP sends unicast messages,...

Centralized vs. Distributed VxLAN Routing with EVPN

Tech Note: Centralized vs. Distributed VxLAN Routing with EVPN. Over the past few years EVPN VxLAN deployments have become an increasingly popular overlay architecture selected by customers, primarily in data-center layer 3 leaf-spine (L3LS) fabrics. With this popularity, numerous deployment topologies and configuration options have presented themselves. This article reflects our observations, based on real-world deployment experience, on one such choice: centralized vs. distributed gateways. When deploying EVPN VXLAN integrated routing and bridging (IRB), both VXLAN bridging and VXLAN routing are required concurrently on the switch. This capability is also commonly referred to as an EVPN VxLAN gateway. There are...
