Fabric Visibility

A leaf and spine fabric is challenging to monitor. The fabric spreads traffic across all the switches and links in order to maximize bandwidth. Unlike traditional hierarchical network designs, where a small number of links can be monitored to provide visibility, a leaf and spine network has no special links or switches where running CLI commands or attaching a probe would provide visibility. Even if it were possible to attach probes, the effective bandwidth of a leaf and spine network can be as high as a petabit per second, well beyond the capabilities of current-generation monitoring tools.

The two-minute video gives an overview of some of the performance challenges with leaf and spine fabrics and demonstrates Fabric View, a monitoring solution that uses the industry standard sFlow instrumentation built into EOS to provide real-time visibility into fabric performance. This article provides step-by-step instructions and includes eAPI Python scripts to automate sFlow configuration and topology discovery.

First, install a copy of Fabric View. The software is free to try: register at MyInMon.com and request an evaluation.

Next, enable eAPI on all the switches in the fabric. The article Arista eAPI 101 introduces eAPI and describes how to enable the service in EOS.
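
For readers new to eAPI, it can help to see what a request looks like. The sketch below builds a JSON-RPC 2.0 body for a runCmds call of the kind posted to a switch's /command-api URL. It uses the named-parameter form; jsonrpclib, used in the scripts in this article, passes the same values by position. The helper name build_runcmds_request and the id value are illustrative choices, and nothing is sent to a switch.

```python
import json

def build_runcmds_request(cmds, fmt="json", request_id="fabric-visibility-1"):
    """Return a JSON-RPC 2.0 body for an eAPI runCmds call (hypothetical helper)."""
    return {
        "jsonrpc": "2.0",
        "method": "runCmds",
        "params": {"version": 1, "cmds": list(cmds), "format": fmt},
        "id": request_id,  # arbitrary value chosen by the client
    }

# Build and print the request body; no network connection is made.
request = build_runcmds_request(["enable", "show hostname"])
print(json.dumps(request, indent=4, sort_keys=True))
```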

Configure all the switches in the fabric to send sFlow to the Fabric View server. Edit the variables: switch_list, username, password, and sflow_collector in the following script and assign appropriate values for your network:

#!/usr/bin/env python

import socket
from jsonrpclib import Server

switch_list = ['switch1.example.com','switch2.example.com']
username = "admin"
password = "password"

sflow_collector = "192.168.56.1"
sflow_port = "6343"
sflow_polling = "20"
sflow_sampling = "10000"

for switch_name in switch_list:
  # Assumption: the address that switch_name resolves to is the address
  # the switch should use as its sFlow agent (source) address.
  switch_ip = socket.gethostbyname(switch_name)
  switch = Server("https://%s:%s@%s/command-api" %
                (username, password, switch_name))
  response = switch.runCmds(1,
   ["enable",
    "configure",
    "sflow source %s" % switch_ip,
    "sflow destination %s %s" % (sflow_collector, sflow_port),
    "sflow polling-interval %s" % sflow_polling,
    "sflow sample output interface",
    "sflow sample dangerous %s" % sflow_sampling,
    "sflow run"])

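One way to make the configuration step easier to review is to build the command list in a separate function and print it before sending anything to a switch. This is a minimal sketch and not part of the script above: the function name sflow_commands is hypothetical, the agent address 10.0.0.1 is a placeholder, and the function only reproduces the commands already used above.

```python
def sflow_commands(agent_ip, collector, port="6343", polling="20", sampling="10000"):
    """Return the EOS commands used above to enable sFlow (hypothetical helper)."""
    return [
        "enable",
        "configure",
        "sflow source %s" % agent_ip,
        "sflow destination %s %s" % (collector, port),
        "sflow polling-interval %s" % polling,
        "sflow sample output interface",
        "sflow sample dangerous %s" % sampling,
        "sflow run",
    ]

# Dry run: print the commands instead of sending them with runCmds().
# 10.0.0.1 is a placeholder agent address used for illustration only.
for command in sflow_commands("10.0.0.1", "192.168.56.1"):
    print(command)
```

The returned list can be passed as the second argument of switch.runCmds(1, ...) once the output looks right.
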
Next, use the following script to discover the topology and convert it into a JSON representation that can be imported into Fabric View. Edit the variables switch_list, eapi_username, and eapi_password in the following script and assign appropriate values for your network:

#!/usr/bin/python
'''
Copyright (c) 2015, Arista Networks, Inc. All rights reserved.
 
Redistribution and use in source and binary forms, with or without 
modification, are permitted provided that the following conditions are met:
 
 * Redistributions of source code must retain the above copyright notice, 
   this list of conditions and the following disclaimer. 

 * Redistributions in binary form must reproduce the above copyright notice, 
   this list of conditions and the following disclaimer in the documentation 
   and/or other materials provided with the distribution. 

 * Neither the name of Arista Networks nor the names of its contributors 
   may be used to endorse or promote products derived from this software 
   without specific prior written permission.
 
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL ARISTA NETWORKS BE LIABLE FOR ANY DIRECT,
INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
'''

# v0.5 - initial version of the script to discover network topology using
# Arista eAPI and generate output in json format recognized by sFlow-RT.

from jsonrpclib import Server 
import json 
from pprint import pprint

# define the switches in your topology, the eAPI transport protocol (http or https),
# and the eAPI username and password
switch_list = ['switch1.example.com','switch2.example.com']
eapi_transport = 'https'
eapi_username = 'admin'
eapi_password = 'password'

debug = False

# internal variables used by the script
allports = {}
allswitches = {}
allneighbors = []
alllinks = {}

# method to populate allswitches and allports - called only from processNeighbor()
def addPort(switchname, switchIP, portname, ifindex):
 id = switchname + '>' + portname
 prt = allports.setdefault(id, { "portname": portname, "linked": False })
 if ifindex is not None:
  prt["ifindex"] = ifindex
 sw = allswitches.setdefault(switchname, { "name": switchname, "agent": switchIP, "ports": {} });
 if switchIP is not None:
  sw["agent"] = switchIP
 sw["ports"][portname] = prt

# method to collect neighbor records - called with each LLDP neighbor 
# entry as they are discovered
def processNeighbor(localname,localip,localport,localifindex,remotename,remoteport):
 addPort(localname, localip, localport,localifindex);
 addPort(remotename, None, remoteport, None);
 allneighbors.append({ "localname": localname, "localport": localport,
         "remotename": remotename, "remoteport": remoteport });

# method to remove agents that we did not discover properly, or
# that we did not intend to include in the topology.  (If we
# assigned an agent field to the switch then we assume it should stay.)
def pruneAgents():
 # iterate over a copy so entries can be deleted safely (needed on Python 3)
 for nm,sw in list(allswitches.items()):
  #if not "agent" in sw:
  if sw['agent'] == '0.0.0.0' or not sw['agent']:
   del allswitches[nm]

# method to test for a new link - called only from findLinks()
def testLink(nbor,linkno):
 swname1 = nbor["localname"]
 swname2 = nbor["remotename"]
 # one of the switches might have been pruned out
 if swname1 not in allswitches or swname2 not in allswitches:
  return False
 sw1 = allswitches[swname1]
 sw2 = allswitches[swname2]
 pname1 = nbor["localport"]
 pname2 = nbor["remoteport"]
 port1 = sw1["ports"][pname1];
 port2 = sw2["ports"][pname2];
 if not port1["linked"] and not port2["linked"]:
  # add new link
  linkid = "link" + str(linkno)
  port1["linked"] = True;
  port2["linked"] = True;
  alllinks[linkid] = {
   "node1": nbor["localname"],
   "port1": nbor["localport"],
   "node2": nbor["remotename"],
   "port2": nbor["remoteport"]
   }
  return True
 return False

# method to find unique links - call at the end once all the LLDP records have
# been processed from all the switches
def findLinks():
 linkcount = 0
 for nbor in allneighbors:
  if testLink(nbor, linkcount+1):
   linkcount += 1

# method to dump topology in json format recognized by sFlow-RT
def dumpTopology():
 topology = { "nodes": allswitches, "links": alllinks }
 print(json.dumps(topology, indent=4))

# method to get LLDP neighbors of each switch - calls processNeighbor() for each LLDP neighbor found
def getLldpNeighbors(switch_name):
 try:
  switch = Server('%s://%s:%s@%s/command-api' % (eapi_transport, eapi_username, eapi_password, switch_name))

  # Get LLDP neighbors
  commands = ["enable", "show lldp neighbors"]
  response = switch.runCmds(1, commands, 'json')
  neighbors = response[1]['lldpNeighbors']

  # Get local hostname
  commands = ["enable", "show hostname"]
  response = switch.runCmds(1, commands, 'json')
  hostname = response[1]['hostname']

  # Get SNMP ifIndexes
  commands = ["enable", "show snmp mib ifmib ifindex"]
  response = switch.runCmds(1, commands, 'json')
  interfaceIndexes = response[1]['ifIndex']

  # Get sFlow agent source address
  commands = ["enable", "show sflow"]
  response = switch.runCmds(1, commands, 'json')
  sflowAddress = response[1]['ipv4Sources'][0]['ipv4Address']
  
  # Create 2D array lldp_neighbors where each row has the following entries:
  # remote name, local port, remote port, local ifindex
  lldp_neighbors = []
  for neighbor in neighbors:
   lldp_neighbors.append([neighbor['neighborDevice'].split('.')[0], 
        neighbor['port'], neighbor['neighborPort'], interfaceIndexes[neighbor['port']]])
  
  if (debug): 
   pprint(lldp_neighbors)


  # collect switches, ports and neighbor-relationships
  for row in lldp_neighbors:
   processNeighbor(hostname, 
    sflowAddress,
    row[1], # localport
    row[3], # localifindex
    row[0], # remotename
    row[2]) # remoteport

  # Print list of LLDP neighbors in human friendly format:
  # #<n> neighbor, <remote name>, connected to local <port> with remote <port>
  if debug:
   print("Switch %s has following %d neighbors:" % (hostname, len(neighbors)))
   for i, neighbor in enumerate(lldp_neighbors):
    print("#%d neighbor, %s, connected to local %s with remote %s" % (i+1, neighbor[0], neighbor[1], neighbor[2]))

 except Exception as e:
  print('Exception while connecting to %s: %s' % (switch_name, e))
  return []


for switch in switch_list:
 getLldpNeighbors(switch)

pruneAgents()
findLinks()
dumpTopology()
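
Because LLDP is reported from both ends, a link between two discovered switches appears twice in allneighbors, once from each side. The script handles this by marking ports as linked in testLink(). The sketch below shows another way to express the same idea, keying each link on an unordered pair of (switch, port) endpoints. The function name unique_links and the sample records are invented for illustration, and unlike the script it does not skip pruned switches.

```python
def unique_links(neighbors):
    """Collapse LLDP neighbor records seen from both ends into one link each."""
    seen = set()
    links = {}
    for nbor in neighbors:
        end1 = (nbor["localname"], nbor["localport"])
        end2 = (nbor["remotename"], nbor["remoteport"])
        key = frozenset([end1, end2])  # same key whichever end reported it
        if key in seen:
            continue
        seen.add(key)
        links["link%d" % (len(links) + 1)] = {
            "node1": end1[0], "port1": end1[1],
            "node2": end2[0], "port2": end2[1],
        }
    return links

# Invented sample records: the same cable reported by both switches.
records = [
    {"localname": "leaf1", "localport": "Ethernet1",
     "remotename": "spine1", "remoteport": "Ethernet5"},
    {"localname": "spine1", "localport": "Ethernet5",
     "remotename": "leaf1", "remoteport": "Ethernet1"},
]
print(unique_links(records))
```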

The topology discovery script prints a JSON representation of the topology to standard output; it should look something like the following:

{
    "nodes": {
        "leaf332": {
            "name": "leaf332", 
            "agent": "10.10.130.142", 
            "ports": {
                "Management1": {
                    "portname": "Management1", 
                    "ifindex": 999001, 
                    "linked": false
                }, 
...
    "links": {
        "link5": {
            "node1": "leaf260", 
            "node2": "core212", 
            "port2": "Ethernet3/15/1", 
            "port1": "Ethernet31"
        }, 
...
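
Before importing the file, it may be worth checking that the topology is internally consistent. The sketch below assumes only the structure shown above (a nodes map with per-node ports, and a links map with node1/port1/node2/port2) and reports links that refer to a node or port missing from nodes. The function name check_topology and the sample data are illustrative.

```python
def check_topology(topology):
    """Return a list of problems found in a topology dict (hypothetical helper)."""
    problems = []
    nodes = topology.get("nodes", {})
    for link_id, link in sorted(topology.get("links", {}).items()):
        for node_key, port_key in (("node1", "port1"), ("node2", "port2")):
            node = nodes.get(link[node_key])
            if node is None:
                problems.append("%s: unknown node %s" % (link_id, link[node_key]))
            elif link[port_key] not in node.get("ports", {}):
                problems.append("%s: unknown port %s on %s"
                                % (link_id, link[port_key], link[node_key]))
    return problems

# Invented sample: link1 is consistent, link2 points at a missing switch.
sample = {
    "nodes": {
        "leaf1": {"name": "leaf1", "agent": "10.0.0.1",
                  "ports": {"Ethernet1": {"portname": "Ethernet1", "linked": True}}},
        "spine1": {"name": "spine1", "agent": "10.0.0.2",
                   "ports": {"Ethernet5": {"portname": "Ethernet5", "linked": True}}},
    },
    "links": {
        "link1": {"node1": "leaf1", "port1": "Ethernet1",
                  "node2": "spine1", "port2": "Ethernet5"},
        "link2": {"node1": "leaf1", "port1": "Ethernet1",
                  "node2": "spine9", "port2": "Ethernet1"},
    },
}
print(check_topology(sample))
```

To check a saved file, load it with json.load() and pass the result to the function.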

Access the Fabric View web interface at http://fabricview:8008/ and navigate to the Settings page.

Upload the JSON topology file by clicking on the disk icon in the Topology section. Alternatively, the topology can be installed programmatically using the Fabric View REST API documented at the bottom of the Settings page.
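
A programmatic install might look like the sketch below. It rests on two assumptions that should be checked against the REST API documentation on the Settings page: that the topology is installed with an HTTP PUT, and that the path is /topology/json (the path used by sFlow-RT). The file name topology.json and the function name build_topology_request are also illustrative. The request is only constructed here; the commented lines at the end show how it could be sent.

```python
import json
from urllib.request import Request, urlopen

def build_topology_request(topology, base_url="http://fabricview:8008"):
    """Build (but do not send) a PUT request carrying the topology as JSON."""
    body = json.dumps(topology).encode("utf-8")
    return Request(
        base_url + "/topology/json",   # assumed path, see the text above
        data=body,
        method="PUT",                  # assumed method, see the text above
        headers={"Content-Type": "application/json"},
    )

# Construct a request for an empty topology and show where it would go.
request = build_topology_request({"nodes": {}, "links": {}})
print(request.get_method(), request.full_url)

# To send a real topology file:
# with open("topology.json") as f:
#     urlopen(build_topology_request(json.load(f)))
```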

As soon as the topology is installed, traffic data should start appearing in Fabric View. The video provides a quick walkthrough of the software features.