Remote Port Health Manager

 
 

Overview:

Remote Port Health Manager (rphm) monitors interface counters on one or more EOS devices. It sends an SNMP trap to a management station whenever one of those counters increases faster than its defined threshold. Rphm is also easily extensible, so other actions can be added.
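The check described above can be pictured as a comparison between two successive polls. The following is a minimal sketch under stated assumptions, not rphm's actual source: the function name `find_violations` and the snapshot layout (`{interface: {counter: value}}`) are hypothetical and chosen only for illustration.

```python
def find_violations(previous, current, thresholds):
    """Return (interface, counter, delta) for each counter that grew by more
    than its threshold between two polls.

    previous, current: {interface: {counter: value}} snapshots (assumed layout).
    thresholds: {counter: maximum allowed increase per poll interval}.
    """
    violations = []
    for interface, counters in current.items():
        old = previous.get(interface, {})
        for counter, limit in thresholds.items():
            if counter not in counters or counter not in old:
                continue  # no baseline yet, so no rate can be computed
            delta = counters[counter] - old[counter]
            if delta > limit:
                violations.append((interface, counter, delta))
    return violations


previous = {"Ethernet2": {"fcsErrors": 3, "totalInErrors": 10}}
current = {"Ethernet2": {"fcsErrors": 12, "totalInErrors": 15}}
print(find_violations(previous, current, {"fcsErrors": 1, "totalInErrors": 20}))
# prints [('Ethernet2', 'fcsErrors', 9)]
```

In this sketch a violation would be the point at which an action such as sending a trap is triggered.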

Example uses include:

  • I want to know when one of my critical ports gets more than N CRC errors within a given window of time.
  • Am I receiving excessive rxPause frames from certain paths, and if so, when?
  • When is there a burst in broadcast or multicast packets being transmitted from a certain port?
  • Trigger a warning when critical link utilization nears saturation.

Features:

  • Get proactive alerts on your network monitoring system when any selected interface counter grows faster than desired.
  • Run on-switch as an extension or on a separate monitoring server.
  • Define devices, interfaces, and statistics to be monitored as well as thresholds and poll frequency in the simple config file.
  • Supports SNMP v2c and v3.
  • Implemented actions: snmptrap.  Others may be easily added in the source.

Get the extension:

Installable RPM

Source code:

https://github.com/arista-eosext/rphm

Prerequisites:

  • Arista eAPI (EOS 4.12 or later)
  • Rpmbuild tools on Linux are required to build the extension.  If rphm is installed on a non-EOS Linux system, net-snmp and jsonrpclib are also required.

Build and install for EOS

Build the extension:

bash $git clone https://github.com/arista-eosext/rphm.git
bash $cd rphm/
bash $make rpm

Copy to the switch and install:

Arista#copy scp://user@buildhost/<path>/rphm/rpmbuild/rphm-1.0.0-1.rpm extensions:
Arista#extension rphm-1.0.0-1.rpm

Enable eAPI on the switch(es):

Arista(config)#username <name> privilege 15 secret <password>
Arista(config)#management api http-commands
Arista(config-mgmt-api-http-cmds)#no shutdown
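Once eAPI is enabled, the switch accepts JSON-RPC requests at its `/command-api` URL. The sketch below shows what such a request looks like using only the Python standard library. It is an illustration, not how rphm itself connects (rphm depends on jsonrpclib); the helper names `build_request` and `run_cmds` are hypothetical, and the default certificate checks in `urllib` may reject a switch's self-signed certificate.

```python
import base64
import json
import urllib.request


def build_request(cmds, request_id="rphm-sketch"):
    """Build the JSON-RPC 2.0 body for eAPI's runCmds method."""
    return {
        "jsonrpc": "2.0",
        "method": "runCmds",
        "params": {"version": 1, "cmds": list(cmds), "format": "json"},
        "id": request_id,
    }


def run_cmds(url, username, password, cmds, timeout=10):
    """POST the commands to a switch and return the list of results.

    Uses HTTP basic authentication with the eAPI user created above.
    """
    body = json.dumps(build_request(cmds)).encode("utf-8")
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + token,
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["result"]


# Show the request body without contacting a switch.
print(json.dumps(build_request(["show interfaces counters errors"]), indent=2))
```

Calling `run_cmds("https://<switch>/command-api", "<name>", "<password>", ["show interfaces counters errors"])` against a reachable switch would return one result object per command.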

Edit the configuration:

(Default: /persist/sys/rphm.conf)

Arista#bash
[admin@Arista ~]$sudo vi /persist/sys/rphm.conf

Configure [snmp] settings:

Set the traphost, then uncomment and configure the lines that apply to your SNMP version (v2c or v3).

Configure [counters] poll interval:

This is the time to wait between successive polls.

Configure [switches] settings:

For a single switch deployment, add the switch’s hostname or IP to the “switchList” in the [switches] section.  Then in the [DEFAULT] section, set the eAPI connection info, such as username and password and the default interfaceList.

If configuring multiple switches, use the [DEFAULT] section for common items.  Then copy the portions of the [DEFAULT] section that need to be unique into new sections, where each section name is the hostname, IP, or friendly name of a switch.  See the config file for examples.
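The way a per-switch section inherits from [DEFAULT] can be seen with Python's standard `configparser`, whose `%(name)s` interpolation matches the syntax used in the config file. This is a sketch of the inheritance behavior only; whether rphm loads its file in exactly this way is an assumption, and the sample values are placeholders.

```python
import configparser

SAMPLE = """
[DEFAULT]
protocol = https
port = 443
hostname = localhost
username = arista
password = arista
url = %(protocol)s://%(username)s:%(password)s@%(hostname)s:%(port)s/command-api

[vEOS-1]
hostname = 10.10.10.100
password = different-pass
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE)

# Each switch section sees every [DEFAULT] option unless it overrides it,
# and the url template is filled in with the section's own values.
for name in config.sections():
    print(name, config[name]["url"])
# prints vEOS-1 https://arista:different-pass@10.10.10.100:443/command-api
```

Only the two overridden options appear in the [vEOS-1] section; username, port, and the url template come from [DEFAULT].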

Configure per-counter thresholds:

Also in the [DEFAULT] section, set the counterList, which defines the counters to monitor, and adjust the threshold for each of those counters.  Both can be overridden in per-switch sections.
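Because every counter in counterList needs a matching threshold, it can help to check a config before starting the daemon. The following sketch does that with `configparser`. The helpers `split_list` and `missing_thresholds` are hypothetical names, not part of rphm, and the sample deliberately leaves one threshold out.

```python
import configparser

SAMPLE = """
[DEFAULT]
counterList = totalInErrors,
              totalOutErrors,
              fcsErrors,
              symbolErrors
totalInErrors = 20
totalOutErrors = 20
fcsErrors = 1
"""


def split_list(raw):
    """Split a comma-separated, possibly multi-line value into clean names."""
    return [item.strip().strip('"') for item in raw.split(",") if item.strip()]


def missing_thresholds(section):
    """Return the counters listed in counterList that have no threshold option."""
    counters = split_list(section.get("counterList", fallback=""))
    # configparser lower-cases option names on both storage and lookup,
    # so the mixed-case counter names still match.
    return [name for name in counters if name not in section]


config = configparser.ConfigParser()
config.read_string(SAMPLE)
print(missing_thresholds(config["DEFAULT"]))
# prints ['symbolErrors']
```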

Example config

# rphm.conf
#
[snmp]
traphost = snmp-traphost.example.com
# SNMP v2:
version = 2c
community = eosplus

[counters]
# Seconds between polls
poll = 300

[DEFAULT]
#protocol=https
#port=443
#hostname=localhost
#username=arista
#password=arista
#url = %(protocol)s://%(username)s:%(password)s@%(hostname)s:%(port)s/command-api

# The default list of interfaces to monitor on any switch
interfaceList="Management1",
              "Ethernet1",
              "Ethernet2"

# The default list of counters to monitor on each interface.
# NOTE: a threshold must be defined for each counter.
counterList=totalInErrors,
            totalOutErrors,
            fcsErrors,
            symbolErrors

# Default thresholds:
totalInErrors=20
totalOutErrors=20
alignmentErrors=1
fcsErrors=1
symbolErrors=1

[switches]
# Simple method for defining the switch(es) to monitor with default options.
switchList=10.10.10.11,
           localhost,
           spine-l3-04.example.com

# In a multi-switch monitoring setup, defaults may be overridden on a per-switch basis

#[vEOS-1]
#hostname=10.10.10.100
#password="different-pass"
#counterList=inUcastPkts,
#            inDiscards
#inUcastPkts = 4000000

Optionally, test your SNMP configuration from a shell:

The following command will send a single test trap to your traphost:

[admin@Arista ~]$/usr/bin/rphm [--config=<path-to>/my.conf] [--debug] --test=trap

Start rphm:

Arista(config)#daemon rphm
Arista(config-daemon-rphm)#command /usr/bin/rphm
Arista(config-daemon-rphm)#exit

Example trap message:

Rphm uses the enterprise-specific, generic trap OID:
.iso.org.dod.internet.private.arista.generic (.1.3.6.1.4.1.30065.6) string

“Device my-switch-02 DCS-7048T-4S-R, interface Ethernet2: fcsErrors increasing at > 1 per 30 seconds. Found 9/3284 packets in”
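On the management station, the trap string can be split into fields for alert routing. The sketch below is based only on the single example message above, so the pattern is an assumption about the format rather than a specification; `TRAP_PATTERN` and `parse_trap` are hypothetical names.

```python
import re

# Pattern inferred from the one example message; other variants may exist.
TRAP_PATTERN = re.compile(
    r"Device (?P<device>[^\s,]+) (?P<model>[^\s,]+), "
    r"interface (?P<interface>[^\s:]+): "
    r"(?P<counter>\w+) increasing at > (?P<threshold>\d+) "
    r"per (?P<seconds>\d+) seconds"
)


def parse_trap(message):
    """Return the trap fields as a dict, or None if the text does not match."""
    match = TRAP_PATTERN.search(message)
    if match is None:
        return None
    fields = match.groupdict()
    fields["threshold"] = int(fields["threshold"])
    fields["seconds"] = int(fields["seconds"])
    return fields


message = (
    "Device my-switch-02 DCS-7048T-4S-R, interface Ethernet2: "
    "fcsErrors increasing at > 1 per 30 seconds."
)
print(parse_trap(message))
```

Using `search` rather than a full match means any text that follows the first sentence is ignored.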

Save extensions and config:

In order for rphm to run after a reload, save the configuration and extensions.

Arista#copy installed-extensions boot-extensions
Arista#copy running-config startup-config

Other related extensions:

A related extension, the Port Health Monitor script, generates syslog notifications whenever the FCS/symbol error counters on an interface exceed pre-configured levels. Optionally, it will then shut down interfaces that show high error rates over consecutive poll intervals.  The Port Health Monitor thresholds measure the change in the error counters since the script started or the interface came up.  Rphm, on the other hand, measures the change in counters since the last poll.
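The difference between the two baselines is easiest to see on one series of counter samples. This is a simplified sketch of the two comparison styles, not code from either extension; the function names are hypothetical, and the interface-up reset that Port Health Monitor performs is left out.

```python
def since_start_alerts(samples, threshold):
    """Poll indexes where growth since the first sample exceeds the threshold."""
    baseline = samples[0]
    return [i for i, value in enumerate(samples) if value - baseline > threshold]


def since_last_poll_alerts(samples, threshold):
    """Poll indexes where growth since the previous sample exceeds the threshold."""
    return [
        i for i in range(1, len(samples))
        if samples[i] - samples[i - 1] > threshold
    ]


# One error counter sampled at five successive polls.
samples = [0, 3, 6, 9, 30]
print(since_start_alerts(samples, 5))      # prints [2, 3, 4]
print(since_last_poll_alerts(samples, 5))  # prints [4]
```

A slow, steady accumulation eventually trips the since-start check, while the since-last-poll check reacts only to the burst in the final interval.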
