Introduction to Managing EOS Devices – Monitoring

Note: This article is part of the Introduction to Managing EOS Devices series:

https://eos.arista.com/introduction-to-managing-eos-devices/ 

 

 

2) Monitoring

The following monitoring tools provide information on Arista EOS for all platforms:

 

  • General System Health (CPU, Power, Temperature, etc.)
  • Hardware counters (Interfaces, TCAM, etc.)
  • System and process logging
  • Port mirroring (SPAN)
  • Advanced Event Management (AEM)
  • LANZ

Platform specific:

  • Advanced Mirroring (7150S series)
  • Platform-specific “show” commands

 

2.1) Using SNMP for monitoring

Besides CLI “show” commands, most monitoring information can be collected via SNMP. EOS natively provides the ability to walk and search the local MIBs for specific OIDs. These OIDs can then be used to retrieve general system health information, hardware statistics, and more from EOS-powered devices.

switch(config)#show snmp mib ?
  get        Get one object
  get-next   Get the next object
  ifmib      Show SNMP IF-MIB contents
  table      Get the contents of a table
  translate  Translate between OID <-> name
  walk       Walk a subtree

switch(config)#show snmp mib walk ?
  OID   An object-ID (e.g., IP-MIB::ipAddrTable)
  >     Redirect output to URL
  >>    Append redirected output to URL
  |     Output modifiers
  <cr>
 
switch#sh snmp mib walk . | grep -i ifmtu
IF-MIB::ifMtu[1] = INTEGER: 9214
IF-MIB::ifMtu[2] = INTEGER: 9214
[...]
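
When the walk output is collected off-box (for example, saved to a file), it can be turned into structured data with a few lines of scripting. The following is a minimal Python sketch, assuming the output lines keep the "MIB::object[index] = TYPE: value" shape shown above; the names parse_walk and WALK_LINE are illustrative and not part of EOS.

```python
import re

# Matches lines such as "IF-MIB::ifMtu[1] = INTEGER: 9214".
# The line format is assumed from the sample output above.
WALK_LINE = re.compile(
    r"^(?P<mib>[\w-]+)::(?P<obj>\w+)\[(?P<index>\d+)\] = "
    r"(?P<type>[\w-]+): (?P<value>.*)$"
)


def parse_walk(text):
    """Return {(object_name, index): value_string} for every matching line."""
    results = {}
    for line in text.splitlines():
        match = WALK_LINE.match(line.strip())
        if match:
            results[(match["obj"], int(match["index"]))] = match["value"]
    return results


sample = """\
IF-MIB::ifMtu[1] = INTEGER: 9214
IF-MIB::ifMtu[2] = INTEGER: 9214
"""
print(parse_walk(sample))
```

Lines that do not match the pattern (such as the "[...]" truncation marker) are ignored rather than raising an error.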

 

 

2.2) General System Health

When monitoring the overall health of an EOS-based device, internal factors (such as CPU and memory load), environmental factors, and general module status are all useful metrics. This data is available both from the CLI and via SNMP, as shown in the following examples.

 

2.2.1) Inventory and Modules Status

 

To collect the serial number of a switch (all Arista platforms), poll ENTITY-MIB::entPhysicalSerialNum[1] using the following OID:

OID: .1.3.6.1.2.1.47.1.1.1.1.11.1

 

The show command below lists the status of all modules, along with the model and the serial numbers.

Arista 7500E specific output:

7500E(s1)#show module all
Module    Ports Card Type                            Model           Serial No.
--------- ----- ------------------------------------ --------------- -----------
1         2     DCS-7500 Series Supervisor Module    7500-SUP        JPE12271544
2         1     Standby supervisor                   Unknown         Unknown
3         72    48 port 10GbE SFP+ & 2x100G Linecard 7500E-72S-LC    JPE13270113
5         144   36 port 40GbE QSFP+ Linecard         7500E-36Q-LC    JPE13430129
Fabric1   0     DCS-7504-E Fabric Module             7504E-FM        JPE13270417
Fabric2   0     DCS-7504-E Fabric Module             7504E-FM        JPE13120301
Fabric3   0     DCS-7504-E Fabric Module             7504E-FM        JPE13120296
Fabric4   0     DCS-7504-E Fabric Module             7504E-FM        JPE13120285
Fabric5   0     DCS-7504-E Fabric Module             7504E-FM        JPE13120401
Fabric6   0     DCS-7504-E Fabric Module             7504E-FM        JPE13120269

Module    MAC addresses                          Hw      Sw      Status
--------- -------------------------------------- ------- ------- -------
1         00:1c:73:1d:e2:33 - 00:1c:73:1d:e2:33  07.06   4.12.8.1 Active
2                                                        4.12.8.1 Standby
3         00:1c:73:29:03:57 - 00:1c:73:29:03:9e  01.00           Ok
5         00:1c:73:47:42:6c - 00:1c:73:47:42:fb  01.11           Ok
Fabric1                                          01.01           Ok
Fabric2                                          01.01           Ok
Fabric3                                          01.01           Ok
Fabric4                                          01.01           Ok
Fabric5                                          01.01           Ok
Fabric6                                          01.01           Ok

 

Arista 7300X specific output:

7300X(s1)#show module all
Module    Ports Card Type                            Model           Serial No.
--------- ----- ------------------------------------ --------------- -----------
2         3     Supervisor 7300X SSD                 7300-SUP-D      JPE13370363
3         64    48 port 10GbE SFP+ & 4 port QSFP+ LC 7300X-64S-LC    JPE13440152
4         128   32 port 40GbE QSFP+ LC               7300X-32Q-LC    JPE13440514
5         128   32 port 40GbE QSFP+ LC               7300X-32Q-LC    JPE13440499
Fabric1   0     7304X Fabric Module                  7304X-FM        JAS13480124
Fabric2   0     7304X Fabric Module                  7304X-FM        JAS13480036
Fabric3   0     7304X Fabric Module                  7304X-FM        JAS13480130
Fabric4   0     7304X Fabric Module                  7304X-FM        JAS13480119

Module    MAC addresses                          Hw      Sw      Status
--------- -------------------------------------- ------- ------- -------
2         00:1c:73:57:ee:58 - 00:1c:73:57:ee:59  01.01   4.13.6F Active
3         00:1c:73:5a:b9:a8 - 00:1c:73:5a:b9:db  03.08           Ok
4         00:1c:73:58:f6:e8 - 00:1c:73:58:f7:07  03.04           Ok
5         00:1c:73:58:e7:68 - 00:1c:73:58:e7:87  03.04           Ok
Fabric1                                          02.00           Ok
Fabric2                                          02.00           Ok
Fabric3                                          02.00           Ok
Fabric4                                          02.00           Ok

 

2.2.2) Redundant Supervisors (Modular switches: 7500E, 7300X)

 

Modular switches support redundant supervisors, which can operate in either RPR or SSO mode.

  • RPR (Route Processor Redundancy): configurations are synchronized between supervisors, but the forwarding states are not.
  • SSO (Stateful Switchover): both configurations and states are synchronized. See the manual for details on configuration and scope.

To verify the redundancy state of each supervisor (active or standby) and its configuration, use the following “show redundancy” commands:

modular(s2)#show redundancy states
my state = ACTIVE           ← Current state of the supervisor (currently looking at sup2/s2)
peer state = DISABLED       ← No other supervisor detected
     Unit = Secondary
  Unit ID = 2

Redundancy Protocol (Operational) = Simplex    ← Operational status of the redundancy synchronisation
Redundancy Protocol (Configured) = Route Processor Redundancy
Communications = Down
Not ready for switchover (Peer supervisor is powered off)

Last switchover time = 3 days, 19:08:40 ago    ← Last switchover event
Last switchover reason = Other supervisor stopped sending heartbeats

modular(s2)#show redundancy file-replication   ← Even in RPR mode, configs should be in sync
4 files unsynchronized, 0 files synchronized, 0 files failed, 4 files total.

File                         Status               Last Synchronized
---------------------------- -------------------- -----------------
flash:startup-config         Unsynchronized       Never
flash:zerotouch-config       Unsynchronized       Never
file:persist/sys             Unsynchronized       Never
file:persist/secure          Unsynchronized       Never

 

Note: The output above reflects a system with a single supervisor card, so these redundancy states are normal.

The following example, from a dual-supervisor chassis, illustrates the synchronization states with the peer supervisor in standby:

modular(s2)#show redundancy states
my state = ACTIVE
peer state = STANDBY WARM
     Unit = Secondary
  Unit ID = 2

Redundancy Protocol (Operational) = Route Processor Redundancy
Redundancy Protocol (Configured) = Stateful Switchover
Communications = Up
Ready for switchover

Last switchover time = 1 day, 2:55:13 ago
Last switchover reason = Other supervisor stopped sending heartbeats

modular(s2)#show redundancy file-replication
0 files unsynchronized, 3 files synchronized, 1 files failed, 4 files total.

File                         Status             Last Synchronized
---------------------------- ------------------ ------------------
flash:startup-config         Synchronized       1 day, 2:52:37 ago
flash:zerotouch-config       Synchronized       1 day, 2:52:36 ago
file:persist/sys             Synchronized       1 day, 2:52:37 ago
file:persist/secure          Failed             Never
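
The summary line of “show redundancy file-replication” lends itself to automated checks. The following is a minimal Python sketch, assuming the summary line keeps the wording shown in the two outputs above; the function names are illustrative, and treating "all files synchronized" as the healthy condition is an assumption rather than an Arista definition.

```python
import re

# Pattern assumed from the summary line in the sample outputs above.
SUMMARY = re.compile(
    r"(\d+) files unsynchronized, (\d+) files synchronized, "
    r"(\d+) files failed, (\d+) files total"
)


def replication_summary(output):
    """Extract the four counters from the command output."""
    match = SUMMARY.search(output)
    if match is None:
        raise ValueError("summary line not found")
    unsynced, synced, failed, total = (int(group) for group in match.groups())
    return {
        "unsynchronized": unsynced,
        "synchronized": synced,
        "failed": failed,
        "total": total,
    }


def replication_healthy(summary):
    """Assumption: healthy means every tracked file is synchronized."""
    return summary["synchronized"] == summary["total"]


dual_sup = "0 files unsynchronized, 3 files synchronized, 1 files failed, 4 files total."
print(replication_healthy(replication_summary(dual_sup)))
```

With this definition, both sample outputs above are reported as not healthy: the single-supervisor system because nothing is synchronized, and the dual-supervisor system because one file failed.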

 

The Arista 7508E supervisor modules can be monitored via the ARISTA-REDUNDANCY-MIB (ARISTA-REDUNDANCY-MIB.txt). This MIB module provides configuration and status information related to the high availability and/or redundancy infrastructure on Arista 7508E devices.

Please refer to the Arista Networks Manual for further information on the various Redundancy states of the supervisor modules.

ARISTA-REDUNDANCY-MIB::aristaRedundancyProtocolConfig.0 = INTEGER: sso(3)
ARISTA-REDUNDANCY-MIB::aristaRedundancyProtocolOper.0 = INTEGER: sso(3)
ARISTA-REDUNDANCY-MIB::aristaRedundancyUnitState[1] = INTEGER: active(2)
ARISTA-REDUNDANCY-MIB::aristaRedundancyUnitState[2] = INTEGER: standby(1)
ARISTA-REDUNDANCY-MIB::aristaRedundancyUnitStateEntryTime[1] = Timeticks: (0) 0:00:00.00
ARISTA-REDUNDANCY-MIB::aristaRedundancyUnitStateEntryTime[2] = Timeticks: (0) 0:00:00.00
ARISTA-REDUNDANCY-MIB::aristaRedundancyLastSwOverReason.0 = STRING: "Supervisor has control of the active supervisor lock"
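
One simple check on these values is whether the configured and operational redundancy protocols agree. The following is a minimal Python sketch, assuming the walk output has been captured as text in the format shown above; protocol_state and protocol_mismatch are hypothetical helper names.

```python
import re

# Matches scalar INTEGER objects such as
# "ARISTA-REDUNDANCY-MIB::aristaRedundancyProtocolOper.0 = INTEGER: sso(3)".
SCALAR = re.compile(r"::(\w+)\.0 = INTEGER: (\w+)\((\d+)\)")


def protocol_state(walk_text):
    """Return (configured, operational) protocol labels; None when absent."""
    values = {name: label for name, label, _ in SCALAR.findall(walk_text)}
    return (
        values.get("aristaRedundancyProtocolConfig"),
        values.get("aristaRedundancyProtocolOper"),
    )


def protocol_mismatch(walk_text):
    """True when the objects are missing or the two protocols differ."""
    configured, operational = protocol_state(walk_text)
    return configured is None or configured != operational
```

Treating missing objects as a mismatch is a design choice made here so that an empty or truncated walk is not silently reported as healthy.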

 

2.2.3) CPU and Memory Monitoring

The Arista switches utilize multi-core CPUs, the status of which can be viewed quickly from the CLI:

switch#show processes top
top - 19:50:36 up 30 min,  1 user,  load average: 0.06, 0.32, 0.43
Tasks: 164 total,   1 running, 163 sleeping,   0 stopped,   0 zombie
Cpu(s): 16.9%us,  2.8%sy,  0.0%ni, 78.9%id,  1.2%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   4037448k total,  1544396k used,  2493052k free,   126824k buffers
Swap:    0k total,    0k used,    0k free,   938660k cached
 PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
1764 root  20   0  644m  85m  49m S  3.1  2.2   0:49.16 FocalPointV2
1461 root  20   0  241m  25m 2488 S  1.6  0.7   0:07.45 ProcMgr-worker
1462 root  20   0  268m  96m  57m S  1.6  2.5   0:48.22 Sysdb
   1 root  20   0 23520  11m 9676 S  0.0  0.3   0:00.97 init
   2 root  20   0    0   0    0 S  0.0  0.0   0:00.00 kthreadd
[...]

 

CPU utilization can also be observed at a finer granularity, per CPU core. To access this view, press the number ‘1’ while the “show processes top” output is displayed. The output then lists each individual CPU core, as shown below.

 

Arista 7500E / 7300X specific output:

The example below, from a dual quad-core Arista 7500E, details the 8 cores, CPU0 through CPU7:

top - 14:24:46 up 10 days,  4:33,  2 users,  load average: 1.20, 1.22, 1.21
Tasks: 316 total,   2 running, 314 sleeping,   0 stopped,   0 zombie
Cpu0  :  4.4%us,  1.5%sy,  0.0%ni, 90.0%id,  0.0%wa,  0.0%hi,  3.1%si,  0.0%st
Cpu1  :  2.3%us,  0.0%sy,  0.0%ni, 97.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  :  4.6%us,  0.0%sy,  0.0%ni, 95.4%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  :  0.8%us,  0.0%sy,  0.0%ni, 99.2%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu4  :  3.1%us,  0.8%sy,  0.0%ni, 96.1%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu5  :  5.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu6  :  0.8%us,  0.0%sy,  0.0%ni, 99.2%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu7  :  0.8%us,  0.0%sy,  0.0%ni, 99.2%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  16023176k total,  3983008k used, 12040168k free,   202772k buffers
Swap:        0k total,        0k used,        0k free,  1506516k cached

 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                              
2026 root      20   0  407m  72m  24m S  2.6  0.5 359:40.36 Stp
15006 root     20   0  497m  77m  43m S  3.9  0.5 446:08.96 SandCounters
2114 root      20   0  394m  59m  22m S  2.3  0.4 335:38.31 Smbus   
12702 brose    20   0  756m  13m  10m R  1.5  0.1   0:00.14 top  
14163 root     20   0  385m  47m  15m S  0.8  0.3 125:23.03 AgentMonitor  
14236 root     20   0  404m  71m  33m S  0.8  0.5  33:34.10 Lag+LacpAgent   
14594 root     20   0  389m  53m  19m S  0.8  0.3   5:47.21 GpioLedAgent
[...]
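
The per-core lines can also be reduced to a single "busy" figure per core by subtracting the idle percentage from 100. The following is a minimal Python sketch, assuming the output has been captured as text in the format shown above; busy_per_core is an illustrative name.

```python
import re

# Matches per-core lines such as
# "Cpu0  :  4.4%us,  1.5%sy,  0.0%ni, 90.0%id, ..." and captures the
# core number and the idle percentage. The aggregate "Cpu(s):" line
# is deliberately not matched.
CORE_LINE = re.compile(r"^Cpu(\d+)\s*:.*?([\d.]+)%id", re.MULTILINE)


def busy_per_core(top_output):
    """Return {core_number: busy_percent}, where busy = 100 - idle."""
    return {
        int(core): round(100.0 - float(idle), 1)
        for core, idle in CORE_LINE.findall(top_output)
    }


sample = """\
Cpu0  :  4.4%us,  1.5%sy,  0.0%ni, 90.0%id,  0.0%wa,  0.0%hi,  3.1%si,  0.0%st
Cpu1  :  2.3%us,  0.0%sy,  0.0%ni, 97.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
"""
print(busy_per_core(sample))  # {0: 10.0, 1: 2.3}
```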

 

Arista 7150S specific output:

top - 13:19:20 up 15 days,  7:43,  1 user,  load average: 0.70, 0.61, 0.68
Tasks: 310 total,   3 running, 307 sleeping,   0 stopped,   0 zombie
Cpu0  :  3.4%us,  0.8%sy,  0.0%ni, 95.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  :  3.2%us,  0.8%sy,  0.0%ni, 96.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   3624988k total,  3276320k used,   348668k free,   161684k buffers
Swap:        0k total,        0k used,        0k free,  1549560k cached

 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
24035 root      20   0  752m 206m  87m S  6.2  5.8  85:06.55 SandFap
21420 root      20   0  639m 251m 113m R  4.6  7.1 120:40.77 Sysdb
24025 root      20   0  525m  71m  26m S  4.1  2.0  93:32.74 Smbus
24074 root      20   0  517m  76m  30m S  3.6  2.2  77:29.21 XcvrAgent
21965 root      20   0  516m  72m  25m S  3.1  2.0  66:12.18 Smbus
23864 root      20   0  630m  89m  42m S  3.1  2.5  77:28.69 SandCounters

 

For more details on understanding and troubleshooting CPU utilization, please refer to the following EOS Central articles:

https://eos.arista.com/understanding-cpu-utilization/

https://eos.arista.com/troubleshooting-high-cpu-utilization/

 

The CPU and memory status can also be retrieved using SNMP. Under the HOST-RESOURCES-MIB, a dual-core CPU appears as three distinct processors: the first provides an average view of the two physical cores that follow. The values are percentages expressed as integers:

switch#sh snmp mib walk 1.3.6.1.2.1.25

 

Arista 7150S specific output:

HOST-RESOURCES-MIB::hrDeviceDescr[1] = STRING: AMD Turion(tm) II Neo N41H Dual-Core Processor
[...]
HOST-RESOURCES-MIB::hrDeviceDescr[2] = STRING: Core 1
HOST-RESOURCES-MIB::hrDeviceDescr[3] = STRING: Core 2
HOST-RESOURCES-MIB::hrProcessorLoad[1] = INTEGER: 30   ← Average
HOST-RESOURCES-MIB::hrProcessorLoad[2] = INTEGER: 30
HOST-RESOURCES-MIB::hrProcessorLoad[3] = INTEGER: 30

 

Arista 7500E / 7300X specific output:

HOST-RESOURCES-MIB::hrDeviceDescr[1] = STRING: Intel(R) Xeon(R) CPU  @ 2.60GHz
HOST-RESOURCES-MIB::hrDeviceDescr[2] = STRING: Core 1
HOST-RESOURCES-MIB::hrDeviceDescr[3] = STRING: Core 2
HOST-RESOURCES-MIB::hrDeviceDescr[4] = STRING: Core 3
HOST-RESOURCES-MIB::hrDeviceDescr[5] = STRING: Core 4
[...]
HOST-RESOURCES-MIB::hrProcessorLoad[1] = INTEGER: 2   ← Average
HOST-RESOURCES-MIB::hrProcessorLoad[2] = INTEGER: 4
HOST-RESOURCES-MIB::hrProcessorLoad[3] = INTEGER: 4
HOST-RESOURCES-MIB::hrProcessorLoad[4] = INTEGER: 4
HOST-RESOURCES-MIB::hrProcessorLoad[5] = INTEGER: 4
HOST-RESOURCES-MIB::hrProcessorLoad[6] = INTEGER: 0
HOST-RESOURCES-MIB::hrProcessorLoad[7] = INTEGER: 0
HOST-RESOURCES-MIB::hrProcessorLoad[8] = INTEGER: 0
HOST-RESOURCES-MIB::hrProcessorLoad[9] = INTEGER: 0
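
The relationship between the first entry and the per-core entries can be sanity-checked with a short script. The following is a minimal Python sketch, assuming the hrProcessorLoad values have already been collected into a list in index order; the function name and the tolerance of 1 (to absorb integer rounding) are assumptions made for illustration.

```python
def average_matches_cores(loads, tolerance=1):
    """Check that loads[0] is roughly the mean of the per-core values.

    loads[0] is the aggregate entry (index 1 in the MIB output above);
    loads[1:] are the per-core entries that follow it.
    """
    aggregate, cores = loads[0], loads[1:]
    if not cores:
        raise ValueError("need at least one per-core value")
    mean = sum(cores) / len(cores)
    # hrProcessorLoad is reported as an integer, so allow for rounding.
    return abs(aggregate - mean) <= tolerance


# Values taken from the 7500E / 7300X sample output above.
print(average_matches_cores([2, 4, 4, 4, 4, 0, 0, 0, 0]))  # True
```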

 

Note: hrProcessorLoad represents the average percentage of time the processor was not idle, whereas the ‘load average’ seen in ‘show processes top’ is the average number of processes waiting to run over the last 1, 5, and 15 minutes.

Memory utilization can be monitored using the following OIDs, which provide the description, the total amount of memory, and the amount in use. These items are common to most Arista switches.

 

HOST-RESOURCES-MIB::hrStorageDescr[1] = STRING: RAM
HOST-RESOURCES-MIB::hrStorageSize[1] = INTEGER: 4037448
HOST-RESOURCES-MIB::hrStorageUsed[1] = INTEGER: 1543660
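
A utilization percentage can be derived from these two values. The following is a minimal Python sketch; the function name is illustrative. It relies on hrStorageSize and hrStorageUsed being expressed in the same allocation units for a given storage entry, so the ratio does not depend on the unit size.

```python
def memory_used_percent(storage_size, storage_used):
    """Return RAM utilization as a percentage of the total."""
    if storage_size <= 0:
        raise ValueError("storage_size must be positive")
    return 100.0 * storage_used / storage_size


# Values taken from the sample output above.
print(round(memory_used_percent(4037448, 1543660), 1))  # 38.2
```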

 

 

2.2.4) Environmental Monitoring

Each device is equipped with an array of sensors covering temperature monitoring, fan speed and power availability. The detailed information available through the CLI maps directly to a number of OIDs.

The “show environment all” command displays the environmental values for the entire unit.

For example, a fully configured Arista 7508E chassis has 2 supervisors, 6 fabric modules, 4 power supplies and 6 fan trays.

Arista 7500E specific output:

7500E(s1)#show environment all
System temperature status is: Ok

Supervisor 1:
                                                               Alert   Critical
Sensor  Description                            Temperature  Threshold  Threshold
------- ------------------------------------ ------------- ---------- ----------
1       Intake                                     16.250C        50C        65C
2       CPU                                        30.500C        60C        75C
3       Exhaust                                    16.750C        60C        75C
4       CPU VRM temp sensor                        43.000C       105C       110C
[...]

Linecard 3:   ← Note: line card slots start at slot 3 (e.g. the 1st of 8 line cards is in slot 3; the 8th and last would be in slot 10)
                                                               Alert   Critical
Sensor  Description                            Temperature  Threshold  Threshold
------- ------------------------------------ ------------- ---------- ----------
1       Board sensor                               33.000C        75C        85C
2       Switch chip 1 sensor                       33.000C        93C        98C
3       Switch chip 2 sensor                       33.000C        93C        98C
4       Switch chip 3 sensor                       34.000C        93C        98C
5       Inlet sensor                               18.000C        60C        70C
6       Board sensor                               28.000C        75C        85C
7       Outlet sensor                              24.000C        75C        85C
[...]

Fabric 1:
                                                               Alert   Critical
Sensor  Description                            Temperature  Threshold  Threshold
------- ------------------------------------ ------------- ---------- ----------
1       Outlet sensor                              36.000C        95C       105C
2       Fan controller 1 sensor                    38.000C        95C       105C
3       Fabric chip 1 sensor                       42.000C        95C       105C
[...]   ← continues till fabric 6

PowerSupply 1:
                                                               Alert   Critical
Sensor  Description                            Temperature  Threshold  Threshold
------- ------------------------------------ ------------- ---------- ----------
1       Power supply sensor                            N/A        65C        70C
2       Power supply sensor                        23.125C        65C        70C
[...]   ← continues till PSU 4

System cooling status is: Ok
Ambient temperature: 16C
Airflow: front-to-back
Fan Tray         Status           Speed
---------------- --------------- ------
1                Ok                 46%
2                Ok                 46%
3                Ok                 46%
4                Ok                 46%
5                Ok                 46%
6                Ok                 46%
PowerSupply1     Ok                 70%
PowerSupply2     Ok                 70%
PowerSupply3     Ok                 70%
PowerSupply4     Ok                 70%

Power                                  Input    Output   Output
Supply  Model                Capacity  Current  Current  Power    Status
------- -------------------- --------- -------- -------- -------- -------------
1       PWR-2900AC               2900W    0.00A    0.00A     0.0W Power Loss
2       PWR-2900AC               2900W    0.00A    0.00A     0.0W Power Loss
3       PWR-2900AC               2900W    0.00A    0.00A     0.0W Power Loss
4       PWR-2900AC               2900W    6.12A  116.14A  1384.0W Ok

Arista 7300X specific output:

7300X(s1)#show environment all
System temperature status is: Ok

Supervisor 2:
                                                               Alert   Critical
Sensor  Description                            Temperature  Threshold  Threshold
------- ------------------------------------ ------------- ---------- ----------
1       Digital Temperature Sensor on cpu0         42.000C        95C       105C
2       Digital Temperature Sensor on cpu1         35.000C        95C       105C
3       Digital Temperature Sensor on cpu2         31.000C        95C       105C
4       Digital Temperature Sensor on cpu3         35.000C        95C       105C
5       Supervisor temp sensor                     34.000C        75C        85C
6       Plx0 sensor                                55.000C       100C       103C
7       Plx1 sensor                                53.000C       100C       103C
8       Rear sensor                                23.000C        65C        75C
9       Front sensor                               19.000C        65C        75C
10      CPU VRM temp sensor 0                      26.000C       105C       110C
11      CPU VRM temp sensor 1                      26.000C       105C       110C

Linecard 3:     ← Note: line card slots start at slot 3 (e.g. the 1st of 8 line cards is in slot 3; the 8th and last would be in slot 10)
                                                               Alert   Critical
Sensor  Description                            Temperature  Threshold  Threshold
------- ------------------------------------ ------------- ---------- ----------
1       Board sensor                               29.000C        75C        85C
2       Board Front                                21.000C        65C        75C
3       Board Back                                 28.000C        65C        75C
4       Pcie Switch                                40.000C       100C       110C
5       T2 PCB                                     34.000C        95C       105C
6       Trident Bottom Right Outer                 42.253C       100C       110C
7       Trident Bottom Left Outer                  42.253C       100C       110C
8       Trident Top Left Outer                     42.253C       100C       110C
9       Trident Top Right Outer                    44.965C       100C       110C
10      Trident Bottom Right Inner                 41.168C       100C       110C
11      Trident Bottom Left Inner                  42.253C       100C       110C
12      Trident Top Left Inner                     40.626C       100C       110C
13      Trident Top Right Inner                    40.626C       100C       110C
[...]    ← continues for other line cards


Fabric 1:
                                                               Alert   Critical
Sensor  Description                            Temperature  Threshold  Threshold
------- ------------------------------------ ------------- ---------- ----------
1       Board sensor                               39.000C        70C        75C
2       Board Back                                 37.000C        65C        75C
3       Board Front LC6                            30.000C        65C        75C
4       Fabric chip 1 temp sensor                  38.000C        95C       105C
5       Trident Bottom Right Outer                 55.270C       100C       110C
6       Trident Bottom Left Outer                  54.186C       100C       110C
7       Trident Top Left Outer                     57.440C       100C       110C
8       Trident Top Right Outer                    57.440C       100C       110C
9       Trident Bottom Right Inner                 55.270C       100C       110C
10      Trident Bottom Left Inner                  56.355C       100C       110C
11      Trident Top Left Inner                     57.440C       100C       110C
12      Trident Top Right Inner                    54.186C       100C       110C
[...]    ← continues till fabric 4

PowerSupply 1:
                                                               Alert   Critical
Sensor  Description                            Temperature  Threshold  Threshold
------- ------------------------------------ ------------- ---------- ----------
1       Primary heatsink sensor                    29.750C        45C        50C
2       Fan air temperature sensor                 23.500C        45C        50C
[...]    ← continues till PSU 4

System cooling status is: Ok
Ambient temperature: 19C
Airflow: front-to-back
Fan Tray         Status           Speed
---------------- --------------- ------
1/1              Ok                 30%
1/2              Ok                 30%
2/1              Ok                 30%
2/2              Ok                 30%
3/1              Ok                 30%
3/2              Ok                 30%
4/1              Ok                 30%
4/2              Ok                 30%
PowerSupply1     Ok                 30%
PowerSupply2     Not Inserted       N/A
PowerSupply3     Not Inserted       N/A
PowerSupply4     Ok                 30%

Power                                  Input    Output   Output
Supply  Model                Capacity  Current  Current  Power    Status
------- -------------------- --------- -------- -------- -------- -------------
1       PWR-2700-AC-F            2700W    1.63A   28.16A   338.5W Ok
4       PWR-2700-AC-F            2700W    1.77A   30.12A   362.5W Ok

 

Arista 1RU (e.g. 7150S) specific output:

switch#show environment all
System temperature status is: Ok
                                                               Alert   Critical
Sensor  Description                            Temperature  Threshold  Threshold
------- ------------------------------------ ------------- ---------- ----------
1       Cpu temp sensor                            30.435C        95C       100C
2       Rear temp sensor                           31.500C        55C        65C
3       Board temp sensor                          21.000C        55C        65C
4       Front-panel temp sensor                    20.000C        42C        55C
5       Board temp sensor                          30.000C        75C        85C
6       FM6000 temp sensor                         38.000C        92C       100C

PowerSupply 1:
                                                               Alert   Critical
Sensor  Description                            Temperature  Threshold  Threshold
------- ------------------------------------ ------------- ---------- ----------
1       Power supply sensor                        26.000C        50C        70C

System cooling status is: Ok
Ambient temperature: 20C
Airflow: front-to-back
Fan Tray         Status           Speed
---------------- --------------- ------
1                Ok                 60%
2                Ok                 60%
3                Ok                 60%
4                Ok                 60%
PowerSupply1     Ok                 60%

Power                                  Input    Output   Output
Supply  Model                Capacity  Current  Current  Power    Status
------- -------------------- --------- -------- -------- -------- -------------
1       PWR-460AC-F               460W    0.47A    8.00A    97.0W Ok

 

Note: If the temperature reaches the Alert threshold, all fans run at maximum speed and a warning message is logged. If the temperature reaches the Critical threshold, the component is immediately shut down, with the status LED flashing orange, in order to prevent damage.
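
A monitoring script can apply the same two-threshold logic to a polled temperature. The following is a minimal Python sketch, assuming "reaches" means greater than or equal to the threshold; the function name and the returned labels are illustrative and are not EOS status strings.

```python
def temperature_status(temperature_c, alert_c, critical_c):
    """Classify a temperature against its Alert and Critical thresholds."""
    if temperature_c >= critical_c:
        return "critical"
    if temperature_c >= alert_c:
        return "alert"
    return "ok"


# CPU sensor from the 7500E sample output: 30.5C, Alert 60C, Critical 75C.
print(temperature_status(30.5, 60, 75))  # ok
```

The Critical threshold is checked first so that a reading above both thresholds is reported with the more severe label.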

The following ENTITY-MIB and ENTITY-SENSOR-MIB OIDs provide temperature monitoring for the sensors listed. Each integer value is the temperature in degrees Celsius multiplied by 10:

 

Arista 7500E specific output:

Note: The 7500E SNMP index values (e.g. 101006001) identify the temperature sensors of the modules in the switch. The third digit from the left (the second ‘1’ in 101006001) is the actual slot number in the chassis. The temperature sensors of the line card in slot 3 therefore have SNMP index values from 103006001 onwards.
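
The slot number can be extracted from an index programmatically. The following is a minimal Python sketch that follows the note above literally (third digit from the left); the function name is illustrative, and how slots with two-digit numbers are encoded is not covered by the note, so this sketch should not be relied on for them.

```python
def slot_from_sensor_index(index):
    """Return the chassis slot encoded as the third digit of the index.

    Assumes a 9-digit index such as 101006001 or 103006001, as in the
    examples in this article.
    """
    digits = str(index)
    if len(digits) != 9:
        raise ValueError("expected a 9-digit SNMP index")
    return int(digits[2])


print(slot_from_sensor_index(103006001))  # 3
```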

 

Below are the temperature sensors for the Arista 7508E / 7300X supervisor modules:

ENTITY-MIB::entPhysicalDescr[101006001] = STRING: Digital Temperature Sensor on cpu0
ENTITY-MIB::entPhysicalDescr[101006002] = STRING: Digital Temperature Sensor on cpu1
ENTITY-MIB::entPhysicalDescr[101006003] = STRING: Digital Temperature Sensor on cpu2
ENTITY-MIB::entPhysicalDescr[101006004] = STRING: Digital Temperature Sensor on cpu3
ENTITY-MIB::entPhysicalDescr[101006005] = STRING: Supervisor temp sensor
ENTITY-MIB::entPhysicalDescr[101006006] = STRING: PlxLc sensor
ENTITY-MIB::entPhysicalDescr[101006007] = STRING: PlxFc sensor
ENTITY-MIB::entPhysicalDescr[101006008] = STRING: Rear sensor
ENTITY-MIB::entPhysicalDescr[101006009] = STRING: Front sensor
ENTITY-MIB::entPhysicalDescr[101006010] = STRING: CPU VRM temp sensor 0
[...]     ← etc. (omitted: Power supplies 2 to 4)

 

Below are the temperature values of the aforementioned sensors for the Arista 7508E / 7300X supervisor modules. Each integer output is degrees Celsius x 10.

ENTITY-SENSOR-MIB::entPhySensorValue[101006001] = INTEGER: 330
ENTITY-SENSOR-MIB::entPhySensorValue[101006002] = INTEGER: 320
ENTITY-SENSOR-MIB::entPhySensorValue[101006003] = INTEGER: 270
ENTITY-SENSOR-MIB::entPhySensorValue[101006004] = INTEGER: 350
ENTITY-SENSOR-MIB::entPhySensorValue[101006005] = INTEGER: 280
[...]
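
Because the values are reported in tenths of a degree, they need to be divided by 10 before being compared with the thresholds shown in the CLI. The following is a minimal Python sketch, assuming the walk output has been captured as text in the format above; sensor_temperatures is an illustrative name. It applies only to temperature sensors, since other sensor types (such as fans) use different units.

```python
import re

# Matches e.g. "ENTITY-SENSOR-MIB::entPhySensorValue[101006001] = INTEGER: 330".
VALUE_LINE = re.compile(r"entPhySensorValue\[(\d+)\] = INTEGER: (-?\d+)")


def sensor_temperatures(walk_text):
    """Return {snmp_index: degrees_celsius} from entPhySensorValue lines."""
    return {
        int(index): int(raw) / 10.0
        for index, raw in VALUE_LINE.findall(walk_text)
    }


sample = """\
ENTITY-SENSOR-MIB::entPhySensorValue[101006001] = INTEGER: 330
ENTITY-SENSOR-MIB::entPhySensorValue[101006002] = INTEGER: 320
"""
print(sensor_temperatures(sample))  # {101006001: 33.0, 101006002: 32.0}
```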

 

The Arista 7500E line cards each have multiple sensors, which can be monitored via the ENTITY-SENSOR-MIB as shown below. There can be 4 or 8 line cards, depending on the 7500E model; the example below shows the output for a single line card in slot 3, model DCS-7500E-36Q-LC (36x40G line card). Other line cards may have a different number of chips and sensors.

ENTITY-MIB::entPhysicalDescr[103006001] = STRING: Inlet sensor
ENTITY-MIB::entPhysicalDescr[103006002] = STRING: Board sensor
ENTITY-MIB::entPhysicalDescr[103006003] = STRING: Outlet sensor
ENTITY-MIB::entPhysicalDescr[103006004] = STRING: Board sensor
ENTITY-MIB::entPhysicalDescr[103006005] = STRING: Switch chip 1 sensor
ENTITY-MIB::entPhysicalDescr[103006006] = STRING: Switch chip 2 sensor
[...]

 

Each integer output is degrees Celsius x 10:

ENTITY-SENSOR-MIB::entPhySensorValue[103006001] = INTEGER: 288
ENTITY-SENSOR-MIB::entPhySensorValue[103006002] = INTEGER: 590
ENTITY-SENSOR-MIB::entPhySensorValue[103006003] = INTEGER: 353
ENTITY-SENSOR-MIB::entPhySensorValue[103006004] = INTEGER: 570
[...]

 

Below is the equivalent output for an Arista 7300X line card, the DCS-7300X-32Q-LC (32x40G line card).

ENTITY-MIB::entPhysicalDescr[103006001] = STRING: Board sensor
ENTITY-MIB::entPhysicalDescr[103006002] = STRING: Board Front
ENTITY-MIB::entPhysicalDescr[103006003] = STRING: Board Back
ENTITY-MIB::entPhysicalDescr[103006004] = STRING: Pcie Switch
ENTITY-MIB::entPhysicalDescr[103006005] = STRING: Trident Bottom Right Outer
ENTITY-MIB::entPhysicalDescr[103006006] = STRING: Trident Bottom Left Outer
ENTITY-MIB::entPhysicalDescr[103006007] = STRING: Trident Top Left Outer
ENTITY-MIB::entPhysicalDescr[103006008] = STRING: Trident Top Right Outer
ENTITY-MIB::entPhysicalDescr[103006009] = STRING: Trident Bottom Right Inner
[...]

Each integer output is degrees Celsius x 10:

ENTITY-SENSOR-MIB::entPhySensorValue[103006001] = INTEGER: 288
ENTITY-SENSOR-MIB::entPhySensorValue[103006002] = INTEGER: 590
ENTITY-SENSOR-MIB::entPhySensorValue[103006003] = INTEGER: 353
ENTITY-SENSOR-MIB::entPhySensorValue[103006004] = INTEGER: 570
[...]

 

The 7500E has six fabric modules, which include the chassis fans. The MIB therefore also represents six fan trays. Fan speed is measured in RPM and is shown in the CLI as a percentage of the maximum nominal speed of 27000 RPM:

ENTITY-MIB::entPhysicalDescr[100601000] = STRING: Fan Tray Slot 1
ENTITY-MIB::entPhysicalDescr[100601100] = STRING: Fan Tray 1
ENTITY-MIB::entPhysicalDescr[100601110] = STRING: Fan Tray 1 Fan 1
ENTITY-MIB::entPhysicalDescr[100601111] = STRING: Fan Tray 1 Fan 1 Sensor 1
ENTITY-MIB::entPhysicalDescr[100601120] = STRING: Fan Tray 1 Fan 2
ENTITY-MIB::entPhysicalDescr[100601122] = STRING: Fan Tray 1 Fan 2 Sensor 2
[...]     ← etc. (omitted: some fans in Tray 1 and Fan Trays 2 to 6)
ENTITY-SENSOR-MIB::entPhySensorValue[100601111] = INTEGER: 6815
ENTITY-SENSOR-MIB::entPhySensorValue[100601122] = INTEGER: 6960
ENTITY-SENSOR-MIB::entPhySensorValue[100601133] = INTEGER: 6815
ENTITY-SENSOR-MIB::entPhySensorValue[100601144] = INTEGER: 6815
ENTITY-SENSOR-MIB::entPhySensorValue[100601155] = INTEGER: 6960
[...]     ← etc. (omitted: Fan Trays 2 to 6)
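The relationship between the RPM value in the MIB and the percentage in the CLI can be sketched as below. This is only an illustration: the helper name fan_speed_percent is hypothetical, and the exact rounding applied by the CLI is not stated in this article.

```python
MAX_NOMINAL_RPM = 27000  # maximum nominal fan speed quoted in the text above


def fan_speed_percent(rpm, max_rpm=MAX_NOMINAL_RPM):
    """Express a fan speed in RPM as a percentage of the nominal maximum."""
    return 100.0 * rpm / max_rpm


# Raw values taken from the entPhySensorValue output above
for rpm in (6815, 6960):
    print(rpm, "RPM =", round(fan_speed_percent(rpm), 1), "%")
```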

 

The Arista 7500E chassis has 4 power supplies. Each power supply includes a fan that can be monitored via SNMP, along with other power supply metrics:

ENTITY-MIB::entPhysicalDescr[100710000] = STRING: Power Supply Slot 1
ENTITY-MIB::entPhysicalDescr[100711000] = STRING: PowerSupply1
ENTITY-MIB::entPhysicalDescr[100711101] = STRING: Power supply sensor
ENTITY-MIB::entPhysicalDescr[100711102] = STRING: Power supply sensor
ENTITY-MIB::entPhysicalDescr[100711103] = STRING: PowerSupply1 input current sensor
ENTITY-MIB::entPhysicalDescr[100711104] = STRING: PowerSupply1 output current sensor
ENTITY-MIB::entPhysicalDescr[100711105] = STRING: PowerSupply1 input voltage sensor
ENTITY-MIB::entPhysicalDescr[100711106] = STRING: PowerSupply1 output voltage sensor
ENTITY-MIB::entPhysicalDescr[100711210] = STRING: PowerSupply1 Fan 1
ENTITY-MIB::entPhysicalDescr[100711211] = STRING: PowerSupply1 Fan 1 Sensor 1
[...]     ← etc. (omitted: Power supplies 2 to 4)
ENTITY-SENSOR-MIB::entPhySensorValue[100711101] = INTEGER: 300
ENTITY-SENSOR-MIB::entPhySensorValue[100711102] = INTEGER: 249
ENTITY-SENSOR-MIB::entPhySensorValue[100711103] = INTEGER: 222
ENTITY-SENSOR-MIB::entPhySensorValue[100711104] = INTEGER: 4020
ENTITY-SENSOR-MIB::entPhySensorValue[100711105] = INTEGER: 24550
ENTITY-SENSOR-MIB::entPhySensorValue[100711106] = INTEGER: 1193
ENTITY-SENSOR-MIB::entPhySensorValue[100711211] = INTEGER: 8546
[...]     ← etc. (omitted: Power supplies 2 to 4)

 

The Arista 7304 chassis has 4 power supplies and the Arista 7308 chassis has 6 power supplies. Each power supply includes a fan which can be monitored via SNMP, as well as other power supply metrics:

ENTITY-MIB::entPhysicalDescr[100710000] = STRING: Power Supply Slot 1
ENTITY-MIB::entPhysicalDescr[100711000] = STRING: PowerSupply1
ENTITY-MIB::entPhysicalDescr[100711101] = STRING: Primary heatsink sensor
ENTITY-MIB::entPhysicalDescr[100711102] = STRING: Fan air temperature sensor
ENTITY-MIB::entPhysicalDescr[100711103] = STRING: PowerSupply1 input current sensor
ENTITY-MIB::entPhysicalDescr[100711104] = STRING: PowerSupply1 output current sensor
ENTITY-MIB::entPhysicalDescr[100711105] = STRING: PowerSupply1 input voltage sensor
ENTITY-MIB::entPhysicalDescr[100711106] = STRING: PowerSupply1 output voltage sensor
ENTITY-MIB::entPhysicalDescr[100711210] = STRING: PowerSupply1 Fan 1
ENTITY-MIB::entPhysicalDescr[100711211] = STRING: PowerSupply1 Fan 1 Sensor 1
[...]     ← etc. (omitted: Power supplies 2 to 4)
ENTITY-SENSOR-MIB::entPhySensorType[100711101] = INTEGER: celsius(8)
ENTITY-SENSOR-MIB::entPhySensorType[100711102] = INTEGER: celsius(8)
ENTITY-SENSOR-MIB::entPhySensorType[100711103] = INTEGER: amperes(5)
ENTITY-SENSOR-MIB::entPhySensorType[100711104] = INTEGER: amperes(5)
ENTITY-SENSOR-MIB::entPhySensorType[100711105] = INTEGER: voltsAC(3)
ENTITY-SENSOR-MIB::entPhySensorType[100711106] = INTEGER: voltsAC(3)
ENTITY-SENSOR-MIB::entPhySensorType[100711211] = INTEGER: rpm(10)
[...]     ← etc. (omitted: Power supplies 2 to 4)

ENTITY-SENSOR-MIB::entPhySensorValue[100711101] = INTEGER: 312
ENTITY-SENSOR-MIB::entPhySensorValue[100711102] = INTEGER: 258
ENTITY-SENSOR-MIB::entPhySensorValue[100711103] = INTEGER: 264
ENTITY-SENSOR-MIB::entPhySensorValue[100711104] = INTEGER: 4750
ENTITY-SENSOR-MIB::entPhySensorValue[100711105] = INTEGER: 23750
ENTITY-SENSOR-MIB::entPhySensorValue[100711106] = INTEGER: 1206
ENTITY-SENSOR-MIB::entPhySensorValue[100711211] = INTEGER: 8500
[...]     ← etc. (omitted: Power supplies 2 to 4)
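Because the description, sensor type, and value are reported as separate rows sharing the same entity index, a collector usually joins them. The sketch below shows one possible way to do that in Python; join_by_index is a hypothetical helper, and the values are kept as raw text because the scaling (entPhySensorScale and entPhySensorPrecision in ENTITY-SENSOR-MIB) is not shown in the output above.

```python
import re
from collections import defaultdict

# Matches e.g.
# ENTITY-SENSOR-MIB::entPhySensorType[100711101] = INTEGER: celsius(8)
ROW = re.compile(r"::(\w+)\[(\d+)\] = (?:STRING|INTEGER): (.+)$")


def join_by_index(walk_output):
    """Group walk output into {entity index: {object name: value text}}."""
    table = defaultdict(dict)
    for line in walk_output.splitlines():
        match = ROW.search(line.strip())
        if match:
            name, index, value = match.groups()
            table[int(index)][name] = value
    return dict(table)


sample = """\
ENTITY-MIB::entPhysicalDescr[100711103] = STRING: PowerSupply1 input current sensor
ENTITY-SENSOR-MIB::entPhySensorType[100711103] = INTEGER: amperes(5)
ENTITY-SENSOR-MIB::entPhySensorValue[100711103] = INTEGER: 264
ENTITY-MIB::entPhysicalDescr[100711211] = STRING: PowerSupply1 Fan 1 Sensor 1
ENTITY-SENSOR-MIB::entPhySensorType[100711211] = INTEGER: rpm(10)
ENTITY-SENSOR-MIB::entPhySensorValue[100711211] = INTEGER: 8500
"""

for index, row in join_by_index(sample).items():
    print(index, row["entPhysicalDescr"], row["entPhySensorType"],
          row["entPhySensorValue"])
```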

 

Arista 1RU (e.g. 7150S) specific output:

ENTITY-MIB::entPhysicalDescr[100004002] = STRING: Scd Chip 2
ENTITY-MIB::entPhysicalDescr[100006001] = STRING: Cpu temp sensor
ENTITY-MIB::entPhysicalDescr[100006002] = STRING: Rear temp sensor
ENTITY-MIB::entPhysicalDescr[100006003] = STRING: Board temp sensor
ENTITY-MIB::entPhysicalDescr[100006004] = STRING: Front-panel temp sensor
ENTITY-MIB::entPhysicalDescr[100006005] = STRING: Board temp sensor
ENTITY-MIB::entPhysicalDescr[100006006] = STRING: FM6000 temp sensor
ENTITY-SENSOR-MIB::entPhySensorValue[100006001] = INTEGER: 320
ENTITY-SENSOR-MIB::entPhySensorValue[100006002] = INTEGER: 315
ENTITY-SENSOR-MIB::entPhySensorValue[100006003] = INTEGER: 210
ENTITY-SENSOR-MIB::entPhySensorValue[100006004] = INTEGER: 200
ENTITY-SENSOR-MIB::entPhySensorValue[100006005] = INTEGER: 300
ENTITY-SENSOR-MIB::entPhySensorValue[100006006] = INTEGER: 380

 

Fan speed is measured in RPM, which the CLI reflects as a percentage of the maximum nominal speed of 27000 RPM:

ENTITY-MIB::entPhysicalDescr[100601110] = STRING: Fan Tray 1 Fan 1
ENTITY-MIB::entPhysicalDescr[100602110] = STRING: Fan Tray 2 Fan 1
ENTITY-MIB::entPhysicalDescr[100603110] = STRING: Fan Tray 3 Fan 1
ENTITY-MIB::entPhysicalDescr[100604110] = STRING: Fan Tray 4 Fan 1
ENTITY-SENSOR-MIB::entPhySensorValue[100601111] = INTEGER: 10800
ENTITY-SENSOR-MIB::entPhySensorValue[100602111] = INTEGER: 10800
ENTITY-SENSOR-MIB::entPhySensorValue[100603111] = INTEGER: 10980
ENTITY-SENSOR-MIB::entPhySensorValue[100604111] = INTEGER: 10800

 

 

2.3) Interface Statistics

Arista EOS exposes interface statistics either through CLI outputs or via SNMP MIBs.

 

2.3.1) CLI / eAPI counters

EOS provides many interface counters. These counters are available for Ethernet, Port-Channel, and Management interfaces, and can be displayed for an individual interface, a range of interfaces, or all interfaces. Using the CLI, the following counters are available:

  • Unicast, Multicast, Broadcast packets & bytes (inbound and outbound)
  • Packet counters categorized by packet length
  • Error and discard counters
  • Rates in Mb/sec or kpacket/sec
  • Queue drop counters

 

The following CLI example shows the non-zero interface counters for all interfaces:

switch#show interface counters | nz
Port                OutOctets    OutUcastPkts    OutMcastPkts    OutBcastPkts
Et1                  26934962          205855           54306           33499
Et2                  16052597              29          119697           31199
Et3                  33024926              24          264269            4348
Et4                  12759018              97          106229              14
Ma1                   5151068           17269               0               0
Po5                  16049838              29          119671           31198
Po6                  27986105              24          225414            4342
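The same counters can be requested programmatically through eAPI, which accepts JSON-RPC 2.0 requests using the runCmds method. The sketch below only builds the request body, assuming eAPI is enabled on the switch; build_eapi_request is a hypothetical helper name, and the HTTP transport and credentials are deliberately left out so the example stays self-contained.

```python
import json


def build_eapi_request(commands, request_id="1"):
    """Build the JSON-RPC 2.0 body for an eAPI 'runCmds' call."""
    return {
        "jsonrpc": "2.0",
        "method": "runCmds",
        "params": {"version": 1, "cmds": list(commands), "format": "json"},
        "id": request_id,
    }


body = build_eapi_request(["show interfaces counters"])
print(json.dumps(body, indent=2))
```

The resulting body would typically be sent as an HTTP POST to the switch's /command-api URL.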

 

Another example uses the ‘watch’ command to update the screen automatically every second and highlight the changes:

switch(config)#watch 1 diff show interface counters | nz

 

2.3.2) SNMP Interface Statistics

A selection of standard MIBs provides a wide range of interface counters covering throughput, packet size, and error statistics. Using the integrated MIB browsing capability, you can select the appropriate counters from MIBs such as:

  • EtherLike-MIB
  • IF-MIB
  • RMON-MIB

Port numbering was detailed in a previous section named “understanding port numbering”. Understanding this scheme allows you to correlate the port number with the fixed MIB interface index for that interface.

You can find an interface MIB index with the following command:

switch#show snmp mib ifmib ifindex ethernet 1/1
Ethernet1/1: Ifindex = 1001

switch#show snmp mib walk IF-MIB::ifHCOutOctets | grep 1001
IF-MIB::ifHCOutOctets[1001] = Counter64: 3739492

 

This example gives the ifIndex for Ethernet 1/1, which can then be used to retrieve statistics for Ethernet 1/1 only. Because the ifIndex relates directly to the A/B/C port numbering scheme previously explained, you can easily and predictably correlate the ifIndex with the port number itself. The ifIndex is fixed.

Examples of MIB interface names and related counters on a modular chassis:

IF-MIB::ifName[3013] = STRING: Ethernet3/1/1
IF-MIB::ifName[3014] = STRING: Ethernet3/1/2
IF-MIB::ifName[3015] = STRING: Ethernet3/1/3
IF-MIB::ifName[3016] = STRING: Ethernet3/1/4
IF-MIB::ifName[3025] = STRING: Ethernet3/2/1
IF-MIB::ifHCOutOctets[3193] = Counter64: 941349881326
IF-MIB::ifHCOutOctets[3194] = Counter64: 0
IF-MIB::ifHCOutOctets[3195] = Counter64: 0
IF-MIB::ifHCOutOctets[3196] = Counter64: 0
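Since ifName and ifHCOutOctets share the ifIndex, the two walks can be joined to report counters by interface name. The following is a minimal sketch under that assumption; out_octets_by_name is a hypothetical helper, and the sample reuses the Ethernet1/1 values from the earlier example (the ifName row for index 1001 is inferred from the ifindex command output).

```python
import re

NAME_ROW = re.compile(r"ifName\[(\d+)\] = STRING: (\S+)")
OCTET_ROW = re.compile(r"ifHCOutOctets\[(\d+)\] = Counter64: (\d+)")


def out_octets_by_name(walk_output):
    """Map interface name -> ifHCOutOctets (None when no counter row exists)."""
    names = {int(i): n for i, n in NAME_ROW.findall(walk_output)}
    octets = {int(i): int(v) for i, v in OCTET_ROW.findall(walk_output)}
    return {name: octets.get(index) for index, name in names.items()}


sample = """\
IF-MIB::ifName[1001] = STRING: Ethernet1/1
IF-MIB::ifName[3013] = STRING: Ethernet3/1/1
IF-MIB::ifHCOutOctets[1001] = Counter64: 3739492
"""

print(out_octets_by_name(sample))
```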

 

Some Arista switches provide other interface statistics of interest, such as port queue depth statistics, via the ARISTA-QUEUE-MIB. This MIB generalizes ingress and egress queue counters.

ARISTA-QUEUE-MIB::aristaIngressQueuePktsDropped[3013][0] = Counter64: 0
ARISTA-QUEUE-MIB::aristaIngressQueuePktsDropped[3014][0] = Counter64: 0
ARISTA-QUEUE-MIB::aristaIngressQueuePktsDropped[3015][0] = Counter64: 0
ARISTA-QUEUE-MIB::aristaIngressQueuePktsDropped[3016][0] = Counter64: 0
ARISTA-QUEUE-MIB::aristaIngressQueuePktsDropped[3025][0] = Counter64: 0
ARISTA-QUEUE-MIB::aristaIngressQueuePktsDropped[3026][0] = Counter64: 0
[...]

 

2.4) Arista MIBs

Arista provides additional MIBs. Refer to the “SNMP MIB Support” section of the manual for a full support matrix.

 

2.4.1) ARISTA-SW-IP-FORWARDING-MIB

The ARISTA-SW-IP-FORWARDING-MIB table augments the ipIfStatsTable with system-wide, IP-version-specific traffic statistics. This table and the ipIfStatsTable contain similar objects that differ in granularity: this table contains system-wide traffic statistics, whereas the ipIfStatsTable contains the same statistics counted on a per-interface basis.

The following example is an extract:

ARISTA-SW-IP-FORWARDING-MIB::aristaSwFwdIpStatsInReceives[ipv4] = Counter32: 1909975
ARISTA-SW-IP-FORWARDING-MIB::aristaSwFwdIpStatsInReceives[ipv6] = Counter32: 19714
ARISTA-SW-IP-FORWARDING-MIB::aristaSwFwdIpStatsHCInReceives[ipv4] = Counter64: 1909975
ARISTA-SW-IP-FORWARDING-MIB::aristaSwFwdIpStatsHCInReceives[ipv6] = Counter64: 19714
ARISTA-SW-IP-FORWARDING-MIB::aristaSwFwdIpStatsInOctets[ipv4] = Counter32: 0
ARISTA-SW-IP-FORWARDING-MIB::aristaSwFwdIpStatsInOctets[ipv6] = Counter32: 2252170
ARISTA-SW-IP-FORWARDING-MIB::aristaSwFwdIpStatsInForwDatagrams[ipv4] = Counter32: 271
ARISTA-SW-IP-FORWARDING-MIB::aristaSwFwdIpStatsInForwDatagrams[ipv6] = Counter32: 30
ARISTA-SW-IP-FORWARDING-MIB::aristaSwFwdIpStatsInDiscards[ipv6] = Counter32: 9310
ARISTA-SW-IP-FORWARDING-MIB::aristaSwFwdIpStatsInDelivers[ipv4] = Counter32: 1875493
ARISTA-SW-IP-FORWARDING-MIB::aristaSwFwdIpStatsInDelivers[ipv6] = Counter32: 10359
ARISTA-SW-IP-FORWARDING-MIB::aristaSwFwdIpStatsHCInDelivers[ipv4] = Counter64: 1875493
ARISTA-SW-IP-FORWARDING-MIB::aristaSwFwdIpStatsHCInDelivers[ipv6] = Counter64: 10359
ARISTA-SW-IP-FORWARDING-MIB::aristaSwFwdIpStatsOutRequests[ipv4] = Counter32: 1807363
ARISTA-SW-IP-FORWARDING-MIB::aristaSwFwdIpStatsOutRequests[ipv6] = Counter32: 9132
[...]

 

2.4.2) ARISTA-ENTITY-SENSOR-MIB

This MIB module augments the entPhySensorTable of ENTITY-SENSOR-MIB to provide threshold information for the various sensors in the system. For example, a given device may have several voltage sensors as well as temperature sensors, each with appropriate threshold support to help an NMS detect and alert appropriately.

In addition, if a sensor value crosses the supported threshold value, the system can generate an appropriate notification.

Example:

ARISTA-ENTITY-SENSOR-MIB::aristaEntSensorThresholdHighWarning[102006005] = INTEGER: 750
ARISTA-ENTITY-SENSOR-MIB::aristaEntSensorThresholdHighWarning[102006006] = INTEGER: 950
ARISTA-ENTITY-SENSOR-MIB::aristaEntSensorThresholdHighWarning[102006007] = INTEGER: 950
ARISTA-ENTITY-SENSOR-MIB::aristaEntSensorThresholdHighWarning[102006008] = INTEGER: 650

ARISTA-ENTITY-SENSOR-MIB::aristaEntSensorStatusDescr[102006005] = STRING: Sensor value 280 is within bounds
ARISTA-ENTITY-SENSOR-MIB::aristaEntSensorStatusDescr[102006006] = STRING: Sensor value 490 is within bounds
ARISTA-ENTITY-SENSOR-MIB::aristaEntSensorStatusDescr[102006007] = STRING: Sensor value 510 is within bounds
ARISTA-ENTITY-SENSOR-MIB::aristaEntSensorStatusDescr[102006008] = STRING: Sensor value 230 is within bounds
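An NMS could use these thresholds to compute how much margin each sensor has left. The sketch below is illustrative only: the helper names are hypothetical, it assumes the threshold and the sensor value use the same raw units, and it assumes a value equal to the threshold already counts as out of bounds.

```python
# Raw values copied from the example output above
HIGH_WARNING = {102006005: 750, 102006006: 950, 102006007: 950, 102006008: 650}
CURRENT = {102006005: 280, 102006006: 490, 102006007: 510, 102006008: 230}


def headroom(value, high_warning):
    """Return how far a raw sensor value is below its high-warning threshold."""
    return high_warning - value


def is_within_bounds(value, high_warning):
    """True while the raw value has not reached the high-warning threshold."""
    return value < high_warning


for index, value in CURRENT.items():
    limit = HIGH_WARNING[index]
    print(index, "within bounds:", is_within_bounds(value, limit),
          "headroom:", headroom(value, limit))
```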

 

2.4.3) ARISTA-BRIDGE-EXT-MIB

This MIB provides a table that contains host move information about unicast entries for which the device has forwarding information:

ARISTA-BRIDGE-EXT-MIB::aristaDot1qTpFdbNumMoves[0:00:00.00][1101][STRING: 0:1c:73:33:d9:70] = Counter32: 1
ARISTA-BRIDGE-EXT-MIB::aristaDot1qTpFdbNumMoves[0:00:00.00][1201][STRING: 4c:96:14:ef:25:93] = Counter32: 1
ARISTA-BRIDGE-EXT-MIB::aristaDot1qTpFdbNumMoves[0:00:00.00][1202][STRING: 4c:96:14:ef:42:73] = Counter32: 1
ARISTA-BRIDGE-EXT-MIB::aristaDot1qTpFdbNumMoves[0:00:00.00][1203][STRING: 0:1c:73:33:35:70] = Counter32: 1
ARISTA-BRIDGE-EXT-MIB::aristaDot1qTpFdbNumMoves[0:00:00.00][2101][STRING: 0:1c:73:33:d9:70] = Counter32: 1
ARISTA-BRIDGE-EXT-MIB::aristaDot1qTpFdbNumMoves[0:00:00.00][2201][STRING: 4c:96:14:ef:25:93] = Counter32: 1

 

2.4.4) ARISTA-CONFIG-MAN-MIB

This MIB provides notifications for configuration events. aristaConfigManEvent provides information about the command source, config source, config destination, config source URL (for instance flash, http, ftp, and so on), and config destination URL.

ARISTA-CONFIG-MAN-MIB::aristaCmdHistoryRunningLastChanged.0 = Timeticks: (3163836) 8:47:18.36
ARISTA-CONFIG-MAN-MIB::aristaCmdHistoryEventTime[0] = Timeticks: (1013) 0:00:10.13
ARISTA-CONFIG-MAN-MIB::aristaCmdHistoryEventTime[1] = Timeticks: (3163836) 8:47:18.36
ARISTA-CONFIG-MAN-MIB::aristaCmdHistoryEventCommandSource[0] = INTEGER: commandLine(0)
ARISTA-CONFIG-MAN-MIB::aristaCmdHistoryEventCommandSource[1] = INTEGER: commandLine(0)
ARISTA-CONFIG-MAN-MIB::aristaCmdHistoryEventConfigSource[0] = INTEGER: commandSource(2)
ARISTA-CONFIG-MAN-MIB::aristaCmdHistoryEventConfigSource[1] = INTEGER: commandSource(2)
ARISTA-CONFIG-MAN-MIB::aristaCmdHistoryEventConfigDestination[0] = INTEGER: running(3)
ARISTA-CONFIG-MAN-MIB::aristaCmdHistoryEventConfigDestination[1] = INTEGER: running(3)
ARISTA-CONFIG-MAN-MIB::aristaCmdHistoryEventConfigSourceURLScheme[0] = ""
ARISTA-CONFIG-MAN-MIB::aristaCmdHistoryEventConfigSourceURLScheme[1] = ""
ARISTA-CONFIG-MAN-MIB::aristaCmdHistoryEventConfigDestURLScheme[0] = ""
ARISTA-CONFIG-MAN-MIB::aristaCmdHistoryEventConfigDestURLScheme[1] = ""
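The Timeticks values above are expressed in hundredths of a second; the human-readable form in the output is derived from the raw number in parentheses. A small sketch of that conversion follows (format_timeticks is a hypothetical helper, and in this simplified version hours keep counting past 24 instead of rolling over into days).

```python
def format_timeticks(ticks):
    """Convert SNMP TimeTicks (hundredths of a second) to H:MM:SS.cc."""
    seconds, centis = divmod(ticks, 100)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}.{centis:02d}"


print(format_timeticks(3163836))  # 8:47:18.36
print(format_timeticks(1013))     # 0:00:10.13
```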

 

 

2.5) System and Process Logging

 

The current system logging can be viewed using the ‘show logging’ command:

switch#show logging
Mar 18 11:11:13 s7151 Fru: %FRU-6-FAN_INSERTED: Fan tray 3 has been inserted
Mar 18 11:11:13 s7151 Fru: %FRU-6-FAN_INSERTED: Fan tray 2 has been inserted
Mar 18 11:11:13 s7151 Fru: %FRU-6-FAN_INSERTED: Fan tray 1 has been inserted
Mar 18 11:11:13 s7151 Fru: %FRU-6-FAN_INSERTED: Fan tray 4 has been inserted
Mar 18 11:13:01 s7151 SuperServer: %SYS-5-SYSTEM_RESTARTED: System restarted
Mar 18 11:11:15 s7151 Lldp: %LLDP-5-NEIGHBOR_NEW: LLDP neighbor with chassisId 28c6.8eff.4e5d and portId "g37" added on interface Management1
Mar 18 11:13:01 s7151 SuperServer: %SYS-5-SYSTEM_RESTARTED: System restarted

 

The logging output can become rather substantial in size. To aid analysis, the command also provides various filtering options:

switch#show logging ?
alerts         Immediate action needed
all            Show all the lines in the logging buffer
critical       Critical conditions
debugging      Debugging messages
emergencies    System is unusable
errors         Error conditions
follow         Keep following the log buffer as it grows
informational  Informational messages
last           Show messages in last <N> time-units
mce            Show the contents of the mcelog buffer
notifications  Normal but significant conditions
system         Show the contents of the system log buffer
threshold      Show only log messages at threshold level or above
time-range     Filter logs by begin and end time
warnings       Warning conditions
<1-9999>       Show last number of messages in the logging buffers
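When logs are processed off-box, the %FACILITY-SEVERITY-MNEMONIC tag in each message can be mapped to the severity keywords listed above. The following Python sketch shows one way to do that; parse_log_line and meets_threshold are hypothetical helpers, and the interpretation of "at threshold level or above" as "this severity number or lower" follows the standard syslog numbering and is an assumption here.

```python
import re

# Severity keywords from 'show logging ?', indexed by syslog severity 0-7
SEVERITY_NAMES = [
    "emergencies", "alerts", "critical", "errors",
    "warnings", "notifications", "informational", "debugging",
]

# Matches the %FACILITY-SEVERITY-MNEMONIC tag, e.g. %FRU-6-FAN_INSERTED
TAG = re.compile(r"%([A-Z0-9_]+)-([0-7])-([A-Z0-9_]+): (.*)")


def parse_log_line(line):
    """Return a dict describing the EOS log tag, or None if no tag is found."""
    match = TAG.search(line)
    if not match:
        return None
    facility, severity, mnemonic, text = match.groups()
    return {
        "facility": facility,
        "severity": int(severity),
        "level": SEVERITY_NAMES[int(severity)],
        "mnemonic": mnemonic,
        "message": text,
    }


def meets_threshold(severity, threshold_name):
    """True if the severity is at the named level or more severe."""
    return severity <= SEVERITY_NAMES.index(threshold_name)


line = ("Mar 18 11:11:13 s7151 Fru: %FRU-6-FAN_INSERTED: "
        "Fan tray 3 has been inserted")
print(parse_log_line(line))
```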

 

The following introduces how to collect deeper system logs through Bash.

In addition to the EOS log provided by the ‘show logging’ CLI command, EOS keeps detailed system-wide logs and individual agent process logs for multiple agent instances (due to reconfiguration or in-service stateful repair). These logs can be found in the underlying Linux shell as follows:

switch#bash sudo tail /var/log/messages        ← Arista EOS logs
Mar 18 11:11:24 s7151 Stp: %SPANTREE-6-ROOTCHANGE: Root changed for instance MST0: new root interface is (none), new root bridge mac address is 00:1c:73:00:44:d6 (this switch)
Mar 18 11:11:27 s7151 snmpd[2859]: AgentX subagent AGENTX Ribd version ribd-2.0.2, built Mon Feb 10 01:48:41 PST 2014 established session 0xf8ce73a8/0xf8cef3e0/11
Mar 18 11:12:01 s7151 Xmpp: %XMPP-6-CLIENT_CONNECTED: Connected to 192.168.1.220:5222 with JID s7151@pcknapweed.lab.local/33851489411395138547694133
Mar 18 11:13:01 s7151 SuperServer: %SYS-5-SYSTEM_RESTARTED: System restarted
Mar 18 11:15:01 s7151 CROND[3842]: (root) CMD (/etc/cron.hourly/logrotate)

 

As shown above, Bash commands may be executed directly from the CLI. Alternatively, a shell may be launched, providing full access to familiar Linux tools for managing files, including the various log files.

switch#bash
Arista Networks EOS shell
[admin@Arista ~]$ cd /var/log                  ← System (kernel) logs
[admin@Arista log]$ sudo cat messages | grep Rib
Mar 18 11:10:27 localhost Launcher: %LAUNCHER-6-PROCESS_START: Configuring process 'Rib' to start in role 'AllSupervisors'
Mar 18 11:10:31 localhost ProcMgr-worker: %PROCMGR-6-PROCESS_STARTED: 'Rib' starting with PID=2256 (PPID=1715) -- execing '/usr/bin/Rib'
Mar 18 11:10:47 localhost Rib: Commence routing updates
Mar 18 11:11:27 s7151 snmpd[2859]: AgentX subagent AGENTX Ribd version ribd-2.0.2, built Mon Feb 10 01:48:41 PST 2014 established session 0xf8ce73a8/0xf8cef3e0/11

 

Individual agent logs are available in ‘/var/log/agents’. Each restart of an agent creates a new file, suffixed with the new process ID.

[admin@Arista log]$ cd /var/log/agents/
[admin@Arista agents]$ ls
Aaa-2216        Dot1x-2227         IgmpHostProxy-2233  Lm73-2476
Picasso-2211    Sb820-2383         Sysdb-1716          Acl-2234
Ebra-2399       IgmpSnooping-2258  Max6658-2502        Pmbus-2921
Scd-2394        Thermostat-2248    AgentMonitor-2186   EventMon-2202
Ira-2199        Mirroring-2198     PortSec-2193        Smbus-2384
TopoAgent-2252  Arp-2214           FanDetector-2527    LacpTxAgent-2183
Mpls-2225       PowerManager-2253  Snmp-2188           Ucd9012-2382
[...]
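Because each agent log file name ends with the process ID, an agent that has restarted shows up as several files with the same prefix. The sketch below groups file names accordingly; the helper names are hypothetical, and the second Rib entry in the sample is made up purely to illustrate a restart.

```python
from collections import defaultdict


def group_agent_logs(filenames):
    """Group '<Agent>-<PID>' log file names into {agent: sorted PIDs}."""
    groups = defaultdict(list)
    for name in filenames:
        # Split on the last hyphen so names like 'ProcMgr-worker' stay intact
        agent, sep, pid = name.rpartition("-")
        if sep and pid.isdigit():
            groups[agent].append(int(pid))
    return {agent: sorted(pids) for agent, pids in groups.items()}


def restarted_agents(filenames):
    """Agents with more than one log file, i.e. more than one PID seen."""
    return {a: p for a, p in group_agent_logs(filenames).items() if len(p) > 1}


# 'Rib-4312' is a hypothetical second instance, not taken from the listing
sample = ["Aaa-2216", "Ebra-2399", "Rib-2256", "Rib-4312"]
print(restarted_agents(sample))
```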

 

Key Agents

  • Rib – The Routing Information Base, a table of the best routes to all known destinations.
  • Ebra – Ethernet Bridging Agent – L2 interaction with the Kernel
  • Ira – IP Routing Agent – L3 interaction with the kernel.
  • Strata* (Strata, StrataL2, …) – On Trident-based platforms, they are responsible for communications between the control-plane and the Trident+/2/etc chips
  • Sand* (Sand, SandFap, SandFabric, SandCounters, etc) – On Arad platforms (7500E, 7280,…) they manage the hardware communications (line cards, fabrics)
  • FocalPointV2 – Interacts with the ASIC moving software configuration into hardware.
  • ProcMgr-worker – Monitors the health of other processes, and restarts any that fail.
  • Sysdb – Central EOS state database.
  • others – Many other processes are self-explanatory, such as Stp, Igmp, Snmp, Sshd, Acl, Lldp, Cli, Fhrp, Lag, etc.

 

As an example of the useful system information available, loggrab, a commonly used script provided by Arista TAC and accessible on the download page, collects the following:

cd /mnt/flash/$LOGNAME
ls -alR /persist/sys > persist-sys-contents
ls -alR /mnt/flash > flash-contents
ls -alR /var/log/agents > agent-contents
ls -alR /var/core > core-contents
df > disk-utilization
cp /mnt/flash/*config .
cp -r /var/log/agents .
cp -r /var/core .
# All the past scheduled tech-supports
cp -r /mnt/flash/schedule/tech-support .
sudo cp /var/log/messages .
Cli -p15 -c "show tech" > ./show-tech
Cli -p15 -c "show tech ribd" > ./show-tech-ribd

 

2.6) Port Mirroring

Port mirroring sends a copy of the packets seen on one or more switch ports to a network monitoring connection on another switch port. It is commonly used for network appliances that need to monitor network traffic, such as an intrusion-detection system.

switch(config)#monitor session Monitor1 destination Ethernet 1/1
switch(config)#monitor session Monitor1 source Ethernet 2/1-4,3/1-4
switch#show monitor session
Session Monitor1
------------------------
Source Ports:
Both:        Et2/1, Et2/2, Et2/3, Et2/4, Et3/1
Et3/2, Et3/3, Et3/4
Destination Ports:
Et1/1 :  active

 

2.7) Advanced Port Mirroring (7150S only)

The Advanced Mirroring functionality on the 7150 series switches is an enhancement to the standard mirroring functionality. It adds functionalities such as:

  • Mirroring direct to EOS control plane
  • Filtering with ACLs
  • Multi-destination mirroring
  • Advanced Load Sharing
  • Time stamping of mirrored traffic
  • Packet truncation

 

The following example shows that a range of Ethernet ports, a port-channel, and the CPU can simultaneously be destinations for a mirror session:

7150(config)#monitor session test-session source Et2
7150(config)#monitor session test-session destination Et1,3
7150(config)#monitor session test-session destination port-channel 1
7150(config)#monitor session test-session destination cpu

 

Sending traffic to the CPU allows data-plane traffic to be captured in EOS or in the kernel with tcpdump.

7150#show monitor session
Session test-session
------------------------
Source Ports
Both:        Et2
 
Destination Ports
Cpu :  active   ← CPU as destination for local TCPdump
Et1 :  active
Et3 :  active   ← Multiple Ethernet ports as destinations : replication / re-generation
Po1 :  active   ← Port-channel / LAG as destination : load-balancing

No Acl is specified for this mirror session.

 

It is also possible to add an IP ACL to the mirrored traffic to only mirror interesting traffic. ACLs can be applied for the whole session, or per port. Only mirrored traffic is filtered, not the original traffic.

7150#show ip access-lists allow-host
IP Access List allow-host
10 permit ip host 10.10.11.24 host 233.39.215.23
20 deny ip any any
 
7150(config)#monitor session test-session ip access-group allow-host
7150#show monitor session
Session test-session
------------------------
Source Ports
Both:        Et2
 
Destination Ports
Cpu :  active
Et1 :  active
Et3 :  active
Po1 :  active
 
Acl: allow-host

 

 

2.8) LANZ-lite (7048T, 7500E, 7280SE)

LANZ-lite versus LANZ:

  • LANZ is trigger-based – guarantees capturing µburst congestion events
  • LANZ-lite is poll-based – provides buffer measurements at short intervals

Arista’s Latency ANalyZer (LANZ) provides the unique ability to monitor switch queue-depth on a per-port basis with microsecond granularity. LANZ can provide early warning of impending congestion and increasing latency through CLI, Syslog and also an application layer streaming export protocol providing administrators and applications themselves real-time awareness of changing network conditions and microburst behavior. LANZ is disabled by default, but can be enabled globally using the command:

Arad(config)#queue-monitor length

 

Once enabled, it is recommended you disable LANZ on any interface you do not wish to monitor.

Arad(config)#interface ethernet 3/1/1
Arad(config-if-Et3/1/1)#no queue-monitor length

 

On the interfaces you do wish to monitor, you can configure a maximum and minimum threshold. LANZ data can be viewed using ‘show queue-monitor length <interface>’. The output provides congestion information per interface.

switch(config)#show queue-monitor length
Report generated at 2012-04-06 13:05:55
Time                          Interface  Queue      Duration  Traffic  Ingress
                                         Length     (secs)    Class    Port-set
                                         (bytes)
------------------------------------------------------------------------------------
0:00:05.75071 ago             Et6/12     5619824    1         7        Et5/1 - Et5/8
0:00:05.75071 ago             Et6/28     5619824    1         5        Et5/25-Et5/32
0:01:05.75071 ago             McastQ     10455456   10        1        Et5/1 - Et5/8
0:02:05.75071 ago             Et5/34     7892352    240       0        Et6/33-Et6/40
1:00:05.75071 ago             CpuTm      100000     5         1        Et7/1 - Et7/8
11:00:06.75071 ago            Et7/4      5619824    3         0        Et6/1 - Et6/8
1 day, 4:33:23.12345 ago      Et6/3      7892352    100       1        Et7/17-Et7/24

 

  • Ingress Port-set: Points to the ingress chip on which the congestion event occurred.
  • Interface: The interface whose VOQ suffered congestion. This can be a physical interface, McastQ (Fabric Mcast Queue), or CPU (traffic to the CPU).
  • Duration: How long the VOQ suffered congestion, starting from the record timestamp.
  • Traffic Class: The Class of Service the traffic belongs to.

 

 

2.9) LANZ (7150S)

Reminder on LANZ-lite versus LANZ:

  • LANZ is trigger-based – guarantees capturing µburst congestion events
  • LANZ-lite is poll-based – provides buffer measurements at short intervals

Arista’s Latency ANalyZer (LANZ) provides the unique ability to monitor switch queue-depth on a per-port basis with microsecond granularity. LANZ can provide early warning of impending congestion and increasing latency through CLI, Syslog and also an application layer streaming export protocol providing administrators and applications themselves real-time awareness of changing network conditions and microburst behavior.

In addition to the traditional LANZ behaviors, LANZ on a 7150 tracks per interface per queue buffer utilization, the duration of the congestion and counts any packets dropped due to full buffers during the congestion event.

LANZ is disabled by default, but can be enabled globally using the command:

7150(config)#queue-monitor length

 

Once enabled, it is recommended you disable LANZ on any interface you do not wish to monitor.

7150(config)#interface ethernet1-2
7150(config-if-Et1-2)#no queue-monitor length

 

On the interfaces you do wish to monitor, you can configure a maximum and minimum threshold. An event is triggered when the maximum threshold is reached; to avoid filling the log, a sleep timer is then started. The sleep timer expires instantly if the queue length drops below the minimum value. These values let you control not only when LANZ triggers, but also how often.

7150(config)#interface ethernet1-2
7150(config-if-Et1-2)#queue-monitor length threshold 512 256
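The interaction between the two thresholds and the sleep timer can be pictured with a small simulation. This is a simplified model for illustration, not the switch implementation: lanz_events is a hypothetical function, the queue length is sampled at discrete steps, and the sleep timer is expressed as a number of samples rather than a real time value.

```python
def lanz_events(samples, upper, lower, sleep_samples):
    """Return the indexes of the samples at which an event would be reported."""
    events = []
    sleep_left = 0
    for i, length in enumerate(samples):
        if length < lower:
            sleep_left = 0      # dropping below the minimum expires the timer
        if sleep_left > 0:
            sleep_left -= 1     # still sleeping: suppress further reports
            continue
        if length >= upper:
            events.append(i)    # maximum threshold reached: report an event
            sleep_left = sleep_samples
    return events


# Queue length in segments, using the 512/256 thresholds from the example
samples = [100, 600, 700, 650, 200, 600, 600]
print(lanz_events(samples, upper=512, lower=256, sleep_samples=5))
```

In this model the dip to 200 segments clears the sleep timer, so the next burst is reported immediately instead of being suppressed.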

 

LANZ data can be viewed using ‘show queue-monitor length <interface>’.  The output provides congestion information per interface, per traffic class.

7150S#show queue-monitor length
Report generated at 2014-07-02 14:28:46
E-End, U-Update, S-Start, TC-Traffic Class
GH-High, GU-Update, GL-Low
Segment size for E, U and S congestion records is 480 bytes
Segment size for GL, GU and GH congestion records is 160 bytes
* Max queue length during period of congestion
+ Period of congestion exceeded counter
--------------------------------------------------------------------------------
Type    Time                   Intf    Congestion     Queue       Time of Max
                               (TC)    duration       length      Queue length
                                       (usecs)        (segments)  relative to
                                                                  congestion
                                                                  start
                                                                  (usecs)
--------------------------------------------------------------------------------
E  0:21:45.14067 ago         Et17(1)  20755358       3555*       1129
U  0:21:45.89304 ago         Et17(1)  N/A            3552        N/A
S  0:22:05.89603 ago         Et17(1)  N/A            598         N/A
 
S = Start of trigger event
U = Updates at regular short interval (~10µs) for the course of the whole congestion
E = End of congestion, summary reporting
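Queue lengths in this report are given in segments, and the report header states the segment size for each record type. A rough conversion to bytes could look like the sketch below; queue_length_bytes is a hypothetical helper, and the sizes are the ones printed in this particular report, so they should be read from the header rather than assumed for other platforms.

```python
# Segment sizes in bytes, as printed in the report header above
SEGMENT_BYTES = {"E": 480, "U": 480, "S": 480, "GH": 160, "GU": 160, "GL": 160}


def queue_length_bytes(record_type, segments):
    """Convert a queue length in segments to bytes for a given record type."""
    return segments * SEGMENT_BYTES[record_type]


print(queue_length_bytes("E", 3555))  # 1706400
print(queue_length_bytes("S", 598))   # 287040
```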

 

Drop counters can also be viewed on a per-congestion-event basis using the command ‘show queue-monitor length drops’:

7150S#show queue-monitor length drops 
Report generated at 2012-12-24 13:16:45
Time                                    Interface      Drops
-----------------------------------------------------------------
E  0:15:12.11012 ago                    Et17(1)        1921

 

The buffer values can be equated to relative latency using the ‘show queue-monitor length tx-latency’ command.

7150S#show queue-monitor length tx-latency 
Report generated at 2012-12-06 09:15:22
Time                          Intf( TC )     Tx-Latency (usec)
-----------------------------------------------------------------
0:22:41.62959 ago             Et17(1)        329.904