CloudVision Event Guide

 
 
Overview

This article describes common CloudVision events and, for each one, either explains the event itself or points to references for troubleshooting its underlying cause.

CloudVision Portal Events

Streaming Analytics Error

Explanation: CloudVision encountered an internal error in the streaming analytics process. A common cause is a device that was upgraded to an EOS version not supported by the current CloudVision instance. Known issues are also listed in the CloudVision Portal release notes. If neither applies, please collect a copy of cvpi debug and contact Arista TAC to debug the error.

CVE Bug Exposed

Explanation: CloudVision detected a potential CVE on the switches. For more information, please visit https://www.arista.com/en/support/advisories-notices.

Change Control Failed

Explanation: CloudVision encountered a failure for the change control cited. A few common reasons for failure are:

  • An image push task failed because the target device filesystem is full.
  • An incorrect configuration is being pushed.
  • The management IP is missing from the proposed configuration.
  • A user modified commands via the switch CLI while a config push task was in progress.

Please also review the change control logs for information regarding the failure.

Change Control Running

Explanation: This is an informational message confirming the mentioned change control is successfully running.

Change Control Succeeded

Explanation: This is an informational message confirming that the mentioned change control was successfully applied across the devices.

Clock Not Synchronized

Explanation: An NTP server is not configured on an end device. To resolve the event, ensure that a valid NTP server is configured and in effect on the switch.

Anomaly in CloudTracer Latency

Explanation: The CloudTracer latency anomaly event monitors the latency metric between devices and configured hosts. CloudVision detected that these metrics deviated from their historical bounds. For more information, please visit https://eos.arista.com/toi/cvp-2020-1-0/events/#CloudTracer_Latency_Anomaly_Events.
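
As a rough illustration of what a deviation from historical bounds can look like, the sketch below flags a latency sample that falls outside the mean plus or minus `k` standard deviations of recent history. This is an assumption made purely for illustration and is not CloudVision's actual detection algorithm; the function names and the default `k = 3.0` are hypothetical.

```python
import statistics
from typing import Sequence, Tuple


def latency_bounds(history_ms: Sequence[float], k: float = 3.0) -> Tuple[float, float]:
    """Return (low, high) bounds as mean +/- k population standard deviations.

    Hypothetical helper: the real event may derive its bounds differently.
    """
    mean = statistics.fmean(history_ms)
    stdev = statistics.pstdev(history_ms)
    return mean - k * stdev, mean + k * stdev


def is_anomalous(latency_ms: float, history_ms: Sequence[float], k: float = 3.0) -> bool:
    """True if the new sample lies outside the bounds computed from history."""
    low, high = latency_bounds(history_ms, k)
    return latency_ms < low or latency_ms > high


if __name__ == "__main__":
    history = [10, 11, 9, 10, 10, 11, 9, 10]  # made-up latency samples in ms
    print(is_anomalous(25, history))    # far above the historical range
    print(is_anomalous(10.5, history))  # within the historical range
```

A wider `k` makes the check less sensitive; a narrower one flags smaller deviations.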

CVX Disconnection

Explanation: The switch has disconnected from the CVX. Please check that the switch has connectivity to the CVX nodes and, in a CVX cluster, that the minimum number of nodes is active. If the issue persists, please contact Arista TAC to debug further.

Low Disk Partition Space Available

Explanation: CloudVision detected that the filesystem space on a device is below the set threshold. To debug possible causes, please visit https://eos.arista.com/troubleshooting-filesystem-full-issues/.

Disk Partition Usage Approaching Threshold

Explanation: This is a proactive event predicting that the filesystem on the flagged device could be full in 24 hours. To debug possible causes for the filesystem filling up, please visit https://eos.arista.com/troubleshooting-filesystem-full-issues/.
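
One way to picture this kind of prediction is a straight-line projection of recent usage. The sketch below fits a least-squares line to (hours, used percent) samples and estimates how long until usage reaches 100%. It is illustrative only: CloudVision's actual prediction method is not described in this article, and `hours_until_full` and its sample format are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def hours_until_full(samples: Sequence[Tuple[float, float]]) -> Optional[float]:
    """Estimate hours from the last sample until usage reaches 100%.

    samples: (hours, used_percent) pairs, oldest first.
    Returns None when there is too little data or usage is not growing.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    denom = sum((t - mean_t) ** 2 for t, _ in samples)
    if denom == 0:
        return None  # all samples share one timestamp
    # Least-squares slope, in percentage points per hour.
    slope = sum((t - mean_t) * (u - mean_u) for t, u in samples) / denom
    if slope <= 0:
        return None  # flat or shrinking usage never fills the partition
    _, last_used = samples[-1]
    return max(0.0, (100.0 - last_used) / slope)


if __name__ == "__main__":
    usage = [(0, 80.0), (1, 82.0), (2, 84.0)]  # made-up samples
    remaining = hours_until_full(usage)
    if remaining is not None and remaining <= 24:
        print(f"Partition projected to fill in about {remaining:.1f} hours")
```

With the made-up samples shown, usage grows two percentage points per hour, so the remaining 16 points are projected to be consumed in about 8 hours, which is inside the 24-hour window.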

Packet Loss Detected for CloudTracer Host

Explanation: CloudVision detected a CloudTracer packet loss metric greater than the threshold set. The default threshold generates a WARNING event if there is a 5% packet loss for longer than 30 seconds, and an ERROR event if there is a 50% packet loss for longer than 30 seconds. For more information, please visit https://eos.arista.com/toi/cvp-2020-1-0/events/#CloudTracer_Packet_Loss_events.
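
The default thresholds above can be expressed as a small classifier. The sketch below is a minimal illustration, assuming timestamped loss percentages as input and reading "a 5% packet loss" as loss at or above 5%; the function names, constants, and sample format are hypothetical and do not reflect how CloudVision evaluates the rule internally.

```python
from typing import Iterable, Optional, Tuple

# Default values quoted in the text above.
WARNING_LOSS_PCT = 5.0
ERROR_LOSS_PCT = 50.0
HOLD_SECONDS = 30.0

Sample = Tuple[float, float]  # (timestamp in seconds, packet loss in percent)


def sustained(samples: Iterable[Sample], threshold_pct: float,
              hold_seconds: float = HOLD_SECONDS) -> bool:
    """True if loss stays at or above threshold_pct for longer than hold_seconds."""
    run_start = None
    for ts, loss in samples:
        if loss >= threshold_pct:
            if run_start is None:
                run_start = ts  # first sample of a new run above the threshold
            if ts - run_start > hold_seconds:
                return True
        else:
            run_start = None  # loss dropped below the threshold; reset the run
    return False


def classify_packet_loss(samples: Iterable[Sample]) -> Optional[str]:
    """Return "ERROR", "WARNING", or None for a time-ordered series of samples."""
    samples = list(samples)  # the series is scanned once per severity
    if sustained(samples, ERROR_LOSS_PCT):
        return "ERROR"
    if sustained(samples, WARNING_LOSS_PCT):
        return "WARNING"
    return None


if __name__ == "__main__":
    series = [(0, 6.0), (10, 7.0), (20, 8.0), (30, 6.0), (40, 9.0)]
    print(classify_packet_loss(series))
```

The severity checks run from most to least severe, so a series that satisfies both rules is reported once, as an ERROR.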

High CPU Load

Explanation: CloudVision detected a high load average on the device. For more information on understanding CPU load average, please visit https://eos.arista.com/troubleshooting-high-cpu-utilization/#4_How_to_identify_average_load_on_an_Arista_switch.

High CPU Utilization

Explanation: The total CPU utilization of the device exceeds the set threshold.

High QSFP DOM Temperature

Explanation: The DOM temperature of the QSFP exceeded the threshold set on the module.

High QSFP DOM Voltage

Explanation: The DOM voltage of the QSFP exceeded the threshold set on the module.

High SFP DOM Temperature

Explanation: The DOM temperature of the SFP exceeded the threshold set on the module.

High SFP DOM Voltage

Explanation: The DOM voltage of the SFP exceeded the threshold set on the module.

Interface Went Down Unexpectedly

Explanation: The flagged interface transitioned from UP to DOWN.

Interface Went Down Expectedly

Explanation: This is an informational event that the flagged interface was shut administratively.

Unexpected Link Change

Explanation: The flagged link between the two cited devices transitioned from UP to DOWN.

Expected Link Change

Explanation: This is an informational event that the flagged link between the two cited devices was shut administratively.

Tunnel Interface Went Down

Explanation: The tunnel cited in the event transitioned to the DOWN state.

EOS Version Change

Explanation: This is an informational event that a device is now running a new EOS version.

High Interface Alignment Errors

Explanation: CloudVision detected a high rate of alignment errors on an interface. Alignment errors can result from a bad cable or optic, so swapping out or cleaning cables, optics, and patches is recommended. If the issue persists, please contact Arista TAC for further assistance.

Abnormally Large Frames

Explanation: CloudVision detected a high number of giant frames on the interface.

Abnormally Small Frames

Explanation: CloudVision detected a large number of runts on an interface.

High Interface Symbol Errors

Explanation: CloudVision detected a high number of symbol errors on an interface. Symbol errors typically indicate an L1 issue with the local link, so swapping out or cleaning cables, optics, and patches is recommended. If the issue persists, please contact Arista TAC for further assistance.

High Interface FCS Errors

Explanation: CloudVision detected incremental FCS errors on switch links. FCS errors can be localized to the affected link or propagated as a result of cut-through forwarding. For more information on debugging the cause for the errors, please visit https://www.youtube.com/embed/mcQ-zqNLBeY?rel=0&wmode=transparent.

High Input CRC Errors

Explanation: CloudVision detected incremental CRC errors on switch links. CRC errors can be localized to the affected link or propagated as a result of cut-through forwarding. For more information on debugging the cause for the errors, please visit https://www.youtube.com/embed/mcQ-zqNLBeY?rel=0&wmode=transparent.

High Input Interface Drops

Explanation: CloudVision detected a high rate of input discards on the device. This is typically an indicator of ongoing congestion.

High Output Interface Drops

Explanation: CloudVision detected ongoing congestion on the devices exceeding the threshold limit set. For more information on debugging congestion-related issues, please visit the below links:

High Output CRC Errors

Explanation: CloudVision detected incremental transmit CRC errors on switch links. CRC errors can be localized to the affected link or propagated by the switch as a result of cut-through forwarding. For more information on debugging the cause for the errors, please visit https://www.youtube.com/embed/mcQ-zqNLBeY?rel=0&wmode=transparent.

High PTP Offset From Master

Explanation: CloudVision detected the switch is experiencing PTP offsets from the Master exceeding the set threshold. For more information on the PTP BMCA algorithm, please visit https://eos.arista.com/ptp-best-master-clock-algorithm-bmca/.

High PTP Mean Path Delay

Explanation: CloudVision detected a high mean path delay.

High PTP Skew

Explanation: CloudVision detected a high PTP skew from the master.

LANZ Queue Threshold Exceeded

Explanation: CloudVision detected that the queue size is exceeding the set threshold. This could be an indicator of ongoing congestion on the switch.

Routing Table Threshold Exceeded

Explanation: CloudVision detected that the hardware resource utilization for L3 routes is exceeding the set threshold. For 7280/7500 platforms, please review whether route optimization is needed (see https://eos.arista.com/eos-4-22-0f/optimized-ipv4-route-scale-with-2-to-1-compression/).

Abnormally High Streaming Latency

Explanation: CloudVision detected that the switch is streaming at a latency greater than the set threshold. Please check whether there is any network latency to the CloudVision instance or any NTP sync issue. If neither applies, please reach out to Arista TAC for additional debugging.

Incorrect Interface Speed

Explanation: CloudVision detected that the interface speed is set differently from the predefined user design rules. For more information on network constraint events, please visit https://eos.arista.com/toi/cvp-2020-1-0/events/#Network_Constraint_Events.

Insufficient Downlink Device Redundancy

Explanation: CloudVision detected that the number of available downlink devices is lower than the threshold set in the predefined user design rules. For more information on network constraint events, please visit https://eos.arista.com/toi/cvp-2020-1-0/events/#Network_Constraint_Events.

Insufficient Peer Device Redundancy

Explanation: CloudVision detected that the number of available peer devices is lower than the threshold set in the predefined user design rules. For more information on network constraint events, please visit https://eos.arista.com/toi/cvp-2020-1-0/events/#Network_Constraint_Events.

Insufficient Peer Lag Redundancy

Explanation: CloudVision detected that the number of available interfaces in a LAG towards a peer is lower than the threshold set in the predefined user design rules. For more information on network constraint events, please visit https://eos.arista.com/toi/cvp-2020-1-0/events/#Network_Constraint_Events.

Insufficient Uplink Device Redundancy

Explanation: CloudVision detected that the number of available uplink devices is lower than the threshold set in the predefined user design rules. For more information on network constraint events, please visit https://eos.arista.com/toi/cvp-2020-1-0/events/#Network_Constraint_Events.

Insufficient Uplink Lag Redundancy

Explanation: CloudVision detected that the number of available interfaces in an uplink LAG is lower than the threshold set in the predefined user design rules. For more information on network constraint events, please visit https://eos.arista.com/toi/cvp-2020-1-0/events/#Network_Constraint_Events.

Low Interface MTU

Explanation: CloudVision detected that the interface MTU is set differently from the predefined user design rules. For more information on network constraint events, please visit https://eos.arista.com/toi/cvp-2020-1-0/events/#Network_Constraint_Events.

Interface Exceeded Inbound Utilization Threshold

Explanation: The inbound bandwidth utilization exceeded the set threshold.

Interface Exceeded Outbound Utilization Threshold

Explanation: The outbound bandwidth utilization exceeded the set threshold.
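
Bandwidth utilization for either direction can be read as the share of link speed consumed over a measurement interval. The sketch below computes it from two readings of an octet counter. The function and parameter names are hypothetical, counter wrap is not handled, and this is not a description of how CloudVision itself derives the value.

```python
def utilization_percent(octets_start: int, octets_end: int,
                        interval_seconds: float, speed_bps: int) -> float:
    """Percentage of link capacity used between two octet counter readings.

    octets_start / octets_end: inbound or outbound octet counter values.
    interval_seconds: time between the two readings.
    speed_bps: interface speed in bits per second.
    """
    if interval_seconds <= 0 or speed_bps <= 0:
        raise ValueError("interval_seconds and speed_bps must be positive")
    bits_transferred = (octets_end - octets_start) * 8
    return 100.0 * bits_transferred / (interval_seconds * speed_bps)


if __name__ == "__main__":
    # Made-up readings taken 60 seconds apart on a 10 Gbps interface.
    pct = utilization_percent(0, 37_500_000_000, 60, 10_000_000_000)
    print(f"{pct:.1f}% utilized")
```

For example, 37,500,000,000 octets transferred in 60 seconds on a 10 Gbps interface works out to 50% utilization, which would exceed a threshold set at any lower value.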

Port Channel Traffic Imbalance

Explanation: This event, available only in CloudVision as a Service, indicates that predictive analysis detected an imbalance in traffic distribution across the members of a port channel.

Device Reloaded

Explanation: CloudVision detected that the flagged device was reloaded.

Device Stopped Streaming

Explanation: CloudVision has flagged the device as inactive as no streaming updates were received over an extended period of time. This is typically a result of:

  • Ongoing network issues
  • The device was rebooted or removed
  • TerminAttr was shut down

If you have ruled out the causes above and are still observing the issue, please contact Arista TAC for further debugging.

New Device Detected

Explanation: This is an informational event indicating that a new device was detected through the zero-touch process.

Streaming Agent Low Memory Mode

Explanation: CloudVision has detected that the streaming agent (TerminAttr) is running in low memory mode. As a result, only a partial device state might be streaming. Please contact Arista TAC to debug the issue further.

EOS Version High

Explanation: The EOS version deployed on the device is higher than the supported versions.

EOS Version Low

Explanation: The EOS version deployed on the device is lower than the minimum supported version.

TerminAttr Version Low

Explanation: The TerminAttr version deployed on the device is lower than the minimum supported version.

VXLAN Configuration Error

Explanation: CloudVision detected a potential configuration error with VXLAN. Please review the errors to ensure no issues are seen with traffic forwarding. All VXLAN config sanity checks can also be viewed locally on the switch via “show vxlan config-sanity”. For more information about the various VXLAN sanity errors, please visit https://eos.arista.com/vxlan-configuration-check-using-show-vxlan-config-sanity/.

Custom Syslog Event

Explanation: In addition to predefined syslog triggered events, CloudVision enables users to create their own custom events based on the syslogs generated on the devices. For more information about this event, please visit https://eos.arista.com/toi/cvp-2020-3-0/custom-syslog-events/.

SYSLOG:DeqDelete Detected

Explanation: CloudVision detected that a 7280/7500 device deleted a stale packet that had been buffered on the switch for more than 500 ms. These deletions are counted as DeqDelete events. For information on the events and how to debug them, please visit https://eos.arista.com/troubleshooting-dequeue-deletes-on-7280-7500-devices/.
