Please. Pretty please. Pretty please with sugar on top. Do these pleas sound familiar when trying to buy tools for your network? Making a purchase for moving Production traffic is easier. You may be able to quantify how much time can be saved with the purchase of a tool for automation, or for a tool with an integration focus. Easiest of all may be proposing a self-service tool that unburdens the thin IT staff. But how do you justify spending money on tools for a rainy day when the sun is shining and the birds are chirping? It can take a creative mind to walk the line between "we need to be prepared" and tilting at windmills. The intent of this chapter in our series about Revisiting the Fundamentals is that you must Know Your Tools. Master what you have in hand. Then leverage that to add additional tools like Arista's TapAgg with DANZ (Data Analyzer) and DMF (DANZ Monitoring Fabric).
Mastery feeds justification. If you are not leveraging what you already have, how can you justify a purchase?
Like many good discussions, this one spawns from a story. A few months ago I was assisting a customer who was experiencing a service outage. And by outage, I mean Outage. Flat out down. "We can't get from here to there." "We're broken." Not the statements you want to hear when joining a crowded conference call. The business unit VPs were screaming: "We're losing $$$ by the minute." "Our website traffic has dropped off a cliff." "Application Blah is throwing more errors than I can figure out." This was one of those outages where even a team of scribes could not have accurately documented all the broken things.
Within a few minutes of joining the call I realized the network team was paralyzed. Their normal indicators weren't sounding any alarms. It was like they were in the Control Room at Chernobyl and all the lights were green. They could manually run tests and see things failing, but the tools they had come to rely on offered no clues. This is where your mastery of ALL the tools in your possession matters. Wake up, human, now is your time to shine. That was another alarming part of this outage: the network team didn't realize they had tools in hand that could provide the visibility they needed to isolate the cause and initiate service restoration.
Let's review what tools you have at your fingertips. These are some of the features built into Arista's EOS (Extensible Operating System).
Tools and Arista Switches
Quick and painless. When you have Arista switches deployed in your network, you have a built-in tool to see what is on the wire: run tcpdump. Where did tcpdump come from? According to this Wikipedia article¹, tcpdump was written in 1988. This grassroots tool has been around since the '80s. The decade that gave us musical genres such as 'new wave' and 'hair metal' also gave us a tool to see the truth.
The truth? Absolutely. There is nothing truer than that which is written on the wire. If you want to know the conversations taking place in the network then tcpdump is a fantastic place to start. One method of using the tcpdump command is to display the Control Plane frames and packets. In this case we’re not getting the Data Plane data. Quite often the Control Plane related output is all we need to prove/disprove the existence of a problem.
Let’s say that you are experiencing a problem with OSPF between two networking devices. On an Arista switch you can run a benign command to see all of the OSPF packets on an interface. Here’s an example of a tcpdump command to see all of the OSPF packets on interface VLAN 126:
bash sudo tcpdump -i vlan126 -n "ip proto 89"

Here's an example output showing some OSPF Hello packets between two switches:

SW1.18:54:39#bash sudo tcpdump -i vlan126 -n "ip proto 89"
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vlan126, link-type EN10MB (Ethernet), capture size 262144 bytes
18:56:39.983861 50:00:00:d7:ee:0b > 01:00:5e:00:00:05, ethertype IPv4 (0x0800), length 86: 10.12.6.1 > 224.0.0.5: OSPFv2, Hello, length 52
18:56:44.478465 50:00:00:72:8b:31 > 01:00:5e:00:00:05, ethertype IPv4 (0x0800), length 86: 10.12.6.6 > 224.0.0.5: OSPFv2, Hello, length 52
Let’s invite Wireshark to the party. Running the tcpdump command at the switch CLI is a quick way to diagnose a problem. What if you really need to read deeply into the frames or packets? Well, tcpdump isn’t the prettiest. You can export the output to an installation of Wireshark. There are several articles on Arista’s EOS Central that show methods for reading the output on your screen, including a live rolling capture. Here’s a recent one that Jonathan posted, https://eos.arista.com/forward-tcpdump-to-wireshark/.
Here's a screenshot of using this trick to send the output of a tcpdump command to Wireshark running locally on my laptop:
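If you'd rather script that trick than type it out each time, the same ssh-to-Wireshark pipeline can be driven from Python. This is a rough sketch of the pattern described in the linked article; the switch hostname is hypothetical, and you'd adjust the interface and filter to taste:

```python
import subprocess

SWITCH = "admin@sw1.example.com"   # hypothetical switch management address
INTERFACE = "vlan126"

# -U flushes each packet as it arrives, -s 0 captures full frames, and
# -w - writes raw pcap to stdout; 'ip proto 89' keeps only OSPF packets,
# matching the CLI example earlier in this article.
REMOTE_CMD = f"bash sudo tcpdump -i {INTERFACE} -U -s 0 -w - 'ip proto 89'"

def build_pipeline():
    """Return the ssh and Wireshark argv lists for a live remote capture."""
    ssh_cmd = ["ssh", SWITCH, REMOTE_CMD]
    wireshark_cmd = ["wireshark", "-k", "-i", "-"]   # -k: start now, -i -: read stdin
    return ssh_cmd, wireshark_cmd

def run_live_capture():
    """Wire the two processes together: ssh ... | wireshark -k -i -"""
    ssh_cmd, wireshark_cmd = build_pipeline()
    ssh = subprocess.Popen(ssh_cmd, stdout=subprocess.PIPE)
    subprocess.run(wireshark_cmd, stdin=ssh.stdout)
```

Calling run_live_capture() pops open a local Wireshark window fed by a live capture on the switch, without ever writing a pcap file on the box.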
Up next in our parade of built-in tools is iPerf. There's iPerf2 and iPerf3 out there. Some Arista platforms run iPerf2 and some, like CloudEOS, run iPerf3. iPerf was originally developed by NLANR/DAST, and iPerf3 is maintained by several folks including ESnet². iPerf has a Server-Client model: you can configure one switch to be the iPerf Server and another switch to be the iPerf Client. Here's an example on vEOS-Lab where SW1 is running iPerf as the Server and SW2 is running iPerf as the Client, and we want to generate transfers at ~500 Mbps:
SW1.20:24:09#bash
[arista@SW1 ~]$ iperf -s -p 5005
------------------------------------------------------------
Server listening on TCP port 5005
TCP window size: 85.3 KByte (default)
------------------------------------------------------------

SW2.20:28:18#bash
[arista@SW2 ~]$ iperf -c 10.1.2.1 -u -p 5005 -b 500M -t 600 -i 1
------------------------------------------------------------
Client connecting to 10.1.2.1, UDP port 5005
Sending 1470 byte datagrams
UDP buffer size: 208 KByte (default)
------------------------------------------------------------
[  3] local 10.1.2.2 port 33961 connected with 10.1.2.1 port 5005
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec  60.6 MBytes   508 Mbits/sec
[  3]  1.0- 2.0 sec  60.9 MBytes   511 Mbits/sec
[  3]  2.0- 3.0 sec  59.2 MBytes   496 Mbits/sec
[  3]  3.0- 4.0 sec  60.5 MBytes   508 Mbits/sec
[  3]  4.0- 5.0 sec  60.7 MBytes   509 Mbits/sec
^C[  3]  0.0- 5.2 sec   315 MBytes   506 Mbits/sec
[  3] Sent 224762 datagrams
iPerf is a fantastic way for you to generate some traffic. In the world of switches, since we're dealing with the CPU and not the ASIC, we're not going to be able to fully load up our ridiculously fast interfaces like 100GbE and 400GbE. But you can verify some amount of load, and you can verify end-to-end reachability. This is particularly useful when you want to test a specific port that a finicky application uses, and to verify reachability through those restrictive firewalls.
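As a quick sanity check on numbers like these, the client's datagram count should line up with the requested rate. Here's the arithmetic for the example above (1470-byte datagrams at a 500 Mbps target, interrupted at roughly 5.2 seconds):

```python
# Sanity-check the iPerf UDP numbers from the example above.
TARGET_BPS = 500_000_000        # -b 500M
DATAGRAM_BYTES = 1470           # iPerf2's default UDP payload size
DURATION_SEC = 5.2              # the run was interrupted at ~5.2 s

datagrams_per_sec = TARGET_BPS / (DATAGRAM_BYTES * 8)
expected_datagrams = datagrams_per_sec * DURATION_SEC

print(f"{datagrams_per_sec:,.0f} datagrams/sec")     # ~42,517
print(f"{expected_datagrams:,.0f} datagrams total")  # ~221,088
```

That lands in the right neighborhood of the 224,762 datagrams reported; the small overshoot matches the 506 Mbit/sec average the client printed, slightly above the 500M target.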
Connectivity Monitor – Cloud Tracer
But wait, there’s more. What if we wanted to run continuous transactions? How nice would it be to run a synthetic transaction from Leaf1 through the Spines, across the Border Leafs, out through the DCI and over to the secondary data center? Or how about running continuous tests out to resources on the Internet that your organisation is dependent upon like GCP, Azure or AWS? In walks Connectivity Monitor. Also referred to as Cloud Tracer, this is a tool that can be used to constantly check your data paths. With this built-in tool you get HTTP response times, Jitter and Packet Loss.
Here’s an example from a Production network. Running from Campus switches this customer has HTTP Gets and ICMP Pings running out to sites that they depend on. Additionally, they run tests internally from switch to switch to verify in-house connectivity.
!
monitor connectivity
   host AWS-East-1
      ip 188.8.131.52
      url https://quicksight.us-east-1.amazonaws.com
   !
   host Discord
      ip 184.108.40.206
      url https://discordapp.com
   !
   host Google
      ip 220.127.116.11
      url https://www.google.com
   !
   host SW-ACCESS-A
      ip 10.100.100.4
      url https://10.100.100.4
   !
   host SW-ACCESS-B
      ip 10.100.100.5
      url https://10.100.100.5
   !
   host Twitch
      ip 18.104.22.168
      url https://twitch.tv
   no shutdown
!
We can see the output via the CLI command show monitor connectivity. Here’s an example:
SW-CORE-A#show monitor connectivity

Host: SW-ACCESS-A
  Network statistics:
    Ip address      Latency     Jitter     Packet loss
    10.100.100.4    0.122ms     0.049ms    0%
  HTTP statistics:
    https://10.100.100.4
      Response Time: 28.841ms

Host: SW-ACCESS-B
  Network statistics:
    Ip address      Latency     Jitter     Packet loss
    10.100.100.5    0.101ms     0.037ms    0%
  HTTP statistics:
    https://10.100.100.5
      Response Time: 24.959ms

Host: Twitch
  Network statistics:
    Ip address        Latency      Jitter     Packet loss
    22.214.171.124    12.124ms     0.243ms    0%
  HTTP statistics:
    https://twitch.tv
      Response Time: 240.215ms
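If you want to trend these numbers over time, one quick-and-dirty approach is to scrape the show output. This is a rough sketch against the sample above; note that Arista's eAPI can return show output as structured JSON, which is far more robust than screen-scraping:

```python
import re

# A fragment of 'show monitor connectivity' output (from the example above).
SAMPLE = """\
Host: SW-ACCESS-A
  Network statistics:
    Ip address      Latency     Jitter     Packet loss
    10.100.100.4    0.122ms     0.049ms    0%
  HTTP statistics:
    https://10.100.100.4
      Response Time: 28.841ms
"""

def parse_hosts(text):
    """Return {host: {'latency_ms': ..., 'jitter_ms': ..., 'loss_pct': ...}}."""
    results = {}
    host = None
    for line in text.splitlines():
        m = re.match(r"Host:\s+(\S+)", line)
        if m:
            host = m.group(1)
            continue
        # Match the network-statistics data row: IP, latency, jitter, loss.
        m = re.match(r"\s+(\d+\.\d+\.\d+\.\d+)\s+([\d.]+)ms\s+([\d.]+)ms\s+([\d.]+)%", line)
        if m and host:
            results[host] = {
                "latency_ms": float(m.group(2)),
                "jitter_ms": float(m.group(3)),
                "loss_pct": float(m.group(4)),
            }
    return results

print(parse_hosts(SAMPLE))
```

Feed each poll's results into your time-series tool of choice and you have a poor-man's latency dashboard until the real tool budget arrives.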
Connectivity Monitor is referred to as Cloud Tracer in Arista’s CloudVision Portal. While this article is focused on tools built into the switches, I wanted to show a prettier method of seeing the output of these continuously running tests:
Event Monitor

Hands down, this is my personal favorite. It comes in handy in about 33 different ways. Can you tell how long a route has been in the FIB? Do you know when a route left the FIB? Do you know if the attributes of a route (e.g. Administrative Distance or Metric) changed? What about that MAC address table? When did we last see this MAC? Did we ever see this MAC?
More than once I've had a discussion with Compute peers who insist that a newly turned-up VM that can't talk to the network is the fault of the network. Oh brother! Normally, all we could do to attempt innocence was issue the command show mac address-table. Network engineers are guilty until proven innocent. We need more proof.
What if we had some history to work with? In comes show event-monitor. Here's a snippet of output from a Campus PoE switch:
poe1#show event-monitor mac
2020-06-26 06:18:52.660159|1|00e0.6714.fc4a|Ethernet15|learnedDynamicMac|removed|1810
2020-06-26 06:23:49.078640|1|00e0.6714.fc4a|Ethernet15|learnedDynamicMac|added|1811
2020-06-26 06:23:52.658849|1|9801.a75c.dcae|Ethernet15|learnedDynamicMac|removed|1812
2020-06-26 06:23:52.659947|1|bc09.63c8.4333|Ethernet15|learnedDynamicMac|removed|1813
2020-06-26 06:23:52.659961|1|b8d7.af9e.8435|Ethernet15|learnedDynamicMac|removed|1814
We can see via the MAC option that we’ve got Dynamically learned MAC addresses coming and going in this switch’s table. And with the history, I can prove that I haven’t seen MAC xxxx.xxxx.BLAH on any switch.
Another helpful example is finding out exactly when routing table (FIB) changes occurred. Here's an example from vEOS-Lab and some goofing around with OSPF:
SW1.20:01:42#show event-monitor route
2020-05-15 12:31:51.837899|126.96.36.199/32|default|ospfIntraArea|20|110|updated|51
2020-05-15 12:31:51.837911|188.8.131.52/32|default|ospfIntraArea|20|110|updated|52
2020-05-15 12:31:51.837918|10.2.4.0/24|default|ospfIntraArea|20|110|updated|53
2020-05-18 19:08:42.222489|184.108.40.206/32|default|ospfIntraArea|0|0|removed|54
2020-05-18 19:08:42.316220|220.127.116.11/32|default|ospfIntraArea|0|0|removed|55
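Because event-monitor output is pipe-delimited, it drops straight into a script when you're building an incident timeline. A minimal sketch, assuming the field order shown in the route output above (timestamp, prefix, VRF, protocol, metric, preference, action, sequence); verify the columns against your EOS version:

```python
from datetime import datetime

# Assumed field order, inferred from the 'show event-monitor route' output above.
FIELDS = ("timestamp", "prefix", "vrf", "protocol", "metric", "preference", "action", "seq")

def parse_route_events(lines):
    """Parse pipe-delimited event-monitor route rows into dicts."""
    events = []
    for line in lines:
        parts = line.strip().split("|")
        if len(parts) != len(FIELDS):
            continue  # skip the command echo and any blank lines
        event = dict(zip(FIELDS, parts))
        event["timestamp"] = datetime.strptime(event["timestamp"], "%Y-%m-%d %H:%M:%S.%f")
        events.append(event)
    return events

sample = ["2020-05-15 12:31:51.837918|10.2.4.0/24|default|ospfIntraArea|20|110|updated|53"]
for e in parse_route_events(sample):
    print(e["timestamp"], e["prefix"], e["action"])
```

From there it's a short hop to "show me every route change between 12:30 and 12:35" when the Compute kid comes knocking.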
I've got days, and possibly weeks depending on the amount of churn, of evidence as to exactly when any changes occurred in the routing table, MAC address table, ARP table, IGMP info, Multicast table and even LACP information. Truly a handy tool built directly into the switch for quick action to get that Compute kid to look elsewhere. And if you have Arista's CloudVision Portal you'll find this information stored for you there as well.
In all fairness, we have to recognise that many tools cost more than a house. OK, maybe not those $M Silicon Valley homes. But for the rest of us outside the nucleus of IT innovation, tool Purchase Orders can rival a mortgage. We need to master what we have in hand before moving up to the ultra-sexy tool suites. It is only fair that we master what is in our toolbox. Like a puppy, we need to prove that mastery and then layer on Arista's DANZ and DMF to start running with the big dogs.