• Network CI/CD Part 2 – Automated Testing with Robot Framework library for Arista devices

 
 

Previously on Network CI/CD Part 1…

We’ve established that the lack of simulated test environments and automated test tools is among the main inhibitors of a transition from a traditional network operation model to a DevOps workflow, where all changes are verified and tested prior to being deployed in production. We’ve seen how to solve the first problem with Arista’s cEOS-Lab docker container and a simple container orchestration tool. We’ve shown how the new containerised EOS allows us to dramatically increase the number of nodes we can deploy on a single host and decrease the total build and boot time compared to VM orchestration methods, e.g. the ones based on Vagrant. Now that we have our virtualised topology built, we can start thinking about how to test it, and it always helps to start with a bit of an overview of the current lay of the land.

The problem of network testing

Network testing has always been an afterthought of both traditional network design and network operation workflows. When designing a new data center or a new campus network, most of the effort is focused on scalability, reliability, fault tolerance and automation. When planning a network change, both implementation and testing procedures are written by an engineer based on their expectations of what’s supposed to happen. Very rarely do we verify our assumptions in a simulated lab environment, and even then our tests are limited to a few ping and traceroute commands. However, a successful ping doesn’t mean the traffic is taking the right path in the network, and traceroutes, especially in ECMP environments, can be quite hard to verify visually.

Ultimately, even with very high test coverage we’re still doing things manually and relying on humans to interpret the output, which means there’s always a chance of a mistake. If the networking industry is ever to transition to a DevOps operation model, a fully-fledged, robust and reliable test automation framework is a must. The question is: what’s the right tool for this?

Why isn’t Ansible enough?

One of the unfortunate side-effects of network engineers learning Ansible is that now everything looks like it can be solved with yet another intricate playbook and maybe a custom module. I made this mistake myself a long time ago when I developed a network TDD framework on top of Ansible to verify traffic paths inside the network. The truth is that Ansible is not a general-purpose automation framework: it was designed to address a very specific set of use cases and problems:

  • software provisioning, i.e. running a bunch of wget, yum, apt and pip commands
  • configuration management, i.e. creating/modifying configuration files
  • pushing data into a device, i.e. configuration files, binaries
  • running ad-hoc CLI commands over SSH

What Ansible isn’t very good at is:

  • state management, i.e. maintaining state between different playbook runs
  • event management, i.e. reacting to events or state changes on a managed device
  • parsing unstructured data, i.e. output of “show” commands
  • cross-device data correlation, e.g. triggering an action on one device using data collected on another device
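The unstructured-data point is worth illustrating. A playbook typically gets “show” output back as one opaque string, and turning it into something assertable means writing a parser anyway. Below is a minimal Python sketch of such a parser; the tabular output is a simplified stand-in for illustration, not verbatim EOS output:

```python
# Hypothetical raw output of "show ip bgp summary", as collected over SSH.
RAW = """\
Neighbor        V  AS     MsgRcvd  MsgSent  Up/Down   State
12.12.12.2      4  65002  1024     1020     01:02:03  Estab
12.12.12.6      4  65003  0        0        never     Idle
"""

def parse_bgp_summary(text):
    """Turn tabular 'show ip bgp summary' text into structured records."""
    peers = []
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 7:
            continue  # ignore blank or malformed lines
        peers.append({
            "neighbor": fields[0],
            "remote_as": int(fields[2]),
            "state": fields[-1],
        })
    return peers
```

Every screen-scraping workflow ends up maintaining a pile of parsers like this one, which is exactly the kind of logic that is painful to express in a DSL.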

However, Ansible is very flexible and customisable, and with enough effort it can be “taught” to do a lot of things it wasn’t designed to do originally. The problem is that at some point those playbooks become too hard to manage and troubleshoot. This is the point where the additional complexity outweighs any benefit of automation, and the obscurity of the resulting DSL code outweighs the benefit of its readability.

Why is scripting too much?

On the other side of the spectrum are general-purpose programming languages, the most prominent in the networking community being Python. For general-purpose tasks it certainly loses to DSL-based automation frameworks in readability and speed of development, but it makes up for that in flexibility and extensibility. Crucially, at the point where Ansible playbooks become hard to troubleshoot and manage, equivalent Python code maintains the same level of complexity.

Another downside of using pure Python for network testing automation is the need to write a lot of boilerplate code to create common testing abstractions and libraries (it took 6 months and 10k lines of code to write Brigade). Nevertheless, we should never discount the possibility of using scripting for network testing. However, at least for this specific use case, there may be a middle ground that on the one hand offers the simplicity and readability of a DSL framework, and on the other hand allows as much customisation as necessary to extend and augment the default behaviour. Enter the Robot.

Robot Framework 

Robot is a generic test automation framework written in Python. Its DSL has a very lightweight syntax which makes it very easy to write and read. The framework comes with a set of standard libraries that implement the typical functionality expected from a test framework – data types and structures, conditionals and expectations, automated interactions with external systems (e.g. Selenium, Telnet, SSH) – and many 3rd-party libraries are available on top of that. One of the most recent additions is AristaLibrary – a library to interact with Arista devices over eAPI. At the time of writing this library defines 18 new keywords that allow users to define the most typical test scenarios. However, one of the major advantages of Robot Framework is the ability to define your own keywords. As I will show later, we can re-use any of the existing keywords to define our own higher-level keywords and use them in our test definitions.
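Under the hood, AristaLibrary keywords talk to the switch over eAPI, Arista’s JSON-RPC interface served at /command-api. As a rough illustration of the mechanics (a sketch of the request envelope only, not AristaLibrary’s actual implementation):

```python
import json

def eapi_request(commands, request_id="1"):
    """Build the JSON-RPC 2.0 body that eAPI expects at https://<switch>/command-api.

    `commands` is a list of CLI commands, e.g. ["show ip bgp summary"];
    the switch replies with one structured JSON result per command.
    """
    return json.dumps({
        "jsonrpc": "2.0",
        "method": "runCmds",
        "params": {"version": 1, "cmds": list(commands), "format": "json"},
        "id": request_id,
    })

body = eapi_request(["show ip bgp summary"])
```

Because the reply is structured JSON rather than screen-scraped text, test keywords can assert on individual attributes instead of matching whole lines of output.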

Arista Network Validation

Arista Network Validation is a mini-wrapper on top of Robot Framework that makes it easier to use for network testing. It’s available in software downloads as a tar.gz file that can be installed using pip, which pulls in all the required 3rd-party packages, including Robot Framework itself. The installer file is distributed with a “User Guide” PDF document, which contains a detailed description of how to use the framework. It’d be pointless to repeat information from the user guide here, so I’ll refer readers to it for a detailed description of the framework and AristaLibrary. Now it’s time for a quick demo…

Test bed setup

The first thing we need to do is install the Arista Network Validation tool. To simplify dependency management, we’ll install it inside a python2 virtual environment:

$ python2 -m virtualenv testing; cd testing
$ source bin/activate
$ pip install network_validation-1.0.1.tar.gz

We’ll do our testing against a virtual topology built from the cEOS devices I described in the previous post:

$ python3 -m pip install git+https://github.com/networkop/arista-ceos-topo.git
$ cat <<EOF >> topology.yml
PUBLISH_BASE: 9000
links:
  - ["Device-A:Interface-1", "Device-B:Interface-1"]
EOF
$ sudo docker-topo --create topology.yml

This will create a pair of cEOS devices interconnected back-to-back with Ethernet interfaces:

+------+             +------+
|cEOS 1|et1+-----+et1|cEOS 2|
+------+             +------+

Testing 

Let’s assume we’ve configured those devices with a simple BGP peering over their directly connected interfaces and advertised their respective loopbacks into BGP. The pseudocode for this config would look something like this:

interface Loopback0
 ip address X.X.X.X/32
!
router bgp 65XXX
 neighbor 12.12.12.Y remote-as 65YYY
 redistribute connected
!

Now we want to verify that our control plane has converged and we have reachability to the loopback interfaces. We start by creating a simple YAML configuration file “test.yml”, describing the device connection details:

TRANSPORT: https
PORT: 80
USERNAME: admin
PASSWORD: admin
RUNFORMAT: suite
nodes:
  SW1:
    host: localhost
    port: 9000
  SW2:
    host: localhost
    port: 9001
PROD_TAGS:
  - ignoretags
testfiles:
  - network_validation

The Arista Network Validation tool will look for test cases inside a “network_validation” directory and execute all tests that match a particular tag (the “ignoretags” value executes all of them).

Now it’s time to create our first test scenario. Each test case file contains a number of sections responsible for various parts of the testing procedure. For now let’s focus on the main section, called “Test Cases”. In there we first check that our BGP peering with a neighbor is in the “Established” state. We do that by issuing a “show ip bgp summary” command using the “Get Command Output” keyword, and picking apart the output until we get the “peerState” attribute of the response. The second test case verifies that the peer loopback is reachable with a special “Address Is Reachable” keyword, which behind the scenes issues a ping and verifies that at least one ping request received a response.

*** Settings ***
Documentation     This test verifies control and dataplane connectivity between two BGP peers
Suite Setup       Connect To Switches
Suite Teardown    Clear All Connections
Library           AristaLibrary
Library           AristaLibrary.Expect
Library           Collections

*** Variables ***
# Neighbor peer address
${PEER_ADDRESS}     12.12.12.2
${PEER_LOOPBACK}    2.2.2.2

*** Test Cases ***
Controlplane verification
    [Documentation]    Check PEER Established
    Get Command Output    cmd=show ip bgp summary
    Expect    vrfs default peers ${PEER_ADDRESS} peerState    is Established

Dataplane verification
    [Documentation]    Check the PEER Loopback is reachable
    ${result}=    Address Is Reachable    ${PEER_LOOPBACK}
    Should Be True    ${result}

Gather Post Change Output
    Record Output    cmd=show ip bgp summary

*** Keywords ***
Connect To Switches
    [Documentation]    Establish connection to a switch which gets used by test cases.
    Connect To    host=${SW1_HOST}    transport=${TRANSPORT}    username=${USERNAME}    password=${PASSWORD}    port=${SW1_PORT}
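To make the “Expect” step more concrete: eAPI returns “show ip bgp summary” as a nested JSON document, and the space-separated path “vrfs default peers ${PEER_ADDRESS} peerState” is simply a walk through that structure. The equivalent navigation in plain Python, using a trimmed-down response as a stand-in for the real eAPI output:

```python
# A trimmed-down "show ip bgp summary" eAPI response; the key path below
# mirrors the "vrfs default peers ... peerState" path used by the Expect keyword.
response = {
    "vrfs": {
        "default": {
            "peers": {
                "12.12.12.2": {"peerState": "Established", "asn": 65002}
            }
        }
    }
}

def peer_state(result, peer, vrf="default"):
    """Walk the nested dict the same way Expect walks its space-separated path."""
    return result["vrfs"][vrf]["peers"][peer]["peerState"]

assert peer_state(response, "12.12.12.2") == "Established"
```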

Finally, we can execute our test scenario and get the result:

$ validate_network.py --config test.yml --reportdir output
==============================================================================
Run Full Suite
==============================================================================
Run Full Suite.1 Bgp :: This test verifies control and dataplane connectivi…
==============================================================================
Controlplane verification :: Check PEER Established                   | PASS |
------------------------------------------------------------------------------
Dataplane verification :: Check the PEER Loopback is reachable        | PASS |
------------------------------------------------------------------------------
Run Full Suite.1 Bgp :: This test verifies control and dataplane c… | PASS |
2 critical tests, 2 passed, 0 failed
2 tests total, 2 passed, 0 failed
==============================================================================
Run Full Suite                                                        | PASS |
2 critical tests, 2 passed, 0 failed
2 tests total, 2 passed, 0 failed
==============================================================================

Now that we’ve seen how easy it is to write and read tests using standard AristaLibrary keywords, let’s have a look at how to extend the Robot Framework by adding new high-level keywords.

Custom keywords

Let’s assume we want to verify some internal behaviour that is not necessarily exposed through the Arista CLI. One of the common tasks in acceptance testing is to run a debug to record the timing of a certain event (e.g. a BGP keepalive or a RIP update). Normally, this would involve some setup/teardown commands to turn the debugging on and off and some match command to match an event signature. Instead of repeating all of these steps in every test case, we can define our own keywords in the bottom “Keywords” section of a test case file:

Enable tracing for ${agent} ${setting}
    Run Keyword And Ignore Error    Configure    bash timeout ${BASH_TIMEOUT} sudo rm /tmp/${TRACE_FILE}
    ${trace_on}=    Create List    trace ${agent} setting ${setting}    trace ${agent} filename ${TRACE_FILE}
    ${result}=    Configure    ${trace_on}
    Length Should Be    ${result}    2

Record all occurrences of ${event}
    ${result}=    Configure    bash timeout ${BASH_TIMEOUT} grep "${event}" /tmp/${TRACE_FILE}
    Log    ${result[0]['messages'][0]}

Disable tracing for ${agent}
    ${trace_off}=    Create List    no trace ${agent} setting    no trace ${agent} filename
    ${result}=    Configure    ${trace_off}
    Length Should Be    ${result}    2
    Run Keyword And Ignore Error    Configure    bash timeout ${BASH_TIMEOUT} sudo rm /tmp/${TRACE_FILE}

We can then make use of those keywords in the “Test Cases” section like this:

[Setup]    Enable tracing for ${DEBUG_AGENT} ${DEBUG_SETTING}
Sleep    ${DEBUG_TIMEOUT}
Record all occurrences of ${DEBUG_EVENT}
[Teardown]    Disable tracing for ${DEBUG_AGENT}

Assuming we’ve defined the debug variables in the config YAML file like this:

DEBUG_AGENT: "Rib"
DEBUG_SETTING: "Rib::Rip*/*"
DEBUG_EVENT: "RIP RECV"
DEBUG_TIMEOUT: 35

We get all occurrences of “RIP RECV” event recorded during a 35 second window in the output logs:

07:57:23.247214 RIP RECV 12.12.12.2 -> 224.0.0.9 vers 2, cmd Response, length 244
07:57:53.651828 RIP RECV 12.12.12.2 -> 224.0.0.9 vers 2, cmd Response, length 244
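If we wanted to assert on the timing itself rather than eyeball the log, the recorded lines are easy to post-process. A small Python sketch, assuming the leading-timestamp format shown above:

```python
from datetime import datetime

# The two trace lines recorded above (format assumed from the sample output).
LINES = [
    "07:57:23.247214 RIP RECV 12.12.12.2 -> 224.0.0.9 vers 2, cmd Response, length 244",
    "07:57:53.651828 RIP RECV 12.12.12.2 -> 224.0.0.9 vers 2, cmd Response, length 244",
]

def event_intervals(lines):
    """Parse the leading HH:MM:SS.us timestamps and return the gaps in seconds."""
    stamps = [datetime.strptime(line.split()[0], "%H:%M:%S.%f") for line in lines]
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]

intervals = event_intervals(LINES)
```

With RIP’s default 30-second update timer, the single gap here comes out at roughly 30.4 seconds, which is exactly the kind of check a custom keyword could assert on instead of just logging the raw lines.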

Further reading

Obviously, since Robot Framework has its own DSL, some learning curve is expected. However, once you get familiar with the most common standard libraries and keywords, writing Robot test cases becomes very easy. Thankfully, Robot boasts some of the best documentation of any open-source project, which, along with the Arista Network Validation user guide, should be enough for anyone to get up to speed and start writing test cases in a matter of hours.

Coming up

Hopefully this post has given you a feel for how easily we can perform automated network verification and validation, which brings us one step closer to our final goal – a fully automated build and test pipeline for network devices. In the next and final post we’ll complete our journey towards network CI/CD nirvana by building our own network CI server based on GitLab and creating a simple CI/CD pipeline that makes use of both cEOS and Robot Framework to build and test all network changes.
