As the networking industry continues riding the DevOps wave and network engineers become more comfortable using automation tools like Ansible and Salt, the network still remains the most brittle piece of IT infrastructure. This can partially be justified by the fact that the network underpins all other areas of the IT infrastructure stack – compute, storage and virtualisation. However we, as network engineers, have done very little to improve our confidence in networks, and to this day some of the biggest outages have been caused by trivial network configuration mistakes. When the software industry was facing similar challenges, the response was to create Continuous Integration (CI) pipelines – a series of steps that automatically build and test every new code change in an environment that closely resembles production. However, when it comes to networking, things aren't always that easy.
The recent “network automation” movement has been focused mostly on one problem – pushing configuration into network devices. In the absence of any prior configuration management this was the first logical step to take, and a very difficult one, thanks to decades of manual CLI provisioning and a common (mis)belief that networking is special. Although this problem is still far from being solved, technologies like Arista's eAPI, configuration modelling with YANG and various other commercial and open-source tools now enable some form of API access to install and retrieve configuration across most network operating systems. That's why the focus is starting to shift towards more advanced use cases, like automated network verification (control plane) and validation (data plane) testing. Here, however, the networking industry is facing another set of challenges:
- Lack of a simulated test environment – it's still very hard to get a good working virtual replica of a production network device, and network simulation is usually a manual task that involves hours of pre-configuration and fine-tuning.
- Lack of automated test tools – although it's possible to validate some state using Ansible, NAPALM or YANG state models, this still covers only a fraction of the total device state and tells us little about network-wide state or data-plane connectivity.
- Cultural shift – perhaps the most important and difficult issue to tackle; it stems from a lack of trust in automation and requires a consolidated effort from multiple layers of staff inside an organisation.
The last problem cannot be classified as purely technical and represents a vast area that may require a slightly bigger publication format (e.g. The Phoenix Project). That's why we're going to focus on the first two problems and, over a series of 3 blog posts, explore possible solutions using some of the latest additions to the Arista toolset portfolio:
- In this post we’re going to focus on how to automatically create simulated test environments using the new containerised EOS, built specifically for lab simulations – cEOS-Lab.
- In the next post we'll see how to perform automatic network verification and validation testing using the Arista network validation framework.
- In the final post we'll put everything together to create a complete network CI pipeline that builds a simulated network topology, applies new changes and tests them, all inside a portable GitLab environment.
cEOS-lab is a containerised version of vEOS-lab with similar functionality and limitations. Compared to vEOS-lab, cEOS-lab has all the benefits of a container, such as a much lighter footprint of approximately 600 MB of RAM per device, a smaller image size and greater portability. However, from a network automation perspective, the most interesting features are:
- Standard tools and APIs allowing users to interact with it just like with any other Docker container.
- Image immutability, which means that you can change the configuration of a running device and even upgrade the EOS version, all with no effect on the Docker image, so that the next time you spin up a container it will come up in its original state.
- Typical container lifecycle, which makes it fast and easy to spin up and tear down arbitrary multi-node topologies.
cEOS-Lab is distributed as an archived Docker image, which can be imported directly into the local Docker image repository as follows:
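A minimal sketch of the import step, assuming the archive is named `cEOS-lab.tar.xz` and we tag the resulting image `ceosimage:latest` (both names are illustrative):

```bash
# Import the cEOS-Lab archive as a local Docker image
docker import cEOS-lab.tar.xz ceosimage:latest

# Verify the image is now in the local repository
docker images | grep ceosimage
```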
From here on, we can work with cEOS just like with any other Docker container. For example, the following commands would create a cEOS container and connect it to a pair of new networks:
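The exact set of environment variables expected by cEOS has varied between releases, so the following is a sketch based on commonly documented flags, reusing the assumed `ceosimage:latest` tag:

```bash
# Create two Docker networks that will act as the device's links
docker network create net1
docker network create net2

# Create (but don't yet start) the cEOS container; the environment
# variables and the /sbin/init command are what cEOS-Lab typically
# expects, but may differ between releases
docker create --name=ceos1 --privileged \
    -e CEOS=1 -e container=docker -e EOS_PLATFORM=ceoslab \
    -e SKIP_ZEROTOUCH_BARRIER_IN_SYSDBINIT=1 -e ETBA=1 -e INTFTYPE=eth \
    ceosimage:latest /sbin/init

# Attach the container to both networks before starting it, so the
# interfaces are present when EOS boots
docker network connect net1 ceos1
docker network connect net2 ceos1

docker start ceos1
```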
The above commands would result in the following topology:
We can interact with a running cEOS container using standard Docker tools. For example this is how we can connect to an interactive Cli shell inside the container:
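Assuming the container is named `ceos1` as above:

```bash
# Attach to the interactive EOS Cli inside the running container
docker exec -it ceos1 Cli
```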
And this is how we can destroy all containers with names containing “ceos”:
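One way to do this with standard Docker filters:

```bash
# Force-remove all containers (running or stopped) whose name matches "ceos"
docker rm -f $(docker ps -aq --filter "name=ceos")
```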
Running EOS inside a container is a very convenient alternative to vEOS, however since all the containers are interconnected by Docker-managed Linux bridges, this imposes a number of limitations that may result in unexpected behaviour:
- By default Linux bridges will consume all L2 link-local multicast frames, including STP and LACP (source). The workaround is to connect containers directly with veth pairs, either manually or using something like koko; a manual veth sketch follows this list.
- It is possible to force a Linux bridge to flood LLDP PDUs by modifying its group_fwd_mask setting, as shown in the second sketch after this list.
- For any Docker container with more than 3 interfaces, the order in which these networks are plugged in inside the container is not deterministic (source). The workaround is to build a custom Docker binary using the proposed patch, which changes the internal libnetwork data structure storing connected networks from a heap to an array.
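As an illustration of the first workaround, this is how a veth pair could be created manually to link two containers directly; the container names (`ceos1`, `ceos2`) and interface names are assumptions:

```bash
# Create a veth pair in the root namespace (interface names are examples)
sudo ip link add veth-a type veth peer name veth-b

# Find each container's network namespace via its PID
# (ceos1/ceos2 are assumed container names)
pid1=$(docker inspect --format '{{.State.Pid}}' ceos1)
pid2=$(docker inspect --format '{{.State.Pid}}' ceos2)

# Move one end of the pair into each container, renaming it to eth1
sudo ip link set veth-a netns "$pid1" name eth1
sudo ip link set veth-b netns "$pid2" name eth1

# Bring the new interfaces up inside each container's namespace
sudo nsenter -t "$pid1" -n ip link set eth1 up
sudo nsenter -t "$pid2" -n ip link set eth1 up
```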
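And this is the group_fwd_mask setting that makes a Linux bridge forward LLDP frames; the bridge name below is an example and should be replaced with the actual bridge created by Docker (usually `br-<network id>`):

```bash
# LLDP uses destination MAC 01:80:C2:00:00:0E; setting bit 14 (0x0E) of
# group_fwd_mask (16384 = 1 << 14) tells the bridge to forward such
# frames instead of consuming them
echo 16384 | sudo tee /sys/class/net/br-net1/bridge/group_fwd_mask
```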
Building multi-node cEOS topologies
When building complex network topologies with multiple nodes, the manual approach of creating and interconnecting each container individually can become cumbersome. It would be much easier if we could define the desired topology in a text file and have it built and destroyed automatically. Thankfully, there are multiple ways to orchestrate the creation of multiple containers and networks.
Using existing container orchestration tools
One of the obvious tools of choice from Docker’s native ecosystem is docker-compose. It uses a YAML file which describes the desired state of a multi-container application to build or destroy multiple interconnected containers at the same time. Below is an example of a simple two-node topology defined in a docker-compose format:
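A sketch of such a file, reusing the (assumed) `ceosimage:latest` tag and environment variables from earlier; the exact flags a given cEOS release requires may differ:

```yaml
version: "3"
services:
  ceos1:
    image: ceosimage:latest
    privileged: true
    command: /sbin/init
    environment:
      CEOS: 1
      container: docker
      EOS_PLATFORM: ceoslab
      SKIP_ZEROTOUCH_BARRIER_IN_SYSDBINIT: 1
      ETBA: 1
      INTFTYPE: eth
    networks:
      - net1
  ceos2:
    image: ceosimage:latest
    privileged: true
    command: /sbin/init
    environment:
      CEOS: 1
      container: docker
      EOS_PLATFORM: ceoslab
      SKIP_ZEROTOUCH_BARRIER_IN_SYSDBINIT: 1
      ETBA: 1
      INTFTYPE: eth
    networks:
      - net1
networks:
  net1:
```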
If saved in a `docker-compose.yml` file in the current directory, the above topology can be built using a single command:
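That command, together with its teardown counterpart:

```bash
# Build the topology in the background
docker-compose up -d

# Tear it down again
docker-compose down
```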
However, docker-compose has its own caveat: it doesn't honour the order in which networks are defined for a container. This bug has no current workaround and effectively makes docker-compose unusable for any network topology with more than 2 links per device.
Another popular container orchestration tool, Kubernetes, assumes a networking model with just a single network interface per pod, which also makes it unusable for network simulations, although a few CNI plugins like multus and knitter were created to address this issue.
Building network topology orchestrator for cEOS using Docker API
Finally, when none of the existing tools are deemed good enough for the task at hand, there's always the option to build one yourself. The benefit of a custom-built tool is that we can focus on just what's required without having to cater for a broader set of use cases, as is the case with general-purpose orchestrators. In this case we know that we want to create multiple containers and attach them to multiple networks, but we don't care about load-balancing, security or automatic service scale-out. An example implementation of such an orchestrator can be found on Github. To begin working with it, we need to define our topology as a list of links, each described by a unique set of connected interfaces:
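A sketch of what such a topology file could look like; the exact schema depends on the orchestrator's implementation, so treat the format and file name below as illustrative:

```yaml
# topo.yml (illustrative format): each entry is one point-to-point
# link, described by the device:interface pairs it connects
links:
  - ["Leaf1:eth1", "Leaf2:eth1"]
  - ["Leaf1:eth2", "Spine1:eth1"]
  - ["Leaf2:eth2", "Spine1:eth2"]
```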
Optionally, we can provide configuration that we want to apply to a particular device. For example, this is how we define a hostname for device “Leaf1”:
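Assuming the orchestrator picks up per-device startup configuration from files named after each device (the `config/Leaf1` path is an assumption):

```
! config/Leaf1 (assumed location) - standard EOS startup-config syntax
hostname Leaf1
```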
Now, we can create our topology with just a single command:
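Assuming the orchestrator script is called `topo.py` and takes the topology file as an argument (both names are assumptions):

```bash
# Build all containers and networks described in the topology file
./topo.py --create topo.yml
```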
This will result in the following topology being created inside our Docker host:
From here on we can interact with each device either through a Cli or eAPI:
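For example, assuming the orchestrator prefixes container names with the topology name (`lab_Leaf1` here is an assumption) and publishes eAPI on local port 8000, as described in the note that follows:

```bash
# Attach to the interactive EOS Cli (container name is an assumption)
docker exec -it lab_Leaf1 Cli

# Send a JSON-RPC request to the device's eAPI endpoint published on
# the Docker host (port and credentials are assumptions)
curl -s -H "Content-Type: application/json" \
     -u admin:admin \
     -d '{"jsonrpc": "2.0", "method": "runCmds",
          "params": {"version": 1, "cmds": ["show version"], "format": "json"},
          "id": "1"}' \
     http://localhost:8000/command-api
```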
By default, the eAPI ports of all containers will be published on the Docker host starting from port 8000.
Now that we’ve covered how to build arbitrary network topologies from a simple text file, we can move on to the next task – testing them.
In the next post we'll explore the Arista network validation framework built on top of Robot Framework, one of the most popular open-source general-purpose test automation frameworks in the world.