From your manager: “We have a greenfield data center project heading our way. I need you to start working on a design for two data centers. Each data center will be 10,000 square feet in size. We’ll need full network redundancy. It needs to support virtualized compute, physical compute, IP Storage, load balancers, firewalls, an oversubscription ratio of 3:1 or better, horizontal cabling based on MMF, and a set of Data Center Interconnect links with Layer 2 adjacency to support VM Mobility. Oh, and I need a rough budget estimate by the end of the week.”
Sound familiar? Hurry up and give me all the things, and do it in a manner that satisfies my need for instant gratification. We’ve all been there. Ready, Fire, Aim. While all of those production-facing and money-producing services certainly matter, there is one thing that is too often treated as a second-class citizen. One thing that is what I like to call the junkyard of our network. That which no one wants to spend money on. That thing that doesn’t produce revenue but absolutely saves money for us when all else goes sideways. That thing is the topic of this post. That thing is the Out of Band network.
What was it?
It was something that used to be very important to us. It was a portion of the network that, at one time, was considered a critical part of our designs. It was our last line of defense when gremlins ran around the data center wreaking the havoc they so dearly enjoy.
Unfortunately, we’ve made a grave error. We’ve let the OOB become a nice-to-have. We’ve made it eligible for redlining out of a quote. We’ve let it become the place where devices go to die. Or if not dead, certainly retired from playing a primary role in service delivery. Today’s network designs tend to focus on the latest trends. We seek the sexiest confluence of ultra-fast equipment combined with fresh RFC topics and a sprinkling of automation on top.
The Out of Band network, from a network engineer’s perspective, is a means to monitor and manage network devices without relying on the normal Production flow of our data and voice traffic. By contrast, In-Band management refers to managing a device via IP addresses assigned to interfaces that also carry production frames and packets.
What should it be?
The Out of Band network should consist of dedicated switches to which all Production devices are connected. Nearly every device provides an interface for this purpose, often referred to as the management interface. This dedicated physical interface is cabled to dedicated switching and routing infrastructure that doesn’t carry Production traffic. Some networking equipment has two dedicated interfaces: one for the remote management we’re talking about, and one for local device access, often called the console interface. The management interface can be viewed as a ‘host’ interface in that it is Ethernet, can be assigned an IP address, and can even be placed into a separate VRF (Virtual Routing and Forwarding) instance. The console interface is not typically a ‘host’ interface. It is usually a ‘jack’, if you will, where a technician can make a local connection from a laptop or crash cart. Access to the console interface usually requires physical access to the device, though as we’ll see later in this document when we discuss console servers, that requirement can be removed.
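As a concrete illustration, here is a minimal sketch of what placing the management interface into its own VRF might look like on an Arista EOS device. The VRF name, addresses, and gateway are invented for illustration, and the exact syntax varies between EOS versions:

```
! Illustrative only - VRF name, addresses, and gateway are assumptions
vrf instance MGMT
!
interface Management1
   description OOB management - cabled to dedicated Mgmt switch
   vrf MGMT
   ip address 10.255.0.11/23
!
! Default route that exists only inside the management VRF
ip route vrf MGMT 0.0.0.0/0 10.255.0.1
```

Because the default route lives only in the MGMT VRF, a mistake in the Production routing table can’t drag the management path down with it.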
The OOB network should be, no, strike that, it must be brought to the forefront again. We’re all busy mastering new skills and meeting time-to-market goals that would have been laughable a decade ago. But we’ve forgotten to plan for Murphy. The OOB network must be considered an indispensable part of the build. It must be treated with the same careful rumination and planning that the core of the network receives. It must be part of any Proof of Concept lab conducted with the networking OEMs. The OOB network must go through the same scrutiny for failover and disaster scenario games that we play with our Production equipment. The design of the OOB network must be presented to the Security team for their review and feedback. This insurance policy must be monitored, patched, and, perhaps most importantly, tested on a regular basis.
What might it look like?
Like all other things networking, there is more than one way to build an Out of Band network. The examples that we provide here are all suggestions. Some may be stronger suggestions than others. Pieces and parts of the following recommendations should be weighed against your specific needs.
Start by taking an inventory of all the devices you are responsible for: routers, switches, firewalls, load balancers, and so on. Audit all of these devices for the Out of Band management options they offer. We expect that most will have two. First, a management interface that is either RJ45 (copper) based or possibly SFP (fiber) based. This management interface is the one that can be assigned an IP address, subnet mask, and either a default gateway or a default static route. Second, there is likely a console interface. This interface is less of a host interface and more of a portal. It may be an RJ45 (copper) interface or it could be a USB interface. The console interface probably doesn’t provide remote access, as in the ability to SSH from a remote location. It is usually meant for a user to connect locally with some sort of console cable and a laptop, crash cart, or console server. We’ll talk about console servers in the Leveling Up section of this document.
For now, let’s focus on the management interface and what we can do with it. Part of a basic OOB network design will include a routed network that is separate from the Production network. Ideally, we’d like to cable every management interface to a dedicated Management switch. If you have more than 30 or 40 devices, you may find the need for multiple Management switches. In that case, we’ll want to build a Leaf/Spine Management network. This Leaf/Spine Management network has Spines configured with SVIs (Switched Virtual Interfaces) that provide the Default Gateway and routing functionality needed for the Management network. Next, cable the Management Leafs to the Management Spines. Typically we don’t need a lot of bandwidth here. The majority of the traffic flowing through the OOB network is often SSH, Streaming Telemetry, SNMP, and possibly API traffic. However, file transfers for things like code upgrades could also be configured to flow across the OOB network and, in that case, could use more bandwidth than the other protocols mentioned.
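A sketch of the Spine side of that design might look like the following in EOS terms. The VLAN number, addresses, and interface assignments are assumptions, and in a real build a first-hop redundancy mechanism (VRRP, or Arista’s virtual ARP) would normally back the gateway address across both Spines:

```
! Illustrative only - VLAN, addresses, and interfaces are assumptions
vlan 100
   name OOB-MGMT
!
interface Vlan100
   description Default gateway SVI for the OOB management subnet
   ip address 10.255.0.2/23
!
interface Ethernet1
   description Downlink to mgmt-leaf-1
   switchport mode trunk
```

The Leafs stay pure Layer 2; all routing for the management subnet happens on these Spine SVIs.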
Now that you’ve built a Leaf/Spine Management network and connected all of your network devices to the Leafs, you need a way to route into this network. We recommend using a separate firewall for this subnet. The management network is often assigned a /24 prefix; the 254 usable IP addresses are usually enough to accommodate all of the devices. To be on the safe side, allocating a /23 prefix is even better. This allows for growth should you end up migrating from a legacy network to a new network. Allocating a /23 prefix also helps with device identification. Strategically assigning devices to the different /24s within the 3rd octet of the /23 can serve as a quick reference for what a device is, its place in the network, or perhaps its age in the network.
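As a purely hypothetical example of that /23 carve-up (all prefixes invented for illustration), the 3rd octet alone tells you which generation of the network a device belongs to:

```
10.255.0.0/23      OOB management supernet, one per data center
  10.255.0.0/24    legacy devices being migrated out
  10.255.1.0/24    new-build Leaf/Spine and service devices
```

Seeing 10.255.1.x in a log line or a traceroute immediately identifies the device as part of the new build, with no lookup required.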
We suggest you keep the OOB management network simple. By that we mean there likely isn’t a reason to build a complex Layer 3 Leaf/Spine network with layered-on technologies such as VXLAN or EVPN VXLAN. You want the OOB management network to be straightforward, rock-solid, and easy to understand.
In the case of multiple data centers, we recommend that you provide a separate, dedicated OOB management network for each data center. We don’t recommend stretching Layer 2 of the management network between data centers, even if such paths exist. We’re focusing on simplicity and fault isolation. We don’t want to complicate this portion of the network, nor do we want to create new dependencies; avoiding dependencies is the exact reason we are building a network that is segmented off from Production.
Access to the OOB
Once you build the basics of the OOB network, you need to consider all of the ways you may need to access it. Start with the most common means: a path from the Production network where you sit within the office. Make sure the OOB network subnet or subnets are reachable from both wired and wireless connections. If there are multiple wired or wireless subnets you could be connecting from, verify that reachability from each of those networks to the OOB management network is built.
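One way to express that reachability is an ACL on the routed edge of the OOB network, sketched below in EOS-style syntax. Every prefix here is an invented placeholder for your corporate wired and wireless ranges, and you may prefer to enforce this policy on the OOB firewall instead:

```
! Illustrative only - corporate wired/wireless prefixes are assumptions
ip access-list OOB-INBOUND
   10 permit tcp 10.10.0.0/16 10.255.0.0/23 eq ssh
   20 permit tcp 10.20.0.0/16 10.255.0.0/23 eq https
   30 deny ip any 10.255.0.0/23 log
```

The explicit deny with logging at the end gives you a record of anything unexpected knocking on the OOB door.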
VPN to the OOB Network
To help with true isolation yet full remote access, we recommend you install a firewall with VPN capabilities and at least one, if not two, Internet Service Providers. This Internet pathway is purposefully separate from the Corporate Internet and Corporate VPN. If an outage is impacting the Production Internet path or Production Internet firewalls, we need to make sure we can still get into the network. Consider scenarios where a Denial of Service attack is happening. If your company is targeted, it is likely that the malicious traffic is aimed at the Production Internet and firewalls by means of publicly known IP addresses. By installing a separate firewall and Internet connection, you create a path that is unlikely to be known to the remote attackers. This ensures you have backdoor access even while the front door of the business is under assault. This separate Internet connection could terminate on a smaller model of your Production firewalls, managed with the same tools. You could even go as minimal as a simple compute device with pfSense installed and a VPN software package added. This holds true for co-lo facilities as well. Along with your Production traffic cross-connects to the MMR (Meet Me Room), add in a few cross-connects for a completely separate, non-Production Internet path.
Often overlooked are the services the OOB network and its devices rely upon. First and foremost: authentication and authorization. If the sky is falling and services are not reachable, how are you going to authenticate to the Production network devices? You likely have the option of a local username/password on the device as a fallback when TACACS or RADIUS servers are unreachable. Another option, one that provides more accountability than a shared service account, is to stand up another TACACS or RADIUS server within the OOB network. Configure your Production devices to use it as the Secondary or Tertiary option for authentication, authorization, and accounting. If you elect to simply use a service account, be sure to test it regularly, and don’t forget to rotate the username/password based on your corporate password lifetime policies or when an employee leaves the organization.
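That fallback ordering might be sketched like this in EOS syntax. The server addresses, group names, and keys are placeholders, and exact command forms vary by EOS version:

```
! Illustrative only - addresses, group names, and keys are placeholders
username breakglass privilege 15 secret <strong-local-password>
!
tacacs-server host 10.255.0.50 vrf MGMT key <shared-key>
!
aaa group server tacacs+ OOB-TACACS
   server 10.255.0.50
!
! Try Production TACACS first, then the OOB server, then the local user
aaa authentication login default group PROD-TACACS group OOB-TACACS local
```

The local breakglass account only ever gets used when both TACACS groups are unreachable, which is exactly the scenario the OOB network exists for.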
Leveling Up
The physical Management interface is a good place to start. But it is just that, a start. To level up your access to the Production devices, we recommend adding a Console server. The Production switches are cabled from their Console interface to the Console server. This gives you the equivalent of local physical access to the Production devices. That comes in handy when a device is malfunctioning so badly that it can’t respond to CLI SSH or API access. It may be stuck in a boot cycle where it hasn’t fully loaded the network operating system. Or it may be under such a heavy load that CLI SSH or API access is rendered unusable. There are plenty of Console server options available in the market. When shopping for one, we suggest you account for how many Production devices need to be cabled to it, confirm that it has an interface that can be assigned an IP address for remote access, and verify that each connection can be labeled so it is easy to find the device you need to Console into. Some Console servers come with VPN or LTE options to give you more choices in how you’ll access this invaluable backdoor.
Caring for the OOB network
First, monitor it, just as closely as you watch the health of the Production network. Enable Streaming Telemetry from it. Add it to your polling servers to measure availability. Use Arista’s CVP (CloudVision Portal) to manage the configurations and monitor the health of the OOB network devices. The OOB network should be held to the same KPI (Key Performance Indicator) standards as the Production network.
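In EOS terms, the bare minimum might be sketched as follows. The community string and collector addresses are placeholders, and your monitoring stack will dictate the real values:

```
! Illustrative only - community string and server addresses are placeholders
snmp-server community <ro-community> ro
snmp-server vrf MGMT
!
logging vrf MGMT host 10.255.0.60
!
ntp server vrf MGMT 10.255.0.61
```

Note that NTP is included deliberately: accurate timestamps on the OOB devices are what make their logs usable during a post-incident review.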
You need to test it. Schedule regular intervals, such as quarterly, for testing that you can access the OOB network. Verify that you can access it from all locations: from your primary workspace, from within the data center itself, and from remote offices. Test that you have full access whether you are on a wired network or a wireless network. Make sure everyone on the team knows the service account usernames and passwords if that is the break-glass-in-case-of-emergency access method. And because Murphy prefers to strike in the wee hours of a Sunday morning when you are hungover from a weekend cookout with the neighbors, make sure you have full functionality when using remote access. Test it with the standard Corporate VPN as well as from the separate VPN pathway that we recommended above.
In times of critical need, when the business is more than happy to scream at you about how many $MM they are losing during every minute of the outage, you’ll be glad you defended the need for a robust OOB network. Once the devastating outage has been resolved and your shoulders finally slump in relief that the event is over, you’ll be glad you stood up for the OOB network. You’ll be glad you didn’t treat the OOB network as a second-class citizen. You’ll be thankful that you Paused, Revisited the Fundamentals, and cared for your precious OOB network.
Arista EOS Hardening Guide