A Simple Quality of Service Design Example

 
 

While there is plenty of documentation available discussing the individual mechanics of Quality of Service, such as Class of Service (CoS) or Differentiated Services Code Point (DSCP) markings and what they mean, there is not as much documentation bridging the gap from those basic building blocks to a working network QoS deployment. There is an understandable reason for that: the design and implementation of a QoS policy is so closely coupled to the specific network’s business objectives and policies that it’s hard to develop much of a QoS policy and have it be relevant to a broad audience.

Introduction

Given that a network’s QoS policy is so closely tied to the network’s business objectives, the config examples in this article are likely not going to be directly applicable to your network. The alternative was to keep the design much more abstract so that it applied to everyone, but I feel that a concrete example based on specific decisions (albeit likely the wrong decisions, or not the most important ones, for your network) will still be helpful in understanding how to apply the mechanics of ACL-based Quality of Service. In this article, I’m going to assume that you’re familiar with the fundamentals of QoS (i.e. what DSCP and CoS marks are), and walk through a concrete example making somewhat arbitrary traffic engineering decisions just so we have something to talk about.

Said differently, a lot of effort is needed up front, working with the customers the network serves, to define the metrics of success for a QoS policy before you can start implementing it. This article isn’t going to be a lesson on how best to navigate office politics, so we’re going to gloss over these extremely important initial steps and move quickly to the implementation step:

  • Identify the Business Objective – Why are you looking to implement QoS? Is it to enable you to more effectively over-subscribe expensive links while keeping the business critical applications functional? Is it to harden your network against unusual or unexpected events like denial of service attacks or large and popular OS/application updates? Or are you most interested in being able to decouple different classes of users or applications so you can do things like run bulk backups without impacting voice users’ experience in the middle of the day?
  • Define the trust boundary for the QoS policy – In a datacenter environment where the networking team and the applications team are able to work closely, the best place to mark which traffic is more important can be as far out as the actual application on the server, since the application has the most context about how important each flow is. Alternatively, in a campus or carrier environment where there’s no collaboration between the network and the end users of the network, you will want to make a clear demarcation of where you will not trust the QoS markings on a packet and will instead classify and mark traffic based on your own policy before allowing it to flow through the rest of the network. For example, if you were an ISP and one of your customers realized that they could get better Internet service by marking all of their traffic as “Expedited Forwarding,” you couldn’t rely on them not doing that, so you might not want to trust the DSCP markings coming from the customer access ports.
  • Determine which traffic needs to be treated differently and how to identify it – Once you decide that a certain class of traffic should be handled as more or less important than other traffic, you need to be able to write access control lists to somehow match on that traffic and apply the traffic labels which you then use on egress interfaces to queue traffic differently.
  • Decide how to handle different categories of traffic differently to meet the original objectives – Once the traffic is classified and marked on ingress, you will want the switch to somehow act differently on different classes of traffic.

To even further complicate things for us here, implementing quality of service policies is also closely coupled to the exact capabilities of the Arista platform you’re working on. Depending on the underlying hardware, different platforms will support differing numbers of traffic queues, buffer sizes, packet remarking capabilities, etc. For this example, I’m going to be working on the 7280SE platform, but most of the configuration syntax should be substantially the same regardless of what platform you’re running EOS on.

7280SE#show version
Arista DCS-7280SE-64-F
Hardware version:    01.10
Serial number:       xxx
System MAC address:  xxxx.bbbb.ccc1

Software image version: 4.23.2F
Architecture:           i686
Internal build version: 4.23.2F-15405360.4232F
Internal build ID:      85bd3770-9843-4dbb-b95b-cd95eb743c5b

On the 7280E/7280R platforms, the default hardware resource allocation profile doesn’t enable applying QoS ACL policies on L2 switched traffic. For this QoS example, I’m going to try and keep it as simple as I can by applying quality of service between two L2 ports on the same VLAN, so we need to enable L2 QoS ACL support to be able to apply a qos service policy on the interfaces.

7280SE#show hardware tcam profile default feature qos ipv6
Profile default [ FixedSystem ]
 Feature:             qos ipv6
 Key size:            320
 Key Fields:          dst-ipv6, ipv6-next-header, ipv6-traffic-class,
                      l4-dst-port, l4-src-port, src-ipv6-high, src-ipv6-low
 Actions:             set-dscp, set-policer, set-tc
 Packet type:         ipv6 forwarding routed

Notice how the only packet type listed is “forwarding routed” – This means that if you’re implementing QoS in an entirely L3 routed environment, the default TCAM profile may support the QoS features you need, but if you’re building a layer 2 switched network where you want quality of service, we’re going to need to make some changes. Thankfully, our friendly EOS developers anticipated that, so you don’t need to write your own TCAM profile to enable QoS on L2 switched traffic; EOS includes a “qos” profile with the L2 “forwarding bridged” feature enabled:

7280SE#show hardware tcam profile qos feature qos ipv6
Profile qos
 Feature:             qos ipv6
 Key size:            320
 Key Fields:          dst-ipv6, ipv6-next-header, ipv6-traffic-class,
                      l4-dst-port, l4-ops, l4-src-port, src-ipv6-high, vlan
 Actions:             set-dscp, set-policer, set-tc
 Packet type:         ipv6 forwarding bridged
                      ipv6 forwarding routed
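If you save this output from several switches and want to check it in bulk, a small script can look for the bridged packet type. The following is only a sketch: the function names are made up for this example, the sample strings are trimmed copies of the output above, and the parsing assumes the text layout shown here rather than any documented, stable format.

```python
# Trimmed copies of the two outputs shown above, used as sample input.
DEFAULT_PROFILE = """\
Profile default [ FixedSystem ]
 Feature:             qos ipv6
 Key Fields:          dst-ipv6, ipv6-next-header, ipv6-traffic-class,
                      l4-dst-port, l4-src-port, src-ipv6-high, src-ipv6-low
 Packet type:         ipv6 forwarding routed
"""

QOS_PROFILE = """\
Profile qos
 Feature:             qos ipv6
 Packet type:         ipv6 forwarding bridged
                      ipv6 forwarding routed
"""


def packet_types(show_output: str) -> list[str]:
    """Collect the values listed under 'Packet type:'.

    Assumes the first value shares a line with the label and any further
    values follow on continuation lines that contain no 'label:' prefix.
    """
    types = []
    in_block = False
    for line in show_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("Packet type:"):
            in_block = True
            types.append(stripped.split(":", 1)[1].strip())
        elif in_block and stripped and ":" not in stripped:
            types.append(stripped)
        else:
            in_block = False
    return types


def supports_bridged_qos(show_output: str) -> bool:
    """True if any listed packet type is a 'forwarding bridged' type."""
    return any("forwarding bridged" in t for t in packet_types(show_output))
```

With these samples, `supports_bridged_qos(DEFAULT_PROFILE)` is `False` and `supports_bridged_qos(QOS_PROFILE)` is `True`, matching what we read off the output by eye.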

So assuming that this alternative profile meets your needs, you can activate it to enable L2 QoS policies:

7280SE#configure
7280SE(config)#hardware tcam
7280SE(config-hw-tcam)#system profile qos
7280SE(config-hw-tcam)#end

Introduction to QoS Service Profiles

For this article, we’re going to start by pouring all of the traffic into one big bucket, and then one by one start picking out a few types of traffic to build a relatively basic four class QoS model using the quality of service access control lists feature in EOS. The sky’s the limit on how sophisticated you want to ultimately make your QoS profile, but I will caution you that it might be prudent to start with a small and simple policy and slowly add to it to see which changes have a bigger or smaller impact for your users and applications. So to start, let us implement the rather silly policy of treating every single packet the same. We can do this by creating a QoS policy-map, editing the class-default action to set the same traffic class on every packet, and adding that policy map to a QoS profile.

policy-map type quality-of-service pmap-example
   class class-default
      set traffic-class 1
!
qos profile qprof-example
   service-policy type qos input pmap-example

Is this a good QoS profile? No. It’s a terrible policy, but we’re starting simple. We’re also using the QoS profile feature to simplify applying settings to multiple interfaces at once without duplicating all of our configuration as it gets more complicated.  Now we can apply this profile to a few interfaces on a switch and we can think about what this means:

interface Ethernet1
   switchport access vlan 10
   service-profile qprof-example
!
interface Ethernet2
   switchport access vlan 10
   service-profile qprof-example

So we’re telling EOS that all traffic coming in on these interfaces should be classified as “traffic-class 1” per the default class in the policy-map regardless of their packet contents. This means that when that traffic goes to egress the switch, EOS refers to the “traffic-class to transmit queue” map to place this traffic in the appropriate transmit queue.

7280SE#show qos maps | begin Tc - tx
   Tc - tx-queue map:
     tc:        0  1  2  3  4  5  6  7
     ---------------------------------
     tx-queue:  0  1  2  3  4  5  6  7

One thing that is important to note is that the concept of the packet’s traffic class is entirely local to this switch. When the packet egresses the switch, any information about what traffic class it was classified as is lost, so if you want to carry that information across your network, you would need to set the CoS or DSCP marks on the packet as well.

Traffic class is local to the device, CoS is local to the L2 segment, and DSCP markings can be carried across the entire IP network

Using CoS or DSCP markings can be handy if you only want to be using hardware resources to classify traffic on the edge of your network; a common design pattern is to have your access switches classify and mark all of the traffic, then all your core network needs to do is trust the CoS/DSCP markings and handle the various traffic accordingly. In this example, we’re going to be focusing on a single standalone switch, which means I won’t be including “set cos” or “set dscp” parameters in my policy map, but remember that those are options if you want to carry your classifications further than the current switch.

An overview of how CoS, DSCP, and policy maps on ingress interfaces get mapped to transmit queues

This figure shows a simplified overview of the different ways that traffic can be classified into different traffic classes (based on either a CoS map, a DSCP map, or a policy map like we’re doing here) and how the traffic classes are then mapped to transmit queues.

Protecting Network Routing Protocols

Like I said, this “everything in one bucket” concept isn’t a very effective QoS policy, and a good way to see how that’s so is to imagine the extreme where you try and send much more traffic out an interface than it can support. Excess traffic means the transmit buffers start to fill, and eventually the switch is forced to start dropping packets, and since we’ve assigned every packet to traffic class 1, the switch is going to be completely indiscriminate about which packets it drops.

Let’s pretend that we’re using some combination of OSPF and BGP as the routing protocols across this network. If our switch is dropping traffic, we probably don’t want it to be dropping our routing protocol traffic, since there is so little of it, and it’s so important. If we happen to drop enough BGP traffic to cause a BGP peering to time out and tear down, we’ll certainly fix the congestion problem, but causing our routing protocol to flap and have an outage for a few minutes while our network reconverges probably doesn’t meet our service objectives for this network.

To fix this risk of our simple “everything in one bucket” policy dropping routing protocol traffic, we should write an ACL to match on the network control traffic we’re interested in delivering over anything else, and add a class map to the policy map to set a higher traffic class on this traffic.

ipv6 access-list acl-qos-networkcontrol-v6
   5 remark Match OSPFv3 protocol traffic
   10 permit ospf any any
   15 remark Match BGP protocol traffic
   20 permit tcp any any eq bgp
   30 permit tcp any eq bgp any
!
class-map type qos match-any cmap-networkcontrol-v6
   match ipv6 access-group acl-qos-networkcontrol-v6
!
policy-map type quality-of-service pmap-example
   class cmap-networkcontrol-v6
      set traffic-class 6
   !
   class class-default
      set traffic-class 1

As you can see in the addition to the “pmap-example” policy map, this new class of traffic has been assigned a traffic-class of 6, which is higher than 1, so now any incoming routing protocol traffic will be handled with a higher priority and always transmitted before anything else on the egress interface. This policy expresses the idea that we consider network stability more important than any other traffic; given the choice between dropping a routing protocol packet or any user data packet, we’d rather drop user data and keep the network operational for the rest of the user data.

Prioritizing Latency/Jitter Sensitive User Traffic

After protecting our network protocols from congestion, we can now turn back to the big pile of traffic which we put in the traffic-class 1 and ask the next logical question: “What user traffic is the most important to not drop and deliver in a timely manner?”

This is where it again gets tough to talk about specifics, because the question of what user traffic is the most important and how to identify it can’t have a universal answer. It gets even more subtle when you consider the fact that different categories of traffic may get prioritized, but for different reasons!

ipv6 access-list acl-qos-tc5-v6
   10 permit udp any any eq ntp
   20 permit udp any eq ntp any
   30 permit udp any any eq domain
   40 permit udp any eq domain any
   50 permit tcp any any eq domain
   60 permit tcp any eq domain any

Take this ACL for example. Lines 10 and 20 match on NTP, not because we don’t want to drop NTP traffic (NTP can be very loss tolerant) but because we want NTP traffic to get delivered as quickly and consistently as possible to improve the ability to synchronize hosts to our NTP servers accurately. Line 10 matches on any IPv6 address as the source, and any address as the destination with the destination UDP port equal to 123 (NTP), while line 20 does the reverse and matches on the source port. This can be a fine rule on its own, but if you also happened to know the exact addresses of your NTP servers, you could write an even more specific rule to match on this traffic as well. (This statement will be true for pretty much everything in this article, with the caveat that more specific ACLs can consume more of the limited hardware resources available on each switch.)

Lines 30-60 match on DNS traffic (Port 53, “domain”), not because the jitter for DNS matters, but because dropping DNS traffic can be so painful. Every other application connection depends on one or possibly many DNS resolutions, and if the DNS query or response is dropped, the end host needs to depend on their local timeouts to try again. This can severely impact the tail latency for applications, and DNS traffic takes so little bandwidth, so why not prioritize it over all other user traffic to make sure it’s not dropped?

The other traditional high priority user traffic which is very loss and jitter sensitive, and usually the primary motivator for a network rolling out a QoS policy, is Voice over IP (VoIP) traffic. To support good user experiences while using VoIP, you want to make sure that there’s as little packet loss and jitter as possible to prevent gaps in the audio, and as little buffering delay as possible, since delay causes problems when people are trying to have conversations. The problem with prioritizing VoIP traffic is that there are so many different VoIP protocols that it isn’t always straightforward to identify this traffic. If you’re lucky, your VoIP devices or applications will use the standard EF (Expedited Forwarding, 46) DSCP code point to mark their traffic, and assuming you’re willing to trust your users to not abuse that DSCP marking, you can match on that:

ipv6 access-list acl-qos-tc5-v6
   70 permit ipv6 any any dscp ef

The problem is that many VoIP systems won’t utilize DSCP traffic marking by default, so you may need to enable that feature in your VoIP application, or consult the vendor’s documentation on how else to best match on their traffic. For example, Zoom is a popular teleconferencing service used by many companies, and if you consult their documentation, they include information on how to identify their traffic on your network. The main thing to take from this documentation is that all of their traffic comes from their 2620:123:2000::/40 address block, and they use the UDP ports 8801-8810 for the voice transport. We could match on either one of those facts, or both, if Zoom traffic is important to prioritize on your network.

ipv6 access-list acl-qos-tc5-v6
   75 remark Match on traffic to/from Zoom.us
   80 permit ipv6 any 2620:123:2000::/40
   90 permit ipv6 2620:123:2000::/40 any
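To get a feel for what a /40 prefix match covers, here is a rough Python sketch using the standard `ipaddress` module. It only models rules 80 and 90 above; the function name and the 2001:db8:: documentation addresses are made up for the example, and the Zoom prefix is simply the one quoted above.

```python
import ipaddress

# Prefix quoted in the text above; verify it against the vendor's current documentation.
ZOOM_V6 = ipaddress.ip_network("2620:123:2000::/40")


def matches_zoom_rules(src: str, dst: str) -> bool:
    """Model ACL line 80 (destination in prefix) and line 90 (source in prefix)."""
    return (ipaddress.ip_address(dst) in ZOOM_V6
            or ipaddress.ip_address(src) in ZOOM_V6)
```

A /40 fixes only the first 40 bits, so `2620:123:20ff::1` still falls inside the prefix while `2620:123:2100::1` does not. Traffic between two unrelated hosts, such as `2001:db8::10` and `2001:db8::20`, matches neither rule.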

Remember that this access control list on its own doesn’t change anything on the switch, so we need to add another class map to our policy map for this different class of traffic:

class-map type qos match-any cmap-networkcontrol-v6
   match ipv6 access-group acl-qos-networkcontrol-v6
!
class-map type qos match-any cmap-tc5-v6
   match ipv6 access-group acl-qos-tc5-v6
!
policy-map type quality-of-service pmap-example
   class cmap-networkcontrol-v6
      set traffic-class 6
   !
   class cmap-tc5-v6
      set traffic-class 5
   !
   class class-default
      set traffic-class 1

Deprioritizing Less Time Sensitive Traffic

Just as the last section showed how to prioritize some user traffic above the default “Best Effort” delivery class, it is also possible to identify traffic which is less important than the default. Consider background or transactional traffic like backups or file transfers; not only is traffic for a large file transfer less urgent than the VoIP traffic crossing the same network, it’s likely that you’d consider a large transfer less urgent than a user interacting with a web application. When a user is interactively using a website, we would prefer that the request sent when they click on something, and the response they get back, be delivered in a more timely manner than part of a bulk backup. When someone (or, just as likely, a scheduled background backup job) fires off a large transfer, it’s unlikely they could even tell if packets are significantly delayed. As long as the transfer eventually completes, there’s no real disadvantage to large amounts of jitter or delay; TCP is designed to deal with jitter, delay, and packet loss, and any small part of a file transfer often doesn’t matter to the receiver until you’ve delivered the whole file.

This concept of less important traffic is often called “scavenger class” traffic or a “lower effort” per-hop behavior. Again, the specifics of exactly what traffic falls into this class on your network depend on your objectives, so these are just some examples of the sort of traffic you might want to consider classifying. This feature is made possible by the fact that our original “everything in this traffic class by default” choice was traffic-class 1, which isn’t the lowest traffic class. This leaves us with the ability to write another QoS class map to map traffic into the lower priority #0 transmit queue.

ipv6 access-list acl-qos-scavenger-v6
   10 permit tcp any any eq rsync
   20 permit tcp any eq rsync any

One good example of traffic which could be eligible for scavenger class treatment would be rsync. Designed specifically for bulk transfers and backups, rsync is commonly used for transferring large sets of files across the network. It is important to note that these two rules match on raw rsync, and would not match on the also common application of rsync where it is tunneled over ssh.

ipv6 access-list acl-qos-scavenger-v6
   30 permit tcp any any eq ftp-data
   40 permit tcp any eq ftp-data any

Another common file transfer protocol, although one that is quite a bit older than rsync, is the classic File Transfer Protocol. FTP is interesting because it actually separates the data stream from the control stream on two different ports, so while it makes sense to match on port 20 “ftp-data” traffic as scavenger class traffic, port 21 “ftp control” traffic is much smaller, and can often be users interactively getting listings of folders, so that traffic may not qualify as scavenger class traffic.

ipv6 access-list acl-qos-scavenger-v6
   50 permit tcp any any eq ssh dscp cs1
   60 permit tcp any eq ssh any dscp cs1
   70 permit tcp any any eq ssh dscp 2
   80 permit tcp any eq ssh any dscp 2

I think one of the most interesting examples of a scavenger class candidate is Secure Shell, or SSH. SSH is particularly tricky to classify because it’s often used for two very different things. It can be used to remotely interact with a system, which is not scavenger class traffic, since you have users who expect their SSH session to not be laggy. At the same time, secure copy (SCP) performs bulk file transfers on the same port, so treating all SSH traffic as interactive may also not be the best option.

This is a great example of why the best place to mark IP traffic is on the end hosts if that is possible, because SSH is entirely encrypted, so there’s no way for a device in the middle of the network to look inside an SSH stream to determine if it’s interactive or not. Thankfully, the OpenSSH developers considered this! By default, OpenSSH sets the DSCP code point on its traffic to “CS1” if it’s a bulk SCP operation, or “AF21” if it’s an interactive session, so we can use that difference to distinguish between the two! (Unfortunately, this update to OpenSSH to use DSCP only happened in 2018, so many of the SSH servers out there still use the deprecated Type of Service values, which means they mark interactive traffic with the DSCP code point of decimal 4, and bulk traffic with decimal 2.)

Default OpenSSH DSCP markings for traffic
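The decimal values in those ACL rules can be worked out by hand. The sketch below shows the arithmetic, assuming the legacy OpenSSH defaults were the classic “lowdelay” (0x10) and “throughput” (0x08) Type of Service values; the helper names are my own and exist only for this example.

```python
# Classic ToS byte values (assumed to be what older OpenSSH releases set).
IPTOS_LOWDELAY = 0x10    # legacy marking for interactive sessions
IPTOS_THROUGHPUT = 0x08  # legacy marking for bulk transfers


def tos_to_dscp(tos: int) -> int:
    """DSCP is the upper six bits of the ToS / Traffic Class byte."""
    return tos >> 2


def af(klass: int, drop: int) -> int:
    """Assured Forwarding code point AFxy = 8*x + 2*y."""
    return 8 * klass + 2 * drop


def cs(klass: int) -> int:
    """Class Selector code point CSx = 8*x."""
    return 8 * klass
```

Here `tos_to_dscp(IPTOS_LOWDELAY)` is 4 and `tos_to_dscp(IPTOS_THROUGHPUT)` is 2, which is where the `dscp 2` rules for legacy bulk traffic come from. The newer defaults work out to `af(2, 1) == 18` for interactive sessions and `cs(1) == 8` for bulk transfers.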

Putting all of these classes of traffic together, the finished policy map for this example looks like the following:

ipv6 access-list acl-qos-networkcontrol-v6
   5 remark Match OSPFv3 protocol traffic
   10 permit ospf any any
   15 remark Match BGP protocol traffic
   20 permit tcp any any eq bgp
   30 permit tcp any eq bgp any
!
ipv6 access-list acl-qos-tc5-v6
   10 permit udp any any eq ntp
   20 permit udp any eq ntp any
   30 permit udp any any eq domain
   40 permit udp any eq domain any
   50 permit tcp any any eq domain
   60 permit tcp any eq domain any
   70 permit ipv6 any any dscp ef
   75 remark Match on traffic to/from Zoom.us
   80 permit ipv6 any 2620:123:2000::/40
   90 permit ipv6 2620:123:2000::/40 any
!
ipv6 access-list acl-qos-scavenger-v6
   10 permit tcp any any eq rsync
   20 permit tcp any eq rsync any
   30 permit tcp any any eq ftp-data
   40 permit tcp any eq ftp-data any
   50 permit tcp any any eq ssh dscp cs1
   60 permit tcp any eq ssh any dscp cs1
   70 permit tcp any any eq ssh dscp 2
   80 permit tcp any eq ssh any dscp 2
!
class-map type qos match-any cmap-networkcontrol-v6
   match ipv6 access-group acl-qos-networkcontrol-v6
!
class-map type qos match-any cmap-tc5-v6
   match ipv6 access-group acl-qos-tc5-v6
!
class-map type qos match-any cmap-scavenger-v6
   match ipv6 access-group acl-qos-scavenger-v6
!
policy-map type quality-of-service pmap-example
   class cmap-networkcontrol-v6
      set traffic-class 6
   !
   class cmap-tc5-v6
      set traffic-class 5
   !
   class cmap-scavenger-v6
      set traffic-class 0
   !
   class class-default
      set traffic-class 1
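
To sanity check a policy like this before touching a switch, it can help to model it in a few lines of Python. This is a loose sketch and not how EOS implements anything: it assumes the classes are evaluated top to bottom with the first match winning, it collapses each pair of source-port and destination-port rules into one "either port" check, and the `Packet` type and function names are invented for the example.

```python
from dataclasses import dataclass
import ipaddress

ZOOM = ipaddress.ip_network("2620:123:2000::/40")
# Well-known port numbers behind the ACL keywords used above.
PORTS = {"bgp": 179, "ntp": 123, "domain": 53, "rsync": 873, "ftp-data": 20, "ssh": 22}


@dataclass
class Packet:
    proto: str                 # "tcp", "udp", or "ospf"
    src: str = "2001:db8::1"   # documentation addresses as placeholders
    dst: str = "2001:db8::2"
    sport: int = 0
    dport: int = 0
    dscp: int = 0


def port_is(p: Packet, proto: str, name: str) -> bool:
    """Match the named port as either source or destination."""
    return p.proto == proto and PORTS[name] in (p.sport, p.dport)


def network_control(p: Packet) -> bool:
    return p.proto == "ospf" or port_is(p, "tcp", "bgp")


def tc5(p: Packet) -> bool:
    return (port_is(p, "udp", "ntp") or port_is(p, "udp", "domain")
            or port_is(p, "tcp", "domain") or p.dscp == 46  # EF
            or ipaddress.ip_address(p.src) in ZOOM
            or ipaddress.ip_address(p.dst) in ZOOM)


def scavenger(p: Packet) -> bool:
    return (port_is(p, "tcp", "rsync") or port_is(p, "tcp", "ftp-data")
            or (port_is(p, "tcp", "ssh") and p.dscp in (8, 2)))  # CS1 or legacy bulk


# Same order as the classes in pmap-example.
POLICY = [(network_control, 6), (tc5, 5), (scavenger, 0)]


def classify(p: Packet) -> int:
    """Return the traffic class of the first matching class, else class-default."""
    for match, traffic_class in POLICY:
        if match(p):
            return traffic_class
    return 1
```

Under these assumptions, interactive SSH marked AF21 (`Packet("tcp", dport=22, dscp=18)`) lands in the default traffic class 1, while bulk SSH marked CS1 lands in traffic class 0. Ordering also matters: an rsync packet that happens to carry the EF mark would match the traffic class 5 rules before it ever reaches the scavenger class.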

Strict Priority vs Weighted Round Robin

Now that we have separated traffic into four different classes based on their service requirements, the last thing to consider in this tutorial is how to configure transmit queues to best handle this traffic on egress from the device. By default, EOS configures the eight transmit queues to be handled in “Strict Priority”, which can be confirmed by noting the “Priority: SP” in the output of “show qos interfaces ethernet 1”:

7280SE#show qos interfaces ethernet 1
Ethernet1:
   Trust Mode: COS
   Default COS: 0
   Default DSCP: 0

   Port shaping rate: disabled

  Tx    Bandwidth         Shape Rate         Priority   ECN/WRED
 Queue  (percent)          (units)
 ----------------------------------------------------------------
   7      - / -       - / -          ( - )   SP / SP       D
   6      - / -       - / -          ( - )   SP / SP       D
   5      - / -       - / -          ( - )   SP / SP       D
   4      - / -       - / -          ( - )   SP / SP       D
   3      - / -       - / -          ( - )   SP / SP       D
   2      - / -       - / -          ( - )   SP / SP       D
   1      - / -       - / -          ( - )   SP / SP       D
   0      - / -       - / -          ( - )   SP / SP       D

Note: Values are displayed as Operational/Configured

Legend:
RR -> Round Robin
SP -> Strict Priority
 - -> Not Applicable / Not Configured
 % -> Percentage of line rate
ECN/WRED: L -> Queue Length ECN Enabled     T -> Queue Delay ECN Enabled     W -> WRED Enabled     D -> Disabled

Service-policy pmap-example input ( programming: Successful )

This strict priority means that the switch will always pull available traffic from the highest priority queue available until the queue is empty before considering lower priority queues. This means that any traffic in tx-queue 6 will always be transmitted before tx-queue 5, 5 before 1, and 1 before 0.

Congestion with strict priority queues drops all lower class traffic

While strict priority does ensure the lowest queuing delay and the most bandwidth for higher priority traffic, it runs the risk of completely starving lower priority queues of any bandwidth at all. This can be problematic because many applications do not behave well when they start to experience 100% packet loss. Instead of a low priority flow backing off and using fewer resources until the amount of higher priority traffic drops, with complete starvation some applications may fail and cancel the requested operation altogether.
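
The starvation effect is easy to see with a toy model. The sketch below is a simplification that ignores buffering and packet sizes entirely: it just hands out link capacity to queues from the highest number down. The function name and the traffic figures are made up for illustration.

```python
def strict_priority(offered: dict[int, int], capacity: int) -> dict[int, int]:
    """Serve queues from highest to lowest until the link capacity runs out.

    offered maps tx-queue number to offered load; all values share one unit
    (Mbps in the example below).
    """
    sent = {}
    remaining = capacity
    for queue in sorted(offered, reverse=True):
        sent[queue] = min(offered[queue], remaining)
        remaining -= sent[queue]
    return sent


# A 10,000 Mbps link offered 14,100 Mbps in total.
result = strict_priority({6: 100, 5: 2000, 1: 9000, 0: 3000}, capacity=10000)
```

In this example queues 6 and 5 get everything they ask for, queue 1 gets the 7,900 Mbps that is left, and queue 0 gets nothing at all.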

Weighted Round Robin queuing ensures that every traffic class receives at least a minimum fraction of available bandwidth

The alternative to strict priority queuing is weighted round robin (WRR), where traffic is taken from multiple queues equally, with an optional bias towards some queues over others. This enables you to specify that a certain class of traffic should always be allowed a certain percentage of the link bandwidth to prevent total starvation, but ensure that more important traffic flows are allowed to use a greater portion of the available bandwidth.

Note: EOS supports mixing both strict priority and weighted round robin queues on the same egress interface, with the limitation that all strict priority queues must be higher priority than all the WRR queues.

For our design example here, it makes sense to leave the network control traffic and jitter-sensitive VoIP traffic as strict priority queues, but specify that the Best Effort (tx-queue 1) and Least Effort (tx-queue 0) traffic should share the remainder of the link on a weighted round robin basis to prevent starvation and application failures for the Least Effort traffic. To configure this, we’re going to add tx-queue configuration options to our qos profile to convert tx-queues 0-4 to be treated as round robin queues, and then bias the queues such that tx-queue 0 gets a smaller fraction of the bandwidth than tx-queue 1.

qos profile qprof-example
   service-policy type qos input pmap-example
   !
   tx-queue 0
      bandwidth percent 5
   !
   tx-queue 4
      no priority

Notice that the “no priority” configuration only needs to be applied to tx-queue 4, since any “no priority” statement will implicitly change all lower priority queues to round robin regardless of whether they are configured as priority queues or not.
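
That cascading rule can be written down in a couple of lines. This is just a model of the behavior described above, with a hypothetical function name, not something read out of EOS.

```python
def operational_modes(no_priority: set[int], num_queues: int = 8) -> dict[int, str]:
    """Return 'RR' or 'SP' per tx-queue.

    A queue operates as round robin if it, or any higher-numbered queue,
    is configured with 'no priority'.
    """
    if not no_priority:
        return {q: "SP" for q in range(num_queues)}
    highest_rr = max(no_priority)
    return {q: "RR" if q <= highest_rr else "SP" for q in range(num_queues)}
```

Calling `operational_modes({4})` marks tx-queues 0 through 4 as round robin and leaves 5 through 7 as strict priority, which also satisfies the rule that every strict priority queue sits above every round robin queue.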

The round robin queues have the available bandwidth split evenly between them by default, so since we explicitly specify that tx-queue 0 should only get 5% of the bandwidth under congested conditions, the other 95% of the remaining bandwidth after the strict priority queues is automatically divided between tx-queues 1-4.

7280SE#show qos interfaces ethernet 1
Ethernet1:
   Trust Mode: COS
   Default COS: 0
   Default DSCP: 0

   Port shaping rate: disabled

  Tx    Bandwidth         Shape Rate         Priority   ECN/WRED
 Queue  (percent)          (units)
 ----------------------------------------------------------------
   7      - / -       - / -          ( - )   SP / SP       D
   6      - / -       - / -          ( - )   SP / SP       D
   5      - / -       - / -          ( - )   SP / SP       D
   4     26 / -       - / -          ( - )   RR / RR       D
   3     23 / -       - / -          ( - )   RR / SP       D
   2     23 / -       - / -          ( - )   RR / SP       D
   1     23 / -       - / -          ( - )   RR / SP       D
   0      5 / 5       - / -          ( - )   RR / SP       D

Note: Values are displayed as Operational/Configured

Legend:
RR -> Round Robin
SP -> Strict Priority
 - -> Not Applicable / Not Configured
 % -> Percentage of line rate
ECN/WRED: L -> Queue Length ECN Enabled     T -> Queue Delay ECN Enabled     W -> WRED Enabled     D -> Disabled

Service-policy pmap-example input ( programming: Successful )

Notice how the priority for tx-queue 4 is configured as “RR” instead of “SP”, which causes all five queues 0-4 to enter an operational priority state of “RR”! The “RR / RR” in the priority column for Tx queue 4 means that it is operational as a round robin queue and configured as a round-robin queue; tx-queues 0-3 are listed as “RR / SP” since they’re still configured as strict priority but are forced to be operating as round robin queues due to us configuring tx-queue 4.

Also notice how the configured 5% bandwidth limit on tx-queue 0 is applied, and the remainder is roughly divided between queues 1-4.
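
The word “roughly” is doing some work there. A plain even split, sketched below with a made-up helper, gives each of the four unconfigured queues exactly 23.75%. The switch reports whole numbers (26/23/23/23), which still sum to 95; I am assuming that is just integer rounding with the leftover assigned to one queue, but the exact rounding rule is not something this article establishes.

```python
from fractions import Fraction


def wrr_shares(rr_queues, configured: dict[int, int]) -> dict[int, Fraction]:
    """Split 100% among round robin queues.

    Queues with a configured 'bandwidth percent' keep that value; the
    remaining queues share what is left equally.
    """
    unconfigured = [q for q in rr_queues if q not in configured]
    if not unconfigured:
        return {q: Fraction(configured[q]) for q in rr_queues}
    share = Fraction(100 - sum(configured.values()), len(unconfigured))
    return {q: Fraction(configured[q]) if q in configured else share
            for q in rr_queues}


shares = wrr_shares(range(5), {0: 5})
```

Here `shares[0]` is 5 and each of `shares[1]` through `shares[4]` is 95/4, or 23.75.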

qos profile qprof-example
   service-policy type qos input pmap-example
   !
   tx-queue 0
      bandwidth percent 5
   !
   tx-queue 4
      no priority
   !
   tx-queue 5
      shape rate 10 percent
   !
   tx-queue 6
      shape rate 500000

It is also possible to enforce hard limits on the amount of bandwidth that strict priority queues are allowed to use. This can be used if you’re confident making statements like “our routing traffic should never exceed 500 Mbps” or “our VoIP traffic shouldn’t use more than 10% of the link” to protect the entire pool of WRR tx-queues from starvation if something were to go wrong and an unreasonable amount of strict priority traffic were to start crossing the network.
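
To put numbers on that protection, here is a back-of-the-envelope calculation. It assumes a 10 Gbps port, assumes the bare `shape rate 500000` is in kbps (which is what the 500 Mbps figure above implies), and assumes tx-queue 7 carries no traffic in this design; the variable and function names are only for this example.

```python
LINK_KBPS = 10_000_000  # assumed 10 Gbps port


def percent_of(kbps: int, percent: int) -> int:
    """Whole-number percentage of a rate in kbps."""
    return kbps * percent // 100


q5_cap = percent_of(LINK_KBPS, 10)           # shape rate 10 percent
q6_cap = 500_000                             # shape rate 500000 (kbps assumed)
wrr_floor = LINK_KBPS - q5_cap - q6_cap      # left for tx-queues 0-4 in the worst case
q0_floor = percent_of(wrr_floor, 5)          # tx-queue 0's 5% share of that remainder
```

Under those assumptions the strict priority queues can never take more than 1.5 Gbps combined, so the round robin queues always have at least 8.5 Gbps to share, and tx-queue 0 keeps at least 425 Mbps even when everything is congested.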

Conclusion

Remember that this article has covered only a basic application of the QoS features supported in EOS creating a four class QoS model to demonstrate the principles of how to apply QoS policies to traffic.  We skipped over the prerequisite task of identifying the problem areas in the network and what categories of traffic can be safely deprioritized to ensure the best experience for the most important traffic on the network. To identify the places of congestion and the source of that traffic, you may consider using features such as sFlow and LANZ which will give you better visibility into the health and behavior of your network so your work to improve performance can be better informed as to how much of a difference you’re making and if further work or upgrades are needed.

This article glossed over the concept of carrying traffic classifications further than the local switch. It’s likely in a real world deployment that you would want to apply CoS and/or DSCP markings to your classified traffic, and have a different qos profile to apply to core network links to trust those markings on ingress but still configure the egress tx-queues on those core links appropriately.

Finally, we didn’t talk about more sophisticated QoS tools available in EOS such as WRED (Weighted Random Early Detection) or ECN (Explicit Congestion Notification), which can help limit queuing delay for traffic without suffering packet drops.
