Understanding Table Sizes on the 7050QX-32

A common question asked about Arista switches is “how many routes can they handle”, and unfortunately, this is never an easy question to answer. Dedicated switch ASIC hardware is required to program each route so that when a packet arrives with a certain destination address, the switch can look up the destination and route the packet to the correct interface at line-rate across all the ports. The part that makes it hard is that there is practically never a 1:1 mapping between hardware resources on a switch and the number of routes that can be programmed into them, and under some circumstances, there isn’t even a constant ratio between the number of routes and the number of hardware resources they consume.

This article is going to walk through the table scale and options available to change table scale on the DCS-7050QX-32 platform specifically to help you understand the sorts of moving parts that can impact the route scale supported on any Arista platform. This information will be directly applicable to any platform in the 7050X series, and helpful on later platforms in the 7050 product line.

Base Mode Table Sizes

As a starting point, let’s look at the datasheet for the 7050QX-32, which includes a “table sizes” section; a datasheet’s table sizes are a good place to start for getting a rough idea of the route scale of any platform.

[Datasheet excerpt: 7050QX-32 table sizes]

The Trident2 ASIC used in the 7050X platform has two sets of forwarding hardware: the “base mode”, which is hardware dedicated to each of the three categories of forwarding resources, and the “Unified Forwarding Table” (UFT), an additional flexible pool of forwarding resources on the Trident2 (and therefore on the 7050QX-32 and other 7050X series platforms) that supplements the capacity of the base mode. This means the base mode numbers are the worst-case dedicated numbers per type of forwarding resource, while the Unified Forwarding Table makes the actual available scale much larger and lets it be carved up depending on the specific application.

That said, the base mode table sizes in the datasheet don’t fully convey the complexity of the dedicated forwarding resources, because these resources are shared between similar types of routes:

  • MAC Addresses: These are the 48 bit long L2 addresses that are used in the MAC address-tables on the switch for forwarding L2 bridged Ethernet frames. (These can be viewed by “show mac address-table”)
  • IPv4 Hosts: These are exact match routes for a single /32 host address, where the switch happens to know how to route an exact address because it’s a directly attached host, it’s a /32 route in the IPv4 routing table, etc. (These can be viewed by “show ip route host”)
  • IPv4 Routes – Unicast: These are what people are generally interested in when they talk about “how many routes can it handle” and are your typical “IPv4 prefix with netmask and gateway” routes that you’re used to. These are also often referred to as “Longest Prefix Match” (LPM) routes, since these routes can be any length from 0-31 bits, and Internet Protocol routing expects routing decisions to be made based on the longest (most specific) prefix in the routing table.
  • IPv4 Routes – Multicast: Unlike unicast routes, multicast routes are exact match routes, generally based on the source address and group (destination address), but the exact details can vary by which type of multicast PIM routing you’re using.
  • IPv6 Hosts – Exact match routes for full /128 IPv6 addresses in the routing table.
  • IPv6 Routes – Unicast: Longest prefix match routes for IPv6 routes, which can vary from anywhere between the ::/0 default route to /64 subnet routes, or even longer prefixes for /127 point to point routes or sub-/64 subnets.
  • IPv6 Routes – Multicast: Exact source-group match routes for IPv6 multicast routing.

So there are seven different types of routes handled here, but they only use three different types of resources in the hardware, so it’s important to remember that these numbers assume the entire pool is used for a single type of route. You can see how these resources are being used with the “show hardware capacity” command in EOS. To simplify the discussion for a moment, I forced a switch into “base mode” so the output doesn’t include the added capacity afforded by the UFT (by default, EOS enables the UFT, so I deliberately disabled it here). Trimming out all the other lines that aren’t relevant here, you can see the following:

7050qx32#show hardware capacity
Forwarding Resources Usage

Table                Feature      Chip                Used     Used        Free     Committed     Best Case        High
                                                   Entries      (%)     Entries       Entries           Max   Watermark
                                                                                                    Entries
-------------------- ------------ -------------- ----------- -------- ----------- ------------- ------------- ---------
[... SNIP ...]
Host                                                     7       0%       16376             0         16383          12
Host                 V4Hosts                             3       0%       16376             0         16383           3
Host                 V4Mroutes                           0       0%       16376             0         16383           0
Host                 V6Hosts                             4       0%       16376             0         16383          12
Host                 V6Mroutes                           0       0%       16376             0         16383           0
[... SNIP ...]
LPM                                                      4       0%        8186             0          8190        4307
LPM                  V4Routes                            1       0%        8186             0          8190           3
LPM                  V6Routes                            3       0%        8186             0          8190        4306
[... SNIP ...]
MAC                               Linecard0/0           12       0%       32756             0         32768          14
MAC                  L2           Linecard0/0           12       0%       32756             0         32768          14

So here we can see:

  • “Host” route resources are shared between IPv4 Host routes, IPv4 Multicast routes, IPv6 Host routes, and IPv6 Multicast routes.
  • LPM (Longest Prefix Match) resources are shared between IPv4 Unicast routes and IPv6 Unicast routes.
  • MAC resources are used for MAC addresses.

So the four exact match route types share entries in the Host table, and the two types of unicast routes share LPM entries: while the 7050QX-32 can support 16k IPv4 unicast routes, doing so comes at the expense of not being able to support any IPv6 unicast routes at the same time.
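
To make the sharing concrete, here’s a minimal Python sketch of the base-mode pools (the pool sizes are the ones from the “show hardware capacity” output above, the mapping of logical tables onto pools is the same as in the bullet list, and the rest is purely illustrative):

# Illustrative model of base-mode resource sharing on the 7050QX-32.
# Pool sizes come from the "show hardware capacity" output above.
BASE_POOLS = {
    "Host": 16383,   # shared by V4Hosts, V4Mroutes, V6Hosts, V6Mroutes
    "LPM": 8190,     # shared by V4Routes and V6Routes (entries, not routes)
    "MAC": 32768,    # used by L2 MAC addresses
}

POOL_FOR = {
    "V4Hosts": "Host", "V4Mroutes": "Host", "V6Hosts": "Host", "V6Mroutes": "Host",
    "V4Routes": "LPM", "V6Routes": "LPM",
    "L2": "MAC",
}

def free_entries(used):
    """Remaining entries per pool, given per-table entry usage."""
    remaining = dict(BASE_POOLS)
    for table, count in used.items():
        remaining[POOL_FOR[table]] -= count
    return remaining

# Filling the Host pool with 16k IPv4 host routes leaves nothing for IPv6
# hosts or for either multicast table.
print(free_entries({"V4Hosts": 16383}))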

Longest Prefix Match Entries

Looking at the output of “show hardware capacity” above, we can see that the LPM (Longest Prefix Match) table has 8190 entries, shared between IPv4 unicast routes and IPv6 unicast routes. That might seem a little strange if you notice that the datasheet claims this platform can support 16k IPv4 unicast routes.

So how can there be twice as many IPv4 unicast routes as entries? Broadly speaking, each entry in the LPM table is large enough to hold 64 bits worth of prefix information, so two 32-bit IPv4 routes can be programmed per LPM entry. The first IPv4 route in the routing table consumes an LPM entry, and the next IPv4 route is programmed into the second half of the same entry. We can see this by starting with a switch with a practically empty routing table and checking the LPM consumption:

7050qx32#show hardware capacity
Forwarding Resources Usage

Table                Feature      Chip                Used     Used        Free     Committed     Best Case        High
                                                   Entries      (%)     Entries       Entries           Max   Watermark
                                                                                                    Entries
-------------------- ------------ -------------- ----------- -------- ----------- ------------- ------------- ---------
LPM                                                     10       0%        8180             0          8190        4307
LPM                  V4Routes                            3       0%        8180             0          8190           3
LPM                  V6Routes                            7       0%        8180             0          8190        4306

We see three entries consumed by the V4Routes table, which are used for basic always-present routes like 127.0.0.0/8, 0.0.0.0/8, etc. If we then add a static route for a subnet, we can see the number of entries grows to 4:

7050qx32(config)#ip route 10.1.1.0/24 23.152.160.1
7050qx32(config)#show hardware capacity
Forwarding Resources Usage

Table                Feature      Chip                Used     Used        Free     Committed     Best Case        High
                                                   Entries      (%)     Entries       Entries           Max   Watermark
                                                                                                    Entries
-------------------- ------------ -------------- ----------- -------- ----------- ------------- ------------- ---------
[... SNIP ...]
LPM                                                     11       0%        8179             0          8190        4307
LPM                  V4Routes                            4       0%        8179             0          8190           4
LPM                  V6Routes                            7       0%        8179             0          8190        4306

If we then add another route, we can see that the number of entries didn’t increase! So this 10.1.2.0/24 route was programmed in the second half of the same entry already consumed by the previous IPv4 unicast route:

7050qx32(config)#ip route 10.1.2.0/24 23.152.160.1
7050qx32(config)#show hardware capacity
Forwarding Resources Usage

Table                Feature      Chip                Used     Used        Free     Committed     Best Case        High
                                                   Entries      (%)     Entries       Entries           Max   Watermark
                                                                                                    Entries
-------------------- ------------ -------------- ----------- -------- ----------- ------------- ------------- ---------
[... SNIP ...]
LPM                                                     11       0%        8179             0          8190        4307
LPM                  V4Routes                            4       0%        8179             0          8190           4
LPM                  V6Routes                            7       0%        8179             0          8190        4306

So that explains how we’re able to program 16k IPv4 routes in 8190 entries. Two routes fit per entry, so it’s important to remember that the output of “show hardware capacity” is in units of whatever the underlying hardware operates in, which may not directly be the unicast routes you’re used to thinking about. You can also see the exact entries programmed into the switch hardware using “show platform trident” commands, so if you really wanted to know what’s taking up those four V4Routes entries in the hardware:

7050qx32#show platform trident l3 software routes
Host table
VrfId  Host                   Position    EntryId   Class  NextHopId  Ecmp  Hit
    0  23.152.160.1            0xb38      2872       0         13     N    N
    0  23.152.160.0            0x218       536       1         11     N    N
    0  23.152.160.16           0x9e4      2532       1         10     N    Y
    0  23.152.160.31           0x738      1848       1         11     N    N

Lpm table
VrfId  Prefix                  Position  SubBucket    EntryId  Class  NextHopId  Ecmp  Hit
    0  23.152.160.0/27            1431           0       1431      0         12     N    N
    0  10.1.2.0/24                2099           0       2099      0         14     N    N
    0  0.0.0.0/8                  6248           0       6248      0          3     N    N
    0  127.0.0.0/8                6248           0       6248      0          4     N    N
    0  10.1.1.0/24                2099           0       2099      0         14     N    N
    0  0.0.0.0/0                  8191           0       8191      0          2     N    N

Looking at the Lpm table, you can see the three entries that were in use before we added any routes: the directly attached subnet 23.152.160.0/27 in entry 1431, the 0.0.0.0/8 local identification subnet and the 127.0.0.0/8 loopback subnet sharing entry 6248, and a catch-all 0.0.0.0/0 default in entry 8191. That default is programmed despite this switch not currently having a default route in the RIB; “show platform trident l3 software next-hops” shows that its NextHopId points to a “punt to CPU and drop the packet” next hop, so it exists only to ensure that every packet eventually matches something in the hardware routing table. We can also see that the two new 10.1.1.0/24 and 10.1.2.0/24 routes both got programmed in position 2099, so they ended up sharing an entry, which agrees with the “show hardware capacity” numbers not changing when we added the second route.
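
As a rough rule of thumb from this behaviour, IPv4 LPM consumption works out to about half an entry per route; a quick sketch (the even pairing is an approximation, since exactly which routes share an entry depends on how the hardware places them):

import math

LPM_ENTRIES = 8190  # base-mode LPM size from "show hardware capacity"

def lpm_entries_for_ipv4(num_routes):
    """Approximate LPM entries consumed by IPv4 unicast routes.

    Each 64-bit LPM entry holds two 32-bit IPv4 prefixes, so roughly
    two routes share one entry."""
    return math.ceil(num_routes / 2)

print(lpm_entries_for_ipv4(2))      # 1 -> the 10.1.1.0/24 + 10.1.2.0/24 example above
print(lpm_entries_for_ipv4(16380))  # 8190 -> roughly the advertised 16k IPv4 capacity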

Fitting a 128-bit IPv6 Address in a 64-bit LPM Entry

The next question might be “given that these LPM entries can handle 64 bits of prefix, how do they handle IPv6 routes when IPv6 addresses are 128 bits long?” Most IPv6 routes are either for whole /64 subnets (or shorter prefixes), or for exact /128 host addresses (which get programmed in the Host table instead of the LPM table, since they are exact match lookups). So if EOS could assume you would never need to route /127s, /112s, or any other unusually long IPv6 unicast prefixes, it could use each of the 8190 LPM entries independently and you’d be able to route 8k IPv6 prefixes (a mode that EOS doesn’t actually offer, by the way).

In practice, you might need to route only a few IPv6 prefixes longer than /64, or you might need to route a lot of them, so EOS supports bonding two adjacent LPM entries together to form one 128-bit “double” LPM entry, which can hold an IPv6 prefix of any length. Unfortunately, this linking of adjacent LPM entries has to be done ahead of time, so you need to decide in advance how many of these longer prefixes you need to support, using the “platform trident routing-table partition” command:

7050qx32(config)#platform trident routing-table partition ?
  1  16K IPv4, 6K IPv6 ( prefix len up to /64 ), 1K IPv6 ( any prefix length )
  2  16K IPv4, 4K IPv6 ( prefix len up to /64 ), 2K IPv6 ( any prefix length ) <<< DEFAULT
  3  16K IPv4, 2K IPv6 ( prefix len up to /64 ), 3K IPv6 ( any prefix length )
  4  16K IPv4, 0K IPv6 ( prefix len up to /64 ), 4K IPv6 ( any prefix length )

(Changing these partition settings is hitful and causes the StrataL3 agent managing the hardware to restart, so be prepared to drop all of your traffic for a few seconds when making this change.)

EOS offers four different ways to partition how longer IPv6 routes are handled. The default profile (2) ties half of the LPM entries together, so the 7050QX-32 can handle 2k IPv6 routes of any length (each consuming two LPM entries), plus an additional 4k routes that can only be 0-64 bits long. If you know for a fact that you will have very few long prefixes, you can switch to the “1” partition profile, which bonds only 1k pairs of entries and allows a total of 7k IPv6 routes to be programmed in the LPM hardware. At the other extreme, if your application uses a lot of prefixes longer than /64, you may want to apply the “4” profile, which links all of the LPM entries for IPv6 routes; they can then all be long IPv6 routes, but you can only program 4k routes in total.

Regardless of how the routing table is partitioned for IPv6 routes, all of the entries can still be used for IPv4 routes, in which case the routes are programmed two per entry as usual, which is why the IPv4 capacity is 16k and doesn’t change across any of the routing-table partitions. Just remember that the 16k IPv4 figure assumes there are zero IPv6 routes: if you’re running partition 2 and have 2,000 “short” IPv6 routes in your routing table, you’ll only be able to program 12k IPv4 routes in the remaining 6k LPM entries. If you run routing-table partition 4 and have 2k IPv6 routes of any length, that will consume 4k entries, and you can only program 8k IPv4 routes, and so on.
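
That arithmetic can be sketched in a few lines of Python (the per-profile limits come from the CLI help above, the “two IPv4 routes per free entry” rule is the approximation from earlier, and the assumption that short IPv6 routes spill into double-wide slots is based on the behaviour observed later in this article):

# Routing-table partition profiles from "platform trident routing-table partition ?"
# as (max IPv6 routes up to /64, max IPv6 routes of any length).
PARTITIONS = {1: (6144, 1024), 2: (4096, 2048), 3: (2048, 3072), 4: (0, 4096)}
LPM_ENTRIES = 8190

def remaining_ipv4(partition, v6_short, v6_long):
    """Approximate IPv4 route capacity left after programming IPv6 routes.

    v6_short: IPv6 routes with prefix length <= /64 (one entry each)
    v6_long:  IPv6 routes longer than /64 (one double-wide slot = two entries)
    Short routes that overflow the single-wide pool are assumed to spill
    into double-wide slots."""
    short_max, long_max = PARTITIONS[partition]
    spill = max(0, v6_short - short_max)
    doubles = v6_long + spill
    if doubles > long_max:
        raise ValueError("IPv6 routes exceed this partition profile")
    used = min(v6_short, short_max) + 2 * doubles
    return (LPM_ENTRIES - used) * 2   # two IPv4 routes per free entry

print(remaining_ipv4(2, 2000, 0))   # ~12k IPv4 routes, as in the text
print(remaining_ipv4(4, 0, 2000))   # ~8k IPv4 routes left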

Unified Forwarding Table

So far, we’ve only been looking at the “base mode” capacity of the Trident2 switch ASIC, which only includes the hardware explicitly dedicated to the three different forwarding tables. In addition to those resources, the Trident2 has a flexible pool called the Unified Forwarding Table: additional exact match hardware that can be allocated to either host routes or MAC addresses, depending on which is in more demand for the network design (more MAC addresses for large L2 deployments, more L3 host routes for larger L3 deployments).

How the UFT is allocated between the different pools is controlled by the “platform trident forwarding-table partition” configuration:

7050qx32(config)#platform trident forwarding-table partition ?
  0  288k l2 entries, 16k l3 host, 16k lpm entries
  1  224k l2 entries, 80k l3 host, 16k lpm entries
  2  160k l2 entries, 144k l3 host, 16k lpm entries <<< DEFAULT
  3  96k l2 entries, 208k l3 host, 16k lpm entries
  4  32k l2 entries, 16k l3 host, 128k lpm entries

So the four partition choices between 0 and 3 allow you to turn the knob between maximum L2 capacity and maximum L3 host route capacity. The default is profile 2, which supports 160k MAC addresses and 144k host routes, as can be seen in “show hardware capacity”:

7050qx32#show hardware capacity
Forwarding Resources Usage

Table                Feature      Chip                Used     Used        Free     Committed     Best Case        High
                                                   Entries      (%)     Entries       Entries           Max   Watermark
                                                                                                    Entries
-------------------- ------------ -------------- ----------- -------- ----------- ------------- ------------- ---------
[... SNIP ...]
Host                                                     8       0%      147447             0        147455           8
Host                 V4Hosts                             4       0%      147447             0        147455           4
Host                 V4Mroutes                           0       0%      147447             0        147455           0
Host                 V6Hosts                             4       0%      147447             0        147455           4
Host                 V6Mroutes                           0       0%      147447             0        147455           0
[... SNIP ...]
LPM                                                     11       0%        8179             0          8190          11
LPM                  V4Routes                            4       0%        8179             0          8190           4
LPM                  V6Routes                            7       0%        8179             0          8190           7
[... SNIP ...]
MAC                               Linecard0/0            9       0%      163831             0        163840           9
MAC                  L2           Linecard0/0            9       0%      163831             0        163840           9

One thing to note is that UFT resources are all exact match resources, which means that, on the face of it, they can only be used for MAC addresses and /32 or /128 host route forwarding. This is why the first four forwarding-table profiles all still have the same 16k Longest Prefix Match capacity; regardless of how you carve up the UFT, the base 16k LPM IPv4 routes are all that are available.

That is… until you consider the Algorithmic Longest Prefix Match feature in EOS!

Algorithmic Longest Prefix Match

ALPM (Algorithmic Longest Prefix Match) is a feature where the Unified Forwarding Table, which is normally used for exact match lookups, is repurposed to instead support longest prefix match lookups for IPv4 and IPv6 unicast routing. This feature is enabled by configuring the Trident2 platform to use the 4th forwarding-table partition profile:

7050qx32(config)#platform trident forwarding-table partition 4

This tells EOS to limit L2 MAC lookups to the dedicated 32k entries in the base hardware, limit L3 host route lookups to the dedicated 16k base entries, and use the LPM table and the UFT together to maximize L3 unicast LPM routing capacity. This can be seen in the description of profile 4: “32k l2 entries, 16k l3 host, 128k lpm entries” – a bare minimum of exact match resources (which is still quite a bit, if you’re designing an L3 network all the way to the edge), with the entire UFT repurposed to greatly expand the L3 LPM capacity of the 7050X platform.

The unusual part is how a flexible pool of exact match resources can be used for longest prefix match routing, which on the face of it can’t use exact match semantics. This is where it gets very clever: EOS uses the limited LPM hardware to match on a variable length prefix, then uses that to pivot into a subset of the ALPM table to exact match on the last few bits. So each individual route consumes an ALPM entry, but groups of routes with similar prefixes still depend on a single LPM entry to match the longest common prefix among them first.
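
A toy sketch of that two-stage lookup might look like the following (the prefixes and next-hop names are made up for illustration, and this is a conceptual model of the pivot-then-bucket idea, not of the actual Trident2 pipeline):

import ipaddress

# Conceptual sketch of an ALPM-style lookup: a small pivot table does a
# longest-prefix match on the top bits (the LPM hardware), then the lookup
# continues among the routes stored in that pivot's bucket (the large
# repurposed exact-match pool). Prefixes and next hops are invented examples.
pivots = {
    ipaddress.ip_network("10.1.0.0/16"): {            # one pivot entry...
        ipaddress.ip_network("10.1.1.0/24"): "nh-A",   # ...covers several bucket routes
        ipaddress.ip_network("10.1.2.0/24"): "nh-A",
        ipaddress.ip_network("10.1.3.0/26"): "nh-B",
    },
    ipaddress.ip_network("0.0.0.0/0"): {
        ipaddress.ip_network("0.0.0.0/0"): "nh-default",
    },
}

def alpm_lookup(dst):
    addr = ipaddress.ip_address(dst)
    # Step 1: longest-prefix match against the small pivot table.
    pivot = max((p for p in pivots if addr in p), key=lambda p: p.prefixlen)
    # Step 2: longest match among the routes stored in that pivot's bucket.
    route = max((r for r in pivots[pivot] if addr in r), key=lambda r: r.prefixlen)
    return pivots[pivot][route]

print(alpm_lookup("10.1.2.55"))   # nh-A, via the 10.1.0.0/16 pivot
print(alpm_lookup("192.0.2.1"))   # nh-default

Note that how many routes a single pivot can cover in this sketch depends entirely on how the prefixes cluster together, which hints at why the supported scale varies with the route mix.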

Unfortunately, since the route scale now depends on how well all of the routes pack into a limited set of LPM pivots that jump into a separate (larger) pool of exact match resources, the exact number of routes supported by the hardware depends on which routes they are. Routes that happen to pack well into groups use fewer LPM entries than routes that are more disjoint and don’t pack as well. The forwarding-table partition 4 profile is rated for 128k IPv4 LPM entries, but that is the absolute worst-case packing; if prefixes happen to pack well into the LPM pivots, you could get as many as ~393,000 IPv4 routes programmed in hardware using all the ALPM entries. IPv6 LPM entries can’t pivot to as many routes as IPv4 pivots, and double-wide IPv6 LPM entries support even fewer ALPM entries per bucket, so the worst case may be only 20k IPv6 routes, while the best case is around 196,000. (To get the best IPv6 route scale, be sure to configure routing-table partition 1 in addition to forwarding-table partition 4.)

Breaking Things and Seeing What Happens

Now that we’ve reviewed how the routing table can be partitioned to fine-tune IPv6 route scale for prefixes longer or shorter than the standard /64, and how the forwarding-table partition can shift capacity between L2 and L3 exact match scale or convert the entire Unified Forwarding Table into ALPM unicast routing resources, let’s feed a copy of the full IPv6 Internet routing table (a large source of routes of various lengths) to a 7050QX-32 and see how the different configuration options change what happens.

To start with, we’re running these experiments in December 2020, so there are about 100k routes in the IPv6 Internet table, and we’re running EOS version 4.24.3M-2GB:

7050qx32#show version
Arista DCS-7050QX-32-F

Software image version: 4.24.3M-2GB
Architecture:           i686
Internal build version: 4.24.3M-2GB-19566922.4243M
Internal build ID:      a9be6db7-cfd8-4d51-8115-64466a882821

Uptime:                 0 weeks, 0 days, 0 hours and 36 minutes
Total memory:           4009096 kB
Free memory:            2684140 kB

Given the defaults for the routing-table and forwarding-table partitions and an essentially empty routing table, we can see that we have the expected 8190 LPM entries and 8180 of them available:

7050qx32#show hardware capacity | awk 'NR < 7 || /LPM/'
Forwarding Resources Usage

Table                Feature      Chip                Used     Used        Free     Committed     Best Case        High
                                                   Entries      (%)     Entries       Entries           Max   Watermark
                                                                                                    Entries
-------------------- ------------ -------------- ----------- -------- ----------- ------------- ------------- ---------
LPM                                                     10       0%        8180             0          8190          10
LPM                  V4Routes                            3       0%        8180             0          8190           3
LPM                  V6Routes                            7       0%        8180             0          8190           7

Next, we can turn on the terminal monitor to see the error messages and turn up a BGP session with our upstream router:

7050qx32#terminal monitor
7050qx32#configure
7050qx32(config)#router bgp 64522
7050qx32(config-router-bgp)#address-family ipv6
7050qx32(config-router-bgp-af)#neighbor 2620:13b:0:1000::1 activate
7050qx32(config-router-bgp-af)#end
7050qx32#Dec 25 00:24:06 7050qx32 StrataL3: %IP6ROUTING-3-HW_RESOURCE_FULL: Hardware resources are insufficient to program all routes
Dec 25 00:24:11 7050qx32 StrataCentral: %CAPACITY-1-UTILIZATION_HIGH: LPM table utilization is currently at 99%, crossed threshold 90%
Dec 25 00:24:11 7050qx32 StrataCentral: %CAPACITY-1-UTILIZATION_HIGH: LPM-V6Routes table utilization is currently at 99%, crossed threshold 90%

The technical term for what happened there is “a bad thing”. The roughly 100k IPv6 routes in the Internet routing table far exceeded the 6k routes supported in the default 7050X hardware configuration, so the StrataL3 agent (which handles L3 route programming on platforms like the 7050 series) programmed as many routes as it could before it ran out of LPM resources, then posted an error and gave up.

7050qx32#show hardware capacity | awk 'NR < 7 || /LPM/'
Forwarding Resources Usage

Table                Feature      Chip                Used     Used        Free     Committed     Best Case        High
                                                   Entries      (%)     Entries       Entries           Max   Watermark
                                                                                                    Entries
-------------------- ------------ -------------- ----------- -------- ----------- ------------- ------------- ---------
LPM                                                   8189      99%           1             0          8190        8189
LPM                  V4Routes                            3      75%           1             0          8190           3
LPM                  V6Routes                         8186      99%           1             0          8190        8186

So we’ve used 8186/8190 entries for programming IPv6 routes in hardware, but how did we do in terms of actual prefixes installed in hardware?

7050qx32#show platform trident l3 summary | grep -A 3 "(hosts+lpm)"
IPv4 routes (hosts+lpm): 8
IPv6 routes (hosts+lpm): 6147
IPv4 unprogrammed routes: 0
IPv6 unprogrammed routes: 94118

“Poorly”. We managed to program 6147 routes in hardware and then failed to program the next 94,118 routes. This makes sense, because the default routing-table partition supports 4k single-wide IPv6 routes and another 2k double-wide routes. Changing the partition to the best-case profile 1 improves things a little:

7050qx32(config)#platform trident routing-table partition 1
Warning: the platform agents will restart immediately
7050qx32(config)#end
7050qx32#Dec 25 00:34:17 7050qx32 StrataL3: %IP6ROUTING-3-HW_RESOURCE_FULL: Hardware resources are insufficient to program all routes
7050qx32#
7050qx32#show platform trident l3 summary | grep -A 3 "(hosts+lpm)"
IPv4 routes (hosts+lpm): 8
IPv6 routes (hosts+lpm): 7171
IPv4 unprogrammed routes: 0
IPv6 unprogrammed routes: 93113

An improvement! We’ve now managed to program 7171 routes in hardware instead of 6147. Not a game-changer for routing the whole Internet table, and we still get the error message that we’ve run out of hardware resources, but this could make a difference in a smaller network pushing the limits of the LPM table in its leaf-spine fabric.
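
Those programmed-route counts also line up with the partition capacities; as a rough sanity check (using the profile limits quoted earlier and assuming almost every offered Internet route is /64 or shorter):

# Approximate programmed-route counts when both IPv6 pools fill up:
# single-wide slots plus double-wide slots, per routing-table partition.
profiles = {2: (4096, 2048), 1: (6144, 1024)}
for p, (single, double) in profiles.items():
    print(f"partition {p}: ~{single + double} IPv6 routes")
# partition 2: ~6144 -> close to the 6147 programmed routes
# partition 1: ~7168 -> close to the 7171 programmed routes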

Now let’s see how much of a difference turning on ALPM makes by changing the forwarding-table partition to 4 instead of changing the routing-table:

7050qx32(config)#platform trident routing-table partition 2 <<< Return the routing-table to the default
Warning: the platform agents will restart immediately
7050qx32(config)#platform trident forwarding-table partition 4
Warning: StrataAgent will restart immediately
7050qx32(config)#end
7050qx32#Dec 25 00:41:20 7050qx32 Bgp: %BGP-3-NOTIFICATION: received from neighbor 2620:13b:0:1000::1 (VRF default AS 64511) 6/0 (Cease/unspecified) 0 bytes
Dec 25 00:41:25 7050qx32 Bgp: %BGP-3-NOTIFICATION: received from neighbor 2620:13b:0:1000::1 (VRF default AS 64511) 6/0 (Cease/unspecified) 0 bytes
Dec 25 00:41:28 7050qx32 StrataCentral: %CAPACITY-1-UTILIZATION_NORMAL: LPM table utilization is back to normal
Dec 25 00:41:28 7050qx32 StrataCentral: %CAPACITY-1-UTILIZATION_NORMAL: LPM-V6Routes table utilization is back to normal
Dec 25 00:42:37 7050qx32 StrataL3: %IP6ROUTING-3-HW_RESOURCE_FULL: Hardware resources are insufficient to program all routes

It seemed promising for a minute there when the LPM utilization went back to normal, but that was unfortunately just because changing these platform settings caused the Strata agent to restart, which caused the BGP session with our upstream to bounce since the switch stopped moving any traffic. Once BGP came back up and reconverged, we ran out of hardware resources again.

7050qx32#show hardware capacity | awk 'NR < 7 || /LPM/'
Forwarding Resources Usage

Table                Feature      Chip                Used     Used        Free     Committed     Best Case        High
                                                   Entries      (%)     Entries       Entries           Max   Watermark
                                                                                                    Entries
-------------------- ------------ -------------- ----------- -------- ----------- ------------- ------------- ---------
ALPM                                                 94466      24%      298750             0        393216       94470
ALPM                 V4Routes                            4       0%      298750             0        393216           4
ALPM                 V6Routes                        94462      24%      298750             0        393216       94466
LPM                                                   4403      53%        3787             0          8190        8189
LPM                  V4Routes                            1       0%        3787             0          8190           3
LPM                  V6Routes                         4402      53%        3787             0          8190        8186

7050qx32#show platform trident l3 summary | grep -A 3 "(hosts+lpm)"
IPv4 routes (hosts+lpm): 8
IPv6 routes (hosts+lpm): 94463
IPv4 unprogrammed routes: 0
IPv6 unprogrammed routes: 5821

So with ALPM, we actually got really close! We managed to program all except for 5821 routes before we ran out of resources, but what resources did we run out of? The “show hardware capacity” output still reports free entries in both tables, so why didn’t we program all of the routes?

7050qx32#show platform trident l3 summary | grep -A 2 "LPM table mode"
LPM table mode: 2, table usage: 4403/8190
IPv4 entries: 1 (full: 0, half full: 1), routes: 1
IPv6 entries: 4402 (single: 4096, double: 153), routes: 4249
--
ALPM table mode: 4, table usage: 94460/393216
IPv4 routes: 4/393216
IPv6 routes: 94456 (half: 92737/163840, full: 1719/32768)

Looking at the more detailed usage of the LPM tables, we can see that we’ve used all 4096 “single” entries in the IPv6 LPM table. The single entries are much more effective for ALPM than the double-wide entries, so once we ran out of those the Strata agent failed to program the last few routes. Granted, this is a problem specific to IPv6; the routing table for IPv4 can fully use all of the LPM entries, so changing the routing-table partition doesn’t make any difference for IPv4 route scale.

Changing the routing-table partition to profile 1 to maximize the number of single wide LPM entries makes another 2k single IPv6 entries available and makes the difference we needed:

7050qx32(config)#platform trident routing-table partition 1
Warning: the platform agents will restart immediately
7050qx32(config)#platform trident forwarding-table partition 4
7050qx32(config)#end
7050qx32#
7050qx32#show hardware capacity | awk 'NR < 7 || /LPM/'
Forwarding Resources Usage

Table                Feature      Chip                Used     Used        Free     Committed     Best Case        High
                                                   Entries      (%)     Entries       Entries           Max   Watermark
                                                                                                    Entries
-------------------- ------------ -------------- ----------- -------- ----------- ------------- ------------- ---------
ALPM                                                100271      25%      292945             0        393216      100279
ALPM                 V4Routes                            4       0%      292945             0        393216           4
ALPM                 V6Routes                       100267      25%      292945             0        393216      100275
LPM                                                   5423      66%        2767             0          8190        8189
LPM                  V4Routes                            1       0%        2767             0          8190           3
LPM                  V6Routes                         5422      66%        2767             0          8190        8186
7050qx32#show platform trident l3 summary | grep -A 3 "(hosts+lpm)"
IPv4 routes (hosts+lpm): 8
IPv6 routes (hosts+lpm): 100272
IPv4 unprogrammed routes: 0
IPv6 unprogrammed routes: 0

7050qx32#show platform trident l3 summary | grep -A 2 "LPM table mode"
LPM table mode: 1, table usage: 5422/8190
IPv4 entries: 1 (full: 0, half full: 1), routes: 1
IPv6 entries: 5421 (single: 5417, double: 2), routes: 5419
--
ALPM table mode: 4, table usage: 100267/393216
IPv4 routes: 4/393216
IPv6 routes: 100263 (half: 100260/212992, full: 3/16384)

Success! We can see that there are zero unprogrammed routes in the l3 summary, and we’re only using 5417 of the 6k single IPv6 LPM entries, so we’ve got the full Internet IPv6 table programmed in hardware with resources to spare (very few resources, granted, but some). As the Internet routing table continues to grow, this will likely stop working before long, and it leaves very few hardware resources for IPv4 routes, so trying to route the full IPv6 Internet table in production on Trident2 platforms is probably not a good idea. Still, it’s an effective demonstration of the power of ALPM and of how EOS takes full advantage of the flexibility of the hardware it ships on.

FIB Compression

Another useful feature that complements ALPM well is FIB compression, which shipped in EOS 4.21.3. It takes a different approach to increasing the route scale available on Arista switches: if a more general route covers more specific routes in your routing table that share the same next-hop, programming those more specifics in hardware doesn’t change routing behavior but still consumes resources. For example, if every point-to-point /31 link in your network is included in your routing table, the naive approach is to program each /31 as a separate LPM route in hardware. While simple, this makes little sense on a leaf switch, where practically every route shares the same next-hops (the uplinks to the spines), so FIB compression can make a big difference.
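
The core idea can be sketched in a few lines of Python (this is a simplified illustration of the redundant-specifics rule with a made-up route list, not the EOS implementation):

import ipaddress

def compress_fib(routes):
    """Drop routes covered by a shorter prefix that has the same next-hop.

    routes: {prefix_string: next_hop}. Simplified illustration only."""
    nets = {ipaddress.ip_network(p): nh for p, nh in routes.items()}
    kept = {}
    for net, nh in nets.items():
        covering = [(c, cnh) for c, cnh in nets.items()
                    if c != net and c.prefixlen < net.prefixlen and net.subnet_of(c)]
        if covering:
            # Only the most specific covering route matters: if it shares our
            # next-hop, removing this route doesn't change forwarding behaviour.
            _, best_nh = max(covering, key=lambda item: item[0].prefixlen)
            if best_nh == nh:
                continue
        kept[str(net)] = nh
    return kept

routes = {
    "10.0.0.0/8": "spine-1",
    "10.1.0.0/16": "spine-1",   # redundant: covered by 10.0.0.0/8, same next-hop
    "10.2.0.0/16": "spine-2",   # kept: different next-hop
}
print(compress_fib(routes))     # {'10.0.0.0/8': 'spine-1', '10.2.0.0/16': 'spine-2'}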

Even for the full Internet routing table, enabling FIB compression makes a significant difference in hardware utilization, at the cost of higher CPU and memory consumption on the control plane, since it has to work out which routes don’t actually need to be programmed in hardware as the routing protocols converge:

7050qx32#show hardware capacity | awk 'NR < 7 || /LPM/'
Forwarding Resources Usage

Table                Feature      Chip                Used     Used        Free     Committed     Best Case        High
                                                   Entries      (%)     Entries       Entries           Max   Watermark
                                                                                                    Entries
-------------------- ------------ -------------- ----------- -------- ----------- ------------- ------------- ---------
ALPM                                                100259      25%      292957             0        393216      100293
ALPM                 V4Routes                            4       0%      292957             0        393216           4
ALPM                 V6Routes                       100255      25%      292957             0        393216      100289
LPM                                                   5422      66%        2768             0          8190        8189
LPM                  V4Routes                            1       0%        2768             0          8190           3
LPM                  V6Routes                         5421      66%        2768             0          8190        8186
7050qx32#conf
7050qx32(config)#ipv6 fib compression redundant-specifics filter
7050qx32(config)#end
7050qx32#show hardware capacity | awk 'NR < 7 || /LPM/'
Forwarding Resources Usage

Table                Feature      Chip                Used     Used        Free     Committed     Best Case        High
                                                   Entries      (%)     Entries       Entries           Max   Watermark
                                                                                                    Entries
-------------------- ------------ -------------- ----------- -------- ----------- ------------- ------------- ---------
ALPM                                                 48082      12%      345134             0        393216      100293
ALPM                 V4Routes                            4       0%      345134             0        393216           4
ALPM                 V6Routes                        48078      12%      345134             0        393216      100289
LPM                                                   2367      28%        5823             0          8190        8189
LPM                  V4Routes                            1       0%        5823             0          8190           3
LPM                  V6Routes                         2366      28%        5823             0          8190        8186

Fully programmed, and only using 12% of the ALPM table and 28% of the LPM table. Remember that this improvement exists only because roughly 52k of the routes in the Internet table happen to be more specifics of other routes with the same next-hop from our perspective; if the same number of routes were all disjoint, asking EOS to do the extra work of compression wouldn’t buy us the same improvement.
