When forward error correction is enabled, it provides a set of statistics that can be used to monitor the health of the link at layer 1. By comparing trends over time, it may be possible to predict which links will experience service-impacting error rates, allowing action to be taken before these events occur. This document describes these statistics and how to monitor them on an Arista switch running EOS.
Forward Error Correction
Forward error correction (FEC) is a technique used in data communications in which data is partitioned into blocks and parity bits are added to each block. When errors are sufficiently randomly distributed, the receiver can use these parity bits to identify the bits that are in error and correct them.
In links using copper twinax cables or direct detect optics, there are 2 types of FEC in use, Reed-Solomon and Firecode. Reed-Solomon (RS) is a ‘stronger’ FEC and is the most prevalent. Firecode (FC), also known as BASE-R or Clause 74 FEC, is a ‘weaker’ FEC but introduces less latency on the link than RS-FEC. The strength of a FEC is typically characterized by the worst case BER that it can correct. Another characteristic of FEC is its ability to handle bursts of errors, that is, errors which, when averaged over time, do not exceed the worst case BER but are concentrated closely enough to still cause uncorrectable errors. RS-FEC is better at handling bursts of errors than FC-FEC.
RS-FEC has 2 variants in these use cases. The first is RS(514,528), which is used on links using Non Return to Zero (NRZ) encoding. These are links running at 25G, 50G-2 (2x25G lanes) and 100G-4 (4x25G lanes). The second variant is RS(514,544), which is used on Pulse Amplitude Modulation 4-level (PAM4) encoded links. These are links running at 50G-1 (1x50G lane), 100G-2 (2x50G lanes), 200G-4 (4x50G lanes), and 400G-8 (8x50G lanes). These can be further abbreviated as RS-528 and RS-544 respectively. The difference between the two is the number of parity symbols carried to protect the data. In RS-528 there are 528-514=14 parity symbols (140 parity bits). RS-544 has 544-514=30 parity symbols (300 parity bits). As a result of the additional parity information, RS-544 is a stronger FEC.
|FEC||Worst Case Correctable BER||Usage|
(1 error/10^8 bits)
|RS-544||10^-4||PAM4 links (400G-8, 200G-4, 100G-2, 50G-1) for all media types.|
Transmitting with RS-FEC
The RS-FEC transmission process builds a codeword from 5140 bits of user data plus parity. For RS-528, 140 bits of parity are added, which creates a block that is 5280 bits in size. Internally, this codeword is arranged as 10 bit ‘symbols’. When transmitting, these symbols are distributed round robin to FEC lanes, which are then mapped to PMD lanes. For 100GbE over 4x25G serdes (100G-4) there are 4 FEC lanes and 4 PMD lanes, so the mapping is 1:1. For 100G-2 there are 4 FEC lanes but 2 PMD lanes, so each PMD lane carries 2 FEC lanes. The simplified transmission process is shown in the figure below.
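The lane distribution described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, everything else a real PHY does is omitted, and the grouping of adjacent FEC lanes onto one PMD lane is an assumption chosen to match the "FEC lane 00 01 02 03 / PMA lane 00 00 01 01" mapping that appears in the phy detail output later in this document.

```python
from typing import List

SYMBOL_BITS = 10  # RS-FEC symbols are 10 bits wide


def distribute_symbols(symbols: List[int], num_fec_lanes: int) -> List[List[int]]:
    """Deal codeword symbols round robin onto the FEC lanes."""
    lanes: List[List[int]] = [[] for _ in range(num_fec_lanes)]
    for index, symbol in enumerate(symbols):
        lanes[index % num_fec_lanes].append(symbol)
    return lanes


def fec_to_pmd_lane(fec_lane: int, num_fec_lanes: int, num_pmd_lanes: int) -> int:
    """Map a FEC lane to a PMD lane.

    Assumption: adjacent FEC lanes share a PMD lane (0,1 -> 0 and 2,3 -> 1).
    """
    fec_lanes_per_pmd = num_fec_lanes // num_pmd_lanes
    return fec_lane // fec_lanes_per_pmd


# An RS-528 codeword: 5280 bits = 528 ten-bit symbols, numbered 0..527 here.
codeword = list(range(5280 // SYMBOL_BITS))
fec_lanes = distribute_symbols(codeword, num_fec_lanes=4)

mapping_100g_4 = [fec_to_pmd_lane(lane, 4, 4) for lane in range(4)]  # 1:1
mapping_100g_2 = [fec_to_pmd_lane(lane, 4, 2) for lane in range(4)]  # 2:1
```

Each of the 4 FEC lanes ends up carrying every fourth symbol of the codeword, so an error burst on one lane is spread across symbols rather than concentrated in consecutive ones.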
Receiving with RS-FEC
On reception the process is generally reversed. The codeword is assembled from the incoming symbols, the parity bits are used to perform error correction and then the data is sent on for further receive processing. The simplified process is shown below.
The correction algorithm identifies each bit which is in error and applies corrections. It also counts each symbol in which a correction was made. The correction process tracks the number of corrected codewords, the number of corrected symbols and, on some PHYs, the number of bits corrected.
When Correction Fails
RS-FEC can make corrections only when the number of bits in error does not exceed limits. The ability to correct is based on the number of symbols containing bit errors in a given codeword. If the number of parity symbols is 2t, then RS-FEC can correct up to t symbols. If more than t symbols contain errors then the codeword is uncorrectable. For RS-528, 2t=14 and so t=7. This means that as many as 70 bits (7 symbols * 10 bits per symbol) may be in error and the codeword can still be corrected. If the bit errors are distributed 1 per symbol, as few as 8 bit errors can result in an uncorrected codeword since 8 > 7, the maximum number of symbols correctable in a codeword. For RS-544, 2t=30 and so up to t=15 symbols may have errors in a codeword which is correctable.
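The arithmetic above can be checked with a short Python sketch. The function names are hypothetical, and the model only counts how many 10 bit symbols contain at least one bit error; it does not implement Reed-Solomon decoding.

```python
SYMBOL_BITS = 10    # bits per RS-FEC symbol
DATA_SYMBOLS = 514  # data symbols per codeword for both RS-528 and RS-544


def max_correctable_symbols(codeword_symbols: int) -> int:
    """Return t, given a codeword of 528 or 544 symbols (2t parity symbols)."""
    parity_symbols = codeword_symbols - DATA_SYMBOLS
    return parity_symbols // 2


def symbols_in_error(bit_error_positions) -> int:
    """Count distinct symbols touched by the given bit positions in a codeword."""
    return len({position // SYMBOL_BITS for position in bit_error_positions})


def is_correctable(bit_error_positions, codeword_symbols: int) -> bool:
    """A codeword is correctable when at most t symbols contain errors."""
    return symbols_in_error(bit_error_positions) <= max_correctable_symbols(codeword_symbols)


burst = range(70)          # 70 consecutive bit errors, confined to symbols 0..6
spread = range(0, 80, 10)  # 8 bit errors, one in each of symbols 0..7

burst_ok_528 = is_correctable(burst, 528)    # True: 7 symbols <= t=7
spread_ok_528 = is_correctable(spread, 528)  # False: 8 symbols > t=7
spread_ok_544 = is_correctable(spread, 544)  # True: 8 symbols <= t=15
```

The example makes the point of the paragraph concrete: what matters is the number of symbols hit, not the number of bits in error.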
Arista EOS reports a rich set of parameters at layer 1 for monitoring link performance. These parameters can be monitored to determine when a link is experiencing errors which may have not yet impacted application data. When FEC is enabled, FEC codewords encapsulate all data transmitted. This includes PCS IDLEs transmitted when there is no user data. Because of this, FEC statistics can be used to determine if a link is performing well regardless of whether any packet data is transmitted across the link.
Show Interfaces Phy Detail
The primary display for layer 1 monitoring and for FEC statistics is the ‘show interfaces phy detail’ command. The parameters listed in the table below are the primary ones for monitoring link performance with FEC.
|FEC corrected codewords||Count of codewords which had correctable errors in the last polling period.||All|
|FEC lane corrected symbols||Count of symbols from correctable codewords which had bits corrected in the last polling period.||All|
|FEC corrected symbol rate||Ratio of corrected symbols to total symbols received in the last polling period. Not accurate in periods in which uncorrectable errors are present. Also referred to as Symbol Error Rate (SER) in this document.||7280R2, 7280R3, 7500R2, 7500R3, 7800R3, 7060X4, 7368X4|
|Pre-FEC bit error rate||Ratio of corrected bits to total bits received in the last polling period. Not accurate in periods in which uncorrectable errors are present.||7280R3, 7500R3, 7800R3, 7060X4, 7368X4|
|FEC uncorrected codewords||Count of codewords which could not be corrected in the last polling period. These represent potentially lost data.||All|
On many platforms there are intermediate PHYs between the transceiver terminating the incoming link and the switch chip. Often the internal links between the PHYs and the switch chip are protected by FEC. When using FEC to determine the quality of a link between peers, the parameters collected on the PHY directly connected to the transceiver terminating the link should be used. These are grouped under the ‘line’ parameters section of show int phy detail. For example, on ports 1-32 of a 7280CR3MK-32P4/D4 the interesting FEC parameters would be under the section labeled “CMS42550 line”.
Internal link parameters are also shown in show int phy detail. These are in sections using the label “system” as in “CMS42550 system”. The system side parameters can be useful in confirming the internal system links are performing as expected. Errors in this section may require assistance from the Arista TAC to resolve.
The drawing below can serve to orient one on where the statistics reported in show interfaces phy detail are collected. In general, it is the ‘line’ side statistics that are interesting for evaluating the quality of the link to the peer.
For both the system and the line side, after initial link up, there should be no uncorrectable errors in a well performing link. The output below shows an example ‘show int phy detail’ display with the bulk of the sections unrelated to FEC omitted. This example shows a system with a PHY between the switch chip and the transceiver. This results in 3 FEC decoder functions operating on this system.
In the transmit (egress) direction there is a FEC decoder on the PHY associated with the system interface between the switch chip and the PHY identified as “CMS42550 system”. This FEC receiver is decoding data transmitted by the switch chip. This data is retransmitted on the line side of the CMS42550 and will be received and decoded by the peer system.
In the receive (ingress) direction the line data from the transceiver is terminated on the CMS42550 with the data in the section identified as “CMS42550 line”.
Arista#show int et29/1 phy detail
Current System Time: Mon Aug 31 19:54:51 2020
Ethernet29/1
                                  Current State     Changes     Last Change
                                  -------------     -------     -----------
  Interface state                 up                16          3:03:32 ago
...
BCM88690-TSCBH line
  Model                           BCM88690 (0x000000,0x25,0x1)
...
  Forward Error Correction        Reed-Solomon
  Reed-Solomon codeword size      544
  FEC alignment lock              ok                31          3:03:35 ago
  FEC lane alignment marker lock
    Lane 0                        ok                31          3:03:35 ago
    Lane 1                        ok                31          3:03:35 ago
    Lane 2                        ok                31          3:03:35 ago
    Lane 3                        ok                31          3:03:35 ago
  FEC corrected codewords         1                 30          0:51:42 ago
  FEC uncorrected codewords       0                 0           never
  FEC corrected symbol rate       < 1.82E-11
  FEC lane corrected symbols
...
    Lane 0                        1                 4           4:09:18 ago
    Lane 1                        0                 0           never
    Lane 2                        1                 26          1:10:23 ago
    Lane 3                        1                 4           0:51:42 ago
  FEC lane mapping
    FEC lane                      00 01 02 03
    PMA lane                      00 00 01 01
  Pre-FEC bit error rate          < 1.82E-12
...
CMS42550
  Model                           CMS42550 (B0)
  Firmware revision               01.90.91
CMS42550 system
...
  Forward Error Correction        Reed-Solomon
  Reed-Solomon codeword size      544
  FEC alignment lock              ok                15          3:03:35 ago
  FEC lane alignment marker lock
    Lane 0                        ok                15          3:03:35 ago
    Lane 1                        ok                15          3:03:35 ago
    Lane 2                        ok                15          3:03:35 ago
    Lane 3                        ok                15          3:03:35 ago
  FEC corrected codewords         1                 126         0:00:26 ago
  FEC uncorrected codewords       3                 12          4:02:24 ago
  FEC corrected symbol rate       < 1.58E-11
  FEC lane corrected symbols
    Lane 0                        5                 2           4:10:14 ago
    Lane 1                        1                 98          0:00:26 ago
    Lane 2                        0                 0           never
    Lane 3                        1                 31          0:02:26 ago
  FEC lane mapping
    FEC lane                      00 01 02 03
    PMA lane                      00 00 01 01
  Pre-FEC bit error rate          < 1.58E-12
...
CMS42550 line
...
  Forward Error Correction        Reed-Solomon
  Reed-Solomon codeword size      528
  FEC alignment lock              ok                15          3:03:35 ago
  FEC lane alignment marker lock
    Lane 0                        ok                15          3:03:35 ago
    Lane 1                        ok                15          3:03:35 ago
    Lane 2                        ok                15          3:03:35 ago
    Lane 3                        ok                15          3:03:35 ago
  FEC corrected codewords         1                 2           4:25:31 ago
  FEC uncorrected codewords       3                 12          4:02:24 ago
  FEC corrected symbol rate       < 1.63E-11
  FEC lane corrected symbols
    Lane 0                        1                 2           4:25:31 ago
    Lane 1                        0                 0           never
    Lane 2                        0                 0           never
    Lane 3                        0                 0           never
  FEC lane mapping
    FEC lane                      00 01 02 03
    PMA lane                      00 01 02 03
  Pre-FEC bit error rate          < 1.63E-12
...
When uncorrectable errors are experienced, the calculation of pre-FEC BER and SER is compromised, because there is no information on how many bits or symbols were in error in an uncorrectable codeword. This is signaled in the ‘show int phy detail’ output by an asterisk (‘*’) next to the value.
During periods where there are no corrections (all FEC codewords were received perfectly with no bits to correct) the BER is noted as BER < ( 1 / bits in the period).
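As a rough illustration of this floor, the sketch below computes 1 / (bits in the period) for a given line rate and period length. Both inputs are hypothetical values chosen for the example; the actual polling period used by EOS is not stated here.

```python
def ber_floor(line_rate_bps: float, period_seconds: float) -> float:
    """Upper bound reported for BER when no bits were corrected in the period."""
    bits_in_period = line_rate_bps * period_seconds
    return 1.0 / bits_in_period


# Hypothetical example: a 100 Gb/s link observed for 10 seconds carries 1e12 bits,
# so an error-free period can only establish that BER < 1e-12.
example_floor = ber_floor(line_rate_bps=100e9, period_seconds=10.0)
```

The longer the error-free period or the faster the link, the lower the bound that can be reported.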
Pre-FEC BER vs SER
As described above, RS-FEC codewords are built from 10 bit symbols. When a codeword is corrected, the number of symbols containing bits that needed correction is counted. The ratio of the corrected symbols to the total number of symbols is the symbol error rate (SER). It is typical to have just 1 or 2 corrected bits in a symbol. In these cases the corrected symbol count is approximately equal to the corrected bit count. However, in the calculation of SER, given there are 10 bits per symbol, the denominator is smaller by a factor of 10. With 1 bit error per symbol error, SER = (10 * pre-FEC BER). This is often observed in the phy detail output.
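The factor of 10 can be reproduced with a small Python sketch. The function names and the counts used are hypothetical; the only inputs taken from the text are the 10 bit symbol size and the 5280 bit RS-528 codeword.

```python
SYMBOL_BITS = 10  # bits per RS-FEC symbol


def pre_fec_ber(corrected_bits: int, total_bits: int) -> float:
    """Ratio of corrected bits to total bits received."""
    return corrected_bits / total_bits


def symbol_error_rate(corrected_symbols: int, total_bits: int) -> float:
    """Ratio of corrected symbols to total symbols received."""
    total_symbols = total_bits // SYMBOL_BITS
    return corrected_symbols / total_symbols


# Hypothetical period: one million RS-528 codewords (5280 bits each) in which
# 500 single-bit errors were corrected, each in a different symbol.
total_bits = 5280 * 1_000_000
ber = pre_fec_ber(corrected_bits=500, total_bits=total_bits)
ser = symbol_error_rate(corrected_symbols=500, total_bits=total_bits)
ratio = ser / ber  # approximately 10 when there is 1 bit error per symbol
```

If several bit errors fall in the same symbol, the corrected bit count grows while the corrected symbol count does not, and the ratio drops below 10.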
FEC Correction Histograms
As discussed above, RS-FEC can correct up to either 7 or 15 symbols per codeword. A well performing link should have most corrections in the lower half of these limits and they should decay exponentially. Often, links can operate with nearly all corrected codewords needing only 1 or 2 symbols corrected. The number requiring 3 should be an order of magnitude lower, and 4 another order of magnitude lower.
Some platforms support collecting a histogram of symbol corrections per codeword and displaying it. This histogram can be used to determine if the link is reaching the limits of RS-FEC to correct the errors on the link. By monitoring this over time, links which are degrading can be identified and corrective action taken prior to a service affecting event. This histogram can be displayed with the command “show interfaces phy diag error-correction histogram”. An example is shown below.
Arista#show int et3/1 phy diag error-correction histogram
Ethernet3/1
Symbol Errors Per Codeword     Codewords         Changes     Last Change
--------------------------     ---------         -------     -----------
CRT50216 system
  Bin0                         4075208478588     18078       0:00:01 ago
  Bin1                         7                 7           1 day, 4:42:37 ago
  Bin2                         0                 0           never
  Bin3                         0                 0           never
  Bin4                         0                 0           never
  Bin5                         0                 0           never
  Bin6                         0                 0           never
  Bin7                         0                 0           never
  Bin8                         0                 0           never
  Bin9                         0                 0           never
  Bin10                        0                 0           never
  Bin11                        0                 0           never
  Bin12                        0                 0           never
  Bin13                        0                 0           never
  Bin14                        0                 0           never
  Bin15                        0                 0           never
  Bin16+                       0                 0           never
CRT50216 line
  Bin0                         4077816452204     18084       0:00:01 ago
  Bin1                         259686            4260        0:00:01 ago
  Bin2                         1155              195         0:00:24 ago
  Bin3                         19                13          0:04:27 ago
  Bin4                         0                 0           never
  Bin5                         0                 0           never
  Bin6                         0                 0           never
  Bin7                         0                 0           never
  Bin8+                        0                 0           never
In the above display there are two sections. The first is labeled “CRT50216 system” and this is data for the internal link between the PHY and the switch chip. This is data that is destined for the peer system. The section “CRT50216 line” provides histogram data for the FEC engine receiving data from the peer system. This is the section that is interesting for monitoring the performance of an optical fiber link.
The internal system side link consists of 50G PAM4 lanes and is protected by RS-544, which can correct up to 15 symbols per codeword. When a codeword is corrected, the bin corresponding to the number of symbols corrected is incremented. Bin0 counts codewords with no bit errors at all. Bin1 counts codewords with 1 symbol corrected; this could be as few as 1 bit and as many as 10 bits. Bin2 counts 2 symbol corrections per codeword (at least 2, but as many as 20 bits corrected), etc. If 16 or more symbols have errors, the codeword is uncorrectable. In these cases “Bin16+” is incremented.
The line side link above consists of 25G NRZ lanes protected by RS-528 FEC. RS-528 can correct up to 7 symbols per codeword. Again, “Bin0” counts received codewords with no error, “Bin1” those with 1 error, etc. If a codeword is uncorrectable “Bin8+” is incremented.
This example shows that the system is operating with the overwhelming majority of traffic requiring no error correction (Bin0). The line side shows corrections up to Bin3 for a very small percentage (<1 out of 10 million) of the received codewords. The system side shows a small number of correctable single symbol errors on the interface between the CRT50216 PHY and the switch chip. In other words, this is a healthy path.
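For scripted monitoring, the histogram text can be turned into per-section bin counts. The sketch below is a minimal parser written for this document only: the function names are hypothetical, the section detection (a line ending in "system" or "line") is an assumption based on the display above, and the sample is an abbreviated copy of the Ethernet3/1 output with most zero bins left out.

```python
import re
from typing import Dict

# Matches lines such as "Bin1  259686  4260  0:00:01 ago" or "Bin8+  0  0  never".
BIN_RE = re.compile(r"^\s*Bin(\d+)(\+?)\s+(\d+)")


def parse_histogram(text: str) -> Dict[str, Dict[str, int]]:
    """Return {section name: {bin label: codeword count}}."""
    sections: Dict[str, Dict[str, int]] = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        match = BIN_RE.match(line)
        if match and current is not None:
            label = match.group(1) + match.group(2)  # "3" or "8+"
            sections[current][label] = int(match.group(3))
        elif stripped.endswith((" system", " line")):
            current = stripped
            sections[current] = {}
    return sections


def corrected_fraction(bins: Dict[str, int]) -> float:
    """Fraction of codewords that needed a correction (excludes Bin0 and BinN+)."""
    total = sum(bins.values())
    corrected = sum(count for label, count in bins.items()
                    if label != "0" and not label.endswith("+"))
    return corrected / total


SAMPLE = """\
CRT50216 system
  Bin0     4075208478588   18078   0:00:01 ago
  Bin1     7               7       1 day, 4:42:37 ago
  Bin16+   0               0       never
CRT50216 line
  Bin0     4077816452204   18084   0:00:01 ago
  Bin1     259686          4260    0:00:01 ago
  Bin2     1155            195     0:00:24 ago
  Bin3     19              13      0:04:27 ago
  Bin8+    0               0       never
"""

histogram = parse_histogram(SAMPLE)
line_fraction = corrected_fraction(histogram["CRT50216 line"])
```

For the line side data, the corrected fraction works out to fewer than 1 in 10 million codewords, which agrees with the reading of the display given above.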
The output below shows an example of a link which has experienced uncorrectable errors; Bin8+ has accumulated counts of uncorrectable codewords.
Arista#show int Et81/1 phy diag error-correction histogram | nz
Ethernet81/1
Symbol Errors Per Codeword     Codewords       Changes     Last Change
--------------------------     ---------       -------     -----------
CRT50216 system
  Bin0                         53066018469     236         0:00:01 ago
CRT50216 line
  Bin0                         52515664515     235         0:00:01 ago
  Bin1                         157748          3           0:16:00 ago
  Bin2                         41705           3           0:16:00 ago
  Bin3                         7279            3           0:16:00 ago
  Bin4                         1052            3           0:16:00 ago
  Bin5                         79              3           0:16:00 ago
  Bin6                         11              3           0:16:00 ago
  Bin7                         1178            3           0:16:00 ago
  Bin8+                        35597           2           0:16:12 ago
Histogram for a Link with Uncorrectable Errors
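The two warning signs in the Ethernet81/1 line-side data, a populated Bin8+ and a Bin7 count that is higher than Bin6 instead of lower, can be detected mechanically. The sketch below is one possible check, not an EOS feature: the function name is hypothetical and the rule "a higher bin must not exceed the bin below it" is a deliberately simple stand-in for the expected exponential decay.

```python
from typing import Dict, List


def histogram_warnings(bins: Dict[str, int]) -> List[str]:
    """Flag uncorrectable codewords and bins that fail to decay."""
    warnings: List[str] = []
    overflow = [label for label in bins if label.endswith("+")]
    numbered = sorted((label for label in bins if not label.endswith("+")), key=int)

    for label in overflow:
        if bins[label] > 0:
            warnings.append(f"Bin{label}: {bins[label]} uncorrectable codewords")

    # Compare Bin1..BinN pairwise; Bin0 (error-free codewords) is skipped.
    for lower, higher in zip(numbered[1:], numbered[2:]):
        if bins[higher] > bins[lower]:
            warnings.append(
                f"Bin{higher} ({bins[higher]}) exceeds Bin{lower} ({bins[lower]})"
            )
    return warnings


# Line-side counts copied from the Ethernet81/1 output.
et81_line = {
    "0": 52515664515, "1": 157748, "2": 41705, "3": 7279, "4": 1052,
    "5": 79, "6": 11, "7": 1178, "8+": 35597,
}
et81_warnings = histogram_warnings(et81_line)
```

For this data the check reports both the 35597 uncorrectable codewords in Bin8+ and the jump from 11 codewords in Bin6 to 1178 in Bin7.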
The histogram data can be cleared with the “clear phy counters” command.
The table below shows platform support for displaying FEC histogram information. Note that all platform support is at the time of this writing. Additional platforms may be added in the future.
|7500R3-36CQ-LC, 7500R3K-36CQ-LC||1-12, 25-36||100G-4, 50G-2|
DCS-7280CR3-32D4, DCS-7280CR3K-32D4, DCS-7280CR3K-32D4
|DCS-7280CR3MK-32P4, DCS-7280CR3MK-32D4||1-32||100G-4, 50G-2, 40G, 25G, 10G|
|7800R3-36DM-LC||1-36||400G-8, 200G-4, 100G-2, 100G-4, 50G-1, 50G-2, 40G, 25G, 10G|
Firecode FEC is much less widely used than RS-FEC. The main advantage of FC-FEC over RS-FEC is that it has somewhat lower latency; RS-FEC at 25G adds 250ns of latency, while FC-FEC adds about 80ns. Transmission over single mode fiber adds about 5ns of latency per meter of fiber. So a fiber run of over 50 meters will add more latency than RS-FEC. The primary use case for FC-FEC is over short twinax cables where the propagation latency is less significant. It can also be used at 25G and 50G with fiber optic connections when the preferred RS-FEC is not available. For 25G and 50G-2 links, if latency is an important factor to optimize, it is recommended to engineer the links to allow running with FEC disabled. For instance, use twinax cables designated as CA-N.
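The break-even fiber lengths implied by these figures can be worked out directly. The sketch below uses only the numbers quoted in the paragraph above (250ns, 80ns, and about 5ns per meter); the constant and function names are hypothetical.

```python
FIBER_NS_PER_METER = 5.0   # approximate propagation delay in single mode fiber
RS_FEC_LATENCY_NS = 250.0  # RS-FEC at 25G, as quoted above
FC_FEC_LATENCY_NS = 80.0   # FC-FEC, as quoted above


def break_even_fiber_meters(fec_latency_ns: float) -> float:
    """Fiber length whose propagation delay equals the given FEC latency."""
    return fec_latency_ns / FIBER_NS_PER_METER


rs_break_even = break_even_fiber_meters(RS_FEC_LATENCY_NS)  # 50.0 meters
fc_break_even = break_even_fiber_meters(FC_FEC_LATENCY_NS)  # 16.0 meters
fec_saving_ns = RS_FEC_LATENCY_NS - FC_FEC_LATENCY_NS       # 170.0 ns
```

Switching from RS-FEC to FC-FEC saves about 170ns, which is the propagation delay of roughly 34 meters of fiber, so the choice of FEC matters most on short links.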
When transmitting with FC-FEC enabled, the PHY will again encode the data into blocks, adding parity bits. In contrast to RS-FEC, FC-FEC may have multiple independent lanes of FEC running and the number of FEC lanes may not match the number of PMD lanes. 50G-2 links running with FC-FEC have 2 PMD lanes. However, FC-FEC encodes 4 FEC lanes and then muxes data from 2 FEC lanes to each PMD lane. The result is that there are 4 sets of counters for a 50G-2 link. This can be seen in the output below. Note also that FC-FEC implementations do not provide counters which may be used to generate a pre-FEC BER or SER. The only counters available are the corrected and uncorrected block counters.
Arista#show int et6/1 phy detail
Current System Time: Wed Sep 2 18:43:22 2020
Ethernet6/1
                                  Current State     Changes     Last Change
                                  -------------     -------     -----------
  Interface state                 up                4           0:01:27 ago
...
BCM56965-TSCF line
  Model                           BCM56965-TSCF (A) (0x00c086,0x37,0x0)
...
  Forward Error Correction        Fire-Code
  FEC corrected blocks
    Lane 0                        0                 0           never
    Lane 1                        0                 0           never
    Lane 2                        0                 0           never
    Lane 3                        0                 0           never
  FEC uncorrected blocks
    Lane 0                        0                 0           never
    Lane 1                        0                 0           never
    Lane 2                        0                 0           never
    Lane 3                        0                 0           never
...
400G Histogram Examples
The first set of output below shows the line side BER as measured on the CMS50216 PHY and the corresponding histogram output. This output was collected from a live link running 400G over a 400GBASE-DR4 transceiver with a short fiber run. This link is quite healthy, exhibiting good margin in both raw BER and corrections per codeword. On the CMS50216 line side, this 400G link receives most codewords without needing any symbol corrections (Bin0). The count of codewords with 1 symbol corrected (Bin1) is over 3 orders of magnitude lower, meaning less than 1 in 1000 codewords needs any correction at all.
Arista#show int et9/1/1 phy detail
...
CMS50216 line
  FEC corrected symbol rate    9.45E-07
  Pre-FEC bit error rate       9.45E-08
Arista#show int et9/1/1 phy diag error-correction histogram
Ethernet9/1/1
Symbol Errors Per Codeword     Codewords        Changes     Last Change
--------------------------     ---------        -------     -----------
CMS50216 system
  Bin0                         904417222306     4818        0:00:00 ago
  Bin1                         1                1           0:26:14 ago
...
CMS50216 line
  Bin0                         903668687198     4816        0:00:00 ago
  Bin1                         142901852        4816        0:00:00 ago
  Bin2                         300529           4816        0:00:00 ago
  Bin3                         3313             1352        0:00:00 ago
  Bin4                         109              100         0:00:29 ago
  Bin5                         7                7           0:08:53 ago
  Bin6                         1                1           0:40:19 ago
  Bin7                         0                0           never
  Bin8                         0                0           never
  Bin9                         0                0           never
  Bin10                        0                0           never
  Bin11                        0                0           never
  Bin12                        0                0           never
  Bin13                        0                0           never
  Bin14                        0                0           never
  Bin15                        0                0           never
  Bin16+                       0                0           never
This next example shows a link which is experiencing uncorrectable errors which were injected by a test device. This is reflected both in the BER/SER output and the histogram. Note that in the face of uncorrectable errors, the rates are both marked with ‘*’ and Bin16+ of the histogram is accumulating.
Arista#show int et9/1/1 phy detail
...
CMS50216 line
  FEC corrected symbol rate    7.63E-04*
  Pre-FEC bit error rate       7.68E-05*
Arista#show int et9/1/1 phy diag error-correction histogram
Ethernet9/1/1
Symbol Errors Per Codeword     Codewords      Changes     Last Change
--------------------------     ---------      -------     -----------
CMS50216 system
  Bin0                         2237247510     3           0:00:01 ago
...
CMS50216 line
  Bin0                         1503627389     3           0:00:01 ago
  Bin1                         595041361      3           0:00:01 ago
  Bin2                         92371309       3           0:00:01 ago
  Bin3                         21267493       3           0:00:01 ago
  Bin4                         8399044        3           0:00:01 ago
  Bin5                         4493216        3           0:00:01 ago
  Bin6                         74908          3           0:00:01 ago
  Bin7                         62796          3           0:00:01 ago
  Bin8                         21772          3           0:00:01 ago
  Bin9                         39833          3           0:00:01 ago
  Bin10                        10754          1           0:00:21 ago
  Bin11                        6335           1           0:00:21 ago
  Bin12                        56098          1           0:00:21 ago
  Bin13                        7402           1           0:00:21 ago
  Bin14                        44176          1           0:00:21 ago
  Bin15                        26242          1           0:00:21 ago
  Bin16+                       14994          1           0:00:21 ago
Forward error correction is an important component of high speed signaling; it allows error free operation of media that is not inherently error free. This allows for lower cost optics and in some cases, use of lower cost fiber. For copper cables, it extends the reach or allows the same reach with a smaller gauge cable. Such cables are less expensive and easier to manage. As these media are designed for use with FEC enabled, correctable errors are expected and are no reason for alarm.
Users should monitor links for FEC uncorrected codewords, as these indicate a problem if observed on links after initial link up. Links with an increasing pre-FEC bit error rate, or links whose FEC histograms begin to show counts in higher numbered bins than before, should be investigated. These trends may predict links which will experience uncorrected FEC codewords in the future.
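One way to act on this advice is to compare histogram snapshots taken at different times and report bins that have become populated since the earlier snapshot. The sketch below is only an illustration: the function names are hypothetical, the "previous" snapshot is invented for the example, and the "current" snapshot uses the non-zero line-side counts from the healthy 400G example above.

```python
from typing import Dict, List


def highest_populated_bin(bins: Dict[str, int]) -> int:
    """Highest bin number with a non-zero count (labels like "3" or "16+")."""
    populated = [int(label.rstrip("+")) for label, count in bins.items() if count > 0]
    return max(populated, default=0)


def newly_populated_bins(previous: Dict[str, int], current: Dict[str, int]) -> List[str]:
    """Bins that were empty (or absent) before and now hold codewords."""
    return [label for label, count in current.items()
            if count > 0 and previous.get(label, 0) == 0]


# Hypothetical earlier snapshot: nothing above Bin3 had been seen yet.
previous = {"0": 900000000000, "1": 140000000, "2": 290000, "3": 3000}

# Non-zero line-side counts from the healthy 400G example above.
current = {"0": 903668687198, "1": 142901852, "2": 300529, "3": 3313,
           "4": 109, "5": 7, "6": 1}

new_bins = newly_populated_bins(previous, current)
moved_up = highest_populated_bin(current) > highest_populated_bin(previous)
```

A link whose highest populated bin keeps climbing toward the correction limit (Bin7 for RS-528, Bin15 for RS-544) is a candidate for investigation before uncorrected codewords appear.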