June 8, 2021
As more networks move to mobile edge computing (MEC), added pressure is placed on data center interconnect (DCI) links. Proper installation and maintenance (I&M) of the link infrastructure is therefore more crucial than ever. I&M is also made more challenging by diversifying communications standards, faster network speeds, and increased security administration.
The move to the edge is driven by the continued growth of latency-sensitive cloud services, social media, gaming, and video streaming. The same market conditions are pushing data center transmissions toward 400 Gigabit Ethernet (400GE), up from 100 Gigabit Ethernet (100GE) just a few years ago.
The biggest difference between legacy 100GE and emerging 400GE is the line coding: 100GE uses NRZ, while PAM4 drives 400GE. Because PAM4 encodes two bits per symbol instead of one, it doubles the data rate of NRZ while running at the same symbol rate, which makes for a smoother technology transition.
PAM4 does not come without its challenges, though. Packing four amplitude levels into the same signal swing leaves each PAM4 eye roughly one third the height of an NRZ eye, which degrades the signal-to-noise ratio (SNR). Because of this, 400GE transmissions are much more susceptible to impairments.
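To make that trade-off concrete, here is a toy Python sketch (not tied to any particular instrument or standard text) comparing NRZ and PAM4 level spacing for the same peak-to-peak swing; the one-third eye height and the roughly 9.5 dB penalty follow directly from the four-level mapping.

```python
# Toy illustration: compare NRZ and PAM4 level spacing for the same
# peak-to-peak swing, and the resulting SNR penalty per symbol.
import math

nrz_levels  = [-1.0, +1.0]                       # 1 bit per symbol
pam4_levels = [-1.0, -1.0/3.0, +1.0/3.0, +1.0]   # 2 bits per symbol

def min_eye(levels):
    """Smallest spacing between adjacent amplitude levels (the eye height)."""
    return min(b - a for a, b in zip(levels, levels[1:]))

nrz_eye, pam4_eye = min_eye(nrz_levels), min_eye(pam4_levels)
penalty_db = 20 * math.log10(nrz_eye / pam4_eye)

print(f"NRZ eye:  {nrz_eye:.2f}")    # 2.00
print(f"PAM4 eye: {pam4_eye:.2f}")   # 0.67 -- one third of the NRZ eye
print(f"SNR penalty: ~{penalty_db:.1f} dB")  # ~9.5 dB
print("Same symbol rate, but 2 bits/symbol -> double the data rate")
```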
Need for FEC in 400GE Transmissions
To mitigate the effect of waveform distortion, forward error correction (FEC) was introduced to support PAM4. The IEEE 802.3bs standard specifies Reed-Solomon RS(544,514) FEC for 400GE transmission. FEC is required because 400GE elements, such as FR4 and DR4 optics, inherently produce errors in the transmission; FEC corrects those errors. This is in contrast to 100GE NRZ transmission, which supports essentially error-free transmission without FEC.
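For reference, the decision rule at the heart of this FEC is simple: an RS(544,514) codeword with up to 15 errored ten-bit symbols is corrected, while anything beyond that limit is passed on uncorrected. A minimal Python sketch of that rule:

```python
# RS(544,514) "KP4" FEC can correct up to t = (544 - 514) / 2 = 15
# ten-bit symbol errors per codeword; more than that means data loss.
T_CORRECTABLE = 15

def classify_codeword(symbol_errors: int) -> str:
    if symbol_errors == 0:
        return "clean"
    if symbol_errors <= T_CORRECTABLE:
        return "corrected"       # errors present, but FEC repairs them
    return "uncorrectable"       # exceeds the FEC limit -> frame loss

# Example: symbol error counts observed in a handful of codewords
for n in (0, 3, 12, 16):
    print(n, "->", classify_codeword(n))
```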
Figure 1 is a simple visual that explains the differences between the two technologies and the impact of FEC. In the figure, the 100GE transmission without FEC is the purple line. Errors increase due to factors such as aging equipment, fiber bending, or longer distances, and throughput drops off gradually once the FEC threshold is passed.
Compare that to the 400GE transmission throughput (green line): there is a significant drop once the FEC threshold is crossed, because the accumulated bit errors exceed the FEC limit. The challenge, then, is determining how many errors FEC is correcting and whether they are within the FEC budget.
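That cliff can be illustrated with a rough probability model. Assuming independent, random symbol errors (a simplification; real links also see bursts, which make things worse), the fraction of codewords that exceed the 15-symbol correction limit rises extremely sharply once the raw symbol error ratio approaches the FEC budget:

```python
# Rough model of the 400GE throughput cliff: probability that an RS(544,514)
# codeword contains more than 15 errored symbols, given a raw symbol error
# probability p_sym and independent errors.
from math import comb

N, T = 544, 15  # symbols per codeword, correctable symbol errors

def p_uncorrectable(p_sym: float) -> float:
    """P(more than T of the N symbols are in error)."""
    p_ok = sum(comb(N, k) * p_sym**k * (1 - p_sym)**(N - k) for k in range(T + 1))
    return 1 - p_ok

for p_sym in (1e-3, 5e-3, 1e-2, 2e-2, 3e-2, 5e-2):
    print(f"symbol error ratio {p_sym:.0e} -> "
          f"uncorrectable codewords {p_uncorrectable(p_sym):.2e}")
```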
Not All 400G is Equal
The goal of the measurement is to verify that the 400GE hardware, such as optics, cables, and equipment, is within the FEC threshold limit. Results can be deceiving, depending on how the data is presented. For example, figure 2 displays measurement results from two 400GE devices. Initial 400GE FEC results are shown on the top row. As indicated, the device on the top left has a better margin of corrected errors than the one on the top right.
FEC margin cannot be determined with a regular bit error rate (BER) test, since both devices will run error free in a bit error rate tester (BERT) with FEC. When the device with less FEC margin meets the real world and faces a harsh transmission path (burst errors), however, it will drop data.
In the example shown, the device on the top right is correcting 11 to 12 symbol errors for some codewords, highlighted in yellow. What happens when both devices go through the same optical attenuation? The one on the left stays within the FEC tolerance limit, as seen in the bottom left image. The device on the right, though, exceeds its FEC tolerance limit and begins receiving uncorrected errors, as seen in red on the table. This example shows that although both 400G devices initially passed an error-free BER test, they react differently when attenuated. This is a common measurement taken by vendors to verify device FEC performance, and one that can be made using the Anritsu Network Master™ Pro MT1040A.
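A simplified sketch of this kind of margin check is shown below. The per-codeword error counts are invented for illustration and do not come from figure 2, but the idea is the same: tally the corrected symbol errors per codeword and see how much headroom remains below the 15-symbol limit.

```python
# Hypothetical margin check: bin the corrected symbol errors per codeword
# (as reported by a FEC-aware tester) and compare the worst case against
# the RS(544,514) limit. The sample counts below are illustration data only.
from collections import Counter

T_CORRECTABLE = 15

def fec_margin_report(errors_per_codeword):
    dist = Counter(errors_per_codeword)
    worst = max(dist)
    return dist, T_CORRECTABLE - worst   # remaining headroom in symbols

device_a = [0, 0, 1, 2, 0, 1, 3, 0, 2, 1]        # mostly 0-3 errors: wide margin
device_b = [0, 2, 7, 11, 0, 12, 9, 11, 0, 12]    # some 11-12 errors: little margin left

for name, samples in (("A", device_a), ("B", device_b)):
    dist, headroom = fec_margin_report(samples)
    print(f"Device {name}: distribution={dict(sorted(dist.items()))}, "
          f"headroom={headroom} symbols")
```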
Is My 400GE Link Good or Bad?
What this example reveals is that symbol error rate (SER) measurements alone are not enough to verify 400GE hardware. The distribution margin is critical, but the results can be deceiving, as shown in figure 2. What does this mean? Real-time or long-term stability tests of FEC margin are recommended, because conditions change over time.
To highlight that last point in a little more detail, the 400GE FEC distribution shown in figure 2 has been labeled to give a better view of the test results from a 400GE DCI installation at an edge network hub. The point is that the symbol error count in the graph changes over time based on multiple factors within an edge deployment, such as temperature and usage. The higher the SER per codeword, the greater the chance of a network failure.
Results are color coded for easier viewing; a simple monitoring sketch based on this coding follows the list:
- Green – A low symbol error count of only 1 to 3 errors per codeword is being measured.
- Yellow – A warning that a higher number of symbol errors (10 to 12) per codeword is being approached.
- Red – More errors are being received than the FEC can correct. Ultimately, network service performance and throughput will be detrimentally impacted.
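Below is a hypothetical sketch of what a long-term stability check built on this color coding could look like. The reader function, the sample values, and the exact warning boundary are placeholders for illustration, not an Anritsu interface or specification.

```python
# Hypothetical long-term FEC-margin stability test using the color coding above.
# read_worst_codeword_errors() stands in for however a FEC-aware tester or
# switch exposes the per-interval maximum of corrected symbol errors per
# codeword; here it simply returns simulated values.
import random
import time

T_CORRECTABLE = 15   # RS(544,514) correction limit per codeword
WARN_AT = 10         # start of the "yellow" band described above

def read_worst_codeword_errors() -> int:
    # Placeholder: replace with a query to your tester or switch.
    return random.choice([1, 2, 3, 11, 12, 16])

def classify(worst: int) -> str:
    if worst > T_CORRECTABLE:
        return "red"     # uncorrectable codewords: service impact
    if worst >= WARN_AT:
        return "yellow"  # margin shrinking toward the FEC limit
    return "green"       # comfortable margin (e.g. 1-3 errors per codeword)

def monitor(samples: int = 10, interval_s: float = 1.0) -> None:
    for _ in range(samples):
        worst = read_worst_codeword_errors()
        print(f"worst corrected symbols this interval: {worst:2d} -> {classify(worst)}")
        time.sleep(interval_s)

if __name__ == "__main__":
    monitor()
```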
Want to learn more about the importance of 400G Ethernet FEC? Watch this DCI test challenges webinar sponsored by Lightwave and Broadband Technology Report (BTR) featuring Anritsu Business Development Manager Daniel Gonzalez.