H3C's First General-Purpose Mid-to-High-End Layer 3 Switch: A Breakthrough Story – Fourth Tribulation: Routing Abnormality


As more and more internet cafe and enterprise users were cut over and activated on the S6500, the fourth tribulation emerged! From time to time, nodes reported that internet cafe users connected to the S6500 were complaining of lag in online games during peak evening hours.

The support team promptly dispatched engineers to the faulty nodes. On arrival, the engineers checked the S6500 GE ports serving the affected internet cafes, including port operation mode, port status, port error rate, and error packet count, but found no abnormalities.

Engineers then visited the affected internet cafes during peak hours and inspected the configuration and port operation mode of the customer-side access devices, again finding no abnormalities, even though users were clearly experiencing in-game lag. They connected a computer to the customer access device and ran intra-MAN and inter-MAN speed tests, segmented packet loss tests, and segmented traceroute tests. The speed tests were basically up to standard. The segmented packet loss tests showed occasional, irregular loss: sometimes one packet, sometimes two or three, sometimes four or five. The traceroute results showed that the path from the internet cafe to the MAN egress was identical every time. Because the S6500 was configured with ECMP dual uplinks, however, a certain percentage of traceroute paths should have differed.

Based on these results, engineers suspected that ECMP load balancing was not working on the S6500, so all traffic was being forwarded through a single uplink. That uplink became congested during peak traffic hours, which caused the occasional packet loss and in-game lag.
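The reasoning behind the traceroute observation can be illustrated with a minimal sketch of per-flow ECMP selection. The article does not describe the S6500's actual hashing algorithm, so the hash function, the flow tuples, and the names `uplink-1` and `uplink-2` below are all assumptions chosen for illustration only.

```python
import hashlib
from collections import Counter

def pick_next_hop(flow, next_hops):
    """Pick a next hop by hashing the flow's 5-tuple (generic per-flow ECMP).

    This is an illustrative hash, not the S6500's real algorithm.
    """
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]

def distribution(flows, next_hops):
    """Count how many flows land on each next hop."""
    return Counter(pick_next_hop(flow, next_hops) for flow in flows)

# Hypothetical flows: (src_ip, dst_ip, protocol, src_port, dst_port)
flows = [
    ("10.0.0.%d" % (i % 50 + 1), "203.0.113.10", "udp", 33434 + i, 53)
    for i in range(1000)
]

# Both ECMP routes installed: flows spread across the two uplinks.
healthy = distribution(flows, ["uplink-1", "uplink-2"])

# Only one route installed: every flow takes the same uplink,
# which matches the "identical path every time" symptom.
faulty = distribution(flows, ["uplink-1"])

print("healthy:", dict(healthy))
print("faulty: ", dict(faulty))
```

With two next hops installed, a meaningful share of flows lands on each uplink. With only one, every flow follows the same path, which is what the repeated traceroutes showed.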


The engineers returned to the S6500 node and checked the operation mode, status, and traffic statistics of the two uplink ports. They found that traffic statistics on one uplink port were extremely low, confirming that this uplink was not participating in ECMP load balancing.
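The counter comparison described above can be expressed as a small check. This is only a sketch: the port names, the byte counts, and the 10% threshold are hypothetical values, not figures from the incident.

```python
def idle_uplinks(byte_counts, min_share=0.10):
    """Return the uplinks whose share of total traffic is below min_share.

    byte_counts maps an uplink name to its forwarded byte count over the
    same observation window. The 10% default is an arbitrary assumption.
    """
    total = sum(byte_counts.values())
    if total == 0:
        # No traffic at all: nothing can be said to be carrying load.
        return sorted(byte_counts)
    return sorted(
        port for port, count in byte_counts.items()
        if count / total < min_share
    )

# Hypothetical counters resembling the fault: one uplink is almost idle.
suspect = idle_uplinks({"uplink-1": 9_800_000_000, "uplink-2": 12_000})

# Hypothetical counters for a healthy ECMP pair.
balanced = idle_uplinks({"uplink-1": 5_100_000_000, "uplink-2": 4_900_000_000})

print("suspect: ", suspect)
print("balanced:", balanced)
```

A near-idle member of an ECMP pair is a hint rather than proof, since per-flow hashing can be uneven when a few large flows dominate. In this case the imbalance was extreme enough to confirm the suspicion.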


Further checks of the VLAN configuration, VLAN interface configuration, and routing protocol configuration for this port revealed no errors. The port's ARP information was normal, and the peer port was reachable via ping. However, the routing table contained no uplink route corresponding to this port! It was strange that ARP was normal and the peer was pingable while no matching route existed in the routing table. Engineers suspected that the ping packets were actually traveling through the other uplink port, and a traceroute to the peer IP address later confirmed this.
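Why the peer could still answer pings with its route missing can be sketched with a longest-prefix-match lookup. The sketch assumes that the missing entry was the route covering the peer's address and that a default route via the other uplink remained. The article does not give the actual prefixes, so every address and interface name here is hypothetical.

```python
import ipaddress

def lookup(routes, destination):
    """Longest-prefix match: return the outgoing interface, or None."""
    addr = ipaddress.ip_address(destination)
    best = None
    for prefix, interface in routes:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, interface)
    return best[1] if best else None

# Hypothetical routing table with both uplink routes installed.
healthy_routes = [
    ("0.0.0.0/0", "uplink-1"),
    ("192.0.2.0/30", "uplink-1"),
    ("192.0.2.4/30", "uplink-2"),
]

# Same table with the uplink-2 entry missing, as in the fault.
faulty_routes = [route for route in healthy_routes if route[1] != "uplink-2"]

peer_on_uplink_2 = "192.0.2.5"  # hypothetical peer address

via_healthy = lookup(healthy_routes, peer_on_uplink_2)
via_faulty = lookup(faulty_routes, peer_on_uplink_2)

print("healthy table sends the ping via", via_healthy)
print("faulty table sends the ping via ", via_faulty)
```

In the faulty table the lookup falls back to the less specific route, so the ping still succeeds but leaves through the other uplink. This is consistent with what the traceroute to the peer address revealed.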

With normal configuration and normal ARP information, why was the corresponding uplink route missing from the routing table? The engineers could not identify the cause, so they collected fault information and debug logs from the S6500. After completing data collection, the engineers shut down and re-enabled the port. Post-operation, ARP information remained normal, and the corresponding uplink route reappeared in the routing table. Verification confirmed that ECMP dual-uplink load balancing was restored.

After analyzing fault information and debug data from multiple nodes, the S6500 R&D team finally concluded that the failure was caused by abnormal installation of routing table entries. The likely root cause of the installation abnormality was high CPU utilization during bulk route insertion. The causes of the high CPU utilization were varied and required further in-depth analysis.

By Lanbras

Lanbras specializes in translating cutting-edge optical and Ethernet transmission technologies into clear, valuable insights that help our customers stay ahead in a fast-evolving digital world.

By turning complex technical concepts into practical, business-driven content, we aim to empower decision-makers with the knowledge they need to make confident, future-ready choices.

