The Overclock Illusion: Why Aggressive IGP Tuning is Killing Your Network
Every network engineer wants sub-second convergence.
But it's a common trap to assume BGP handles this automatically. Because BGP manages the vast majority of overlay routing, many assume the underlying IGP speed is of secondary importance.
This is a massive vulnerability.
BGP is not an independent entity. Its next hops resolve through the IGP, so BGP can only converge as fast as the IGP underneath it. When the topology changes, BGP Next-Hop Tracking has to wait for the IGP to recalculate the shortest path before BGP can update its own routes.
If your OSPF takes 40 seconds to detect a dead neighbor (the default dead interval on broadcast and point-to-point links), your BGP traffic drops into a black hole for 40 seconds.
The Overclock Illusion
To fix this, engineers try to "overclock" the routing protocol. They drastically reduce hello and dead intervals. They drop the SPF initial wait timer to absurdly low values.
But they forget a fundamental rule of routing: protocols react almost instantly to direct interface failures. If a physical cable is cut, the interface goes down, generating a hardware interrupt that tears down the adjacency immediately, completely bypassing the dead timers.
This means aggressive timer tuning is only trying to solve indirect or "silent" failures (like a transparent transport switch dying in the middle of a path).
Attempting to catch silent failures by forcing your CPU to process hyper-aggressive software timers creates the Overclock Illusion. The network looks incredibly fast in a vacuum. In reality, you just built a ticking time bomb.
Aggressive tuning introduces massive, unnecessary operational risk:
1. Control Plane Starvation
Link-state protocols run computationally expensive algorithms. Overclocking forces the router to process a relentless stream of hellos, link-state updates, and SPF triggers. If a line card fails and drops dozens of interfaces, a router with near-zero SPF timers tries to run the SPF algorithm for almost every state change individually. This can peg the CPU at 100% and starve the other control-plane processes, including the Inter-Process Communication they depend on.
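To put a rough number on that, here is a minimal Python sketch of a line-card failure hitting an SPF scheduler. It is a simplified model, not any vendor's implementation: the function `spf_run_times`, the 48-interface event stream, and every timer value are assumptions chosen for illustration, and the quiet-period reset that real schedulers perform is left out.

```python
def spf_run_times(event_times_ms, initial_ms, hold_ms, max_ms):
    """Return the times (ms) at which a simplified SPF scheduler would run.

    Hypothetical model: an SPF run is scheduled `initial_ms` after the
    oldest pending event, but never sooner than the current hold time
    after the previous run. The hold time doubles after every run, up
    to `max_ms`.
    """
    pending = sorted(event_times_ms)
    runs = []
    earliest_next = 0   # earliest time the next SPF is allowed to start
    hold = hold_ms
    while pending:
        run_at = max(pending[0] + initial_ms, earliest_next)
        # One SPF run absorbs every event that has arrived by then.
        pending = [t for t in pending if t > run_at]
        runs.append(run_at)
        earliest_next = run_at + hold
        hold = min(hold * 2, max_ms)   # exponential back-off, capped
    return runs


# Assumed failure pattern: 48 interfaces go down, 5 ms apart.
events = [i * 5 for i in range(48)]

overclocked = spf_run_times(events, initial_ms=1, hold_ms=1, max_ms=1)
throttled = spf_run_times(events, initial_ms=50, hold_ms=200, max_ms=5000)

print(len(overclocked), "SPF runs with 1 ms timers")
print(len(throttled), "SPF runs with 50/200/5000 ms timers")
```

Under these assumptions the overclocked router runs SPF 48 times, once per interface event, while the backed-off router runs it twice and still reacts to the first event within 50ms.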
2. Transient Micro-Loops
Each router updates its hardware Forwarding Information Base (FIB) independently, on its own schedule. Router A might update in 20ms. Router B might take 60ms. During that 40ms gap, the two can route traffic back and forth at each other. Packets bounce between them until their Time-to-Live expires.
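The arithmetic behind that loop is simple enough to sketch. The Python below takes the two FIB update times from the example above and assumes an initial TTL of 64; the function names are illustrative only.

```python
def loop_window_ms(fib_update_ms):
    """Gap between the first and the last FIB update, i.e. how long routers disagree."""
    return max(fib_update_ms.values()) - min(fib_update_ms.values())


def forwards_before_drop(ttl):
    """Count how often a looping packet is forwarded before its TTL expires.

    Each router decrements the TTL and discards the packet once it reaches 0.
    """
    forwards = 0
    while True:
        ttl -= 1
        if ttl <= 0:
            return forwards
        forwards += 1


fib_update_ms = {"A": 20, "B": 60}       # update times from the example in the text
print(loop_window_ms(fib_update_ms))      # 40
print(forwards_before_drop(64))           # 63
```

A packet that enters the loop with a TTL of 64 crosses the A-B link 63 times before it is dropped, which is why even a short micro-loop can multiply the load on that link.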
3. Breaking High Availability
If a primary supervisor fails, Stateful Switchover (SSO) can take up to 10 seconds to transition, depending on the platform. If your dead timer is set to 3 seconds, remote neighbors tear down the adjacency before the switchover finishes. You just turned a seamless failover into a massive outage.
The Decoupled Speed Framework
Stop manipulating raw timers. Modern architecture shifts failure detection away from the central software process:
Use BFD for Detection: On platforms that support hardware offload, Bidirectional Forwarding Detection runs in the data plane. With a 50ms interval and a multiplier of 3, it detects failures in 150ms (3 x 50ms) without loading the main CPU.
Use Adaptive Throttling: Let the protocol react fast to a single event, but back off exponentially when events arrive in a storm, such as a flapping physical link.
Deploy TI-LFA: Topology-Independent Loop-Free Alternate pre-computes a backup path and installs it ahead of time. When a link fails, the hardware redirects traffic in under 50ms of detecting the failure, without waiting for SPF.
Enable BGP PIC (Core & Edge): Prefix-Independent Convergence is the ultimate overlay cheat code. Instead of forcing the CPU to recalculate millions of BGP routes during an outage, PIC pre-installs backup next-hops directly into the hardware FIB. Whether an internal transit link fails (PIC Core) or an external peer drops (PIC Edge), the hardware just flips a single shared pointer, so traffic is restored in roughly the same time regardless of routing table size.
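The "single pointer" idea is easier to see as a data structure. The following Python sketch models a hierarchical FIB in which many prefixes share one path-list object. `PathList`, `Fib`, the addresses, and the exact-match lookup are all simplifications assumed for illustration, not a description of how any particular platform implements PIC.

```python
from dataclasses import dataclass


@dataclass
class PathList:
    """Shared forwarding object holding a primary and a pre-installed backup."""
    primary: str
    backup: str
    primary_up: bool = True

    def active(self) -> str:
        return self.primary if self.primary_up else self.backup


class Fib:
    """Toy FIB: each prefix points at a shared PathList (exact match only)."""

    def __init__(self):
        self.entries = {}

    def install(self, prefix, path_list):
        self.entries[prefix] = path_list

    def lookup(self, prefix):
        return self.entries[prefix].active()


# Hypothetical next hops; 1000 prefixes all share one path list.
shared = PathList(primary="192.0.2.1", backup="192.0.2.2")
fib = Fib()
prefixes = [f"10.{i // 256}.{i % 256}.0/24" for i in range(1000)]
for p in prefixes:
    fib.install(p, shared)

# Primary fails: one write to the shared object, no per-prefix work.
shared.primary_up = False

print(all(fib.lookup(p) == "192.0.2.2" for p in prefixes))  # True
```

The failover cost here is one field update whether the table holds a thousand prefixes or a million, which is the property the "prefix-independent" name refers to.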
Let the hardware handle the speed.
Let the IGP handle the math.