Excavators: The Apex Predator of IT Infrastructure

The Excavator Doesn't Care About Your Diversity

We'd done everything right. Diverse and multiple fiber paths to our remote site. Different physical routes. Different carriers. Different entry points. Textbook diversity - real diversity, not just on paper.

Then an improbable confluence of circumstances hit. Multiple failures, unrelated events, the kind of sequence you'd dismiss as unrealistic if someone described it in a planning meeting. By the end of the day, every fiber path to the datacenter was cut.

The site was dark. Production traffic: gone. Management traffic: also gone, because it rode the same fiber.

Here's where the story splits.

Without OOB: You call the on-site technician. They're standing in a noisy datacenter, phone pressed to one ear, hand cupped over the other. You shout CLI commands. They type. They mistype. You spell out "R-O-U-T-E hyphen M-A-P" three times. They read back error messages character by character. An hour later, you've made two configuration changes and need to move on to the next device.

With OOB: You SSH into the Ghost Network. You access the device directly. You make the configuration change and move on to the next device. The on-site tech drinks coffee and waits for the fiber repair crew (Mean Time To Repair for fiber is typically 8 to 12 hours).

The excavator still won. The site was still offline. But we kept control. We reconfigured routing to prepare for restoration. We verified device state. We did it in minutes, not hours, without anyone shouting CLI syntax into a phone.

That's what Out-of-Band gives you.

In-Band vs. Out-of-Band: The Fundamental Split

In-Band management means your management traffic shares the same fate as production. Same switches. Same links. Same fiber. Same excavator.

Out-of-Band management means your management path survives when production dies. Completely independent infrastructure. Separate physical switches. Separate connectivity. Separate everything.

The distinction seems obvious. Yet I've audited many enterprise networks claiming to have OOB. Most had what I call "Cosmetic OOB": a dedicated VLAN, maybe a separate management switch that plugs into the same production network. The management traffic doesn't actually leave production. True OOB means inaccessible to production, independent of production, and alive when production is dead.

Building one requires accepting uncomfortable truths:

  • It will cost money you can't directly justify in ROI spreadsheets
  • It will require dedicated hardware that sits "idle" 99.999% of the time
  • It must work precisely when everything else doesn't

If your CFO asks why you need separate switches and a second internet link that "barely get used," the answer is simple: insurance doesn't pay out because you claim it daily.
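
One way to tell Cosmetic OOB from the real thing is to list every component each path depends on and look for overlap. The sketch below does only that; the device and circuit names are hypothetical, and it assumes you already have an accurate inventory of what each path traverses.

```python
def shared_fate(production_path, management_path):
    """Return the components both paths depend on, sorted for stable output."""
    return sorted(set(production_path) & set(management_path))

# Hypothetical inventory: every switch, router, and circuit each path traverses.
production = ["core-sw1", "wan-rtr1", "carrier-a-fiber"]
cosmetic_oob = ["mgmt-sw1", "core-sw1", "wan-rtr1", "carrier-a-fiber"]
true_oob = ["oob-sw1", "oob-fw1", "isp-b-internet"]

print(shared_fate(production, cosmetic_oob))  # ['carrier-a-fiber', 'core-sw1', 'wan-rtr1']
print(shared_fate(production, true_oob))      # []
```

An empty result is the goal. Anything else is a component whose failure takes down production and management together.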

The Hybrid Reality: Paving the Desire Path

Running everything through a pure, isolated OOB network is possible, but it creates immense friction. When you force daily SSH sessions and automation scripts through dedicated IPsec tunnels and slow jump hosts, you are asking engineers to walk the long way around.

This creates digital "Desire Paths." If the official road is too painful, engineers will find shortcuts. They will open temporary ports or create management VLANs that bleed management traffic into your in-band network.

The pragmatic solution is to pave the desire path while keeping the bunker:

  • Pave the grass: During normal operations, route management traffic through the production infrastructure. It is faster and more convenient.
  • Use the highway: Let your automation platform push changes over the corporate network.
  • Keep the bunker: Design everything assuming the paved path will eventually collapse.

The in-band injection is a convenience, not a lifeline. Your OOB must function completely independently when the bridge burns. Never confuse the comfortable path with the survivable path.

Architecture: The Physical Foundation

Dedicated Switches

Your OOB switches must be physically separate from production, and they must be simple and reliable (don't run BGP on them). Use basic Layer 2/3 switching and don't replicate your production complexity - the OOB exists to be simpler, not to be another failure domain.

IP Addressing Strategy

  • Small networks (<100 devices): Static IP addressing. Document everything. When DHCP fails, static still works.
  • Large networks (100+ devices): DHCP is acceptable, but the DHCP server must live within the OOB infrastructure. Maintain a static IP fallback for critical devices (routers, core switches, firewalls). Document the static ranges.
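
Documenting the static ranges is easier to keep honest if the plan can be checked mechanically. The following is a minimal sketch using Python's standard ipaddress module; the subnet and both ranges are made-up values, not a recommendation.

```python
import ipaddress

# Hypothetical OOB subnet and ranges; substitute your own documented plan.
oob_subnet = ipaddress.ip_network("10.255.10.0/24")
static_first = ipaddress.ip_address("10.255.10.1")
static_last = ipaddress.ip_address("10.255.10.63")
dhcp_first = ipaddress.ip_address("10.255.10.64")
dhcp_last = ipaddress.ip_address("10.255.10.254")

def ranges_overlap(a_first, a_last, b_first, b_last):
    """True when two inclusive address ranges share at least one address."""
    return a_first <= b_last and b_first <= a_last

def plan_is_valid():
    """All boundaries sit inside the OOB subnet and the two ranges do not collide."""
    bounds = (static_first, static_last, dhcp_first, dhcp_last)
    inside = all(ip in oob_subnet for ip in bounds)
    return inside and not ranges_overlap(static_first, static_last, dhcp_first, dhcp_last)

print(plan_is_valid())  # True
```

The point of the check is that a DHCP pool can never hand out an address reserved for a router, core switch, or firewall.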

Connectivity: The IPsec Backbone

Here's where many OOB designs fail: they use the same WAN links as production.

The solution: regular internet + IPsec tunnels.

Yes, the internet has no SLA. It's best-effort by design. Your enterprise WAN contract promises four nines; your internet link promises nothing. But nothing beats the internet's inherent capillarity and resiliency. The internet was built to route around failures. It has thousands of paths, thousands of providers, automatic reconvergence at every layer.

Your OOB internet link needs strong upstream diversity. Choose providers with robust backbones and multiple transit relationships. Ask the uncomfortable question: who owns the fiber in the last mile? You want infrastructure ownership, not just a different contract.

The Cellular Option: The Ultimate Air-Gap

For ultra-critical sites or locations where physical path diversity is genuinely impossible, LTE/5G cellular gateways offer a completely separate path. Cellular connections often sit behind carrier NAT with no fixed public address, so in the hub-and-spoke design described next, the cellular gateway acts as a spoke that always initiates the tunnel outbound to your hub's static IP.

The IPsec Hub-and-Spoke Design

Instead of a complex mesh, build your OOB tunnels as a resilient Hub-and-Spoke architecture. Your Central Datacenters act as "Hubs" - locations with high availability, redundant power, and staff on hand. Your Small Datacenters and Network PoPs act as "Spokes."

Why Hub-and-Spoke?

  • Scalability: Adding a new PoP only requires pointing the new spoke at the Hub. The edge remains dumb and simple; the intelligence lives at the Hub.
  • Hardware Efficiency: Edge sites can run lighter, less expensive hardware - they only maintain one or two tunnels back to the Hubs.

Deploy small, dedicated firewalls at each spoke site. They don't need to be powerful - they need to be reliable and independent.
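
The scalability argument is mostly arithmetic. As a rough sketch, assuming every spoke builds one tunnel to each hub and ignoring any hub-to-hub link, the tunnel counts compare like this (the site numbers are illustrative only):

```python
def full_mesh_tunnels(sites):
    """Every site pairs with every other site: n * (n - 1) / 2 tunnels."""
    return sites * (sites - 1) // 2

def hub_and_spoke_tunnels(spokes, hubs):
    """Each spoke keeps one tunnel per hub; hub-to-hub links are not counted."""
    return spokes * hubs

# Illustrative numbers: 40 sites in total, 2 of them acting as hubs.
print(full_mesh_tunnels(40))         # 780
print(hub_and_spoke_tunnels(38, 2))  # 76
```

Adding a 41st site costs 40 new tunnels in a full mesh and 2 in this hub-and-spoke layout.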

Static IP Requirement

The Hub-and-Spoke design absolutely requires static IPs at the Hubs, and preferably at the Spokes as well. For OOB, you want determinism. Static IPs on business-grade internet are cheap compared to the alternative: spelling out "configure terminal" over a phone line to someone standing in an 85-decibel datacenter.

Authentication: The Keys to the Ghost Network

Your OOB is a parallel path into every device in your infrastructure - that access must be fortified.

One rule is non-negotiable: MFA at the VPN gateway. Every engineer connecting to the OOB must authenticate with multi-factor before the tunnel establishes. No exceptions.

For device-level authentication, configure fallback logic: try your centralized TACACS+/RADIUS first, fall back to local accounts only when the authentication server is unreachable. Those local accounts still need care - rotate credentials regularly, store them in a secure vault, and test the fallback periodically. Your production AAA infrastructure may die with production. Design for that moment.
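
The fallback rule has one subtlety worth spelling out: local accounts apply only when the central server cannot be reached, not when it answers with a rejection. The sketch below models that decision in plain Python. It is not how any particular vendor implements AAA, the function and exception names are invented for illustration, and the plaintext dictionary merely stands in for a device's local user database.

```python
class AAAUnreachable(Exception):
    """Raised when the central AAA server cannot be contacted at all."""

def authenticate(user, password, central_check, local_accounts):
    """Try central AAA first; use local accounts only if it is unreachable."""
    try:
        # A definitive answer from central AAA is final, even if it is a rejection.
        return central_check(user, password), "central"
    except AAAUnreachable:
        return local_accounts.get(user) == password, "local"

# Hypothetical stand-ins for a healthy and a dead TACACS+/RADIUS server.
def central_up(user, password):
    return (user, password) == ("alice", "s3cret")

def central_down(user, password):
    raise AAAUnreachable("no response from AAA server")

local = {"breakglass": "vault-rotated-pw"}

print(authenticate("alice", "wrong", central_up, local))                    # (False, 'central')
print(authenticate("breakglass", "vault-rotated-pw", central_up, local))    # (False, 'central')
print(authenticate("breakglass", "vault-rotated-pw", central_down, local))  # (True, 'local')
```

The second call is the important one: while the central server is alive, the break-glass account does not work, which keeps local credentials from becoming a routine side door.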

Console Access: The Last Resort

IP management can fail for a hundred reasons. Console access survives almost all of that.

Deploy console servers (terminal servers) at every site. Connect every critical device's serial console port to them. The console server connects to your OOB network. When IP management fails, you SSH to the console server and access the device's serial console.

This is your true last resort. I've recovered from situations where console access was the only path to a device - no IP stack, no control plane, just raw serial.
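
"Every critical device" is easy to say and easy to drift away from. A small audit over the inventory can flag the gaps. This is a sketch against a made-up inventory format; the field names and device names are assumptions.

```python
# Hypothetical inventory: "console" holds (console_server, port) or None if unwired.
devices = [
    {"name": "core-rtr1", "critical": True, "console": ("cs1", 1)},
    {"name": "core-sw1", "critical": True, "console": None},
    {"name": "access-sw7", "critical": False, "console": None},
]

def missing_console(inventory):
    """Return the names of critical devices with no console port recorded."""
    return [d["name"] for d in inventory if d["critical"] and d["console"] is None]

print(missing_console(devices))  # ['core-sw1']
```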

The Cost Question

Console servers can get expensive, especially when deploying at every site. Don't let perfect be the enemy of good. A simple, reliable console server at every site beats an expensive solution at half your sites.

IP management needs a dozen things to go right. Console needs one: power. When a routing meltdown pegs the CPU at 100% and your management interface stops responding, the console still works.

Test It Regularly

"Everyone has an OOB network until they need to use it."

An untested OOB is a theoretical OOB. Test quarterly. Simulate complete production failure. Can you:

  • Reach every site through the IPsec tunnels alone, without the in-band injection?
  • Authenticate without production TACACS+/RADIUS?
  • Access consoles when management IPs are unreachable?
  • Perform a configuration push across sites?

If any answer is "no," you have work to do.
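
Recording the drill per site makes the "no" answers hard to overlook. As a minimal sketch, with hypothetical check names and site names, and treating an unrecorded check as a failure:

```python
CHECKS = ("reach_via_ipsec", "auth_without_prod_aaa", "console_access", "config_push")

def failed_checks(results):
    """Map each site to the checks it failed; sites that passed everything are omitted."""
    failures = {}
    for site, outcome in results.items():
        # A check missing from the record counts as failed: untested is unproven.
        failed = [check for check in CHECKS if not outcome.get(check, False)]
        if failed:
            failures[site] = failed
    return failures

# Hypothetical results from one quarterly drill.
drill = {
    "dc-east": {c: True for c in CHECKS},
    "pop-north": {"reach_via_ipsec": True, "auth_without_prod_aaa": False,
                  "console_access": True},
}

print(failed_checks(drill))
# {'pop-north': ['auth_without_prod_aaa', 'config_push']}
```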

Use It as Your Pre-Flight Check

Before any risky change - BGP policy modifications, IGP migrations, spanning-tree adjustments, control plane upgrades - add this to your change checklist: Verify OOB connectivity to all affected devices.

This takes five minutes. It could save you hours.

The logic is simple:

  • OOB verified + change succeeds: Normal day.
  • OOB verified + change fails: You recover through the Ghost Network.
  • OOB broken + change fails: You're driving to the datacenter at 3 AM.

This discipline also forces regular OOB use, which surfaces problems before they matter - dead IPsec tunnels, expired credentials, console servers in a weird state. You'll find these during a calm pre-change check, not during a 3 AM emergency when you need them.
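
The pre-flight check itself can be as small as a TCP connection attempt to each device's OOB address. The sketch below uses only Python's standard socket module. It assumes SSH on port 22 is the management service and that the script runs from a host already connected to the OOB network; the device names and addresses are placeholders from the documentation range 192.0.2.0/24.

```python
import socket

def oob_reachable(host, port=22, timeout=3.0):
    """Return True if a TCP connection to the device's OOB address succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def preflight(devices, probe=oob_reachable):
    """Return the names of affected devices that cannot be reached over OOB."""
    return [name for name, addr in devices.items() if not probe(addr)]

# Hypothetical list of devices touched by the change.
affected = {"core-rtr1": "192.0.2.11", "core-sw1": "192.0.2.12"}

# A stub probe keeps this example deterministic; omit it to test real sockets.
unreachable = preflight(affected, probe=lambda addr: addr != "192.0.2.12")
print(unreachable)  # ['core-sw1']
```

A successful TCP handshake only proves the path and the listener exist. It does not prove you can log in, so an occasional full login through the OOB path is still worth doing.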

The Ghost Network

Your OOB network won't stop excavators. It won't prevent fiber cuts. It won't keep your production traffic flowing when the physical layer fails.

What it gives you is control. Control to assess the damage without driving to site. Control to reconfigure routing while waiting for repairs. Control to prepare for restoration before the fiber crew finishes splicing. Control to work in silence instead of shouting CLI commands over a datacenter's roar.

Yes, you can use the comfortable path daily. But never forget: that path is borrowed time. Design for the moment it disappears.

Your Ghost Network won't show up in uptime statistics. It will sit quietly, waiting. And when the excavator wins, you'll still be in command.

That's the difference between having an OOB network and having Cosmetic OOB.
