Segment Routing isn't just 'the next fancy protocol'

When I think about our move from MPLS/LDP to MPLS/SR, I often compare it to switching from corned beef to wagyu - technically it's the same protein, but the experience is completely different.

The Pain That Drove Us to Change

I've spent years managing networks where the MPLS data plane would black-hole traffic because LDP and the IGP weren't properly synchronized. I've also operated a network where RSVP-TE basically became a full-time job - constantly babysitting thousands of soft-state tunnels, refreshing PATH/RESV messages, debugging bandwidth reservations, trying to understand Autoroute Announce logic, and hoping the control plane wouldn't collapse under LSP churn on those old Cisco 7600s we all remember.

After living through all that, the idea of SR-MPLS felt like someone might finally turn the lights on.

Our Challenge

We run a classical backbone across Europe with more than 50 circuits, built entirely on Juniper equipment. Our goal was to eliminate LDP completely, prepare for eventual SR-TE deployment, achieve faster convergence, and create a cleaner model for automation.

The physical topology would stay exactly the same: same routers, same locations, same links. This was purely a control-plane evolution.

The Hidden Truth About Why SR Migrations Succeed or Fail

Before I tell you how smoothly our migration went, I need to share something that nobody likes to talk about publicly.

If your IGP has what I call "heritage metrics" - those archaeological remnants where someone set cost 100 here and cost 3000 there years ago - Segment Routing won't magically fix that. It will faithfully reproduce your chaos, just with deterministic precision.

And here's what people rarely admit: fixing bad metrics in a production backbone is nearly impossible in practice. Changing even one IGP metric without comprehensive simulation of every PE-to-PE flow is basically gambling with your production traffic. One wrong metric change can create routing loops, black holes, or traffic shifts you never intended.

This is why heritage metrics persist for decades. Changing them is like trying to rebuild the foundation of a skyscraper while people are still working inside.
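
To make "simulation of every PE-to-PE flow" concrete, here is a minimal sketch of the idea, not the tooling we actually used. It assumes symmetric link metrics, ignores ECMP, and uses a hypothetical four-node topology; the function names are mine. It runs SPF before and after a proposed metric change and reports every PE pair whose path moves.

```python
import heapq

def shortest_paths(links, source):
    """Dijkstra from source. Returns {node: (cost, path)}.
    Assumes symmetric metrics; ECMP is ignored (first path found wins)."""
    graph = {}
    for (a, b), metric in links.items():
        graph.setdefault(a, []).append((b, metric))
        graph.setdefault(b, []).append((a, metric))
    best = {}
    queue = [(0, [source])]
    while queue:
        cost, path = heapq.heappop(queue)
        node = path[-1]
        if node in best:
            continue
        best[node] = (cost, path)
        for neighbor, metric in graph.get(node, []):
            if neighbor not in best:
                heapq.heappush(queue, (cost + metric, path + [neighbor]))
    return best

def changed_flows(links, pes, change):
    """Return {(src, dst): (old_path, new_path)} for PE pairs that move.
    Keys in `change` must use the same (a, b) orientation as in `links`."""
    after = {**links, **change}
    moved = {}
    for src in pes:
        old = shortest_paths(links, src)
        new = shortest_paths(after, src)
        for dst in pes:
            if dst == src:
                continue
            old_path = old.get(dst, (None, None))[1]
            new_path = new.get(dst, (None, None))[1]
            if old_path != new_path:
                moved[(src, dst)] = (old_path, new_path)
    return moved

# Hypothetical topology: two PEs connected through two P routers.
links = {("PE1", "P1"): 10, ("P1", "PE2"): 10,
         ("PE1", "P2"): 15, ("P2", "PE2"): 15}

# What happens if someone sets a "heritage" cost of 3000 on PE1-P1?
for pair, (old, new) in changed_flows(links, ["PE1", "PE2"],
                                      {("PE1", "P1"): 3000}).items():
    print(pair, old, "->", new)
```

Even in this toy case a single metric change moves traffic in both directions. On a real backbone the same check has to cover every PE pair and every failure scenario, which is exactly why nobody touches those old metrics.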

Getting the Foundation Right

We were fortunate. From day one, long before considering SR, we engineered our backbone with proper IS-IS metrics based on actual physical latency, applied consistently everywhere. No games, no tribal knowledge, no legacy weirdness - just solid fundamentals. We also migrated to wide metrics early, giving us 24-bit link metrics (and a 32-bit path metric) along with the extended TLVs that IS-IS uses to carry SR information.
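
As an illustration of what "metrics based on actual physical latency" can look like, here is a small sketch. The scaling factor (one metric unit per 10 microseconds of one-way latency) is an assumption chosen for the example, not the value we use, and the function name is hypothetical. The only hard constraint is the wide-metric range: a link metric is a 24-bit value, and the maximum value 2^24 - 1 removes the link from normal SPF.

```python
# 2**24 - 1 means "do not use this link in SPF" (RFC 5305),
# so the largest usable wide link metric is one less.
MAX_WIDE_LINK_METRIC = 2**24 - 2

def latency_to_metric(one_way_latency_ms, units_per_ms=100):
    """Map a measured one-way latency to an IS-IS wide link metric.

    units_per_ms=100 means 1 metric unit = 10 microseconds.
    This scaling is an assumption made for the example.
    """
    if one_way_latency_ms < 0:
        raise ValueError("latency cannot be negative")
    metric = round(one_way_latency_ms * units_per_ms)
    # Clamp: a metric of 0 is avoided, and the ceiling stays usable in SPF.
    return max(1, min(metric, MAX_WIDE_LINK_METRIC))

print(latency_to_metric(12.5))    # 1250
print(latency_to_metric(0.001))   # 1 (clamped up from 0)
```

The point is less the formula than the consistency: every link gets its metric from the same rule, so nobody needs tribal knowledge to explain why a path was chosen.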

Our IS-IS deployment follows best practices that made the SR migration straightforward: aggressive timers, LSDB flood control, proper reference bandwidths for TI-LFA calculations.

Here's the uncomfortable truth about MPLS backbone design: it can be deceptively simple to build, but every decision you make becomes part of your network's DNA for the next decade. Some choices, like your IGP metrics or your route reflector topology, become essentially impossible to change once traffic is flowing.

Get it right from the beginning, and technologies like SR-MPLS become straightforward migrations. Get it wrong, and you'll spend years working around decisions made by someone who left the company before your youngest team member was hired.

Preparation Through Real-World Validation

We were fortunate to have an in-house lab with real routers - not simulators, but actual hardware matching our production environment. This gave us the luxury of validating every scenario in a safe environment.

The Juniper documentation, particularly the "Day One" series, proved invaluable during this phase: practical, battle-tested procedures that aligned perfectly with our lab validation work.

The Migration Itself

With good IGP metrics in place and thorough lab validation behind us, our migration was completely hitless. Not "mostly smooth" - literally zero impact, every single time. Even the tricky parts like SR-to-LDP stitching worked exactly as designed and tested in the lab.

The transformation was remarkable because of how much disappeared: no more LDP sessions, no more LDP-sync, no more opaque labels.

Why SR-MPLS Is Fundamentally Different

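With SR-MPLS, the IGP becomes your label distribution control plane: IS-IS advertises the SIDs itself, so there is no separate label protocol to keep in sync. That single change cascades into massive simplification.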

Labels become deterministic. SR Node-SIDs are predictable and stable. You can tell the egress PE just by looking at the label stack. If you know the Node-SID, you know exactly which PE handles that service.
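
Here is roughly why that works. A Node-SID label is the SRGB base plus the node's SID index, so the mapping can be reversed with simple arithmetic. The sketch below assumes every router uses the same SRGB; the base, size, indices, and hostnames are all hypothetical values for illustration.

```python
SRGB_BASE = 16000   # assumed SRGB start, identical on every router
SRGB_SIZE = 8000    # assumed SRGB size

# Hypothetical Node-SID index -> hostname table.
NODE_SID_INDEX = {
    1: "pe1-par",
    2: "pe2-fra",
    11: "p1-ams",
}

def node_for_label(label):
    """Return the hostname owning this Node-SID label, or None."""
    index = label - SRGB_BASE
    if not 0 <= index < SRGB_SIZE:
        return None  # outside the SRGB: an Adj-SID or a service label, for example
    return NODE_SID_INDEX.get(index)

def describe_stack(labels):
    """Annotate each label of a stack with the node it points to."""
    return [node_for_label(label) or f"label {label} (not a known Node-SID)"
            for label in labels]

print(describe_stack([16002, 299776]))
# ['pe2-fra', 'label 299776 (not a known Node-SID)']
```

With LDP the same question ("which PE is this label going to?") needs a lookup on the specific router that allocated the label, because LDP labels are only locally significant.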

This opened unexpected automation possibilities. I built a dual-view traceroute that shows both underlay and overlay paths simultaneously - all powered by IS-IS data. The adjacency information already contains neighbor hostnames, so the traceroute shows exactly which devices you're traversing, not just IP addresses but actual meaningful names.

The visibility compared to LDP is like night and day. With LDP, you'd need to query multiple protocols and correlate disparate data sources. With SR-MPLS, everything you need is already inside IS-IS.

The Concrete Results We Achieved

First, we enabled TI-LFA on every backbone link, achieving real sub-50ms convergence. This isn't theoretical - we've tested it many times, though not always by choice. Fiber cuts happen. When they do, TI-LFA delivers exactly what it promises. We run very sensitive point-to-point ethernet services, and our users have never noticed a single line cut. Ever.
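
For readers new to TI-LFA, the core idea is that the repair path follows the post-convergence path: the path the IGP would pick anyway once it has reconverged without the failed link. The sketch below only computes that path on a hypothetical four-router ring with symmetric metrics. It does not compute the segment list a real router installs to steer traffic onto it, and the function names are mine.

```python
import heapq

def spf(links, source, dest):
    """Return (cost, path) for the shortest path, or None if unreachable."""
    graph = {}
    for (a, b), metric in links.items():
        graph.setdefault(a, []).append((b, metric))
        graph.setdefault(b, []).append((a, metric))
    visited = set()
    queue = [(0, [source])]
    while queue:
        cost, path = heapq.heappop(queue)
        node = path[-1]
        if node == dest:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, metric in graph.get(node, []):
            if neighbor not in visited:
                heapq.heappush(queue, (cost + metric, path + [neighbor]))
    return None

def post_convergence_path(links, plr, dest, failed_link):
    """Shortest path from the point of local repair once failed_link is gone."""
    a, b = failed_link
    remaining = {k: v for k, v in links.items() if k not in ((a, b), (b, a))}
    return spf(remaining, plr, dest)

# Hypothetical ring: R1 - R2 - R3 - R4 - R1, all metrics 10.
ring = {("R1", "R2"): 10, ("R2", "R3"): 10,
        ("R3", "R4"): 10, ("R4", "R1"): 10}

print(spf(ring, "R1", "R2"))                                  # (10, ['R1', 'R2'])
print(post_convergence_path(ring, "R1", "R2", ("R1", "R2")))  # (30, ['R1', 'R4', 'R3', 'R2'])
```

Because the backup is precomputed and already matches where traffic will end up after reconvergence, the switch happens locally at the router next to the failure, and there is no second path change later.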

For legacy integration, we could support systems that only speak Layer 2 without keeping all the old architectural constraints. EVPN VPWS stitches everything together while SR handles transport, resiliency, and convergence. All those legacy circuits now inherit TI-LFA's sub-50ms protection without any awareness they're being protected by it. They think they're running on traditional infrastructure, but underneath, they're getting modern SR convergence - a massive resilience upgrade for free.

Automation became dramatically simpler. Because Node-SIDs are deterministic and stable, building automation that understands the network topology became trivial. The data is all in one place: IS-IS.

Two Years Later

Two years after Aurélien Demarty and I led this migration, the benefits are still obvious every day:

  • Predictable paths
  • Sub-50ms convergence everywhere
  • Much faster troubleshooting
  • Fewer forwarding anomalies
  • Dramatically simplified automation
  • EVPN VPWS as a universal L2 service
  • A clean foundation for SR-TE without any RSVP baggage

I can't overstate Aurélien's contribution to this success. His deep expertise and meticulous attention to detail during the migration were instrumental, and his day-to-day management of this network continues to be exceptional. Working with someone who knows almost every SR Node-SID by heart makes all the difference.

The key lesson remains clear: if you do your homework, engineer your IGP properly, and roll out SR with clear intention, your backbone will reward you every single day. But that foundation work - especially those IGP metrics - that's where the real battle is won or lost.
