Tuesday, March 31, 2026

SR-MPLS Troubleshooting: Identifying and Fixing Controller-Injected Null EROs

Recently I faced an issue where SR-MPLS was down because of an empty ERO for an SR-TE LSP whose path is managed by the controller.

Background:

We implemented SR-MPLS LSPs for one of our customers, where the LSP paths are injected from the controller based on constraints. We recently faced some issues with the controller and fixed them with an upgrade. After the upgrade, we observed that one of the LSPs didn't come up.

Verification:

The end-to-end path was checked manually to see whether it meets the constraints defined for the path. Everything seemed to be fine.

The team checked the controller logs; there were no errors, but the controller was sending an empty ERO to the router.

As this was very critical, I performed the troubleshooting myself and was finally able to bring the LSP up with correct path details instead of the empty ERO from the controller:


LSP Configuration:

            lsp "srte-access1-pre-agg" sr-te
                to X.X.X.X
                path-computation-method pce
                max-sr-labels 4 additional-frr-labels 2
                pce-report enable
                pce-control
                bfd
                    failure-action failover-or-down
                exit
                path-profile 1 path-group 128946735
                primary "srte-p"
                    priority 1 1
                exit
                secondary "srte-s"
                    standby
                    priority 7 7
                exit
                no shutdown
            exit
            no shutdown


Troubleshooting Steps:

# show router mpls sr-te-lsp

=====================================================================
MPLS SR-TE LSPs (Originating)
=====================================================================
LSP Name                                            Tun     Protect   Adm  Opr
  To                                                Id      Path
---------------------------------------------------------------------

srte-access1-pre-agg                                4       N/A       Up   Dwn
  X.X.X.X
---------------------------------------------------------------------
LSPs : 1
=====================================================================
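When there are many LSPs, picking out the ones that are administratively up but operationally down can be scripted. The following is a minimal Python sketch, assuming the `show router mpls sr-te-lsp` output has been saved as text and that each LSP name fits on one row (long names may wrap in real output). The function name `find_down_lsps` and the `SAMPLE` text are illustrative, not part of any Nokia tooling.

```python
import re

def find_down_lsps(show_output: str) -> list[dict]:
    """Return LSPs that are administratively Up but operationally Down."""
    # Matches a data row such as: "srte-access1-pre-agg   4   N/A   Up   Dwn"
    row = re.compile(r"^(\S+)\s+(\d+)\s+(\S+)\s+(Up|Dwn)\s+(Up|Dwn)\s*$")
    down = []
    for line in show_output.splitlines():
        m = row.match(line)
        if m and m.group(4) == "Up" and m.group(5) == "Dwn":
            down.append({"name": m.group(1), "tunnel_id": int(m.group(2))})
    return down

# Sample text copied from the output above (addresses masked as in the post).
SAMPLE = """\
LSP Name                                            Tun     Protect   Adm  Opr
  To                                                Id      Path
---------------------------------------------------------------------
srte-access1-pre-agg                   4       N/A       Up   Dwn
  X.X.X.X
---------------------------------------------------------------------
LSPs : 1
"""

print(find_down_lsps(SAMPLE))
```

Header, separator and "To" lines are skipped because they do not match the row pattern (name, numeric tunnel id, protect path, two state columns).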

Enabled Debug for SR-TE LSP:

debug
    router "Base"
        mpls lsp "srte-access1-pre-agg"
            event
                iom
                lsp-setup
                xc
                frr
                mbb
                misc
                pcc
                te
            exit
        exit
    exit

config>log>log-id$ info
----------------------------------------------
            from debug-trace
            to session
            no shutdown

Debug Logs:

Received PCC Reply for srte-access1-pre-agg::srte-p(LspId 62454)
User SR-TE, GenId 2, NumRequests 1
[Reply 1]
  LspType SR-TE, TunnelId 4, PathIdx 10
  LspId 62454, VrId 1, ExtTunnelId X.X.X.X
  LspName "srte-access1-pre-agg::srte-p"
  From X.X.X.X, To X.X.X.X
  Status Path not found, PCCHasLspState Yes
  ERO:
    No hops
  OperBandwidth 0Mbps, HopCnt 0, IgpMetric 0, TeMetric 0

From the debug output it's clear that the controller is not able to calculate a path for the LSP. The next step was to move path computation for the LSP away from the PCE (the controller) and let the router (the PCC) compute it locally.
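The same check on the debug trace can be automated. Below is a small Python sketch, assuming the PCC reply text has the layout shown in the debug log above; the helper names `parse_pcc_reply` and `is_empty_ero` are my own and only illustrate the idea.

```python
import re

def parse_pcc_reply(debug_text: str) -> dict:
    """Extract status, hop count and ERO hops from one PCC reply debug block."""
    status = re.search(r"Status (.+?), PCCHasLspState", debug_text)
    hop_cnt = re.search(r"HopCnt (\d+)", debug_text)
    # ERO hop lines look like "[1] SIDType IPv4 Node, SID ..., RemoteAddr ..., Loose".
    # The "[Reply 1]" line is not matched because it is not purely numeric.
    hops = re.findall(r"^\s*\[\d+\]\s+SIDType .+$", debug_text, flags=re.MULTILINE)
    return {
        "status": status.group(1) if status else None,
        "hop_count": int(hop_cnt.group(1)) if hop_cnt else None,
        "hops": [h.strip() for h in hops],
    }

def is_empty_ero(reply: dict) -> bool:
    """True when the reply carries no ERO hops and reports HopCnt 0."""
    return not reply["hops"] and reply["hop_count"] == 0

# Sample text copied from the debug log above.
REPLY = """\
[Reply 1]
  LspType SR-TE, TunnelId 4, PathIdx 10
  LspName "srte-access1-pre-agg::srte-p"
  Status Path not found, PCCHasLspState Yes
  ERO:
    No hops
  OperBandwidth 0Mbps, HopCnt 0, IgpMetric 0, TeMetric 0
"""

reply = parse_pcc_reply(REPLY)
print(reply["status"], is_empty_ero(reply))
```

A reply with "Status Path not found" and no hops points at path computation on the controller rather than at label programming on the router.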

After configuring local-cspf to calculate the path, the LSP came up. This told me that there is no issue with the path itself or with the CSPF configuration along the path.

PE>config>router>mpls>lsp# path-computation-method local-cspf

=====================================================================
MPLS SR-TE LSPs (Originating)
=====================================================================
LSP Name                                            Tun     Protect   Adm  Opr
  To                                                Id      Path
---------------------------------------------------------------------

srte-access1-pre-agg                                4       N/A       Up   Up
  X.X.X.X
---------------------------------------------------------------------
LSPs : 1
=====================================================================

LSP Path Detail:

Primary         : srte-p
                                            Down Time: 0d 00:19:02
Bandwidth       : 0 Mbps
Standby         : srte-s
                                            Down Time: 0d 00:19:10
Bandwidth       : 0 Mbps



I then rolled the configuration back to PCE-based path computation and checked again, but the LSP went down again with an empty ERO. For some reason the controller was not able to calculate the path for the LSP. Something seemed to be stuck, but I could not figure out what at this point.

Next, since path profiles are responsible for setting the constraints on the controller, I removed the path profile, and the LSP came up. So it was clear that something was wrong in the controller's path calculation for this profile, perhaps a stuck entry that it was not able to remove from its cache.

PE>config>router>mpls>lsp# no path-profile 1

=============================================================
MPLS SR-TE LSPs (Originating)
=============================================================
LSP Name                                            Tun     Protect   Adm  Opr
  To                                                Id      Path
-------------------------------------------------------------
srte-access1-pre-agg                                4       N/A       Up   Up
  X.X.X.X
-------------------------------------------------------------
LSPs : 1
=============================================================

This time I re-enabled the path profile configuration on the router but changed the path group value. The LSP came up, and the controller was now able to calculate the path and send the details to the router.

Debug Logs:

"MPLS: SR-TE : PCC
Received PCC Reply for srte-access1-pre-agg (LspId XXX)
User SR-TE, GenId 2, NumRequests 1
[Reply 1]
  LspType SR-TE, TunnelId 4, PathIdx 9
  LspId XXX, VrId 1, ExtTunnelId XXXX
  LspName "srte-access1-pre-agg"
  From X.X.X.X, To X.X.X.X
  Status Path found, PCCHasLspState Yes
  ERO:
    [1] SIDType IPv4 Node, SID 23434343, RemoteAddr X.X.X.X, Loose
    [2] SIDType IPv4 Node, SID 23432434, RemoteAddr X.X.X.X, Loose
  OperBandwidth 0Mbps, HopCnt 2, IgpMetric XXX, TeMetric 0
"

LSP status:

show router mpls sr-te-lsp "srte-access1-pre-agg" path "srte-p"

=============================================================
MPLS SR-TE LSP srte-access1-pre-agg
Path srte-p
=============================================================
-------------------------------------------------------------
LSP Name    : srte-access1-pre-agg
From             : X.X.X.X
To               : X.X.X.X
Adm State        : Up                      Oper State        : Up
-------------------------------------------------------------
Path Name                        Type           Adm  Opr
-------------------------------------------------------------
srte-p
                                 Primary         Up   Up
=============================================================

This LSP was already down before the controller upgrade, and its group value was somehow carried over after the upgrade. The group value, cached together with the earlier failed state, caused the controller to send the empty ERO continuously. After reconfiguring with a new value, the old group value was cleared and the LSP came up. After monitoring the LSP for some time, I changed the group value to the one defined by the agreed standard.
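The elimination steps I followed can be written down as a small decision sketch. This is only my reasoning expressed in Python, not a vendor tool: the `Observations` fields and the wording of each diagnosis are labels I made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class Observations:
    up_with_local_cspf: bool      # LSP up when the router computes the path itself
    up_with_pce_no_profile: bool  # LSP up under PCE with the path-profile removed
    up_with_pce_old_group: bool   # LSP up under PCE with the original path-group
    up_with_pce_new_group: bool   # LSP up under PCE with a different path-group

def diagnose(o: Observations) -> str:
    """Narrow down the fault domain from the four test results."""
    if not o.up_with_local_cspf:
        return "check topology, SIDs and constraints along the path"
    if not o.up_with_pce_no_profile:
        return "check the PCEP session and controller path computation in general"
    if o.up_with_pce_old_group:
        return "no fault reproduced"
    if o.up_with_pce_new_group:
        return "suspect stale controller state tied to the old path-group value"
    return "suspect the path-profile definition on the controller"

# The results observed in this case.
case = Observations(
    up_with_local_cspf=True,
    up_with_pce_no_profile=True,
    up_with_pce_old_group=False,
    up_with_pce_new_group=True,
)
print(diagnose(case))
```

Each test changes only one thing, so a failure can be attributed to the network, the controller in general, the profile, or the specific group value.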

Recommendation:


It's not recommended to perform this kind of troubleshooting without the support team. I did so only because my experience in various roles gave me enough confidence to try the different ways of bringing up the LSP without causing traffic impact in the network.

