Recently I faced an issue where SR-MPLS was down because of an empty ERO for an SR-TE LSP whose path is managed by the controller.
Background:
We implemented SR-MPLS LSPs for one of our customers, where the LSP paths are injected from the controller based on constraints. We recently faced some issues with the controller and fixed them with an upgrade. After the upgrade, we observed that one of the LSPs didn't come up.
Verification:
We checked the end-to-end path manually to see whether it met the constraints defined for the path. Everything seemed to be fine.
The team checked the controller logs; there were no errors, but the controller was sending an empty ERO to the router.
As this was very critical, I did the troubleshooting myself and was finally able to bring the LSP up with correct path details from the controller instead of the empty ERO:
LSP Configuration:
lsp "srte-access1-pre-agg" sr-te
    to X.X.X.X
    path-computation-method pce
    max-sr-labels 4 additional-frr-labels 2
    pce-report enable
    pce-control
    bfd
        failure-action failover-or-down
    exit
    path-profile 1 path-group 128946735
    primary "srte-p"
        priority 1 1
    exit
    secondary "srte-s"
        standby
        priority 7 7
    exit
    no shutdown
exit
no shutdown
Troubleshooting Steps:
# show router mpls sr-te-lsp
=====================================================================
MPLS SR-TE LSPs (Originating)
=====================================================================
LSP Name                                  Tun     Protect   Adm  Opr
  To                                      Id      Path
---------------------------------------------------------------------
srte-access1-pre-agg                      4       N/A       Up   Dwn
  X.X.X.X
---------------------------------------------------------------------
LSPs : 1
=====================================================================
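When many LSPs are configured, the same check (Adm Up but Opr Dwn) can be scripted. The following is a minimal sketch, assuming the show output has been captured as text and that each summary row carries the five columns seen above (name, tunnel ID, protect path, Adm, Opr). The function name admin_up_oper_down is hypothetical, not part of any Nokia tool.

```python
import re

# Matches a summary row such as "srte-access1-pre-agg 4 N/A Up Dwn".
# Assumption: the five-column layout shown in this post.
ROW = re.compile(
    r"^(?P<name>\S+)\s+(?P<tun>\d+)\s+(?P<protect>\S+)"
    r"\s+(?P<adm>Up|Dwn)\s+(?P<opr>Up|Dwn)\s*$"
)


def admin_up_oper_down(show_output: str) -> list[str]:
    """Return names of LSPs that are administratively up but operationally down."""
    names = []
    for line in show_output.splitlines():
        match = ROW.match(line.strip())
        if match and match["adm"] == "Up" and match["opr"] == "Dwn":
            names.append(match["name"])
    return names
```

Header, separator, and "To" address lines do not match the row pattern, so they are skipped without special handling.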
Enabled Debug for SR-TE LSP:
debug
    router "Base"
        mpls lsp "srte-access1-pre-agg"
            event
                iom
                lsp-setup
                xc
                frr
                mbb
                misc
                pcc
                te
            exit
        exit
    exit
config>log>log-id$ info
----------------------------------------------
from debug-trace
to session
no shutdown
Debug Logs:
Received PCC Reply for srte-access1-pre-agg::srte-p(LspId 62454)
User SR-TE, GenId 2, NumRequests 1
[Reply 1]
LspType SR-TE, TunnelId 4, PathIdx 10
LspId 62454, VrId 1, ExtTunnelId X.X.X.X
LspName "srte-access1-pre-agg::srte-p"
From X.X.X.X, To X.X.X.X
Status Path not found, PCCHasLspState Yes
ERO:
No hops
OperBandwidth 0Mbps, HopCnt 0, IgpMetric 0, TeMetric 0
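The empty-ERO condition in the PCC reply above can also be detected from captured debug text. This is a rough sketch, assuming the log format is exactly as printed in this post (LspName, Status, ERO hops as "[n] SIDType ..." lines, and HopCnt). The function name parse_pcc_reply and the returned dictionary keys are hypothetical.

```python
import re


def parse_pcc_reply(log_text: str) -> dict:
    """Extract LSP name, status, hop count and ERO hops from one PCC reply block."""
    name = re.search(r'LspName\s+"([^"]+)"', log_text)
    status = re.search(r"\bStatus\s+([^,\n]+)", log_text)
    hop_cnt = re.search(r"\bHopCnt\s+(\d+)", log_text)
    # ERO hop lines look like "[1] SIDType IPv4 Node, SID ..., Loose".
    # "[Reply 1]" is not matched because it has no bare number in brackets.
    hops = re.findall(r"^\s*\[\d+\]\s+SIDType\s+.*$", log_text, flags=re.MULTILINE)
    return {
        "lsp_name": name.group(1) if name else None,
        "status": status.group(1).strip() if status else None,
        "hop_count": int(hop_cnt.group(1)) if hop_cnt else None,
        "hops": [hop.strip() for hop in hops],
        "empty_ero": not hops,
    }
```

For the reply shown above, the sketch would report the status "Path not found" with empty_ero set to True, which is the signature of the problem described here.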
From the debug output it's clear that the controller was not able to calculate a path for the LSP. As the next step, I moved path computation away from the controller (PCE) to the router itself (PCC).
After configuring local-cspf to calculate the path, the LSP came up. That told me there was no issue with the path itself or with the CSPF configuration along it.
PE>config>router>mpls>lsp# path-computation-method local-cspf
=====================================================================
MPLS SR-TE LSPs (Originating)
=====================================================================
LSP Name                                  Tun     Protect   Adm  Opr
  To                                      Id      Path
---------------------------------------------------------------------
srte-access1-pre-agg                      4       N/A       Up   Up
  X.X.X.X
---------------------------------------------------------------------
LSPs : 1
=====================================================================
LSP Path Detail:
Primary : srte-p
Down Time: 0d 00:19:02
Bandwidth : 0 Mbps
Standby : srte-s
Down Time: 0d 00:19:10
Bandwidth : 0 Mbps
I then rolled the configuration back from local CSPF to PCE computation and checked again, but the LSP went down once more with an empty ERO. For some reason the controller was not able to calculate the path for the LSP. Something seemed to be stuck, but at this point I could not figure out what.
Next, since the path profile is what sets the constraints on the controller, I removed it, and the LSP came up. So it was clear that something was going wrong in the controller's path calculation, possibly a stuck entry that it was not able to remove from its cache.
PE>config>router>mpls>lsp# no path-profile 1
=============================================================
MPLS SR-TE LSPs (Originating)
=============================================================
LSP Name                          Tun     Protect   Adm  Opr
  To                              Id      Path
-------------------------------------------------------------
srte-access1-pre-agg              4       N/A       Up   Up
  X.X.X.X
-------------------------------------------------------------
LSPs : 1
=============================================================
This time I re-enabled the path profile configuration on the router but changed the path-group value. The LSP came up: the controller was now able to calculate the path and send the details to the router.
Debug Logs:
"MPLS: SR-TE : PCC
Received PCC Reply for srte-access1-pre-agg (LspId XXX)
User SR-TE, GenId 2, NumRequests 1
[Reply 1]
LspType SR-TE, TunnelId 4, PathIdx 9
LspId XXX, VrId 1, ExtTunnelId XXXX
LspName "srte-access1-pre-agg"
From X.X.X.X, To X.X.X.X
Status Path found, PCCHasLspState Yes
ERO:
[1] SIDType IPv4 Node, SID 23434343, RemoteAddr X.X.X.X, Loose
[2] SIDType IPv4 Node, SID 23432434, RemoteAddr X.X.X.X, Loose
OperBandwidth 0Mbps, HopCnt 2, IgpMetric XXX, TeMetric 0
"
LSP status:
show router mpls sr-te-lsp "srte-access1-pre-agg" path "srte-p"
=============================================================
MPLS SR-TE LSP srte-access1-pre-agg
Path srte-p
=============================================================
-------------------------------------------------------------
LSP Name : srte-access1-pre-agg
From : X.X.X.X
To : X.X.X.X
Adm State : Up Oper State : Up
-------------------------------------------------------------
Path Name Type Adm Opr
-------------------------------------------------------------
srte-p
Primary Up Up
=============================================================
This LSP was already down before the controller upgrade, and its group value was somehow carried over after the upgrade. The group value held in the cache together with the earlier failure state caused the controller to keep sending the empty ERO. Reconfiguring with a new value cleared the old group value, and the LSP came up. After monitoring the LSP for some time, I changed the group value back to match the agreed standard.
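The isolation steps followed in this post can be summarised as a small decision function. This is only a sketch of my reading of the sequence, not a vendor procedure: the function name diagnose, its parameters, and the wording of the conclusions are all assumptions for illustration.

```python
def diagnose(local_cspf_up: bool, pce_no_profile_up: bool, pce_new_group_up: bool) -> str:
    """Map the three test results from this post to a likely fault area.

    local_cspf_up:     LSP comes up with path-computation-method local-cspf
    pce_no_profile_up: LSP comes up under PCE control with the path-profile removed
    pce_new_group_up:  LSP comes up under PCE control with a different path-group
    """
    if not local_cspf_up:
        # The router itself cannot find a path.
        return "check the topology or constraints along the path"
    if not pce_no_profile_up:
        # The controller fails even without profile constraints.
        return "check controller path computation in general"
    if pce_new_group_up:
        # Same profile works with a fresh group value, as observed in this case.
        return "suspect stale path-group state on the controller"
    return "check the path-profile constraints on the controller"
```

In the case described here all three tests succeeded, which points at the state tied to the old path-group value.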
Recommendation:
I don't recommend performing this kind of troubleshooting without the support team. I went ahead only because my experience in various roles gave me enough confidence to try the different ways of bringing up the LSP without causing traffic impact in the network.