
Telemetry. Part 1. Native YANG modules for Nokia SR OS and Cisco IOS XR with Ansible – Cisco

Hello my friend,

We have spent some time configuring network functions over NETCONF/YANG, using both native vendor models and OpenConfig. Now the time has come to talk about the information we can extract from the router in YANG data models, which is also called model-driven telemetry.

No part of this blogpost could be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical or photocopying, recording, or otherwise, for commercial purposes without the prior permission of the author.

Brief description

Nowadays telemetry is a hot topic at various network events and conferences, and in vendors' presentations to customers. As usual, without knowing the details of what it is, how it can be used and which benefits it can bring you, such discussions are no more than hype. So in this article we'll try to discover what telemetry means, of course with examples in a multivendor environment with Cisco IOS XR and Nokia (Alcatel-Lucent) SR OS.

For a long time, SNMP was the primary source of information about the state of the network. It has evolved a lot since its first release, and the current version 3 even has built-in security using encryption and hashing. Nevertheless, not all information about network devices is available through SNMP; that's why some information is still gathered through show commands and subsequent parsing, which isn't a flexible approach. On the other hand, the new era of networking, where SDN comes onto the scene, requires much more information from the network about its actual state, and this information must be available to the SDN controller or network analytics instantly, at any time.

And that's where model-driven telemetry can help a lot. Model-driven means that telemetry is based on YANG modules, which complement the configuration modules we have used a lot previously. It means that virtually all information about the network state should be available through operational YANG modules.

As you've already learned, there are different types of YANG modules: vendor-native and vendor-neutral (like OpenConfig). In this article we'll focus on the vendor-native modules for Cisco IOS XR and Nokia (Alcatel-Lucent) SR OS, as they are the natural first step in telemetry. In the following articles we'll review the telemetry available in OpenConfig YANG modules as well.

In terms of the transport protocol used for model-driven telemetry, there are different options. The most popular deployment is based on gRPC (or, more recently, gNMI), which runs over HTTP/2 and fits the transmission of structured data perfectly. Another option is to use TCP directly, but for a cross-vendor implementation gRPC/gNMI looks more favorable.

Nevertheless, in our examples you will see NETCONF, though NETCONF itself isn't supposed to be the transport for telemetry. The reason is that we aren't building a telemetry collector yet, but rather focusing on the data collection itself. We'll probably review the streaming of telemetry data in upcoming articles.

What are we going to test?

We are going to collect operational data from the network functions, using their native YANG modules, for:

We'll use the "netconf_get" Ansible module for that.

Software version

The following software components are used in this lab:

See the previous article for details on how to build the lab.

Topology

We are using our standard topology:

The logical topology is quite simple: we have just back-to-back connectivity between the Nokia SR OS based VNF SR1 and the Cisco IOS XR based VNF XR3. Here it is expressed in ASCII:

+----------------+    10.11.33.0/24     +----------------+
|      SR1       +----------------------+       XR3      |
+-------+--------+ 1/1/c1/1    g0/0/0/0 +--------+-------+
        |          .11              .33          |
        | system: 10.0.0.11/32                   | loopback0: 10.0.0.33/32

                  <---OSPF:area0/p2p--->
        <-------------BGP:VPNv4/VPNv6------------>

You can see the initial configuration in the attached files: 136_config_initial_XR3 and 136_config_initial_SR1.

Brief topology check

From the configuration perspective, we have configured just 3 topics:

On the one hand, this set of configuration is very basic; on the other hand, it is the basis of a modern service provider network capable of providing all kinds of IP/MPLS services to customers. We won't explain the details, as each of them was covered in depth earlier in a multivendor setup with Nokia (Alcatel-Lucent) SR OS and Cisco IOS XR: interfaces, OSPF, Segment Routing and BGP IP VPNs. Here we'll just review some operational commands.

We start with the status of the ports and interfaces on the Nokia (Alcatel-Lucent) SR OS based network function SR1:

[]
A:admin@SR1# show port

===============================================================================
Ports on Slot 1
===============================================================================
Port          Admin Link Port    Cfg  Oper LAG/ Port Port Port   C/QS/S/XFP/
Id            State      State   MTU  MTU  Bndl Mode Encp Type   MDIMDX
-------------------------------------------------------------------------------
1/1/c1        Up         Link Up                          conn   100GBASE-LR4*
1/1/c1/1      Up    Yes  Up      1514 1514    - netw null xgige
1/1/c1/2      Down  No   Down    9212 9212    - netw null xgige
1/1/c1/3      Down  No   Down    9212 9212    - netw null xgige
1/1/c1/4      Down  No   Down    9212 9212    - netw null xgige
1/1/c2        Up         Link Up                          conn   100GBASE-LR4*
1/1/c2/1      Down  No   Down    9212 9212    - netw null xgige
1/1/c2/2      Down  No   Down    9212 9212    - netw null xgige
1/1/c2/3      Down  No   Down    9212 9212    - netw null xgige
1/1/c2/4      Down  No   Down    9212 9212    - netw null xgige
1/1/c3        Down       Down                             conn   100GBASE-LR4*
1/1/c4        Down       Down                             conn   100GBASE-LR4*
1/1/c5        Down       Down                             conn   100GBASE-LR4*
1/1/c6        Down       Down                             conn   100GBASE-LR4*

===============================================================================
Ports on Slot A
===============================================================================
Port          Admin Link Port    Cfg  Oper LAG/ Port Port Port   C/QS/S/XFP/
Id            State      State   MTU  MTU  Bndl Mode Encp Type   MDIMDX
-------------------------------------------------------------------------------
A/1           Up    Yes  Up      1514 1514    - netw null faste  MDI
A/4           Up    No   Ghost   1514 1514    - netw null faste
===============================================================================

[]
A:admin@SR1# show router interface

===============================================================================
Interface Table (Router: Base)
===============================================================================
Interface-Name                   Adm       Opr(v4/v6)  Mode    Port/SapId
IP-Address                                                  PfxState
-------------------------------------------------------------------------------
system                           Up        Up/Up       Network system
10.0.0.11/32                                                n/a
fc00::10:0:0:11/128                                         PREFERRED
uplink1                          Up        Up/Up       Network 1/1/c1/1
10.11.33.11/24                                              n/a
fc00::10:11:33:11/112                                       PREFERRED
fe80::5054:ff:fe02:201/64                                   PREFERRED
-------------------------------------------------------------------------------
Interfaces : 2
===============================================================================

Then we check the OSPF adjacency:
[]
A:admin@SR1# show router ospf neighbor

===============================================================================
Rtr Base OSPFv2 Instance 0 Neighbors
===============================================================================
Interface-Name                   Rtr Id          State      Pri  RetxQ   TTL
Area-Id
-------------------------------------------------------------------------------
uplink1                          10.0.0.33       Full       1    0       33
0.0.0.0
-------------------------------------------------------------------------------
No. of Neighbors: 1
===============================================================================

Followed by the routing table:
[]
A:admin@SR1# show router route-table

===============================================================================
Route Table (Router: Base)
===============================================================================
Dest Prefix[Flags]                            Type    Proto     Age        Pref
Next Hop[Interface Name]                                    Metric
-------------------------------------------------------------------------------
10.0.0.11/32                                  Local   Local     00h04m12s  0
system                                                       0
10.0.0.33/32                                  Remote  OSPF      00h03m41s  10
10.11.33.33                                                  11
10.11.33.0/24                                 Local   Local     00h04m12s  0
uplink1                                                      0
-------------------------------------------------------------------------------
No. of Routes: 3
Flags: n = Number of times nexthop is repeated
B = BGP backup route available
L = LFA nexthop available
S = Sticky ECMP requested
===============================================================================

And we finish with the status of BGP peering:
[]
A:admin@SR1# show router bgp summary
===============================================================================
BGP Router ID:10.0.0.11        AS:65000       Local AS:65000
===============================================================================
BGP Admin State         : Up          BGP Oper State              : Up
!
! OUTPUT IS OMITTED
!
===============================================================================
BGP Summary
===============================================================================
Legend : D - Dynamic Neighbor
===============================================================================
Neighbor
Description
                   AS PktRcvd InQ  Up/Down   State|Rcv/Act/Sent (Addr Family)
                      PktSent OutQ
-------------------------------------------------------------------------------
10.0.0.33
               65000        7    0 00h00m20s 0/0/0 (VpnIPv4)
                            7    0           0/0/0 (VpnIPv6)
-------------------------------------------------------------------------------

For the Cisco IOS XR based network function XR3, we follow the same verification sequence. Here is the status of the interfaces on XR3:
RP/0/0/CPU0:XR3#show ipv4 int br
Sun Sep 30 21:04:49.178 UTC

Interface                      IP-Address      Status          Protocol Vrf-Name
Loopback0                      10.0.0.33       Up              Up       default
MgmtEth0/0/CPU0/0              192.168.1.111   Up              Up       MGMT
GigabitEthernet0/0/0/0         10.11.33.33     Up              Up       default
GigabitEthernet0/0/0/1         unassigned      Shutdown        Down     default

Then we check the status of the OSPF adjacency:
RP/0/0/CPU0:XR3#show ospf neighbor
Sun Sep 30 21:05:42.804 UTC

* Indicates MADJ interface
# Indicates Neighbor awaiting BFD session up

Neighbors for OSPF 0

Neighbor ID     Pri   State           Dead Time   Address         Interface
10.0.0.11       1     FULL/  -        00:00:35    10.11.33.11     GigabitEthernet0/0/0/0
    Neighbor is up for 00:02:07

Total neighbor count: 1

Then the routing table:
RP/0/0/CPU0:XR3#show route ipv4
Sun Sep 30 21:05:27.025 UTC

Codes: C - connected, S - static, R - RIP, B - BGP, (>) - Diversion path
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2, E - EGP
       i - ISIS, L1 - IS-IS level-1, L2 - IS-IS level-2
       ia - IS-IS inter area, su - IS-IS summary null, * - candidate default
       U - per-user static route, o - ODR, L - local, G  - DAGR, l - LISP
       A - access/subscriber, a - Application route
       M - mobile route, r - RPL, (!) - FRR Backup path

Gateway of last resort is not set

O    10.0.0.11/32 [110/1] via 10.11.33.11, 00:01:51, GigabitEthernet0/0/0/0
L    10.0.0.33/32 is directly connected, 00:01:53, Loopback0
C    10.11.33.0/24 is directly connected, 00:01:52, GigabitEthernet0/0/0/0
L    10.11.33.33/32 is directly connected, 00:01:52, GigabitEthernet0/0/0/0

And we finish with the BGP peering on XR3:
RP/0/0/CPU0:XR3#show bgp vpnv4 unicast summary
Sun Sep 30 21:05:12.386 UTC
BGP router identifier 10.0.0.33, local AS number 65000
BGP generic scan interval 60 secs
Non-stop routing is enabled
BGP table state: Active
Table ID: 0x0   RD version: 0
BGP main routing table version 1
BGP NSR Initial initsync version 1 (Reached)
BGP NSR/ISSU Sync-Group versions 0/0
BGP scan interval 60 secs

BGP is operating in STANDALONE mode.

Process       RcvTblVer   bRIB/RIB   LabelVer  ImportVer  SendTblVer  StandbyVer
Speaker               1          1          1          1           1           0

Neighbor        Spk    AS MsgRcvd MsgSent   TblVer  InQ OutQ  Up/Down  St/PfxRcd
10.0.0.11         0 65000       6       7        1    0    0 00:01:35          0

As you can see, adjacencies are established for both OSPF and BGP. Actually, we could have skipped the routing table check, as the BGP check implicitly includes it: if the IPv4 addresses of the loopbacks aren't propagated properly, the BGP session won't come up.

Now that we have verified the state of the network, we will check the same parameters using operational YANG models; in other words, we'll go for telemetry data.

Cisco IOS XR // Operational YANG modules

In one of the OpenConfig articles we collected all the YANG modules available on Cisco IOS XR devices. Besides the OpenConfig YANG modules we used there, there are also a lot of native modules, which are either configuration or operational ones. Operational modules have the word "oper" in their name, as in the example below:

$ ls -l ~/Yang/yang/vendor/cisco/xr/651/ | grep 'oper' | more
-rw-rw-r--. 1 aaa aaa  28159 Sep 22 17:53 Cisco-IOS-XR-aaa-diameter-oper-sub1.yang
-rw-rw-r--. 1 aaa aaa   2969 Sep 22 17:53 Cisco-IOS-XR-aaa-diameter-oper.yang
-rw-rw-r--. 1 aaa aaa   5728 Sep 22 17:53 Cisco-IOS-XR-aaa-locald-oper-sub1.yang
-rw-rw-r--. 1 aaa aaa   2982 Sep 22 17:53 Cisco-IOS-XR-aaa-locald-oper.yang
-rw-rw-r--. 1 aaa aaa   4166 Sep 22 17:53 Cisco-IOS-XR-aaa-nacm-oper-sub1.yang
-rw-rw-r--. 1 aaa aaa   2915 Sep 22 17:53 Cisco-IOS-XR-aaa-nacm-oper.yang
-rw-rw-r--. 1 aaa aaa  13380 Sep 22 17:53 Cisco-IOS-XR-aaa-protocol-radius-oper-sub1.yang
-rw-rw-r--. 1 aaa aaa  15813 Sep 22 17:53 Cisco-IOS-XR-aaa-protocol-radius-oper-sub2.yang
-rw-rw-r--. 1 aaa aaa   5523 Sep 22 17:53 Cisco-IOS-XR-aaa-protocol-radius-oper.yang
-rw-rw-r--. 1 aaa aaa   4891 Sep 22 17:53 Cisco-IOS-XR-aaa-tacacs-oper-sub1.yang
-rw-rw-r--. 1 aaa aaa   1504 Sep 22 17:53 Cisco-IOS-XR-aaa-tacacs-oper.yang
!
! FURTHER OUTPUT IS OMITTED
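Because operational modules are recognisable by their file names, the same selection can be scripted. The Python sketch below keeps only the top-level "-oper" modules and drops the "-subN" files, which are YANG submodules included by their parent module rather than standalone modules. It assumes the naming pattern visible in the listing above; the function name is made up for this example.

```python
import re

# "<name>-oper.yang" is a top-level operational module,
# "<name>-oper-subN.yang" is a submodule included by that module.
OPER_RE = re.compile(r"^(?P<module>.+-oper)(?P<sub>-sub\d+)?\.yang$")


def top_level_oper_modules(filenames):
    """Return the sorted names of operational modules, skipping submodules."""
    modules = set()
    for name in filenames:
        match = OPER_RE.match(name)
        if match and match.group("sub") is None:
            modules.add(match.group("module"))
    return sorted(modules)


if __name__ == "__main__":
    listing = [
        "Cisco-IOS-XR-aaa-diameter-oper-sub1.yang",
        "Cisco-IOS-XR-aaa-diameter-oper.yang",
        "Cisco-IOS-XR-aaa-locald-oper-sub1.yang",
        "Cisco-IOS-XR-aaa-locald-oper.yang",
    ]
    print(top_level_oper_modules(listing))
```

In practice the list of names could come from os.listdir() on the directory used in the ls command above.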

To understand what is inside, we can take one of the modules as an example and render it using "pyang":
$ pyang -f tree -p ~/Yang/yang/vendor/cisco/xr/612/ ~/Yang/yang/vendor/cisco/xr/612/Cisco-IOS-XR-ifmgr-oper.yang
/home/aaa/Yang/yang/vendor/cisco/xr/612/Cisco-IOS-XR-ifmgr-oper-sub2.yang:9: warning: imported module Cisco-IOS-XR-types not used
module: Cisco-IOS-XR-ifmgr-oper
  +--ro interface-dampening
  |  +--ro interfaces
  |  |  +--ro interface* [interface-name]
  |  |     +--ro if-dampening
  |  |     |  +--ro interface-dampening
  |  |     |  |  +--ro penalty?                 uint32
  |  |     |  |  +--ro is-suppressed-enabled?   boolean
  |  |     |  |  +--ro seconds-remaining?       uint32
  |  |     |  |  +--ro flaps?                   uint32
  |  |     |  |  +--ro state?                   Im-state-enum
  |  |     |  +--ro state-transition-count?       uint32
  |  |     |  +--ro last-state-transition-time?   uint32
  |  |     |  +--ro is-dampening-enabled?         boolean
  |  |     |  +--ro half-life?                    uint32
  |  |     |  +--ro reuse-threshold?              uint32
  |  |     |  +--ro suppress-threshold?           uint32
  |  |     |  +--ro maximum-suppress-time?        uint32
  |  |     |  +--ro restart-penalty?              uint32
  |  |     |  +--ro capsulation* []
  |  |     |     +--ro capsulation-dampening
  |  |     |     |  +--ro penalty?                 uint32
  |  |     |     |  +--ro is-suppressed-enabled?   boolean
  |  |     |     |  +--ro seconds-remaining?       uint32
  |  |     |     |  +--ro flaps?                   uint32
  |  |     |     |  +--ro state?                   Im-state-enum
  |  |     |     +--ro capsulation-number?      string
  |  |     +--ro interface-name    xr:Interface-name
  |  +--ro nodes
!
! FURTHER OUTPUT IS OMITTED

As you might spot, all the nodes in this YANG module have the "ro" attribute, meaning they are read-only: we can only read data from them, not configure them. For each configuration YANG module there is typically a corresponding operational module.
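To check this programmatically rather than by eye, one option is to scan pyang's tree output, where configuration data nodes are printed as "+--rw" and state data nodes as "+--ro". Below is a minimal Python sketch that assumes the tree text was saved from a command like the one above. The helper names are hypothetical. Note that pyang also marks RPC and action output parameters and notification parameters as "ro", so this check only makes sense for modules that define plain data trees, like the one shown.

```python
import re
from collections import Counter

# Data nodes in pyang's tree format start with "+--rw" (config) or "+--ro" (state).
NODE_RE = re.compile(r"\+--(?P<flag>rw|ro)\s+(?P<name>[^\s\[]+)")

SAMPLE_TREE = """module: Cisco-IOS-XR-ifmgr-oper
  +--ro interface-dampening
  |  +--ro interfaces
  |  |  +--ro interface* [interface-name]
  |  |     +--ro if-dampening
  |  |     |  +--ro interface-dampening
  |  |     |  |  +--ro penalty?                 uint32
"""


def data_nodes(tree_text):
    """Yield (flag, name) per data node, stripping pyang's ?, * and ! suffixes."""
    for line in tree_text.splitlines():
        match = NODE_RE.search(line)
        if match:
            yield match.group("flag"), match.group("name").rstrip("?*!")


def is_operational_only(tree_text):
    """True if the tree has state nodes and no configurable (rw) nodes."""
    counts = Counter(flag for flag, _ in data_nodes(tree_text))
    return counts["ro"] > 0 and counts["rw"] == 0


if __name__ == "__main__":
    print(list(data_nodes(SAMPLE_TREE)))
    print(is_operational_only(SAMPLE_TREE))
```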

Cisco IOS XR // Model-driven telemetry algorithm

As we said at the beginning, we'll try to collect information about the network state from YANG operational modules using NETCONF. To do that we'll use the "netconf_get" Ansible module, which we have covered earlier. The algorithm is quite straightforward:

The last point is somewhat artificial in this article: in a real telemetry scenario you would use a message bus such as Kafka, which stores the data, so you wouldn't need to parse it this way. But we'll talk about Kafka in a separate article.

Cisco IOS XR // Ansible playbook for telemetry collection

To implement the algorithm above, our playbook using roles has the following structure:

+--ansible
   +--136_lab.yml
   +--group_vars
   |  +--cisco
   |  |  +--cisco_host.yml
   |  +--nokia
   |     +--nokia_host.yml
   +--roles
      +--cisco
      |  +--136_lab
      |     +--tasks
      |     |  +--comparing_loop.yml
      |     |  +--main.yml
      |     +--templates
      |     |  +--cisco_telemetry_interfaces.j2
      |     |  +--cisco_telemetry_routing_bgp.j2
      |     |  +--cisco_telemetry_routing_ospf.j2
      |     |  +--netconf_request.j2
      |     +--vars
      |        +--XR3_infra_profile.yml
      +--nokia
         +--136_lab
            +--tbd

The master Ansible playbook has a very simple structure:
$ cat 136_lab.yml
---
- hosts: cisco
  connection: netconf
  roles:
    - { role: cisco/136_lab }

- hosts: nokia
  connection: netconf
  roles:
    - { role: nokia/136_lab }
...

We have used more or less the same structure in all of our Ansible playbooks with roles.

The "group_vars" file for Cisco IOS XR contains the authentication data and some other general parameters:

$ cat group_vars/cisco/cisco_host.yml
---
ansible_network_os: iosxr
ansible_user: cisco
ansible_pass: cisco
ansible_ssh_pass: cisco
...

Before we go deeper into the details of the playbooks, let's take a look at the "vars" file for the Cisco IOS XR based network function XR3:
$ cat roles/cisco/136_lab/vars/XR3_infra_profile.yml
---
node:
  hostname: XR3
  vendor: cisco
  os: iosxr
  version: 6.1.2
  configuration_profiles:
    - profile: interfaces
    - profile: routing_ospf
    - profile: routing_bgp
...

As mentioned above, here you can see the different configuration profiles, which are used for checking the telemetry.

In reality, the initial configuration was also done with these configuration profiles, based on configuration YANG modules, but this is out of scope for this article.

Now it's time to take a look at the workhorses of this Ansible telemetry automation, that is, the actual playbooks with tasks. Let's start with "main.yml":

$ cat roles/cisco/136_lab/tasks/main.yml
---
- name: VERIFICATION // {{ inventory_hostname }} // IMPORTING INFRASTRUCTURE PROFILE
  include_vars:
    file: "{{ inventory_hostname }}_infra_profile.yml"
    name: PROFILE

- name: VERIFICATION // {{ inventory_hostname }} // DELETE PREVIOUS TEST REPORT
  file:
    dest: /tmp/{{ inventory_hostname }}_test_report.txt
    state: absent
  ignore_errors: yes

- name: VERIFICATION // {{ inventory_hostname }} // CREATING TEST REPORT
  file:
    dest: /tmp/{{ inventory_hostname }}_test_report.txt
    state: touch

- name: VERIFICATION // {{ inventory_hostname }} // COLLECTING TELEMETRY AND SEARCHING DATA
  include_tasks: comparing_loop.yml
  loop: "{{ PROFILE.node.configuration_profiles }}"

- name: VERIFICATION // {{ inventory_hostname }} // COMPILING REPORT
  shell: "cat /tmp/temp_report_136_{{ inventory_hostname }}_* > /tmp/{{ inventory_hostname }}_test_report.txt"

- name: VERIFICATION // {{ inventory_hostname }} // REPORTING READINESS
  debug:
    msg: "Collection of telemetry data from {{ inventory_hostname }} is done."
...
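The "COMPILING REPORT" task relies on the shell glob temp_report_136_XR3_* to stitch the per-profile files together. The shell expands such a pattern in sorted order, so the final report should list the profiles alphabetically (interfaces, routing_bgp, routing_ospf) rather than in the order of configuration_profiles. The Python sketch below reproduces that step in a temporary directory to show the effect. The function name and the file contents are placeholders, not part of the playbook.

```python
import glob
import os
import tempfile


def compile_report(workdir, hostname, profiles):
    """Write one temp report per profile, then concatenate them like
    `cat <dir>/temp_report_136_<host>_* > <dir>/<host>_test_report.txt`."""
    for profile in profiles:
        path = os.path.join(workdir, f"temp_report_136_{hostname}_{profile}.txt")
        with open(path, "w") as handle:
            handle.write(f"report for {profile}\n")

    # glob.glob() gives no ordering guarantee; sorted() mimics the shell's
    # sorted expansion of "*" (exactly so in the C locale).
    pattern = os.path.join(glob.escape(workdir), f"temp_report_136_{hostname}_*")
    report_path = os.path.join(workdir, f"{hostname}_test_report.txt")
    with open(report_path, "w") as report:
        for part in sorted(glob.glob(pattern)):
            with open(part) as handle:
                report.write(handle.read())
    return report_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = compile_report(tmp, "XR3", ["interfaces", "routing_ospf", "routing_bgp"])
        with open(path) as handle:
            print(handle.read())
```

If the report order matters, one alternative is to concatenate the files by iterating over configuration_profiles instead of relying on the glob.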

Take a look at the article about Ansible roles if you have questions about this structure.

In the playbook above we do the following actions:

You might have spotted that the crucial part is concentrated in the 4th task, where we collect telemetry using another playbook; hence the obvious next step is to review that playbook:

$ cat roles/cisco/136_lab/tasks/comparing_loop.yml
---
- name: VERIFICATION // {{ inventory_hostname }} // COLLECTING TELEMETRY AND SEARCHING DATA // FETCHING TELEMETRY DATA
  netconf_get:
    filter: "{{ lookup ('template', 'netconf_request.j2') }}"
    display: json
  register: output_json

- name: VERIFICATION // {{ inventory_hostname }} // COLLECTING TELEMETRY AND SEARCHING DATA // SAVING TELEMETRY DATA
  copy:
    content: "{{ output_json.output | to_nice_json }}"
    dest: /tmp/{{ inventory_hostname }}/{{ inventory_hostname }}_{{ item.profile }}_yang_telemetry.json

- name: VERIFICATION // {{ inventory_hostname }} // COLLECTING TELEMETRY AND SEARCHING DATA // MODIFIICATION OF COLLECTED TELEMETRY FOR PYTHON PROCESSING
  replace:
    path: /tmp/{{ inventory_hostname }}/{{ inventory_hostname }}_{{ item.profile }}_yang_telemetry.json
    regexp: '-'
    replace: '_'

- name: VERIFICATION // {{ inventory_hostname }} // COLLECTING TELEMETRY AND SEARCHING DATA // IMPORTING COLLECTED TELEMETRY DATA
  include_vars:
    file: /tmp/{{ inventory_hostname }}/{{ inventory_hostname }}_{{ item.profile }}_yang_telemetry.json
    name: COLLECTED

- name: VERIFICATION // {{ inventory_hostname }} // COLLECTING TELEMETRY AND SEARCHING DATA // COMPILING {{ item.profile }}
  template:
    src: cisco_telemetry_{{ item.profile }}.j2
    dest: /tmp/temp_report_136_{{ inventory_hostname }}_{{ item.profile }}.txt
    mode: 0755
...
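The third task rewrites every "-" in the saved JSON file to "_". The reason is that Jinja2 would read an expression such as COLLECTED.rpc_reply.data.interface-properties as a subtraction. Bracket notation, e.g. COLLECTED.rpc_reply.data['interface-properties'], is another way around that. Because the regexp works on the whole file, it also changes values, not only keys: dates or enum values with dashes are altered, and a negative number would even turn into invalid JSON. A possible alternative, sketched below in plain Python and not what this playbook does, is to rename only the keys recursively. Wiring it into Ansible, for example as a custom filter plugin, is left as an assumption outside this article.

```python
import json


def underscore_keys(node):
    """Return a copy of node with '-' replaced by '_' in every dict key.

    Values are left untouched, unlike a textual replace over the whole file.
    """
    if isinstance(node, dict):
        return {key.replace("-", "_"): underscore_keys(value) for key, value in node.items()}
    if isinstance(node, list):
        return [underscore_keys(item) for item in node]
    return node


if __name__ == "__main__":
    # Made-up fragment shaped like an ifmgr-oper reply; the values are illustrative.
    raw = json.loads('{"interface-properties": {"data-nodes": {"offset": -1, "state": "im-state-up"}}}')
    print(json.dumps(underscore_keys(raw)))
```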

This playbook has almost as many tasks as the previous one, so it's quite big. I guess you want to know what exactly we are doing here. Here we go:

As much of the real action depends on the template files, we need to understand them. The first template, "netconf_request.j2", is used to create the proper NETCONF request:

$ cat roles/cisco/136_lab/templates/netconf_request.j2
{% if item.profile == 'interfaces' %}
...
{% elif item.profile == 'routing_ospf' %}
...
{% elif item.profile == 'routing_bgp' %}
...
{% endif %}
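The XML bodies of the three branches are not reproduced above. To illustrate what each branch is meant to render, here is a Python sketch that builds a subtree-filter element from a namespace and a top-level container name. An empty element in a NETCONF subtree filter selects the whole subtree below it. The interfaces entry follows the namespace and container discussed for Cisco-IOS-XR-ifmgr-oper. The OSPF and BGP namespaces are assumptions derived from the Cisco-IOS-XR-ipv4-ospf-oper and Cisco-IOS-XR-ipv4-bgp-oper module names and should be checked against your YANG files. The ospf and bgp container names match the paths used by the reporting templates.

```python
import xml.etree.ElementTree as ET

# profile -> (module namespace, top-level container).
# The OSPF and BGP namespaces are assumptions; verify them in the .yang files.
REQUESTS = {
    "interfaces": ("http://cisco.com/ns/yang/Cisco-IOS-XR-ifmgr-oper", "interface-properties"),
    "routing_ospf": ("http://cisco.com/ns/yang/Cisco-IOS-XR-ipv4-ospf-oper", "ospf"),
    "routing_bgp": ("http://cisco.com/ns/yang/Cisco-IOS-XR-ipv4-bgp-oper", "bgp"),
}


def build_filter(profile):
    """Return an empty top-level container element in its module namespace."""
    namespace, container = REQUESTS[profile]
    element = ET.Element(container, {"xmlns": namespace})
    return ET.tostring(element, encoding="unicode")


if __name__ == "__main__":
    for name in REQUESTS:
        print(name, build_filter(name))
```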

Depending on the name of the configuration profile, we create the proper request. We look up the namespace and the name of the block in the file with the operational YANG module. Here is an example for interfaces:
cat ~/Yang/yang/vendor/cisco/xr/612/Cisco-IOS-XR-ifmgr-oper.yang | grep 'namespace\|container'
namespace "http://cisco.com/ns/yang/Cisco-IOS-XR-ifmgr-oper";
container interfaces {
container interface-dampening {
container interfaces {
container if-dampening {
container nodes {
container show {
container dampening {
container if-handles {
container interfaces {
container interface-properties {
container data-nodes {
container locationviews {
container pq-node-locations {
container system-view {

For the XML schema we take the "namespace" parameter from the corresponding YANG module. For the request itself we take the name of the high-level container we are interested in. In the example above we have 2 high-level containers: "interface-dampening" and "interface-properties". We are interested only in the second one, so we put its name in the request.
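The grep output lists every container, nested or not, so the high-level ones are picked by eye. A rough Python alternative tracks brace depth and keeps only containers declared directly inside the module statement. It is a simplification: it assumes braces never appear inside strings or comments and that the containers sit in the module body itself, so pyang's tree output remains the more reliable source. The YANG text below is a trimmed, hypothetical sample, and the function name is made up.

```python
import re

SAMPLE_YANG = """
module Cisco-IOS-XR-ifmgr-oper {
  namespace "http://cisco.com/ns/yang/Cisco-IOS-XR-ifmgr-oper";
  container interface-dampening {
    container interfaces {
    }
  }
  container interface-properties {
    container data-nodes {
    }
  }
}
"""


def yang_namespace_and_top_containers(text):
    """Return (namespace, [top-level container names]) from YANG module text."""
    ns_match = re.search(r'^\s*namespace\s+"([^"]+)"\s*;', text, re.MULTILINE)
    namespace = ns_match.group(1) if ns_match else None
    top_level = []
    depth = 0
    for line in text.splitlines():
        match = re.match(r"\s*container\s+([\w-]+)\s*\{", line)
        # Depth 1 means "directly inside the module { ... } statement".
        if match and depth == 1:
            top_level.append(match.group(1))
        depth += line.count("{") - line.count("}")
    return namespace, top_level


if __name__ == "__main__":
    print(yang_namespace_and_top_containers(SAMPLE_YANG))
```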

The rest of the templates are used to extract some information from the telemetry. As an example, for interfaces we check the status of the ports:

$ cat roles/cisco/136_lab/templates/cisco_telemetry_interfaces.j2
+------------------------------------------+
|     Checking of the interfaces status    |
+------------------------------------------+

{% for port_current in COLLECTED.rpc_reply.data.interface_properties.data_nodes.data_node.locationviews.locationview.interfaces.interface %}
Port:           {{ port_current.interface_name }}
Status:         {{ port_current.actual_line_state }}

{% endfor %}

===========================================
Verification of interfaces is done
===========================================
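The loop in this template walks a fixed path through COLLECTED after the dashes have been turned into underscores. The same traversal in Python may help when debugging the path. The dig helper plays a role similar to Jinja2's "is defined" checks and returns None as soon as a key is missing. The helper names and the sample reply are made up for illustration; only the key path and the interface_name and actual_line_state fields come from the template.

```python
INTERFACE_PATH = (
    "rpc_reply", "data", "interface_properties", "data_nodes", "data_node",
    "locationviews", "locationview", "interfaces", "interface",
)


def dig(tree, path):
    """Follow path through nested dicts; return None if any key is missing."""
    node = tree
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def nest(path, leaf):
    """Build a nested dict that places leaf at the end of path (test data helper)."""
    for key in reversed(path):
        leaf = {key: leaf}
    return leaf


def port_report(collected):
    """Produce the same Port/Status lines as cisco_telemetry_interfaces.j2."""
    lines = []
    for port in dig(collected, INTERFACE_PATH) or []:
        lines.append(f"{'Port:':<16}{port['interface_name']}")
        lines.append(f"{'Status:':<16}{port['actual_line_state']}")
    return "\n".join(lines)


# Made-up reply; the state values are illustrative only.
SAMPLE = nest(INTERFACE_PATH, [
    {"interface_name": "GigabitEthernet0/0/0/0", "actual_line_state": "im_state_up"},
    {"interface_name": "GigabitEthernet0/0/0/1", "actual_line_state": "im_state_admin_down"},
])

if __name__ == "__main__":
    print(port_report(SAMPLE))
```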

For OSPF routing we check the status of the neighbors:
$ cat roles/cisco/136_lab/templates/cisco_telemetry_routing_ospf.j2
+------------------------------------------+
|       Checking of the OSPF status        |
+------------------------------------------+

{% if COLLECTED.rpc_reply.data.ospf.processes.process.default_vrf.adjacency_information is defined %}
Neighbor:
RID:          {{ COLLECTED.rpc_reply.data.ospf.processes.process.default_vrf.adjacency_information.neighbors.neighbor.neighbor_id }}
IP:           {{ COLLECTED.rpc_reply.data.ospf.processes.process.default_vrf.adjacency_information.neighbors.neighbor.neighbor_address_xr }}
Connected to: {{ COLLECTED.rpc_reply.data.ospf.processes.process.default_vrf.adjacency_information.neighbors.neighbor.interface_name }}
Status:       {{ COLLECTED.rpc_reply.data.ospf.processes.process.default_vrf.adjacency_information.neighbors.neighbor.neighbor_state }}

{% else %}
After software update:
There are no OSPF neighbors detected

{% endif %}

===========================================
Verification of OSPF neighbors is done
===========================================
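This template reads neighbors.neighbor as a single object, which matches this lab with exactly one OSPF neighbor. XML-to-JSON conversion of this kind typically produces an object when an element occurs once and a list when it repeats. With two or more neighbors, these expressions would therefore no longer resolve. The Python sketch below shows a tolerant version. as_list is a hypothetical helper, and the input is a made-up fragment that only reuses the field names from the template.

```python
def as_list(value):
    """Wrap a single object in a list, pass lists through, turn None into []."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def ospf_neighbors(adjacency_information):
    """Return (neighbor_id, address, interface, state) for every neighbor."""
    neighbors = ((adjacency_information or {}).get("neighbors") or {}).get("neighbor")
    return [
        (n["neighbor_id"], n["neighbor_address_xr"], n["interface_name"], n["neighbor_state"])
        for n in as_list(neighbors)
    ]


if __name__ == "__main__":
    # Made-up fragment; "full" is a placeholder state value.
    single = {"neighbors": {"neighbor": {
        "neighbor_id": "10.0.0.11", "neighbor_address_xr": "10.11.33.11",
        "interface_name": "GigabitEthernet0/0/0/0", "neighbor_state": "full"}}}
    print(ospf_neighbors(single))
```

In Jinja2 itself, a similar effect could be achieved by testing whether the value is a mapping before looping; the Python version just makes the two shapes explicit.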

And for the BGP routing, we check the status of the BGP peers:
$ cat roles/cisco/136_lab/templates/cisco_telemetry_routing_bgp.j2
+------------------------------------------+
|        Checking of the BGP status        |
+------------------------------------------+

{% if COLLECTED.rpc_reply.data.bgp.instances is defined %}
Neighbor:
ID:          {{ COLLECTED.rpc_reply.data.bgp.instances.instance.instance_active.default_vrf.neighbors.neighbor.neighbor_address }}
State:       {{ COLLECTED.rpc_reply.data.bgp.instances.instance.instance_active.default_vrf.neighbors.neighbor.connection_state }}
AFI/SAFI:    {% for afi in COLLECTED.rpc_reply.data.bgp.instances.instance.instance_active.default_vrf.neighbors.neighbor.af_data %}{{ afi.af_name }} {% endfor %}

{% else %}
Problem with BGP process on local node ({{ inventory_hostname }})

{% endif %}

===========================================
Verification of BGP neighbors is done
===========================================

When the preparation is done, meaning all the playbooks, variables and templates are defined, we can test them.

Cisco IOS XR // Collecting telemetry via NETCONF/YANG with Ansible

For now the names of the variables from the telemetry probably don't tell you much, so let's execute our Ansible playbook to collect the telemetry:

$ ansible-playbook 136_lab.yml --limit=XR3

PLAY [cisco] ***********************************************************************************************************************************************************

TASK [Gathering Facts] *************************************************************************************************************************************************
ok: [XR3]

TASK [cisco/136_lab : VERIFICATION // XR3 // IMPORTING INFRASTRUCTURE PROFILE] *****************************************************************************************
ok: [XR3]

TASK [cisco/136_lab : VERIFICATION // XR3 // DELETE PREVIOUS TEST REPORT] **********************************************************************************************
changed: [XR3]

TASK [cisco/136_lab : VERIFICATION // XR3 // CREATING TEST REPORT] *****************************************************************************************************
changed: [XR3]

TASK [cisco/136_lab : VERIFICATION // XR3 // COLLECTING TELEMETRY AND SEARCHING DATA] **********************************************************************************
included: /home/aaa/ansible/roles/cisco/136_lab/tasks/comparing_loop.yml for XR3
included: /home/aaa/ansible/roles/cisco/136_lab/tasks/comparing_loop.yml for XR3
included: /home/aaa/ansible/roles/cisco/136_lab/tasks/comparing_loop.yml for XR3

TASK [cisco/136_lab : VERIFICATION // XR3 // COLLECTING TELEMETRY AND SEARCHING DATA // FETCHING TELEMETRY DATA] *******************************************************
ok: [XR3]

TASK [cisco/136_lab : VERIFICATION // XR3 // COLLECTING TELEMETRY AND SEARCHING DATA // SAVING TELEMETRY DATA] *********************************************************
changed: [XR3]

TASK [cisco/136_lab : VERIFICATION // XR3 // COLLECTING TELEMETRY AND SEARCHING DATA // MODIFIICATION OF COLLECTED TELEMETRY FOR PYTHON PROCESSING] ********************
changed: [XR3]

TASK [cisco/136_lab : VERIFICATION // XR3 // COLLECTING TELEMETRY AND SEARCHING DATA // IMPORTING COLLECTED TELEMETRY DATA] ********************************************
ok: [XR3]

TASK [cisco/136_lab : VERIFICATION // XR3 // COLLECTING TELEMETRY AND SEARCHING DATA // COMPILING interfaces] **********************************************************
ok: [XR3]

TASK [cisco/136_lab : VERIFICATION // XR3 // COLLECTING TELEMETRY AND SEARCHING DATA // FETCHING TELEMETRY DATA] *******************************************************
ok: [XR3]

TASK [cisco/136_lab : VERIFICATION // XR3 // COLLECTING TELEMETRY AND SEARCHING DATA // SAVING TELEMETRY DATA] *********************************************************
changed: [XR3]

TASK [cisco/136_lab : VERIFICATION // XR3 // COLLECTING TELEMETRY AND SEARCHING DATA // MODIFIICATION OF COLLECTED TELEMETRY FOR PYTHON PROCESSING] ********************
changed: [XR3]

TASK [cisco/136_lab : VERIFICATION // XR3 // COLLECTING TELEMETRY AND SEARCHING DATA // IMPORTING COLLECTED TELEMETRY DATA] ********************************************
ok: [XR3]

TASK [cisco/136_lab : VERIFICATION // XR3 // COLLECTING TELEMETRY AND SEARCHING DATA // COMPILING routing_ospf] ********************************************************
changed: [XR3]

TASK [cisco/136_lab : VERIFICATION // XR3 // COLLECTING TELEMETRY AND SEARCHING DATA // FETCHING TELEMETRY DATA] *******************************************************
ok: [XR3]

TASK [cisco/136_lab : VERIFICATION // XR3 // COLLECTING TELEMETRY AND SEARCHING DATA // SAVING TELEMETRY DATA] *********************************************************
changed: [XR3]

TASK [cisco/136_lab : VERIFICATION // XR3 // COLLECTING TELEMETRY AND SEARCHING DATA // MODIFIICATION OF COLLECTED TELEMETRY FOR PYTHON PROCESSING] ********************
changed: [XR3]

TASK [cisco/136_lab : VERIFICATION // XR3 // COLLECTING TELEMETRY AND SEARCHING DATA // IMPORTING COLLECTED TELEMETRY DATA] ********************************************
ok: [XR3]

TASK [cisco/136_lab : VERIFICATION // XR3 // COLLECTING TELEMETRY AND SEARCHING DATA // COMPILING routing_bgp] *********************************************************
changed: [XR3]

TASK [cisco/136_lab : VERIFICATION // XR3 // COMPILING REPORT] *********************************************************************************************************
changed: [XR3]

TASK [cisco/136_lab : VERIFICATION // XR3 // REPORTING READINESS] ******************************************************************************************************
ok: [XR3] => {
    "msg": "Collection of telemetry data from XR3 is done."
}

PLAY [nokia] ***********************************************************************************************************************************************************
skipping: no hosts matched

PLAY RECAP *************************************************************************************************************************************************************
XR3                        : ok=24   changed=11   unreachable=0    failed=0
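For every profile, the log shows a modification step that prepares the collected telemetry "for Python processing". Judging by the saved JSON files, the keys read `rpc_reply` or `data_node_name`, whereas NETCONF and YANG names use hyphens (`rpc-reply`). This step therefore appears to replace hyphens in the keys with underscores so that they can be used as attribute names. The Python function below is a hypothetical sketch of that idea. The name `normalize_keys` is mine, and the playbook performs this step with its own Ansible tasks, which may work differently:

```python
import json
from typing import Any


def normalize_keys(node: Any) -> Any:
    """Recursively replace '-' with '_' in every dictionary key.

    Hypothetical helper: it only illustrates making YANG/NETCONF names
    such as 'rpc-reply' usable as Python identifiers.
    """
    if isinstance(node, dict):
        return {key.replace("-", "_"): normalize_keys(value) for key, value in node.items()}
    if isinstance(node, list):
        return [normalize_keys(item) for item in node]
    # Leaf values (strings, numbers, None) are returned unchanged
    return node


if __name__ == "__main__":
    raw = {"rpc-reply": {"data": {"interface-properties": {"data-nodes": {}}}}}
    print(json.dumps(normalize_keys(raw)))
    # {"rpc_reply": {"data": {"interface_properties": {"data_nodes": {}}}}}
```

Only the keys are rewritten. Values such as `0/0/CPU0` or an interface name like `Bundle-Ether1` pass through untouched, which matters because the values may legitimately contain hyphens.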

Once the playbook has finished, we can check the results:
$ ls -l /tmp/XR3 | grep 'XR3'
-rw-rw-r--. 1 aaa aaa  17166 Oct  2 07:55 XR3_interfaces_yang_telemetry.json
-rw-rw-r--. 1 aaa aaa 102808 Oct  2 07:56 XR3_routing_bgp_yang_telemetry.json
-rw-rw-r--. 1 aaa aaa 237501 Oct  2 07:56 XR3_routing_ospf_yang_telemetry.json

These three files contain the telemetry data we requested: interfaces, OSPF and BGP. To be honest, they are quite large, especially the OSPF and BGP ones, so here is just a snippet of the interface-level telemetry:
$ cat /tmp/XR3/XR3_interfaces_yang_telemetry.json
{
  "rpc_reply": {
    "data": {
      "interface_properties": {
        "data_nodes": {
          "data_node": {
            "data_node_name": "0/0/CPU0",
            "locationviews": {
              "locationview": {
                "interfaces": {
                  "interface": [
                    {
                      "actual_line_state": "im_state_up",
                      "actual_state": "im_state_up",
                      "bandwidth": "0",
                      "encapsulation": "fint_base",
                      "encapsulation_type_string": "FINT_BASE_CAPS",
                      "interface": "FINT0/0/CPU0",
                      "interface_name": "FINT0/0/CPU0",
                      "l2_transport": "false",
                      "line_state": "im_state_up",
                      "mtu": "8000",
                      "state": "im_state_up",
                      "sub_interface_mtu_overhead": "0",
                      "type": "IFT_FINT_INTF"
                    },
                    {
                      "actual_line_state": "im_state_up",
                      "actual_state": "im_state_up",
                      "bandwidth": "1000000",
                      "encapsulation": "ether",
                      "encapsulation_type_string": "ARPA",
                      "interface": "GigabitEthernet0/0/0/0",
                      "interface_name": "GigabitEthernet0/0/0/0",
                      "l2_transport": "false",
                      "line_state": "im_state_up",
                      "mtu": "1514",
                      "state": "im_state_up",
                      "sub_interface_mtu_overhead": "0",
                      "type": "IFT_GETHERNET"
                    },
                    {
                      "actual_line_state": "im_state_admin_down",
                      "actual_state": "im_state_admin_down",
                      "bandwidth": "1000000",
                      "encapsulation": "ether",
                      "encapsulation_type_string": "ARPA",
                      "interface": "GigabitEthernet0/0/0/1",
                      "interface_name": "GigabitEthernet0/0/0/1",
                      "l2_transport": "false",
                      "line_state": "im_state_admin_down",
                      "mtu": "1514",
                      "state": "im_state_admin_down",
                      "sub_interface_mtu_overhead": "0",
                      "type": "IFT_GETHERNET"
                    },
!
! FURTHER OUTPUT IS OMITTED

Virtually all the information you can collect with various show commands is available in the telemetry in JSON format. For OSPF and BGP there is much more information to review; for OSPF, the telemetry output even contains all the LSAs from the LSDB.
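The interface part of the test report comes down to walking the nested structure shown above, from `rpc_reply` down to the `interface` list, and reading `interface_name` and `state` for every entry. Here is a minimal Python sketch that assumes the file has exactly the layout of the snippet. The function names `as_list` and `interface_states` are hypothetical. XML-to-JSON converters commonly emit a single object instead of a one-element list, so the sketch accepts both forms at every repeated level:

```python
import json
from typing import Any, List, Tuple


def as_list(value: Any) -> List[Any]:
    """Return value as a list: None -> [], a single object -> [object]."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def interface_states(telemetry: dict) -> List[Tuple[str, str]]:
    """Collect (interface_name, state) pairs from the interface telemetry layout shown above."""
    result: List[Tuple[str, str]] = []
    data_nodes = telemetry["rpc_reply"]["data"]["interface_properties"]["data_nodes"]
    for data_node in as_list(data_nodes.get("data_node")):
        for view in as_list(data_node["locationviews"].get("locationview")):
            for interface in as_list(view["interfaces"].get("interface")):
                result.append((interface["interface_name"], interface["state"]))
    return result


if __name__ == "__main__":
    # Inline sample with the same nesting as /tmp/XR3/XR3_interfaces_yang_telemetry.json
    sample = json.loads("""
    {"rpc_reply": {"data": {"interface_properties": {"data_nodes": {"data_node": {
      "data_node_name": "0/0/CPU0",
      "locationviews": {"locationview": {"interfaces": {"interface": [
        {"interface_name": "GigabitEthernet0/0/0/0", "state": "im_state_up"},
        {"interface_name": "GigabitEthernet0/0/0/1", "state": "im_state_admin_down"}
      ]}}}}}}}}}
    """)
    for name, state in interface_states(sample):
        print(f"{name}: {state}")
    # GigabitEthernet0/0/0/0: im_state_up
    # GigabitEthernet0/0/0/1: im_state_admin_down
```

With the real file, you would pass the result of `json.load()` on `/tmp/XR3/XR3_interfaces_yang_telemetry.json` instead of the inline sample.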

The last point of the Cisco verification is to look at the test report, where we check the status of the interfaces, the BGP neighbors and the OSPF neighbors.

As we explained earlier, we create the test report per configuration profile and then merge the parts together:

$ cat /tmp/XR3_test_report.txt
+------------------------------------------+
|     Checking of the interfaces status    |
+------------------------------------------+

Port:           FINT0/0/CPU0
Status:         im_state_up

Port:           GigabitEthernet0/0/0/0
Status:         im_state_up

Port:           GigabitEthernet0/0/0/1
Status:         im_state_admin_down

Port:           Loopback0
Status:         im_state_up

Port:           MgmtEth0/0/CPU0/0
Status:         im_state_up

Port:           Null0
Status:         im_state_up

Port:           nV_Loopback0
Status:         im_state_up

Port:           nV_Loopback1
Status:         im_state_up

===========================================
Verification of interfaces is done
===========================================
+------------------------------------------+
|        Checking of the BGP status        |
+------------------------------------------+

Problem with BGP process on local node (XR3)

===========================================
Verification of BGP neighbors is done
===========================================

+------------------------------------------+
|        Checking of the OSPF status       |
+------------------------------------------+

Neighbor:
RID: 10.0.0.11
IP: 10.11.33.11
Connected to: GigabitEthernet0/0/0/0
Status: mgmt_nbr_full

===========================================
Verification of OSPF neighbors is done
===========================================
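The report consists of per-profile parts. Each part opens with a boxed title and closes with a banner, and the parts are then concatenated. The Python sketch below approximates that layout for the interface part and merges the parts into one text. The function names and the box and banner widths are assumptions read off the output above; the role itself compiles the report with Ansible tasks, so the spacing may not match byte for byte:

```python
from typing import Iterable, List, Tuple

BOX_WIDTH = 42      # inner width of the title box (assumption based on the report above)
BANNER = "=" * 43   # closing banner width (approximation)


def boxed_title(title: str) -> List[str]:
    """Return the three lines of a '+---+' / '| title |' / '+---+' box."""
    border = "+" + "-" * BOX_WIDTH + "+"
    return [border, "|" + title.center(BOX_WIDTH) + "|", border]


def interface_section(states: Iterable[Tuple[str, str]]) -> str:
    """Render the interface part of the report from (name, state) pairs."""
    lines = boxed_title("Checking of the interfaces status") + [""]
    for name, state in states:
        # Labels are left-aligned in a 16-character column, as in the report
        lines += [f"{'Port:':<16}{name}", f"{'Status:':<16}{state}", ""]
    lines += [BANNER, "Verification of interfaces is done", BANNER]
    return "\n".join(lines)


def merge_sections(sections: Iterable[str]) -> str:
    """Concatenate per-profile parts into one report text ending with a newline."""
    return "\n".join(sections) + "\n"


if __name__ == "__main__":
    report = merge_sections([
        interface_section([("GigabitEthernet0/0/0/0", "im_state_up"),
                           ("GigabitEthernet0/0/0/1", "im_state_admin_down")]),
    ])
    print(report, end="")
```

Keeping each part as a separate string mirrors the per-profile approach: a profile can be added or dropped without touching the others, and the merge step stays a plain concatenation.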

You have probably spotted the BGP problem in the output. The reason is that 2 GB of RAM in my virtual lab is not sufficient for telemetry: the NETCONF/YANG request that collects the BGP telemetry data pushes the memory of the virtual network function XR3 into a critical state, and the BGP process breaks afterwards. With more memory available, the collection works fine. So, that's what model-driven telemetry looks like on Cisco IOS XR. Here you can download the Ansible playbooks for this article: 136_lab.tar

Lessons learned

As we have learned, collecting telemetry data via NETCONF is quite a resource-intensive task, so it shouldn't be used this way in production. We advise you to use gRPC/gNMI to stream telemetry data instead.

Conclusion

For the sake of brevity I have shortened the output of the commands, but the amount of information contained in the telemetry is simply overwhelming. It's really the new oil of networking, as this telemetry information allows you to build any kind of business logic. Later on we'll review Kafka, which will give you a much better understanding of how the whole solution works, but even now you can assess the amount of information available in model-driven telemetry. In the next article we'll take a look at model-driven telemetry in Nokia SR OS. Take care and goodbye!







P.S.

If you have further questions or need help with your networks, I'm happy to assist you; just send me a message (https://karneliuk.com/contact/). Also, don't forget to share the article on your social media if you like it.

BR,

Anton Karneliuk
