Hello my friend,

We have spent some time configuring network functions over NETCONF/YANG, using both native vendor models and OpenConfig. Now the time has come to talk about the information we can extract from a router in YANG data models, which is also called model-driven telemetry.

Brief description

Nowadays telemetry is a hot topic at network events, conferences and vendor presentations to customers. As usual, without knowing what it is, how it can be used and what benefit it brings, such discussions are nothing more than hype. So in this article we’ll try to discover what telemetry means, with examples in a multivendor environment with Cisco IOS XR and Nokia (Alcatel-Lucent) SR OS.

For a long time, SNMP was the primary source of information about the state of the network. It has evolved a lot since its first release, and the current version 3 even has built-in security based on encryption and hashing. Nevertheless, not all information about network devices is available through SNMP, which is why some of it is still gathered through show commands and output parsing, which isn’t a flexible approach. On the other hand, the new era of networking, where SDN comes onto the scene, requires much more information from the network about its actual state; this information must be available to the SDN controller or network analytics instantly and at any time.

And that’s where model-driven telemetry can help a lot. Model-driven means that telemetry is based on YANG modules, which complement the configuration modules we have used a lot previously. It means that virtually all information about the network state should be available through operational YANG modules.

As you’ve already learned, there are different types of YANG modules available: vendor-native and vendor-neutral (like OpenConfig). In this article we’ll focus on the vendor-native modules for Cisco IOS XR and Nokia (Alcatel-Lucent) SR OS, as that is the natural first step in telemetry. In the following articles we’ll review the telemetry available in OpenConfig YANG modules as well.

In terms of the transport protocol used for model-driven telemetry, there are different options. The most popular deployment is based on gRPC (or, more recently, gNMI, which itself runs over gRPC); gRPC uses HTTP/2 and fits the transmission of structured data well. Another option is to use TCP directly, but for a cross-vendor implementation gRPC/gNMI looks more favorable.

Nevertheless, in our examples you will see NETCONF, though NETCONF itself isn’t supposed to be the transport for telemetry. The reason is that we aren’t building a telemetry collector yet, but rather focusing on the data collection itself. We’ll probably review the streaming of telemetry data in upcoming articles.

What are we going to test?

We are going to collect operational data from the network functions using their native YANG modules for:

  • Nokia (Alcatel-Lucent) SR OS 16.0.R3
  • Cisco IOS XR 6.1.2

We’ll use the “netconf_get” Ansible module for that.

Software version

The following software components are used in this lab:

  • CentOS 7 with python 2.7.
  • Ansible 2.6.4
  • Nokia SR OS 16.0.R3 [guest VNF]
  • Cisco IOS XR 6.1.2 [guest VNF]

See the previous article for details on how to build the lab.

Topology

We are using our standard topology:

The logical topology is quite simple: we have just back-to-back connectivity between the Nokia SR OS based VNF SR1 and the Cisco IOS XR based VNF XR3. It is expressed in ASCII below:

    +----------------+    10.11.33.0/24     +----------------+
    |      SR1       +----------------------+       XR3      |
    +-------+--------+ 1/1/c1/1    g0/0/0/0 +--------+-------+
            |          .11              .33          |
            | system: 10.0.0.11/32                   | loopback0: 10.0.0.33/32

                      <---OSPF:area0/p2p--->
            <-------------BGP:VPNv4/VPNv6------------>

 

You can find the initial configuration in the attached files: 136_config_initial_XR3 136_config_initial_SR1

Brief topology check

From the configuration perspective, we have configured just 3 topics:

  • Interfaces;
  • OSPF including Segment Routing;
  • BGP for VPNv4/VPNv6 unicast address families;

This configuration set is very basic on the one hand; on the other hand, it forms the basis of a modern service provider network capable of providing all kinds of IP/MPLS services to customers. We won’t explain the details, as each of them was covered in detail earlier in a multivendor setup with Nokia (Alcatel-Lucent) SR OS and Cisco IOS XR: interfaces, OSPF, Segment Routing and BGP IP VPNs. Here we’ll just review some operational commands.

We start with the status of the ports and interfaces on Nokia (Alcatel-Lucent) SR OS based network function SR1:

[]
A:admin@SR1# show port

===============================================================================
Ports on Slot 1
===============================================================================
Port          Admin Link Port    Cfg  Oper LAG/ Port Port Port   C/QS/S/XFP/
Id            State      State   MTU  MTU  Bndl Mode Encp Type   MDIMDX
-------------------------------------------------------------------------------
1/1/c1        Up         Link Up                          conn   100GBASE-LR4*
1/1/c1/1      Up    Yes  Up      1514 1514    - netw null xgige  
1/1/c1/2      Down  No   Down    9212 9212    - netw null xgige  
1/1/c1/3      Down  No   Down    9212 9212    - netw null xgige  
1/1/c1/4      Down  No   Down    9212 9212    - netw null xgige  
1/1/c2        Up         Link Up                          conn   100GBASE-LR4*
1/1/c2/1      Down  No   Down    9212 9212    - netw null xgige  
1/1/c2/2      Down  No   Down    9212 9212    - netw null xgige  
1/1/c2/3      Down  No   Down    9212 9212    - netw null xgige  
1/1/c2/4      Down  No   Down    9212 9212    - netw null xgige  
1/1/c3        Down       Down                             conn   100GBASE-LR4*
1/1/c4        Down       Down                             conn   100GBASE-LR4*
1/1/c5        Down       Down                             conn   100GBASE-LR4*
1/1/c6        Down       Down                             conn   100GBASE-LR4*

===============================================================================
Ports on Slot A
===============================================================================
Port          Admin Link Port    Cfg  Oper LAG/ Port Port Port   C/QS/S/XFP/
Id            State      State   MTU  MTU  Bndl Mode Encp Type   MDIMDX
-------------------------------------------------------------------------------
A/1           Up    Yes  Up      1514 1514    - netw null faste  MDI
A/4           Up    No   Ghost   1514 1514    - netw null faste  
===============================================================================


[]
A:admin@SR1# show router interface

===============================================================================
Interface Table (Router: Base)
===============================================================================
Interface-Name                   Adm       Opr(v4/v6)  Mode    Port/SapId
   IP-Address                                                  PfxState
-------------------------------------------------------------------------------
system                           Up        Up/Up       Network system
   10.0.0.11/32                                                n/a
   fc00::10:0:0:11/128                                         PREFERRED
uplink1                          Up        Up/Up       Network 1/1/c1/1
   10.11.33.11/24                                              n/a
   fc00::10:11:33:11/112                                       PREFERRED
   fe80::5054:ff:fe02:201/64                                   PREFERRED
-------------------------------------------------------------------------------
Interfaces : 2
===============================================================================

 

Then we check the OSPF adjacency:

[]
A:admin@SR1# show router ospf neighbor

===============================================================================
Rtr Base OSPFv2 Instance 0 Neighbors
===============================================================================
Interface-Name                   Rtr Id          State      Pri  RetxQ   TTL
   Area-Id
-------------------------------------------------------------------------------
uplink1                          10.0.0.33       Full       1    0       33
   0.0.0.0
-------------------------------------------------------------------------------
No. of Neighbors: 1
===============================================================================

 

Followed by the routing table:

[]
A:admin@SR1# show router route-table

===============================================================================
Route Table (Router: Base)
===============================================================================
Dest Prefix[Flags]                            Type    Proto     Age        Pref
      Next Hop[Interface Name]                                    Metric  
-------------------------------------------------------------------------------
10.0.0.11/32                                  Local   Local     00h04m12s  0
       system                                                       0
10.0.0.33/32                                  Remote  OSPF      00h03m41s  10
       10.11.33.33                                                  11
10.11.33.0/24                                 Local   Local     00h04m12s  0
       uplink1                                                      0
-------------------------------------------------------------------------------
No. of Routes: 3
Flags: n = Number of times nexthop is repeated
       B = BGP backup route available
       L = LFA nexthop available
       S = Sticky ECMP requested
===============================================================================

 

And we finish with the status of BGP peering:

[]
A:admin@SR1# show router bgp summary
===============================================================================
 BGP Router ID:10.0.0.11        AS:65000       Local AS:65000      
===============================================================================
BGP Admin State         : Up          BGP Oper State              : Up
!
! OUTPUT IS OMITTED
!
===============================================================================
BGP Summary
===============================================================================
Legend : D - Dynamic Neighbor
===============================================================================
Neighbor
Description
                   AS PktRcvd InQ  Up/Down   State|Rcv/Act/Sent (Addr Family)
                      PktSent OutQ
-------------------------------------------------------------------------------
10.0.0.33
               65000        7    0 00h00m20s 0/0/0 (VpnIPv4)
                            7    0           0/0/0 (VpnIPv6)
-------------------------------------------------------------------------------

 

For the Cisco IOS XR based network function XR3 we follow the same sequence of verification steps. Here is the status of the interfaces on XR3:

RP/0/0/CPU0:XR3#show ipv4 int br
Sun Sep 30 21:04:49.178 UTC

Interface                      IP-Address      Status          Protocol Vrf-Name
Loopback0                      10.0.0.33       Up              Up       default
MgmtEth0/0/CPU0/0              192.168.1.111   Up              Up       MGMT    
GigabitEthernet0/0/0/0         10.11.33.33     Up              Up       default
GigabitEthernet0/0/0/1         unassigned      Shutdown        Down     default

 

Then we check status of OSPF adjacency:

RP/0/0/CPU0:XR3#show ospf neighbor            
Sun Sep 30 21:05:42.804 UTC

* Indicates MADJ interface
# Indicates Neighbor awaiting BFD session up

Neighbors for OSPF 0

Neighbor ID     Pri   State           Dead Time   Address         Interface
10.0.0.11       1     FULL/  -        00:00:35    10.11.33.11     GigabitEthernet0/0/0/0
    Neighbor is up for 00:02:07

Total neighbor count: 1

 

Then routing table:

RP/0/0/CPU0:XR3#show route ipv4
Sun Sep 30 21:05:27.025 UTC

Codes: C - connected, S - static, R - RIP, B - BGP, (>) - Diversion path
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2, E - EGP
       i - ISIS, L1 - IS-IS level-1, L2 - IS-IS level-2
       ia - IS-IS inter area, su - IS-IS summary null, * - candidate default
       U - per-user static route, o - ODR, L - local, G  - DAGR, l - LISP
       A - access/subscriber, a - Application route
       M - mobile route, r - RPL, (!) - FRR Backup path

Gateway of last resort is not set

O    10.0.0.11/32 [110/1] via 10.11.33.11, 00:01:51, GigabitEthernet0/0/0/0
L    10.0.0.33/32 is directly connected, 00:01:53, Loopback0
C    10.11.33.0/24 is directly connected, 00:01:52, GigabitEthernet0/0/0/0
L    10.11.33.33/32 is directly connected, 00:01:52, GigabitEthernet0/0/0/0

 

And we finish with the BGP peering on XR3:

RP/0/0/CPU0:XR3#show bgp vpnv4 unicast summary
Sun Sep 30 21:05:12.386 UTC
BGP router identifier 10.0.0.33, local AS number 65000
BGP generic scan interval 60 secs
Non-stop routing is enabled
BGP table state: Active
Table ID: 0x0   RD version: 0
BGP main routing table version 1
BGP NSR Initial initsync version 1 (Reached)
BGP NSR/ISSU Sync-Group versions 0/0
BGP scan interval 60 secs

BGP is operating in STANDALONE mode.


Process       RcvTblVer   bRIB/RIB   LabelVer  ImportVer  SendTblVer  StandbyVer
Speaker               1          1          1          1           1           0

Neighbor        Spk    AS MsgRcvd MsgSent   TblVer  InQ OutQ  Up/Down  St/PfxRcd
10.0.0.11         0 65000       6       7        1    0    0 00:01:35          0

 

As you can see, the adjacency is established for both OSPF and BGP. Actually, we could have omitted the routing table check, as the BGP check implicitly includes it: if the IPv4 addresses of the loopbacks aren’t propagated properly, the BGP session won’t come up.

Now that we have verified the state of the network, we will check the same parameters using operational YANG models; that is, we’ll go for telemetry data.

Cisco IOS XR // Operational YANG modules

In one of the OpenConfig articles we collected all the YANG modules available on Cisco IOS XR devices. Besides the OpenConfig YANG modules we have used, there are also a lot of native modules, which are either configuration modules or operational ones. The operational ones have the word “oper” in their names, as in the example below:

$ ls -l ~/Yang/yang/vendor/cisco/xr/651/ | grep 'oper' | more
-rw-rw-r--. 1 aaa aaa  28159 Sep 22 17:53 Cisco-IOS-XR-aaa-diameter-oper-sub1.yang
-rw-rw-r--. 1 aaa aaa   2969 Sep 22 17:53 Cisco-IOS-XR-aaa-diameter-oper.yang
-rw-rw-r--. 1 aaa aaa   5728 Sep 22 17:53 Cisco-IOS-XR-aaa-locald-oper-sub1.yang
-rw-rw-r--. 1 aaa aaa   2982 Sep 22 17:53 Cisco-IOS-XR-aaa-locald-oper.yang
-rw-rw-r--. 1 aaa aaa   4166 Sep 22 17:53 Cisco-IOS-XR-aaa-nacm-oper-sub1.yang
-rw-rw-r--. 1 aaa aaa   2915 Sep 22 17:53 Cisco-IOS-XR-aaa-nacm-oper.yang
-rw-rw-r--. 1 aaa aaa  13380 Sep 22 17:53 Cisco-IOS-XR-aaa-protocol-radius-oper-sub1.yang
-rw-rw-r--. 1 aaa aaa  15813 Sep 22 17:53 Cisco-IOS-XR-aaa-protocol-radius-oper-sub2.yang
-rw-rw-r--. 1 aaa aaa   5523 Sep 22 17:53 Cisco-IOS-XR-aaa-protocol-radius-oper.yang
-rw-rw-r--. 1 aaa aaa   4891 Sep 22 17:53 Cisco-IOS-XR-aaa-tacacs-oper-sub1.yang
-rw-rw-r--. 1 aaa aaa   1504 Sep 22 17:53 Cisco-IOS-XR-aaa-tacacs-oper.yang
!
! FURTHER OUTPUT IS OMITTED

 

To understand what is inside, we can take one of the modules and render it using “pyang”:

$ pyang -f tree -p ~/Yang/yang/vendor/cisco/xr/612/ ~/Yang/yang/vendor/cisco/xr/612/Cisco-IOS-XR-ifmgr-oper.yang
/home/aaa/Yang/yang/vendor/cisco/xr/612/Cisco-IOS-XR-ifmgr-oper-sub2.yang:9: warning: imported module Cisco-IOS-XR-types not used
module: Cisco-IOS-XR-ifmgr-oper
  +--ro interface-dampening
  |  +--ro interfaces
  |  |  +--ro interface* [interface-name]
  |  |     +--ro if-dampening
  |  |     |  +--ro interface-dampening
  |  |     |  |  +--ro penalty?                 uint32
  |  |     |  |  +--ro is-suppressed-enabled?   boolean
  |  |     |  |  +--ro seconds-remaining?       uint32
  |  |     |  |  +--ro flaps?                   uint32
  |  |     |  |  +--ro state?                   Im-state-enum
  |  |     |  +--ro state-transition-count?       uint32
  |  |     |  +--ro last-state-transition-time?   uint32
  |  |     |  +--ro is-dampening-enabled?         boolean
  |  |     |  +--ro half-life?                    uint32
  |  |     |  +--ro reuse-threshold?              uint32
  |  |     |  +--ro suppress-threshold?           uint32
  |  |     |  +--ro maximum-suppress-time?        uint32
  |  |     |  +--ro restart-penalty?              uint32
  |  |     |  +--ro capsulation* []
  |  |     |     +--ro capsulation-dampening
  |  |     |     |  +--ro penalty?                 uint32
  |  |     |     |  +--ro is-suppressed-enabled?   boolean
  |  |     |     |  +--ro seconds-remaining?       uint32
  |  |     |     |  +--ro flaps?                   uint32
  |  |     |     |  +--ro state?                   Im-state-enum
  |  |     |     +--ro capsulation-number?      string
  |  |     +--ro interface-name    xr:Interface-name
  |  +--ro nodes
!
! FURTHER OUTPUT IS OMITTED

 

As you might spot, all the nodes in this YANG module have the “ro” attribute, meaning they are read-only: we can only read data from them, not configure them. For each configuration YANG module there is typically an operational module available.
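
One quick way to confirm that a module is purely operational is to count the access flags in the pyang tree output. The following is a minimal Python sketch and not part of the original lab: the function name count_access_flags is hypothetical, and it assumes the tree text uses the “+--ro” / “+--rw” markers shown in the output above.

```python
import re
from collections import Counter

# Matches the access flag that pyang prints before each node name,
# e.g. "+--ro penalty?" or "+--rw enabled?".
FLAG_RE = re.compile(r"\+--(ro|rw)\s")


def count_access_flags(tree_text):
    """Count read-only and read-write nodes in `pyang -f tree` output."""
    return Counter(FLAG_RE.findall(tree_text))


# A shortened copy of the tree shown above, used only for illustration.
sample = """\
module: Cisco-IOS-XR-ifmgr-oper
  +--ro interface-dampening
  |  +--ro interfaces
  |  |  +--ro interface* [interface-name]
  |  |     +--ro interface-name    xr:Interface-name
"""

flags = count_access_flags(sample)
print(dict(flags))
```

If the counter reports any “rw” nodes, the module contains configurable data and is not a pure operational module.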

Cisco IOS XR // Model-driven telemetry algorithm

As we said in the beginning, we’ll try to collect information about the network state from YANG operational modules using NETCONF. To do that we’ll use the “netconf_get” Ansible module, which we have covered earlier. The algorithm is quite straightforward:

  • We define so-called “configuration profiles” we’d like to collect telemetry data about. In our case they are interfaces, OSPF routing and BGP routing
  • For each profile we have a dedicated NETCONF request, as Cisco IOS XR uses different YANG modules for different operational data
  • We fetch telemetry data using NETCONF/YANG
  • We parse the data to extract some information

The last point is a bit synthetic in this article, as in a real telemetry scenario you would use a message bus such as Kafka to store the data, so you wouldn’t need to parse it this way. But we’ll talk about Kafka in a separate article.
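
Before turning to Ansible, the four steps above can be sketched in a few lines of Python. This is only an illustration of the control flow, not the implementation used in the lab: the names REQUESTS, build_filter and collect are hypothetical, the fetch and parse functions are injected stand-ins rather than a real NETCONF client, and the container names and namespaces are the ones that appear in the request template shown later in this article.

```python
# Profile name -> (top-level container, YANG namespace).
REQUESTS = {
    "interfaces": ("interface-properties",
                   "http://cisco.com/ns/yang/Cisco-IOS-XR-ifmgr-oper"),
    "routing_ospf": ("ospf",
                     "http://cisco.com/ns/yang/Cisco-IOS-XR-ipv4-ospf-oper"),
    "routing_bgp": ("bgp",
                    "http://cisco.com/ns/yang/Cisco-IOS-XR-ipv4-bgp-oper"),
}


def build_filter(profile):
    """Step 2: pick the NETCONF subtree filter that belongs to a profile."""
    container, namespace = REQUESTS[profile]
    return '<{0} xmlns="{1}"/>'.format(container, namespace)


def collect(profiles, fetch, parse):
    """Steps 2-4 for every profile; `fetch` and `parse` are supplied by the caller."""
    report = {}
    for profile in profiles:
        request = build_filter(profile)
        raw = fetch(request)           # step 3: would be a NETCONF <get>
        report[profile] = parse(raw)   # step 4: extract what we care about
    return report


# Stand-ins so the sketch runs without a router.
def fake_fetch(request):
    return {"request": request}


def fake_parse(raw):
    return raw


result = collect(["interfaces", "routing_ospf"], fake_fetch, fake_parse)
print(sorted(result))
```

Keeping the transport behind a single fetch function is a deliberate choice in this sketch: the same loop would work unchanged if the data later came from a streaming collector instead of NETCONF.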

Cisco IOS XR // Ansible playbook for telemetry collection

To implement the algorithm above, our role-based playbook has the following structure:

+--ansible
   +--136_lab.yml
   +--group_vars
   |  +--cisco
   |  |  +--cisco_host.yml
   |  +--nokia
   |     +--nokia_host.yml
   +--roles
      +--cisco
      |  +--136_lab
      |     +--tasks
      |     |  +--comparing_loop.yml
      |     |  +--main.yml
      |     +--templates
      |     |  +--cisco_telemetry_interfaces.j2
      |     |  +--cisco_telemetry_routing_bgp.j2
      |     |  +--cisco_telemetry_routing_ospf.j2
      |     |  +--netconf_request.j2
      |     +--vars
      |        +--XR3_infra_profile.yml
      +--nokia
         +--136_lab
            +--tbd

 

The master Ansible playbook has a very simple structure:

$ cat 136_lab.yml
---
- hosts: cisco
  connection: netconf
  roles:
    - { role: cisco/136_lab }

- hosts: nokia
  connection: netconf
  roles:
    - { role: nokia/136_lab }
...

 

We have used more or less the same structure in all of our Ansible playbooks with roles.

The “group_vars” for Cisco IOS XR contain authentication data and some other general parameters:

$ cat group_vars/cisco/cisco_host.yml
---
ansible_network_os: iosxr
ansible_user: cisco
ansible_pass: cisco
ansible_ssh_pass: cisco
...

 

Before we go deeper into the details of the playbooks, let’s take a look at the file with “vars” for the Cisco IOS XR based network function XR3:

$ cat roles/cisco/136_lab/vars/XR3_infra_profile.yml
---
node:
    hostname: XR3
    vendor: cisco
    os: iosxr
    version: 6.1.2
    configuration_profiles:
        - profile: interfaces
        - profile: routing_ospf
        - profile: routing_bgp
...

 

As mentioned above, you can see here the different configuration profiles, which are used for checking telemetry.

In reality, the initial configuration was also done using these configuration profiles, based on configuration YANG modules, but this is out of scope for this article.

Now it’s time to take a look at the workhorses of this Ansible automation for telemetry, that is, the actual playbooks with tasks. Let’s start with “main.yml”:

$ cat roles/cisco/136_lab/tasks/main.yml
---
- name: VERIFICATION // {{ inventory_hostname }} // IMPORTING INFRASTRUCTURE PROFILE
  include_vars:
      file: "{{ inventory_hostname }}_infra_profile.yml"
      name: PROFILE

- name: VERIFICATION // {{ inventory_hostname }} // DELETE PREVIOUS TEST REPORT
  file:
      dest: /tmp/{{ inventory_hostname }}_test_report.txt
      state: absent
  ignore_errors: yes

- name: VERIFICATION // {{ inventory_hostname }} // CREATING TEST REPORT
  file:
      dest: /tmp/{{ inventory_hostname }}_test_report.txt
      state: touch

- name: VERIFICATION // {{ inventory_hostname }} // COLLECTING TELEMETRY AND SEARCHING DATA
  include_tasks: comparing_loop.yml
  loop: "{{ PROFILE.node.configuration_profiles }}"

- name: VERIFICATION // {{ inventory_hostname }} // COMPILING REPORT
  shell: "cat /tmp/temp_report_136_{{ inventory_hostname }}_* > /tmp/{{ inventory_hostname }}_test_report.txt"

- name: VERIFICATION // {{ inventory_hostname }} // REPORTING READINESS
  debug:
      msg: "Collection of telemetry data from {{ inventory_hostname }} is done."
...

 

Take a look at the article about Ansible roles if you have questions about this structure.

In the playbook above we do the following actions:

  • We import the file with variables, which contains the configuration profiles to check
  • We try to delete the previous report file. Even if it doesn’t exist, we go further
  • We create a new, empty report file
  • We collect telemetry for each profile (looping over the profile names) using an external playbook with tasks called “comparing_loop.yml”
  • We combine all the per-profile report files into a single report
  • We print a message that everything is done

You might have spotted that the crucial part is concentrated in the task that loops over the profiles, where we collect telemetry using another playbook; hence the next obvious step is to review that playbook:

$ cat roles/cisco/136_lab/tasks/comparing_loop.yml
---
- name: VERIFICATION // {{ inventory_hostname }} // COLLECTING TELEMETRY AND SEARCHING DATA // FETCHING TELEMETRY DATA
  netconf_get:
      filter: "{{ lookup ('template', 'netconf_request.j2') }}"
      display: json
  register: output_json

- name: VERIFICATION // {{ inventory_hostname }} // COLLECTING TELEMETRY AND SEARCHING DATA // SAVING TELEMETRY DATA
  copy:
      content: "{{ output_json.output | to_nice_json }}"
      dest: /tmp/{{ inventory_hostname }}/{{ inventory_hostname }}_{{ item.profile }}_yang_telemetry.json

- name: VERIFICATION // {{ inventory_hostname }} // COLLECTING TELEMETRY AND SEARCHING DATA // MODIFIICATION OF COLLECTED TELEMETRY FOR PYTHON PROCESSING
  replace:
      path: /tmp/{{ inventory_hostname }}/{{ inventory_hostname }}_{{ item.profile }}_yang_telemetry.json
      regexp: '-'
      replace: '_'

- name: VERIFICATION // {{ inventory_hostname }} // COLLECTING TELEMETRY AND SEARCHING DATA // IMPORTING COLLECTED TELEMETRY DATA
  include_vars:
      file: /tmp/{{ inventory_hostname }}/{{ inventory_hostname }}_{{ item.profile }}_yang_telemetry.json
      name: COLLECTED

- name: VERIFICATION // {{ inventory_hostname }} // COLLECTING TELEMETRY AND SEARCHING DATA // COMPILING {{ item.profile }}
  template:
      src: cisco_telemetry_{{ item.profile }}.j2
      dest: /tmp/temp_report_136_{{ inventory_hostname }}_{{ item.profile }}.txt
      mode: 0755
...

 

This playbook has five tasks, so it’s quite big as well. I guess you want to know what exactly we are doing here. Here we go:

  • We send an RPC message over NETCONF to collect live data for a particular operational YANG module. The proper request is chosen from the template “netconf_request.j2” based on the name of the configuration profile.
  • We save the output of the previous task to the “/tmp” folder so that we can review the details later on.
  • We replace all “-” with “_” in the saved telemetry output, because a “-” in a key name would be read as a minus sign when the key is accessed with dot notation in the Jinja2 templates. Note that this replacement touches every hyphen in the file, including those inside values.
  • We import the saved telemetry data back as variables.
  • We look for the parameters that interest us in the telemetry, based on templates we have predefined ourselves.
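
The hyphen replacement in the third task is applied to the whole file, so hyphens inside values are rewritten together with those in key names. If that matters for your data, an alternative is to rename the keys only. The following Python sketch shows that idea; it is not what the playbook does, the function name normalize_keys is hypothetical, and the JSON fragment is made up for illustration.

```python
import json


def normalize_keys(node):
    """Return a copy of `node` with '-' replaced by '_' in dictionary keys only."""
    if isinstance(node, dict):
        return {key.replace("-", "_"): normalize_keys(value)
                for key, value in node.items()}
    if isinstance(node, list):
        return [normalize_keys(item) for item in node]
    return node  # scalars (strings, numbers, None) are left untouched


# Made-up fragment, loosely shaped like a NETCONF reply converted to JSON.
raw = json.loads("""
{"rpc-reply": {"data": {"interface-properties": {
    "interface-name": "GigabitEthernet0/0/0/0",
    "actual-line-state": "im-state-up"}}}}
""")

clean = normalize_keys(raw)
print(json.dumps(clean, indent=2, sort_keys=True))
```

With this approach the keys become valid for dot notation while a value such as “im-state-up” keeps its original spelling.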

As a lot of the real action depends on the template files, we need to understand them. The first template, called “netconf_request.j2”, is used for creating the proper NETCONF request:

$ cat roles/cisco/136_lab/templates/netconf_request.j2
{% if item.profile == 'interfaces' %}
<interface-properties xmlns="http://cisco.com/ns/yang/Cisco-IOS-XR-ifmgr-oper"/>
{% elif item.profile == 'routing_ospf' %}
<ospf xmlns="http://cisco.com/ns/yang/Cisco-IOS-XR-ipv4-ospf-oper"/>
{% elif item.profile == 'routing_bgp' %}
<bgp xmlns="http://cisco.com/ns/yang/Cisco-IOS-XR-ipv4-bgp-oper"/>
{% endif %}

 

Depending on the name of the configuration profile, we create the proper request. We look up the namespace and the name of the container in the file with the operational YANG module. The following example is for interfaces:

$ cat ~/Yang/yang/vendor/cisco/xr/612/Cisco-IOS-XR-ifmgr-oper.yang | grep 'namespace\|container'
  namespace "http://cisco.com/ns/yang/Cisco-IOS-XR-ifmgr-oper";
    container interfaces {
  container interface-dampening {
    container interfaces {
        container if-dampening {
    container nodes {
        container show {
          container dampening {
            container if-handles {
            container interfaces {
  container interface-properties {
    container data-nodes {
        container locationviews {
        container pq-node-locations {
        container system-view {

 

For the XML namespace we take the value of “namespace” from the corresponding YANG module. For the request itself we take the name of the top-level container we are interested in. In the example above there are 2 top-level containers: “interface-dampening” and “interface-properties”. We are interested only in the second one, so we use its name.
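
The same lookup can be scripted. Below is a minimal Python 3 sketch that pulls the namespace and the top-level containers out of YANG text and builds the filter element from them. It is an illustration under assumptions: the function names describe_module and subtree_filter are hypothetical, the regular expressions are a shortcut and not a YANG parser, top-level containers are assumed to be indented by exactly two spaces as in the grep output above, and the sample module text is heavily shortened.

```python
import re
import xml.etree.ElementTree as ET

NAMESPACE_RE = re.compile(r'^\s*namespace\s+"([^"]+)"\s*;', re.MULTILINE)
# Assumption: a top-level container is indented by exactly two spaces.
TOP_CONTAINER_RE = re.compile(r"^  container\s+([\w.-]+)\s*\{", re.MULTILINE)


def describe_module(yang_text):
    """Return (namespace, [top-level container names]) found in YANG text."""
    namespace = NAMESPACE_RE.search(yang_text).group(1)
    containers = TOP_CONTAINER_RE.findall(yang_text)
    return namespace, containers


def subtree_filter(container, namespace):
    """Build an empty element that selects one container of one module."""
    element = ET.Element(container, {"xmlns": namespace})
    return ET.tostring(element, encoding="unicode")


# Shortened, made-up excerpt of the module structure.
sample = '''\
module Cisco-IOS-XR-ifmgr-oper {
  namespace "http://cisco.com/ns/yang/Cisco-IOS-XR-ifmgr-oper";
  container interface-dampening {
    container interfaces {
    }
  }
  container interface-properties {
    container data-nodes {
    }
  }
}
'''

namespace, containers = describe_module(sample)
request = subtree_filter("interface-properties", namespace)
print(containers)
print(request)
```

The resulting element carries the same container name and namespace as the “interfaces” branch of “netconf_request.j2”.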

The rest of the templates are used to extract some information from the telemetry. As an example, for interfaces we check the status of the ports:

$ cat roles/cisco/136_lab/templates/cisco_telemetry_interfaces.j2
+------------------------------------------+
|     Checking of the interfaces status    |
+------------------------------------------+

{% for port_current in COLLECTED.rpc_reply.data.interface_properties.data_nodes.data_node.locationviews.locationview.interfaces.interface %}
    Port:           {{ port_current.interface_name }}
    Status:         {{ port_current.actual_line_state }}

{% endfor %}

===========================================
    Verification of interfaces is done
===========================================

 

For OSPF routing we check the status of the neighbors:

$ cat roles/cisco/136_lab/templates/cisco_telemetry_routing_ospf.j2
+------------------------------------------+
|       Checking of the OSPF status        |
+------------------------------------------+

{% if COLLECTED.rpc_reply.data.ospf.processes.process.default_vrf.adjacency_information is defined %}
    Neighbor:
      RID:          {{ COLLECTED.rpc_reply.data.ospf.processes.process.default_vrf.adjacency_information.neighbors.neighbor.neighbor_id }}
      IP:           {{ COLLECTED.rpc_reply.data.ospf.processes.process.default_vrf.adjacency_information.neighbors.neighbor.neighbor_address_xr }}
      Connected to: {{ COLLECTED.rpc_reply.data.ospf.processes.process.default_vrf.adjacency_information.neighbors.neighbor.interface_name }}
      Status:       {{ COLLECTED.rpc_reply.data.ospf.processes.process.default_vrf.adjacency_information.neighbors.neighbor.neighbor_state }}


{% else %}
After software update:
    There is no OSPF neighbors detected

{% endif %}

===========================================
    Verification of OSPF neighbors is done
===========================================

 

And for the BGP routing, we check the status of the BGP peers:

$ cat roles/cisco/136_lab/templates/cisco_telemetry_routing_bgp.j2
+------------------------------------------+
|        Checking of the BGP status        |
+------------------------------------------+

{% if COLLECTED.rpc_reply.data.bgp.instances is defined %}
    Neighbor:
      ID:          {{ COLLECTED.rpc_reply.data.bgp.instances.instance.instance_active.default_vrf.neighbors.neighbor.neighbor_address }}
      State:       {{ COLLECTED.rpc_reply.data.bgp.instances.instance.instance_active.default_vrf.neighbors.neighbor.connection_state }}
      AFI/SAFI:    {% for afi in COLLECTED.rpc_reply.data.bgp.instances.instance.instance_active.default_vrf.neighbors.neighbor.af_data %}{{ afi.af_name }} {% endfor %}


{% else %}
    Problem with BGP process on local node ({{ inventory_hostname }})

{% endif %}

===========================================
    Verification of BGP neighbors is done
===========================================
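
The templates above reach into deeply nested keys. Judging by how they are written, a repeated element (several interfaces) arrives as a list, while a single element (one OSPF neighbor) arrives as a plain mapping, so a template written for one neighbor may need changes once a second neighbor appears. The Python sketch below shows one way to handle both shapes and missing keys uniformly. It is an illustration only: the helper names as_list and dig are hypothetical, and the sample data, including the state value, is made up to mirror the path used in the OSPF template.

```python
def as_list(value):
    """Wrap a single item in a list; pass lists through; None becomes []."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def dig(data, *keys):
    """Follow `keys` through nested dictionaries; return None if any is missing."""
    for key in keys:
        if not isinstance(data, dict) or key not in data:
            return None
        data = data[key]
    return data


OSPF_PATH = ("rpc_reply", "data", "ospf", "processes", "process",
             "default_vrf", "adjacency_information", "neighbors", "neighbor")

# Build made-up data along OSPF_PATH with a single neighbor (a mapping, not a list).
collected = {}
node = collected
for key in OSPF_PATH[:-1]:
    node[key] = {}
    node = node[key]
node["neighbor"] = {"neighbor_id": "10.0.0.11", "neighbor_state": "full"}

neighbors = as_list(dig(collected, *OSPF_PATH))
states = [(n["neighbor_id"], n["neighbor_state"]) for n in neighbors]
missing = as_list(dig({}, *OSPF_PATH))  # no OSPF data at all
print(states, missing)
```

The same pattern applies inside a Jinja2 template: test whether the value is a mapping before deciding to loop over it or read it directly.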

 

When the preparation is done, meaning all the playbooks, variables and templates are defined, we can test them.

Cisco IOS XR // Collecting telemetry via NETCONF/YANG with Ansible

For now the names of the variables from the telemetry don’t tell you much, so let’s execute our Ansible playbook to collect telemetry:

$ ansible-playbook 136_lab.yml --limit=XR3

PLAY [cisco] ***********************************************************************************************************************************************************

TASK [Gathering Facts] *************************************************************************************************************************************************
ok: [XR3]

TASK [cisco/136_lab : VERIFICATION // XR3 // IMPORTING INFRASTRUCTURE PROFILE] *****************************************************************************************
ok: [XR3]

TASK [cisco/136_lab : VERIFICATION // XR3 // DELETE PREVIOUS TEST REPORT] **********************************************************************************************
changed: [XR3]

TASK [cisco/136_lab : VERIFICATION // XR3 // CREATING TEST REPORT] *****************************************************************************************************
changed: [XR3]

TASK [cisco/136_lab : VERIFICATION // XR3 // COLLECTING TELEMETRY AND SEARCHING DATA] **********************************************************************************
included: /home/aaa/ansible/roles/cisco/136_lab/tasks/comparing_loop.yml for XR3
included: /home/aaa/ansible/roles/cisco/136_lab/tasks/comparing_loop.yml for XR3
included: /home/aaa/ansible/roles/cisco/136_lab/tasks/comparing_loop.yml for XR3

TASK [cisco/136_lab : VERIFICATION // XR3 // COLLECTING TELEMETRY AND SEARCHING DATA // FETCHING TELEMETRY DATA] *******************************************************
ok: [XR3]

TASK [cisco/136_lab : VERIFICATION // XR3 // COLLECTING TELEMETRY AND SEARCHING DATA // SAVING TELEMETRY DATA] *********************************************************
changed: [XR3]

TASK [cisco/136_lab : VERIFICATION // XR3 // COLLECTING TELEMETRY AND SEARCHING DATA // MODIFIICATION OF COLLECTED TELEMETRY FOR PYTHON PROCESSING] ********************
changed: [XR3]

TASK [cisco/136_lab : VERIFICATION // XR3 // COLLECTING TELEMETRY AND SEARCHING DATA // IMPORTING COLLECTED TELEMETRY DATA] ********************************************
ok: [XR3]

TASK [cisco/136_lab : VERIFICATION // XR3 // COLLECTING TELEMETRY AND SEARCHING DATA // COMPILING interfaces] **********************************************************
ok: [XR3]

TASK [cisco/136_lab : VERIFICATION // XR3 // COLLECTING TELEMETRY AND SEARCHING DATA // FETCHING TELEMETRY DATA] *******************************************************
ok: [XR3]

TASK [cisco/136_lab : VERIFICATION // XR3 // COLLECTING TELEMETRY AND SEARCHING DATA // SAVING TELEMETRY DATA] *********************************************************
changed: [XR3]

TASK [cisco/136_lab : VERIFICATION // XR3 // COLLECTING TELEMETRY AND SEARCHING DATA // MODIFIICATION OF COLLECTED TELEMETRY FOR PYTHON PROCESSING] ********************
changed: [XR3]

TASK [cisco/136_lab : VERIFICATION // XR3 // COLLECTING TELEMETRY AND SEARCHING DATA // IMPORTING COLLECTED TELEMETRY DATA] ********************************************
ok: [XR3]

TASK [cisco/136_lab : VERIFICATION // XR3 // COLLECTING TELEMETRY AND SEARCHING DATA // COMPILING routing_ospf] ********************************************************
changed: [XR3]

TASK [cisco/136_lab : VERIFICATION // XR3 // COLLECTING TELEMETRY AND SEARCHING DATA // FETCHING TELEMETRY DATA] *******************************************************
ok: [XR3]

TASK [cisco/136_lab : VERIFICATION // XR3 // COLLECTING TELEMETRY AND SEARCHING DATA // SAVING TELEMETRY DATA] *********************************************************
changed: [XR3]

TASK [cisco/136_lab : VERIFICATION // XR3 // COLLECTING TELEMETRY AND SEARCHING DATA // MODIFIICATION OF COLLECTED TELEMETRY FOR PYTHON PROCESSING] ********************
changed: [XR3]

TASK [cisco/136_lab : VERIFICATION // XR3 // COLLECTING TELEMETRY AND SEARCHING DATA // IMPORTING COLLECTED TELEMETRY DATA] ********************************************
ok: [XR3]

TASK [cisco/136_lab : VERIFICATION // XR3 // COLLECTING TELEMETRY AND SEARCHING DATA // COMPILING routing_bgp] *********************************************************
changed: [XR3]

TASK [cisco/136_lab : VERIFICATION // XR3 // COMPILING REPORT] *********************************************************************************************************
changed: [XR3]

TASK [cisco/136_lab : VERIFICATION // XR3 // REPORTING READINESS] ******************************************************************************************************
ok: [XR3] => {
    "msg": "Collection of telemetry data from XR3 is done."
}

PLAY [nokia] ***********************************************************************************************************************************************************
skipping: no hosts matched

PLAY RECAP *************************************************************************************************************************************************************
XR3                        : ok=24   changed=11   unreachable=0    failed=0

 

After the execution of the playbook is finished, we can check the results:

$ ls -l /tmp/XR3 | grep 'XR3'
-rw-rw-r--. 1 aaa aaa  17166 Oct  2 07:55 XR3_interfaces_yang_telemetry.json
-rw-rw-r--. 1 aaa aaa 102808 Oct  2 07:56 XR3_routing_bgp_yang_telemetry.json
-rw-rw-r--. 1 aaa aaa 237501 Oct  2 07:56 XR3_routing_ospf_yang_telemetry.json

 

These 3 files contain the telemetry data we have requested: interfaces, OSPF and BGP. To be honest, they are huge, especially the OSPF and BGP telemetry. So here is just a snippet of the interface-level telemetry:

$ cat /tmp/XR3/XR3_interfaces_yang_telemetry.json
{
    "rpc_reply": {
        "data": {
            "interface_properties": {
                "data_nodes": {
                    "data_node": {
                        "data_node_name": "0/0/CPU0",
                        "locationviews": {
                            "locationview": {
                                "interfaces": {
                                    "interface": [
                                        {
                                            "actual_line_state": "im_state_up",
                                            "actual_state": "im_state_up",
                                            "bandwidth": "0",
                                            "encapsulation": "fint_base",
                                            "encapsulation_type_string": "FINT_BASE_CAPS",
                                            "interface": "FINT0/0/CPU0",
                                            "interface_name": "FINT0/0/CPU0",
                                            "l2_transport": "false",
                                            "line_state": "im_state_up",
                                            "mtu": "8000",
                                            "state": "im_state_up",
                                            "sub_interface_mtu_overhead": "0",
                                            "type": "IFT_FINT_INTF"
                                        },
                                        {
                                            "actual_line_state": "im_state_up",
                                            "actual_state": "im_state_up",
                                            "bandwidth": "1000000",
                                            "encapsulation": "ether",
                                            "encapsulation_type_string": "ARPA",
                                            "interface": "GigabitEthernet0/0/0/0",
                                            "interface_name": "GigabitEthernet0/0/0/0",
                                            "l2_transport": "false",
                                            "line_state": "im_state_up",
                                            "mtu": "1514",
                                            "state": "im_state_up",
                                            "sub_interface_mtu_overhead": "0",
                                            "type": "IFT_GETHERNET"
                                        },
                                        {
                                            "actual_line_state": "im_state_admin_down",
                                            "actual_state": "im_state_admin_down",
                                            "bandwidth": "1000000",
                                            "encapsulation": "ether",
                                            "encapsulation_type_string": "ARPA",
                                            "interface": "GigabitEthernet0/0/0/1",
                                            "interface_name": "GigabitEthernet0/0/0/1",
                                            "l2_transport": "false",
                                            "line_state": "im_state_admin_down",
                                            "mtu": "1514",
                                            "state": "im_state_admin_down",
                                            "sub_interface_mtu_overhead": "0",
                                            "type": "IFT_GETHERNET"
                                        },
!
! FURTHER OUTPUT IS OMITTED

 

Virtually all the information you can collect through various show commands is available in JSON format in the telemetry. For OSPF and BGP there is MUCH more information to review. For OSPF, for example, the telemetry output even contains all the LSAs from the LSDB.
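To give an idea of how such a file can be consumed outside of Ansible, here is a minimal Python sketch. It assumes the key structure shown in the snippet above; the embedded sample is a trimmed copy of that snippet, and the function names (interface_states, not_up) are made up for illustration. It also assumes that a node with a single interface may be stored as a dictionary instead of a list, which is common for XML converted to JSON.

```python
import json

# Trimmed sample that mirrors the structure of XR3_interfaces_yang_telemetry.json
SAMPLE = """
{"rpc_reply": {"data": {"interface_properties": {"data_nodes": {"data_node": {
  "data_node_name": "0/0/CPU0",
  "locationviews": {"locationview": {"interfaces": {"interface": [
    {"interface_name": "GigabitEthernet0/0/0/0", "state": "im_state_up"},
    {"interface_name": "GigabitEthernet0/0/0/1", "state": "im_state_admin_down"}
  ]}}}
}}}}}}
"""


def interface_states(telemetry):
    """Return a list of (interface_name, state) tuples from the telemetry."""
    interfaces = (telemetry["rpc_reply"]["data"]["interface_properties"]
                  ["data_nodes"]["data_node"]["locationviews"]["locationview"]
                  ["interfaces"]["interface"])
    # Assumption: a single interface may appear as a dict rather than a list
    if isinstance(interfaces, dict):
        interfaces = [interfaces]
    return [(item["interface_name"], item["state"]) for item in interfaces]


def not_up(telemetry):
    """Return the names of all interfaces whose state is not im_state_up."""
    return [name for name, state in interface_states(telemetry)
            if state != "im_state_up"]


if __name__ == "__main__":
    data = json.loads(SAMPLE)
    for name, state in interface_states(data):
        print(f"Port: {name}  Status: {state}")
    print("Not up:", not_up(data))
```

To run it against the real file, replace json.loads(SAMPLE) with json.load() on the opened /tmp/XR3/XR3_interfaces_yang_telemetry.json.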

The last step of the verification for Cisco is to take a look at the test report, where we check:

  • State of the interfaces
  • Status of the OSPF neighbors
  • Status of the BGP peering

As explained earlier, we create a test report per configuration profile and then merge them together:

$ cat /tmp/XR3_test_report.txt
+------------------------------------------+
|     Checking of the interfaces status    |
+------------------------------------------+

    Port:           FINT0/0/CPU0
    Status:         im_state_up

    Port:           GigabitEthernet0/0/0/0
    Status:         im_state_up

    Port:           GigabitEthernet0/0/0/1
    Status:         im_state_admin_down

    Port:           Loopback0
    Status:         im_state_up

    Port:           MgmtEth0/0/CPU0/0
    Status:         im_state_up

    Port:           Null0
    Status:         im_state_up

    Port:           nV_Loopback0
    Status:         im_state_up

    Port:           nV_Loopback1
    Status:         im_state_up


===========================================
    Verification of interfaces is done
===========================================
+------------------------------------------+
|        Checking of the BGP status        |
+------------------------------------------+

    Problem with BGP process on local node (XR3)


===========================================
    Verification of BGP neighbors is done
===========================================

+------------------------------------------+
|       Checking of the OSPF status        |
+------------------------------------------+

    Neighbor:
      RID:          10.0.0.11
      IP:           10.11.33.11
      Connected to: GigabitEthernet0/0/0/0
      Status:       mgmt_nbr_full



===========================================
    Verification of OSPF neighbors is done
===========================================

 

I think you have spotted that the BGP check reports a problem. The reason is that the 2 GB of RAM in my virtual lab is not sufficient for telemetry: the NETCONF/YANG request to collect BGP telemetry data pushes the memory of the virtual network function XR3 to a critical state, and the BGP process breaks afterwards. If we increase the amount of available memory, the collection works fine. So, that’s what model-driven telemetry looks like on Cisco IOS XR. Here you can download the Ansible playbooks for this article: 136_lab.tar
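The logic behind the BGP part of the report can also be sketched in Python. This is only an illustration of the same check the Jinja2 template performs (is bgp.instances present in the collected data or not); the helper names (dig, as_list, bgp_report) and the sample neighbor values are hypothetical and not taken from the lab. Unlike the template, the sketch also accepts several neighbors, assuming they would then arrive as a list.

```python
def dig(data, *keys):
    """Walk nested dicts and return None as soon as a key is missing."""
    for key in keys:
        if not isinstance(data, dict) or key not in data:
            return None
        data = data[key]
    return data


def as_list(value):
    """Wrap a single item into a list; turn None into an empty list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def bgp_report(collected, hostname):
    """Return report lines: one per neighbor, or a problem message."""
    if dig(collected, "rpc_reply", "data", "bgp", "instances") is None:
        # Same fallback text as in the template's else branch
        return [f"Problem with BGP process on local node ({hostname})"]
    neighbors = dig(collected, "rpc_reply", "data", "bgp", "instances",
                    "instance", "instance_active", "default_vrf",
                    "neighbors", "neighbor")
    lines = []
    for nbr in as_list(neighbors):
        afis = " ".join(af["af_name"] for af in as_list(nbr.get("af_data")))
        lines.append(f"{nbr['neighbor_address']} {nbr['connection_state']} {afis}")
    return lines


if __name__ == "__main__":
    # Reply without BGP data, as in the low-memory case described above
    print(bgp_report({"rpc_reply": {"data": None}}, "XR3"))

    # Hypothetical reply with one neighbor (values are made up)
    healthy = {"rpc_reply": {"data": {"bgp": {"instances": {"instance": {
        "instance_active": {"default_vrf": {"neighbors": {"neighbor": {
            "neighbor_address": "10.0.0.11",
            "connection_state": "bgp_st_estab",
            "af_data": {"af_name": "ipv4"},
        }}}}}}}}}}
    print(bgp_report(healthy, "XR3"))
```

Walking the keys one by one keeps the check from failing when any intermediate level is missing, which is exactly the situation when the BGP process does not answer.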

Lessons learned

As we have learned, collecting telemetry data via NETCONF is quite a resource-intensive task, so it shouldn’t be used in production. We advise you to go for gRPC/gNMI to stream telemetry data.

Conclusion

For the sake of brevity I have reduced the output of the provided commands, but the amount of information contained in the telemetry is just overwhelming. It’s really the new oil in networking, as this telemetry information allows you to build any kind of business logic. Later on we’ll review Kafka, so you will get a much better understanding of how the whole solution works, but even now you can assess the amount of information in model-driven telemetry. In the next article we’ll take a look at model-driven telemetry in Nokia SR OS. Take care and good bye!

P.S.

If you have further questions or you need help with your networks, I’m happy to assist you, just send me a message (http://karneliuk.com/contact/). Also don’t forget to share the article on your social media if you like it.

BR,

Anton Karneliuk