Hello my friend,
In the previous blogpost we introduced Batfish and showed how to set it up. Today we’ll take a look at how to analyse the configuration to find discrepancies that may break the operation of your network.
No part of this blogpost could be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical or photocopying, recording, or otherwise, for commercial purposes without the prior permission of the author.
Network Analysis as Part of Automation?
In software development we have a concept called CI/CD (Continuous Integration/Continuous Delivery). In a nutshell, it’s a methodology that makes testing of every change (code, configuration, software version, etc.) mandatory before it reaches production. The main idea behind it is that automated testing and validation make sure the code is stable and fit for purpose. Automated testing? That’s where automation enters the stage.
And automation is something we are experts in. And you can benefit from that expertise as well.
In our network automation training we follow a zero-to-hero approach: we start with the basics, including Linux operation and administration topped with KVM, Docker and Git, and gradually progress through data models (YANG) and encodings (XML, JSON, YAML, Protobuf) to the use of protocols (SSH, NETCONF, RESTCONF, gNMI) with Ansible, Bash and Python, integrating your scripts with NetBox via REST API. All this happens in the context of real-life examples and a multivendor network built with Cisco, Nokia, Arista and Cumulus.
That is the reason why leading service providers and network vendors are training with us to equip their staff with real-life network automation knowledge, skills and techniques in a multivendor environment. So can you.
Brief Description
Batfish allows you to parse configuration from various vendors and bring it to a vendor-agnostic format, similar to what NAPALM does. That simplifies configuration analysis across different vendors. On the one hand, such analysis is less useful than analysis of the real state of the network. On the other hand, it can be coupled with the change management process, where the following workflow can be implemented:
- The original configuration is analysed
- The new configuration is analysed
- The old and new configurations are compared with each other
In later blogposts we’ll share some insights into how you can implement such a workflow.
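The comparison step of that workflow can be sketched in a few lines of plain Python. This is only a minimal sketch under stated assumptions, not the implementation from the later blogposts: it assumes that each analysis run has already been turned into a list of dictionaries (for a pandas DataFrame, `frame.to_dict("records")` does that), and the names `diff_findings`, `old_run` and `new_run` are hypothetical. The sample row is shaped like the answer to the unusedStructures() question covered later in this post.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def _key(record: Record) -> Tuple[Tuple[str, str], ...]:
    # Values are stringified so that unhashable cells (lists, objects) still compare.
    return tuple(sorted((name, str(value)) for name, value in record.items()))


def diff_findings(old: List[Record], new: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Return (introduced, resolved) findings between two analysis runs."""
    old_keys = {_key(r) for r in old}
    new_keys = {_key(r) for r in new}
    introduced = [r for r in new if _key(r) not in old_keys]
    resolved = [r for r in old if _key(r) not in new_keys]
    return introduced, resolved


# Hypothetical sample data: the old config was clean, the new one has a finding.
old_run: List[Record] = []
new_run: List[Record] = [
    {"Structure_Type": "ip_prefix_list", "Structure_Name": "PL_LO_1",
     "Source_Lines": "configs/VX1.cfg:[108]"},
]

introduced, resolved = diff_findings(old_run, new_run)
print(f"introduced: {len(introduced)}, resolved: {len(resolved)}")
```

A change would then be rejected when `introduced` is not empty, while `resolved` shows which earlier findings the change has fixed.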
Today we’ll focus on how to programmatically check the configuration of your network devices using Batfish. As different networks run different configurations (e.g., routing protocols), we’ll focus on the elements you can find nowadays in data centres and service provider networks:
- BGP for the underlay topology
- Routes in the route table or in the BGP RIB
Those (and some more) are the configuration elements we are going to analyse in this blogpost.
Lab Setup
This blogpost is a continuation of the previous one, which describes the lab setup as well as the lab topology.
Usage
As said above, there might be a lot of different network protocols in your network. At the moment, Batfish supports only BGP and OSPF (sadly enough, it doesn’t support IS-IS, which is very popular in service provider networks). We believe that at the moment BGP is the best way to build data centres if your equipment doesn’t support RIFT yet. Therefore, in this blogpost we’ll focus on BGP configuration and the networking based on it.
You can find the configuration for the tests in our GitHub repo.
#1. Analysing Used and Unused Configuration Items
However, before we start talking about BGP configuration, let’s do a sanity check, i.e., answer the question of whether all the configuration on our network devices is actually used. If you run a big network that is not fully automated, you might soon end up with unused configuration structures (e.g., access lists, prefix lists, route policies, and others). Batfish has questions that allow you to find and trace used structures and, more importantly, unused structures (e.g., a prefix list that is created but not used anywhere) as well as references to structures that were never created.
#1.1. Unused Structures
First of all, let’s check that we don’t have any unused structures in our network. We use the following code for that:
$ cat main.py
#!/usr/bin/env python

# Modules
from pybatfish.client.commands import bf_init_snapshot, bf_session
from pybatfish.question.question import load_questions
from pybatfish.question import bfq
import os

# Variables
bf_address = "127.0.0.1"
snapshot_path = "./snapshots/nat"
output_dir = "./output"

# Body
if __name__ == "__main__":
    # Setting host to connect
    bf_session.host = bf_address

    # Loading configs and questions
    bf_init_snapshot(snapshot_path, overwrite=True)
    load_questions()

    # Running questions
    r1 = bfq.unusedStructures().answer().frame()
    print(r1)

    # Saving output
    if not os.path.exists(output_dir):
        os.mkdir(output_dir)

    r1.to_csv(f"{output_dir}/r1.csv")
Check the previous blogpost for the details of the Python script.
So the key here is the question bfq.unusedStructures(), which parses the configuration of the devices and looks for resources that are defined but never used. In a clean configuration, there should be no unused resources:
$ python main.py
status: CHECKINGSTATUS
.... no task information
status: ASSIGNED
.... 2021-06-26 16:53:08.287000+01:00 Deserializing objects of type 'org.batfish.vendor.VendorConfiguration' from files 2 / 4.
status: TERMINATEDNORMALLY
.... 2021-06-26 16:53:08.287000+01:00 Deserializing objects of type 'org.batfish.datamodel.Configuration' from files 3 / 3.
Default snapshot is now set to ss_dc5b9186-5c37-4c75-9de9-e5e0f054aee2
status: ASSIGNED
.... no task information
status: TERMINATEDNORMALLY
.... 2021-06-26 16:53:08.846000+01:00 Parse environment BGP tables.
Your snapshot was successfully initialized but Batfish failed to fully recognized some lines in one or more input files. Some unrecognized configuration lines are not uncommon for new networks, and it is often fine to proceed with further analysis. You can help the Batfish developers improve support for your network by running:
bf_upload_diagnostics(dry_run=False, contact_info='<optional email address>')
to share private, anonymized information. For more information, see the documentation with:
help(bf_upload_diagnostics)
Successfully loaded 67 questions from remote
Successfully loaded 67 questions from remote
status: ASSIGNED
.... no task information
status: TERMINATEDNORMALLY
.... 2021-06-26 16:53:09.349000+01:00 Begin job.
Empty DataFrame
Columns: [Structure_Type, Structure_Name, Source_Lines]
Index: []
The last three lines tell us that the DataFrame is empty: they list the available columns, but there are no rows. Let’s see what the output looks like if the config is broken. First, take a look at the working config of the Cumulus Linux based device:
$ cat VX1.cfg
! OUTPUT IS TRUNCATED FOR BREVITY
!
ip prefix-list PL_LO seq 5 permit 10.0.255.0/24 ge 32
route-map RP_PASS_LO permit 10
match ip address prefix-list PL_LO
route-map RP_PASS_LO deny 9999
!
! OUTPUT IS TRUNCATED FOR BREVITY
Now, let’s break it by renaming the prefix list from PL_LO to PL_LO_1 in its definition, while keeping the old name in the route map:
$ cat VX1.cfg
! OUTPUT IS TRUNCATED FOR BREVITY
!
ip prefix-list PL_LO_1 seq 5 permit 10.0.255.0/24 ge 32
route-map RP_PASS_LO permit 10
match ip address prefix-list PL_LO
route-map RP_PASS_LO deny 9999
!
! OUTPUT IS TRUNCATED FOR BREVITY
Let’s rerun our script now:
$ python main.py
!
! OUTPUT IS TRUNCATED FOR BREVITY
!
status: ASSIGNED
.... no task information
status: TERMINATEDNORMALLY
.... 2021-06-26 16:58:41.501000+01:00 Begin job.
Structure_Type Structure_Name Source_Lines
0 ip_prefix_list PL_LO_1 configs/VX1.cfg:[108]
You can now see in the output of the script that the file configs/VX1.cfg has an unused structure with the following attributes:
- Type: IP prefix list
- Name: PL_LO_1
- Location: line 108 of the config file configs/VX1.cfg
Let’s look at a screenshot of the config to validate the position:
We have repeated the test with Batfish against Cisco IOS XR based and Arista EOS based devices (tweaking the prefix list name) and can confirm that Batfish catches those changes:
$ python main.py
!
! OUTPUT IS TRUNCATED FOR BREVITY
!
status: ASSIGNED
.... no task information
status: TERMINATEDNORMALLY
.... 2021-06-26 17:03:50.856000+01:00 Begin job.
Structure_Type Structure_Name Source_Lines
0 ipv4 prefix-list PL_LO_1 configs/EOS1.cfg:[51]
1 ip_prefix_list PL_LO_1 configs/VX1.cfg:[108]
2 prefix-set PS_LO_1 configs/XR1.cfg:[34, 35, 36]
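The Source_Lines values in the output above follow a `file:[line, line, ...]` pattern, which makes it possible to jump straight to the offending lines of a config. The sketch below parses that textual form with a regular expression. It assumes you are working with the string rendering shown in the output (for example, read back from the CSV the script saves); the helper name `parse_source_lines` is hypothetical and not part of pybatfish.

```python
import re
from typing import List, Tuple

# Matches e.g. "configs/XR1.cfg:[34, 35, 36]"
_SOURCE_LINES = re.compile(r"^(?P<file>.+):\[(?P<lines>[\d,\s]*)\]$")


def parse_source_lines(text: str) -> Tuple[str, List[int]]:
    """Split a 'file:[n, m, ...]' string into the file name and line numbers."""
    match = _SOURCE_LINES.match(text.strip())
    if match is None:
        raise ValueError(f"unexpected Source_Lines format: {text!r}")
    numbers = [int(n) for n in match.group("lines").split(",") if n.strip()]
    return match.group("file"), numbers


filename, line_numbers = parse_source_lines("configs/XR1.cfg:[34, 35, 36]")
print(filename, line_numbers)
```

With the file name and line numbers at hand, a script can open the config and print exactly those lines next to the finding.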
#1.2. Undefined References
Wait a second… If we have changed the name of the prefix-list (for Cumulus Linux and Arista EOS) and the name of the prefix-set (for Cisco IOS XR), the route maps and route policies respectively now reference non-existing structures. Can Batfish catch that as well? Yes, we just ask the corresponding question:
$ cat main.py
!
! OUTPUT IS TRUNCATED FOR BREVITY
!
    # Running questions
    r1 = bfq.unusedStructures().answer().frame()
    print(r1)
    r2 = bfq.undefinedReferences().answer().frame()
    print(r2)
!
! OUTPUT IS TRUNCATED FOR BREVITY
!
So the new question we’ve added is bfq.undefinedReferences(), which finds such references. Let’s verify its operation:
$ python main.py
!
! OUTPUT IS TRUNCATED FOR BREVITY
!
status: ASSIGNED
.... no task information
status: TERMINATEDNORMALLY
.... 2021-06-26 17:10:28.210000+01:00 Begin job.
Structure_Type Structure_Name Source_Lines
0 ipv4 prefix-list PL_LO_1 configs/EOS1.cfg:[51]
1 ip_prefix_list PL_LO_1 configs/VX1.cfg:[108]
2 prefix-set PS_LO_1 configs/XR1.cfg:[34, 35, 36]
status: ASSIGNED
.... no task information
status: TERMINATEDNORMALLY
.... 2021-06-26 17:10:28.513000+01:00 Begin job.
File_Name Struct_Type Ref_Name Context Lines
0 configs/EOS1.cfg ipv4 prefix-list PL_LO route-map match ipv4 prefix-list configs/EOS1.cfg:[58]
1 configs/VX1.cfg ip_prefix_list PL_LO route-map match ip prefix-list configs/VX1.cfg:[110]
2 configs/XR1.cfg prefix-set PS_LO route-policy prefix-set configs/XR1.cfg:[39]
We now have two outputs in our Python script, which interacts with Batfish:
- The first shows unused structures (their names, types and position in the code)
- The second shows the references to non-existing structures (their names, types and where they are called in the code)
Both checks allow you to stop a configuration issue either at the stage when you are making the change or when you are auditing/troubleshooting the existing network.
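The two answers can also be correlated. When a structure is renamed in its definition but not where it is used, the old name shows up as an undefined reference and the new name as an unused structure in the same file. The sketch below pairs them using fuzzy name matching from Python's standard `difflib` module. It is an illustration under assumptions: the rows are plain dictionaries with the column names shown in the outputs above, Source_Lines is in its textual form, and `likely_renames` with its `cutoff` of 0.7 is a hypothetical helper, not a Batfish feature.

```python
from difflib import get_close_matches
from typing import Dict, List, Tuple


def likely_renames(unused: List[Dict[str, str]],
                   undefined: List[Dict[str, str]],
                   cutoff: float = 0.7) -> List[Tuple[str, str, str]]:
    """Return (file, referenced_name, unused_name) for probable renames."""
    pairs = []
    for ref in undefined:
        # Only consider unused structures of the same type in the same file.
        candidates = [
            row["Structure_Name"]
            for row in unused
            if row["Structure_Type"] == ref["Struct_Type"]
            and row["Source_Lines"].rsplit(":", 1)[0] == ref["File_Name"]
        ]
        match = get_close_matches(ref["Ref_Name"], candidates, n=1, cutoff=cutoff)
        if match:
            pairs.append((ref["File_Name"], ref["Ref_Name"], match[0]))
    return pairs


# Rows abridged from the two outputs above.
unused = [
    {"Structure_Type": "ip_prefix_list", "Structure_Name": "PL_LO_1",
     "Source_Lines": "configs/VX1.cfg:[108]"},
    {"Structure_Type": "prefix-set", "Structure_Name": "PS_LO_1",
     "Source_Lines": "configs/XR1.cfg:[34, 35, 36]"},
]
undefined = [
    {"File_Name": "configs/VX1.cfg", "Struct_Type": "ip_prefix_list", "Ref_Name": "PL_LO"},
    {"File_Name": "configs/XR1.cfg", "Struct_Type": "prefix-set", "Ref_Name": "PS_LO"},
]

for filename, referenced, defined in likely_renames(unused, undefined):
    print(f"{filename}: '{referenced}' is referenced, '{defined}' is defined but unused")
```

The pairing is only a hint for the engineer reviewing the change; similar names do not prove that a rename happened.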
As said earlier, in a healthy network both of those data frames must be empty. Therefore, once we change the names of the IP prefix-lists and prefix-sets back from PL_LO_1 to PL_LO (and from PS_LO_1 to PS_LO), no configuration errors are found:
$ python main.py
!
! OUTPUT IS TRUNCATED FOR BREVITY
!
status: ASSIGNED
.... no task information
status: TERMINATEDNORMALLY
.... 2021-06-26 17:52:55.849000+01:00 Begin job.
Empty DataFrame
Columns: [Structure_Type, Structure_Name, Source_Lines]
Index: []
status: ASSIGNED
.... no task information
status: TERMINATEDNORMALLY
.... 2021-06-26 17:52:56.129000+01:00 Begin job.
Empty DataFrame
Columns: [File_Name, Struct_Type, Ref_Name, Context, Lines]
Index: []
As you can see, the analysis now shows that there are no unused configuration elements or references to non-existing entries.
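Since a healthy network means both answers are empty, the check lends itself to a pass/fail gate in a CI/CD pipeline. The following is a minimal sketch of such a gate, assuming the answers have been converted to lists of rows (for a pandas DataFrame, `frame.to_dict("records")`); the function name `audit_exit_code` and the sample dictionaries are hypothetical.

```python
from typing import Any, Dict, List


def audit_exit_code(findings: Dict[str, List[Any]]) -> int:
    """Return 0 when every check is empty, 1 otherwise, printing a short summary."""
    exit_code = 0
    for check, rows in findings.items():
        if rows:
            print(f"FAIL {check}: {len(rows)} finding(s)")
            exit_code = 1
        else:
            print(f"OK   {check}")
    return exit_code


# Hypothetical inputs: one clean run and one run with findings.
clean = {"unusedStructures": [], "undefinedReferences": []}
broken = {
    "unusedStructures": [{"Structure_Name": "PL_LO_1"}],
    "undefinedReferences": [{"Ref_Name": "PL_LO"}],
}

print(audit_exit_code(clean))
print(audit_exit_code(broken))
```

In a pipeline the return value would typically be passed to `sys.exit()`, so that a non-empty answer fails the job.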
#1.3. All Structures
How does Batfish figure out which structures are available? You can ask it using the following two questions:
$ cat main.py
!
! OUTPUT IS TRUNCATED FOR BREVITY
!
    # Running questions
    r1 = bfq.unusedStructures().answer().frame()
    print(r1)
    r2 = bfq.undefinedReferences().answer().frame()
    print(r2)
    r3 = bfq.namedStructures().answer().frame()
    print(r3)
    r4 = bfq.definedStructures().answer().frame()
    print(r4)
!
! OUTPUT IS TRUNCATED FOR BREVITY
!
The question namedStructures() shows the details of the created entries (i.e., their content), whereas definedStructures() shows where those structures are created in the configuration files.
Let’s run this test:
$ python main.py
!
! OUTPUT IS TRUNCATED FOR BREVITY
!
status: CHECKINGSTATUS
.... no task information
status: TERMINATEDNORMALLY
.... 2021-06-26 18:00:46.584000+01:00 Begin job.
Empty DataFrame
Columns: [Structure_Type, Structure_Name, Source_Lines]
Index: []
status: CHECKINGSTATUS
.... no task information
status: TERMINATEDNORMALLY
.... 2021-06-26 18:00:46.861000+01:00 Begin job.
Empty DataFrame
Columns: [File_Name, Struct_Type, Ref_Name, Context, Lines]
Index: []
status: ASSIGNED
.... no task information
status: ASSIGNED
.... 2021-06-26 18:00:47.147000+01:00 Begin job.
status: TERMINATEDNORMALLY
.... 2021-06-26 18:00:47.147000+01:00 Begin job.
Node Structure_Type Structure_Name Structure_Definition
0 a-eos1 Routing_Policy RP_PASS_LO {'name': 'RP_PASS_LO', 'statements': [{'class'...
1 xr1 Routing_Policy RP_PASS_LO {'name': 'RP_PASS_LO', 'statements': [{'class'...
2 vx1 VRF default {'name': 'default', 'bgpProcess': {'ebgpAdminC...
3 xr1 VRF default {'name': 'default', 'bgpProcess': {'ebgpAdminC...
4 vx1 VRF mgmt {'name': 'mgmt', 'staticRoutes': [{'class': 'o...
5 xr1 Route_Filter_List PS_LO {'lines': [{'action': 'PERMIT', 'ipWildcard': ...
6 vx1 Routing_Policy RP_PASS_LO {'name': 'RP_PASS_LO', 'statements': [{'class'...
7 xr1 VRF mgmt {'name': 'mgmt', 'staticRoutes': [{'class': 'o...
8 vx1 Route_Filter_List PL_LO {'lines': [{'action': 'PERMIT', 'ipWildcard': ...
9 a-eos1 VRF default {'name': 'default', 'bgpProcess': {'ebgpAdminC...
10 a-eos1 Route_Filter_List PL_LO {'lines': [{'action': 'PERMIT', 'ipWildcard': ...
status: CHECKINGSTATUS
.... no task information
status: TERMINATEDNORMALLY
.... 2021-06-26 18:00:47.674000+01:00 Begin job.
Structure_Type Structure_Name Source_Lines
0 interface Loopback0 configs/EOS1.cfg:[37, 38]
1 interface Loopback0 configs/XR1.cfg:[15, 16, 17]
2 interface bridge configs/VX1.cfg:[43, 44, 45, 46]
3 vxlan Vxlan1 configs/EOS1.cfg:[44, 45, 46, 47]
4 ip_prefix_list PL_LO configs/VX1.cfg:[108]
5 route-map RP_PASS_LO configs/VX1.cfg:[109, 110, 111]
6 interface swp3 configs/VX1.cfg:[37, 38, 39, 40]
7 prefix-set PS_LO configs/XR1.cfg:[34, 35, 36]
8 route-policy RP_PASS_LO configs/XR1.cfg:[38, 39, 40, 41, 42]
9 route-map RP_PASS_LO configs/EOS1.cfg:[57, 58, 60]
10 vlan 100 configs/VX1.cfg:[58]
11 vrf mgmt configs/VX1.cfg:[49, 50, 51, 52, 84, 85]
12 interface swp2 configs/VX1.cfg:[31, 32, 33, 34]
13 interface swp1 configs/VX1.cfg:[25, 26, 27, 28]
14 interface GigabitEthernet0/0/0/2 configs/XR1.cfg:[29, 30, 31, 32]
15 interface Management1 configs/EOS1.cfg:[40, 41, 42]
16 interface Ethernet1 configs/EOS1.cfg:[29, 30, 31]
17 interface lo configs/VX1.cfg:[11, 12, 13, 14]
18 interface Ethernet2 configs/EOS1.cfg:[33, 34, 35]
19 vxlan vni100 configs/VX1.cfg:[62, 63, 64, 65, 66, 67, 68, 69]
20 interface GigabitEthernet0/0/0/1 configs/XR1.cfg:[24, 25, 26, 27]
21 ipv4 prefix-list PL_LO configs/EOS1.cfg:[51]
22 interface MgmtEth0/0/CPU0/0 configs/XR1.cfg:[19, 20, 21, 22]
23 interface eth0 configs/VX1.cfg:[18, 19, 20, 21, 22]
24 vlan vlan100 configs/VX1.cfg:[55, 56, 57, 58, 59]
Although the outputs of those two questions overlap slightly, there are certain differences:
- the first one (namedStructures()) provides the names and content of VRFs, prefix lists and route policies
- the second one (definedStructures()) also includes interfaces and gives the line numbers where each structure is defined in the configuration
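The definedStructures() answer is also a handy inventory of each config file. As a small illustration, the sketch below counts structures per file and type with `collections.Counter`. It assumes rows are plain dictionaries with the column names from the output above and that Source_Lines is in its textual `file:[lines]` form; `structures_per_file` is a hypothetical helper name.

```python
from collections import Counter
from typing import Dict, List, Tuple


def structures_per_file(defined: List[Dict[str, str]]) -> Counter:
    """Count defined structures per (config file, structure type)."""
    counts: Counter = Counter()
    for row in defined:
        # "configs/EOS1.cfg:[37, 38]" -> "configs/EOS1.cfg"
        filename = row["Source_Lines"].rsplit(":", 1)[0]
        counts[(filename, row["Structure_Type"])] += 1
    return counts


# Rows abridged from the definedStructures() output above.
defined = [
    {"Structure_Type": "interface", "Structure_Name": "Loopback0",
     "Source_Lines": "configs/EOS1.cfg:[37, 38]"},
    {"Structure_Type": "interface", "Structure_Name": "Ethernet1",
     "Source_Lines": "configs/EOS1.cfg:[29, 30, 31]"},
    {"Structure_Type": "vxlan", "Structure_Name": "Vxlan1",
     "Source_Lines": "configs/EOS1.cfg:[44, 45, 46, 47]"},
    {"Structure_Type": "prefix-set", "Structure_Name": "PS_LO",
     "Source_Lines": "configs/XR1.cfg:[34, 35, 36]"},
]

inventory = structures_per_file(defined)
for (filename, kind), count in sorted(inventory.items()):
    print(f"{filename:18} {kind:12} {count}")
```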
Batfish has a lot more questions, which we mentioned in our first blogpost in this series.
So far we have reviewed the configuration elements that may be orphaned or not created. Let’s move on to the next topic on today’s agenda.
#2. Analysing BGP-related Configuration
Batfish generally has two sets of questions related to BGP. The first one just provides the raw data in a structured form, and the second performs a somewhat more advanced analysis:
Type | Question | Usage |
---|---|---|
Raw data | bgpProcessConfiguration() | Show generic description of the BGP process configuration, such as AS number, router id, confederation/route reflector entries, multipathing, etc. |
Raw data | bgpPeerConfiguration() | Show generic configuration of BGP sessions, such as local/remote AS numbers, local/remote IP address, import/export route policies, session attributes (e.g., communities), etc. |
Analysis | bgpSessionCompatibility() | Tries to match the configuration of different routers to figure out whether the BGP sessions can be established (i.e., IP addresses, BGP ASNs and AFI/SAFIs match). |
Analysis | bgpSessionStatus() | Similar to the one above, but tries to estimate whether the BGP session would be up or not (i.e., the state is Established or anything else) and which type of BGP session it will be (e.g., eBGP/iBGP, single-hop/multi-hop). Includes the AFI/SAFI info as well. |
Analysis | bgpEdges() | A simpler analysis just listing all unidirectional edges in the network graph based on the established BGP peering. |
Depending on your use case, you may be interested in either group of questions or in both at once. So, let’s now modify our Python script to include all those questions:
$ cat main.py
!
! OUTPUT IS TRUNCATED FOR BREVITY
!
    # Running questions
    print("RAW // bgpProcessConfiguration()")
    r1 = bfq.bgpProcessConfiguration().answer().frame()
    print(r1)
    print("RAW // bgpPeerConfiguration()")
    r2 = bfq.bgpPeerConfiguration().answer().frame()
    print(r2)
    print("ANALYSIS // bgpSessionCompatibility()")
    r3 = bfq.bgpSessionCompatibility().answer().frame()
    print(r3)
    print("ANALYSIS // bgpSessionStatus()")
    r4 = bfq.bgpSessionStatus().answer().frame()
    print(r4)
    print("ANALYSIS // bgpEdges()")
    r5 = bfq.bgpEdges().answer().frame()
    print(r5)
!
! OUTPUT IS TRUNCATED FOR BREVITY
!
In our Network Automation Training we teach you how to create powerful Python scripts.
Now that our script is amended, we can verify whether the BGP configuration in our network is expected to be stable. The output is quite extensive, so we suggest you take some time to go through it:
$ python main.py
!
! OUTPUT IS TRUNCATED FOR BREVITY
!
RAW // bgpProcessConfiguration()
status: TERMINATEDNORMALLY
.... 2021-06-26 20:55:05.568000+01:00 Begin job.
Node VRF Router_ID Confederation_ID Confederation_Members ... Multipath_IBGP Multipath_Match_Mode Neighbors Route_Reflector Tie_Breaker
0 vx1 default 10.0.255.44 None None ... False None ['10.0.0.4/32', '10.0.0.8/32'] False ARRIVAL_ORDER
1 a-eos1 default 10.0.255.33 None None ... False PATH_LENGTH ['10.0.0.6/32', '10.0.0.9/32'] False ROUTER_ID
2 xr1 default 10.0.255.22 None None ... False EXACT_PATH ['10.0.0.5/32', '10.0.0.7/32'] False ARRIVAL_ORDER
[3 rows x 11 columns]
RAW // bgpPeerConfiguration()
status: ASSIGNED
.... no task information
status: TERMINATEDNORMALLY
.... 2021-06-26 20:55:05.709000+01:00 Begin job.
Node VRF Local_AS Local_IP Local_Interface Confederation Remote_AS ... Route_Reflector_Client Cluster_ID Peer_Group Import_Policy Export_Policy Send_Community Is_Passive
0 vx1 default 65044 10.0.0.9 None None 65033 ... False None None ['RP_PASS_LO'] ['RP_PASS_LO'] True False
1 vx1 default 65044 10.0.0.5 None None 65022 ... False None None ['RP_PASS_LO'] ['RP_PASS_LO'] True False
2 a-eos1 default 65033 10.0.0.7 None None 65022 ... False None None ['RP_PASS_LO'] ['RP_PASS_LO'] False False
3 xr1 default 65022 10.0.0.4 None None 65044 ... False None None ['RP_PASS_LO'] ['RP_PASS_LO'] False False
4 xr1 default 65022 10.0.0.6 None None 65033 ... False None None ['RP_PASS_LO'] ['RP_PASS_LO'] False False
5 a-eos1 default 65033 10.0.0.8 None None 65044 ... False None None ['RP_PASS_LO'] ['RP_PASS_LO'] False False
[6 rows x 16 columns]
ANALYSIS // bgpSessionCompatibility()
status: ASSIGNED
.... no task information
status: TERMINATEDNORMALLY
.... 2021-06-26 20:55:06.098000+01:00 Begin job.
Node VRF Local_AS Local_Interface Local_IP Remote_AS Remote_Node Remote_Interface Remote_IP Address_Families Session_Type Configured_Status
0 a-eos1 default 65033 None 10.0.0.7 65022 xr1 None 10.0.0.6 ['IPV4_UNICAST'] EBGP_SINGLEHOP UNIQUE_MATCH
1 a-eos1 default 65033 None 10.0.0.8 65044 vx1 None 10.0.0.9 ['IPV4_UNICAST', 'EVPN'] EBGP_SINGLEHOP UNIQUE_MATCH
2 vx1 default 65044 None 10.0.0.5 65022 xr1 None 10.0.0.4 ['IPV4_UNICAST'] EBGP_SINGLEHOP UNIQUE_MATCH
3 vx1 default 65044 None 10.0.0.9 65033 a-eos1 None 10.0.0.8 ['IPV4_UNICAST', 'EVPN'] EBGP_SINGLEHOP UNIQUE_MATCH
4 xr1 default 65022 None 10.0.0.4 65044 vx1 None 10.0.0.5 ['IPV4_UNICAST'] EBGP_SINGLEHOP UNIQUE_MATCH
5 xr1 default 65022 None 10.0.0.6 65033 a-eos1 None 10.0.0.7 ['IPV4_UNICAST'] EBGP_SINGLEHOP UNIQUE_MATCH
ANALYSIS // bgpSessionStatus()
status: BLOCKED
.... no task information
status: TERMINATEDNORMALLY
.... 2021-06-26 20:55:06.536000+01:00 Begin job.
Node VRF Local_AS Local_Interface Local_IP Remote_AS Remote_Node Remote_Interface Remote_IP Address_Families Session_Type Established_Status
0 a-eos1 default 65033 None 10.0.0.7 65022 xr1 None 10.0.0.6 ['IPV4_UNICAST'] EBGP_SINGLEHOP ESTABLISHED
1 a-eos1 default 65033 None 10.0.0.8 65044 vx1 None 10.0.0.9 ['IPV4_UNICAST', 'EVPN'] EBGP_SINGLEHOP ESTABLISHED
2 vx1 default 65044 None 10.0.0.5 65022 xr1 None 10.0.0.4 ['IPV4_UNICAST'] EBGP_SINGLEHOP ESTABLISHED
3 vx1 default 65044 None 10.0.0.9 65033 a-eos1 None 10.0.0.8 ['IPV4_UNICAST', 'EVPN'] EBGP_SINGLEHOP ESTABLISHED
4 xr1 default 65022 None 10.0.0.4 65044 vx1 None 10.0.0.5 ['IPV4_UNICAST'] EBGP_SINGLEHOP ESTABLISHED
5 xr1 default 65022 None 10.0.0.6 65033 a-eos1 None 10.0.0.7 ['IPV4_UNICAST'] EBGP_SINGLEHOP ESTABLISHED
ANALYSIS // bgpEdges()
status: ASSIGNED
.... no task information
status: TERMINATEDNORMALLY
.... 2021-06-26 20:55:06.713000+01:00 Begin job.
Node IP Interface AS_Number Remote_Node Remote_IP Remote_Interface Remote_AS_Number
0 xr1 10.0.0.6 None 65022 a-eos1 10.0.0.7 None 65033
1 a-eos1 10.0.0.8 None 65033 vx1 10.0.0.9 None 65044
2 vx1 10.0.0.5 None 65044 xr1 10.0.0.4 None 65022
3 vx1 10.0.0.9 None 65044 a-eos1 10.0.0.8 None 65033
4 xr1 10.0.0.4 None 65022 vx1 10.0.0.5 None 65044
5 a-eos1 10.0.0.7 None 65033 xr1 10.0.0.6 None 65022
We’ve added tags at the beginning of each output so that you can more easily trace which question produces which information. From our perspective, bgpSessionCompatibility() and bgpSessionStatus() are particularly useful, as they give you a feel for what the operational status of the BGP sessions may be.
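Reading six rows by eye is easy, but in a larger network you would rather print only the sessions that are not expected to come up. The sketch below filters the bgpSessionStatus() rows by the Established_Status column. It assumes rows are plain dictionaries with the column names from the output above; the helper names and the sample rows, including the second row with a non-established status, are hypothetical.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def sessions_not_established(rows: List[Row]) -> List[Row]:
    """Keep only the sessions whose projected status is not ESTABLISHED."""
    return [row for row in rows if row["Established_Status"] != "ESTABLISHED"]


def describe(row: Row) -> str:
    """Render one session as a single human-readable line."""
    return (f'{row["Node"]} ({row["Local_IP"]}) -> {row["Remote_IP"]}: '
            f'{row["Established_Status"]}')


# Hypothetical sample rows shaped like the bgpSessionStatus() answer.
rows = [
    {"Node": "vx1", "Local_IP": "10.0.0.5", "Remote_IP": "10.0.0.4",
     "Established_Status": "ESTABLISHED"},
    {"Node": "vx1", "Local_IP": "10.0.0.9", "Remote_IP": "10.0.0.8",
     "Established_Status": "NOT_COMPATIBLE"},
]

for row in sessions_not_established(rows):
    print(describe(row))
```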
Now, in order to see how this check can help you, let’s break something. For example, we’ll change the BGP ASN in the BGP process on the Arista EOS device. It was (correct):
$ cat EOS1.cfg
! OUTPUT IS TRUNCATED FOR BREVITY
!
router bgp 65033
!
! OUTPUT IS TRUNCATED FOR BREVITY
And now we change it to a wrong one, as if we had just made a small typo:
$ cat EOS1.cfg
! OUTPUT IS TRUNCATED FOR BREVITY
!
router bgp 650333
!
! OUTPUT IS TRUNCATED FOR BREVITY
Do you think Batfish can help us spot this error? It absolutely can:
$ python main.py
!
! OUTPUT IS TRUNCATED FOR BREVITY
!
RAW // bgpProcessConfiguration()
status: ASSIGNED
.... no task information
status: TERMINATEDNORMALLY
.... 2021-06-26 23:33:04.135000+01:00 Begin job.
Node VRF Router_ID Confederation_ID Confederation_Members ... Multipath_IBGP Multipath_Match_Mode Neighbors Route_Reflector Tie_Breaker
0 vx1 default 10.0.255.44 None None ... False None ['10.0.0.4/32', '10.0.0.8/32'] False ARRIVAL_ORDER
1 a-eos1 default 10.0.255.33 None None ... False PATH_LENGTH ['10.0.0.6/32', '10.0.0.9/32'] False ROUTER_ID
2 xr1 default 10.0.255.22 None None ... False EXACT_PATH ['10.0.0.5/32', '10.0.0.7/32'] False ARRIVAL_ORDER
[3 rows x 11 columns]
RAW // bgpPeerConfiguration()
status: ASSIGNED
.... no task information
status: TERMINATEDNORMALLY
.... 2021-06-26 23:33:04.438000+01:00 Begin job.
Node VRF Local_AS Local_IP Local_Interface Confederation Remote_AS ... Route_Reflector_Client Cluster_ID Peer_Group Import_Policy Export_Policy Send_Community Is_Passive
0 a-eos1 default 650333 10.0.0.8 None None 65044 ... False None None ['RP_PASS_LO'] ['RP_PASS_LO'] False False
1 vx1 default 65044 10.0.0.9 None None 65033 ... False None None ['RP_PASS_LO'] ['RP_PASS_LO'] True False
2 vx1 default 65044 10.0.0.5 None None 65022 ... False None None ['RP_PASS_LO'] ['RP_PASS_LO'] True False
3 a-eos1 default 650333 10.0.0.7 None None 65022 ... False None None ['RP_PASS_LO'] ['RP_PASS_LO'] False False
4 xr1 default 65022 10.0.0.4 None None 65044 ... False None None ['RP_PASS_LO'] ['RP_PASS_LO'] False False
5 xr1 default 65022 10.0.0.6 None None 65033 ... False None None ['RP_PASS_LO'] ['RP_PASS_LO'] False False
[6 rows x 16 columns]
ANALYSIS // bgpSessionCompatibility()
status: ASSIGNED
.... no task information
status: TERMINATEDNORMALLY
.... 2021-06-26 23:33:04.746000+01:00 Begin job.
Node VRF Local_AS Local_Interface Local_IP Remote_AS Remote_Node Remote_Interface Remote_IP Address_Families Session_Type Configured_Status
0 a-eos1 default 650333 None 10.0.0.7 65022 None None 10.0.0.6 [] EBGP_SINGLEHOP HALF_OPEN
1 a-eos1 default 650333 None 10.0.0.8 65044 None None 10.0.0.9 [] EBGP_SINGLEHOP HALF_OPEN
2 vx1 default 65044 None 10.0.0.5 65022 xr1 None 10.0.0.4 ['IPV4_UNICAST'] EBGP_SINGLEHOP UNIQUE_MATCH
3 vx1 default 65044 None 10.0.0.9 65033 None None 10.0.0.8 [] EBGP_SINGLEHOP HALF_OPEN
4 xr1 default 65022 None 10.0.0.4 65044 vx1 None 10.0.0.5 ['IPV4_UNICAST'] EBGP_SINGLEHOP UNIQUE_MATCH
5 xr1 default 65022 None 10.0.0.6 65033 None None 10.0.0.7 [] EBGP_SINGLEHOP HALF_OPEN
ANALYSIS // bgpSessionStatus()
status: BLOCKED
.... no task information
status: ASSIGNED
.... no task information
status: TERMINATEDNORMALLY
.... 2021-06-26 23:33:05.232000+01:00 Begin job.
Node VRF Local_AS Local_Interface Local_IP Remote_AS Remote_Node Remote_Interface Remote_IP Address_Families Session_Type Established_Status
0 a-eos1 default 650333 None 10.0.0.7 65022 None None 10.0.0.6 [] EBGP_SINGLEHOP NOT_COMPATIBLE
1 a-eos1 default 650333 None 10.0.0.8 65044 None None 10.0.0.9 [] EBGP_SINGLEHOP NOT_COMPATIBLE
2 vx1 default 65044 None 10.0.0.5 65022 xr1 None 10.0.0.4 ['IPV4_UNICAST'] EBGP_SINGLEHOP ESTABLISHED
3 vx1 default 65044 None 10.0.0.9 65033 None None 10.0.0.8 [] EBGP_SINGLEHOP NOT_COMPATIBLE
4 xr1 default 65022 None 10.0.0.4 65044 vx1 None 10.0.0.5 ['IPV4_UNICAST'] EBGP_SINGLEHOP ESTABLISHED
5 xr1 default 65022 None 10.0.0.6 65033 None None 10.0.0.7 [] EBGP_SINGLEHOP NOT_COMPATIBLE
ANALYSIS // bgpEdges()
status: ASSIGNED
.... no task information
status: TERMINATEDNORMALLY
.... 2021-06-26 23:33:05.623000+01:00 Begin job.
Node IP Interface AS_Number Remote_Node Remote_IP Remote_Interface Remote_AS_Number
0 vx1 10.0.0.5 None 65044 xr1 10.0.0.4 None 65022
1 xr1 10.0.0.4 None 65022 vx1 10.0.0.5 None 65044
As said earlier, the first two outputs contain the raw data, which might be useful for a general analysis; however, they don’t immediately answer the question of whether the configuration is wrong. At the same time, tests 3-5 show far more interesting things:
- bgpSessionCompatibility() shows that only 2 out of 6 BGP sessions have the UNIQUE_MATCH state (versus 6 before the error)
- bgpSessionStatus() shows that only 2 out of 6 BGP sessions have the ESTABLISHED state (versus 6 before the error)
- bgpEdges() shows that we have only 2 unidirectional BGP edges (versus 6 before the error)
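The answers tell us which sessions are broken, but not directly why. One way to narrow it down is to cross-check the ASNs: for every session, the Remote_AS a router expects should equal the Local_AS of the router that owns the Remote_IP. The sketch below does that on rows shaped like the bgpSessionCompatibility() answer. It is our own illustration rather than a Batfish question, it assumes rows are plain dictionaries with the column names shown above, and `asn_mismatches` is a hypothetical name.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def asn_mismatches(sessions: List[Row]) -> List[Tuple[str, str, int, str, int]]:
    """Return (node, remote_ip, expected_as, peer_node, actual_as) per mismatch."""
    # Map every local peering address to the node and ASN configured on it.
    owner = {s["Local_IP"]: (s["Node"], s["Local_AS"]) for s in sessions}
    problems = []
    for s in sessions:
        peer = owner.get(s["Remote_IP"])
        if peer is None:
            continue  # the peer address is not part of the snapshot
        peer_node, peer_as = peer
        if peer_as != s["Remote_AS"]:
            problems.append((s["Node"], s["Remote_IP"], s["Remote_AS"], peer_node, peer_as))
    return problems


# Two rows taken from the broken bgpSessionCompatibility() output above.
sessions = [
    {"Node": "a-eos1", "Local_AS": 650333, "Local_IP": "10.0.0.7",
     "Remote_AS": 65022, "Remote_IP": "10.0.0.6"},
    {"Node": "xr1", "Local_AS": 65022, "Local_IP": "10.0.0.6",
     "Remote_AS": 65033, "Remote_IP": "10.0.0.7"},
]

for node, remote_ip, expected, peer_node, actual in asn_mismatches(sessions):
    print(f"{node} expects AS {expected} at {remote_ip}, but {peer_node} is configured with AS {actual}")
```

For the two sample rows the mismatch is reported from the xr1 side and points at a-eos1, which is where the typo was made.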
Clearly, Batfish helped us to find the effect that the configuration mistake would have on the real network. Let’s restore the configuration before moving on to the next section.
We’ve repeated the tests consecutively by tweaking the BGP ASN in Cisco IOS XR and Cumulus Linux, and the error was found each time.
#3. Analysing the Route Table
The last test we are going to conduct today for our multivendor network running Cisco IOS XR, Cumulus Linux and Arista EOS is the analysis of the route table as well as the BGP RIB, which Batfish computes based on the configuration.
#3.1. Route Table View
The first test in this part is to check the content of the routing table. To do that, Batfish has the question routes(), which attempts to give you a projected view of the route table, including the local and learned routes. This question brings the analysis to the next level, as it takes into account the configuration of the routing protocols (BGP in our case) when computing the content of the routing table. Let’s have a look at the script:
$ cat main.py
!
! OUTPUT IS TRUNCATED FOR BREVITY
!
    # Running questions
    print("ANALYSIS // routes()")
    r1 = bfq.routes().answer().frame()
    print(r1)
!
! OUTPUT IS TRUNCATED FOR BREVITY
As a reminder, our network now has a correct configuration, so we should see quite a bunch of routes learned over BGP as well as local routes:
$ python main.py
!
! OUTPUT IS TRUNCATED FOR BREVITY
!
ANALYSIS // routes()
status: BLOCKED
.... no task information
status: ASSIGNED
.... no task information
status: TERMINATEDNORMALLY
.... 2021-06-27 13:42:02.223000+01:00 Begin job.
Node VRF Network Next_Hop Next_Hop_IP Next_Hop_Interface Protocol Metric Admin_Distance Tag
0 xr1 default 10.0.0.4/32 None AUTO/NONE(-1l) GigabitEthernet0/0/0/1 local 0 0 None
1 xr1 default 10.0.0.4/31 None AUTO/NONE(-1l) GigabitEthernet0/0/0/1 connected 0 0 None
2 eos1 default 10.0.255.44/32 vx1 10.0.0.9 dynamic bgp 0 200 None
3 vx1 default 10.0.0.8/31 None AUTO/NONE(-1l) swp3 connected 0 0 None
4 xr1 default 10.0.255.44/32 vx1 10.0.0.5 dynamic bgp 0 20 None
5 vx1 default 10.0.255.33/32 eos1 10.0.0.8 dynamic bgp 0 20 None
6 xr1 default 10.0.0.6/31 None AUTO/NONE(-1l) GigabitEthernet0/0/0/2 connected 0 0 None
7 eos1 default 10.0.255.33/32 None AUTO/NONE(-1l) Loopback0 connected 0 0 None
8 xr1 default 10.0.0.6/32 None AUTO/NONE(-1l) GigabitEthernet0/0/0/2 local 0 0 None
9 xr1 default 10.0.255.22/32 None AUTO/NONE(-1l) Loopback0 connected 0 0 None
10 vx1 default 10.0.255.22/32 xr1 10.0.0.4 dynamic bgp 0 20 None
11 eos1 default 10.0.255.22/32 xr1 10.0.0.6 dynamic bgp 0 200 None
12 eos1 default 10.0.0.8/31 None AUTO/NONE(-1l) Ethernet2 connected 0 0 None
13 vx1 default 10.0.0.2/31 None AUTO/NONE(-1l) swp1 connected 0 0 None
14 eos1 default 10.0.0.6/31 None AUTO/NONE(-1l) Ethernet1 connected 0 0 None
15 vx1 default 10.0.0.4/31 None AUTO/NONE(-1l) swp2 connected 0 0 None
16 eos1 default 10.0.0.8/32 None AUTO/NONE(-1l) Ethernet2 local 0 0 None
17 vx1 default 10.0.255.44/32 None AUTO/NONE(-1l) lo connected 0 0 None
18 eos1 default 10.0.0.7/32 None AUTO/NONE(-1l) Ethernet1 local 0 0 None
19 xr1 default 10.0.255.33/32 eos1 10.0.0.7 dynamic bgp 0 20 None
As each of the routers advertises its BGP routes, we see 6 BGP routes in total (we have 3 routers, and each has 2 BGP routes installed). Now we'll introduce a mistake on our Cisco IOS XR router by changing its BGP ASN, like we did in the previous test.
Before (correct):
$ cat XR1.cfg
!
! OUTPUT IS TRUNCATED FOR BREVITY
!
router bgp 65022
!
! OUTPUT IS TRUNCATED FOR BREVITY
After (with the mistake):
$ cat XR1.cfg
!
! OUTPUT IS TRUNCATED FOR BREVITY
!
router bgp 650222
!
! OUTPUT IS TRUNCATED FOR BREVITY
Previously, in part 2, we saw that the BGP questions report the errors and show that the BGP sessions aren't established. Let's see what the routes() question answers:
$ python main.py
!
! OUTPUT IS TRUNCATED FOR BREVITY
!
ANALYSIS // routes()
status: BLOCKED
.... no task information
status: TERMINATEDNORMALLY
.... 2021-06-27 13:49:50.726000+01:00 Begin job.
Node VRF Network Next_Hop Next_Hop_IP Next_Hop_Interface Protocol Metric Admin_Distance Tag
0 xr1 default 10.0.0.4/32 None AUTO/NONE(-1l) GigabitEthernet0/0/0/1 local 0 0 None
1 xr1 default 10.0.0.4/31 None AUTO/NONE(-1l) GigabitEthernet0/0/0/1 connected 0 0 None
2 eos1 default 10.0.255.44/32 vx1 10.0.0.9 dynamic bgp 0 200 None
3 vx1 default 10.0.0.8/31 None AUTO/NONE(-1l) swp3 connected 0 0 None
4 vx1 default 10.0.255.33/32 eos1 10.0.0.8 dynamic bgp 0 20 None
5 xr1 default 10.0.0.6/31 None AUTO/NONE(-1l) GigabitEthernet0/0/0/2 connected 0 0 None
6 eos1 default 10.0.255.33/32 None AUTO/NONE(-1l) Loopback0 connected 0 0 None
7 xr1 default 10.0.0.6/32 None AUTO/NONE(-1l) GigabitEthernet0/0/0/2 local 0 0 None
8 xr1 default 10.0.255.22/32 None AUTO/NONE(-1l) Loopback0 connected 0 0 None
9 eos1 default 10.0.0.8/31 None AUTO/NONE(-1l) Ethernet2 connected 0 0 None
10 vx1 default 10.0.0.2/31 None AUTO/NONE(-1l) swp1 connected 0 0 None
11 eos1 default 10.0.0.6/31 None AUTO/NONE(-1l) Ethernet1 connected 0 0 None
12 vx1 default 10.0.0.4/31 None AUTO/NONE(-1l) swp2 connected 0 0 None
13 eos1 default 10.0.0.8/32 None AUTO/NONE(-1l) Ethernet2 local 0 0 None
14 vx1 default 10.0.255.44/32 None AUTO/NONE(-1l) lo connected 0 0 None
15 eos1 default 10.0.0.7/32 None AUTO/NONE(-1l) Ethernet1 local 0 0 None
Now you see only 2 BGP routes instead of 6. The two that remain are exchanged between eos1 and vx1, whose session doesn't involve xr1. This means that Batfish correctly processes the configuration and estimates the effect of the broken BGP sessions on the routing table.
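The comparison of the two outputs above can also be done programmatically rather than by eye. Below is a minimal sketch, assuming the rows of each routes() answer have already been converted to plain dictionaries (for instance with pandas' `DataFrame.to_dict("records")`). The helpers `row`, `bgp_routes` and `missing_routes` are hypothetical names for illustration and are not part of pybatfish; the sample rows are copied from the outputs above and reduced to four columns.

```python
from collections import Counter


def row(node, network, next_hop, protocol):
    """Build a dict shaped like one row of the routes() answer (subset of columns)."""
    return {"Node": node, "Network": network, "Next_Hop": next_hop, "Protocol": protocol}


# BGP rows from the healthy snapshot, plus one connected row to show the filtering.
baseline = [
    row("eos1", "10.0.255.44/32", "vx1", "bgp"),
    row("xr1", "10.0.255.44/32", "vx1", "bgp"),
    row("vx1", "10.0.255.33/32", "eos1", "bgp"),
    row("vx1", "10.0.255.22/32", "xr1", "bgp"),
    row("eos1", "10.0.255.22/32", "xr1", "bgp"),
    row("xr1", "10.0.255.33/32", "eos1", "bgp"),
    row("xr1", "10.0.255.22/32", None, "connected"),
]

# The same rows from the snapshot with the wrong ASN on xr1.
broken = [
    row("eos1", "10.0.255.44/32", "vx1", "bgp"),
    row("vx1", "10.0.255.33/32", "eos1", "bgp"),
    row("xr1", "10.0.255.22/32", None, "connected"),
]


def bgp_routes(rows):
    """Return the set of (node, network) pairs installed via BGP."""
    return {(r["Node"], r["Network"]) for r in rows if r["Protocol"] == "bgp"}


def missing_routes(before, after):
    """Return BGP routes present in 'before' but absent in 'after', sorted."""
    return sorted(bgp_routes(before) - bgp_routes(after))


lost = missing_routes(baseline, broken)
per_node = Counter(node for node, _ in lost)

for node, network in lost:
    print(f"{node} lost BGP route to {network}")
print("Lost routes per node:", dict(per_node))
```

A check like this could fail a change request automatically whenever `lost` is not empty, which fits the change management workflow described at the beginning of the post.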
#3.2. BGP RIB
While keeping the network in the broken state, let's check the content of the BGP RIBs using the Batfish question bgpRib(), which is intended to show the content of the BGP RIB for each BGP-speaking router in the analysed network:
$ cat main.py
!
! OUTPUT IS TRUNCATED FOR BREVITY
!
# Running questions
print("ANALYSIS // bgpRib()")
r1 = bfq.bgpRib().answer().frame()
print(r1)
!
! OUTPUT IS TRUNCATED FOR BREVITY
And this is the output now:
$ python main.py
!
! OUTPUT IS TRUNCATED FOR BREVITY
!
ANALYSIS // bgpRib()
status: CHECKINGSTATUS
.... no task information
status: TERMINATEDNORMALLY
.... 2021-06-27 13:49:50.955000+01:00 Begin job.
Node VRF Network Next_Hop_IP Next_Hop_Interface Protocol AS_Path Metric Local_Pref Communities Origin_Protocol Origin_Type Originator_Id Cluster_List Tag
0 eos1 default 10.0.255.44/32 10.0.0.9 dynamic bgp 65044 0 100 [] bgp igp 10.0.255.44 None None
1 vx1 default 10.0.255.33/32 10.0.0.8 dynamic bgp 65033 0 100 [] bgp incomplete 10.0.255.33 None None
As you can see, it shows the entries in the BGP RIB, including the AS_Path, the originator's ID, the origin type and many other attributes. If we fix the error introduced earlier and re-run the script, we should see many more routes:
$ python main.py
!
! OUTPUT IS TRUNCATED FOR BREVITY
!
ANALYSIS // bgpRib()
status: ASSIGNED
.... no task information
status: TERMINATEDNORMALLY
.... 2021-06-27 13:58:06.843000+01:00 Begin job.
Node VRF Network Next_Hop_IP Next_Hop_Interface Protocol AS_Path Metric Local_Pref Communities Origin_Protocol Origin_Type Originator_Id Cluster_List Tag
0 eos1 default 10.0.255.44/32 10.0.0.9 dynamic bgp 65044 0 100 [] bgp igp 10.0.255.44 None None
1 vx1 default 10.0.255.22/32 10.0.0.4 dynamic bgp 65022 0 100 [] bgp igp 10.0.255.22 None None
2 eos1 default 10.0.255.22/32 10.0.0.6 dynamic bgp 65022 0 100 [] bgp igp 10.0.255.22 None None
3 xr1 default 10.0.255.33/32 10.0.0.7 dynamic bgp 65033 0 100 [] bgp incomplete 10.0.255.33 None None
4 xr1 default 10.0.255.44/32 10.0.0.5 dynamic bgp 65044 0 100 [] bgp igp 10.0.255.44 None None
5 vx1 default 10.0.255.33/32 10.0.0.8 dynamic bgp 65033 0 100 [] bgp incomplete 10.0.255.33 None None
That suggests that Batfish can properly interpret BGP-related configuration errors and their impact on the routing tables and the BGP RIB.
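One way to turn this observation into an automated check is to state which prefixes each router is expected to learn via BGP and compare that with the bgpRib() answer. The sketch below assumes the answer rows are available as plain dictionaries (for example via pandas' `DataFrame.to_dict("records")`). The `EXPECTED` table is derived from the healthy output above, and `learned_prefixes` and `rib_gaps` are hypothetical helper names, not pybatfish functions.

```python
# Prefixes each node should learn via BGP, taken from the healthy bgpRib() output.
EXPECTED = {
    "xr1": {"10.0.255.33/32", "10.0.255.44/32"},
    "eos1": {"10.0.255.22/32", "10.0.255.44/32"},
    "vx1": {"10.0.255.22/32", "10.0.255.33/32"},
}


def learned_prefixes(rows):
    """Group the networks found in the BGP RIB rows by node."""
    learned = {}
    for r in rows:
        learned.setdefault(r["Node"], set()).add(r["Network"])
    return learned


def rib_gaps(rows, expected):
    """Return {node: [missing prefixes]} for every node with missing BGP routes."""
    learned = learned_prefixes(rows)
    gaps = {}
    for node, prefixes in expected.items():
        missing = prefixes - learned.get(node, set())
        if missing:
            gaps[node] = sorted(missing)
    return gaps


# Rows copied from the bgpRib() output of the broken snapshot (subset of columns).
broken_rib = [
    {"Node": "eos1", "Network": "10.0.255.44/32", "AS_Path": "65044"},
    {"Node": "vx1", "Network": "10.0.255.33/32", "AS_Path": "65033"},
]

# Rows copied from the bgpRib() output after the ASN is fixed.
healthy_rib = broken_rib + [
    {"Node": "vx1", "Network": "10.0.255.22/32", "AS_Path": "65022"},
    {"Node": "eos1", "Network": "10.0.255.22/32", "AS_Path": "65022"},
    {"Node": "xr1", "Network": "10.0.255.33/32", "AS_Path": "65033"},
    {"Node": "xr1", "Network": "10.0.255.44/32", "AS_Path": "65044"},
]

print("Broken snapshot gaps:", rib_gaps(broken_rib, EXPECTED))
print("Healthy snapshot gaps:", rib_gaps(healthy_rib, EXPECTED))
```

Keeping the expectation as data makes the intent of the design explicit: the check fails on any missing prefix, not only on the specific ASN mistake used in this lab.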
#3.3. Route Table Lookup
The last test is relatively quick in our topology. It allows you to check which route would be chosen to forward traffic towards a particular destination. To do that, use the question lpmRoutes() with the mandatory argument ip, which contains the IP address of your destination:
$ cat main.py
!
! OUTPUT IS TRUNCATED FOR BREVITY
!
print("ANALYSIS // lpmRoutes()")
r3 = bfq.lpmRoutes(ip='10.0.255.22').answer().frame()
print(r3)
!
! OUTPUT IS TRUNCATED FOR BREVITY
Internally, this question builds the same routing table structure as routes() and then performs the route lookup:
$ python main.py
!
! OUTPUT IS TRUNCATED FOR BREVITY
!
ANALYSIS // lpmRoutes()
status: TERMINATEDNORMALLY
.... 2021-06-27 13:58:07.192000+01:00 Begin job.
Node VRF Ip Network Num_Routes
0 eos1 default 10.0.255.22 10.0.255.22/32 1
1 vx1 default 10.0.255.22 10.0.255.22/32 1
2 xr1 default 10.0.255.22 10.0.255.22/32 1
Together with routes(), which provides the next hop for the matched network, it allows you to trace the path through the network to a specific destination.
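To make the idea of a longest-prefix-match lookup and a hop-by-hop trace more tangible, here is a minimal sketch built on Python's standard `ipaddress` module. It only illustrates the concept and is not how Batfish implements lpmRoutes(). The rows are a subset of the healthy routes() output above, converted to dictionaries; `lpm` and `trace` are hypothetical helper names, and the sketch assumes that connected and local routes carry no next-hop node (None).

```python
import ipaddress

# Subset of the routes() rows for vx1 and xr1 from the healthy snapshot.
ROUTES = [
    {"Node": "vx1", "Network": "10.0.0.4/31", "Next_Hop": None, "Protocol": "connected"},
    {"Node": "vx1", "Network": "10.0.255.22/32", "Next_Hop": "xr1", "Protocol": "bgp"},
    {"Node": "vx1", "Network": "10.0.255.44/32", "Next_Hop": None, "Protocol": "connected"},
    {"Node": "xr1", "Network": "10.0.0.4/31", "Next_Hop": None, "Protocol": "connected"},
    {"Node": "xr1", "Network": "10.0.0.4/32", "Next_Hop": None, "Protocol": "local"},
    {"Node": "xr1", "Network": "10.0.255.22/32", "Next_Hop": None, "Protocol": "connected"},
    {"Node": "xr1", "Network": "10.0.255.44/32", "Next_Hop": "vx1", "Protocol": "bgp"},
]


def lpm(rows, node, ip):
    """Return the most specific route on 'node' that covers 'ip', or None."""
    addr = ipaddress.ip_address(ip)
    candidates = [
        r for r in rows
        if r["Node"] == node and addr in ipaddress.ip_network(r["Network"], strict=False)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda r: ipaddress.ip_network(r["Network"], strict=False).prefixlen)


def trace(rows, start, ip, max_hops=8):
    """Follow next-hop nodes from 'start' towards 'ip'.

    Returns (path, reached), where 'reached' is True once a node holds
    a route without a next-hop node, i.e. a connected or local route.
    """
    path = [start]
    node = start
    for _ in range(max_hops):
        route = lpm(rows, node, ip)
        if route is None:
            return path, False
        if route["Next_Hop"] is None:
            return path, True
        node = route["Next_Hop"]
        path.append(node)
    return path, False  # hop limit hit, possibly a routing loop


print(lpm(ROUTES, "xr1", "10.0.0.4"))
print(trace(ROUTES, "vx1", "10.0.255.22"))
```

The `max_hops` limit is a design choice that keeps the trace from spinning forever if two nodes point at each other for the same destination.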
GitHub Repo
You can find the configuration files for this lab, as well as full examples of the Python script we created in this and other Batfish-related blogposts, in our GitHub repo.
Lessons Learned
In certain cases, you might be interested in the results of the analysis from a particular node's standpoint. In other words, you submit the configuration files of multiple network elements for analysis, but print the output only for a specific node. This can be achieved in Batfish by passing the argument nodes="XXX" to the question, where XXX is the hostname of the node as defined in the configuration files:
$ cat main.py
!
! OUTPUT IS TRUNCATED FOR BREVITY
!
# Running questions
print("ANALYSIS // routes()")
r1 = bfq.routes(nodes='EOS1').answer().frame()
print(r1)
print("ANALYSIS // bgpRib()")
r2 = bfq.bgpRib(nodes='EOS1').answer().frame()
print(r2)
!
! OUTPUT IS TRUNCATED FOR BREVITY
!
In this case, the resulting output is limited to the requested node:
$ python main.py
!
! OUTPUT IS TRUNCATED FOR BREVITY
!
ANALYSIS // routes()
status: BLOCKED
.... no task information
status: ASSIGNED
.... no task information
status: TERMINATEDNORMALLY
.... 2021-06-27 13:11:01.027000+01:00 Begin job.
Node VRF Network Next_Hop Next_Hop_IP Next_Hop_Interface Protocol Metric Admin_Distance Tag
0 eos1 default 10.0.255.33/32 None AUTO/NONE(-1l) Loopback0 connected 0 0 None
1 eos1 default 10.0.255.22/32 xr1 10.0.0.6 dynamic bgp 0 200 None
2 eos1 default 10.0.0.8/31 None AUTO/NONE(-1l) Ethernet2 connected 0 0 None
3 eos1 default 10.0.255.44/32 vx1 10.0.0.9 dynamic bgp 0 200 None
4 eos1 default 10.0.0.6/31 None AUTO/NONE(-1l) Ethernet1 connected 0 0 None
5 eos1 default 10.0.0.8/32 None AUTO/NONE(-1l) Ethernet2 local 0 0 None
6 eos1 default 10.0.0.7/32 None AUTO/NONE(-1l) Ethernet1 local 0 0 None
ANALYSIS // bgpRib()
status: CHECKINGSTATUS
.... no task information
status: TERMINATEDNORMALLY
.... 2021-06-27 13:11:01.369000+01:00 Begin job.
Node VRF Network Next_Hop_IP Next_Hop_Interface Protocol AS_Path Metric Local_Pref Communities Origin_Protocol Origin_Type Originator_Id Cluster_List Tag
0 eos1 default 10.0.255.44/32 10.0.0.9 dynamic bgp 65044 0 100 [] bgp igp 10.0.255.44 None None
1 eos1 default 10.0.255.22/32 10.0.0.6 dynamic bgp 65022 0 100 [] bgp igp 10.0.255.22 None None
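If the full answer is already at hand, a similar restriction can be applied on the client side without asking the question again. The following is a minimal sketch, assuming the answer rows are plain dictionaries (for example from pandas' `DataFrame.to_dict("records")`); `rows_for_node` is a hypothetical helper, not a pybatfish function. The output above shows the node as eos1 although the question was asked with nodes='EOS1', so the sketch compares hostnames case-insensitively; that is an assumption drawn from this output rather than a documented guarantee.

```python
def rows_for_node(rows, hostname):
    """Keep only the rows whose Node matches 'hostname', ignoring case."""
    wanted = hostname.lower()
    return [r for r in rows if r["Node"].lower() == wanted]


# Rows copied from the full bgpRib() output of the healthy snapshot (subset of columns).
all_bgp_rib = [
    {"Node": "eos1", "Network": "10.0.255.44/32", "AS_Path": "65044"},
    {"Node": "vx1", "Network": "10.0.255.22/32", "AS_Path": "65022"},
    {"Node": "eos1", "Network": "10.0.255.22/32", "AS_Path": "65022"},
    {"Node": "xr1", "Network": "10.0.255.33/32", "AS_Path": "65033"},
    {"Node": "xr1", "Network": "10.0.255.44/32", "AS_Path": "65044"},
    {"Node": "vx1", "Network": "10.0.255.33/32", "AS_Path": "65033"},
]

eos1_only = rows_for_node(all_bgp_rib, "EOS1")
for r in eos1_only:
    print(r["Node"], r["Network"], r["AS_Path"])
```

Filtering in the question itself, as shown in main.py above, remains the more direct option; a client-side filter is mainly useful when the same answer is reused for several nodes.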
Conclusion
In this blogpost we’ve performed a deep dive into configuration analysis with Batfish. Based on this analysis, we’ve found that Batfish allows us to find unused configuration elements and references to elements that haven't been created, as well as to predict the impact that potential issues in the BGP configuration have on the network. In the next part we’ll cover synthetic tests with Batfish. Take care and goodbye.
P.S.
If you have further questions or you need help with your networks, our team is happy to assist you. Just book a free slot with us. Also don’t forget to share the article on your social media, if you like it.
BR,
Anton Karneliuk