CEX (Code EXpress) 11. Working with files and parsing CSV.

Anton Karneliuk

4 years ago

Hello my friend,

With this post we start the second series of the Code EXpress (CEX) blogposts covering Python (namely, Python 3.8) basics for network engineers. Previously we covered the simplest elements, and now we are heading to more complex scenarios.

Automation for networks and beyond

Knowing how to improve the efficiency of your network or IT operation via applying some of the automation techniques with Ansible or Python is getting more and more important.

Boost your skills with the industry-best network automation training covering the details of NETCONF/YANG and REST API with Bash, Ansible and Python for managing network devices from Cisco, Nokia, Arista and Cumulus. In addition, you get Linux management skills, as well as network virtualisation (KVM) and containerisation (Docker).

Don’t waste your time. Start your training today!

What are we going to do today?

In today’s blogpost we are going to cover the basics of working with files in Python. Although there are multiple ways to do that, we will share what we believe is one of the most popular and convenient ones:

Using the context manager with … as … you will open the CSV file in your Python script.
Using the function read() you will read the content of the file and, using the split() function, parse it into a Python data structure consisting of lists and dictionaries.
Using the function write() you will write the Python variables into a file on your hard disk.
On top of that, you will learn about a new function enumerate(), which allows you to get the index of the element used in a for-loop.

In general, it is expected that you have read the previous CEX blogposts, so you should be familiar with our lab setup.

Read the first blogpost to understand how to set up your lab with the latest Python version, which is Python 3.8.3 at the moment of writing.

Why does it matter?

If you think about any application, from the simplest scripts up to complex enterprise-class tools, you deal with configuration files all the time. They may be INI, CSV, or any other textual data format. Hence, you absolutely should know how to work with files so that the Python applications you are creating now, or will create in the future, can benefit from external files.

On the other hand, having worked for quite a while in Service Providers in various countries across Europe, I can firmly say that spreadsheet engineering (or Excel-based engineering) is still very popular and, I bet, will remain popular for a while. As such, you should be able to integrate spreadsheets into your automation workflow with Python so that you can make improvements in your network or IT operation right now.

To be able to do that, you need to know not only how to read the file, but also how to parse its content. In this lab you will see how to do that for CSV using the tools you have already learned: lists and their operators pop()/append(), the split() function and for-loop code flow control.

How are we doing that?

At a high level, the task can be split into two major building blocks for the parsing operation:

Open the file and read its content.
Parse the data into a Python data structure.

The first block requires the context manager with … as …, which allows you to open the file for reading or writing depending on your needs. The second block relies on the tools we have covered previously.

The second task, writing, mirrors the first one:

You need to convert the Python data structure into a single multi-line string.
This string shall be written into a file.

And here again you will use the same with … as … context manager and the skills you already have.
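As a rough illustration of both tasks, the following minimal sketch writes a string into a file and reads it back with the same with … as … construct. The temporary directory and the file name demo.txt are assumptions made only to keep the example self-contained; they are not part of the lab.

```python
import os
import tempfile

# Hypothetical scratch location, used only for this illustration
tmp_dir = tempfile.mkdtemp()
path_to_demo = os.path.join(tmp_dir, 'demo.txt')

# 'w' opens the file for writing (creating or overwriting it)
with open(path_to_demo, 'w') as f:
    f.write('line one\nline two')

# 'r' opens the file for reading
with open(path_to_demo, 'r') as f:
    demo_content = f.read()

# After leaving the with-block the file is closed automatically
print(f.closed)        # True
print(demo_content)
```

The main benefit of the context manager is visible in the last lines: you never call close() yourself, yet the file is closed once the indented block ends.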

#1. Reading file and parsing CSV

It is assumed you have a clone of the lessons code from our GitHub repo.

As usual, we start with the creation of a new directory:


$ cd CEX
$ mkdir 11
$ cd 11

The next step is to have some CSV data, which we can open in Python and convert into a Python list or a dictionary. We are creating these Python 3 classes for network engineers; hence, we will take some network-oriented data. The following examples can be an inspiration for you to create your CSV:

List of the servers’ interfaces in the data centre containing the IPv4 or IPv6 addresses and list of the VLANs.
List of the 4G eNB or 5G gNB (mobile network base stations) also containing the names of the interfaces, IPv4/IPv6 addresses and VLANs.
List of the office desktops with the same information.

Choose what is closest to your environment and create a similar CSV file:


$ cat some_data.csv
Id,Name,Interface,Speed,Encapsulation,VLAN,IPv4,IPv6
1,DE-DB-1,eth0,1000,none,none,192.168.100.11/24,fc00:192:168:100::B/64
2,DE-CP-1,eth0,10000,dot1q,10,192.168.100.12/24,fc00:192:168:100::C/64

This is a sort of one-sided connectivity matrix, so it matches any of the cases we have explained above. CSV stands for “Comma-Separated Values” and that is exactly what you can see here:

The first line contains the names of the columns.
The second and all following lines contain one entry per connected node.

Once imported and converted into Python, we’d like to have a similar structure:

The parent structure is a list containing the info about each connected element.
Each connected element is represented as a dictionary, where the keys come from the first line of the CSV file and the values come from the corresponding data line.
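To make the target structure more tangible before writing the parser, here is a small hand-written sketch of it. Only a few of the columns are shown to keep it short, and the variable name node is an assumption used for illustration.

```python
# Target structure for the two data rows of some_data.csv (a subset of columns)
result_data = [
    {'Id': '1', 'Name': 'DE-DB-1', 'Interface': 'eth0', 'VLAN': 'none'},
    {'Id': '2', 'Name': 'DE-CP-1', 'Interface': 'eth0', 'VLAN': '10'},
]

# The list index selects the node, the dictionary key selects the column
print(result_data[1]['Name'])   # DE-CP-1

# Walking over all connected nodes
for node in result_data:
    print(node['Name'], node['VLAN'])
```

Note that every value is a string, including the numbers: the data comes from a text file, so nothing is converted to integers unless you do it explicitly.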

With this structure in mind, we will write our Python script:


$ cat fread.py
#!/usr/local/bin/python3.8

# Variables
path_to_file = './some_data.csv'

# Body
## Reading CSV file and parsing it into Python data structures
with open(path_to_file, 'r') as f:
    file_content = f.read()

print(file_content)

We start our Python code with the path to the Python interpreter (the shebang line). Then we create a variable path_to_file holding a string with the path to the CSV file. Finally, using the context manager with … as … and the function open(), we open the file and store the result in a variable called f. Here the function open() takes two arguments:

The first positional argument contains the path to the file to be opened. In our case it is the value of the variable path_to_file.
The second positional argument contains the mode, that is, the operation we are opening the file for. In our case it is ‘r’, which stands for read.

The file is opened as an object, which is a specific type of Python data.

Later in this blog series you will learn what an object is.
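Without going into objects yet, the following sketch gives a first look at what open() hands back. It creates its own small scratch file (a hypothetical stand-in for some_data.csv in a temporary directory) so that it can run anywhere; the names inside_mode and inside_closed are assumptions for illustration.

```python
import os
import tempfile

# Hypothetical scratch file standing in for some_data.csv
tmp_dir = tempfile.mkdtemp()
path_to_file = os.path.join(tmp_dir, 'some_data.csv')

with open(path_to_file, 'w') as f:
    f.write('Id,Name\n1,DE-DB-1')

with open(path_to_file, 'r') as f:
    inside_mode = f.mode        # the mode passed as the second argument
    inside_closed = f.closed    # False: the file is open inside the with-block
    file_content = f.read()

print(type(f).__name__)         # name of the file object's type
print(inside_mode, inside_closed, f.closed)
```

The second argument is optional: if you leave it out, open() falls back to reading text, so open(path_to_file) behaves the same as open(path_to_file, 'r').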

To get the content of a file, we call the function read() without any arguments on the created variable f. We save the content of the opened file into a new variable called file_content, which is a multiline string. Let’s execute the file now and see the result:


$ python3.8 fread.py
Id,Name,Interface,Speed,Encapsulation,VLAN,IPv4,IPv6
1,DE-DB-1,eth0,1000,none,none,192.168.100.11/24,fc00:192:168:100::B/64
2,DE-CP-1,eth0,10000,dot1q,10,192.168.100.12/24,fc00:192:168:100::C/64

You can also change the file permissions to convert the script into an executable one.

As expected, the content of the file is exactly the same as the original some_data.csv file. Now we need to parse this multiline string into a data set as we explained above:


$ cat fread.py
#!/usr/local/bin/python3.8

# Variables
path_to_file = './some_data.csv'

# Body
## Reading CSV file and parsing it into Python data structures
with open(path_to_file, 'r') as f:
    file_content = f.read()

data_lines = file_content.split('\n')
headers = data_lines.pop(0)
headers_keys = headers.split(',')

result_data = []

for line_entry in data_lines:
    temp_dict = {}

    for value_index, value_entry in enumerate(line_entry.split(',')):
        temp_dict.update({headers_keys[value_index]: value_entry})

    result_data.append(temp_dict)

print(result_data)

Almost all the functions we are using here were covered in the previous blogposts of the Code EXpress (CEX) series; hence, we just explain the logic:

Using the split() function with the separator ‘\n’ (meaning newline), we create a Python list data_lines.
Using the pop() function we take the first element out of data_lines and store that value in the headers variable, which is itself converted into a list headers_keys using split() with the separator ‘,’.
For the remaining entries in data_lines, which are now pure data without the header, we start a for-loop.
Inside each iteration we create a dictionary, where the keys are the elements of the headers_keys list and the values come from line_entry, the line processed in the current iteration of the for-loop.
To achieve that, we use the new function enumerate(), which takes the list as an input argument and on each iteration returns two values: the current index, which we store in value_index, and the value of the element, which we store in value_entry.
This index is used to pick the proper element out of the headers_keys list to construct a key-value pair, which we add to the temporary dictionary using the update() function.
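Since enumerate() is the only new function here, it may help to watch it work on a single line in isolation. The sketch below uses a shortened header and one data line; zipped_dict is a hypothetical name, and the zip() variant at the end is an alternative we have not covered in this series, shown only for comparison.

```python
headers_keys = ['Id', 'Name', 'Interface']
line_entry = '1,DE-DB-1,eth0'

temp_dict = {}
for value_index, value_entry in enumerate(line_entry.split(',')):
    # value_index runs 0, 1, 2 and selects the matching header
    print(value_index, headers_keys[value_index], value_entry)
    temp_dict[headers_keys[value_index]] = value_entry

print(temp_dict)

# zip() pairs the two lists directly and builds the same dictionary
zipped_dict = dict(zip(headers_keys, line_entry.split(',')))
print(zipped_dict == temp_dict)   # True
```

Assigning with temp_dict[key] = value has the same effect as the update() call used in the script when a single key-value pair is added.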

In the end, we’d like to print the content of our created data structure:


$ python3.8 fread.py
[{'Id': '1', 'Name': 'DE-DB-1', 'Interface': 'eth0', 'Speed': '1000', 'Encapsulation': 'none', 'VLAN': 'none', 'IPv4': '192.168.100.11/24', 'IPv6': 'fc00:192:168:100::B/64'}, {'Id': '2', 'Name': 'DE-CP-1', 'Interface': 'eth0', 'Speed': '10000', 'Encapsulation': 'dot1q', 'VLAN': '10', 'IPv4': '192.168.100.12/24', 'IPv6': 'fc00:192:168:100::C/64'}]

Huzzah! We have the proper Python data structure, which we can start using in our code as we need.
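One caveat worth keeping in mind: the parsing above assumes that the file does not end with a newline character. Many editors add one automatically, and in that case split('\n') returns an extra empty string, which would end up in the result as a dictionary like {'Id': ''}. The sketch below shows the effect on an in-memory string and one possible guard using splitlines(); treat it as a suggestion rather than a part of the original script.

```python
file_content = 'Id,Name\n1,DE-DB-1\n2,DE-CP-1\n'   # note the trailing newline

print(file_content.split('\n'))    # ['Id,Name', '1,DE-DB-1', '2,DE-CP-1', '']
print(file_content.splitlines())   # ['Id,Name', '1,DE-DB-1', '2,DE-CP-1']

# Possible guard: splitlines() plus skipping any empty lines
data_lines = [line for line in file_content.splitlines() if line]
print(len(data_lines))             # 3
```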

Learn more about the Python dictionaries and lists.
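For comparison, Python’s standard library ships a csv module that can build the same list of dictionaries. It is not what this lesson is about (the goal here is to practise split(), pop() and for-loops), so take the following as an optional sketch. io.StringIO stands in for the opened file to keep the example self-contained, and the variable name csv_text is an assumption.

```python
import csv
import io

csv_text = (
    'Id,Name,Interface,Speed,Encapsulation,VLAN,IPv4,IPv6\n'
    '1,DE-DB-1,eth0,1000,none,none,192.168.100.11/24,fc00:192:168:100::B/64\n'
    '2,DE-CP-1,eth0,10000,dot1q,10,192.168.100.12/24,fc00:192:168:100::C/64\n'
)

# With a real file, the csv documentation recommends open(path, newline='')
reader = csv.DictReader(io.StringIO(csv_text))

# Each row is a mapping keyed by the header line
result_data = [dict(row) for row in reader]

print(result_data[0]['Name'])
print(result_data[1]['VLAN'])
```

Besides saving a few lines, the csv module also copes with quoted values that contain commas, which a plain split(',') cannot handle.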

#2. Converting data into CSV format and writing it into a file

Now we can perform the reverse operation: converting this data structure into a multiline string in CSV format and writing it into a new file on your disk. Here is the Python code to do that:


$ cat fread.py
!
! Some content is omitted for brevity
!
## Converting Python data into CSV format and writing into a file
t0 = []

for line_index, data_entry in enumerate(result_data):
    t1 = []
    t2 = []
    for key, value in data_entry.items():
        if line_index == 0:
            t1.append(key)

        t2.append(value)

    if t1:
        t0.append(','.join(t1))

    t0.append(','.join(t2))

with open('new_file.csv', 'w') as f:
    f.write('\n'.join(t0))

If you have followed the previous explanation, the logic here should be almost self-explanatory:

We create the t0 list, which will contain all the lines we will merge into a CSV-like multiline string.
Using the enumerate() function over the list result_data created in the previous part, we start a for-loop.
Inside the for-loop we have two lists, t1 and t2. t1 accumulates the header keys (only during the first iteration, which is controlled by the if statement), and t2 accumulates the values of each element inside the dictionary, exposed by the items() function within the second for-loop.
Once t1 and t2 are filled, the content of each is merged into a single line using the join() function with the ‘,’ delimiter and added to t0. Ultimately, t0 holds the strings that will become the lines of the future CSV file.
Using the with … as … context manager, we open() a new file called ‘new_file.csv’ for a write (or overwrite) operation, as instructed by the ‘w’ argument.
Into this file we write() the lines out of t0, merged using the join() function with the ‘\n’ (newline) delimiter.
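The same logic can be written a bit more compactly by taking the header straight from the keys of the first dictionary. The sketch below is one possible variant, not a replacement for the script above. It assumes that all dictionaries share the same keys in the same order and that all values are strings; the names csv_lines, csv_text and path_to_new_file, as well as the temporary directory, are assumptions for illustration.

```python
import os
import tempfile

result_data = [
    {'Id': '1', 'Name': 'DE-DB-1', 'VLAN': 'none'},
    {'Id': '2', 'Name': 'DE-CP-1', 'VLAN': '10'},
]

# The header comes from the keys of the first dictionary
csv_lines = [','.join(result_data[0].keys())]

# One CSV line per dictionary, values joined by commas
for data_entry in result_data:
    csv_lines.append(','.join(data_entry.values()))

csv_text = '\n'.join(csv_lines)

# Hypothetical scratch path, so that the sketch does not touch your lab files
path_to_new_file = os.path.join(tempfile.mkdtemp(), 'new_file.csv')
with open(path_to_new_file, 'w') as f:
    f.write(csv_text)

# Reading the file back confirms that it holds exactly what we composed
with open(path_to_new_file, 'r') as f:
    print(f.read() == csv_text)   # True
```

Keep in mind that join() only accepts strings. If your data structure contains integers (for example, a VLAN stored as 10 rather than '10'), convert them with str() first.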

If you execute this Python script now, you won’t see any output in the CLI:


$ python3.8 fread.py
$

However, now we have a new file created, which has the desired content:


$ ls
fread.py  new_file.csv  some_data.csv

$ cat new_file.csv
Id,Name,Interface,Speed,Encapsulation,VLAN,IPv4,IPv6
1,DE-DB-1,eth0,1000,none,none,192.168.100.11/24,fc00:192:168:100::B/64
2,DE-CP-1,eth0,10000,dot1q,10,192.168.100.12/24,fc00:192:168:100::C/64

If you want to learn about Python in the context of the network automation, join our network automation training: live or self-paced.

If you prefer video

If you prefer watching videos to reading articles, that is all good. Subscribe to our YouTube channel, where you will find all our latest videos, including the previous Code EXpress (CEX) episodes.


What else shall you try?

Learning programming is all about trying and testing. To fully understand what we have covered so far, you can try the following additional scenarios:

Instead of having a list of dictionaries, parse the data into a nested dictionary with the key being the value of the first column (Id).

As a result, you should have the following data structure:


{
  '1': {
    'Name': 'ABC',
    'Interface': 'eth0',
    ...
  },
  '2': {
  ...
  }
}
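If you get stuck, here is one possible way to approach it; try it yourself first. The sketch reuses the parsing logic from part #1 on a shortened in-memory sample (three columns only, which is an assumption to keep it brief) and moves the Id value from the inner dictionary to the key of the outer one. The helper variable values is a hypothetical name.

```python
file_content = (
    'Id,Name,Interface\n'
    '1,DE-DB-1,eth0\n'
    '2,DE-CP-1,eth0'
)

data_lines = file_content.split('\n')
headers_keys = data_lines.pop(0).split(',')

result_data = {}

for line_entry in data_lines:
    values = line_entry.split(',')
    temp_dict = {}

    for value_index, value_entry in enumerate(values):
        # Index 0 is the Id column: it becomes the outer key instead
        if value_index > 0:
            temp_dict[headers_keys[value_index]] = value_entry

    result_data[values[0]] = temp_dict

print(result_data)
```

With this structure you can reach a node directly by its Id, for example result_data['2']['Name'], instead of looping over a list.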

Lessons at GitHub

You can find the listing of this Python class, as well as all the previous ones, in our GitHub repository.

Conclusion

We are glad to kick off the 2nd season of the Code EXpress (CEX) blogposts to boost your knowledge of Python. Today you have learned how to work with files and how to parse tables, which are so popular in all companies, especially well-established ones.


P.S.

If you have further questions or you need help with your networks, I’m happy to assist you, just send me a message. Also don’t forget to share the article on your social media, if you like it.

BR,

Anton Karneliuk