
From Python to Go 011. Parsing XML, JSON, And YAML Files.

Hello my friend,

This blog post is probably the first one where we start doing more practical rather than foundational things in Python and Go (Golang). Up till now we have been going through all the possible data types as well as taking small steps in dealing with files. Today we’ll bring it all together and boost it with a practical scenario: parsing data in the most popular data serialization formats used these days.

Which Jobs Require Network Automation Skills?

For quite a while I’ve been trying to hire a good network automation engineer who is capable of writing applications in Python that manage networking. The pay is good, so my understanding was that the candidates’ level would be good as well. Sadly, my understanding is far from reality, as general skills in software development are poor. I have thought multiple times that if people who passed my trainings applied, they could have smashed it (provided they practice). This means there are a lot of jobs out there requiring a good level of automation and software development skills, but they stay unfilled because there are no good candidates. One of them could be yours.

Boost yourself up!

We offer the following training programs in network automation for you:

During these trainings you will learn the following topics:

Moreover, we put all the mentioned technologies in the context of real use cases, which our team has solved and is solving in various projects in service provider, enterprise and data centre networks and systems across Europe and the USA. That gives you the opportunity to ask questions to understand the solutions in depth and to have discussions about your own projects. On top of that, each technology is provided with online demos and labs to master your skills thoroughly. Such a mixture creates a unique learning environment, which all students value so much. Join us and unleash your potential.

Start your automation training today.

What Are We Going To Talk About Today?

Whenever you start developing network and IT infrastructure applications, you have to deal with the question of where you take your inventory from. By inventory in this context we mean the list of systems that you are going to connect to in order to perform certain activities. At a later stage of application development (and in our blog posts), we will introduce how to fetch inventory via API. However, at the beginning of development you would normally deal with local inventory files. In this blog post we are going to discuss:

  1. How to read and parse the most popular serializations existing today: XML, JSON, and YAML?
  2. How to create structured data (data classes in Python and structs in Go (Golang)) and use it in your application?

Explanation

Each popular serialization format used these days is important. It is important for multiple reasons, including but not limited to:

  1. The applications and protocols it is used in today.
  2. Understanding how data could be serialized on wire or on disk.
  3. Associated pros and cons.
  4. Historical purposes.

Without further ado, let’s review those key formats:

XML

Official specification.

XML stands for Extensible Markup Language, and for decades it was one of the most important formats for storing structured data for websites and application-to-application communication. Here is a snippet of XML data:


<?xml version="1.0" encoding="UTF-8"?>
<root>
    <devices>
        <name>leaf-01</name>
        <os>cisco-nxos</os>
        <ip>192.168.1.1</ip>
        <port>22</port>
        <latitude>51.5120898</latitude>
        <longitude>-0.0030987</longitude>
        <active>true</active>
    </devices>
    <devices>
        <name>leaf-02</name>
        <os>arista-eos</os>
        <ip>192.168.1.2</ip>
        <port>830</port>
        <latitude>51.5120427</latitude>
        <longitude>-0.0044585</longitude>
        <active>true</active>
    </devices>
    <devices>
        <name>spine-01</name>
        <ip>192.168.1.11</ip>
        <port>22</port>
        <latitude>51.5112179</latitude>
        <longitude>-0.0048555</longitude>
        <active>false</active>
    </devices>
</root>

The document starts with the (optional) XML declaration “<?xml …>” followed by the content. It requires a single top-level element (called “root” in our example), which then contains all further elements. Assigning a value to a key happens in the format “<key_name>value</key_name>”, where the value can be further nested. This serialization brings us immediately to two major drawbacks:

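  1. You essentially type each key name twice, which almost doubles the overhead on the wire.
  2. With the common approach of parsing the whole document into a tree or dictionary (as we do in this post), you cannot start processing the data until you have fully read the entire message.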

One of the big benefits of XML, though, is that it supports metadata in the form of attributes placed after the key name: “<key_name metadata1="some_metadata_value_1" metadata2="some_metadata_value_2">”. Neither JSON nor YAML has a comparable built-in way to attach additional metadata to a key.
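To make the metadata idea more tangible, here is a minimal sketch of reading such attributes in Python. It uses the built-in “xml.etree.ElementTree” module rather than the third-party package used later in this post, and the attribute names “vendor” and “role” are made up purely for illustration; they are not part of our inventory file.

```python
import xml.etree.ElementTree as ET

# Hypothetical snippet: the "vendor" and "role" attributes are illustrative only
SNIPPET = """<root>
    <devices vendor="cisco" role="leaf">
        <name>leaf-01</name>
    </devices>
</root>"""

root = ET.fromstring(SNIPPET)
device = root.find("devices")

# Attributes (metadata) are exposed via .attrib, the element value via .text
metadata = dict(device.attrib)
name = device.find("name").text

print(metadata)  # {'vendor': 'cisco', 'role': 'leaf'}
print(name)      # leaf-01
```

The value of the element and its metadata are kept apart, so an application can ignore the attributes completely if it does not need them.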

XML is very important from a historical standpoint, as it dominated data communication during the dot-com boom at the beginning of the Internet’s growth, and it is still widely used in web development for storing data. In network automation it is widely used nowadays in the NETCONF protocol.

Join zero-to-hero network automation training to master NETCONF and XML.

JSON

Official specification.

JSON stands for JavaScript Object Notation, and it is now the de-facto standard for application-to-application communication. Like XML, it is “self-descriptive”, meaning you (and your application) can reason about the data received simply by examining key names and associated data. Sample:


{
    "devices": [
        {
            "name": "leaf-01",
            "os": "cisco-nxos",
            "ip": "192.168.1.1",
            "port": 22,
            "latitude": 51.5120898,
            "longitude": -0.0030987,
            "active": true
        },
        {
            "name": "leaf-02",
            "os": "arista-eos",
            "ip": "192.168.1.2",
            "port": 830,
            "latitude": 51.5120427,
            "longitude": -0.0044585,
            "active": true
        },
        {
            "name": "spine-01",
            "ip": "192.168.1.11",
            "port": 22,
            "latitude": 51.5112179,
            "longitude": -0.0048555,
            "active": false
        }
    ]
}

Essentially this is exactly the same data as above, but in JSON serialization, which leads to the following important statement: your serialization may vary depending on the context, but the actual data it contains may not.

The content of the JSON file is stored within curly braces “{}”, which is called an “object”. If you want to signal that your data type is a list, you use square brackets “[]”, whilst the data mapping follows the “"key_name": "value"” format, where the value can be a string, a number (integer or float), a boolean, another object, a list or null. Strings must be wrapped in double quotes, which applies both to key names and to values; at the same time, all other data types MUST NOT be wrapped in double quotes.
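The type rules above can be checked with a few lines of Python and the built-in “json” module. This is only an illustrative sketch: the keys “tags” and “comment” are hypothetical and added just to show a list and a null value.

```python
import json

# Hypothetical one-device document; "tags" and "comment" are illustrative keys
raw = (
    '{"name": "leaf-01", "port": 22, "latitude": 51.5120898, '
    '"active": true, "tags": ["dc1"], "comment": null}'
)

data = json.loads(raw)

# Record which Python type each JSON value was decoded into
types = {key: type(value).__name__ for key, value in data.items()}
print(types)
# {'name': 'str', 'port': 'int', 'latitude': 'float', 'active': 'bool',
#  'tags': 'list', 'comment': 'NoneType'}

# The same number wrapped in double quotes is decoded as a string instead
quoted = json.loads('{"port": "22"}')
print(type(quoted["port"]).__name__)  # str
```

This is why the quoting rule matters: “22” and 22 look alike to a human, but your application receives two different data types.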

When it comes to network automation, JSON is used in REST API and RESTCONF.

Join zero-to-hero network automation training to master RESTCONF/REST API and JSON.

YAML

Official specification.

YAML stands for YAML Ain’t Markup Language, and its primary purpose is to store structured data in a human-friendly format. It is not used for data transfer (at least I’m not aware of any protocol using YAML). Let’s take a look at a snippet with the same inventory as above:


---
devices:
  - name: leaf-01
    os: cisco-nxos
    ip: 192.168.1.1
    port: 22
    latitude: 51.5120898
    longitude: -0.0030987
    active: true
  - name: leaf-02
    os: arista-eos
    ip: 192.168.1.2
    port: 22
    latitude: 51.5120898
    longitude: -0.0030987
    active: true
  - name: spine-01
    ip: 192.168.1.11
    port: 22
    latitude: 51.5112179
    longitude: -0.0048555
    active: false

I think you’d agree that it is less hacky and much easier for us humans to read. There are no angle/square/curly brackets, and there are typically no quote symbols either, unless you explicitly want to encode your value as a string. However, this is the only data serialization we have covered so far where indentation matters. That is logical, as XML uses opening/closing tags to signal where the value ends, whilst JSON uses braces and brackets. As YAML uses none of these, it still needs a way to signal it; hence, indentation.

YAML is actively used in all the applications where we as humans need to prepare input, as it is easier for us to read this data. Ansible, Salt and Kubernetes, just to name a few, take their input in YAML, and their artifacts (Ansible playbooks, Kubernetes manifests, etc.) are created in YAML as well.

Examples

The best way for us to show you how to read and parse data in these serialization formats is to show you the code and to execute it. So we are going to implement the following scenario:

  1. You will have data input: 3 files “inventory.xml“, “inventory.json“, “inventory.yaml“. Each file has the same data, but serialized in a different way.
  2. The application shall detect the serialization based on the filename, then load the content of the file and create structured data using the correct parser.

We’ll use the files provided above as data input.

Python

As usual, we start with the Python code:


"""From Python to Go: Python: 011 - Parsing XML/JSON/YAML files """

# Import
from dataclasses import dataclass, field
from typing import Union, List
import sys
import json
import xmltodict
import yaml


# Dataclass
@dataclass
class InventoryItem:
    """Device -- Inventory Item"""
    name: str
    os: Union[str, None] = None
    ip: Union[str, None] = None
    port: Union[int, None] = None
    latitude: Union[float, None] = None
    longitude: Union[float, None] = None
    active: Union[bool, None] = None


@dataclass
class Inventory:
    """Inventory of all devices"""
    devices: List[InventoryItem] = field(default_factory=list)


# Auxiliary functions
def load_inventory(file: str) -> Inventory:
    """Function to load inventory"""

    # Initialize result
    result = Inventory()

    # Load file
    temp_dict = {}
    with open(file, 'r', encoding="utf-8") as f:
        if file.endswith('.json'):
            temp_dict = json.load(f)
        elif file.endswith('.xml'):
            temp_dict = xmltodict.parse(f.read())["root"]
        elif file.endswith('.yaml') or file.endswith('.yml'):
            temp_dict = yaml.safe_load(f)
        else:
            raise ValueError('Unsupported file format')

    # Populate result
    for item in temp_dict['devices']:
        result.devices.append(InventoryItem(
            name=item['name'],
            os=item.get('os'),
            ip=item.get('ip'),
            port=item.get('port'),
            latitude=item.get('latitude'),
            longitude=item.get('longitude'),
            active=item.get('active')
        ))

    return result


# Main
if __name__ == "__main__":
    # Check that file is provided
    if len(sys.argv) != 2:
        print('Usage: python main.py <file>')
        sys.exit(1)

    # Load inventory
    inventory = load_inventory(sys.argv[1])

    # Print inventory
    print(inventory)

We already covered many of the concepts in previous blog posts, so we won’t repeat them. If you struggle with something, read the previous blog posts and join zero-to-hero network automation training.

Key things:

  1. We use two external packages, “xmltodict” and “pyyaml”, to parse the content of XML and YAML respectively. For XML, there is a built-in package called “xml”, but it is more cumbersome to use, whilst there is no built-in package for YAML processing at all. So, you need to install them first:
    pip install xmltodict pyyaml
  2. Although you install “pyyaml”, it shall be referenced in your code as “yaml”.
  3. Define two data classes to represent your environment: “Inventory” and “InventoryItem”, which is what you will have available for your apps. Where appropriate, use default values. For example, “os: Union[str, None] = None” means that the attribute “os” can be either a string or None, with the default being None. The only special treatment is needed for the field “devices” of the class “Inventory”: it is declared with “field(default_factory=list)”, which is required for fields of list type. This boils down to lists being mutable in Python, so each instance must get its own new list rather than share a single default one.
  4. The key is the function “load_inventory”, which takes the path to a file as an input and returns the inventory object:
    • The file is opened using the context manager “with … as …”, as explained in the previous blog post.
    • Depending on what the file name ends with, which is evaluated using an if-conditional and the method “.endswith()” applied to a string, the corresponding parser is used:
      • “json.load()” to process JSON serialization.
      • “xmltodict.parse()” to process XML serialization.
      • “yaml.safe_load()” to process YAML serialization.
    • All these functions return a Python dictionary. We could have stopped here, as we already know how to work with dictionaries/maps. But we can do better than this, so we progress further with populating the inventory classes:
      • In essence, we add elements to a list, with each element being a data class.
      • The thing you shall pay attention to here is the usage of the “.get()” method applied to a dictionary. This method first checks if the key you ask for exists in the dictionary and then returns either its value or the default value, which is “None”. That is different from directly calling “dict["key"]”, which will raise an exception if the key you ask for doesn’t exist. The logic with the get-method is better for use cases where you question the correctness of the input data.
  5. Finally, in the execution part we use something new: script call arguments, which are contained in the “argv” list of the “sys” package. This allows us to dynamically pass different files without hard-coding them in the application. In the next blog post we will talk more about CLI arguments.
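Two of the points above, the default factory for list fields and the difference between “.get()” and a direct key lookup, can be seen in isolation in the following small sketch. The classes “Item” and “Box” are hypothetical, trimmed-down stand-ins for “InventoryItem” and “Inventory”, used only to keep the example short.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Item:
    """Hypothetical, trimmed-down stand-in for InventoryItem"""
    name: str
    os: Optional[str] = None


@dataclass
class Box:
    """Hypothetical stand-in for Inventory"""
    # default_factory builds a new list for every instance
    devices: List[Item] = field(default_factory=list)


# Each instance gets its own list, so appending to one does not affect the other
first, second = Box(), Box()
first.devices.append(Item(name="leaf-01"))
print(len(first.devices), len(second.devices))  # 1 0

# A record without the "os" key, like spine-01 in our inventory
raw = {"name": "spine-01"}
print(raw.get("os"))             # None
print(raw.get("os", "unknown"))  # unknown

# Direct lookup of a missing key raises KeyError instead
try:
    raw["os"]
except KeyError as exc:
    print(f"missing key: {exc}")  # missing key: 'os'
```

Note that writing “devices: List[Item] = []” in a data class is rejected by Python with a ValueError at class definition time, which is why the default factory is needed.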

Let’s execute this script:


$ python main.py ../data/inventory.xml
Inventory(devices=[InventoryItem(name='leaf-01', os='cisco-nxos', ip='192.168.1.1', port='22', latitude='51.5120898', longitude='-0.0030987', active='true'), InventoryItem(name='leaf-02', os='arista-eos', ip='192.168.1.2', port='830', latitude='51.5120427', longitude='-0.0044585', active='true'), InventoryItem(name='spine-01', os=None, ip='192.168.1.11', port='22', latitude='51.5112179', longitude='-0.0048555', active='false')])


$ python main.py ../data/inventory.json
Inventory(devices=[InventoryItem(name='leaf-01', os='cisco-nxos', ip='192.168.1.1', port=22, latitude=51.5120898, longitude=-0.0030987, active=True), InventoryItem(name='leaf-02', os='arista-eos', ip='192.168.1.2', port=830, latitude=51.5120427, longitude=-0.0044585, active=True), InventoryItem(name='spine-01', os=None, ip='192.168.1.11', port=22, latitude=51.5112179, longitude=-0.0048555, active=False)])


$ python main.py ../data/inventory.yaml
Inventory(devices=[InventoryItem(name='leaf-01', os='cisco-nxos', ip='192.168.1.1', port=22, latitude=51.5120898, longitude=-0.0030987, active=True), InventoryItem(name='leaf-02', os='arista-eos', ip='192.168.1.2', port=22, latitude=51.5120898, longitude=-0.0030987, active=True), InventoryItem(name='spine-01', os=None, ip='192.168.1.11', port=22, latitude=51.5112179, longitude=-0.0048555, active=False)])

As you see, in all 3 cases you’ve got a result of the same structure, which confirms the point that regardless of the serialization, your data shall be the same.

Well, to be brutally honest, the result is only ALMOST identical for XML, whilst the data types are truly identical for JSON and YAML. If you paid close attention, you would see that all the data from XML is read as strings, including “port”, which shall be an integer, “active”, which shall be a boolean, and others. This is because XML encoding doesn’t differentiate between strings and other data types, so the parser defaults to string, whilst the JSON and YAML parsers by default map each value to the most appropriate data type and use a string only if nothing better fits.
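If you need proper data types after parsing XML in Python, one possible approach is to convert the values yourself before building the data class. The sketch below is an assumption on our side and not part of the script above: the helper “coerce_item” is a hypothetical name, and the list of fields to convert simply mirrors the fields of “InventoryItem”.

```python
from typing import Any, Dict


def coerce_item(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: convert string values from XML into typed values"""
    converters = {
        "port": int,
        "latitude": float,
        "longitude": float,
        # bool("false") is True in Python, so compare the text explicitly
        "active": lambda value: str(value).strip().lower() == "true",
    }

    # Work on a copy so the parsed input stays untouched
    result = dict(raw)
    for key, convert in converters.items():
        if result.get(key) is not None:
            result[key] = convert(result[key])
    return result


# Values as they come out of the XML file for spine-01: everything is a string
xml_like = {
    "name": "spine-01",
    "ip": "192.168.1.11",
    "port": "22",
    "latitude": "51.5112179",
    "longitude": "-0.0048555",
    "active": "false",
}

typed = coerce_item(xml_like)
print(typed)
```

Because the converters also accept values that already have the right type, the same helper can be applied to the dictionaries produced from JSON and YAML without changing them.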

Go (Golang)

Now it is time to implement the same scenario in Go (Golang):


/* From Python to Go: Go (Golang): 011 - Parsing XML/JSON/YAML files */
package main

// Import
import (
    "encoding/json"
    "encoding/xml"
    "fmt"
    "os"
    "regexp"

    "gopkg.in/yaml.v3"
)

// Types
type InventoryItem struct {
    /* Device -- Inventory Item */
    Name      string  `xml:"name" json:"name" yaml:"name"`
    OS        string  `xml:"os" json:"os" yaml:"os"`
    IP        string  `xml:"ip" json:"ip" yaml:"ip"`
    Port      int64   `xml:"port" json:"port" yaml:"port"`
    Latitude  float64 `xml:"latitude" json:"latitude" yaml:"latitude"`
    Longitude float64 `xml:"longitude" json:"longitude" yaml:"longitude"`
    Active    bool    `xml:"active" json:"active" yaml:"active"`
}
type Inventory struct {
    /* Inventory of all devices */
    Devices []InventoryItem `xml:"devices" json:"devices" yaml:"devices"`
}

// Aux functions
func loadInventory(p string) Inventory {
    /* Function to load inventory */

    // Load file
    bs, err := os.ReadFile(p)
    if err != nil {
        fmt.Printf("Cannot open file %v: %v\n", p, err)
        os.Exit(1)
    }

    // Define result
    result := Inventory{}

    // Find importer
    reXML := regexp.MustCompile(`^.+\.xml$`)
    reJSON := regexp.MustCompile(`^.+\.json$`)
    reYAML := regexp.MustCompile(`^.+\.ya?ml$`)

    // XML parsing
    if reXML.MatchString(p) {
        err := xml.Unmarshal(bs, &result)
        if err != nil {
            fmt.Printf("Cannot parse XML data: %v\n", err)
        }
        // JSON parsing
    } else if reJSON.MatchString(p) {
        err := json.Unmarshal(bs, &result)
        if err != nil {
            fmt.Printf("Cannot parse JSON data: %v\n", err)
        }
        // YAML parsing
    } else if reYAML.MatchString(p) {
        err := yaml.Unmarshal(bs, &result)
        if err != nil {
            fmt.Printf("Cannot parse YAML data: %v\n", err)
        }
    } else {
        fmt.Printf("Unknown file format: %v\n", p)
    }

    // Return result
    return result
}

// Main
func main() {
    /* Main business logic */

    // Check that file is provided
    if len(os.Args) != 2 {
        fmt.Println("Usage: ./app <file>")
        os.Exit(1)
    }

    // Load inventory
    inv := loadInventory(os.Args[1])

    // Print inventory
    fmt.Printf("%+v\n", inv)
}

The same disclaimer as above applies: read the previous blog posts for details on what is not explained below.

Break down:

  1. As in Python, we need to install a 3rd-party package to parse YAML data, as there is no built-in YAML package in Go (Golang). Do it using the following command:
    go get gopkg.in/yaml.v3
  2. In Go (Golang), you define which field of your input serialization maps to which struct field. That is achieved by adding tags to your struct fields. For example, “Name string `xml:"name" json:"name" yaml:"name"`” means that the struct field “Name” will be read out of the field “name” in XML, JSON and YAML.
    • It is important to emphasize that the struct field name MUST be capitalized, meaning it starts with a capital letter. Such fields are called “exported” in Go (Golang), and only exported fields are visible to the parsing packages.
  3. The function “loadInventory()” is what generates the inventory struct for you:
    • Read the content of the file into a byte slice, as explained in the previous blog post.
    • Create a variable “result” to parse the data into.
    • Create regular expressions to match the file name endings.
    • Detect the serialization based on the file name and then perform parsing using the “Unmarshal()” function. Unmarshal is a conventional term, hence a function with the same signature exists in all three parsing packages we use here:
      • It requires you to provide 2 arguments as input: a byte slice, which we get from reading the file, and a pointer to a variable where the data will be stored.
      • It returns only an error, in case one arises, and as a side effect it populates the variable behind the pointer with the data.
    • Finally, the result is returned to the main execution body.
  4. Within the main function, the path of the file to be parsed is read from CLI arguments, same as in Python.

And there is the result of the tool’s execution:


$ go run . ../data/inventory.xml
{Devices:[{Name:leaf-01 OS:cisco-nxos IP:192.168.1.1 Port:22 Latitude:51.5120898 Longitude:-0.0030987 Active:true} {Name:leaf-02 OS:arista-eos IP:192.168.1.2 Port:830 Latitude:51.5120427 Longitude:-0.0044585 Active:true} {Name:spine-01 OS: IP:192.168.1.11 Port:22 Latitude:51.5112179 Longitude:-0.0048555 Active:false}]}


$ go run . ../data/inventory.json
{Devices:[{Name:leaf-01 OS:cisco-nxos IP:192.168.1.1 Port:22 Latitude:51.5120898 Longitude:-0.0030987 Active:true} {Name:leaf-02 OS:arista-eos IP:192.168.1.2 Port:830 Latitude:51.5120427 Longitude:-0.0044585 Active:true} {Name:spine-01 OS: IP:192.168.1.11 Port:22 Latitude:51.5112179 Longitude:-0.0048555 Active:false}]}


$ go run . ../data/inventory.yaml
{Devices:[{Name:leaf-01 OS:cisco-nxos IP:192.168.1.1 Port:22 Latitude:51.5120898 Longitude:-0.0030987 Active:true} {Name:leaf-02 OS:arista-eos IP:192.168.1.2 Port:22 Latitude:51.5120898 Longitude:-0.0030987 Active:true} {Name:spine-01 OS: IP:192.168.1.11 Port:22 Latitude:51.5112179 Longitude:-0.0048555 Active:false}]}

Here is an important distinction between Python and Go (Golang): in Go (Golang), the XML parser actually converts each value to the data type of the corresponding struct field. Therefore, the data types are the same in all three results, regardless of the serialization.

Lessons in GitHub

You can find the final working versions of the files from this blog at our GitHub page.

Conclusion


In today’s blog we’ve covered the basics of dealing with the most popular data serialization formats: XML, JSON, and YAML. As you can see, it is relatively easy to parse data using either built-in or already developed 3rd-party packages, allowing you to start developing your network and IT infrastructure automation standing on the shoulders of giants. Take care and good bye.

P.S.

If you have further questions or you need help with your networks, I’m happy to assist you, just send me a message. Also don’t forget to share the article on your social media if you like it.

BR,

Anton Karneliuk 
