
HS. Part 1. Emulating a hyper-scale datacentre with Microsoft Azure SONiC as Docker containers (SONiC-P4).

Hello my friend,

In the previous article from the networking series we started the discussion about SONiC (Software for Open Networking in the Cloud), which is the network operating system behind the Microsoft Azure cloud. Today we continue this discussion from a slightly different angle: we will emulate a whole data centre infrastructure end-to-end with leafs, spines and servers.


No part of this blogpost could be reproduced, stored in a
retrieval system, or transmitted in any form or by any
means, electronic, mechanical or photocopying, recording,
or otherwise, for commercial purposes without the
prior permission of the author.

Network automation training – boost your career

Don’t wait to be kicked out of the IT business. Join our network automation training to secure your job in the future. Come to the NetDevOps side.

How does the training differ from this blog post series? Here you get the basics and learn some programming concepts in general, whereas in the training you get a comprehensive set of knowledge with detailed examples of how to use Python for network and IT automation. You need both.

Thanks

Big thanks to my colleague Michael Salo, who shared useful insights with me on namespace networking in Linux.

Brief description

Previously we ran SONiC on the Mellanox SN2010 switch, which was provided to me by the Mellanox team for some reviews. I have since returned it, so I started looking for ways to run Microsoft Azure SONiC in a virtual environment. As I had previously dealt with virtual network functions such as Cisco IOS XRv, Nokia VSR, Cumulus Linux, Arista vEOS and many more, I tried to find Azure SONiC as a VM. I failed to do that, but I found something even more interesting and intriguing: SONiC as a Docker container. Quoting the official webpage:

SONiC-P4 is a software switch that runs on the P4-emulated SAI behavioural model software switch ASIC. It uses the sai_bm.p4 to program the P4-emulated switch ASIC to emulate the data plane behaviour. On top of that, it runs the real SONiC network stack.

(c) https://github.com/Azure/SONiC/wiki/SONiC-P4-Software-Switch

As far as I can see, this P4 software switch was also created by Mellanox, though I’m not 100% sure about that. The software switch is built using the most widely used framework in the network disaggregation and white-box industry: the SAI API. SAI stands for Switch Abstraction Interface and defines the set of rules for how the network operating system interacts with the underlying hardware, such as an ASIC or NPU.

From the testing point of view, Microsoft Azure SONiC running inside a container with the emulated P4 switch is managed almost in the same way as a physical switch running SONiC, such as the Mellanox SN2010.

What are we going to test?

The following points are covered in this article:

- the configuration files for Microsoft Azure SONiC;
- the logic behind the port mapping in the Docker container with the Microsoft SONiC P4 switch;
- launching the Docker containers with the Microsoft SONiC P4 switches and Ubuntu hosts;
- verification of the resulting topology.

Software version

The following software components are used in this lab. 

Management host:

The Data Centre Fabric:

CNF stands for containerised network function.

Topology

The physical topology has changed significantly since the previous networking labs, as this is the first time we build a containerised network. Hence, all of the network functions and hosts are containerised:


+-------------------------------------------------------------------------+
|                                                                         |
|     +-----------+                                    +-----------+      |
|     |           | sw_port1                  sw_port0 |           |      |
|     |  spine11  +-------+                  +---------+  spine12  |      |
|     |           |       |                  |         |           |      |
|     +-----+-----+       |                  |         +-----+-----+      |
|  sw_port0 |             |                  |      sw_port1 |            |
|           |             |                  |               |            |
|       +---+---+     +---+---+          +---+---+       +---+---+        |
|       |s11_l11|     |s11_l12|          |s12_l11|       |s12_l12|        |
|       +---+---+     +---+---+          +---+---+       +---+---+        |
|           |             |                  |               |            |
|  sw_port5 |         +----------------------+      sw_port6 |            |
|     +-----+-----+   |   |                            +-----+-----+      |
|     |           |   |   |                            |           |      |
|     |  leaf11   +---+   +----------------------------+  leaf12   |      |
|     |           | sw_port6                  sw_port5 |           |      |
|     +-----+-----+                                    +-----+-----+      |
|  sw_port0 |                                       sw_port0 |            |
|           |                                                |            |
|       +---+---+                                        +---+---+        |
|       |l11_h11|                                        |l12_h12|        |
|       +---+---+                                        +---+---+        |
|           |                                                |            |
|      eth1 |                                           eth1 |            |
|     +-----+-----+                                    +-----+-----+      |
|     |           |                                    |           |      |
|     |  host11   |                                    |  host12   |      |
|     |           |                                    |           |      |
|     +-----------+                                    +-----------+      |
|                                                                         |
|                                                                         |
|            (c)2020, karneliuk.com // DC POD with 2x leaf, 2x spine      |
|               and 2x hosts. Leaf/Spine run MS Azure SONiC               |
|                                                                         |
+-------------------------------------------------------------------------+

If you want to learn more about server lab setup, refer to the corresponding article.

As said earlier, all the elements of the lab are running as Docker containers. However, each of them has a full network stack (rather complicated, with multiple namespaces, to be honest). Let’s dig into the details.

Topology emulation build

I started by downloading the official repository of the SONiC P4 software switch and reviewing its content.

#1. Configuration files for Microsoft Azure SONiC

The Ubuntu hosts are pretty simple, so we don’t focus on them too much. On the other hand, each SONiC switch has configuration files stored locally on your host, which are mapped into the container. The following files are available per virtual instance:


sonic_switch
  +--etc
  |  +--config_db
  |  |  +--vlan_config.json
  |  +--quagga
  |  |  +--bgpd.conf
  |  |  +--daemons
  |  |  +--zebra.conf
  |  +--swss
  |     +--config.d
  |        +--00-copp.config.json
  +--scripts
     +--startup.sh

We can review all of the files. However, from our lab perspective we only need to modify the following three files: vlan_config.json, bgpd.conf and startup.sh.

We need to change the content of these files based on the topology shared above. Using leaf11 as an example, you should have:


$ cat infrastructure/leaf11/etc/config_db/vlan_config.json
 {
    "VLAN": {
        "Vlan130": {
            "members": [
                "Ethernet0"
            ],
            "vlanid": "130"
        },
        "Vlan131": {
            "members": [
                "Ethernet5"
            ],
            "vlanid": "131"
        },
        "Vlan132": {
            "members": [
                "Ethernet6"
            ],
            "vlanid": "132"
        }
    },
    "VLAN_MEMBER": {
        "Vlan130|Ethernet0": {
            "tagging_mode": "untagged"
        },
        "Vlan131|Ethernet5": {
            "tagging_mode": "untagged"
        },
        "Vlan132|Ethernet6": {
            "tagging_mode": "untagged"
        }
    },
    "VLAN_INTERFACE": {
        "Vlan130|192.168.1.1/24": {},
        "Vlan131|10.0.0.1/31": {},
        "Vlan132|10.0.0.5/31": {}
    }
}

And this one:


$ cat infrastructure/leaf11/etc/quagga/bgpd.conf
hostname bgpd
password zebra
enable password zebra
log file /var/log/quagga/bgpd.log
!
router bgp 65111
  bgp router-id 10.1.1.1
  bgp bestpath as-path multipath-relax
  network 192.168.1.0 mask 255.255.255.0
  neighbor 10.0.0.0 remote-as 65101
  neighbor 10.0.0.0 timers 1 3
  neighbor 10.0.0.0 send-community
  neighbor 10.0.0.0 allowas-in
  neighbor 10.0.0.4 remote-as 65102
  neighbor 10.0.0.4 timers 1 3
  neighbor 10.0.0.4 send-community
  neighbor 10.0.0.4 allowas-in
  maximum-paths 64
!
access-list all permit any

And also this one:


$ cat infrastructure/leaf11/scripts/startup.sh
[ -d /etc/sonic ] || mkdir -p /etc/sonic

SYSTEM_MAC_ADDRESS=00:dc:5e:01:01:01
ip link add eth0 addr $SYSTEM_MAC_ADDRESS type dummy

if [ -f /etc/sonic/config_db.json ]; then
    sonic-cfggen -j /etc/sonic/config_db.json -j /sonic/scripts/vlan_config.json --print-data > /tmp/config_db.json
    mv /tmp/config_db.json /etc/sonic/config_db.json
else
    sonic-cfggen -j /sonic/etc/config_db/vlan_config.json --print-data > /etc/sonic/config_db.json
fi

#chmod +x /usr/bin/config_bm.sh # TODO: remove this line
cp -f /sonic/etc/swss/config.d/00-copp.config.json /etc/swss/config.d/default_config.json
cp -rf /sonic/etc/quagga /etc/
ip netns exec sw_net ip link set dev sw_port0 addr $SYSTEM_MAC_ADDRESS
ip netns exec sw_net ip link set dev sw_port5 addr $SYSTEM_MAC_ADDRESS
ip netns exec sw_net ip link set dev sw_port6 addr $SYSTEM_MAC_ADDRESS
supervisord

As I’m describing my own experience, I don’t claim to be one hundred percent objective. First of all, there is little to no documentation about the containerised Microsoft SONiC available. Therefore, I had to follow a “plug-and-pray” approach, going forward almost blindly.

In my experience, I didn’t manage to get purely routed physical interfaces (e.g. Ethernet0) working. Regardless of what I tried, they stayed in the down state all the time. That’s why I stuck to the concept that each physical interface (e.g. Ethernet0) carries a single untagged VLAN (e.g. Vlan130), and each VLAN has an SVI with an IP address assigned (e.g. Vlan130 with IP 192.168.1.1/24), as shown in the vlan_config.json above.
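A typo in any of these JSON files only surfaces later, inside the container, so it may be worth validating them before the lab is launched. Here is a minimal sketch of such a check (my own addition, not part of the original workflow, and assuming the spines follow the same directory layout as the leaves), using the standard Python JSON tool:

# Validate the edited VLAN configuration of every switch before it is
# mounted into the containers; any JSON syntax error is reported here.
for switch in leaf11 leaf12 spine11 spine12; do
    echo "Checking ${switch}"
    python -m json.tool "infrastructure/${switch}/etc/config_db/vlan_config.json" > /dev/null \
        && echo "  vlan_config.json: OK" \
        || echo "  vlan_config.json: INVALID"
done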

But that is only one of the tricks.

#2. Logic behind the port mapping in the Docker container with Microsoft SONiC P4 switch

The next trick, which cost me quite a lot of time, is the way the container with SONiC connects to the outside world. But let’s take it step by step.

The first challenge is that we launch the Docker container without any networking, as we want to bypass the standard Docker bridge with NAT. Therefore, the container with Microsoft Azure SONiC is created as follows:


sudo docker run --net=none --privileged --entrypoint /bin/bash --name leaf11 -it -d -v $PWD/infrastructure/leaf11:/sonic docker-sonic-p4:latest

The whole Bash file to build the lab is provided later in this blogpost.

This command launches the Docker container, mapping the directory with the configuration files mentioned in the previous point into it. As you can see, we launch it with networking disabled (--net=none). Therefore, we will need to add the network interfaces later.
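As a quick sanity check (my own addition), you can confirm that such a container really starts without any networking: with --net=none only the loopback interface should be present until we attach the veth interfaces below.

# Right after launch the container should show only the loopback interface;
# the fabric-facing interfaces are attached manually in the next steps.
sudo docker exec -it leaf11 ip link show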

The Docker container runs as a process inside Linux and has an associated process ID (PID). We need this PID in order to be able to move network interfaces into the container’s network namespace:


LEAF11=$(sudo docker inspect --format '{{ .State.Pid }}' leaf11)

The PID is stored in a variable named after the hostname of the device. Now we are able to configure the networking between the P4 software switches running Microsoft Azure SONiC and the Ubuntu-based Linux hosts. We connect them to each other using the built-in Linux network bridges, which is quite different from the original SONiC P4 repo:


sudo brctl addbr s11_l11
sudo ip link set s11_l11 up
sudo ip link add sw_port0 type veth
sudo ip link set veth0 up
sudo brctl addif s11_l11 veth0
sudo ip link set netns ${SPINE11} dev sw_port0
sudo ip link add sw_port5 type veth
sudo ip link set veth1 up
sudo brctl addif s11_l11 veth1
sudo ip link set netns ${LEAF11} dev sw_port5
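On the lab host, the result of this wiring can be quickly verified (a hedged check of my own; the host-side vethX names depend on the order in which the pairs were created on your machine):

# The bridge should be up and list the two host-side veth ends as members,
# while the sw_portX ends have already been moved into the containers.
sudo brctl show s11_l11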

Using the veth interface type, we can place interfaces inside the container. The naming convention for the interfaces we push into the SONiC container is sw_portX, where X is the interface sequence number. The mapping between these interfaces and the SONiC ones is the following:

“physical” interface    “SONiC” interface
sw_port0                Ethernet0
sw_port5                Ethernet5
sw_port6                Ethernet6

Within the Microsoft SONiC P4 switch we don’t interact directly with these “physical” interfaces. Therefore, we need to do some additional work inside the launched Docker container to make these interfaces usable:


sudo docker exec -d leaf11 ip netns add sw_net
sudo docker exec -d leaf11 ip link set dev sw_port0 netns sw_net
sudo docker exec -d leaf11 ip netns exec sw_net sysctl net.ipv6.conf.sw_port0.disable_ipv6=1
sudo docker exec -d leaf11 ip netns exec sw_net ip link set sw_port0 up
sudo docker exec -d leaf11 ip link set dev sw_port5 netns sw_net
sudo docker exec -d leaf11 ip netns exec sw_net sysctl net.ipv6.conf.sw_port5.disable_ipv6=1
sudo docker exec -d leaf11 ip netns exec sw_net ip link set sw_port5 up
sudo docker exec -d leaf11 ip link set dev sw_port6 netns sw_net
sudo docker exec -d leaf11 ip netns exec sw_net sysctl net.ipv6.conf.sw_port6.disable_ipv6=1
sudo docker exec -d leaf11 ip netns exec sw_net ip link set sw_port6 up

The snippet above creates the sw_net namespace inside the Docker container, moves the interfaces with the proper names into it and brings them up. In a nutshell, this namespace is what the Microsoft SONiC P4 switch uses to communicate with the outside world, whereas for the configuration of SONiC itself we deal with the container’s default namespace.
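To double-check that the interfaces have indeed landed in the sw_net namespace, you can list them from the host (the same kind of check is used later in the verification section):

# List the namespaces inside the leaf11 container and the interfaces
# that were moved into sw_net.
sudo docker exec -it leaf11 ip netns list
sudo docker exec -it leaf11 ip netns exec sw_net ip link show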

I might be wrong, but my understanding is that this is the way the P4 software switch works in general (link to Mellanox P4). I haven’t managed to get to the bottom of this complexity, but I have managed to get it working.

The last thing we need to do after the container is launched is to run the startup script, which we have mapped into it:


sudo docker exec -d leaf11 sh /sonic/scripts/startup.sh

This script, as shared in the previous point, makes sure that the SONiC configuration is applied and the MAC addresses are set properly.
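Two quick checks (my own, hedged addition) can confirm the script did its job: the rendered configuration should now exist under /etc/sonic, and the dummy eth0 interface should carry the system MAC address set in the script:

# The configuration rendered by sonic-cfggen from vlan_config.json:
sudo docker exec -it leaf11 cat /etc/sonic/config_db.json
# The dummy eth0 interface with the system MAC 00:dc:5e:01:01:01:
sudo docker exec -it leaf11 ip link show eth0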

The overall launch file contains all the necessary steps: the Docker container details, the Linux bridges to interconnect the devices and the configuration of the respective network namespaces in the containers with Microsoft Azure SONiC. I took the original one from the Microsoft Azure SONiC P4 software switch GitHub repo and modified it as follows:


$ cat build.sh
#!/bin/bash

echo "Launching Docker: starting"
sudo systemctl start docker.service
echo "Launching Docker: done"

echo "Creating the containers: starting"
sudo docker run --net=none --privileged --entrypoint /bin/bash --name leaf11 -it -d -v $PWD/infrastructure/leaf11:/sonic docker-sonic-p4:latest
sudo docker run --net=none --privileged --entrypoint /bin/bash --name leaf12 -it -d -v $PWD/infrastructure/leaf12:/sonic docker-sonic-p4:latest
sudo docker run --net=none --privileged --entrypoint /bin/bash --name spine11 -it -d -v $PWD/infrastructure/spine11:/sonic docker-sonic-p4:latest
sudo docker run --net=none --privileged --entrypoint /bin/bash --name spine12 -it -d -v $PWD/infrastructure/spine12:/sonic docker-sonic-p4:latest
sudo docker run --net=none --privileged --entrypoint /bin/bash --name host11 -it -d ubuntu:14.04
sudo docker run --net=none --privileged --entrypoint /bin/bash --name host12 -it -d ubuntu:14.04

LEAF11=$(sudo docker inspect --format '{{ .State.Pid }}' leaf11)
LEAF12=$(sudo docker inspect --format '{{ .State.Pid }}' leaf12)
SPINE11=$(sudo docker inspect --format '{{ .State.Pid }}' spine11)
SPINE12=$(sudo docker inspect --format '{{ .State.Pid }}' spine12)
HOST11=$(sudo docker inspect --format '{{ .State.Pid }}' host11)
HOST12=$(sudo docker inspect --format '{{ .State.Pid }}' host12)
echo "Creating the containers: done"

echo "Creating the network connectivity: starting"
sudo brctl addbr s11_l11
sudo ip link set s11_l11 up
sudo ip link add sw_port0 type veth
sudo ip link set veth0 up
sudo brctl addif s11_l11 veth0
sudo ip link set netns ${SPINE11} dev sw_port0
sudo ip link add sw_port5 type veth
sudo ip link set veth1 up
sudo brctl addif s11_l11 veth1
sudo ip link set netns ${LEAF11} dev sw_port5

sudo brctl addbr s12_l11
sudo ip link set s12_l11 up
sudo ip link add sw_port0 type veth
sudo ip link set veth2 up
sudo brctl addif s12_l11 veth2
sudo ip link set netns ${SPINE12} dev sw_port0
sudo ip link add sw_port6 type veth
sudo ip link set veth3 up
sudo brctl addif s12_l11 veth3
sudo ip link set netns ${LEAF11} dev sw_port6

sudo brctl addbr s11_l12
sudo ip link set s11_l12 up
sudo ip link add sw_port1 type veth
sudo ip link set veth4 up
sudo brctl addif s11_l12 veth4
sudo ip link set netns ${SPINE11} dev sw_port1
sudo ip link add sw_port5 type veth
sudo ip link set veth5 up
sudo brctl addif s11_l12 veth5
sudo ip link set netns ${LEAF12} dev sw_port5

sudo brctl addbr s12_l12
sudo ip link set s12_l12 up
sudo ip link add sw_port1 type veth
sudo ip link set veth6 up
sudo brctl addif s12_l12 veth6
sudo ip link set netns ${SPINE12} dev sw_port1
sudo ip link add sw_port6 type veth
sudo ip link set veth7 up
sudo brctl addif s12_l12 veth7
sudo ip link set netns ${LEAF12} dev sw_port6

sudo brctl addbr host11_leaf11
sudo ip link set host11_leaf11 up
sudo ip link add sw_port0 type veth
sudo ip link set veth8 up
sudo brctl addif host11_leaf11 veth8
sudo ip link set netns ${LEAF11} dev sw_port0
sudo ip link add eth1 type veth
sudo ip link set veth9 up
sudo brctl addif host11_leaf11 veth9
sudo ip link set netns ${HOST11} dev eth1

sudo brctl addbr host12_leaf12
sudo ip link set host12_leaf12 up
sudo ip link add sw_port0 type veth
sudo ip link set veth10 up
sudo brctl addif host12_leaf12 veth10
sudo ip link set netns ${LEAF12} dev sw_port0
sudo ip link add eth1 type veth
sudo ip link set veth11 up
sudo brctl addif host12_leaf12 veth11
sudo ip link set netns ${HOST12} dev eth1
echo "Creating the network connectivity: done"

echo "Configuring hosts: starting"
sudo docker exec -d host11 sysctl net.ipv6.conf.eth0.disable_ipv6=1
sudo docker exec -d host11 sysctl net.ipv6.conf.eth1.disable_ipv6=1
sudo docker exec -d host12 sysctl net.ipv6.conf.eth0.disable_ipv6=1
sudo docker exec -d host12 sysctl net.ipv6.conf.eth1.disable_ipv6=1

sudo docker exec -d host11 ifconfig eth1 192.168.1.2/24 mtu 1400
sudo docker exec -d host11 ip route replace default via 192.168.1.1
sudo docker exec -d host12 ifconfig eth1 192.168.2.2/24 mtu 1400
sudo docker exec -d host12 ip route replace default via 192.168.2.1
echo "Configuring hosts: done"

echo "Configuring switches: starting"
sudo docker exec -d leaf11 ip netns add sw_net
sudo docker exec -d leaf11 ip link set dev sw_port0 netns sw_net
sudo docker exec -d leaf11 ip netns exec sw_net sysctl net.ipv6.conf.sw_port0.disable_ipv6=1
sudo docker exec -d leaf11 ip netns exec sw_net ip link set sw_port0 up
sudo docker exec -d leaf11 ip link set dev sw_port5 netns sw_net
sudo docker exec -d leaf11 ip netns exec sw_net sysctl net.ipv6.conf.sw_port5.disable_ipv6=1
sudo docker exec -d leaf11 ip netns exec sw_net ip link set sw_port5 up
sudo docker exec -d leaf11 ip link set dev sw_port6 netns sw_net
sudo docker exec -d leaf11 ip netns exec sw_net sysctl net.ipv6.conf.sw_port6.disable_ipv6=1
sudo docker exec -d leaf11 ip netns exec sw_net ip link set sw_port6 up

sudo docker exec -d leaf12 ip netns add sw_net
sudo docker exec -d leaf12 ip link set dev sw_port0 netns sw_net
sudo docker exec -d leaf12 ip netns exec sw_net sysctl net.ipv6.conf.sw_port0.disable_ipv6=1
sudo docker exec -d leaf12 ip netns exec sw_net ip link set sw_port0 up
sudo docker exec -d leaf12 ip link set dev sw_port5 netns sw_net
sudo docker exec -d leaf12 ip netns exec sw_net sysctl net.ipv6.conf.sw_port5.disable_ipv6=1
sudo docker exec -d leaf12 ip netns exec sw_net ip link set sw_port5 up
sudo docker exec -d leaf12 ip link set dev sw_port6 netns sw_net
sudo docker exec -d leaf12 ip netns exec sw_net sysctl net.ipv6.conf.sw_port6.disable_ipv6=1
sudo docker exec -d leaf12 ip netns exec sw_net ip link set sw_port6 up

sudo docker exec -d spine11 ip netns add sw_net
sudo docker exec -d spine11 ip link set dev sw_port0 netns sw_net
sudo docker exec -d spine11 ip netns exec sw_net sysctl net.ipv6.conf.sw_port0.disable_ipv6=1
sudo docker exec -d spine11 ip netns exec sw_net ip link set sw_port0 up
sudo docker exec -d spine11 ip link set dev sw_port1 netns sw_net
sudo docker exec -d spine11 ip netns exec sw_net sysctl net.ipv6.conf.sw_port1.disable_ipv6=1
sudo docker exec -d spine11 ip netns exec sw_net ip link set sw_port1 up

sudo docker exec -d spine12 ip netns add sw_net
sudo docker exec -d spine12 ip link set dev sw_port0 netns sw_net
sudo docker exec -d spine12 ip netns exec sw_net sysctl net.ipv6.conf.sw_port0.disable_ipv6=1
sudo docker exec -d spine12 ip netns exec sw_net ip link set sw_port0 up
sudo docker exec -d spine12 ip link set dev sw_port1 netns sw_net
sudo docker exec -d spine12 ip netns exec sw_net sysctl net.ipv6.conf.sw_port1.disable_ipv6=1
sudo docker exec -d spine12 ip netns exec sw_net ip link set sw_port1 up
echo "Configuring switches: done"

echo "Booting switches, please wait ~1 minute for switches to load: starting"
sudo docker exec -d leaf11 sh /sonic/scripts/startup.sh
sudo docker exec -d leaf12 sh /sonic/scripts/startup.sh
sudo docker exec -d spine11 sh /sonic/scripts/startup.sh
sudo docker exec -d spine12 sh /sonic/scripts/startup.sh
sleep 70
echo "Booting switches, please wait ~1 minute for switches to load: done"

echo "Fixing iptables firewall: starting"
sudo iptables -I FORWARD 1 -s 10.0.0.0/24 -d 10.0.0.0/24 -j ACCEPT
sudo iptables -I FORWARD 1 -s 192.168.0.0/16 -d 192.168.0.0/16 -j ACCEPT
echo "Fixing iptables firewall: done"

Launching this Bash script, we can bring the topology up.
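The original repository doesn’t include a teardown script, so here is a possible counterpart I use to reset the host (an assumption on my side, simply mirroring what build.sh creates): removing the containers destroys their network namespaces together with the veth ends inside them, and the Linux bridges are deleted explicitly.

#!/bin/bash
# destroy.sh -- hypothetical cleanup helper, not part of the original repo
sudo docker rm -f leaf11 leaf12 spine11 spine12 host11 host12
for bridge in s11_l11 s12_l11 s11_l12 s12_l12 host11_leaf11 host12_leaf12; do
    sudo ip link set "${bridge}" down
    sudo brctl delbr "${bridge}"
done
# Remove the iptables rules added by build.sh
sudo iptables -D FORWARD -s 10.0.0.0/24 -d 10.0.0.0/24 -j ACCEPT
sudo iptables -D FORWARD -s 192.168.0.0/16 -d 192.168.0.0/16 -j ACCEPT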

Join our network automation training (link) if you are interested in learning how to build proper Bash or Python scripts to rock your network with automation.

#3. Launching the Docker containers with Microsoft SONiC P4 switches and Ubuntu hosts

However, before we bring the topology up, we need to download the SONiC image. Based on the original repo, you can do it like this:


$ cat load_image.sh
wget https://sonic-jenkins.westus2.cloudapp.azure.com/job/p4/job/buildimage-p4-all/543/artifact/target/docker-sonic-p4.gz
sudo docker load < docker-sonic-p4.gz

This script downloads the image from the corresponding webpage and adds it to your local Docker image store.

Therefore, step number one is to make sure that the Docker service on your host is up and running:


$ sudo systemctl start docker.service


$ sudo systemctl status docker.service
● docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; disabled; vendor preset: disabled)
   Active: active (running) since Sun 2020-03-15 19:49:07 GMT; 1min 20s ago
     Docs: https://docs.docker.com
 Main PID: 5607 (dockerd)
    Tasks: 11
   Memory: 146.8M
   CGroup: /system.slice/docker.service
           └─5607 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock

The second step is to launch the script mentioned above:


$ ./load_image.sh
--2020-03-15 19:48:17--  https://sonic-jenkins.westus2.cloudapp.azure.com/job/p4/job/buildimage-p4-all/543/artifact/target/docker-sonic-p4.gz
Resolving sonic-jenkins.westus2.cloudapp.azure.com (sonic-jenkins.westus2.cloudapp.azure.com)... 52.250.106.22
Connecting to sonic-jenkins.westus2.cloudapp.azure.com (sonic-jenkins.westus2.cloudapp.azure.com)|52.250.106.22|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 164639463 (157M) [application/x-gzip]
Saving to: ‘docker-sonic-p4.gz’

100%[====================================================================================================================>] 164,639,463 1.05MB/s   in 2m 44s

2020-03-15 19:51:03 (978 KB/s) - ‘docker-sonic-p4.gz’ saved [164639463/164639463]

Loaded image: docker-sonic-p4:latest

The third step is to verify that the image has been added properly to the local Docker images:


$ sudo docker image ls
REPOSITORY           TAG                 IMAGE ID            CREATED             SIZE
ubuntu               14.04               6e4f1fe62ff1        2 months ago        197MB
akarneliuk/dcf_ftp   latest              aeca611f7115        10 months ago       8MB
docker-sonic-p4      latest              a62359e719f0        2 years ago         445MB

Now we can bring our topology with Microsoft Azure SONiC up:


$ ./build.sh
Launching Docker: starting
Launching Docker: done
Creating the containers: starting
1cd17d8872a3d9127493a0af2d94a5472a4dbdfc5912967771322e05b306d529
229b87ba9d116c9ecef4a0190eeb14418d02ce6daa185b74fb3929fce6754a8b
37215b3837199b084f48500883a0dd16db9cbd6cd65a54402593cad3818fb60f
35bca165764f98747020eea1345c58e483dd896ddf7ff317e320c110f5734cec
dd851bd9ee34c8c7ea180edb078dad3961c7a602c1e5edb18e0e0df0a5206988
6b1ec0df840a6bfe841a22c8a3817897b14c1b4b5c1f50a8453790f1e129a1a7
Creating the containers: done
Creating the network connectivity: starting
Creating the network connectivity: done
Configuring hosts: starting
Configuring hosts: done
Configuring switches: starting
Configuring switches: done
Booting switches, please wait ~1 minute for switches to load: starting
Booting switches, please wait ~1 minute for switches to load: done
Fixing iptables firewall: starting
Fixing iptables firewall: done

Now we can grab a brew and wait a few minutes to allow the containers to boot and the topology to converge.

Topology verification

The first thing we need to check in our newly created hyper-scaler is whether the containers have booted properly:


$ sudo docker container ls
[sudo] password for aaa:
CONTAINER ID        IMAGE                    COMMAND             CREATED             STATUS              PORTS               NAMES
6b1ec0df840a        ubuntu:14.04             "/bin/bash"         7 minutes ago       Up 7 minutes                            host12
dd851bd9ee34        ubuntu:14.04             "/bin/bash"         7 minutes ago       Up 7 minutes                            host11
35bca165764f        docker-sonic-p4:latest   "/bin/bash"         7 minutes ago       Up 7 minutes                            spine12
37215b383719        docker-sonic-p4:latest   "/bin/bash"         7 minutes ago       Up 7 minutes                            spine11
229b87ba9d11        docker-sonic-p4:latest   "/bin/bash"         7 minutes ago       Up 7 minutes                            leaf12
1cd17d8872a3        docker-sonic-p4:latest   "/bin/bash"         7 minutes ago       Up 7 minutes                            leaf11

Once that is done, we can check the interfaces inside the container with Microsoft Azure SONiC P4 switch:


$ sudo docker container exec -it leaf11 ip link show
[sudo] password for aaa:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:dc:5e:01:01:01 brd ff:ff:ff:ff:ff:ff
3: host_port1@if4: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state LOWERLAYERDOWN mode DEFAULT group default qlen 1000
    link/ether 5a:da:c3:34:cf:d4 brd ff:ff:ff:ff:ff:ff link-netnsid 1
5: host_port2@if6: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state LOWERLAYERDOWN mode DEFAULT group default qlen 1000
    link/ether 66:16:8f:5f:b5:1b brd ff:ff:ff:ff:ff:ff link-netnsid 1
!
! Some output is truncated for brevity
!
67: Ethernet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9100 qdisc pfifo_fast master Bridge state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether b6:91:5b:15:4d:20 brd ff:ff:ff:ff:ff:ff
68: Ethernet1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 8a:66:6c:84:12:e4 brd ff:ff:ff:ff:ff:ff
69: Ethernet2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether ae:d0:7d:40:58:a4 brd ff:ff:ff:ff:ff:ff
70: Ethernet3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 96:53:90:81:55:e6 brd ff:ff:ff:ff:ff:ff
71: Ethernet4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether d2:1c:49:7b:28:c9 brd ff:ff:ff:ff:ff:ff
72: Ethernet5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9100 qdisc pfifo_fast master Bridge state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 3a:c9:4e:77:ba:81 brd ff:ff:ff:ff:ff:ff
73: Ethernet6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9100 qdisc pfifo_fast master Bridge state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 5a:ba:ac:07:9a:4b brd ff:ff:ff:ff:ff:ff
!
! Some output is truncated for brevity
!
99: Bridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9100 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 3a:c9:4e:77:ba:81 brd ff:ff:ff:ff:ff:ff
100: Vlan130@Bridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 00:dc:5e:01:01:01 brd ff:ff:ff:ff:ff:ff
101: Vlan131@Bridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 00:dc:5e:01:01:01 brd ff:ff:ff:ff:ff:ff
102: Vlan132@Bridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 00:dc:5e:01:01:01 brd ff:ff:ff:ff:ff:ff

There are multiple interfaces down. However, the ones we use, Ethernet0, Ethernet5 and Ethernet6, are up. So are the VLAN interfaces we have created. What was the point of mentioning the namespace sw_net earlier? Let’s take a look:


$ sudo docker container exec -it spine11 ip netns exec sw_net ip link show
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: sw_port5@if9: <BROADCAST,MULTICAST> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
    link/ether 22:b9:ea:49:ff:6b brd ff:ff:ff:ff:ff:ff link-netnsid 1
3: sw_port10@if19: <BROADCAST,MULTICAST> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
    link/ether ea:4a:d9:88:2b:9d brd ff:ff:ff:ff:ff:ff link-netnsid 1
4: sw_port2@if3: <BROADCAST,MULTICAST> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
    link/ether 56:6f:02:9e:2d:2b brd ff:ff:ff:ff:ff:ff link-netnsid 1
6: sw_port3@if5: <BROADCAST,MULTICAST> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
    link/ether f2:fa:22:b5:c2:35 brd ff:ff:ff:ff:ff:ff link-netnsid 1
8: sw_port4@if7: <BROADCAST,MULTICAST> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
    link/ether c6:6f:2e:78:4c:4c brd ff:ff:ff:ff:ff:ff link-netnsid 1
10: sw_port0@if9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 00:dc:5e:01:00:01 brd ff:ff:ff:ff:ff:ff link-netnsid 0
12: sw_port6@if11: <BROADCAST,MULTICAST> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
!
! Some output is truncated for brevity
!
62: sw_port31@if61: <BROADCAST,MULTICAST> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
    link/ether d6:45:43:47:b3:5d brd ff:ff:ff:ff:ff:ff link-netnsid 1
64: cpu_port@if63: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether c2:5d:81:d8:8b:11 brd ff:ff:ff:ff:ff:ff link-netnsid 1
65: router_port1@router_port0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 76:32:87:29:78:cf brd ff:ff:ff:ff:ff:ff
66: router_port0@router_port1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether f2:a1:66:5d:53:99 brd ff:ff:ff:ff:ff:ff
68: router_cpu_port@if67: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether de:60:a4:19:78:c3 brd ff:ff:ff:ff:ff:ff link-netnsid 1

In this namespace you can see the sw_portX ports, including the ones we moved in from the default namespace in the Bash script (on spine11 that is sw_port0 and sw_port1; note that sw_port0 is up and already carries the system MAC address). The mapping of the interfaces was explained above, so we won’t focus on that now.

Putting aside the complexity of the multiple namespaces caused by the P4 software switch, we can focus on SONiC itself.
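Before poking at the CLI, it may also be worth checking which SONiC daemons supervisord has actually brought up inside the container (a hedged check on my side; the exact process list depends on the image build):

# Show the processes managed by supervisord inside leaf11
# (routing stack, switch state service, syncd and friends).
sudo docker exec -it leaf11 supervisorctl status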

Let’s check the Microsoft Azure SONiC version, as we did earlier on the Mellanox SN2010 switch:


$ sudo docker container exec -it leaf11 show version
Traceback (most recent call last):
  File "/usr/local/bin/sonic-cfggen", line 220, in <module>
    main()
  File "/usr/local/bin/sonic-cfggen", line 176, in main
    with open(yaml_file, 'r') as stream:
IOError: [Errno 2] No such file or directory: '/etc/sonic/sonic_version.yml'

Docker images:
/bin/sh: 1: sudo: not found

It looks like this part of the implementation is not complete. However, we have seen the VLAN interfaces, which means that the configuration was applied. Let’s try to reach the spine switches and the host:


$ sudo docker container exec -it leaf11 ping 10.0.0.0 -c 1
PING 10.0.0.0 (10.0.0.0): 56 data bytes
64 bytes from 10.0.0.0: icmp_seq=0 ttl=64 time=6.771 ms
--- 10.0.0.0 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max/stddev = 6.771/6.771/6.771/0.000 ms


$ sudo docker container exec -it leaf11 ping 10.0.0.4 -c 1
PING 10.0.0.4 (10.0.0.4): 56 data bytes
64 bytes from 10.0.0.4: icmp_seq=0 ttl=64 time=8.064 ms
--- 10.0.0.4 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max/stddev = 8.064/8.064/8.064/0.000 ms


$ sudo docker container exec -it leaf11 ping 192.168.1.2 -c 1
PING 192.168.1.2 (192.168.1.2): 56 data bytes
64 bytes from 192.168.1.2: icmp_seq=0 ttl=64 time=9.607 ms
--- 192.168.1.2 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max/stddev = 9.607/9.607/9.607/0.000 ms

The reachability check is successful, which means that our networking part works correctly.
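Before looking at the routes themselves, it also makes sense to confirm that the BGP sessions towards both spines are established (a small addition on my side, using the same vtysh approach as below):

# Quagga BGP summary on leaf11: both spine neighbours, 10.0.0.0 and 10.0.0.4,
# should be in the Established state with prefixes received.
sudo docker container exec -it leaf11 vtysh -c "show ip bgp summary"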

The next point is to check the BGP routing:


$ sudo docker container exec -it leaf11 vtysh -c "show bgp ipv4 unicast"
[sudo] password for aaa:
BGP table version is 0, local router ID is 10.1.1.1
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
              i internal, r RIB-failure, S Stale, R Removed
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 192.168.1.0      0.0.0.0                  0         32768 i
*= 192.168.2.0      10.0.0.0                               0 65101 65112 i
*>                  10.0.0.4                               0 65102 65112 i

Total number of prefixes 2

As intended, we don’t advertise the transit links, so we have only the customer subnets. On the other hand, as you know, the BGP RIB is part of the control plane, which might deviate from the actual routing table. In Linux you can check the routing table as follows:


$ sudo docker container exec -it leaf11 ip route show
10.0.0.0/31 dev Vlan131 proto kernel scope link src 10.0.0.1
10.0.0.4/31 dev Vlan132 proto kernel scope link src 10.0.0.5
192.168.1.0/24 dev Vlan130 proto kernel scope link src 192.168.1.1
192.168.2.0/24 proto zebra
    nexthop via 10.0.0.4  dev Vlan132 weight 1
    nexthop via 10.0.0.0  dev Vlan131 weight 1

So, it looks like everything is ready to start sending traffic between the Ubuntu hosts.

Let’s take a look at the routing table on host11:


$ sudo docker container exec -it host11 ip route show
default via 192.168.1.1 dev eth1
192.168.1.0/24 dev eth1  proto kernel  scope link  src 192.168.1.2

It looks accurate and we can try to ping host12:


$ sudo docker container exec -it host11 ping 192.168.2.2 -c 1
PING 192.168.2.2 (192.168.2.2) 56(84) bytes of data.
^C
--- 192.168.2.2 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

I don’t want to create too much mystery: there is a bug explained in the original test file:


$ cat test.sh
#!/bin/bash
# First ping from host 2 to switch 2 - this is a patch:
# currently there is a bug with miss on neighbor table (does not trap by default as should)
# When fixed, we can remove it
sudo docker exec -it host12 ping 192.168.2.1 -c1
sleep 2
sudo docker exec -it host11 ping 192.168.2.2

Therefore, we replicate this solution:


$ sudo docker container exec -it host12 ping 192.168.2.1 -c 1
PING 192.168.2.1 (192.168.2.1) 56(84) bytes of data.
64 bytes from 192.168.2.1: icmp_seq=1 ttl=64 time=3.76 ms

--- 192.168.2.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss,


$ sudo docker container exec -it host11 ping 192.168.2.2 -c 1
PING 192.168.2.2 (192.168.2.2) 56(84) bytes of data.
64 bytes from 192.168.2.2: icmp_seq=1 ttl=61 time=16.6 ms

--- 192.168.2.2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 16.605/16.605/16.605/0.000 ms

The end-to-end connectivity between the Ubuntu hosts is established and the traffic passes through our leaf/spine micro hyper-scale fabric running Microsoft Azure SONiC. It feels like we’ve just built our own Azure cloud 😉

You can find the topology and the relevant files on our GitHub page.

Lessons learned

It took me quite a while to dig into Linux networking at a level deeper than Linux bridges and ip link/addr/route. There were a lot of things which didn’t work from the beginning and required fixing. One of the biggest issues is that there is almost no documentation from the usage standpoint (only for HW/SW developers, which is not an easy read for network engineers). Therefore, the biggest lesson learned is that cross-functional skills (network, programming, Linux) are required to succeed in the world of cloud builders.

Conclusion

We have made a significant step towards understanding the operation of Microsoft Azure SONiC and built a local environment with a leaf/spine topology. We know that this SONiC build doesn’t support EVPN/VXLAN, which is necessary for enterprise data centres, therefore we can’t test those technologies here. However, we will try to build a mixed environment with multiple vendors, as SONiC could perfectly serve as spine, aggregation-spine and A-Z switches. More to come on Microsoft Azure SONiC automation as well, hopefully soon. Take care and goodbye!

Support us






P.S.

If you have further questions or you need help with your networks, I’m happy to assist you; just send me a message (https://karneliuk.com/contact/). Also, don’t forget to share the article on your social media if you like it.

BR,

Anton Karneliuk
