Hello my friend,
We have discussed at length how to build an interoperable data center with Nokia (Alcatel-Lucent), Cumulus Networks and Arista. For routing between subnets (IRB) we have always used the asymmetrical model. There is another one, called symmetrical IRB. Let’s review what the difference is and how to deploy it.
No part of this blogpost could be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical or photocopying, recording, or otherwise, for commercial purposes without the prior permission of the author.
Brief description
Normally I don’t describe theory, as I assume you know everything and are just keen to know how to connect different vendors. But today I need to explain the theory a bit, so that you understand where each of these technologies is applicable.
Asymmetrical IRB
We start with what we have already done for Nokia (Alcatel-Lucent) SR OS, Cumulus Linux and Arista EOS, that is, with asymmetrical IRB.
Prerequisite: all VNIs are configured on all VTEPs for the tenant
Mode of operation: ingress VTEP performs routing and switching, egress VTEP performs only switching
Traffic flow: we take as an example the topology configured in the previous articles. Both VNI 123 and 456 are configured on both VTEPs (SR1 and SR2/VX2/vEOS2).
Assuming that our network is converged, let’s see what happens upon communication between VM1 (connected to VNI 123 on SR1) and VM4 (connected to VNI 456 at SR2/VX2/vEOS2) on the forward path:
- VM1 is going to send packet to VM4 with source IP 192.168.0.1 and destination IP 192.168.1.2
- As the packet is destined to a host outside the original subnet, it is sent towards the default gateway with IP 192.168.0.250, which is configured as a distributed gateway across all data center leafs within VNI 123. The source MAC is the MAC of VM1 and the destination MAC is the virtual MAC of the distributed gateway
- Leaf SR1 receives the packet and analyses where it should be sent
- SR1 finds that the destination IP address belongs to the subnet on VNI 456, so the packet from VM1 to VM4 is to be sent over VNI 456
- SR1 looks up the ARP entry for VM4’s IP address to build the inner Ethernet header
- SR1 sends packet to SR2:
  - Outer IP: source 10.0.0.11, destination 10.0.0.22
  - UDP: source 45687 (random), destination 4789
  - VXLAN: VNI 456
  - Inner MAC: source MAC of SR1 inside VNI 456, destination MAC VM4
  - Inner IP: source 192.168.0.1, destination 192.168.1.2
- SR2 receives the packet from SR1, removes VXLAN header, checks the destination MAC and sends accordingly through the interface, where VM4 is attached
- VM4 receives the packet
On the backward path we have the same sequence, with the ingress leaf (SR2/VX2/vEOS2) performing the routing:
- VM4 is going to send packet to VM1 with source IP 192.168.1.2 and destination IP 192.168.0.1
- As the packet is destined to a host outside the original subnet, it is sent towards the default gateway with IP 192.168.1.250, which is configured as a distributed gateway across all data center leafs within VNI 456. The source MAC is the MAC of VM4 and the destination MAC is the virtual MAC of the distributed gateway
- Leaf SR2 receives the packet and analyses where it should be sent
- SR2 finds that the destination IP address belongs to the subnet on VNI 123, so the packet from VM4 to VM1 is to be sent over VNI 123
- SR2 looks up the ARP entry for VM1’s IP address to build the inner Ethernet header
- SR2 sends packet to SR1:
  - Outer IP: source 10.0.0.22, destination 10.0.0.11
  - UDP: source 47891 (random), destination 4789
  - VXLAN: VNI 123
  - Inner MAC: source MAC of SR2 inside VNI 123, destination MAC VM1
  - Inner IP: source 192.168.1.2, destination 192.168.0.1
- SR1 receives the packet from SR2, removes VXLAN header, checks the destination MAC and sends accordingly through the interface, where VM1 is attached
- VM1 receives the packet
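Based on the described traffic flow you can easily spot why this IRB mode is called asymmetrical: the forward path travels in VNI 456 and the backward path in VNI 123, and in each direction only the ingress VTEP performs routing.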
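The asymmetry can be illustrated with a few lines of Python. This is only a sketch of the VNI selection logic: the subnets and VNIs are taken from the example above, while the function name `asymmetric_vni` is made up for the illustration and does not correspond to any vendor API.

```python
import ipaddress

# Subnet-to-VNI mapping from the example topology.
# In asymmetrical IRB both VNIs exist on both VTEPs.
SUBNET_TO_VNI = {
    ipaddress.ip_network("192.168.0.0/24"): 123,
    ipaddress.ip_network("192.168.1.0/24"): 456,
}


def asymmetric_vni(dst_ip):
    """Return the VNI chosen by the ingress VTEP (hypothetical helper).

    The ingress VTEP routes the packet into the destination subnet,
    so the packet travels in the L2 VNI of that subnet.
    """
    addr = ipaddress.ip_address(dst_ip)
    for subnet, vni in SUBNET_TO_VNI.items():
        if addr in subnet:
            return vni
    raise LookupError("no VNI configured for " + dst_ip)


forward = asymmetric_vni("192.168.1.2")   # VM1 -> VM4, routed at SR1
backward = asymmetric_vni("192.168.0.1")  # VM4 -> VM1, routed at SR2
print(forward, backward)  # 456 123
```

The two directions of the same conversation end up in different VNIs, which is exactly the asymmetry.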
Symmetrical IRB
This mode is the core topic of the current article. It differs from the previous one in several aspects. One of the key differentiators is that symmetrical IRB heavily relies on EVPN type-5 routes (IP prefix), which is why IP subnets must be properly redistributed between the address families in BGP.
Prerequisite: only one VNI per tenant (called the L3 VNI) has to be configured on all VTEPs; L2 VNIs are needed only on the VTEPs where their hosts are attached.
Mode of operation: Both ingress and egress VTEPs perform switching and routing functions
Traffic flow: We take the same topology and the same VMs as in the previous example, but we modify the structure of VNIs in the following way:
- Leaf 1 has VNI 123 configured with subnet 192.168.0.0/24 and VNI 4000 as L3 VNI
- Leaf 2 has VNI 456 configured with subnet 192.168.1.0/24 and VNI 4000 as L3 VNI
In general, we have the following topology for our tenant:
On the forward path we have the following sequence of actions:
- VM1 sends packet to VM4 (IP addresses 192.168.0.1 -> 192.168.1.2)
- As the IP address of VM4 lies outside the source subnet, the packet is sent to the distributed anycast gateway (Leaf 1 is the closest gateway), hence the destination MAC is the vMAC of the gateway in VNI 123, whereas the source MAC is the MAC of VM1
- Leaf 1 receives the packet from VM1 and looks up the destination IP address, which is learned as an EVPN type-5 route and installed in the routing table pointing to VNI 4000, with the MAC of Leaf 2 as the next hop
- Leaf 1 sends the packet to Leaf 2 with the following structure:
  - Outer IP: source 10.0.0.11, destination 10.0.0.22
  - UDP: source 45687 (random), destination 4789
  - VXLAN: VNI 4000
  - Inner MAC: source MAC of Leaf1 inside VNI 4000, destination MAC of Leaf2 inside VNI 4000
  - Inner IP: source 192.168.0.1, destination 192.168.1.2
- Leaf2 receives the packet from Leaf1 and removes the VXLAN encapsulation and the inner MAC header.
- Leaf2 performs a routing lookup and finds that VM4 sits in a directly connected subnet in VLAN 444.
- Leaf2 performs an ARP lookup to find the MAC address of VM4 and sends the packet to VM4 with the source MAC of Leaf2 in VNI 456 and the destination MAC of VM4
- VM4 receives the packet
I won’t describe the backward path as it is exactly the same: both Leaf1 and Leaf2 perform routing and switching lookups.
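To make the “VXLAN: VNI 4000” step more tangible, here is a minimal Python sketch that builds and parses the 8-byte VXLAN header as defined in RFC 7348. The helper names `vxlan_header` and `vni_from_header` are invented for this illustration; only the VNI value 4000 comes from the example above.

```python
import struct

L3_VNI = 4000  # the shared L3 VNI from the example above


def vxlan_header(vni):
    """Build the 8-byte VXLAN header (RFC 7348).

    Layout: 8 bits of flags, 24 reserved bits, 24-bit VNI, 8 reserved bits.
    """
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    flags = 0x08  # the 'I' bit: the VNI field is valid
    return struct.pack("!II", flags << 24, vni << 8)


def vni_from_header(header):
    """Extract the VNI from an 8-byte VXLAN header."""
    _, second_word = struct.unpack("!II", header)
    return second_word >> 8


# In symmetrical IRB both directions carry the same header,
# because inter-subnet traffic of the tenant always uses the L3 VNI.
forward_header = vxlan_header(L3_VNI)   # Leaf1 -> Leaf2
backward_header = vxlan_header(L3_VNI)  # Leaf2 -> Leaf1
print(forward_header.hex())
print(vni_from_header(backward_header))
```

Unlike the asymmetrical case, the header is identical in both directions, which is where the name of the mode comes from.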
What are we going to test?
For the symmetrical IRB I haven’t mentioned the names of the vendors. Arista and Cumulus support it, so I will show an example of symmetrical IRB based on them. Cisco Nexus supports it as well, but I don’t have a VM with Cisco Nexus.
[UPD] Nokia didn’t support L3 VNI (interface-less mode) before the latest release, SR OS 16.0.R1. BUT now Nokia supports L3 VNI, hurray! Thanks a lot to Nokia PLM Jorge Rabadan, who updated me with this info. As I got this info after the whole article was ready, I add the Nokia configuration as the last chapter, after the verification on Arista/Cumulus.
Software version
The following infrastructure is used in my lab:
- CentOS 7 with python 2.7.
- Ansible 2.5.2
- Arista EOS 4.20.5F
- [NEW] Cumulus Linux VX 3.6.1
- [NEW] Nokia (Alcatel-Lucent) SR OS 16.0.R1
See the previous article for details on how to build the lab.
Topology
The physical topology is more or less stable over the last tests, so there is no surprise for you:
The logical topology is also familiar to you, as we have used it in all the articles about data center technologies:
The major difference compared to the previous articles is that I replaced the spine switches, which were previously Cisco IOS XR based, with Cumulus Linux VX and Arista vEOS. The reason is very simple: Cumulus Linux VX (or Arista vEOS) uses fewer resources than Cisco IOS XRv, whereas the simple functionality required from spine switches (IP routing, BGP peering, client emulation (VRF)) is available from all vendors.
Initial configuration files for this lab: 126_config_initial_vEOS1 126_config_initial_vEOS3 126_config_initial_VX2 126_config_initial_VX4
There is no Linux file, as I’m using only VMware-based VMs in this case. For Nokia (Alcatel-Lucent) VSR there will be some additions to the topology.
IP Fabric. BGP for underlay and overlay
Here we have some changes compared to the previous topology. The difference is related to a Cisco IOS XRv limitation: it doesn’t understand VXLAN in BGP updates, so it can’t process the routes coming from the leafs. That’s why we previously built the EVPN BGP peering between the Leafs as an eBGP multihop session. Now both Arista vEOS and Cumulus Linux understand such routes, so we can build the reference architecture with single-hop eBGP sessions at interface level for both the IPv4 unicast and L2VPN EVPN address families:
Here is the configuration necessary for that:
VX2 – Cumulus Linux

cumulus@VX2:mgmt-vrf:~$ net pending
net add bgp autonomous-system 65012
net add bgp router-id 10.0.0.22
net add bgp neighbor SPINE peer-group
net add bgp neighbor SPINE remote-as 65001
net add bgp neighbor SPINE bfd
net add bgp neighbor SPINE password FABRIC
net add bgp neighbor 10.22.33.33 peer-group SPINE
net add bgp neighbor 10.22.44.44 peer-group SPINE
net add bgp ipv4 unicast network 10.0.0.22/32
net add bgp l2vpn evpn neighbor SPINE activate
net add bgp l2vpn evpn advertise-all-vni
net add bgp l2vpn evpn advertise-default-gw

vEOS1 – Arista EOS

vEOS1#show run
! Command: show running-config
! device: vEOS1 (vEOS, EOS-4.20.1F)
!
service routing protocols model multi-agent
!
ip routing
!
ipv6 unicast-routing
!
router bgp 65011
   router-id 10.0.0.11
   maximum-paths 2 ecmp 2
   neighbor SPINE peer-group
   neighbor SPINE remote-as 65001
   neighbor SPINE fall-over bfd
   neighbor SPINE password 7 GjMNH/tbSiIZR6ITsaFeZQ==
   neighbor SPINE send-community extended
   neighbor SPINE maximum-routes 12000
   neighbor 10.11.33.33 peer-group SPINE
   neighbor 10.11.44.44 peer-group SPINE
   !
   address-family evpn
      neighbor SPINE activate
   !
   address-family ipv4
      neighbor SPINE activate
      network 10.0.0.11/32
!
end
VX4 – Cumulus Linux

cumulus@VX4:mgmt-vrf:~$
net add bgp autonomous-system 65001
net add bgp router-id 10.0.0.44
net add bgp neighbor POD1 peer-group
net add bgp neighbor POD1 remote-as external
net add bgp neighbor POD1 bfd
net add bgp neighbor POD1 password FABRIC
net add bgp neighbor POD2 peer-group
net add bgp neighbor POD2 remote-as external
net add bgp neighbor POD2 bfd
net add bgp neighbor POD2 password FABRIC
net add bgp neighbor 10.11.44.11 peer-group POD1
net add bgp neighbor 10.22.44.22 peer-group POD2
net add bgp ipv4 unicast network 10.0.0.44/32
net add bgp l2vpn evpn neighbor POD1 activate
net add bgp l2vpn evpn neighbor POD2 activate
net add bgp l2vpn evpn advertise-all-vni
net add bgp l2vpn evpn advertise-default-gw

vEOS3 – Arista EOS

vEOS3#show run
! Command: show running-config
! device: vEOS3 (vEOS, EOS-4.20.1F)
!
service routing protocols model multi-agent
!
ip routing
!
ipv6 unicast-routing
!
router bgp 65001
   router-id 10.0.0.33
   maximum-paths 2 ecmp 2
   neighbor POD1 peer-group
   neighbor POD1 remote-as 65011
   neighbor POD1 fall-over bfd
   neighbor POD1 password 7 1RtJFYB9gqIVXjAG8UiJWg==
   neighbor POD1 send-community extended
   neighbor POD1 maximum-routes 12000
   neighbor POD2 peer-group
   neighbor POD2 remote-as 65012
   neighbor POD2 fall-over bfd
   neighbor POD2 password 7 aPYDv9nQvwYDurTezBF0ag==
   neighbor POD2 send-community extended
   neighbor POD2 maximum-routes 12000
   neighbor 10.11.33.11 peer-group POD1
   neighbor 10.22.33.22 peer-group POD2
   !
   address-family evpn
      neighbor POD1 activate
      neighbor POD2 activate
   !
   address-family ipv4
      neighbor POD1 activate
      neighbor POD2 activate
      network 10.0.0.0/24
!
end
As you see, the configuration is relatively simple: each node has two single-hop eBGP peerings, as each Leaf is connected to two Spines and each Spine is connected to two (that is, all) Leafs. BFD is used to speed up convergence, which is particularly useful in a virtual environment. Note that this time both Spines are in the same AS, so they won’t see each other’s IP addresses, which is fully OK for such a deployment.
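The remark about the Spines not seeing each other’s addresses follows from standard eBGP loop prevention: a router drops any update whose AS_PATH already contains its own AS number. A minimal Python sketch of that check, using the AS numbers of this lab (the function name `accepts_route` is hypothetical):

```python
SPINE_AS = 65001  # vEOS3 and VX4
LEAF1_AS = 65011  # vEOS1
LEAF2_AS = 65012  # VX2


def accepts_route(local_as, as_path):
    """eBGP loop prevention: reject a route whose AS_PATH contains our own AS."""
    return local_as not in as_path


# Loopback of spine VX4, advertised to leaf vEOS1 and re-advertised by the
# leaf towards the other spine vEOS3: the path now contains the spine AS.
path_seen_by_other_spine = [LEAF1_AS, SPINE_AS]
print(accepts_route(SPINE_AS, path_seen_by_other_spine))  # False

# Loopback of leaf vEOS1 as seen by leaf VX2 through a spine.
path_seen_by_leaf2 = [SPINE_AS, LEAF1_AS]
print(accepts_route(LEAF2_AS, path_seen_by_leaf2))  # True
```

Leaf-to-leaf reachability, which is what the VTEPs need, is not affected by this check.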
There is one point that might be needed in the configuration, which I haven’t configured. Normally each eBGP peer rewrites the next hop of the routes it advertises in order to maintain reachability between autonomous systems. On the other hand, for EVPN routes we need to preserve the original next hop, which is typically the loopback of the VTEP. Some configuration guides recommend configuring “next-hop-unchanged” in the EVPN address family at the Spines. My observation in this lab is that this isn’t necessary: the next hop is preserved automatically without any additional configuration. It probably depends on the OS version, so keep in mind that it might be necessary.
The basic test is to check the reachability between the loopbacks at vEOS1 and VX2, which will later be the VTEP IP addresses:
vEOS1#ping 10.0.0.22 so 10.0.0.11
PING 10.0.0.22 (10.0.0.22) from 10.0.0.11 : 72(100) bytes of data.
80 bytes from 10.0.0.22: icmp_seq=1 ttl=63 time=23.1 ms
80 bytes from 10.0.0.22: icmp_seq=2 ttl=63 time=45.3 ms
80 bytes from 10.0.0.22: icmp_seq=3 ttl=63 time=34.9 ms
80 bytes from 10.0.0.22: icmp_seq=4 ttl=63 time=28.3 ms
80 bytes from 10.0.0.22: icmp_seq=5 ttl=63 time=22.6 ms

--- 10.0.0.22 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 109ms
rtt min/avg/max/mdev = 22.627/30.872/45.317/8.476 ms, pipe 2, ipg/ewma 27.257/26.621 ms
The data centre fabric is ready and we can start with building our service on top.
Symmetric IRB configuration / Arista and Cumulus
Earlier we have shown the configuration of L2 VNIs for both Cumulus Linux and Arista EOS, so here we focus solely on the L3 VNI. We are going to implement the following service topology:
And to make this service work, we need to configure both Leafs as shown below (no changes at the Spines are necessary):
VX2 – Cumulus Linux

cumulus@VX2:mgmt-vrf:~$ net pending
net add vrf CUST2 vni 4000
net add vxlan vxlan4000 vxlan id 4000
net add vxlan vxlan4000 vxlan local-tunnelip 10.0.0.22
net add vxlan vxlan4000 bridge access 4000
net add vxlan vxlan4000 bridge learning off
net add vlan 4000 hwaddress 44:39:39:FF:40:00
net add vlan 4000 vrf CUST2
net add vlan 888 ip address 192.168.8.254/24
net add vlan 888 ip address-virtual 00:00:50:00:00:01 192.168.8.250/24
net add vlan 888 vrf CUST2
net add interface swp3 bridge vids 888
net add bgp vrf CUST2 autonomous-system 65012
net add bgp vrf CUST2 ipv4 unicast redistribute connected
net add bgp vrf CUST2 l2vpn evpn advertise ipv4 unicast
net add bgp vrf CUST2 l2vpn evpn rd 10.0.0.22:70
net add bgp vrf CUST2 l2vpn evpn route-target both 65000:70

vEOS1 – Arista EOS

vEOS1#show run
! Command: show running-config
! device: vEOS1 (vEOS, EOS-4.20.1F)
!
vrf definition CUST2
!
vlan 777
!
interface Ethernet3
   switchport trunk allowed vlan 777
   switchport mode trunk
!
interface Vlan777
   vrf forwarding CUST2
   ip address 192.168.7.253/24
   ip virtual-router address 192.168.7.250
!
interface Vxlan1
   vxlan source-interface Loopback0
   vxlan udp-port 4789
   vxlan vrf CUST2 vni 4000
!
ip virtual-router mac-address 00:00:50:00:00:01
!
ip routing vrf CUST2
!
router bgp 65011
   vrf CUST2
      rd 10.0.0.11:70
      route-target import 10.0.0.11:70
      route-target export 10.0.0.11:70
      redistribute connected
!
end
The configuration is quite similar across both vendors, with some additions on Cumulus Linux.
In Arista EOS we create the VRF, the VLAN in the database and the corresponding VLAN interface with IP and vIP to connect VM7. Then we create the VXLAN interface, where we define the VTEP IP and the L3 VNI ID. The L3 VNI is defined as “vxlan vrf X vni Y”, which differs from the L2 VNI syntax “vxlan vlan X vni Y” that we configured previously for asymmetrical IRB. Under the BGP context we create the VRF context, where we define RD/RT and instruct it to redistribute connected IP prefixes.
For Cumulus Linux we do the same, plus we create an additional VLAN 4000, which is purely internal to VX2 and is used to terminate the L3 VNI with a certain MAC in the corresponding VRF. In Cumulus Linux we also explicitly configure the EVPN address family within the VRF and instruct it to advertise IPv4 prefixes, whereas under the IPv4 unicast address family within the VRF we redistribute connected routes.
To be honest, it took me some time to collect and test this configuration, as for Cumulus Linux the feature was released only recently, and for Arista a lot of the configuration documentation is unavailable without a corresponding account (coupled with a contract). I put some links at the end of this post, where you can learn more about L3 VNI configuration.
Control plane verification / Arista and Cumulus
Before we start testing the data plane, let’s check the control plane and see what the different tables look like. First of all, let’s verify that we have the L3 VNIs. We start with Arista EOS:
vEOS1#show interfaces vxlan 1
Vxlan1 is up, line protocol is up (connected)
  Hardware is Vxlan
  Source interface is Loopback0 and is active with 10.0.0.11
  Replication/Flood Mode is headend with Flood List Source: EVPN
  Remote MAC learning via EVPN
  Static VLAN to VNI mapping is
  Dynamic VLAN to VNI mapping for 'evpn' is
    [1011, 4000]
  Note: All Dynamic VLANs used by VCS are internal VLANs.
        Use 'show vxlan vni' for details.
  Static VRF to VNI mapping is
    [CUST2, 4000]
  Headend replication flood vtep list is:
The L3 VNI is shown under “VRF to VNI mapping”. For Cumulus Linux we have a different command:
cumulus@VX2:mgmt-vrf:~$ net show bgp l2vpn evpn vni
Advertise Gateway Macip: Enabled
Advertise All VNI flag: Enabled
Number of L2 VNIs: 0
Number of L3 VNIs: 1
Flags: * - Kernel
  VNI    Type  RD              Import RT       Export RT       Tenant VRF
* 4000   L3    10.0.0.22:70    10.0.0.11:70    10.0.0.11:70    CUST2
The next very useful table is the BGP EVPN RIB. Here is the filtered output for the Cumulus Linux based leaf switch VX2 (we show only the related EVPN type-5 routes):
cumulus@VX2:mgmt-vrf:~$ net show bgp l2vpn evpn route type prefix
BGP table version is 9, local router ID is 10.0.0.22
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete
EVPN type-2 prefix: [2]:[ESI]:[EthTag]:[MAClen]:[MAC]:[IPlen]:[IP]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]
EVPN type-5 prefix: [5]:[ESI]:[EthTag]:[IPlen]:[IP]

   Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 10.0.0.11:70
*  [5]:[0]:[0]:[24]:[192.168.7.0]
                    10.0.0.11                              0 65001 65011 i
*> [5]:[0]:[0]:[24]:[192.168.7.0]
                    10.0.0.11                              0 65001 65011 i
Route Distinguisher: 10.0.0.22:70
*> [5]:[0]:[0]:[24]:[192.168.8.0]
                    10.0.0.22                0         32768 ?

Displayed 2 prefixes (3 paths) (of requested type)
In Cumulus Linux, filtering routes by VNI works for L2 VNIs but not for L3 VNIs, at least in VX 3.6.1. On Arista vEOS we can either use the same filtering or show the routes related to this VNI:
vEOS1#show bgp evpn vni 4000
BGP routing table information for VRF default
Router identifier 10.0.0.11, local AS number 65011
Route status codes: s - suppressed, * - valid, > - active, # - not installed, E - ECMP head, e - ECMP
                    S - Stale, c - Contributing to ECMP, b - backup
                    % - Pending BGP convergence
Origin codes: i - IGP, e - EGP, ? - incomplete
AS Path Attributes: Or-ID - Originator ID, C-LST - Cluster List, LL Nexthop - Link Local Nexthop

         Network                Next Hop      Metric  LocPref  Weight  Path
 * >     RD: 10.0.0.11:70 ip-prefix 192.168.7.0/24
                                -             -       -        0       i
 * >Ec   RD: 10.0.0.22:70 ip-prefix 192.168.8.0/24
                                10.0.0.22     -       100      0       65001 65012 ?
 *  ec   RD: 10.0.0.22:70 ip-prefix 192.168.8.0/24
                                10.0.0.22     -       100      0       65001 65012
Take a look at the next hops: we haven’t specified them anywhere, but they are set correctly. I assume they are automatically copied from the “VTEP source IP”. But you remember that at the very beginning I mentioned something about MAC addresses within the L3 VNI. They are transferred as extended community attributes; let’s take a look (the output will be the same on both Arista EOS and Cumulus Linux):
vEOS1#show bgp evpn vni 4000 detail
BGP routing table information for VRF default
Router identifier 10.0.0.11, local AS number 65011
BGP routing table entry for ip-prefix 192.168.7.0/24, Route Distinguisher: 10.0.0.11:70
 Paths: 1 available
  Local
    - from - (0.0.0.0)
      Origin IGP, metric -, localpref -, weight 0, valid, local, best
      Extended Community: Route-Target-IP:10.0.0.11:70 TunnelEncap:tunnelTypeVxlan EvpnRouterMac:00:0c:29:e4:19:f2
      VNI: 4000
BGP routing table entry for ip-prefix 192.168.8.0/24, Route Distinguisher: 10.0.0.22:70
 Paths: 2 available
  65001 65012
    10.0.0.22 from 10.11.44.44 (10.0.0.44)
      Origin INCOMPLETE, metric -, localpref 100, weight 0, valid, external, ECMP head, best, ECMP contributor
      Extended Community: Route-Target-IP:10.0.0.11:70 TunnelEncap:tunnelTypeVxlan EvpnRouterMac:44:39:39:ff:40:00
      VNI: 4000
  65001 65012
    10.0.0.22 from 10.11.33.33 (10.0.0.33)
      Origin INCOMPLETE, metric -, localpref 100, weight 0, valid, external, ECMP, ECMP contributor
      Extended Community: Route-Target-IP:10.0.0.11:70 TunnelEncap:tunnelTypeVxlan EvpnRouterMac:44:39:39:ff:40:00
      VNI: 4000
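The “EvpnRouterMac” value in the output above is the EVPN Router’s MAC extended community. As a rough illustration of how it looks on the wire, here is a Python sketch that encodes and decodes it, assuming the standard layout from RFC 9135 (type 0x06, sub-type 0x03, followed by the 6-byte MAC). The function names are made up for this example; the MAC is the one advertised by VX2.

```python
EVPN_TYPE = 0x06           # EVPN extended community type
ROUTER_MAC_SUBTYPE = 0x03  # Router's MAC sub-type (RFC 9135)


def encode_router_mac(mac):
    """Encode a MAC like '44:39:39:ff:40:00' as an 8-byte extended community."""
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("a MAC address has exactly 6 octets")
    return bytes([EVPN_TYPE, ROUTER_MAC_SUBTYPE]) + octets


def decode_router_mac(community):
    """Return the MAC carried in a Router's MAC extended community."""
    if len(community) != 8:
        raise ValueError("an extended community is 8 bytes long")
    if community[0] != EVPN_TYPE or community[1] != ROUTER_MAC_SUBTYPE:
        raise ValueError("not a Router's MAC extended community")
    return ":".join("{:02x}".format(octet) for octet in community[2:])


wire = encode_router_mac("44:39:39:ff:40:00")
print(wire.hex())
print(decode_router_mac(wire))
```

The receiving leaf uses this MAC as the inner destination MAC when it sends routed traffic into the L3 VNI.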
Finally, let’s check the routing table for the tenant’s VRF, together with the BGP RIB for this VRF. On Cumulus Linux it looks like this:
cumulus@VX2:mgmt-vrf:~$ net show bgp vrf CUST2 ipv4 unicast
BGP table version is 13, local router ID is 192.168.8.254
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
              i internal, r RIB-failure, S Stale, R Removed
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*  192.168.7.0      10.0.0.11                              0 65001 65011 i
*>                  10.0.0.11                              0 65001 65011 i
*> 192.168.8.0      0.0.0.0                  0         32768 ?

Displayed 2 routes and 3 total paths


cumulus@VX2:mgmt-vrf:~$ net show route vrf CUST2
show ip route vrf CUST2
========================
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
       F - PBR,
       > - selected route, * - FIB route


VRF CUST2:
K * 0.0.0.0/0 [255/8192] unreachable (ICMP unreachable), 02:23:23
B>* 192.168.7.0/24 [20/0] via 10.0.0.11, vlan4000 onlink, 01:11:18
C * 192.168.8.0/24 is directly connected, vlan888-v0, 01:17:52
C>* 192.168.8.0/24 is directly connected, vlan888, 01:17:52
In Arista EOS the syntax is very similar to Cisco IOS (or NX-OS), as we have mentioned previously, so the corresponding commands are easy as well.
The control plane looks nice, so we go further with our checks.
Data plane verification / Arista and Cumulus
The data plane verification is quite easy. As usual, we just use ping to generate an ICMP stream between VM7 and VM8. But beforehand we need to emulate these VMs using VRFs at the Spine switches:
VX4 – Cumulus Linux

cumulus@VX2:mgmt-vrf:~$ net pending
net add vlan 888 ip address 192.168.8.2/24
net add vlan 888 vlan-id 888
net add vlan 888 vlan-raw-device bridge
net add vlan 888 vrf VM8
net add routing route 0.0.0.0/0 192.168.8.250 vrf VM8

vEOS3 – Arista EOS

vEOS1#show run
! Command: show running-config
! device: vEOS1 (vEOS, EOS-4.20.1F)
!
vrf definition VM7
!
interface Ethernet3
   no switchport
!
interface Ethernet3.777
   encapsulation dot1q vlan 777
   vrf forwarding VM7
   ip address 192.168.7.1/24
!
ip route vrf VM7 0.0.0.0/0 192.168.7.250
!
ip routing vrf VM7
!
end
The VMs are emulated using the basic VRF-lite approach, so no further actions are needed. Let’s test how it works:
vEOS3#ping vrf VM7 192.168.8.2
PING 192.168.8.2 (192.168.8.2) 72(100) bytes of data.
80 bytes from 192.168.8.2: icmp_seq=1 ttl=62 time=37.7 ms
80 bytes from 192.168.8.2: icmp_seq=2 ttl=62 time=31.9 ms
80 bytes from 192.168.8.2: icmp_seq=3 ttl=62 time=31.2 ms
80 bytes from 192.168.8.2: icmp_seq=4 ttl=62 time=35.1 ms
80 bytes from 192.168.8.2: icmp_seq=5 ttl=62 time=30.5 ms

--- 192.168.8.2 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 147ms
rtt min/avg/max/mdev = 30.574/33.310/37.708/2.702 ms, ipg/ewma 36.822/35.428 ms
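A side observation on the ping output: the replies arrive with ttl=62. Assuming that VM8 (a VRF on Cumulus Linux) sends its replies with the common Linux default TTL of 64, which is an assumption and not something the output states, the packet was routed twice on its way. That is consistent with symmetrical IRB, where both leafs route, whereas the underlay spine only forwards the outer VXLAN packet and does not touch the inner TTL. The helper `routed_hops` below is made up for this small Python sketch:

```python
def routed_hops(received_ttl, initial_ttl=64):
    """Number of routers that decremented the TTL of the received packet.

    initial_ttl=64 is an assumed sender default, not a measured value.
    """
    if not 0 < received_ttl <= initial_ttl:
        raise ValueError("received TTL must be within 1..initial TTL")
    return initial_ttl - received_ttl


print(routed_hops(62))  # 2: one routing hop on each leaf
```

In the asymmetrical model only the ingress leaf routes, so under the same assumption one would expect a single decrement.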
As both Cumulus Linux and Arista EOS have Linux as their basis, we can check the packet flow using tcpdump (the command is the same for Cumulus and for Arista if you launch it from bash):
cumulus@VX2:mgmt-vrf:~$ sudo tcpdump -i any -vvvv port 4789
[sudo] password for cumulus:
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes


22:36:32.454494 IP (tos 0x0, ttl 63, id 1, offset 0, flags [DF], proto UDP (17), length 150)
    10.0.0.11.4789 > 10.0.0.22.4789: [no cksum] VXLAN, flags [I] (0x08), vni 4000
IP (tos 0x0, ttl 63, id 48160, offset 0, flags [none], proto ICMP (1), length 100)
    192.168.7.1 > 192.168.8.2: ICMP echo request, id 22540, seq 1, length 80


22:36:32.455269 IP (tos 0x0, ttl 64, id 48027, offset 0, flags [none], proto UDP (17), length 150)
    10.0.0.22.37591 > 10.0.0.11.4789: [no cksum] VXLAN, flags [I] (0x08), vni 4000
IP (tos 0x0, ttl 63, id 41064, offset 0, flags [none], proto ICMP (1), length 100)
    192.168.8.2 > 192.168.7.1: ICMP echo reply, id 22540, seq 1, length 80
Output is reduced to just 2 packets (one inbound, one outbound).
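If you want to check a longer capture automatically, the VXLAN lines of the tcpdump output can be parsed with a small script. The following Python sketch is only an illustration: the regular expression is written against the two sample lines shown above, and the function name `parse_vxlan_line` is invented for this example.

```python
import re

# Matches lines such as:
#   10.0.0.11.4789 > 10.0.0.22.4789: [no cksum] VXLAN, flags [I] (0x08), vni 4000
VXLAN_LINE = re.compile(
    r"(\d+\.\d+\.\d+\.\d+)\.(\d+) > (\d+\.\d+\.\d+\.\d+)\.(\d+): "
    r".*VXLAN, .*vni (\d+)"
)


def parse_vxlan_line(line):
    """Return source VTEP, destination VTEP, UDP port and VNI, or None."""
    match = VXLAN_LINE.search(line)
    if match is None:
        return None
    src, _sport, dst, dport, vni = match.groups()
    return {"src": src, "dst": dst, "dport": int(dport), "vni": int(vni)}


capture = [
    "10.0.0.11.4789 > 10.0.0.22.4789: [no cksum] VXLAN, flags [I] (0x08), vni 4000",
    "10.0.0.22.37591 > 10.0.0.11.4789: [no cksum] VXLAN, flags [I] (0x08), vni 4000",
]
parsed = [parse_vxlan_line(line) for line in capture]
vnis = {entry["vni"] for entry in parsed}
print(vnis)  # both directions use the same L3 VNI
```

A single VNI in the resulting set for both directions is what you expect to see for symmetrical IRB.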
It works!
Final configuration files for this lab for Cumulus Linux and Arista EOS: 126_config_final_vEOS3 126_config_final_vEOS1 126_config_final_VX4 126_config_final_VX2
Configuration of L3 VNI in Nokia (Alcatel-Lucent) SR OS
As we’ve highlighted at the very beginning, support for the interface-less mode was added in the latest Nokia (Alcatel-Lucent) SR OS release, which is SR OS 16.0.R1 at the time of writing and which was released less than a month ago. That’s why I put the Nokia snippet here separately, though my tests have shown full interoperability with Cumulus Linux and Arista EOS.
In the example below, I’ve replaced the Arista EOS based leaf vEOS1 with the Nokia SR OS based leaf SR1. The reason is the limited amount of RAM in my laptop, and Cumulus Linux VX uses fewer resources than Arista vEOS:
By the way, the tests with Arista EOS were done as well, just in a reduced topology without spine switches, with the leaf switches connected back to back.
The initial configuration for Nokia SR1 is here: 126_config_initial_SR1
Assuming the rest of the topology and the configuration done so far isn’t changed, we have the following.
#1. BGP fabric for underlay and overlay
A:admin@SR1# admin show configuration
# TiMOS-B-16.0.R1 both/x86_64 Nokia 7750 SR Copyright (c) 2000-2018 Nokia.
# All rights reserved. All use subject to applicable license agreements.
# Built on Thu May 31 16:23:56 PDT 2018 by builder in /builds/160B/R1/panos/main

configure {
    policy-options {
        prefix-list "PL_BGP_LO" {
            prefix 10.0.0.0/24 type range {
                start-length 32
                end-length 32
            }
        }
        policy-statement "RP_BGP_IPV4_UNICAST" {
            default-action {
                action-type reject
            }
            entry 10 {
                action {
                    action-type accept
                }
                from {
                    prefix-list ["PL_BGP_LO"]
                    family [ipv4]
                }
            }
            entry 20 {
                action {
                    action-type accept
                    next-hop "10.0.0.11"
                }
                from {
                    family [evpn]
                }
            }
        }
    }
    router "Base" {
        autonomous-system 65011
        ecmp 2
        router-id 10.0.0.11
        bgp {
            admin-state enable
            rapid-withdrawal true
            multipath {
                ebgp 2
            }
            rapid-update {
                l2-vpn true
                mvpn-ipv4 true
                mdt-safi true
                mvpn-ipv6 true
                evpn true
            }
            next-hop-resolution {
                use-bgp-routes true
            }
            group "FABRIC" {
                vpn-apply-export true
                peer-as 65001
                family {
                    ipv4 true
                    evpn true
                }
                authentication-key {
                    authentication-key-hash "eBz7u3UiC/xyK3imhoaiZqy6rErJmg==" hash
                }
                export {
                    policy ["RP_BGP_IPV4_UNICAST"]
                }
            }
            neighbor 10.11.44.44 {
                group "FABRIC"
                peer-as 65001
            }
        }
    }
}
Remember that we are using Nokia (Alcatel-Lucent) SR OS 16.0.R1. So we take advantage of MD-CLI, and the config is shown in its YANG-modelled format. Pay attention: the structure of the BGP config has CHANGED!
In the Nokia configuration there is one important point, which so far was not relevant for Arista EOS and Cumulus Linux. We need to modify the next hop of the BGP updates for the EVPN address family. Cumulus Linux and Arista EOS use the VTEP IP address by default, whereas Nokia (Alcatel-Lucent) SR OS uses the source IP address of the BGP session. That’s why the route policy is extended compared to the previous examples. Additionally, we need to enable “vpn-apply-export” under the BGP group so that the route policy is also applied to all VPN address families, which is not the case by default.
As I pointed previously, the BGP structure has changed:
- Before Nokia SR OS 16.0.R1, BGP neighbours (peers) were configured within the group context
- From Nokia SR OS 16.0.R1 on, BGP neighbours are configured in the general BGP context, whereas the group is a parameter within the neighbour context
I think this approach is quite good (and you can find it is similar to something else).
#2. L3 VNI in Nokia SR OS
A:admin@SR1# admin show configuration
# TiMOS-B-16.0.R1 both/x86_64 Nokia 7750 SR Copyright (c) 2000-2018 Nokia.
# All rights reserved. All use subject to applicable license agreements.
# Built on Thu May 31 16:23:56 PDT 2018 by builder in /builds/160B/R1/panos/main

configure {
    service {
        customer "2" {
        }
        vpls "104000" {
            admin-state enable
            customer "2"
            vxlan {
                instance 1 {
                    vni 4000
                }
            }
            routed-vpls {
            }
            bgp 1 {
                route-distinguisher "10.0.0.11:70"
                route-target {
                    export "target:65000:70"
                    import "target:65000:70"
                }
            }
            bgp-evpn {
                routes {
                    mac-ip {
                        advertise false
                    }
                    ip-prefix {
                        advertise true
                        include-direct-interface-host true
                    }
                }
                vxlan 1 {
                    admin-state enable
                    vxlan-instance 1
                    send-tunnel-encap true
                }
            }
        }
        vpls "2" {
            admin-state enable
            customer "2"
            routed-vpls {
            }
            sap 1/1/2:777 {
            }
        }
        vprn "7" {
            admin-state enable
            customer "2"
            route-distinguisher "10.0.0.11:70"
            interface "IRB" {
                admin-state enable
                mac 00:20:00:01:97:01
                vpls {
                    service-name "2"
                }
                ipv4 {
                    primary {
                        address 192.168.7.253
                        prefix-length 24
                    }
                    neighbor-discovery {
                        populate-host true
                        local-proxy-arp true
                        remote-proxy-arp true
                    }
                    vrrp 1 {
                        backup [192.168.7.250]
                        passive true
                        mac 00:00:50:00:00:01
                        ping-reply true
                        traceroute-reply true
                    }
                }
            }
            interface "L3_VNI" {
                admin-state enable
                mac 00:00:54:00:00:01
                vpls {
                    service-name "104000"
                    evpn-tunnel true
                }
            }
        }
    }
}
Once again the configuration structure is new because of SR OS 16.0.R1.
The key change is that the command “vpls x” no longer configures the service with ID “x”, but rather the service with the name “x”. Therefore we don’t need to configure “service-name Y” within the service context (in fact, this command no longer exists). To build an interface-less L3 VNI that is interoperable with other vendors, we disable the “mac-ip” advertisement and enable the “ip-prefix” advertisement under the “bgp-evpn routes” context. Then, under the routing context “vprn 7”, we create the interface “L3_VNI” (the name is arbitrary) and bind it to the VPLS named “104000” with the “evpn-tunnel” option enabled. Note that we don’t configure any EVPN parameters under “vpls 2”, which is in fact just a local BVI mapped to IP 192.168.7.253 inside the “vprn 7” context.
#3. Verification for L3 VNI in Nokia
This time we start the verification with the data plane. We test connectivity from VM8 (emulated by a VRF on Cumulus Linux VX4) to VM7 (emulated by a VRF on Arista vEOS3):
```
cumulus@VX4:mgmt-vrf:~$ ping -I VM8 192.168.7.1
ping: Warning: source address might be selected on device other than VM8.
PING 192.168.7.1 (192.168.7.1) from 192.168.8.2 VM8: 56(84) bytes of data.
64 bytes from 192.168.7.1: icmp_seq=1 ttl=62 time=126 ms
64 bytes from 192.168.7.1: icmp_seq=2 ttl=62 time=16.4 ms
64 bytes from 192.168.7.1: icmp_seq=3 ttl=62 time=28.4 ms
64 bytes from 192.168.7.1: icmp_seq=4 ttl=62 time=19.1 ms
```
As we showed the packets on the wire previously, we do the same for this test. The “tcpdump” capture is taken on the Cumulus Linux based leaf switch VX2:
```
cumulus@VX2:mgmt-vrf:~$ sudo tcpdump -i any -vvvv port 4789
[sudo] password for cumulus:
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes

20:58:10.355256 IP (tos 0x0, ttl 64, id 17500, offset 0, flags [none], proto UDP (17), length 134)
    10.0.0.22.33906 > 10.0.0.11.4789: [no cksum] VXLAN, flags [I] (0x08), vni 4000
IP (tos 0x0, ttl 63, id 2879, offset 0, flags [DF], proto ICMP (1), length 84)
    192.168.8.2 > 192.168.7.1: ICMP echo request, id 1447, seq 28, length 64

20:58:10.389836 IP (tos 0x0, ttl 254, id 4096, offset 0, flags [DF], proto UDP (17), length 134)
    10.0.0.11.60123 > 10.0.0.22.4789: [no cksum] VXLAN, flags [I] (0x08), vni 4000
IP (tos 0x0, ttl 63, id 682, offset 0, flags [none], proto ICMP (1), length 84)
    192.168.7.1 > 192.168.8.2: ICMP echo reply, id 1447, seq 28, length 64
```
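To make the “flags [I] (0x08), vni 4000” part of the capture more tangible, here is a minimal Python sketch of how the 8-byte VXLAN header from RFC 7348 is laid out. The function names `build_vxlan_header` and `parse_vxlan_header` are my own illustrative helpers and don’t come from tcpdump or any vendor tool; the sketch only covers the header itself, not the outer UDP/IP encapsulation.

```python
import struct

# The "I" flag from RFC 7348: set when the VNI field carries a valid value.
VXLAN_FLAG_VNI_VALID = 0x08


def build_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a given 24-bit VNI (illustrative helper)."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit into 24 bits")
    # Word 1: 8 bits of flags + 24 reserved bits.
    # Word 2: 24 bits of VNI + 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def parse_vxlan_header(header: bytes) -> dict:
    """Decode flags and VNI from the first 8 bytes of a VXLAN payload."""
    if len(header) < 8:
        raise ValueError("VXLAN header must be at least 8 bytes")
    word1, word2 = struct.unpack("!II", header[:8])
    flags = word1 >> 24
    return {
        "flags": flags,
        "vni_valid": bool(flags & VXLAN_FLAG_VNI_VALID),
        "vni": word2 >> 8,
    }


if __name__ == "__main__":
    # The L3 VNI used in this lab is 4000.
    print(parse_vxlan_header(build_vxlan_header(4000)))
```

Both directions of the capture carry the same VNI 4000, which is what we expect from symmetric IRB: the routed traffic travels in the L3 VNI regardless of the source and destination subnets.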
As the data plane is fully OK, the control plane must be OK as well, but let’s review some outputs.
Here is the output of the learned EVPN routes (both type-2 MAC-IP and type-5 IP-PREFIX):
```
[]
A:admin@SR1# show router bgp routes evpn mac
===============================================================================
 BGP Router ID:10.0.0.11        AS:65011       Local AS:65011
===============================================================================
 Legend -
 Status codes  : u - used, s - suppressed, h - history, d - decayed, * - valid
                 l - leaked, x - stale, > - best, b - backup, p - purge
 Origin codes  : i - IGP, e - EGP, ? - incomplete

===============================================================================
BGP EVPN MAC Routes
===============================================================================
Flag  Route Dist.         MacAddr           ESI
      Tag                 Mac Mobility      Label1
                          Ip Address
                          NextHop
-------------------------------------------------------------------------------
No Matching Entries Found.
===============================================================================


[]
A:admin@SR1# show router bgp routes evpn ip-prefix
===============================================================================
 BGP Router ID:10.0.0.11        AS:65011       Local AS:65011
===============================================================================
 Legend -
 Status codes  : u - used, s - suppressed, h - history, d - decayed, * - valid
                 l - leaked, x - stale, > - best, b - backup, p - purge
 Origin codes  : i - IGP, e - EGP, ? - incomplete

===============================================================================
BGP EVPN IP-Prefix Routes
===============================================================================
Flag  Route Dist.         Prefix
      Tag                 Gw Address
                          NextHop
                          Label
-------------------------------------------------------------------------------
u*>?  0.0.0.0:5           192.168.8.0/24
      0                   44:39:39:ff:40:00
                          10.0.0.22
                          VNI 4000

i     10.0.0.11:70        192.168.7.1/32
      0                   00:00:54:00:00:01
                          10.0.0.11
                          VNI 4000

i     10.0.0.11:70        192.168.7.253/32
      0                   00:00:54:00:00:01
                          10.0.0.11
                          VNI 4000

i     10.0.0.11:70        192.168.7.0/24
      0                   00:00:54:00:00:01
                          10.0.0.11
                          VNI 4000

-------------------------------------------------------------------------------
Routes : 4
===============================================================================
```
As you can see, not a single type-2 (MAC-IP) route is learned or advertised, so all customer routes are type-5 (IP-PREFIX) only.
In the routing table (as well as in the FIB) we see that the next hop of this route points to a MAC address:
```
[]
A:admin@SR1# show router service-name 7 route-table

===============================================================================
Route Table (Service: 7)
===============================================================================
Dest Prefix[Flags]                            Type    Proto     Age        Pref
      Next Hop[Interface Name]                                    Metric
-------------------------------------------------------------------------------
192.168.7.0/24                                Local   Local     00h41m27s  0
       IRB                                                          0
192.168.7.1/32                                Remote  ARP-ND    00h36m38s  1
       192.168.7.1                                                  0
192.168.8.0/24                                Remote  BGP EVPN  00h37m53s  169
       L3_VNI (ET-44:39:39:ff:40:00)                                0
-------------------------------------------------------------------------------
No. of Routes: 3
Flags: n = Number of times nexthop is repeated
       B = BGP backup route available
       L = LFA nexthop available
       S = Sticky ECMP requested
===============================================================================
```
And in order for the lookup to be successful, this MAC must be reachable in VNI:
```
[]
A:admin@SR1# show service id "104000" fdb detail

===============================================================================
Forwarding Database, Service 104000
===============================================================================
ServId     MAC               Source-Identifier       Type     Last Change
                                                     Age
-------------------------------------------------------------------------------
104000     00:00:54:00:00:01 cpm                     Intf     06/30/18 18:56:08
104000     44:39:39:ff:40:00 vxlan-1:                Evpn     06/30/18 19:00:17
                             10.0.0.22:4000
-------------------------------------------------------------------------------
No. of MAC Entries: 2
-------------------------------------------------------------------------------
Legend:  L=Learned O=Oam P=Protected-MAC C=Conditional S=Static Lf=Leaf
===============================================================================
```
This MAC is also extracted from the IP-PREFIX (EVPN type-5) route, namely from the field “GW address”.
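The two-step lookup shown in the outputs above can be sketched in a few lines of Python. This is only a simplified model under my own assumptions and not how SR OS is implemented internally: the dictionaries `ROUTE_TABLE` and `L3_VNI_FDB` are filled by hand with the values from the SR1 outputs, and the helper names `longest_prefix_match` and `resolve` are hypothetical.

```python
import ipaddress

# Routes of "vprn 7" on SR1, copied from the route-table output.
# For the EVPN route the next hop is the GW MAC taken from the type-5 route.
ROUTE_TABLE = {
    "192.168.7.0/24": ("local", "IRB"),
    "192.168.7.1/32": ("arp-nd", "192.168.7.1"),
    "192.168.8.0/24": ("evpn", "44:39:39:ff:40:00"),
}

# FDB of the L3 VNI service "104000" on SR1, copied from the fdb output.
L3_VNI_FDB = {
    "00:00:54:00:00:01": ("cpm", None),
    "44:39:39:ff:40:00": ("vxlan", ("10.0.0.22", 4000)),
}


def longest_prefix_match(dst: str):
    """Return the most specific prefix from ROUTE_TABLE covering dst, or None."""
    addr = ipaddress.ip_address(dst)
    matches = [
        net for net in map(ipaddress.ip_network, ROUTE_TABLE) if addr in net
    ]
    if not matches:
        return None
    return str(max(matches, key=lambda net: net.prefixlen))


def resolve(dst: str):
    """Step 1: IP lookup in the VRF. Step 2: MAC lookup in the L3 VNI FDB."""
    prefix = longest_prefix_match(dst)
    if prefix is None:
        return None
    kind, next_hop = ROUTE_TABLE[prefix]
    if kind != "evpn":
        # Local or ARP-learned destination: no VXLAN tunnel is involved.
        return {"prefix": prefix, "egress": next_hop}
    entry = L3_VNI_FDB.get(next_hop)
    if entry is None or entry[0] != "vxlan":
        # The GW MAC is not reachable in the L3 VNI, so the route is unusable.
        return None
    vtep, vni = entry[1]
    return {"prefix": prefix, "inner_dst_mac": next_hop, "vtep": vtep, "vni": vni}


if __name__ == "__main__":
    print(resolve("192.168.8.2"))  # VM8 behind the Cumulus leaf
    print(resolve("192.168.7.1"))  # VM7, locally attached
```

The point of the model is the dependency between the two tables: if the MAC from the “GW address” field is missing in the FDB of the L3 VNI, the route to 192.168.8.0/24 cannot be resolved to a VTEP.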
The final configuration for Nokia (Alcatel-Lucent) SR OS you can find here: 126_config_final_SR1
Lessons learned
Frankly speaking, the traffic flow from the example above won’t work by default. The reason is some caveats in the data plane implementation of Arista vEOS, which are not relevant for hardware platforms or even for vEOS-Router. In fact, using tcpdump helped me to find that issue.
I hope it will be fixed in the upcoming release of Arista vEOS.
Greetings and acknowledgements
Many thanks to Michael Amstelveen and Alex Nichol from Arista for answering my questions and sharing valuable tips on Arista configuration. Thanks to Pete Crocker from Cumulus Networks for pointing to useful information about Cumulus Linux. And special thanks to Jorge Rabadan from Nokia for the update on the capability of Nokia SR OS 16.0.R1 to support the interface-less mode and for constantly providing feedback on the features supported in Nokia SR OS.
Conclusion
Although data centres based on the BGP-EVPN control plane and the VXLAN data plane have been on the market for some time, this technology is still being actively developed. This type of L3 VNI is described in one of the BESS drafts. I believe that further development of RFCs and standardization of the features that are still drafts now will make the data centre world even more interoperable.
P.S.
If you have further questions or need help with your networks, I’m happy to assist you; just send me a message (https://karneliuk.com/contact/). Also, don’t forget to share the article on your social media if you like it.
BR,
Anton Karneliuk
Useful links
Configuration of EVPN-VXLAN including L3VNI in Cumulus Linux
Configuration of EVPN-VXLAN for interface-less mode in Nokia (Alcatel-Lucent) SR OS
For Arista I haven’t found a similar publicly available document (L3 VNI config).