Hello my friend,
Recently we talked about building a data center with EVPN/VXLAN using Nokia (Alcatel-Lucent) SR OS and Cisco IOS XR. But we touched only the L2 part, that is, switching between VMs within the same L2 domain. In this article we’ll work on the L3 part: routing between VMs in different L2 domains. Interested?
No part of this blogpost could be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical or photocopying, recording, or otherwise, for commercial purposes without the prior permission of the author.
Disclaimer
This article isn’t independent, but rather a continuation of the discussion started a while ago. So I strongly recommend reviewing that article beforehand.
Brief description
It’s standard practice for data centers to put different VMs in different IP subnets, based on the function they perform. There are numerous reasons why this is necessary. The simplest example is shrinking the broadcast domain to limit the amount of BUM traffic.
BUM stands for broadcast, unknown unicast and multicast.
In the same way, modern data centers are built in a highly robust manner to eliminate the impact of any single outage (and hopefully not only a single one, when it’s economically reasonable). One of the tools that helps to achieve this is the migration of VMs from one server to another (as a preventive action) or restoration from backup after a failure (a reactive action).
After such a migration or restoration is done, each VM still has a resolved ARP entry with the IP/MAC of its default gateway. In legacy data center designs an FHRP (i.e. standards-based VRRP or Cisco-proprietary HSRP) was configured on the DC GW, which is typically a pair of switches connected to the IP core, and that made inter-VLAN routing very inefficient.
And here is where EVPN takes center stage. It allows you to deploy an anycast gateway, where all Leaf switches (sometimes also called ToR, top of rack, depending on the design) have the same IP/MAC address configured within a certain VLAN termination. And such a design works just awesomely! The BGP-based control plane of EVPN does all the magic. From the perspective of VXLAN, as the data plane mechanism, there are no differences compared to the previous case.
UPD.
There is a statement that concerns the creation of MAC/IP routes based on the information learned from hosts. On page 437 it says:
SR OS does not include a host IP address in any EVPN MAC advertisement for a MAC learned on a SAP or SDP-binding. Host IP addresses are only included in the EVPN MAC advertisements corresponding to R-VPLS IP interfaces. When deployed as DC GW in a Nuage architecture, the Nuage Networks Virtual Services Controller (VSC) or Virtual Services Gateway (VSG) will send virtual machine and host MAC/IP pairs in EVPN MAC routes. See the Nokia Nuage documentation for more information about the Nuage DC architecture. The 7x50 DC GW will populate the proxy-ARP tables with those MAC/IP pairs.
According to Jorge Rabadan, Nokia Senior PLM, this issue is solved in the 15.0.Rx release. As I tested with VSR 14.0.R4, that wasn’t possible. I will redo the tests with 15.0.R7 and update the article later on.
UPD[23/04/2018].
Many thanks to Jorge Rabadan, who pointed me to the corresponding part of the Advanced configuration guide (page 1469), which describes the concept of passive VRRP that Nokia uses to deploy an anycast gateway. In the previous version of the article I stated that it isn’t supported; that statement was false, sorry for that. I provide the configuration and verification of the anycast gateway at the end of the article; despite the inconsistency this brings to the article, I’m sure it’s worth mentioning.
What are we going to test?
We start with the final configuration from the previous lab about EVPN/VXLAN. Then we’ll create two additional VM emulations (a new VRF per Cisco IOS XRv router) in a different L2 domain. We’ll also enable IP services within EVPN to terminate the L2 domains, so that we can perform inter-VLAN and/or inter-VXLAN routing.
The success criterion for the lab is that we are able to ping from a VM in one L2 domain (VM1 and VM2) any VM in the other domain (VM3 and VM4). We should also see the corresponding traffic in the packet capture and the appropriate information in the control plane.
Software version
The following infrastructure is used in my lab:
- CentOS 7 with python 2.7.
- Ansible 2.4.2
- Nokia (Alcatel-Lucent) SR OS 14.0.R4
- Cisco IOS XRv 6.1.2
See the previous article for details on how to build the lab.
Topology
The physical topology for the lab doesn’t change compared to the previous labs:
As I’ve mentioned before, the initial topology for this lab is the final topology from the previous EVPN/VXLAN lab:
Initial configuration files are here: 106_config_final_SR1 106_config_final_linux 106_config_final_XR4 106_config_final_XR3 106_config_final_SR2
For more details on what is going on, check the previous lab.
Configuration of overlay network infrastructure (EVPN + VXLAN) for IP routing
As we don’t need to configure the underlay, because it’s already done, we go straight to the service part in this lab.
Based on the task above, the following activities should be done:
- Turn the existing VPLS 10000123 into a routed one
- Create a new routed VPLS 10000456 according to the image below
- Create VPRN 20000000 to provide L3 functionality
- Deploy two new VMs
In total, our service topology with the mapping to routing instances looks as follows for domain 1:
And for domain 2:
#1. Turning VPLS 10000123 into routed
We start with the first point, as it’s quite simple and involves only the Nokia (Alcatel-Lucent) SR OS routers:
SR1 | SR2 |
A:SR1>edit-cfg# candidate view |
A:SR2>edit-cfg# candidate view |
Only the additions to the configuration are shown. The initial config is shown in the previous part of the article.
Frankly speaking, for the purpose of this particular lab only “allow-ip-int-bind” and “service-name “L2_DOMAIN_1”” are needed, due to the issue mentioned before: Nokia (Alcatel-Lucent) SR OS routers don’t automatically create a type-2 route (MAC/IP route) for connected hosts. So we only need to bind the VPLS to the IP service in the VPRN and that’s it. The command “ip-route-advertisement incl-host” doesn’t bring much benefit now, as it advertises the prefixes from the routing table in the VPRN (important for DCI and peering with an IP/MPLS backbone), and in our case both Leaf switches, SR1 and SR2, have the same interface IPv4 addresses configured.
Brief verification at Nokia (Alcatel-Lucent) VSR:
A:SR1# show service id 10000123 base
===============================================================================
Service Basic Information
===============================================================================
Service Id        : 10000123            Vpn Id            : 0
Service Type      : VPLS
Name              : L2_DOMAIN_1
Description       : (Not Specified)
Customer Id       : 2                   Creation Origin   : manual
Last Status Change: 02/18/2018 18:16:59
Last Mgmt Change  : 02/18/2018 18:50:47
Etree Mode        : Disabled
Admin State       : Up                  Oper State        : Up
MTU               : 1514                Def. Mesh VC Id   : 10000123
SAP Count         : 1                   SDP Bind Count    : 0
Snd Flush on Fail : Disabled            Host Conn Verify  : Disabled
SHCV pol IPv4     : None
Propagate MacFlush: Disabled            Per Svc Hashing   : Disabled
Allow IP Intf Bind: Enabled
Fwd-IPv4-Mcast-To*: Disabled            Fwd-IPv6-Mcast-To*: Disabled
Def. Gateway IP   : None
Def. Gateway MAC  : None
Temp Flood Time   : Disabled            Temp Flood        : Inactive
Temp Flood Chg Cnt: 0
SPI load-balance  : Disabled
TEID load-balance : Disabled
Src Tep IP        : N/A
VSD Domain        :
===============================================================================
Service Access & Destination Points
===============================================================================
Identifier                               Type         AdmMTU  OprMTU  Adm  Opr
===============================================================================
sap:1/1/2:111                            q-tag        1518    1518    Up   Up
===============================================================================
You can also read another good article about routed VPLS to get more details.
#2. Creating new routed VPLS
Here we create the VPLS for the new L2 domain:
SR1 | SR2 |
A:SR1>edit-cfg# candidate view |
A:SR2>edit-cfg# candidate view |
I won’t comment on this part of the configuration, because it should already be self-explanatory. If not, refer to the previous point in this article and to the previous article in general.
#3. Creating gateway services
In this step we map the L2 domains to the corresponding L3 interfaces. The configuration is the same on both routers; the only differences are the RD, the RT and the IP/MAC addresses of the respective interfaces:
SR1 | SR2 |
A:SR1>edit-cfg# candidate view |
A:SR2>edit-cfg# candidate view |
After this step we’ll see something interesting in the BGP RIB for EVPN. But first, let’s check the regular routing table and the interfaces:
A:SR1# show router 20000000 route-table
===============================================================================
Route Table (Service: 20000000)
===============================================================================
Dest Prefix[Flags]                            Type    Proto     Age        Pref
      Next Hop[Interface Name]                                    Metric
===============================================================================
192.168.0.0/24                                Local   Local     02h30m09s  0
       IRB_VXLAN1                                                   0
192.168.1.0/24                                Local   Local     02h21m43s  0
       IRB_VXLAN2                                                   0
===============================================================================
No. of Routes: 2
Flags: n = Number of times nexthop is repeated
       B = BGP backup route available
       L = LFA nexthop available
       S = Sticky ECMP requested
===============================================================================
!
!
A:SR1# show router 20000000 interface
===============================================================================
Interface Table (Service: 20000000)
===============================================================================
Interface-Name                   Adm       Opr(v4/v6)  Mode    Port/SapId
   IP-Address                                                  PfxState
===============================================================================
IRB_VXLAN1                       Up        Up/Down     VPRN    rvpls
   192.168.0.254/24                                            n/a
IRB_VXLAN2                       Up        Up/Down     VPRN    rvpls
   192.168.1.254/24                                            n/a
===============================================================================
Interfaces : 2
===============================================================================
We see that the created interfaces are up and running, and we see the corresponding routes in the routing table. The next step is to check which EVPN type-2 (MAC/IP) routes we have before we send ping packets from our VMs:
A:SR1# show router bgp routes evpn mac
===============================================================================
 BGP Router ID:10.0.0.11        AS:65011       Local AS:65011
===============================================================================
 Legend -
 Status codes  : u - used, s - suppressed, h - history, d - decayed, * - valid
                 l - leaked, x - stale, > - best, b - backup, p - purge
 Origin codes  : i - IGP, e - EGP, ? - incomplete
===============================================================================
BGP EVPN MAC Routes
===============================================================================
Flag  Route Dist.         MacAddr           ESI
      Tag                 Mac Mobility      Label1
                          Ip Address
                          NextHop
===============================================================================
u*>i  10.0.0.22:123       00:20:00:01:23:02 ESI-0
      0                   Static            VNI 123
                          192.168.0.254
                          10.0.0.22

u*>i  10.0.0.22:456       00:20:00:04:56:02 ESI-0
      0                   Static            VNI 456
                          192.168.1.254
                          10.0.0.22
===============================================================================
Routes : 2
===============================================================================
Let’s analyse these routes. As Nokia (Alcatel-Lucent) SR OS shows only the Adj-RIB-In routes in BGP here, we see the MAC/IP routes from the other Leaf switch, SR2; unfortunately, its MAC addresses have to differ from ours.
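If you collect such output with automation, the four-line records are easy to turn into structured data. Below is a minimal Python 3 sketch of that idea. The function name `parse_evpn_mac_routes` and the regular expression are my own illustration, not part of any Nokia tooling, and the sketch assumes the four-lines-per-route layout seen in the output above (the sample also borrows one host MAC record from the post-ping output further down to show the N/A case).

```python
import re

# Sample records in the four-lines-per-route layout assumed from the CLI output.
SAMPLE = """\
u*>i  10.0.0.22:123       00:20:00:01:23:02 ESI-0
      0                   Static            VNI 123
                          192.168.0.254
                          10.0.0.22

u*>i  10.0.0.22:456       00:20:00:04:56:02 ESI-0
      0                   Static            VNI 456
                          192.168.1.254
                          10.0.0.22

u*>i  10.0.0.22:123       00:50:56:23:d3:7e ESI-0
      0                   Seq:0             VNI 123
                          N/A
                          10.0.0.22
"""

MAC_RE = r"(?:[0-9a-f]{2}:){5}[0-9a-f]{2}"
# First line of a record: flags, route distinguisher, MAC address, ESI.
HEAD_RE = re.compile(
    rf"^(?P<flag>\S+)\s+(?P<rd>\S+:\d+)\s+(?P<mac>{MAC_RE})\s+(?P<esi>\S+)$",
    re.IGNORECASE,
)


def parse_evpn_mac_routes(text):
    """Parse EVPN MAC route records into a list of dictionaries."""
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln and ln != "."]  # drop separator lines
    routes = []
    i = 0
    while i < len(lines):
        head = HEAD_RE.match(lines[i])
        if head is None or i + 3 >= len(lines):
            i += 1  # not the start of a complete record, skip the line
            continue
        fields = lines[i + 1].split()  # e.g. ['0', 'Static', 'VNI', '123']
        routes.append({
            "flag": head["flag"],
            "rd": head["rd"],
            "mac": head["mac"],
            "esi": head["esi"],
            "tag": int(fields[0]),
            "mobility": fields[1],
            "vni": int(fields[-1]),
            "ip": None if lines[i + 2] == "N/A" else lines[i + 2],
            "next_hop": lines[i + 3],
        })
        i += 4
    return routes


if __name__ == "__main__":
    for route in parse_evpn_mac_routes(SAMPLE):
        print(route["rd"], route["mac"], route["ip"], route["vni"])
```

Routes without an IP address (plain MAC routes) come back with `ip` set to `None`, which makes it simple to tell gateway MAC/IP routes from host MAC routes.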
As we have configured the advertisement of IP routes for EVPN, we can also check the table of EVPN type-5 routes:
A:SR1# show router bgp routes evpn ip-prefix
===============================================================================
 BGP Router ID:10.0.0.11        AS:65011       Local AS:65011
===============================================================================
 Legend -
 Status codes  : u - used, s - suppressed, h - history, d - decayed, * - valid
                 l - leaked, x - stale, > - best, b - backup, p - purge
 Origin codes  : i - IGP, e - EGP, ? - incomplete
===============================================================================
BGP EVPN IP-Prefix Routes
===============================================================================
Flag  Route Dist.         Prefix
      Tag                 Gw Address
                          NextHop
                          Label
===============================================================================
u*>i  10.0.0.22:123       192.168.1.254/32
      0                   192.168.0.254
                          10.0.0.22
                          VNI 123

u*>i  10.0.0.22:123       192.168.1.0/24
      0                   192.168.0.254
                          10.0.0.22
                          VNI 123

u*>i  10.0.0.22:456       192.168.0.254/32
      0                   192.168.1.254
                          10.0.0.22
                          VNI 456

u*>i  10.0.0.22:456       192.168.0.0/24
      0                   192.168.1.254
                          10.0.0.22
                          VNI 456
===============================================================================
Routes : 4
Here we see the subnet IPv4 prefix and the host route, which carries the same address as the type-2 MAC/IP route. As you remember, in the first step we configured “incl-host”, which makes the router also advertise the host route for the address configured on the interface.
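To make the difference between the subnet prefixes and the host routes explicit, here is a small Python 3 sketch that groups the four advertised prefixes. The prefixes are taken from the output above; the function name `group_host_routes` is hypothetical and only serves the illustration.

```python
import ipaddress

# Prefixes seen in the EVPN type-5 (IP-prefix) routes received from SR2.
ADVERTISED = [
    "192.168.1.254/32",
    "192.168.1.0/24",
    "192.168.0.254/32",
    "192.168.0.0/24",
]


def group_host_routes(prefixes):
    """Map every subnet prefix to the host routes (/32) that fall inside it."""
    networks = [ipaddress.ip_network(p) for p in prefixes]
    subnets = [n for n in networks if n.prefixlen < n.max_prefixlen]
    hosts = [n for n in networks if n.prefixlen == n.max_prefixlen]
    return {
        str(subnet): [
            str(host.network_address)
            for host in hosts
            if host.network_address in subnet
        ]
        for subnet in subnets
    }


if __name__ == "__main__":
    for subnet, host_routes in group_host_routes(ADVERTISED).items():
        print(subnet, "->", host_routes)
```

Each /24 ends up paired with exactly one /32, the interface address of the remote leaf, which is what “incl-host” adds on top of the subnet route.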
#4. Configuration of VM (emulation)
Much the same as in the previous part of the EVPN/VXLAN article, we create two new VRFs:
XR3 | XR4 |
RP/0/0/CPU0:XR3(config)#show conf |
RP/0/0/CPU0:XR3(config)#show conf |
As you see, in addition to creating the VRF we also add a default gateway to provide routing outside the local domain.
And provision connectivity for them in Linux:
sudo /sbin/vconfig add vnet2 111
sudo /sbin/vconfig add ens34 111
sudo ifconfig vnet2.111 up
sudo ifconfig ens34.111 up
sudo brctl addbr br111
sudo brctl addif br111 vnet2.111
sudo brctl addif br111 ens34.111
sudo ifconfig br111 up
!
sudo /sbin/vconfig add vnet5 222
sudo /sbin/vconfig add ens34 222
sudo ifconfig vnet5.222 up
sudo ifconfig ens34.222 up
sudo brctl addbr br222
sudo brctl addif br222 vnet5.222
sudo brctl addif br222 ens34.222
sudo ifconfig br222 up
Now it’s time to move to the most interesting part, which is the verification.
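Both blocks of Linux commands follow the same pattern, so they can be generated instead of typed. The Python 3 sketch below only builds the command strings and prints them; it does not execute anything. The function name `bridge_commands` is hypothetical, and the default uplink `ens34` is simply the interface used in this lab.

```python
def bridge_commands(vlan_id, vm_if, uplink_if="ens34"):
    """Return the shell commands that bridge a VM port to the uplink on one VLAN."""
    vm_sub = f"{vm_if}.{vlan_id}"          # VLAN sub-interface towards the VM
    uplink_sub = f"{uplink_if}.{vlan_id}"  # VLAN sub-interface towards the leaf
    bridge = f"br{vlan_id}"
    return [
        f"sudo /sbin/vconfig add {vm_if} {vlan_id}",
        f"sudo /sbin/vconfig add {uplink_if} {vlan_id}",
        f"sudo ifconfig {vm_sub} up",
        f"sudo ifconfig {uplink_sub} up",
        f"sudo brctl addbr {bridge}",
        f"sudo brctl addif {bridge} {vm_sub}",
        f"sudo brctl addif {bridge} {uplink_sub}",
        f"sudo ifconfig {bridge} up",
    ]


if __name__ == "__main__":
    # The two VLAN / VM-port pairs used in this lab.
    for vlan_id, vm_if in [(111, "vnet2"), (222, "vnet5")]:
        print("\n".join(bridge_commands(vlan_id, vm_if)))
```

Note that the lab host itself runs Python 2.7, where the f-strings would have to be replaced with `str.format`.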
Verification of inter-subnet routing with EVPN/VXLAN
We’ll review 3 cases to show you all possible flavours of traffic flow:
- Between VMs inside the same L2 domain
- Between VMs in different L2 domains connected to the same Leaf switch
- Between VMs in different L2 domains connected to different Leaf switches
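The three cases can be told apart from just two facts: whether both addresses sit in the same subnet, and whether both VMs hang off the same leaf. Here is a minimal Python 3 sketch of that decision. The subnet-to-VNI mapping comes from the routes shown earlier; the source address 192.168.1.3 and the VM-to-leaf attachment passed as arguments are assumptions made for the illustration.

```python
import ipaddress

# L2 domains of this lab: VNI -> IP subnet (taken from the EVPN routes above).
DOMAINS = {
    123: ipaddress.ip_network("192.168.0.0/24"),  # L2 domain 1
    456: ipaddress.ip_network("192.168.1.0/24"),  # L2 domain 2
}


def vni_of(ip):
    """Return the VNI of the L2 domain that contains the given address."""
    address = ipaddress.ip_address(ip)
    for vni, subnet in DOMAINS.items():
        if address in subnet:
            return vni
    raise ValueError(f"{ip} is in neither L2 domain")


def classify(src_ip, src_leaf, dst_ip, dst_leaf):
    """Tell which of the three verification cases a flow belongs to."""
    if vni_of(src_ip) == vni_of(dst_ip):
        return "case 1: bridged inside one L2 domain"
    if src_leaf == dst_leaf:
        return "case 2: routed locally on one leaf"
    return "case 3: routed, then carried over VXLAN to the other leaf"


if __name__ == "__main__":
    # Assumed placement: source VM behind SR1, the .4 hosts behind SR2.
    print(classify("192.168.1.3", "SR1", "192.168.1.4", "SR2"))
    print(classify("192.168.1.3", "SR1", "192.168.0.3", "SR1"))
    print(classify("192.168.1.3", "SR1", "192.168.0.4", "SR2"))
```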
#1. Between VMs inside the same L2 domain
We covered this use case in the previous part. Here we double-check that everything still works:
RP/0/0/CPU0:XR3#ping vrf VM3 192.168.1.4
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.1.4, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/5/9 ms
Wireshark trace:
Read the previous article for more information on decoding the VXLAN encapsulation.
#2. Between VMs in different L2 domains connected to the same Leaf switch
The first inter-subnet routing scenario looks as follows:
RP/0/0/CPU0:XR3#ping vrf VM3 192.168.0.3
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.0.3, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/4/9 ms
Wireshark trace:
As you see, there is nothing special here at all, as the routing is local and doesn’t touch the VXLAN overlay.
#3. Between VMs in different L2 domains connected to different Leaf switches
Here is the most complicated of the configured scenarios:
RP/0/0/CPU0:XR3#ping vrf VM3 192.168.0.4
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.0.4, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/5/9 ms
Wireshark trace:
What is happening here:
- VM3 (connected to domain 2 at SR1) sends a ping to VM4 (connected to domain 1 at SR2).
- VM3 sends the ICMP packet to its default GW: the destination MAC is that of the SR1 interface in domain 2 (00:20:00:04:56:01) and the source MAC is that of VM3.
- SR1 strips the L2 header and builds a new one with the destination MAC of VM4 (learned through an EVPN type-2 MAC/IP route) and the source MAC of SR1 for domain 1 (00:20:00:01:23:01), then encapsulates the frame into VXLAN.
- SR2 removes the VXLAN encapsulation and sends the layer 2 frame to VM4.
- VM4 responds to the ICMP request with a packet to its default GW (DST MAC 00:20:00:01:23:02 of SR2 in VNI 123).
- SR2 removes the L2 encapsulation and creates a new frame in the proper subnet with the source MAC 00:20:00:04:56:02 and the DST MAC of VM3. Then it adds the VXLAN encapsulation for VNI 456 and sends the packet to SR1.
- SR1 removes the VXLAN encapsulation and sends the packet to VM3.
- VM3 receives the ping response.
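The MAC rewrites in the steps above can be modelled in a few lines of Python 3. This is a simplified sketch that tracks only the MAC addresses and the VNI; it ignores IP headers, TTL and ARP. The gateway MACs are the ones from this lab; the MAC of VM3 is a made-up placeholder, and 00:50:56:23:d3:7e is the host MAC that SR2 advertises in the type-2 routes, which I assume here belongs to the destination VM.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Frame:
    src_mac: str
    dst_mac: str
    vni: Optional[int] = None  # None means the frame is not VXLAN-encapsulated


def route_and_encap(frame, ingress_gw_mac, egress_gw_mac, egress_vni, next_hop_mac):
    """Leaf routes a frame addressed to its gateway MAC into another VNI."""
    if frame.dst_mac != ingress_gw_mac:
        raise ValueError("frame is not addressed to the gateway, it would be bridged")
    # New L2 header: gateway MAC of the egress domain as source, final host as destination.
    return Frame(src_mac=egress_gw_mac, dst_mac=next_hop_mac, vni=egress_vni)


def decap(frame):
    """Remote leaf strips VXLAN and delivers the inner frame unchanged."""
    return replace(frame, vni=None)


VM3_MAC = "00:50:56:00:00:03"  # hypothetical placeholder
VM4_MAC = "00:50:56:23:d3:7e"  # host MAC learned from SR2 via EVPN type-2 (assumed VM4)

if __name__ == "__main__":
    # Request: VM3 -> SR1 gateway in domain 2, routed into VNI 123 towards SR2.
    request = Frame(src_mac=VM3_MAC, dst_mac="00:20:00:04:56:01")
    on_wire = route_and_encap(request, "00:20:00:04:56:01", "00:20:00:01:23:01", 123, VM4_MAC)
    print(on_wire, decap(on_wire))

    # Reply: VM4 -> SR2 gateway in domain 1, routed into VNI 456 towards SR1.
    reply = Frame(src_mac=VM4_MAC, dst_mac="00:20:00:01:23:02")
    back = route_and_encap(reply, "00:20:00:01:23:02", "00:20:00:04:56:02", 456, VM3_MAC)
    print(back, decap(back))
```

The point to notice is that the request travels in VNI 123 and the reply in VNI 456: each ingress leaf routes first and then bridges in the destination domain.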
After all the pings are done, there are more EVPN type-2 MAC/IP routes in the BGP Adj-RIB-In now:
A:SR1# show router bgp routes evpn mac
===============================================================================
 BGP Router ID:10.0.0.11        AS:65011       Local AS:65011
===============================================================================
 Legend -
 Status codes  : u - used, s - suppressed, h - history, d - decayed, * - valid
                 l - leaked, x - stale, > - best, b - backup, p - purge
 Origin codes  : i - IGP, e - EGP, ? - incomplete
===============================================================================
BGP EVPN MAC Routes
===============================================================================
Flag  Route Dist.         MacAddr           ESI
      Tag                 Mac Mobility      Label1
                          Ip Address
                          NextHop
-------------------------------------------------------------------------------
u*>i  10.0.0.22:123       00:20:00:01:23:02 ESI-0
      0                   Static            VNI 123
                          192.168.0.254
                          10.0.0.22

u*>i  10.0.0.22:123       00:50:56:23:d3:7e ESI-0
      0                   Seq:0             VNI 123
                          N/A
                          10.0.0.22

u*>i  10.0.0.22:456       00:20:00:04:56:02 ESI-0
      0                   Static            VNI 456
                          192.168.1.254
                          10.0.0.22

u*>i  10.0.0.22:456       00:50:56:23:d3:7e ESI-0
      0                   Seq:0             VNI 456
                          N/A
                          10.0.0.22
===============================================================================
Routes : 4
===============================================================================
UPD[23/04/2018]. #3.x Creating ANYCAST gateway services
As we said in the beginning, there is a possibility to create an anycast GW service on the Nokia SR 7750, so that all leaf switches within a VRF have the same MAC/IP, which is one of the prerequisites for efficient IRB in a data center. This is done by using passive VRRP (link) in Nokia (Alcatel-Lucent) SR OS. As described in the provided link, each node then treats itself as master without sending any keepalive messages to its peers, so that each leaf switch routes the traffic into another IP subnet/VXLAN if necessary. Here is the service topology for the first context:
and for the second context:
The following configuration should be implemented on the leaf switches:
SR1 | SR2 |
A:SR1>edit-cfg# candidate view |
A:SR2>edit-cfg# candidate view |
First of all, let’s check the status of VRRP:
A:SR1# show router 20000000 vrrp instance
===============================================================================
VRRP Instances
===============================================================================
Interface Name                   VR Id Own Adm  State       Base Pri   Msg Int
                                 IP        Opr  Pol Id      InUse Pri  Inh Int
-------------------------------------------------------------------------------
IRB_VXLAN1                       1     No  Up   Master      100        1
                                 IPv4      Up   n/a         100        No
  Backup Addr: 192.168.0.250
IRB_VXLAN2                       1     No  Up   Master      100        1
                                 IPv4      Up   n/a         100        No
  Backup Addr: 192.168.1.250
-------------------------------------------------------------------------------
Instances : 2
===============================================================================


A:SR2# show router 20000000 vrrp instance
===============================================================================
VRRP Instances
===============================================================================
Interface Name                   VR Id Own Adm  State       Base Pri   Msg Int
                                 IP        Opr  Pol Id      InUse Pri  Inh Int
-------------------------------------------------------------------------------
IRB_VXLAN1                       1     No  Up   Master      100        1
                                 IPv4      Up   n/a         100        No
  Backup Addr: 192.168.0.250
IRB_VXLAN2                       1     No  Up   Master      100        1
                                 IPv4      Up   n/a         100        No
  Backup Addr: 192.168.1.250
-------------------------------------------------------------------------------
Instances : 2
===============================================================================
In the BGP RIB for EVPN we see new routes:
A:SR1# show router bgp routes evpn mac
===============================================================================
 BGP Router ID:10.0.0.11        AS:65011       Local AS:65011
===============================================================================
 Legend -
 Status codes  : u - used, s - suppressed, h - history, d - decayed, * - valid
                 l - leaked, x - stale, > - best, b - backup, p - purge
 Origin codes  : i - IGP, e - EGP, ? - incomplete

===============================================================================
BGP EVPN MAC Routes
===============================================================================
Flag  Route Dist.         MacAddr           ESI
      Tag                 Mac Mobility      Label1
                          Ip Address
                          NextHop
-------------------------------------------------------------------------------
i     10.0.0.11:123       00:00:5e:00:01:23 ESI-0
      0                   Static            VNI 123
                          192.168.0.250
                          10.0.0.11

i     10.0.0.11:456       00:00:5e:00:04:56 ESI-0
      0                   Static            VNI 456
                          192.168.1.250
                          10.0.0.11

u*>i  10.0.0.22:123       00:00:5e:00:01:23 ESI-0
      0                   Static            VNI 123
                          192.168.0.250
                          10.0.0.22

u*>i  10.0.0.22:456       00:00:5e:00:04:56 ESI-0
      0                   Static            VNI 456
                          192.168.1.250
                          10.0.0.22

-------------------------------------------------------------------------------
Routes : 14
===============================================================================
In the ARP table we see the proper information:
A:SR1# show router 20000000 arp

===============================================================================
ARP Table (Service: 20000000)
===============================================================================
IP Address      MAC Address       Expiry    Type   Interface
-------------------------------------------------------------------------------
192.168.0.1     00:50:56:23:b3:34 03h51m41s Dyn[I] IRB_VXLAN1
192.168.0.2     00:50:56:34:71:46 03h52m33s Dyn[I] IRB_VXLAN1
192.168.0.250   00:00:5e:00:01:23 00h00m00s Oth[I] IRB_VXLAN1
192.168.0.253   00:20:00:01:23:01 00h00m00s Oth[I] IRB_VXLAN1
192.168.0.254   00:20:00:01:23:02 00h00m00s Evp[I] IRB_VXLAN1
192.168.1.1     00:50:56:23:b3:34 03h53m33s Dyn[I] IRB_VXLAN2
192.168.1.2     00:50:56:34:71:46 03h53m18s Dyn[I] IRB_VXLAN2
192.168.1.250   00:00:5e:00:04:56 00h00m00s Oth[I] IRB_VXLAN2
192.168.1.253   00:20:00:04:56:01 00h00m00s Oth[I] IRB_VXLAN2
192.168.1.254   00:20:00:04:56:02 00h00m00s Evp[I] IRB_VXLAN2
-------------------------------------------------------------------------------
No. of ARP Entries: 10
===============================================================================
More importantly, we don’t see any log messages about a duplicated MAC/IP, which we had previously when I tried to configure the same IP and MAC address on the interfaces.
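What makes this an anycast gateway is simply that every leaf presents the same IP/MAC pair in a given domain. The Python 3 sketch below checks that property for domain 1, using the addresses visible in the VRRP, BGP and ARP outputs above. The function name `shared_gateway` is hypothetical and the check is only an illustration of the idea, not something SR OS runs.

```python
def shared_gateway(per_leaf):
    """Return the (IP, MAC) pair if all leaves present the same one, else None."""
    pairs = set(per_leaf.values())
    return pairs.pop() if len(pairs) == 1 else None


# Passive VRRP addresses in domain 1, identical on both leaves.
VRRP_DOMAIN_1 = {
    "SR1": ("192.168.0.250", "00:00:5e:00:01:23"),
    "SR2": ("192.168.0.250", "00:00:5e:00:01:23"),
}

# Per-leaf IRB interface addresses in domain 1, unique per leaf.
IRB_DOMAIN_1 = {
    "SR1": ("192.168.0.253", "00:20:00:01:23:01"),
    "SR2": ("192.168.0.254", "00:20:00:01:23:02"),
}

if __name__ == "__main__":
    print("VRRP:", shared_gateway(VRRP_DOMAIN_1))
    print("IRB: ", shared_gateway(IRB_DOMAIN_1))
```

Only the VRRP pair qualifies, which is why the VMs have to use the .250 addresses as their default gateway.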
The gateway IP on the VMs (all VRFs on XR3 and XR4) should be updated accordingly to 192.168.0.250 and 192.168.1.250.
The final configuration files for this lab: 110_config_final_XR3 110_config_final_XR4 110_config_final_SR1 110_config_final_SR2 110_config_final_linux
The automation is available as well (Ansible playbooks using Cisco IOS XR/SR OS modules to update the initial state of the lab to the new one): 110_lab_final.tar
Lessons learned
Doing is the only way to learn or prove something. I started writing this article with the idea of deploying an anycast GW for the data centre, which turned out to be impossible in Nokia (Alcatel-Lucent) SR OS. Though RFC 7432 explains how the solution should work, Nokia doesn’t implement it fully. By the way, I have learned that all vendors (Cisco, Juniper or more modern ones, like Dell/Mellanox) have slightly different realisations of this RFC. Some of them also don’t implement all the route types (only particular ones).
Conclusion
In general, the concept of an anycast GW for the data centre is a huge advantage for the data plane. The convergence is much quicker, and the whole data centre is seen from outside as just one switch with one SVI per VLAN, which is very convenient for VMs. In the next article we’ll try to connect the DC to an IP network and to propagate L3 routes over EVPN. I don’t know if it works in Nokia (Alcatel-Lucent) SR OS, so I’m very excited to try it. Take care and goodbye!
P.S.
If you have further questions or need help with your networks, I’m happy to assist you, just send me a message. Also, don’t forget to share the article on your social media if you like it.
BR,
Anton Karneliuk