Hello my friend,
In this article we won’t speak about a pure service provider environment (like core and backhaul), but rather about the data center. Data centers exist in the service provider world as well, but they aren’t limited to it. And EVPN/VXLAN is currently the de-facto standard for building them.
No part of this blogpost could be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical or photocopying, recording, or otherwise, for commercial purposes without the prior permission of the author.
Brief description
Long ago we spoke about EVPN with the focus on the service provider network, where we deployed it in its most scalable flavour, EVPN-PBB. Its scalability comes from hiding all customer MAC addresses behind the MAC address of the PE router in the control plane, like in a classical PBB deployment, while data-plane MAC learning for customer MAC addresses still happens on the respective PE. On the other hand, such a deployment is more complex than the traditional EVPN deployment, where customer MAC address learning happens in the control plane. We haven’t reviewed ordinary EVPN (or VXLAN) so far, because Cisco IOS XRv, at least my version, doesn’t support it. So in this article I’ll use Nokia (Alcatel-Lucent) VSRs as VTEPs (VXLAN tunnel end points), which are effectively the leaf switches terminating the VXLAN encapsulation. Cisco IOS XRv routers will play the role of data center spine switches. The names spine/leaf might be new to you if you don’t have a data center background, but their function translates easily to the IP/MPLS world:
Function | Service provider world (IP/MPLS network) | Data center world (IP fabric) |
Termination of customer connectivity, start of VPN (overlay) | PE (provider edge) | Leaf |
Forwarding packets in the underlay between VPN endpoints without looking into the payload | P (provider core) | Spine |
Quite often, when speaking about data centers, the term “underlay” is replaced by “fabric” or “IP fabric”; in the data center context they mean the same thing.
The major difference is that in the data center MPLS is typically not deployed. It can be deployed, but a lot of data center engineers are very resistant to it. To provide the same level of flexibility that can be reached with MPLS, VXLAN is used. Recently my colleague and friend Nicola Arnoldi, who is much more familiar with VXLAN than I am, pointed out that one of the big advantages of VXLAN is its built-in support for per-flow load balancing. How does it work? Let’s take a look at the VXLAN header:
As you can see, VXLAN is encapsulated into UDP, where the destination port is fixed (4789) and identifies the VXLAN application. The source UDP port, however, is variable: for each new flow from one VTEP to another the source port will be different, which produces a new output of the hash function used to load-share traffic across different links (within a link aggregation group) or next-hops (for ECMP).
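To illustrate the idea, here is a minimal Python sketch. RFC 7348 only says that the source port should be derived from a hash of the inner packet’s headers; the particular hash function, port range and the inner IP addresses below are my own assumptions for demonstration.

```python
import hashlib

# Toy illustration of VXLAN source-port entropy: the VTEP hashes the inner
# flow and uses the result as the UDP source port, so different flows between
# the same two VTEPs can be load-shared over different underlay paths.
# The hash function and port range are assumptions, not a vendor algorithm.

def vxlan_source_port(src_mac, dst_mac, src_ip, dst_ip, proto,
                      low=49152, high=65535):
    """Map an inner flow onto a UDP source port in [low, high]."""
    flow = "|".join([src_mac, dst_mac, src_ip, dst_ip, proto]).encode()
    digest = int(hashlib.sha256(flow).hexdigest(), 16)
    return low + digest % (high - low + 1)

# Two different inner flows between the same pair of VMs end up with
# different source ports (the MAC addresses are taken from the capture later
# in this article; the IP addresses and protocols are placeholders).
print(vxlan_source_port("00:50:56:31:b7:73", "00:50:56:23:d3:7e",
                        "192.168.0.3", "192.168.0.4", "icmp"))
print(vxlan_source_port("00:50:56:31:b7:73", "00:50:56:23:d3:7e",
                        "192.168.0.3", "192.168.0.4", "udp"))
```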
The only configurable field within the VXLAN header is the VNI (VXLAN network identifier), which we can treat like a VPN id (the same meaning as a service label in MPLS, a GRE key or even a VLAN). It’s 24 bits long, so we have plenty of space to encode our services.
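To make the header layout concrete, here is a toy encoder/decoder based purely on the format described in RFC 7348 (not any vendor’s implementation):

```python
import struct

# The 8-byte VXLAN header per RFC 7348:
# | Flags (8 bits, I-flag = 0x08) | Reserved (24) | VNI (24) | Reserved (8) |

def build_vxlan_header(vni):
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit into 24 bits")
    # First 32-bit word: flags + reserved; second word: VNI shifted left by 8.
    return struct.pack("!II", 0x08 << 24, vni << 8)

def parse_vni(header):
    _, word = struct.unpack("!II", header[:8])
    return word >> 8

hdr = build_vxlan_header(123)   # the VNI we'll configure later in this lab
print(hdr.hex())                # 0800000000007b00
print(parse_vni(hdr))           # 123
print(2 ** 24 - 1)              # 16777215 -> the highest possible VNI value
```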
EVPN itself is explained in RFC 7432, but you can also refer to my first article on it for more information.
What are we going to test?
We’ll configure Nokia (Alcatel-Lucent) SR OS routers SR1 and SR2 as leaf switches with VTEP functionality in order to interconnect two VMs (emulated by VRFs at Cisco IOS XR routers XR3 and XR4) residing in the same L2 domain. In GRT (global routing table) Cisco IOS XR routers will be acting as spine switches without any knowledge about EVPN/VXLAN overlay.
The success criteria for the lab is that we are able to ping VM2 from VM1 and see the corresponding traffic in a packet capture together with the appropriate info in the control plane.
Software version
Compared to the articles I have written previously, we are now more tied to the “controller” part of the network, so the following setup is used:
- CentOS 7 with Python 2.7
- Ansible 2.4.2
- Nokia (Alcatel-Lucent) SR OS 14.0.R4
- Cisco IOS XRv 6.1.2
See one of the previous articles for details on how to build the lab.
Topology
Our physical topology is stable, and if you tried the link above, you even know how to build it:
The logical topology is a bit extended from the default one:
If you check the link above, you will find the Ansible playbooks to deploy the core topology. What is missing are the links to the VMs and the VMs themselves; we’ll configure them later on in this lab. Those Ansible playbooks also create all the necessary connectivity (Linux bridges) between our VNFs and the virtual routers. If you don’t want all the advantages of automation, you can pick up these initial configuration files: 106_config_initial_SR1 106_config_initial_SR2 106_config_initial_XR3 106_config_initial_XR4 106_config_initial_linux
Configuration of underlay network infrastructure (IP fabric)
We start our lab activities with the configuration of the IPv4 fabric using BGP. Such a design is covered in RFC 7938. IP addresses are configured using the default lab setup (link), so we need to do the following steps:
- Shut down the links SR1 – SR2 and XR3 – XR4.
- Configure eBGP peering on the interface level according to the provided topology.
- Announce the system interfaces of the Nokia (Alcatel-Lucent) SR OS routers SR1 and SR2 in BGP.
We’ll consider this part successfully done when we can ping the system interface of Nokia (Alcatel-Lucent) VSR SR2 from the system interface of SR1.
BGP topology to deploy:
Here is the necessary configuration:
Nokia (Alcatel-Lucent) SR OS | Cisco IOS XR |
SR1 | XR3 |
A:SR1>edit-cfg# candidate view |
RP/0/0/CPU0:XR3(config)#show conf |
SR2 | XR4 |
A:SR2>edit-cfg# candidate view |
RP/0/0/CPU0:XR4(config)#show conf |
For more details, refer to previous chapters: BGP and DC with MPLS
I hope that most of the configuration is familiar to you; if not, see the tip above. As a general consideration, I’ve reduced the timers, because BFD doesn’t work on the Cisco IOS XRv router. In a real environment you would use BFD to track neighbour failures. We also enabled peer tracking on the Nokia (Alcatel-Lucent) SR OS routers, which will speed up convergence later, when we configure the multihop eBGP session for the EVPN overlay. Next-hop resolution over BGP routes is also necessary for the EVPN overlay, because the next-hop of all EVPN routes will be the system interface of the other VSR, learned through BGP IPv4 unicast (AFI/SAFI 1/1). A brief check that the BGP sessions are up and running:
RP/0/0/CPU0:XR3#show bgp ipv4 unicast summary |
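Coming back to the next-hop resolution remark above, here is a toy Python illustration of why that knob matters. It is only a conceptual model, not router internals; the spine-facing interface address 10.13.0.1 is an assumption for illustration.

```python
# Toy model of next-hop resolution: the EVPN route's next hop is SR2's system
# address, which itself is only known via a BGP IPv4 unicast route from the
# fabric, so resolving one BGP next hop through another BGP route must be allowed.

ipv4_rib = {
    "10.0.0.22": {"protocol": "bgp", "out_next_hop": "10.13.0.1"},  # assumed interface IP
}
evpn_route = {"mac": "00:50:56:23:d3:7e", "next_hop": "10.0.0.22"}

def resolve(next_hop, allow_bgp_routes):
    entry = ipv4_rib.get(next_hop)
    if entry is None:
        return None
    if entry["protocol"] == "bgp" and not allow_bgp_routes:
        return None  # default behaviour: only IGP/static routes may resolve a BGP next hop
    return entry["out_next_hop"]

print(resolve(evpn_route["next_hop"], allow_bgp_routes=True))    # resolves via the fabric
print(resolve(evpn_route["next_hop"], allow_bgp_routes=False))   # None -> EVPN routes unusable
```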
The final part of the configuration is to announce the system interfaces. I put it separately, because the output above is already very long:
SR1 | SR2 |
A:SR1>edit-cfg# candidate view |
A:SR2>edit-cfg# candidate view |
The configuration is absolutely identical; the only minor difference is the IPv4 addresses, which must be equal to the local system interfaces.
After the configuration is applied, these IP addresses are propagated:
A:SR1# show router route-table |
And routers can establish connectivity to each other:
A:SR1# ping 10.0.0.22 source 10.0.0.11 count 1 |
Our underlay fabric is ready and fully functional. So we can move to overlay configuration.
Configuration of overlay network infrastructure (EVPN + VXLAN)
Now that the IP fabric works, we need to perform a few actions to get our L2 VPN working:
- Configure peering for EVPN address family between Nokia (Alcatel-Lucent) SR OS routers SR1 and SR2.
- Create the corresponding service instances at SR1 and SR2, which provide the EVPN service for the customer
- Additionally, we’ll configure emulations of VMs (virtual machines) at Cisco IOS XR routers XR3 and XR4 to test connectivity over EVPN
In total our service topology looks as follows:
#1. Configuration of EVPN AFI/SAFI in BGP
We start with the first point, as it’s quite simple and involves only the Nokia (Alcatel-Lucent) SR OS routers:
SR1 | SR2 |
A:SR1>edit-cfg# candidate view |
A:SR2>edit-cfg# candidate view |
We have used the same construct before, when configuring the data centre with BGP-LU/MPLS in the data plane (link). Pay attention to the multihop option, otherwise the peering won’t be established.
If everything is configured successfully, the BGP peering in EVPN address family goes up:
A:SR1# show router bgp summary |
As we don’t have any VNIs configured, we don’t send or receive any prefixes.
#2. Configuration of EVPN service instances
This point relates solely to the configuration of the Nokia (Alcatel-Lucent) VSRs, much the same as the previous one. The reason is simple: the Cisco IOS XR routers work as spine switches in our data centre fabric and they don’t know anything about the VXLAN traffic:
SR1 | SR2 |
A:SR1>edit-cfg# candidate view |
A:SR2>edit-cfg# candidate view |
We have already configured EVPN once, in the EVPN-PBB fashion with MPLS encapsulation (link), so you can read some details there. In general, we have the usual BGP VPN business (RD/RT) plus some new parameters (the VXLAN VNI and the VXLAN option under BGP-EVPN).
One thing I have learned is that Nokia (Alcatel-Lucent) SR OS doesn’t send the EVPN type 1 route (Ethernet auto-discovery) by default, at least with VXLAN encapsulation.
Starting from this point, both our Nokia (Alcatel-Lucent) SR OS routers SR1 and SR2 start sending EVPN type-3 routes (inclusive multicast), which are used for BUM (broadcast, unknown unicast, multicast) traffic:
A:SR1# show router bgp routes evpn inclusive-mcast |
As you remember (link to the BGP article), in Nokia (Alcatel-Lucent) SR OS we see only the received routes (Adj-RIB-In).
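To keep the route types straight, here is a purely illustrative Python summary of the fields carried in the two EVPN route types we meet in this lab, per RFC 7432 (with the label field reused as the VNI for VXLAN, per RFC 8365). The RD format and the ESI value are assumptions; this is not how the router encodes the NLRI.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative field summary of the EVPN routes used in this lab (RFC 7432);
# for VXLAN the label field carries the VNI (RFC 8365). Not a BGP encoder.

@dataclass
class Type3InclusiveMulticast:      # builds the flooding list for BUM traffic
    rd: str                         # route distinguisher (format is an assumption)
    eth_tag: int
    originating_ip: str             # the advertising VTEP
    vni: int                        # signalled in the PMSI tunnel attribute

@dataclass
class Type2MacIp:                   # advertises a learned MAC (optionally with an IP)
    rd: str
    esi: str                        # all-zero for single-homed segments
    eth_tag: int
    mac: str
    ip: Optional[str]
    vni: int

# Roughly what SR1 receives from SR2 once the service is up: the system IP,
# the VNI and the VM2 MAC are taken from the verification section below;
# everything else is assumed for illustration.
bum_route = Type3InclusiveMulticast("10.0.0.22:123", 0, "10.0.0.22", 123)
mac_route = Type2MacIp("10.0.0.22:123", "00:00:00:00:00:00:00:00:00:00", 0,
                       "00:50:56:23:d3:7e", None, 123)
print(bum_route)
print(mac_route)
```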
We can also check which VTEPs we have within our VNI:
A:SR1# show service id 10000123 vxlan |
#3. Configuration of VM (emulation)
As we mentioned, we’ll create VRFs at Cisco IOS XR routers XR3 and XR4, which will act as emulations of VMs in our data centre:
XR3 | XR4 |
RP/0/0/CPU0:XR3(config)#show conf |
RP/0/0/CPU0:XR4(config)#show conf |
Moreover, we also need to configure a bridge in Linux to provide connectivity between the new VMs and the leaf switches (SR1, SR2):
Linux |
sudo /sbin/vconfig add vnet2 111 |
Verification of EVPN with VXLAN
The good news is that the Nokia (Alcatel-Lucent) VSR is a fully functional router with a working data plane for all types of services, unlike Cisco IOS XRv, which implements only the control plane for L2 VPNs, as we discussed before. That’s why we are able to perform a full verification, including sending data. Let’s issue ping packets from VM1 to VM2:
RP/0/0/CPU0:XR3#ping vrf VM1 192.168.0.4 |
In parallel we make a traffic capture in Wireshark:
Some explanation to the PCAP above:
- Under “A” the path from VM1 to VM2 is shown. VXLAN uses UDP (destination port 4789) to encapsulate the packets.
- Under “B” the backward path from VM2 to VM1 is shown.
- In “1” part of the VXLAN header is shown, where the VNI is located. If we convert “0x7B” from hexadecimal to decimal we get “123”, which is exactly our configured VNI value.
- From “2” onwards the original Ethernet frame is stored. The first field is the destination MAC address, so “00:50:56:23:d3:7e” is the MAC address of VM2.
- In “3” we have the source MAC address of VM1, which is “00:50:56:31:b7:73”.
- In “4” the EtherType follows; its value of “0x0800” corresponds to an IPv4 payload (see the small decoding sketch right after this list).
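Here is that decoding sketch in Python, using only the values visible in the capture (everything past the EtherType is omitted; this is just an illustration, not a full parser):

```python
import struct

# The inner frame starts with: 6 bytes destination MAC, 6 bytes source MAC,
# 2 bytes EtherType. The bytes below are only the fields visible in the capture.
inner = bytes.fromhex("00505623d37e" "00505631b773" "0800")

dst, src, ethertype = struct.unpack("!6s6sH", inner)

def fmt_mac(raw):
    return ":".join("{:02x}".format(b) for b in raw)

print(fmt_mac(dst))      # 00:50:56:23:d3:7e -> VM2
print(fmt_mac(src))      # 00:50:56:31:b7:73 -> VM1
print(hex(ethertype))    # 0x800             -> IPv4
print(int("7b", 16))     # 123               -> the VNI from field "1"
```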
To provide more proof for the explanation, we check the ARP table of the emulated VM1 at Cisco IOS XR router XR3:
RP/0/0/CPU0:XR3#show arp vrf VM1 |
So, we got a working EVPN solution with VXLAN encapsulation between the Nokia (Alcatel-Lucent) SR OS routers SR1 and SR2. For the Cisco IOS XR routers XR3 and XR4 this traffic looks like ordinary IPv4 traffic; they don’t even know that they forward EVPN in the overlay. That’s what VXLAN is about.
But for sure, the information about VXLAN must be present on the VTEPs. First of all, let’s check the BGP routing table:
A:SR1# show router bgp routes evpn mac |
We see the MAC address of VM2 learned at VTEP SR1 in VNI 123. So the control plane looks fine. What about the data plane?
A:SR1# show service id 10000123 fdb detail |
It looks good as well. So VTEP SR1 knows that it has the local MAC of VM1 at SAP 1/1/2:111 and the remote MAC of VM2 learned through VTEP 10.0.0.22 in VNI 123.
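Conceptually, the resulting forwarding behaviour on the leaf can be pictured with this toy Python model. It is not SR OS internals, just the logic of the FDB shown above, with the values taken from that output.

```python
# Toy model of the leaf's L2 forwarding table: local MACs point to an access
# SAP, remote MACs point to the VTEP/VNI that EVPN advertised them from.

fdb = {
    "00:50:56:31:b7:73": {"type": "sap",   "sap": "1/1/2:111"},               # VM1, local
    "00:50:56:23:d3:7e": {"type": "vxlan", "vtep": "10.0.0.22", "vni": 123},  # VM2, via SR2
}

def forward(dst_mac):
    entry = fdb.get(dst_mac)
    if entry is None:
        return "unknown unicast: flood as BUM to all VTEPs from the type-3 routes"
    if entry["type"] == "sap":
        return "send out local SAP " + entry["sap"]
    return "VXLAN-encapsulate with VNI {} towards {}".format(entry["vni"], entry["vtep"])

print(forward("00:50:56:23:d3:7e"))   # encapsulated towards SR2
print(forward("aa:bb:cc:dd:ee:ff"))   # flooded
```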
The final configuration files, as they look in the CLI, are here: 106_config_final_SR1 106_config_final_XR4 106_config_final_XR3 106_config_final_SR2 106_config_final_linux
You can also use these Ansible playbooks, which deploy all the necessary configuration through NETCONF with the YANG data models: 106_lab.tar
The following sequence of Ansible playbooks needs to be performed:
!— To create base topology |
More info about lab building and NETCONF/YANG
Lessons learned
What I missed in the initial lab deployment is the installation of Wireshark. You can grab it with the following:
$ sudo yum install -y wireshark wireshark-gnome |
As you see, we install two packages: one for the CLI and another for the GUI.
You need to launch Wireshark with root privileges, otherwise you will get an empty list of network interfaces (by default). Another solution would be to create the appropriate groups and perform a whole lot of other configuration, but I prefer the simple way:
$ sudo wireshark |
Conclusion
As we briefly mentioned in the beginning, EVPN/VXLAN is currently the de-facto standard for building data centre networks. And data centres of all sizes, starting from a deployment of just 4 switches up to really big ones with hundreds of switches, will profit from it. In general, EVPN is considered by many vendors to be a full replacement for the existing L2 VPN services and even, potentially, L3 VPN services (through the usage of route type 5; we’ll write a separate article about it). If you aren’t familiar with this technology, I strongly advise you to invest your time in learning it and doing this lab. Take care and good bye!
P.S.
If you have further questions or you need help with your networks, I’m happy to assist you, just send me a message. Also don’t forget to share the article on your social media if you like it.
Support us
BR,
Anton Karneliuk
Hi,
Can you please help me? How can I create Wireshark dumps on a router and download the capture file to my system?
Many thanks for your support!!!
Hi Sonu,
it depends on what router you are speaking about. Some of them have built-in functionality to make PCAP (packet capture) files, others don’t.
BR,
Anton
Hi Anton,
Awesome article.
I am new to VXLAN, and tried to simulate it using Arista vEOS all across the topology.
What I want to ask is: is it possible to have VXLAN working without using any overlay technologies such as EVPN, multicast, etc.? I’ve tried using only OSPF as the underlay IP network and then configured VXLAN and its components. And somehow it works. But I don’t know how, because I don’t have much knowledge of VXLAN yet.
Thank you,
Hi Fakhri,
Thanks for the feedback.
VXLAN is just an encapsulation type, which is used to transmit traffic over the network. How you use it is up to you. EVPN or multicast are technologies that help to ensure proper information distribution (processing of BUM traffic, then proper unicast traffic). I don’t know your setup, so I can’t explain how it is somehow working 🙂 For theory you could refer to some online articles (as I’m not writing about theory 🙂 ) or to some books, like http://www.ciscopress.com/store/building-data-centers-with-vxlan-bgp-evpn-a-cisco-nx-9781587144677
BR,
Anton
Hi Anton,
Thank you for the reply and the link it is useful.
I think I have already figured out how my setup is working. Arista can handle BUM traffic by using Head End Replication (HER) or (I cannot verify whether it is true or not) OTV in Cisco NX-OS. With HER we define the list of desired VTEPs that we want to flood to on each switch; this can be done manually or automatically using Arista CloudVision (CVX).
And just like when using eBGP as the underlay network technology, OSPF only provides underlay IP connectivity, and HER does the rest to carry the VXLAN traffic.
In my opinion, for a larger network using OSPF and HER is maybe not very scalable.
Btw: here is my topology if you’re interested
https://drive.google.com/file/d/16odR8FbxKf7ixHmkXLHoPi_3T6NOx_D8/view?usp=sharing
Thx,
—
Fakhri
Hi Fakhri,
you are describing HER correctly; the point is that it must be done somehow, as I said, there is no magic. So you have manually added the list of endpoints for your VXLAN, and therefore the endpoint knows how to treat BUM.
Regarding scalability, well, there is no single answer. HER is used with BGP-EVPN as well; it’s just another type of control plane, while the data plane stays VXLAN. So we need to distinguish:
1) Control plane: how your VTEP learns about other VTEPs and the associated routes (manually, via a multicast group, or via BGP-EVPN).
2) Data plane: how you send the traffic (VXLAN/MPLS) and how you treat BUM (HER or multicast).
Normally BUM traffic is very low, like less than 1% of the communication, so HER is OK. But for sure, it depends on your applications.
BR,
Anton