Hello my friend,
When something goes wrong with a distributed application where the network is involved (e.g., between a client and a web service, or between the frontend and backend of a service), the network is the first thing to be blamed. After troubleshooting, it often turns out that the network is innocent, but first we need to prove it.
No part of this blogpost may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical or photocopying, recording, or otherwise, for commercial purposes without the prior permission of the author.
Automated troubleshooting for automated networks?
The truth is that automation has helped me figure out the root cause of network outages or malfunctions so many times that I stopped counting. I can say that automation solutions work perfectly if you build them to solve your issues and tailor them to your environment.
That’s what our Live Network Automation Training (10 weeks) and Automation with Nornir (2 weeks) are all about: showing you real automation in a real environment with multiple vendors together. No matter who those vendors are, the automation principles, tools and protocols stay the same, and that is what you will learn with us: the full spectrum of automation approaches, from the text-based automation with fully templated configuration used by the hyperscalers to model-driven automation with NETCONF/RESTCONF/gNMI and YANG data models for Cisco, Nokia, Arista and Cumulus. This knowledge comes with a lot of exercises using the direct console as well as Ansible, Python and Bash scripts. On top of that, you learn a lot of infrastructure skills, such as building and managing Linux, KVM and Docker.
Moreover, we have been running our trainings for 2 years already and constantly adapt them to the changes happening in the automation world. Master your automation skills with us.
Brief description
Visibility into what is going on the wire is one of the key components of successful troubleshooting. Very often, when an application is not working, we check whether it sends packets to the other part of the application and whether the response packets come back. If they don’t, we start more in-depth troubleshooting.
To figure out whether packets are being sent and received, we need to capture packets at least on our own host (server, container, virtual machine, laptop, etc.). It is also useful to do the same on intermediate devices, if possible. For packet capture we have a tool that is built into the vast majority of Linux distributions (and even macOS): tcpdump.
No doubt, tcpdump is a powerful tool in your arsenal, if you know how to use it. It allows you to dump all received packets or selectively choose only the necessary ones based on certain criteria, such as protocol type, port number, IPv4 or IPv6 address, and so on. Besides that, it allows you to write the captured packets to a local file, so that you can analyse them offline with Wireshark.
For Linux admins this tool is part of the traditional toolkit, whereas network engineers don’t use it daily. So, let’s dive into it.
Usage
In this part you will learn in detail how to use tcpdump on devices running Linux (e.g., a CentOS VM, Raspberry Pi, Cumulus Linux, and even Arista EOS and Nokia SR Linux).
#1. Installation
The good thing about tcpdump is that you usually don’t need to install it: it is part of most standard Linux distributions. This is quite different from other tools we have reviewed, such as speedtest, iperf, fping, or mtr. Some Linux distributions (e.g., the tiny Alpine Linux) don’t ship tcpdump by default. In these scenarios you need to install it using the platform’s package manager.
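If your scripts rely on tcpdump, a quick pre-flight check saves some confusion on minimal images. The Python sketch below is only an illustration: it looks tcpdump up on PATH with shutil.which and, if it is missing, suggests an install command for the first package manager it finds. The helper names (tcpdump_path, install_hint) and the manager-to-command mapping are assumptions to adapt to your own platforms.

```python
import shutil
from typing import Optional

# Hypothetical mapping: package manager binary -> command that installs tcpdump.
INSTALL_HINTS = {
    "apk": "apk add tcpdump",                 # Alpine Linux
    "apt-get": "apt-get install -y tcpdump",  # Debian / Ubuntu
    "dnf": "dnf install -y tcpdump",          # Fedora / newer RHEL and CentOS
    "yum": "yum install -y tcpdump",          # older RHEL and CentOS
}


def tcpdump_path() -> Optional[str]:
    """Return the absolute path to tcpdump if it is on PATH, else None."""
    return shutil.which("tcpdump")


def install_hint() -> Optional[str]:
    """Return an install command for the first package manager found, if any."""
    for manager, command in INSTALL_HINTS.items():
        if shutil.which(manager):
            return command
    return None
```

A script could call tcpdump_path() first and print install_hint() only when it returns None.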
#2. Simple tests
It is very easy to start using tcpdump: all you need to do is run the following command:
$ sudo tcpdump
In the simplest form, you don’t even need to provide any arguments. If you run the command as above, you will see the packets your host sends and receives:
21:31:10.563966 IP abcd.cust.communityfibre.co.uk.52424 > ec2-50-18-194-39.us-west-1.compute.amazonaws.com.https: Flags [P.], seq 2907003270:2907003483, ack 133435834, win 2048, options [nop,nop,TS val 904268893 ecr 3950468171], length 213
21:31:10.584869 IP6 2a02:6b6d:1234:0:114b:65b1:b0e5:3c68.52586 > 2a02:6b6d:1234:0:ea9f:80ff:fe4c:d90d.domain: 61901+ PTR? 250.2.168.192.in-addr.arpa. (44)
21:31:10.587208 IP6 2a02:6b6d:4321:0:ea9f:80ff:fe4c:d90d.domain > 2a02:6b6d:4321:0:114b:65b1:b0e5:3c68.52586: 61901* 1/0/0 PTR abcd.cust.communityfibre.co.uk. (91)
21:31:10.587904 IP6 2a02:6b6d:4321:0:114b:65b1:b0e5:3c68.55820 > 2a02:6b6d:4321:0:ea9f:80ff:fe4c:d90d.domain: 15238+ PTR? 49.194.18.50.in-addr.arpa. (43)
21:31:10.594829 IP6 2a02:6b6d:4321:0:ea9f:80ff:fe4c:d90d.domain > 2a02:6b6d:4321:0:114b:65b1:b0e5:3c68.55820: 15238 1/0/0 PTR ec2-50-18-194-39.us-west-1.compute.amazonaws.com. (105)
21:31:10.596822 IP6 2a02:6b6d:1234:0:114b:65b1:b0e5:3c68.65244 > 2a02:6b6d:1234:0:ea9f:80ff:fe4c:d90d.domain: 52107+ PTR? 8.6.c.3.5.e.0.b.1.b.5.6.b.4.1.1.0.0.0.4.3.2.1.a.d.6.b.6.2.0.a.2.ip6.arpa. (90)
To be specific, if you don’t provide any arguments, tcpdump selects and displays:
- All sent and received packets
- Packets on the default interface only, i.e. the lowest-numbered interface that is up, excluding loopback (for example, eth0), not on all interfaces
- DNS names instead of IP addresses, wherever reverse DNS resolution succeeds
- Service names instead of port numbers, wherever a port maps to a known application (e.g., https instead of 443)
- All types of packets: Ethernet, IPv4, IPv6
If you really need all those packets, that is the way to go. However, in many cases you need something more specific.
#3. Advanced tests
The first thing you can filter on is the interface where packets are received or sent. To do that, add the key “-i” followed by the interface name. If you want to capture on all interfaces, use the pseudo-interface name “any” (available on Linux):
$ sudo tcpdump -i eth0
$ sudo tcpdump -i any
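When a script has to choose the value for “-i”, it helps to check which interfaces actually exist (tcpdump itself can list them with “-D”). A minimal Python sketch, assuming a Unix-like host where socket.if_nameindex() is available; the fallback to “any” assumes Linux, and the function names are hypothetical:

```python
import socket
from typing import List


def interface_names() -> List[str]:
    """Return the names of the host's network interfaces, e.g. ['lo', 'eth0']."""
    # socket.if_nameindex() yields (index, name) tuples for every interface.
    return [name for _index, name in socket.if_nameindex()]


def pick_interface(preferred: str) -> str:
    """Use `preferred` if it exists on this host, otherwise fall back to 'any'."""
    return preferred if preferred in interface_names() else "any"
```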
Another useful key is “-n”, which disables the conversion of IP addresses to DNS names and of port numbers to application names:
$ sudo tcpdump -n -i any
22:39:15.781643 IP 165.225.1.247.443 > 192.168.1.250.55806: Flags [.], ack 2378195641, win 2081, length 0
22:39:15.781695 IP 192.168.1.250.55806 > 165.225.1.247.443: Flags [.], ack 1, win 2048, options [nop,nop,TS val 908338670 ecr 2011701667], length 0
22:39:15.846153 IP 60.28.194.39.443 > 192.168.1.250.52424: Flags [.], ack 2907633172, win 442, options [nop,nop,TS val 3954556673 ecr 908338599], length 0
22:39:15.846157 IP 60.28.194.39.443 > 192.168.1.250.52424: Flags [P.], seq 0:45, ack 1, win 442, options [nop,nop,TS val 3954556673 ecr 908338599], length 45
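This approach is generally preferable: reverse DNS resolution can be inaccurate, especially for private IP addresses or custom applications, and it generates extra traffic of its own (the PTR? queries in the first capture above are most likely tcpdump resolving addresses for its own output).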
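With “-n” every address and port is numeric, which makes the output easy to post-process. The Python sketch below parses one line in the plain format shown above (no “-e” or “-v”). The Packet type, its field names and the regular expression are assumptions for illustration; other tcpdump versions or options (for example “-i any” on newer Linux releases, which may prefix the interface name and direction) would need a different pattern.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Matches single-line output of `tcpdump -n` (without -e or -v), e.g.
# 22:39:15.781643 IP 165.225.1.247.443 > 192.168.1.250.55806: Flags [.], ...
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d+) (?P<family>IP6?) "
    r"(?P<src>\S+) > (?P<dst>\S+?): (?P<info>.*)$"
)


class Packet(NamedTuple):
    time: str
    family: str             # 'IP' or 'IP6'
    src: str
    src_port: Optional[int]
    dst: str
    dst_port: Optional[int]
    info: str               # everything after 'src > dst: '


def split_endpoint(token: str, family: str) -> Tuple[str, Optional[int]]:
    """Split 'address.port' (as printed with -n) into address and port."""
    if family == "IP6":
        # IPv6 addresses are written with ':', so the last '.' precedes the port.
        addr, sep, port = token.rpartition(".")
        if sep and port.isdigit():
            return addr, int(port)
        return token, None
    # IPv4: four dotted parts is a bare address (e.g. ICMP), five adds a port.
    parts = token.split(".")
    if len(parts) == 5 and parts[4].isdigit():
        return ".".join(parts[:4]), int(parts[4])
    return token, None


def parse_line(line: str) -> Optional[Packet]:
    """Return a Packet for a recognised line, or None for anything else."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    family = match["family"]
    src, src_port = split_endpoint(match["src"], family)
    dst, dst_port = split_endpoint(match["dst"], family)
    return Packet(match["time"], family, src, src_port, dst, dst_port, match["info"])
```

Feeding it the lines of a `tcpdump -n` run (e.g. piped through stdin) gives you structured records that you can group by source and destination.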
If you are troubleshooting Layer 2 issues, it makes sense to add the key “-e”, which also prints the Ethernet (link-level) header. You can use it together with “-n” or separately:
$ sudo tcpdump -en -i any
22:51:35.296234 e8:9f:80:4c:d9:0d > 98:5a:eb:8c:a9:c0, ethertype IPv4 (0x0800), length 66: 60.28.194.39.443 > 192.168.1.250.52424: Flags [.], ack 2907730770, win 442, options [nop,nop,TS val 3955296123 ecr 909075978], length 0
22:51:35.419289 e8:9f:80:4c:d9:0d > 98:5a:eb:8c:a9:c0, ethertype IPv4 (0x0800), length 111: 60.28.194.39.443 > 192.168.1.250.52424: Flags [P.], seq 0:45, ack 1, win 442, options [nop,nop,TS val 3955296245 ecr 909075978], length 45
22:51:35.419350 98:5a:eb:8c:a9:c0 > e8:9f:80:4c:d9:0d, ethertype IPv4 (0x0800), length 66: 192.168.1.250.52424 > 60.28.194.39.443: Flags [.], ack 45, win 2047, options [nop,nop,TS val 909076238 ecr 3955296245], length 0
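To see exactly what “-e” adds, the sketch below decodes the first 14 bytes of a frame the way tcpdump prints them: destination MAC, source MAC and EtherType. It assumes a classic Ethernet II header and does not handle VLAN tags beyond naming the 0x8100 type; on Linux, captures on the “any” pseudo-interface use a different “cooked” link-layer header, so it does not apply there. The names are hypothetical.

```python
import struct
from typing import NamedTuple

# A few common EtherType values, for readable output.
ETHERTYPES = {0x0800: "IPv4", 0x0806: "ARP", 0x86DD: "IPv6", 0x8100: "802.1Q"}


def format_mac(raw: bytes) -> str:
    """Render 6 raw bytes as a lower-case, colon-separated MAC address."""
    return ":".join(f"{b:02x}" for b in raw)


class EthernetHeader(NamedTuple):
    dst: str
    src: str
    ethertype: int

    def describe(self) -> str:
        """Format the header like the link-level part of `tcpdump -e` output."""
        name = ETHERTYPES.get(self.ethertype, "unknown")
        return f"{self.src} > {self.dst}, ethertype {name} (0x{self.ethertype:04x})"


def parse_ethernet(frame: bytes) -> EthernetHeader:
    """Decode the 14-byte Ethernet II header at the start of `frame`."""
    if len(frame) < 14:
        raise ValueError("frame is shorter than an Ethernet header")
    # '!6s6sH': network byte order, 6-byte dst MAC, 6-byte src MAC, 2-byte type.
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return EthernetHeader(format_mac(dst), format_mac(src), ethertype)
```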
If that level of detail is still not enough, add the key “-vvv”, which increases the verbosity of the output (“-v”, “-vv” and “-vvv” give progressively more detail):
$ sudo tcpdump -en -i any -vvv
22:52:57.478136 98:5a:eb:8c:a9:c0 > e8:9f:80:4c:d9:0d, ethertype IPv4 (0x0800), length 78: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 64)
192.168.1.250.55973 > 165.225.1.247.443: Flags [S], cksum 0xb236 (correct), seq 422574076, win 65535, options [mss 1460,nop,wscale 6,nop,nop,TS val 909158029 ecr 0,sackOK,eol], length 0
22:52:57.483848 e8:9f:80:4c:d9:0d > 98:5a:eb:8c:a9:c0, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 58, id 46580, offset 0, flags [DF], proto TCP (6), length 60)
165.225.1.247.443 > 192.168.1.250.55973: Flags [S.], cksum 0x8f7d (correct), seq 598930541, ack 422574077, win 65535, options [mss 1460,nop,wscale 5,sackOK,TS val 2764011214 ecr 909158029], length 0
22:52:57.483916 98:5a:eb:8c:a9:c0 > e8:9f:80:4c:d9:0d, ethertype IPv4 (0x0800), length 66: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 52)
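In the verbose output above, the first two packets are the start of a TCP three-way handshake: Flags [S] from the client and Flags [S.] (SYN plus ACK) from the server. The tcpdump manual documents the flag letters as S (SYN), F (FIN), P (PUSH), R (RST), U (URG), W (ECN CWR), E (ECN-Echo) and “.” (ACK). A small Python sketch that translates them; the function names and the coarse labels are illustrative, not tcpdump terminology:

```python
import re
from typing import List

# Flag letters as listed in the tcpdump manual page.
TCP_FLAG_NAMES = {
    "S": "SYN", "F": "FIN", "P": "PSH", "R": "RST",
    "U": "URG", "W": "CWR", "E": "ECE", ".": "ACK",
}
_FLAGS_RE = re.compile(r"Flags \[([^\]]*)\]")


def decode_flags(line: str) -> List[str]:
    """Return flag names from the 'Flags [..]' field of a tcpdump TCP line."""
    match = _FLAGS_RE.search(line)
    if match is None or match.group(1) == "none":
        return []
    return [TCP_FLAG_NAMES[c] for c in match.group(1) if c in TCP_FLAG_NAMES]


def describe(flags: List[str]) -> str:
    """Coarse, illustrative label for what a packet with these flags does."""
    present = set(flags)
    if present == {"SYN"}:
        return "connection request (SYN)"
    if present == {"SYN", "ACK"}:
        return "connection accepted (SYN-ACK)"
    if "RST" in present:
        return "connection reset (RST)"
    if "FIN" in present:
        return "connection closing (FIN)"
    return "data or acknowledgement"
```

If you see only SYN packets going out and never a SYN-ACK coming back, the problem is on the path or on the server side, not in your application’s data.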
Next, you can limit the output to a specific host (the filter matches it as either the source or the destination):
$ sudo tcpdump -en -i any host 8.8.8.8 -vvv
23:06:01.356297 98:5a:eb:8c:a9:c0 > e8:9f:80:4c:d9:0d, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 62232, offset 0, flags [none], proto ICMP (1), length 84)
192.168.1.250 > 8.8.8.8: ICMP echo request, id 2969, seq 0, length 64
23:06:01.359638 e8:9f:80:4c:d9:0d > 98:5a:eb:8c:a9:c0, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 115, id 0, offset 0, flags [none], proto ICMP (1), length 84)
8.8.8.8 > 192.168.1.250: ICMP echo reply, id 2969, seq 0, length 64
23:06:02.361402 98:5a:eb:8c:a9:c0 > e8:9f:80:4c:d9:0d, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 30239, offset 0, flags [none], proto ICMP (1), length 84)
192.168.1.250 > 8.8.8.8: ICMP echo request, id 2969, seq 1, length 64
23:06:02.366877 e8:9f:80:4c:d9:0d > 98:5a:eb:8c:a9:c0, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 115, id 0, offset 0, flags [none], proto ICMP (1), length 84)
8.8.8.8 > 192.168.1.250: ICMP echo reply, id 2969, seq 1, length 64
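Since “host” matches both directions, you can narrow it with the “src” or “dst” qualifiers of the filter language when only one direction matters. If a script builds the filter from user input, validating the address first avoids confusing syntax errors from tcpdump. A minimal Python sketch with a hypothetical helper name, using the standard ipaddress module:

```python
import ipaddress
from typing import Optional


def host_filter(address: str, direction: Optional[str] = None) -> str:
    """Return a tcpdump host primitive, optionally limited to one direction.

    direction=None  -> 'host X'     (X as source or destination)
    direction='src' -> 'src host X'
    direction='dst' -> 'dst host X'
    """
    ip = ipaddress.ip_address(address)  # raises ValueError on a bad address
    if direction is None:
        return f"host {ip}"
    if direction not in ("src", "dst"):
        raise ValueError("direction must be 'src', 'dst' or None")
    return f"{direction} host {ip}"
```

Using the normalised form from ipaddress also means an IPv6 address such as 2001:4860:4860:0:0:0:0:8888 is written in its compressed form.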
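Now add the protocol type as well, combining the conditions with “and”; quoting the whole expression keeps the shell from interpreting special characters in it: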
$ sudo tcpdump -en -i any 'host 8.8.8.8 and tcp' -vvv
23:08:38.637883 98:5a:eb:8c:a9:c0 > e8:9f:80:4c:d9:0d, ethertype IPv4 (0x0800), length 78: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 64)
192.168.1.250.56243 > 8.8.8.8.443: Flags [S], cksum 0xd7f5 (correct), seq 3131889016, win 65535, options [mss 1460,nop,wscale 6,nop,nop,TS val 910092417 ecr 0,sackOK,eol], length 0
23:08:38.642933 e8:9f:80:4c:d9:0d > 98:5a:eb:8c:a9:c0, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 57, id 8765, offset 0, flags [none], proto TCP (6), length 60)
8.8.8.8.443 > 192.168.1.250.56243: Flags [S.], cksum 0xb982 (correct), seq 1180074733, ack 3131889017, win 65535, options [mss 1430,sackOK,TS val 2732573535 ecr 910092417,nop,wscale 8], length 0
23:08:38.642965 98:5a:eb:8c:a9:c0 > e8:9f:80:4c:d9:0d, ethertype IPv4 (0x0800), length 66: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 52)
And even the port number:
$ sudo tcpdump -en -i any 'host 8.8.8.8 and tcp and port 443' -vvv
23:10:03.071481 98:5a:eb:8c:a9:c0 > e8:9f:80:4c:d9:0d, ethertype IPv4 (0x0800), length 78: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 64)
192.168.1.250.56269 > 8.8.8.8.443: Flags [S], cksum 0x8ed3 (correct), seq 1600598317, win 65535, options [mss 1460,nop,wscale 6,nop,nop,TS val 910176537 ecr 0,sackOK,eol], length 0
23:10:03.076506 e8:9f:80:4c:d9:0d > 98:5a:eb:8c:a9:c0, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 57, id 10765, offset 0, flags [none], proto TCP (6), length 60)
8.8.8.8.443 > 192.168.1.250.56269: Flags [S.], cksum 0x7c91 (correct), seq 1653020631, ack 1600598318, win 65535, options [mss 1430,sackOK,TS val 1428313041 ecr 910176537,nop,wscale 8], length 0
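The same combination can be generated programmatically. In the Python sketch below (function names are hypothetical), the primitives are joined with “and”, and shlex.join shows how the command would look when typed in a shell: the multi-word filter becomes a single quoted argument. The sketch places the filter expression after all options, following the synopsis in the tcpdump manual.

```python
import shlex
from typing import List, Optional


def build_filter(host: Optional[str] = None, proto: Optional[str] = None,
                 port: Optional[int] = None) -> str:
    """Join the given primitives with 'and', skipping those left as None."""
    parts: List[str] = []
    if host:
        parts.append(f"host {host}")
    if proto:
        parts.append(proto)  # e.g. 'tcp', 'udp', 'icmp'
    if port is not None:
        if not 0 <= port <= 65535:
            raise ValueError(f"invalid port: {port}")
        parts.append(f"port {port}")
    return " and ".join(parts)


def build_command(interface: str, bpf: str) -> List[str]:
    """Return argv for the verbose capture used above; the filter goes last."""
    argv = ["sudo", "tcpdump", "-en", "-i", interface, "-vvv"]
    if bpf:
        argv.append(bpf)  # one argv element, so no quoting is needed for subprocess
    return argv


def as_shell(argv: List[str]) -> str:
    """Render argv as a copy-pasteable shell command line."""
    return shlex.join(argv)
```

For example, as_shell(build_command("any", build_filter("8.8.8.8", "tcp", 443))) returns sudo tcpdump -en -i any -vvv 'host 8.8.8.8 and tcp and port 443'.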
It is also possible to go one level lower and filter on a specific MAC address, which is particularly useful when you troubleshoot connectivity for Docker or Linux bridges, where the Linux host is a transit node:
$ sudo tcpdump -en -i any 'ether host e8:9f:80:4c:d9:0d' -vvv
23:15:49.872344 e8:9f:80:4c:d9:0d > 98:5a:eb:8c:a9:c0, ethertype IPv4 (0x0800), length 66: (tos 0x28, ttl 37, id 34214, offset 0, flags [DF], proto TCP (6), length 52)
60.28.194.39.443 > 192.168.1.250.52424: Flags [.], cksum 0x8464 (correct), seq 133821451, ack 2909269833, win 442, options [nop,nop,TS val 3956750700 ecr 910522111], length 0
23:15:49.898939 e8:9f:80:4c:d9:0d > 98:5a:eb:8c:a9:c0, ethertype IPv4 (0x0800), length 66: (tos 0x28, ttl 37, id 34215, offset 0, flags [DF], proto TCP (6), length 52)
60.28.194.39.443 > 192.168.1.250.52424: Flags [.], cksum 0x82d2 (correct), seq 0, ack 349, win 442, options [nop,nop,TS val 3956750727 ecr 910522138], length 0
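MAC addresses reach you in different notations: Cisco devices show them as e89f.804c.d90d, Windows as E8-9F-80-4C-D9-0D, while tcpdump prints them in lower-case colon format. The lenient Python sketch below (hypothetical helper names) normalises any of these to the notation tcpdump prints before building the “ether host” filter, so the filter matches what you see in the output.

```python
import re

# Hex digits plus the separators used by the common MAC notations.
_MAC_CHARS = re.compile(r"[0-9A-Fa-f:.\-]+")
_SEPARATORS = re.compile(r"[:.\-]")


def normalize_mac(mac: str) -> str:
    """Return `mac` in lower-case colon notation, e.g. 'e8:9f:80:4c:d9:0d'."""
    mac = mac.strip()
    if not _MAC_CHARS.fullmatch(mac):
        raise ValueError(f"unexpected characters in MAC address: {mac!r}")
    digits = _SEPARATORS.sub("", mac).lower()
    if len(digits) != 12:
        raise ValueError(f"MAC address must have 12 hex digits: {mac!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def ether_host_filter(mac: str) -> str:
    """Build a tcpdump 'ether host' filter for the given MAC address."""
    return f"ether host {normalize_mac(mac)}"
```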
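Finally, you may want to save the capture to a file, which you can open with your beloved Wireshark for proper analysis. To do that, use two keys: “-s X”, where X is the number of bytes saved per packet (0 means the default maximum, 262144 bytes in recent tcpdump versions, i.e. effectively the whole packet), and “-w Y”, where Y is the name of the file where the capture is saved. Display options such as “-e” and “-n” only affect the printed output, not what is written to the file.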
$ sudo tcpdump -en -i any 'ether host e8:9f:80:4c:d9:0d' -vvv -s 0 -w test.pcap
^C49 packets captured
117 packets received by filter
0 packets dropped by kernel
You can open the generated test.pcap in Wireshark directly, without any further modifications.
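If you want a quick sanity check of a capture file from a script before opening it in Wireshark, the classic pcap format written by “-w” is simple to read: a 24-byte global header followed by a 16-byte record header per packet. The Python sketch below counts packets and captured bytes and reports the snapshot length and link type (1 means Ethernet). It handles only classic pcap, not pcapng, and the function names are assumptions.

```python
import io
import struct
from typing import BinaryIO, Dict

# First 4 bytes of a classic pcap file -> struct byte-order prefix.
PCAP_MAGICS = {
    b"\xd4\xc3\xb2\xa1": "<",  # little-endian, microsecond timestamps
    b"\xa1\xb2\xc3\xd4": ">",  # big-endian, microsecond timestamps
    b"\x4d\x3c\xb2\xa1": "<",  # little-endian, nanosecond timestamps
    b"\xa1\xb2\x3c\x4d": ">",  # big-endian, nanosecond timestamps
}


def summarize_pcap(stream: BinaryIO) -> Dict[str, int]:
    """Count packets and captured bytes in a classic pcap stream."""
    header = stream.read(24)
    if len(header) < 24 or header[:4] not in PCAP_MAGICS:
        raise ValueError("not a classic pcap file (pcapng is not handled)")
    endian = PCAP_MAGICS[header[:4]]
    # Version major/minor, timezone offset, sigfigs, snaplen, link type.
    _maj, _min, _zone, _sigfigs, snaplen, linktype = struct.unpack(
        endian + "HHiIII", header[4:24])
    packets = captured = 0
    while True:
        record = stream.read(16)
        if len(record) < 16:
            break  # end of file
        _sec, _frac, incl_len, _orig_len = struct.unpack(endian + "IIII", record)
        if len(stream.read(incl_len)) < incl_len:
            break  # truncated last packet, e.g. capture interrupted
        packets += 1
        captured += incl_len
    return {"packets": packets, "bytes": captured,
            "snaplen": snaplen, "linktype": linktype}


def summarize_pcap_bytes(data: bytes) -> Dict[str, int]:
    """Convenience wrapper for a capture already loaded into memory."""
    return summarize_pcap(io.BytesIO(data))
```

Usage: `with open("test.pcap", "rb") as f: print(summarize_pcap(f))`. tcpdump itself can also read the file back with “tcpdump -r test.pcap”.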
#4. Ideas for automation
There are different scenarios where you may want to automate tcpdump. For example, when troubleshooting intermittent packet loss (or a general loss of connectivity between particular hosts), you may want to collect the packet capture automatically, using those hosts as a filter, which lets you check whether you see both requests and responses for the traffic flow. To do that, you can trigger tcpdump from Bash, Ansible or Python scripts.
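As one possible shape for this, the Python sketch below builds a tcpdump command that captures only the traffic between the given hosts into a file, and runs it with a time limit. The function names, default values and file name are hypothetical, and capturing requires root privileges. On timeout it sends SIGINT, the same as pressing Ctrl+C, rather than killing the process outright, so that tcpdump can flush the file and print its statistics.

```python
import signal
import subprocess
from typing import List, Sequence


def capture_command(hosts: Sequence[str], interface: str = "any",
                    count: int = 1000, outfile: str = "capture.pcap") -> List[str]:
    """Build argv that captures up to `count` packets exchanged between `hosts`."""
    if not hosts:
        raise ValueError("at least one host is required")
    # 'host A and host B' keeps only packets exchanged between A and B.
    bpf = " and ".join(f"host {h}" for h in hosts)
    return ["tcpdump", "-n", "-i", interface, "-c", str(count),
            "-s", "0", "-w", outfile, bpf]


def run_capture(argv: List[str], timeout_s: float = 60.0) -> int:
    """Run tcpdump until it has captured `-c` packets or `timeout_s` elapses."""
    proc = subprocess.Popen(argv)
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # SIGINT behaves like Ctrl+C: tcpdump flushes the file and exits cleanly.
        proc.send_signal(signal.SIGINT)
        return proc.wait()
```

A troubleshooting script could call run_capture(capture_command(["192.168.1.250", "8.8.8.8"]), timeout_s=30) on both ends of the flow and then compare the two files.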
Lessons learned
I use tcpdump quite often myself, but every time I need to do something a bit more complicated (e.g., using multiple filters), I have to google the details. With this blogpost I have collected all my notes in a single place, so that the next time I need to troubleshoot, I know where to find them.
Conclusion
You need to know what is happening in your network to solve problems quickly when they arise. For ages we have used SPAN/ERSPAN, which are absolutely beautiful tools by the way, in traditional networking environments, but the ability to see packets in real time on network devices may be limited. The Linux world is different, as there you can monitor the flows directly on the host. Having tcpdump as your ally will significantly improve your skills as a network engineer and troubleshooter. Take care and goodbye.
P.S.
If you have further questions or need help with your networks, our team is happy to assist you. Just book a free slot with us. Also, don’t forget to share the article on your social media if you like it.
BR,
Anton Karneliuk