Site icon Karneliuk

Tools 9. Monitoring Availability of Customers via HTTP GET, ICMP, and DNS via Dockerised Prometheus

Hello my friend,

in the previous blogpost we’ve started the discussion how you can improve your customers’ experience in your network via better observability of the network health with Prometheus by means of periodic automated speedtest and iperf3 measurements. Albeit it is very important and useful, it doesn’t tell you if the customers’ or your own resources are available. By resource we mean any exposed service, such as web page, streaming service, etc. Today you will learn how to setup a monitoring with Docekrised Prometheus to make sure that you know for sure if the services are available for customers.


1
2
3
4
5
No part of this blogpost could be reproduced, stored in a
retrieval system, or transmitted in any form or by any
means, electronic, mechanical or photocopying, recording,
or otherwise, for commercial purposes without the
prior permission of the author.

Why to Automate Monitoring?

Monitoring allows to make you aware, what is the state of resources you are interested in. At a bare minimum, you shall be notified if the state of resources deviates from the acceptable value and, therefore, crosses some threshold. At the same time, this is just the first step. The end goal is self-healing systems, where outages of different components don’t affect the service as a whole via automated configuration changes to restore the service. And this is why you need to know, how the automation works.

And we are keen to assist you with learning the automation skills!

We offer the following training programs:

During these trainings you will learn the following topics:

Moreover, we put all mentions technologies in the context of real use cases, which our team has solved and are solving in various projects in the service providers, enterprise and data centre networks and systems across the Europe and USA. That gives you opportunity to ask questions to understand the solutions in-depts and have discussions about your own projects. And on top of that, each technology is provided with online demos and labs to master your skills thoroughly. Such a mixture creates a unique learning environment, which all students value so much. Join us and unleash your potential.

Start your automation training today.

Brief Description

In the previous blogpost about monitoring with Prometheus we have discussed a use case, where we are hosting resources in public data centres and are willing to know, what is the performance of the network connectivity between our cloud locations and what is the bandwidth of the internet connectivity there. Today we will take a look at our public resources from a different angle. Namely, we’ll try to measure the user experience. As this phrase is quite a vague, let’s clarify what exactly we mean. We will test:

Obviously, the list of application’s availability metrics can be much longer, but we believe that this list is a good starting point and it can be used to validate the health of almost all web-based services.

All the mentioned metrics make sense, if they are measured from outside of the location, where the service is hosted. Moreover, from different geographical areas we’ll have different results, which can be input for a further planning, if we want to optimise the application delivery worldwide and improve the customer experience. Let’s see how can we can measure that.

Use Case

For today’s lab, we’ll use the following scenario. We have three web services we are willing to monitor:

For these web services we want to monitor the following parameters:

Solution Description

The good thing is that folks from Prometheus thought about such a use case and created a corresponding exported, which is called the blackbox exporter. What a strange name… or not?

Originally Prometheus is created for a purpose of monitoring your infrastructure, so that you have access to the monitored device, install an exporter there and scrape the data. However, in the described scenario it doesn’t matter. From the end user perspective, the services are provided by some “black boxes”, which are existing somewhere far away and end user doesn’t have a management access to them and can only consume a specific application running on a specific port. Hence, the name.

Read the first blogpost about Prometheus to get up to speed with the terminology.

Let’s visualise, how our monitoring setup is going to look like today.

In the same way as in the previous blogpost, Prometheus and exporters are deployed as Docker containers, which significantly simplifies the operational overhead. Also in terms of communication:

Enrol to our Zero-to-Hero Network Automation Training to learn how to use Docker and create your own containerised automation.

Implementation Details

It is assumed you have read the previous blogpost and completed steps provided in it.

Step #1. Hosts with Exporters

In contrast to the exporters from the previous blogpost, this time it is needed to do some configuration on the exporter side itself. The way how the blackbox exporter works:

Step #1.1. Blackbox Exporter Configuration File

To implement the monitoring of aforementioned KPIs, you would need to create the following configuration file on the Prometheus blackbox exporter side:


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
$ cat config/blackbox.yaml
---
modules:
  http_2xx:
    prober: http
    timeout: 10s
    http:
  icmp_test:
    prober: icmp
    timeout: 10s
    icmp:
      preferred_ip_protocol: ip4
      payload_size: 64
  dns_test:
    prober: dns
    timeout: 10s
    dns:
      query_name: karneliuk.com
      preferred_ip_protocol: ip4
      ip_protocol_fallback: false
  http_parsing:
    prober: http
    timeout: 10s
    http:
      fail_if_body_not_matches_regexp:
        - ".*</html>"
...

The following logic is applied here:

The full list of configuration options is provided in the official documentation for the Prometheus blackbox exporter.

Step #1.2. Blackbox Exporter Docker-compose File

The second step in the Prometheus blackbox exporter setup is the docker-compose file, which launches Docker container(s) with the exporters. You can run one big container with all types of checks, or choose a micro-services approach, with each check conducted by a separate Docker container. As such, the docker-compose file looks like as follows;


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
$ cat docker-compose.yaml
---
version: "3.9"
services:
  blackbox_http:
    image: "prom/blackbox-exporter:master"
    ports:
      - "9115:9115"
    volumes:
      - "./config:/config"
    command:
      - "--config.file=/config/blackbox.yaml"
  blackbox_icmp:
    image: "prom/blackbox-exporter:master"
    ports:
      - "9116:9115"
    volumes:
      - "./config:/config"
    command:
      - "--config.file=/config/blackbox.yaml"
  blackbox_dns:
    image: "prom/blackbox-exporter:master"
    ports:
      - "9117:9115"
    volumes:
      - "./config:/config"
    command:
      - "--config.file=/config/blackbox.yaml"
  blackbox_http_parser:
    image: "prom/blackbox-exporter:master"
    ports:
      - "9118:9115"
    volumes:
      - "./config:/config"
    command:
      - "--config.file=/config/blackbox.yaml"
...

As you can see, we create 4 identical containers with a single parameter changing:

We can just generate public port automatically and simplify the content of the docker-compose file; however, this will require us to change the configuration on the Prometheus side at each restart, which is not a desirable setup. All the rest parameters of all 4 Docker containers are the same.

Step #1.3. Launch Prometheus Blackbox Exporter

Once the configuration file and docker-compose files are created, launch the exporter side:


1
2
3
4
5
6
7
8
9
10
11
12
13
14
$ sudo docker-compose up -d
Creating network "web-monitoring_default" with the default driver
Pulling blackbox_http (prom/blackbox-exporter:master)...
master: Pulling from prom/blackbox-exporter
554879bb3004: Pull complete
9ec611452210: Pull complete
c17bd3230a15: Pull complete
57f96573daa0: Pull complete
Digest: sha256:1f01186d9bd9f8b090a730079a9912d0ae022d768c377d9dca62310682dc178e
Status: Downloaded newer image for prom/blackbox-exporter:master
Creating web-monitoring_blackbox_http_parser_1 ... done
Creating web-monitoring_blackbox_icmp_1        ... done
Creating web-monitoring_blackbox_dns_1         ... done
Creating web-monitoring_blackbox_http_1        ... done

Once launched, validate that containers are up and running:


1
2
3
4
5
6
$ sudo docker container ls
CONTAINER ID   IMAGE                           COMMAND                  CREATED          STATUS          PORTS                                       NAMES
629fe9a51a24   prom/blackbox-exporter:master   "/bin/blackbox_expor…"   40 minutes ago   Up 40 minutes   0.0.0.0:9115->9115/tcp, :::9115->9115/tcp   web-monitoring_blackbox_http_1
6cdf474c2ef3   prom/blackbox-exporter:master   "/bin/blackbox_expor…"   40 minutes ago   Up 40 minutes   0.0.0.0:9117->9115/tcp, :::9117->9115/tcp   web-monitoring_blackbox_dns_1
0a5055b92fea   prom/blackbox-exporter:master   "/bin/blackbox_expor…"   40 minutes ago   Up 40 minutes   0.0.0.0:9118->9115/tcp, :::9118->9115/tcp   web-monitoring_blackbox_http_parser_1
7fd2c7a4af9a   prom/blackbox-exporter:master   "/bin/blackbox_expor…"   40 minutes ago   Up 40 minutes   0.0.0.0:9116->9115/tcp, :::9116->9115/tcp   web-monitoring_blackbox_icmp_1

Step #2. Host with Prometheus

The setup of the Prometheus side doesn’t change:

Refer to the previous blogpost for further details.

Step #2.1. Prometheus Configuration File

As described earlier, we would need to have job for each exporter. In the provided scenario, there are 3 exporters, so you would need to have jobs for each of them. Moreover, as there were multiple Docker container created to serve each type of probes, there will be 4 probes per a blackbox exporter. It means that in total you would have 4 * 3 = 12 jobs.

You can downscale it back to 6 by combining ICMP, HTTP, and HTTP with parsing into a single one. DNS still will be outstanding, as it has a different mechanics.

The following Prometheus configuration file is used to implement the aforementioned scenario:


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
$ cat config/prometheus.yaml
---
global:
  scrape_interval: 1m
  scrape_timeout: 30s

scrape_configs:
# blackbox tests
  - job_name: 'blackbox-probe-http-1'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
        - https://karneliuk.com
        - https://www.openstack.org
        - https://github.com
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.51.72:9115
  - job_name: 'blackbox-probe-http-2'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
        - https://karneliuk.com
        - https://www.openstack.org
        - https://github.com
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.52.72:9115
  - job_name: 'blackbox-probe-http-3'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
        - https://karneliuk.com
        - https://www.openstack.org
        - https://github.com
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.53.72:9115
  - job_name: 'blackbox-probe-icmp-1'
    metrics_path: /probe
    params:
      module: [icmp_test]
    static_configs:
      - targets:
        - karneliuk.com
        - www.openstack.org
        - github.com
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.51.72:9116
  - job_name: 'blackbox-probe-icmp-2'
    metrics_path: /probe
    params:
      module: [icmp_test]
    static_configs:
      - targets:
        - karneliuk.com
        - www.openstack.org
        - github.com
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.52.72:9116
  - job_name: 'blackbox-probe-icmp-3'
    metrics_path: /probe
    params:
      module: [icmp_test]
    static_configs:
      - targets:
        - karneliuk.com
        - www.openstack.org
        - github.com
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.53.72:9116
  - job_name: 'blackbox-probe-dns-1'
    metrics_path: /probe
    params:
      module: [dns_test]
    static_configs:
      - targets:
        - 8.8.8.8
        - 9.9.9.9
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.51.72:9117
  - job_name: 'blackbox-probe-dns-2'
    metrics_path: /probe
    params:
      module: [dns_test]
    static_configs:
      - targets:
        - 8.8.8.8
        - 9.9.9.9
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.52.72:9117
  - job_name: 'blackbox-probe-dns-3'
    metrics_path: /probe
    params:
      module: [dns_test]
    static_configs:
      - targets:
        - 8.8.8.8
        - 9.9.9.9
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.53.72:9117
  - job_name: 'blackbox-probe-http-parser-1'
    metrics_path: /probe
    params:
      module: [http_parsing]
    static_configs:
      - targets:
        - https://karneliuk.com
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.51.72:9118
  - job_name: 'blackbox-probe-http-parser-2'
    metrics_path: /probe
    params:
      module: [http_parsing]
    static_configs:
      - targets:
        - https://karneliuk.com
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.52.72:9118
  - job_name: 'blackbox-probe-http-parser-3'
    metrics_path: /probe
    params:
      module: [http_parsing]
    static_configs:
      - targets:
        - https://karneliuk.com
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.53.72:9118
...

For each of the jobs:

This was the most complicated part in the Prometheus configuration. Finally, you need to launch the container.

Step #2.2. Prometheus Docker-compose File

This file isn’t changed since the previous blogpost:


1
2
3
4
5
6
7
8
9
10
11
12
13
14
$ cat docker-compose.yaml
---
version: "3.9"
services:
  prom:
    image: "prom/prometheus"
    ports:
      - "9090:9090"
    volumes:
      - "./config:/etc/prometheus"
      - "tsdb_persistent:/prometheus"
volumes:
  tsdb_persistent:
...

Step #2.3. Launch Prometheus

Using docker-compose, launch the instance of Prometheus:


1
2
3
4
$ sudo docker-compose up -d
Creating network "prometheus_default" with the default driver
Creating volume "prometheus_tsdb_persistent" with default driver
Creating prometheus_prom_1 ... done

Validate the newly created Docker container with Prometheus is up and running:


1
2
3
$ sudo docker container ls
CONTAINER ID   IMAGE             COMMAND                  CREATED       STATUS          PORTS                                       NAMES
e778afb49974   prom/prometheus   "/bin/prometheus --c…"   2 hours ago   Up 59 minutes   0.0.0.0:9090->9090/tcp, :::9090->9090/tcp   prometheus_prom_1

Let some time (5-10 minutes) for metrics to be scrapped before progressing further.

Validation

First of all, check that the proves are up in the “Status” -> “Targets” tab:

You would see the long list of probes. If they are up, this is a desired state and it is good. However, if they are down, then you would need to troubleshoot it (e.g., network connectivity, status of Docker container, firewall rules, correctness of the probes’ configuration, etc).

Next, navigate to “Graph” tab and choosing the “Graph” visualisation, start looking for the needed metrics:

Step #1. Reachability of Servers

For this use case, it is recommended to use ICMP provided the ICMP is not filtered on the server. You can use the probe_icmp_duration_seconds metric, which effectively contains the RTT of the ping together with the processing delay on the probe side. As we are collecting the metric from the different nodes, we need to aggregate the data to simplify the view per instance (monitored site), but yet per exporter node:

Step #2. Availability of the Web Service

For this case, it is recommended to use two metrics:

The first one will allow you to see, if the service is OK resulting in HTTP 200 response code, whereas the second one will show how quickly the user will get the response for the service.

However, this time, you would need to select the probes, which makes a simple HTTP GET without parsing. Therefore, in addition to aggregation of the time, you would need also add a filtering by the job name, which will exclude specific jobs:

HTTP Status Code
HTTP Response Time

Step #3. Resolution of FQDN

For DNS, it is recommended to use probe_dns_duration_seconds to asses how long does it take to make a resolution:

DNS Resolution Time

Step #4. Availability of Specific Content

The setup is the same as for the availability of the web services in step #2 with a single difference: filtering by job name is now inverted.

HTTP Response Time with RegEx Parsing

Examples in GitHub

You can find this and other examples in our GitHub repository.

Lessons Learned

The were two key lessons learned for us:

Conclusion

Prometheus provides a great capabilities for monitoring of user experience of publicly available services (e.g., web pages, e-commerce platforms, etc). The central management of the monitoring solution via a configuration file at Prometheus node reduces the operational expenses quite significantly, which together with Docker-based implementation of a backend and exporters makes the maintenance of the platform manageable even for small teams. In the upcoming blogpost about Prometheus, we’ll touch an example of creating your own exporter. Take care and good bye.

Need Help? Contract Us

If you need a trusted and experienced partner to automate your network and IT infrastructure, get in touch with us.

P.S.

If you have further questions or you need help with your networks, we are happy to assist you, just send us a message. Also don’t forget to share the article on your social media, if you like it.

BR,

Anton Karneliuk

Exit mobile version