bannerbanner
IT Cloud
IT Cloud

Полная версия

IT Cloud

Язык: Русский
Год издания: 2021
Добавлена:
Настройки чтения
Размер шрифта
Высота строк
Поля
На страницу:
4 из 12

HTTP – Zabbix makes requests over http, for example printers

SNMP – a network protocol for communicating with network devices

IPMI is a protocol for communicating with server devices such as routers

In 2019, Gratner presented a rating of monitoring systems in its square:

** Dynatrace;

** Cisco (AppDynamics);

** New Relic;

** Broadcom (CA Technologies);

** Riverbed and Microsoft;

** IBM;

** Oracle;

** SolarWinds;

** Micro Focus;

** ManageEngine and Tingyun.

Not included in the square:

** Correlsense;

** Datadog;

** Elastic;

** Honeycomb;

** Instant;

** Jennifer Soft;

** Light Step;

** Nastel Technologies;

** SignalFx;

** Splunk;

** Sysdig.

When we run an application in a Docker container, all the standard output (what is displayed in the console) of the running program (process) is buffered. We can view this buffer with the docker logs name_container program . If we follow the Docker ideology – "one process, one container" – we can view the logs of an individual program. It is convenient to use the less and tail commands to view logs. The first program allows you to conveniently scroll through the logs with the keyboard arrows, search for the desired one based on matches and using a regular expression pattern, like the text editor vi . The second program displays the amount we need

An important criterion for ensuring smooth operation is the control of free space. So, if there is no space left, then the database will not be able to write data, with other components the situation can be more dire than the loss of new data. Docker has limit settings not only for individual containers, at least 10%. During imaging or container startup, an error may be thrown that the specified limits have been exceeded. To change the default settings, you need to tell the Dockerd server the settings, after stopping it with service docker stop (all containers will be stopped) and after resuming it with service docker start (the containers will be resumed). Settings can be set as options / bin / dockerd –storange-opt dm.basesize = 50G –stirange-opt

In Container, we have authorization, control over our containers, with the ability to create them for testing and see graphs on the processor and memory. More will require a monitoring system. There are quite a few monitoring systems, for example, Zabbix, Graphite, Prometheus, Nagios, InfluxData, OkMeter, DataDog, Bosum, Sensu and others, of which Zabbix and Prometheus are the most popular. The first is traditionally used, since it is the leading deployment tool, which admins love for its ease of use (all you need to do is to have SSH access to the server), low-level, which allows you to work not only with servers, but also with other hardware, such as routers. The second is the opposite of the first: it is focused exclusively on collecting metrics and monitoring, focused as a ready-made solution, and not a framework and fell in love with programmers, set it according to the principle, chose metrics and received graphs. The key feature between Zabbix and Prometheus is not in the preferences of some to customize in detail for themselves and others to spend much less time, but in the scope. Zabbix is focused on setting up work with a specific hardware, which can be anything, and often very exotic in a corporate environment, and for this entity, a manual collection of metrics is written, a schedule is manually configured. For a dynamically changing environment of cloud solutions, even if it is just a Docker container, and even more so if it is Kubernetes, in which a huge number of entities are constantly created, and the entities themselves, apart from the general environment, are not of particular interest, it is not suitable for this in Prometheus Service Discovery is built-in and navigation is supported for Kubernetes through the namespace, the balancer (service) and the group of containers (POD), which can be configured in Grafana in the form of tables. In Kubernetes, according to The News Stack 2017, Kubernetes User and Experience is used in 63% of cases, in the rest there are more rare cloud monitoring tools.

Metrics can be system (for example, CRU, RAM, ROM) and application (service and application metrics). System metrics are core metrics that are used by Kubernetes for scaling and the like and non-core metrics that are not used by Kubernetes. Here is an example of bundles for collecting metrics:

* cAdvisor + Heapster + InfluxDB

* cAdvisor + collectd + Heapster

* cAdvisor + Prometheus

* snapd + Heapster

* snapd + SNAP cluster-level agent

* Sysdig

There are many monitoring systems and services on the market. We will consider exactly OpenSource, which can be installed in your cluster. They can be divided according to the model of obtaining metrics: into those who collect logs by polling, and those who expect that metrics will be poisoned in them. The latter are simpler both in structure and in use on a small scale. An example would be InfluxDB, which is a database that you can write to. The downside of this solution is the difficulty of scaling both in terms of support and load. If all services write at the same time, then they can overload the monitoring system, especially since it is difficult to scale, since the endpoint is registered in each service. The first group to practice a pull model of interaction is Prometheus. It is also a database with a daemon that polls services based on their registrations in the configuration file and pulls labels in a specific format, for example:

cpu_usage: 2

cpu_usage {app: myapp}: 2

Prometheus is a mature product, it was developed in 2012, and in 2016 it was included in the CNCF (Cloud Native Computing Foundation) consortium. Prometheus consists of:

* TSDB (Time Series Satabase) database, which looks more like a storage queue for metrics, with a specified accumulation period, for example, a week, allowing hundreds of thousands of metrics to be processed per second. This base is local to Prometheus, does not support horizontal scaling, in the case of Prometheus it is achieved by raising several of its instances and sharding them. Prometheus supports data aggregation, which is useful for reducing the amount of accumulated data, as well as archiving the database from memory to disk.

* Service Discovery support Kubernetes in a box through a public API through polling PODs filtered according to the config on port 9121 of the TPC.

* Grafana (a separate product, added by default) – a universal UI with dashboards and charts that supports Prometheus via PromQL.

To return metrics, you can use ready-made solutions or develop your own. For the vast majority of system metrics there is an exporter, and for applied metrics, you often have to give your own metrics. Exporters are general and specialized. For example, NodeExporter provides most of the metrics, including those for processes, but there are two of them, and there are more specialized metrics. If you run Prometheus without exporters, then it will give out almost a thousand metrics, but these are the metrics of Prometheus itself, and there will be no node_ * prefixes in them. For these metrics to appear, you need to enable NodeExporter and write a URL to it in the Prometheus configuration to collect the metrics it provides. For NodeExporter, this can be localhost or the node address and port 9256. Usually, exporters specialize in product-specific metrics, for example:

** node_exporter – node metrics (CRU, Memory, Network);

** snmp_exporter – SNMP protocol metrics;

** mysqld_exporter – MySQL database metrics;

** consul_exporter – Consul database metrics;

** graphite_exporter – Graphite database metrics;

** memcached_exporter – Memcached database metrics;

** haproxy_exporter – HAProxy balancer metrics;

** CAdvisor – container metrics;

** process-exporter – detailed process metrics;

** metrics-server – CRU, Memory, File-descriptors, Disks;

** cAdvisor – a Docker daemon metrics – containers monitoring;

** kube-state-metrics – deployments, PODs, nodes.

Prometheus supports remote data writing (https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write), for example, to TSDB distributed storage for Prometheus – Weave Works Cortex, using a setting in the configuration, which allows data analysis from multiple Prometheus:

remote_write:

– url: "http: // localhost: 9000 / receive"

Let's consider his work on a ready-made instance. I'll take www.katacoda.com/courses/istio/deploy-istio-on-kubernetes for this and go through it. Our Prometheus is located on its standard port 9090:

controlplane $ kubectl -n istio-system get svc prometheus

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT (S) AGE

prometheus ClusterIP 10.99.70.170 9090 / TCP 6m59s

To open its UI, I'll go to the WEB tab and change the address 80 to 9090: https://2886795314-9090-ollie08.environments.katacoda.com/graph. In the input line, you need to enter the desired metric in the PromQL (Prometheus query language) language, as well as InfluxQL for InfluxDB and SQL for TimescaleDB. For example, I will enter "CRU", and it will display me a list containing it. There are two tabs under the line: a tab with a graph and a tab for displaying in a tabular form. I will be looking at a tabular view. I selected machine_cru_cores and clicked Execute. Common metrics usually have similar names, for example machine_cru_cores and node_cru_cores. The metrics themselves consist of the name, tags in brackets and the value of the metric, in the same form they need to be requested, in the same form they are displayed in the table.

machine_cpu_cores {beta_kubernetes_io_arch = "amd64", beta_kubernetes_io_os = "linux", instance = "controlplane", job = "kubernetes-cadvisor", kubernetes_io_arch = "amd64", kubernetes_io_hostname "=" controlplane ", kubernetes_io_hostname" = "controlplane"

machine_cpu_cores {beta_kubernetes_io_arch = "amd64", beta_kubernetes_io_os = "linux", instance = "node01", job = "kubernetes-cadvisor", kubernetes_io_arch = "amd64", kubernetes_io_hostname = "node01", kubernetes_io_hostname = "node01", kubernetes_io_hostname = "node01"

If the network is MEMORY, then you can select machine_memory_bytes – the size of the RAM on the machine (server or virtual):

machine_memory_bytes {beta_kubernetes_io_arch = "amd64", beta_kubernetes_io_os = "linux", instance = "controlplane", job = "kubernetes-cadvisor", kubernetes_io_arch = "amd64", kubernetes_io_hostname "}

machine_memory_bytes {beta_kubernetes_io_arch = "amd64", beta_kubernetes_io_os = "linux", instance = "node01", job = "kubernetes-cadvisor", kubernetes_io_arch = "amd64", kubernetes_io_hostname = "node901", kubernetes_io_hostname = "node901"

But in bytes it is not clear, so we will use PromQL to translate to Gb: machine_memory_bytes / 1000/1000/1000

{beta_kubernetes_io_arch = "amd64", beta_kubernetes_io_os = "linux", instance = "controlplane", job = "kubernetes-cadvisor", kubernetes_io_arch = "amd64", kubernetes_io_hostname = "controlplane", kubernetes_io25}

{beta_kubernetes_io_arch = "amd64", beta_kubernetes_io_os = "linux", instance = "node01", job = "kubernetes-cadvisor", kubernetes_io_arch = "amd64", kubernetes_io_hostname = "node01", kubernetes_io48}

Let's enter for memory_bytes to search for container_memory_usage_bytes – used memory. The list contains all containers and their current memory consumption, I will give only three:

container_memory_usage_bytes {beta_kubernetes_io_arch = "amd64", beta_kubernetes_io_os = "linux", container = "POD", container_name = "POD", id = "/ kubepods.slice / kubepods-besteffort.slice / kubepod-pods-beseff633. b6549e892baa8687e4e98a106024b5c31a4af077d7c5544af03a3c72ec8997e0.scope ", image =" k8s.gcr.io/pause:3.1 ", instance =" controlplane ", job =" kubernetes-cadvisor ", kubernetes-cadvisorname," kubernetes-cadvisor "," kubernetes-cadvisor "," kubernetes-cadvisor "," kubernetes-cadvisor " , name = "k8s_POD_etcd-controlplane_kube-system_0e619e5dc53ed9efcef63f5fe1d7ee71_0", namespace = "kube-system", pod = "etcd-controlplane", pod_name = "etcd-controlplane"} 45056

container_memory_usage_bytes {beta_kubernetes_io_arch = "amd64", beta_kubernetes_io_os = "linux", container = "POD", container_name = "POD", id = "/ kubepods.slice / kubepods-besteffort.slice / kubepods-pods-besteff2. 76711789af076c8f2331d8212dad4c044d263c5cc3fa333347921bd6de7950a4.scope ", image =" k8s.gcr.io/pause:3.1 ", instance =" controlplane ", job =" kubernetes-caduvisor ", kubernetes_dio_host , name = "k8s_POD_kube-proxy-nhzhn_kube-system_5a815a40-f2de-11ea-88d2-0242ac110032_0", namespace = "kube-system", pod = "kube-proxy-nhzhn", pod_name = "kube-proxy-450 nhz

container_memory_usage_bytes {beta_kubernetes_io_arch = "amd64", beta_kubernetes_io_os = "linux", container = "POD", container_name = "POD", id = "/ kubepods.slice / kubepods-besteffort.slice / kubepa-poda-besteffort. 24ef0e898e1bb7dec9854b67291171aa9c5715d7683f53bdfc2cef49a19744fe.scope ", image =" k8s.gcr.io/pause:3.1 ", instance =" node01 ", job =" kubernetes-caduvisor amuber, kubernetes_dio_arch ", kubernetes_dio_arch , name = "k8s_POD_kube-proxy-6v49x_kube-system_6473aeea-f2de-11ea-88d2-0242ac110032_0", namespace = "kube-system", pod = "kube-proxy-6v49x", pod_name = "kube-proxy-835549x

Let's set the label that is contained in the metrics to filter out one: container_memory_usage_bytes {container_name = "prometheus"}

container_MEMORY_usage_bytes {beta_Kubernetes_io_arch = "amd64", beta_Kubernetes_io_os = "linux", container = "prometheus", container_name = "prometheus", id = "/ kubePODs.slice / kubePODs-burstableODslice-burdeaf2.slice. b314fb5c4ce8894f872f05bdd524b4b7d6ce5415aeb3fb91d6048441c47584a6.scope ", image =" sha256: b82ef1f3aa072922c657dd2b2c6b59ec0ac88e69c447998291066e1f67e741d8 ", instance =" node01 ", JOB =" Kubernetes-cadvisor ", Kubernetes_io_arch =" amd64 ", Kubernetes_io_hostname =" node01 ", Kubernetes_io_os =" linux ", name =" k8s_prometheus_prometheus- 5b77b7d695-knf44_istio-system_eaf4e833-f2de-11ea-88d2-0242ac110032_0 ", namespace =" istio-system ", POD =" prometheus-5b77b7d695-knf44 ", POD_name =" prometheus-5b77b7d44

283443200

Let's bring in Mb: container_memory_usage_bytes {container_name = "prometheus"} / 1000/1000

{Beta_kubernetes_io_arch = "amd64", beta_kubernetes_io_os = "linux", container = "prometheus", container_name = "prometheus", id = "/ kubepods.slice / kubepods-burstable.slice / kubepods-burstable-podeaf4e833_f2de_11ea_88d2_0242ac110032.slice / docker-b314fb5c4ce8894f872f05bdd524b4b7d6ce5415aeb3fb91d6048441c47584a6 .scope ", image =" sha256: b82ef1f3aa072922c657dd2b2c6b59ec0ac88e69c447998291066e1f67e741d8 ", instance =" node01 ", job =" kubernetes-cadvisor ", kubernetes_io_arch =" amd64 ", kubernetes_io_hostname =" node01 ", kubernetes_io_os =" linux ", name =" k8s_prometheus_prometheus-5b77b7d695 -knf44_istio-system_eaf4e833-f2de-11ea-88d2-0242ac110032_0 ", namespace =" istio-system ", pod =" prometheus-5b77b7d695-knf44 ", pod_name =" prometheus-5b77b7d695-knf44 "}

286.18752

Filter by container_memory_usage_bytes {container_name = "prometheus", instance = "node01"}

beta_kubernetes_io_arch = "amd64", beta_kubernetes_io_os = "linux", container = "prometheus", container_name = "prometheus", id = "/ kubepods.slice / kubepods-burstable.slice / kubepods-burstable-podeaf4e833_f2de_11ea_88d2_0242ac110032.slice / docker-b314fb5c4ce8894f872f05bdd524b4b7d6ce5415aeb3fb91d6048441c47584a6. scope ", image =" sha256: b82ef1f3aa072922c657dd2b2c6b59ec0ac88e69c447998291066e1f67e741d8 ", instance =" node01 ", job =" kubernetes-cadvisor ", kubernetes_io_arch =" amd64 ", kubernetes_io_hostname =" node01 ", kubernetes_io_os =" linux ", name =" k8s_prometheus_prometheus-5b77b7d695- knf44_istio-system_eaf4e833-f2de-11ea-88d2-0242ac110032_0 ", namespace =" istio-system ", pod =" prometheus-5b77b7d695-knf44 ", pod_name =" prometheus-5b77b7d695-knf44 "}

289.890304

And on the second one it is not: container_memory_usage_bytes {container_name = "prometheus", instance = "node02"}

no data

There are also aggregate functions sum (container_memory_usage_bytes) / 1000/1000/1000

{} 22.812798976

max (container_memory_usage_bytes) / 1000/1000/1000

{} 3.6422983679999996

min (container_memory_usage_bytes) / 1000/1000/1000

{} 0

You can also group by labels instance: max (container_memory_usage_bytes) by (instance) / 1000/1000/1000

{instance = "controlplane"} 1.641836544

{instance = "node01"} 3.6622745599999997

You can perform actions with the same type of labels and filter out: container_memory_mapped_file / container_memory_usage_bytes * 100> 80

{Beta_kubernetes_io_arch = "amd64", beta_kubernetes_io_os = "linux", container = "POD", container_name = "POD", id = "/ kubepods.slice / kubepods-burstable.slice / kubepods-burstable-pode45f10af1ae684722cbd74cb11807900.slice / docker-5cb2f2083fbc467b8b394b27b69686d309f951450bcb910d509572aea9922806 .scope ", image =" k8s.gcr.io/pause:3.1 ", instance =" controlplane ", job =" kubernetes-cadvisor ", kubernetes_io_arch =" amd64 ", kubernetes_io_hostname =" controlplane ", kubernetes_io_os =" linux ", name = "k8s_POD_kube-controller-manager-controlplane_kube-system_e45f10af1ae684722cbd74cb11807900_0", namespace = "kube-system", pod = "kube-controller-manager-controlplane", pod_name = "kube-controller-manager-controlplane"}

80.52631578947368

You can look at the file system metrics using container_fs_limit_bytes, which produces a large list – I will give a few of it:

container_fs_limit_bytes {beta_kubernetes_io_arch = "amd64", beta_kubernetes_io_os = "linux", container = "POD", container_name = "POD", device = "/ dev / vda1", id = "/ kubepods.slice / kubepods-besteffort.subods / kubepods-besteffort.slice -besteffort-pod0e619e5dc53ed9efcef63f5fe1d7ee71.slice / docker-b6549e892baa8687e4e98a106024b5c31a4af077d7c5544af03a3c72ec8997e0.scope ", image =" k8s.gcr.io/pause:3.1 ", instance =" controlplane ", job =" kubernetes-cadvisor ", kubernetes_io_arch =" amd64 ", kubernetes_io_hostname = "controlplane", kubernetes_io_os = "linux", name = "k8s_POD_etcd-controlplane_kube-system_0e619e5dc53ed9efcef63f5fe1d7ee71_0", namespace = "kube-system", pod = "etcd-controlplane", pod_name "} etcd-controlplane =" etc

253741748224

container_fs_limit_bytes {beta_kubernetes_io_arch = "amd64", beta_kubernetes_io_os = "linux", container = "POD", container_name = "POD", device = "/ dev / vda1", id = "/ kubepods.slice / kubepods-besteffort.subods / kubepods-besteffort.slice -besteffort-pod5a815a40_f2de_11ea_88d2_0242ac110032.slice / docker-76711789af076c8f2331d8212dad4c044d263c5cc3fa333347921bd6de7950a4.scope ", image =" k8s.gcr.io/pause:3.1 ", instance =" controlplane ", job =" kubernetes-cadvisor ", kubernetes_io_arch =" amd64 ", kubernetes_io_hostname = "controlplane", kubernetes_io_os = "linux", name = "k8s_POD_kube-proxy-nhzhn_kube-system_5a815a40-f2de-11ea-88d2-0242ac110032_0", namespace = "kube-system", pod = "kube_name =", podhn "kube-proxy-nhzhn"}

253741748224

It contains the metrics of RAM through its device: "container_fs_limit_bytes {device =" tmpfs "} / 1000/1000/1000"

{beta_kubernetes_io_arch = "amd64", beta_kubernetes_io_os = "linux", device = "tmpfs", id = "/", instance = "controlplane", job = "kubernetes-cadvisor", kubernetes_io_arch = "amd64", kubernetes control_ioplane_host , kubernetes_io_os = "linux"} 0.209702912

{beta_kubernetes_io_arch = "amd64", beta_kubernetes_io_os = "linux", device = "tmpfs", id = "/", instance = "node01", job = "kubernetes-cadvisor", kubernetes_io_arch = "amd64", kubernetes_io_host , kubernetes_io_os = "linux"} 0.409296896

If we want to get the minimum disk, then we need to remove the RAM device from the list: "min (container_fs_limit_bytes {device! =" Tmpfs "} / 1000/1000/1000)"

{} 253.74174822400002

In addition to metrics that indicate the value of the metric itself, there are metrics and counters. Their names usually end in "_total". If we look at them, we will see an ascending line. To get the value, we need to get the difference (using the rate function) over a period of time (indicated in square brackets), something like rate (name_metric_total) [time]. Time is usually kept in seconds or minutes. The prefix "s" is used to represent seconds, for example 40s, 60s. For minutes – "m", for example, 2m, 5m. It is important to note that you cannot set a time shorter than the exporter polling time, otherwise the metric will not be displayed.

And you can see the names of the metrics that you could record along the path / metrics:

controlplane $ curl https://2886795314-9090-ollie08.environments.katacoda.com/metrics 2> / dev / null | head

# HELP go_gc_duration_seconds A summary of the GC invocation durations.

# TYPE go_gc_duration_seconds summary

go_gc_duration_seconds {quantile = "0"} 3.536e-05

go_gc_duration_seconds {quantile = "0.25"} 7.5348e-05

go_gc_duration_seconds {quantile = "0.5"} 0.000163193

go_gc_duration_seconds {quantile = "0.75"} 0.001391603

go_gc_duration_seconds {quantile = "1"} 0.246707852

go_gc_duration_seconds_sum 0.388611299

go_gc_duration_seconds_count 74

# HELP go_goroutines Number of goroutines that currently exist.

Raising the Prometheus and Graphana ligament

We examined the metrics in the already configured Prometheus, now we will raise Prometheus and configure it ourselves:

essh @ kubernetes-master: ~ $ docker run -d –net = host –name prometheus prom / prometheus

09416fc74bf8b54a35609a1954236e686f8f6dfc598f7e05fa12234f287070ab

essh @ kubernetes-master: ~ $ docker ps -f name = prometheus

CONTAINER ID IMAGE NAMES

09416fc74bf8 prom / prometheus prometheus

UI with graphs for displaying metrics:

essh @ kubernetes-master: ~ $ firefox localhost: 9090

Add the go_gc_duration_seconds {quantile = "0"} metric from the list:

essh @ kubernetes-master: ~ $ curl localhost: 9090 / metrics 2> / dev / null | head -n 4

# HELP go_gc_duration_seconds A summary of the GC invocation durations.

# TYPE go_gc_duration_seconds summary

go_gc_duration_seconds {quantile = "0"} 1.0097e-05

go_gc_duration_seconds {quantile = "0.25"} 1.7841e-05

Going to the UI at localhost: 9090 in the menu, select Graph. Let's add to the dashboard with the chart: select the metric using the list – insert metrics at cursor . Here we see the same metrics as in the localhost: 9090 / metrics list, but aggregated by parameters, for example, just go_gc_duration_seconds. We select the go_gc_duration_seconds metric and show it on the Execute button . In the console tab of the dashboard, we see the metrics:

go_gc_duration_seconds {instance = "localhost: 9090", JOB = "prometheus", quantile = "0"} 0.000009186 go_gc_duration_seconds {instance = "localhost: 9090", JOB = "prometheus", quantile = "0.25"} 0.000012056 = go_congc_ instance "localhost: 9090", JOB = "prometheus", quantile = "0.5"} 0.000023256 go_gc_duration_seconds {instance = "localhost: 9090", JOB = "prometheus", quantile = "0.75"} 0.000068848 go_gc_duration_seconds {instance = "localhost: 9090 ", JOB =" prometheus ", quantile =" 1 "} 0.00021869

by going to the Graph tab – their graphical representation.

Now Prometheus collects metrics from the current node: go_ *, net_ *, process_ *, prometheus_ *, promhttp_ *, scrape_ * and up. To collect metrics from Docker, we tell him to write his metrics in Prometheus on port 9323:

eSSH @ Kubernetes-master: ~ $ curl http: // localhost: 9323 / metrics 2> / dev / null | head -n 20

# HELP builder_builds_failed_total Number of failed image builds

# TYPE builder_builds_failed_total counter

builder_builds_failed_total {reason = "build_canceled"} 0

builder_builds_failed_total {reason = "build_target_not_reachable_error"} 0

builder_builds_failed_total {reason = "command_not_supported_error"} 0

builder_builds_failed_total {reason = "Dockerfile_empty_error"} 0

builder_builds_failed_total {reason = "Dockerfile_syntax_error"} 0

builder_builds_failed_total {reason = "error_processing_commands_error"} 0

builder_builds_failed_total {reason = "missing_onbuild_arguments_error"} 0

builder_builds_failed_total {reason = "unknown_instruction_error"} 0

# HELP builder_builds_triggered_total Number of triggered image builds

# TYPE builder_builds_triggered_total counter

На страницу:
4 из 12