Полная версия
IT Cloud
Eugeny Shtoltc
IT Cloud
Prologue
More than 70 (76) tools are considered in practice in the book:
* Google Cloud Platform, Amazone WEB Services, Microsoft Azure;
* console utilities: cat, sed, NPM, node, exit, curl, kill, Dockerd, ps, sudo, grep, git, cd, mkdir, rm, rmdir, mongos, Python, df, eval, ip, mongo, netstat, oc, pgrep, ping, pip, pstree, systemctl, top, uname, VirtualBox, which, sleep, wget, tar, unzip, ls, virsh, egrep, cp, mv, chmod, ifconfig, kvm, minishift;
* standard tools: NGINX, MinIO, HAProxy, Docker, Consul, Vagrant, Ansible, kvm;
* DevOps tools: Jenkins, GitLab CI, BASH, PHP, Micro Kubernetes, kubectl, Velero, Helm, "http load testing";
* cloud Traefic, Kubernetes, Envoy, Istio, OpenShift, OKD, Rancher ,;
* several programming languages: PHP, NodeJS, Python, Golang.
Containerization
Infrastructure development historyLimoncelli (author of "The Practice of Cloud System Administration"), who worked for a long time at Google Inc, believes that 2010 is the year of transition from the era of the traditional Internet to the era of cloud computing.
* 1985-1994 – the time of mainframes (large computers) and intra-corporate data exchange, in which you can easily plan the load
* 1995-2000 – the era of the emergence of Internet companies,
* 2000-2003
* 2003-2010
* 2010-2019
The increase in the productivity of a separate machine is less than the increase in cost, for example, an increase in productivity by 2 times leads to an increase in cost significantly more than 2 times. At the same time, each subsequent increase in productivity is much more expensive. Consequently, each new user was more expensive.
Later, in the period 2000-2003, an ecosystem was able to form, providing a fundamentally different approach:
* the emergence of distributed computing;
* the emergence of low-power mass equipment;
* maturation of OpenSource solutions, allowing you to install software on multiple machines, not bundled with a processor license;
* maturation of telecommunication infrastructure;
* increasing reliability due to the distribution of points of failure;
* the ability to increase performance if needed in the future by adding new components.
The next stage was unification, which was most pronounced in 2003-2010:
* providing in the data center not a place in the closet (power-location), but already unified hardware purchased in bulk for the whole cent;
* saving on resources;
* virtualization of the network and computers.
Amazon set another milestone in 2010 and ushered in the era of cloud computing. The stage is characterized by the construction of large-scale data cents with a deliberate surplus in capacity to obtain a lower cost of computing power due to wholesale, based on savings for oneself and a profitable sale of their surplus at retail. This approach is applied not only to computing power, infrastructure, but also software, forming it as services to reduce the cost of their use by selling them at retail to both large companies and beginners.
The need for uniformity of the environment
Usually, novice Linux developers prefer to work from under Windows, so as not to learn an unfamiliar OS and stuff their own cones on it, because before everything was far from so simple and so debugged. Often, developers are forced to work from under Windows because of corporate preferences: 1C, Directum and other systems run only on Windows, and the rest, and most importantly the network infrastructure, is tailored for this operating system. Working from Windows leads to a large loss of working time for both developers and DevOps on fixing both minor and major differences in operating systems. These differences start to show up from the simplest tasks, for example, that it may be easier to make a page in pure HTML. But an incorrectly configured editor will put in the BOM and line feeds accepted in Windows: "\ n \ r" instead of "\ n"). BOM, when gluing the header, body and footer of the page, will create indents between them, they will not be visible in the editor, since these are formed by bytes of meta information about the file type, which in Linux do not have such a meaning and are perceived as translation of the indentation. Other newlines in GIT do not allow you to see the difference you made, because the difference is on each line.
Now let's take the Front developer. At first glance, what is difficult, because JS (JavaScript), HTML and CSS are interpreted natively by the browser. Previously, the layout of all different pages was done – it was checked by the designer and the customer and was given to the PHP developer for integration with the framework or cms. In order not to edit the header on each page, and then find out for a long time when they started to differ and which one was more correctly used by HAML. HAML adds additional syntax to HTML to avoid boiling: loops, file connections, in our case, a single header and footer. But it requires a special program to transform HTML into pure HTML. In MS Windows, this is solved by installing the compiler program and connecting it to the IDE, since all these features are in the IDE WEBStorm. With CSS and its size, duplicates, dependencies and support for different browsers, everything is much more complicated – LESS was used there, and now he headed the more functional SASS and libraries of support for different browsers, which requires the RUBY compiler and such a bundle usually does not work the first time. And for JS, CoffeScript was used. All this needs to be run through compression and validation programs (HTML validation is usually not needed).
When the project starts to grow and ceases to be separate pages with "JS inserts", and becomes SPA (Single Page Application, one page web applications), where everything is created by JS, and already collectors (Galp, Grunt), package managers and NodeJS are not assembled, the difficulties are becoming more and more. All these programs are free and were originally developed for Linux, designed to work from the BASH console and under Windows do not always behave well and are difficult to automate in graphical interfaces, despite the efforts of the IDE developers. Therefore, many WEB developers have switched from MS Windows to MacOS, which is a fork of UNIX systems, BASH is natively built into it.
Docker as lightweight virtual machines
Initially, the problem of isolation of provisions and projects was solved by virtualization – system software that emulates at a certain level the environment, which can be hardware (a computer as a set of components such as a processor, RAM, network device and others, if necessary) or, less often, operating system. The system administrator chooses the amount of RAM (no more free), processor, network device. Installs the operating system and, if necessary, drivers, installs the necessary programs. If you need a workplace for a second developer, he does the same. To install programs, it looks into the / bin directory of the first one and installs the missing ones. And here the first quiet problem arises, which has not yet manifested itself, that the programs are installed in different versions, but this will be a headache for developers, if one developer has a lot of work, and the other does not, or a headache for this sysadmin – if the developer works, in production – not.
With the increase in the number of jobs, the following problems are added:
* Less than 30% of the performance of the parent system is available to you, because all the commands that the processor must execute are executed by the virtualization program. To increase performance, the VT-X processor mode allows the processor, in which the processor directly executes commands from the virtual environment, and in cases of incompatibility, it throws an exception. True, these throws are hundreds of times more expensive than ordinary commands, so adult virtualization systems (VirtualBox, VMVare, and others) try to filter and modify potentially incompatible commands, which can significantly increase performance.
* Each workstation has to be created anew, for this the system administrator writes a script that automates this process, but it is naturally not ideal, and coming to constantly update it, make patches of incompatibilities that were installed by the programmers.
* A linear increase in the occupied disk space from the number of containers, and an exponential increase from product versions, despite the fact that one instance takes up a lot of space. That is, each sandbox contains a program emulation instance, an operating system image, all installed programs and developer code, which is quite a lot. The only thing that is one thing is the installation of the virtual machine itself, and then only within the framework of one physical server. For example, if we have 10 developers, then the size will be 10 times larger, with 3 product versions – 30.
All these disadvantages for the WEB are intended to be solved by Docker. Hereinafter, we will talk about Docker for Linux, and will not consider the slightly different kernel implementation for FreeBSD, and the implementation for Windows 10 Professional, within which either a stripped-down Linux kernel or independent development of Windows containerization is purchased. The main idea is not to create a layer (virtualization of hardware, its own OS and hypervisor), but to differentiate rights. You do not multiply to put in the MS Windows container, but you can put both RedHut and Debian, since the kernel is one, and the differences are in the files, creating a sandbox (separate directories and prohibiting going beyond its chapels) with these files. Also, we are talking about WEB solutions, since for native solutions, problems may arise when the program needs to have exclusive access from the container (Docker sandbox) to the OS kernel, for example, for native rendering of windows. You can also limit the amount of memory, processor time, number of processes.
Lightweight Virtualization or Lightweight Isolation – A Look at Docker Implementation
Let's take a look at the history of the appearance of the prerequisites for the emergence of Docker, namely the prerequisites, since Docker itself does not implement isolation, let alone virtualization, but organizes work with it from the first. Unlike virtualization, which resembles a hangar with its own world and its own foundation, on which you can impose whatever your heart desires, for example, we give out a lawn, then for isolation you can draw an analogy with a fence. Isolation appeared in the Linux kernel gradually, in parts responsible for different levels, and in parallel, programs appeared to provide an interface and the concept of applying this isolation in real projects. Isolation consists of 6 types of resource limitation.
The first in the kernel was the isolation of the file system, which allows you to create a sandbox using the chroot command back in 1979, from outside the sandbox is completely visible, but when you go inside the folder over which the command is executed becomes root, and you will not be able to return. The next one was the delimitation of processes, so the sandbox exists and the host system as long as the process with pid (number) 1 exists. For the sandbox it is its own, outside the sandbox it is a normal process. Further, the steel distinctions of CGroups were tightened: user groups, memory and others. All this exists in the kernel of any Linux, regardless of whether you have Docker installed or not. Throughout history, attempts were made, OpenSource and commercial, to create containers, developing the functionality themselves, and similar solutions found their users, but they did not penetrate the masses. Docker at the beginning of its existence used a fairly stable but difficult to use LXC containerization solution. Gradually he replaced LXC with native CGroup. Docker also supports the salinity of its image (more on that later), but does not implement it itself, but uses UnuonFS (UFS).
Docker and disk space
Since Docker does not implement functionality, but uses the built-in Linux kernel, and does not have a graphical interface under the hood, it itself takes up very little space.
Since the container uses the host OS kernel, the base image (usually the OS) contains only complementary packages. So the Debian Docker image is 125Mb, and the ISO image is 290Mb. To check that one core is used in the container, we will display information about it: uname -a or cat / proc / version , and information about the container environment itself cat / etc / ussue
Docker builds an image based on the instructions in the Dockerfile, which can be located remotely or locally, and can be created from it at any time. Therefore, if you are not using the image at the moment, then you can delete it. An exception is the image created from the container using the Docker commit command , but it is not very correct to create this way, and you can always select from the Dockerfile image with the Docker history command and delete the image. The advantage of storing images is that you do not need to wait while it is being created: the OS and libraries are downloaded.
Docker itself uses an image called Image, which is built based on the instructions in the Dockerfile. When you create several containers on its basis, the space practically does not increase, since the container is just a process and config settings. When changing files in the container, the files themselves are not saved, but the changes made are saved, which will be deleted after the container is transferred. This guarantees in 99% of cases a completely identical environment, and as a result, it is not important to place preparatory operations common for all containers for installing specific programs in the image, a side effect of which is the absence of their duplication. To be able to save data, folders and files are mounted to the host (parent) system. Therefore, you can run a hundred or more containers on a regular computer, and you will not see any changes in the free local on the disk. At the same time, if the developers use git, and how can they do without it, and they often accumulate, then there may be no need to mount the folders with the source code.
The docker image is not a monolithic image of your product, but a layer cake of images, the layers of which are cached. This allows you to significantly save time for creating an image. Caching can be disabled with the build –no-cache = true command switch if Docker does not recognize that the data is mutable. Docker can see the changes in the ADD statement adding a file from the host system to the container by the hash of the file. So, if you create two containers, one with NGINX and the other with MySQL, both of which are based on Ubuntu 14.04, there will be three image layers: MySQL, NGINX, and Ubuntu. The images can be viewed with the Docker history command . It also works for your projects – when copying 2 versions of the code to your image using the ADD command with your product, you will have 3 layers and two images: the base one, with the code of the first version and the code of the second version, regardless of the number of containers. The number of layers is limited to 127. It is important to note that when cloning a project, you need to specify the version, not just git clone , but git clone –branch v1 and git clone –branch v2 , otherwise Docker will cache the layer created by the Git Clone command and when creating the second we get the same image.
Docker does not consume resources, but only limits them if it is specified in the settings when creating the container (for memory, the key is m, for the processor – c). Since Docker supports different containerization filesystems, there is no unified interface to customize. But, in any case, the resource is consumed as much as required, and not as much as allocated, as in virtual machines.
Such concern about the occupied disk space and the weightlessness of the containers themselves entails irresponsibility in downloading images and creating containers.
Garbage collection by container
Due to the fact that the container provides much more opportunities than a virtual machine, the situation is complicated by leaving garbage after Docker is running. The problem is solved simply by running the moore collector, which appeared in version 1.13 or more difficult for earlier versions by writing a script you need.
Just as simple to create a docker container run name_image , it is also simple to delete docker rm -f id_container . Often, in order to just experiment, it is convenient to run the container interactively docker run -ti name_image bash and we will immediately find ourselves in the container. When we exit it with Cntl + D , it will be stopped. To automatically remove the output field, use the –rm parameter . But because containers are so weightless, they are so easy to create, they are often thrown and not removed, which leads to their explosive growth. You can look at the running ones with the docker ps command , and at the stopped ones – docker ps -a . To prevent this, use the docker containers prune garbage collector , which was introduced in version 1.13 and which will remove all stopped containers. For earlier versions, use the docker rm $ script (docker ps -q -f status = exited) . If its launch is not desirable for you, most likely you are using Docker incorrectly, since it is almost as quick and easy to pull a container from an image as to restore it to work. If you need to save state in the container, then this is done by mounting folders or volumes.
A slightly more complicated situation is with images. When creating a container, if there is no image, it will be downloaded. Since one image can be for several containers, then when the container itself is deleted, it is not deleted. You will have to delete it manually docker rmi name_image , and if it is used, a warning will simply be issued. The cost of saving disk space comes at the cost of the fact that Docker cannot simply determine whether an image is needed yet or not. Since version 1.13, it can, using the docker imgae prune -a command , analyze which images are not used by containers and delete them. You need to be more careful here if Docker cannot get the image again, but the assumption of such a situation is not very correct. One such situation is the creation of a clustered image, while the Dockerfile config describing the process of its creation was lost, otherwise you can get the image from the Dockerfile using the docker build name_image command . It is correct to immediately take action and restore the Dockerfile from the image by looking at the commands that create images using Docker history name_image . The second situation is to create an image from a running container using the Docker commit command , and not from the Dockerfile, which is so actively popularized, but also actively deprecated.
Since an image consists of layers that are shared in different images, these layers remain in different emergency situations. Since we cannot use them separately, it is safe to delete them with the docker image prune command .
To save the results of the container's work, you can mount the host machine folder to the container folder. We can explicitly specify the folder on the host machine, for example, docker run -v / page_host: / page_container nama_image , or enable it to be generated by docker run -v / page_container nama_image . To remove generated folders (volumes) that are no longer used by containers, enter the Docker volume prune command . For the collection of unused networks, there is also a garbage collector.
There is also a single garbage collector, in fact, simply combining specialized docker system prune parameters into one with logically compatible parameters . There is a tendency to put it in crowns. You can also look at the space occupied by all containers, all images and all volumes using the docker system df command , and also without grouping – docker system df -v .
Many of the issues described here by garbage collection are handled by Docker-compose. In addition, it greatly simplifies life, unless you run the container once for experiments. So the command Docker-compose up starts the containers, and docker-compose down -v removes them, and all dependencies between them are also removed. All container launch parameters are described in Docker-compose.YML, as well as the relationships between them. Thanks to this, when changing the launch parameters of containers, you do not need to worry about deleting the old ones and creating new ones, you do not need to register all the parameters of the containers – just fill in with the up parameter , and it will either re-create or update the container configuration.
To prevent cluttering the system, Docker has a built-in configurable limit on the number of containers and images, reminding you to clean the system by running the garbage collector.
Saving time on container creation
We already met in the previous topic about images, about their layers and caching. Let's look at them in terms of container creation time. Why is this so important, after all, by analogy with virtualization, the system administrator started the creation of the container and while he passes it to the programmer, by this time he will definitely be assembled. It is important to note that a lot has changed since then, namely, the principles and requirements for the ecosystem and its use have changed. So, for example, if earlier the developer, having developed and tested his code at his workplace, sent it to the QA manager for testing it for compliance with business requirements, and when his turn comes to this code, the tester at his workplace will run this code and check … Now the infrastructure is handled by DevOps, which establishes a continuous process for delivering features developed by programmers, and containers are created automatically with each submission to the production branch for automated testing. At the same time, so that the work of some tests does not affect the work of others, a separate container is created for each test, and often the tests run in parallel in order to instantly show the result to the developer, while he remembers what he did and did not switch his attention to another task.
For standard programs: no need to install, no need to maintain
We often use a huge number of ready-made solutions. When choosing a solution, we are faced with a dilemma: on the one hand, it is more universal and more proven than we can afford to do, on the other hand, it is complex enough to figure out how to properly install and configure it ourselves, in order to install all dependencies, resolve conflicts, set up for initial use. Now installation and configuration has become much easier, standardized, low-level problems are largely absent. But before we continue, let's digress and take a look at the process from getting started to starting to use the app within the story:
* In those days, when all programs were written in assembler, the programs were distributed by mail, users had already installed and tested them, because testing in the companies was not provided. In case of problems, the user informed the developer about the problems to the company and, after fixing them, received by mail the already corrected version on the disk. The process is very long and the user tested it himself.
* During the distribution on disks, companies already wrote their software products in higher-level languages, tested them for different OS versions. Hereinafter, we will consider free software. The program already contained a MakeFile, which itself compiled and installed the program.
* Since the advent of the Internet, software is massively installed using package managers, when they exit, it is downloaded and installed from the remote OS repository. He tries to monitor and maintain the compatibility of the compatibility of programs. Further study and use of the program: how to start it, how to configure it, how to understand that it works falls on the user or the system administrator.
* With the advent of Docker Hub and WEB, applications are downloaded and run by a container. It usually does not need to be configured for initial operation.
For containers and images in general, the server can adjust the amount of free space and the occupied space. By default, 10G is allocated for all containers and images, while this volume should remain as dm.min_free_space = 5%, but it is better to put it in the config, which may have to be created as /etc/docker/daemon.json :