Docker is one of the most popular containerization platforms available today. First released in 2013 by Solomon Hykes and his team at dotCloud, Docker has come a long way since then, and its popularity keeps growing as more and more top companies adopt it.
Docker has now become the de facto standard when people think about deploying their applications through containers. It has become the base for container-based orchestration for companies ranging from small-scale startups to large-scale enterprises. In this article, we will discuss the following topics –
- What is Docker?
- Docker container evolution.
- The underlying concept of Linux containers.
- Docker platform.
- What can you use Docker for?
- Docker architecture.
- How does Docker work?
- Docker objects.
- Dockerhub, Dockerfile, Docker run.
- Docker Swarm and Compose.
- Advantages of Docker.
- Why are Docker containers better than Virtual Machines?
We will walk you through each of the above-mentioned Docker concepts in detail. This will help you to understand the essence of Docker and containers in general. So without any further ado, let’s get started.
What is Docker?
Docker is a containerization platform that is used to build, test, and deploy applications inside isolated, packaged environments called containers. Docker allows you to isolate your applications from the underlying host machine infrastructure. This allows you to build and share applications quickly and easily. It increases security and portability, and allows developers to work on the same application without having to worry about dependency conflicts.
We can leverage Docker’s methodology to ship, test, and deploy code quickly, significantly reducing the delay between writing code and getting it to market. Originally built for Linux, Docker now supports Windows as well as macOS. Let’s consider the example below to understand the use of Docker.
A simple example to understand containers
Suppose you are working on a web application with your team, and you and another team member are both working on the front-end. Since the two of you are working on different machines, you need to work on the same copy of the website and keep it updated with each other’s changes. Also, suppose you have added a feature that requires an older version of a package, while your team member’s system has the newer version of that package installed. This might lead to dependency conflicts, as some libraries of that package might not work with the newer version.
A simple solution to all these problems is to build and deploy your applications in containers. Containers allow us to create an isolated, packaged environment to build, test, and deploy our applications, separated from the underlying host machine. The environment provided by a container already has the packages, libraries, system files, binaries, configurations, etc. installed in it. Hence, we can easily run our application and share it with our team members.
Docker Container Evolution
Containerization is not a new concept. Google has been using its own container technology for many years now, starting way back in the 2000s. Check out the list of container-based projects that started in or after 2000.
| Year of inception | Technology |
|---|---|
| 2000 | The container concept was introduced by FreeBSD jails. |
| 2003 | OS-level virtualization was released by the Linux-VServer project. |
| 2005 | Another OS-level virtualization project, Solaris Zones, was released. |
| 2007 | Google published research on generic process containers. |
| 2008 | LXC containers were released. |
| 2011 | Warden was announced by Cloud Foundry. |
| 2013 | Google open-sourced lmctfy. |
| 2013 | dotCloud announced the Docker project. |
| 2014 | Rocket (rkt) was announced by CoreOS. |
| 2016 | Windows Containers were released. |
The Underlying Concept of Linux Containers
Before we dive into the concept of Docker containers, it’s important to understand what Linux containers (LXC) are. Before the inception of containerization, organizations used to achieve isolation with the help of Virtual Machines using hypervisors like Hyper-V, Xen, etc. Virtual Machines run on top of the hardware of the underlying host machine. Containers, however, use OS-level virtualization to achieve the same result.
There are two important Linux concepts that we need to understand. These are – Userspace and Kernel Space.
- Userspace – The userspace contains all the code used to run user-specific programs such as applications and processes. When we start a program, the corresponding userspace process makes a system call to the kernel space.
- Kernel space – Kernel space is the heart of the Linux Operating System. It houses the kernel code, which deals with system storage, hardware, etc.
Containers as a Process
When we start any application on a Linux machine, such as an Nginx server, we essentially start a process. A process is a running instance of a program, self-contained and with a limited level of isolation. What Docker containers do is isolate these processes along with only those configurations and files that are required to run them. Hence, we can define a container as a process with the isolation of userspace components. This makes it feel as if we are running the application in a separate operating system.
A container can also be thought of as a group of processes, because the parent process, which is the container itself, can have several other processes running inside it.
For instance, when we start an Nginx container running as a process on our machine, the parent process is the Nginx master process, and there are several child processes such as workers, cache managers, etc. running inside it.
Each container has its own isolated userspace, which allows us to run multiple containers on a single host. However, this does not mean that each container has its own OS. Virtual Machines each ship their own OS, including the kernel, which makes them very heavy to run and makes them consume a lot of the underlying host’s resources. A container, in contrast, sits on top of the host’s OS and shares the kernel of the host OS. In fact, we can run containers based on multiple Linux distros on a single host.
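This kernel sharing is easy to verify: containers built from different distributions all report the host’s kernel version. A quick sketch, assuming Docker is installed and running on the host:

```shell
# Both containers report the same kernel release,
# because they share the host's kernel
docker run --rm ubuntu uname -r
docker run --rm alpine uname -r

# Compare with the host itself; all three lines match
uname -r
```

Only the userspace (libraries, binaries, package manager) differs between the two containers; the kernel underneath is the same.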
Linux Namespaces and Control Groups
Namespaces and control groups are the Linux features that isolate containers. We can understand this with the help of an example. Consider a building with several flats in it. Even though all the flats are inside the same building, each has its own identity. They are separated by concrete walls, and entry into a flat is only allowed with appropriate permission. Each household has its own electricity connection, water supply, etc.
In the same way, a single Linux host machine can run multiple containers, each isolated from the others. They have their own CPU share, memory, processes, IP, etc. This is achieved using Linux namespaces and control groups.
- Namespaces – These help Linux containers have their own mount points, process trees, IP addresses, etc. They provide boundaries to containers. The important namespaces on a Linux machine are pid, net, ipc, mnt, uts, user, and cgroup. These namespaces give each container its own network interfaces, IP address, and so on. The processes running inside a namespace do not have access to the world outside it.
- Control Groups – We don’t specify any CPU or memory limit when starting a service in Linux; the kernel allocates resources for us. However, we can explicitly set these limits using cgroups. With a large number of processes or containers running on a single Linux machine, there has to be a mechanism to limit the resources allocated to each of them, and Linux control groups help us do exactly that. They let us restrict the memory, CPU, and other resources allocated to each container. If we don’t do this, a single container can eat up all the resources and leave the others high and dry. The Docker tool abstracts these complex mechanisms for us and allows us to specify the limits using simple parameters.
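Docker exposes these kernel features through simple flags on docker run. The sketch below (assuming Docker is installed) caps a container’s memory and CPU via cgroups and gives it its own hostname via the uts namespace:

```shell
# Cap the container at 512 MB of RAM and 1.5 CPUs (enforced via cgroups),
# and give it its own hostname (provided by the uts namespace)
docker run --rm --memory=512m --cpus=1.5 --hostname=isolated-box \
    ubuntu sh -c 'hostname; nproc'
```

The container prints `isolated-box` as its hostname even though the host machine has a different one, illustrating that the container lives in its own namespace.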
The Docker Platform
Docker provides you with a containerization platform with tools to package and run an application in an isolated environment. The containers are very lightweight, and they contain all the files and packages needed to run the application. Consequently, we don’t need to rely on what packages or dependencies are installed on the host. Docker provides us with tools such as containers, images, registries, networks, etc. to build and share these applications easily. Hence, we can ensure that everyone on our team has the same container environment to run the applications.
Docker provides us with all the tools that are needed to manage the entire lifecycle of our containers.
- We can develop our applications and all of their components in a single container or even a microservices architecture built using multiple containers.
- We can even test the application and distribute it using Docker containers.
- Docker allows us to deploy our application in the production environment through containers or container-orchestration. Irrespective of whether the production environment is a virtual machine, a Linux host, or a cloud provider, the entire process is the same.
What can you use Docker for?
Docker allows quick and consistent application delivery.
Docker helps us streamline our application development lifecycle. It allows application developers to work in standardized environments using containers. Containers are great for CI and CD workflows. Let us consider the scenario below.
- The application developers in our team write their code in their own local machine.
- They share their code snippets with their team members using Docker containers.
- Using Docker, their work can be pushed to test environments, where automated as well as manual tests are run.
- If bugs are found, they can be fixed in the development container itself, and the container can then be redeployed to the test environment.
- When the testing is over, getting the bug fix to the customer can be done simply by pushing the new Docker image to the production environment.
Docker allows responsive scaling and deployment.
The container-based platform provided by Docker can run on a host machine, a virtual machine, or even on the cloud. The portability and lightweight nature of Docker containers make it easy to manage workloads dynamically. Scaling an application also becomes much easier, because we only need to spin up additional containers.
Running multiple workloads on a single hardware.
Docker provides a lightweight, fast, cost-effective alternative to virtual machines. Hence, we can use all the compute capacity of our machine to deploy multiple containers. It allows us to create high-density environments without having to purchase additional hardware.
Docker Architecture
The Docker architecture has changed multiple times since its inception. Initially, Docker ran on top of Linux Containers (LXC). It then moved to libcontainer in 2014, later to runc, a CLI following the OCI specs, and now runs on containerd. The Docker architecture is separated into 3 components –
- Docker Engine which uses dockerd.
- Docker containerd which uses containerd.
- Docker-runc which uses runc.
The Docker Engine comprises a server called the daemon, a REST API, and a CLI. The daemon runs continuously in the background as the dockerd service and is responsible for building the images. To manage the Docker objects, the daemon talks to containerd through its API.
containerd is a system daemon service that gets its instructions from the Docker daemon. runc is the container runtime of Docker; it is responsible for creating the isolation of containers using namespaces and cgroups.
When a user executes a docker run or docker build command using the Docker CLI (for example, from a Linux terminal), the CLI passes the instruction to the daemon through the REST API. The daemon creates the containers and images and responds back to the CLI through the same API.
How does Docker work?
Docker leverages the client-server architecture. The CLI talks to the daemon, which does the main tasks of creating, running, and sharing containers. They can either run on the same machine, or the client can talk to a daemon on a remote machine. They communicate using a REST API, over UNIX sockets or a network interface.
The daemon is responsible for listening to API requests. Hence, it is able to manage objects such as images, containers, networks, and volumes. Multiple daemons can also communicate with each other to manage a group of services.
The CLI is used to interact with the Docker server. When we execute commands such as docker run, the CLI sends the request through the API to the daemon (dockerd). The dockerd then carries out the appropriate actions.
There are many public and private registries that host tons of Docker images. The official Docker registry, called Dockerhub, is a public registry that contains several useful images such as Nginx, Apache, CentOS, Ubuntu, etc. Users can pull images directly from Dockerhub by executing the docker pull command through the client, and push their own custom images to Dockerhub using the docker push command.
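The pull and push flow can be sketched in a few commands (assuming Docker is installed and you are logged in to Dockerhub; `myuser/myapp` is a hypothetical repository name):

```shell
# Pull the official Nginx image from Dockerhub
docker pull nginx:latest

# Tag a locally built image with a Dockerhub repository name,
# then push it so teammates can pull it
docker tag myapp myuser/myapp:1.0
docker push myuser/myapp:1.0
```

The tag step is needed because the image name encodes the repository it will be pushed to.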
Docker Objects
When we use Docker, there are many objects that we interact with regularly, such as images, containers, volumes, and networks. Let’s discuss them one by one.
Images are blueprints of the container environments that we create. They are read-only templates. Often, an image is built on top of one or more base images. We pull these base images from a registry and use them along with other instructions inside a special file called a Dockerfile. You can either pull an image directly from registries such as Dockerhub or build your own image using a Dockerfile. We use the docker build command to build a new image from a Dockerfile, and the docker pull command to pull a base image from Dockerhub.
Containers are runtime instances of images. Containers are created when we execute docker run or docker start commands on images. For example, if you have a web image built using the Nginx base image from Dockerhub, you can execute the docker run command on it to create a container. Once we create a container from an image, a writable layer is created on top of that image. We can make changes inside the container, commit these changes, and create a new image layer.
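The writable layer and the commit step can be sketched as follows (assuming Docker is installed; `my-nginx` is a hypothetical image name):

```shell
# Start a container from the official Nginx image
docker run -d --name web nginx

# Change a file inside the running container's writable layer
docker exec web sh -c 'echo "hello" > /usr/share/nginx/html/index.html'

# Commit the modified container as a new image layer
docker commit web my-nginx:v1
```

The original nginx image is untouched; `my-nginx:v1` is the old image plus the new writable layer captured by the commit.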
Volumes are the solution to the problem of persistent storage. If you want the current container to access files from your host machine or from other containers, you can mount a directory as a volume. A single volume can also be shared among multiple containers.
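As a minimal sketch (assuming Docker is installed), a named volume can be created once and mounted into more than one container:

```shell
# Create a named volume and mount it into two containers;
# both see the same /data directory
docker volume create shared-data
docker run --rm -v shared-data:/data ubuntu sh -c 'echo hi > /data/msg'
docker run --rm -v shared-data:/data ubuntu cat /data/msg
```

The data written by the first container survives its removal and is visible to the second one, because the volume lives outside any container’s writable layer.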
Networks allow us to create a secure channel of communication between multiple containers. We can share resources and information between them by exposing ports.
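A user-defined bridge network can be sketched like this (assuming Docker is installed); containers attached to the same network can reach each other by container name:

```shell
# Create a network and attach two containers to it
docker network create app-net
docker run -d --name db --network app-net redis

# A second container on the same network reaches "db" by name
docker run --rm --network app-net redis redis-cli -h db ping
```

Containers outside `app-net` cannot reach the `db` container, which is what makes user-defined networks a simple isolation boundary.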
What is Dockerhub?
You can leverage tons of pre-built images that are available in public or private registries such as Dockerhub. We can either pull an image from Dockerhub directly or use it with the FROM instruction inside a Dockerfile. To do so, we first need to create an account on hub.docker.com. Next, we need to log in to Dockerhub from our command line using the docker login command. After that, we will be able to pull images straight from the command line. If an image already exists on your machine, Docker uses the cached copy instead of downloading a new one.
There are several useful official and vendor-specific images available on Dockerhub, such as Ubuntu, CentOS, Nginx, Python, Apache, Java, PHP, etc. We can use them to create a customized container environment for our application. Hence, we don’t need to go through the hassle of manually installing these packages. Moreover, we can write instructions for installing additional packages or libraries on top of these images using a Dockerfile.
Moreover, you can upload your own customized images on Dockerhub by creating a new repository inside it. This allows you to share your work with your team members.
What is Dockerfile?
A Dockerfile is a simple text file in which you write the instructions to create a customized Docker image. Almost all Dockerfiles start with a FROM instruction. Using this instruction, you can use any image from Dockerhub as your base image. Each instruction in a Dockerfile creates a new image layer on top of the previous layers, and the new layer contains only the differences from the previous ones. Some commonly used Dockerfile instructions are FROM, RUN, CMD, ENTRYPOINT, MAINTAINER, WORKDIR, COPY, ADD, LABEL, etc.
Using the RUN instruction, you can install packages. For instance, if you want containers based on this image to have Python installed, you can use the instruction – “RUN apt-get install -y python3”. This uses the apt-get command of Debian-based distros to install the python3 package.
Consider the Dockerfile below.
FROM ubuntu:latest
WORKDIR /app
COPY . .
RUN apt-get update && apt-get install -y python3 python3-pip
RUN pip3 install beautifulsoup4
CMD ["python3", "app.py"]
The FROM instruction in the above Dockerfile pulls an Ubuntu base image with the latest tag from Dockerhub. The WORKDIR instruction sets the default working directory inside the container; all the subsequent instructions take place inside this directory. The COPY instruction copies the files from the current directory on the host machine to the default working directory in the container. The RUN instructions specify commands that are executed while the image is being built; here, we install python3, pip, and the beautifulsoup4 library. The CMD instruction specifies the command that is executed when we invoke the docker run command on the image. We can build this image using the following command –
$ docker build -t image-name .
Here, the -t flag is used to specify the name (and optionally the tag) of the image. Since we have not specified a tag, Docker automatically applies the latest tag to the image. The . (dot) represents the path to the Dockerfile, which is the current directory in this case.
Docker Run Command
Once we have created our image using the build command, we can run a container from it using the run command. We can invoke the run command either on a custom image like the one we created above, or on an image that does not exist on our system but can be pulled from Dockerhub. To create a container from the image we built in the above example, we can execute this command.
$ docker run -it image-name bash
Here, the -i and -t flags are used to run the container in interactive mode so that we can provide input commands through the CLI. After the image name, we have specified the command that we need to execute inside the container. We have used the bash command, which gives us access to the bash shell of the container.
We can also invoke the run command on any image that does not exist in our machine. For instance, consider the command below.
$ docker run -i -t ubuntu /bin/bash
Once we execute the above command, the following things happen.
- The daemon checks for an image of the same name on our local machine. If it exists, it does not pull a new copy. If it doesn’t exist, the daemon runs the equivalent of the following command in the background to pull the image from Dockerhub.
$ docker pull ubuntu
- The daemon then automatically creates a new container by executing the following command in the background.
$ docker container create
- The daemon then allocates a writable filesystem as a final layer. This allows us to modify the image.
- It then creates a network interface connecting the container to a default network. This assigns an IP to the container.
- It then starts the container in interactive mode and executes /bin/bash so that we have access to the bash shell of the container.
Docker Swarm and Compose
Docker Swarm is a container orchestration tool that allows users to manage multiple containers deployed across different virtual machines or cloud servers. In simpler words, the user can manage all the containers through a single command line. Consequently, managing all the microservices in a microservices architecture becomes a piece of cake. Several worker nodes work under one or more manager nodes; the manager nodes assign tasks to the worker nodes, and the nodes are configured to work together in a cluster. Docker Swarm guarantees high availability of the application, provides load balancing, and encourages scalability.
Docker Compose allows us to define and run multi-container Docker applications. We use a YAML file to define the services, and each service defined inside the file runs in a separate container. For instance, if we have a web application with a Flask front end and a Redis backend, we can deploy two separate containers running the Flask and Redis services and define them inside a docker-compose YAML file. Each service’s environment can be defined in its own Dockerfile or imported directly from Dockerhub. Then, using a single command, we can easily start or stop all of these services.
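A Flask-and-Redis application like the one described above could be defined in a docker-compose.yml along these lines (a hedged sketch; the service names, ports, and file layout are illustrative):

```yaml
version: "3.8"
services:
  web:
    build: .             # builds the Flask image from a local Dockerfile
    ports:
      - "5000:5000"      # expose the Flask app on the host
    depends_on:
      - redis            # start Redis before the web service
  redis:
    image: redis:latest  # pulled directly from Dockerhub
```

With this file in place, docker-compose up starts both containers on a shared network, and docker-compose down stops and removes them.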
Advantages of Docker
Docker containers allow us to easily build, test, and deploy complex applications using one or more containers. Some of their advantages are –
- They allow isolation of application components and throttling. The applications are isolated from each other as well as from the underlying host infrastructure.
- They enable portability. A Docker container can be run on any machine. We can commit a container to an image, push it on any registry, and share it with our team members. They can also be converted into tarball archives.
- We can create a microservices architecture using Docker containers. Each microservice of the application can be deployed in a separate container, independent of the others. We can then deploy them as Compose or Swarm clusters depending on our requirements.
- They allow easy container management and orchestration. Also, scaling of services becomes easy with the help of Docker containers.
Why are Docker containers better than Virtual Machines?
Virtual Machines achieve isolation by sitting on top of the hardware infrastructure of the host machine, with the help of hypervisors. Each VM instance has its own kernel, and only the hardware resources are shared. Virtual Machines are quite large in size, usually in the range of gigabytes, and they make applications far less portable than containers do.
On the other hand, Docker containers sit on top of the OS of the underlying host. The containers don’t have separate kernels and hence, they are light in weight. The kernel namespace is shared between the containers, however, each container is still isolated because they have their own unique namespaces. Hence, the processes of one container do not affect the other. Containers are easy to share. However, VMs are considered to be more secure than containers.
To sum up, in this article, we have walked you through the ins and outs of Docker and its concepts. We started with a basic introduction to Docker, discussed the underlying technologies, architecture, objects, and advantages.
If you have any query, suggestion, or doubt, please mention it in the comments. Our experts will get back to you with a solution. If you liked this article, please give it a thumbs up in the comments. Our users’ happiness is what drives us to create quality content like this article on What is Docker?
Also, check out the complete series of tutorials on Docker here.