
Docker is just one of those technologies that make you fall in love 💕 with it, and make you wonder how you’ve managed to live your life without it for so long. It has revolutionized the way we develop, deploy, and run applications. On the surface, Docker is just a tool that allows us to create reproducible environments. But this simple concept is executed so well that it manages to completely change how you approach app development and service deployment as a whole. This note covers the fundamental concepts behind this wonderful tool and guides you through its basic usage.

What is Docker?

Docker is a platform for developing, shipping, and running applications in containers. Containers are lightweight, standalone, and executable packages that include everything needed to run a piece of software, including the code, runtime, system tools, libraries, and settings.

Docker vs. Virtual Machines

While both Docker and virtual machines (VMs) provide isolation and virtualization, they differ in several key aspects:

  1. Resource Usage: Docker containers share the host system’s kernel and are more lightweight than VMs, which require a full OS for each instance.
  2. Startup Time: Containers can start almost instantly, while VMs typically take longer to boot up.
  3. Efficiency: Docker allows for better utilization of system resources, as containers consume only what they need.
  4. Portability: Docker containers are highly portable and can run on any system that supports Docker, regardless of the underlying OS.

Key Terms

1. Dockerfile

A Dockerfile is a text document containing a series of instructions used to build a Docker image. It serves as a blueprint for creating a reproducible environment for your application.

2. Image

An image is a read-only template used to create Docker containers. It contains all the code, dependencies, and configurations needed to run an application. Images can be as basic as just an operating system (e.g., Ubuntu, Alpine) or as complex as a full application stack (e.g., WordPress with Apache and MySQL).

3. Container

A container is a runnable instance of an image. It represents a single running process that has been spun up from an image. Containers are isolated from each other and the host system, providing a consistent environment for applications to run.

4. Image repository (Docker Hub)

Base images are retrieved (by default) from Docker Hub, one of several cloud-based registry services for storing and sharing Docker images. Docker Hub is run by Docker, Inc. itself and offers a vast collection of pre-built images that developers can use as a starting point for their applications. Some “official” images are built by the developers of specific tools (e.g., nginx, node, mysql), carrying their seal of approval. Docker Hub also allows users to push their custom images to public or private repositories, making it easy to share and distribute containerized applications across teams or with the wider community.

Key Concepts

Ephemeral Containers

Docker containers are ephemeral, meaning that they are designed to be temporary and disposable. This concept is fundamental to understanding how to effectively use Docker in your development and deployment workflows:

  1. Temporary by Design: Containers are intended to be stopped, destroyed, and replaced with minimal setup or configuration.

  2. Stateless Applications: The ephemeral nature of containers encourages the development of stateless applications, where the application doesn’t rely on the container’s state to function correctly.

  3. Data Persistence: Any data or changes made inside a container are typically lost when the container is removed (demonstrated in the short example after this list). To persist data, you need to use Docker volumes or bind mounts.

  4. Immutable Infrastructure: Ephemerality supports the concept of immutable infrastructure, where containers are never modified in place but rather replaced entirely with new versions.

  5. Scalability and Resilience: The ability to quickly spin up and tear down containers facilitates easy scaling and improves application resilience.

  6. Clean Environment: Each time a container starts, it provides a clean, consistent environment, reducing “it works on my machine” type issues.
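
You can see this ephemerality for yourself by writing a file inside a container, removing the container, and starting a fresh one from the same image. Here is a minimal sketch using the alpine image (the file name is arbitrary):

docker run --name demo alpine sh -c "echo hello > /note.txt && cat /note.txt"
docker rm demo
docker run --rm alpine cat /note.txt

The last command fails with “No such file or directory”: the new container starts from the original image, and nothing written in the first container survives its removal.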

The Single-Process Model

In Docker, each container is designed to run a single primary process. This process is typically defined by the CMD or ENTRYPOINT instruction in the Dockerfile. When this primary process exits, the container stops running. This design philosophy aligns with the Unix principle of “do one thing and do it well,” promoting simplicity and modularity in application architecture.

When an application requires multiple processes (e.g., a backend API and a database), it should be split across multiple containers. Such single-process containers are easier to manage, monitor, and debug. This approach also makes it easier to scale individual components of your application and gives you better control over resource allocation and usage. The improved isolation between the different parts of your application is also good from a security standpoint.

Key points about the single process model:

  1. Container Lifecycle: The lifecycle of a container is tied directly to its main process. When the process exits, the container stops (see the short demo after this list). A common mistake is starting a container without defining a running process. This causes the container to exit immediately after startup.

  2. Process ID 1: The main process in a container runs with Process ID (PID) 1, similar to the init process in a Linux system.

  3. Signal Handling: The primary process is responsible for handling system signals, such as SIGTERM for graceful shutdowns.

  4. Logging: Output (stdout and stderr) from the main process is captured by Docker’s logging system.
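
You can watch this lifecycle in action by starting a container whose main process exits after a few seconds. A minimal sketch using the alpine image:

docker run -d --name short-lived alpine sleep 5
docker ps
docker ps -a

While sleep is still running, docker ps lists the container as “Up”; once the process exits, docker ps -a shows it as “Exited (0)” because its PID 1 has finished.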

Getting Started with Docker

Installing Docker

Before we begin, make sure you have Docker installed on your system. You can download and install Docker from the official website.

Basic Docker Commands

  1. Pull an image:

    docker pull <image_name>:<tag>
    

    Example: docker pull nginx:1.23

  2. Run a container:

    docker run [options] <image_name>:<tag>
    

    Common options:

    • --name: Assign a custom name to the container
    • -p: Bind a port (host_port:container_port)
    • -d: Run in detached mode (background)
    • -it: Run interactively with a terminal
    • --rm: Remove the container when it exits

    Example: docker run -d -p 8080:80 --name my-nginx nginx:1.23

  3. List running containers:

    docker ps
    

    Use the -a flag to show all containers (including stopped ones)

  4. Start/Stop containers:

    docker start <container_name_or_id>
    docker stop <container_name_or_id>
    
  5. Remove a container:

    docker rm <container_name_or_id>
    
  6. List images:

    docker images
    
  7. Remove an image:

    docker rmi <image_name>:<tag>
    

Creating Docker Images

To create your own Docker image, you need to write a Dockerfile. Here’s a basic example for a Node.js application:

FROM node:19-alpine
WORKDIR /app/
COPY package.json .
RUN npm install
COPY src ./src
CMD ["node", "src/server.js"]

Let’s break down this Dockerfile:

  1. FROM node:19-alpine: Specifies the base image to use (Node.js 19 on Alpine Linux). You can specify the :latest tag to retrieve the newest version of a given image, but it is advisable to pin a specific version. After all, we are trying to achieve reproducibility here!
  2. WORKDIR /app/: Sets the working directory inside the container; Docker creates the directory if it does not exist
  3. COPY package.json .: Copies the package.json file to the working directory
  4. RUN npm install: Installs the Node.js dependencies
  5. COPY src ./src: Copies the application source code
  6. CMD ["node", "src/server.js"]: Specifies the command to run when the container starts

To build an image from this Dockerfile:

docker build -t my-node-app:1.0 .

The -t flag allows you to tag the image with a name and version.

Running Containers

Once you have an image, you can run it as a container:

docker run -d -p 3000:3000 --name my-node-app my-node-app:1.0

This command runs the container in detached mode (-d), maps port 3000 on the host to port 3000 in the container (-p 3000:3000), and names the container “my-node-app”.
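
After starting the container, a few standard commands help confirm that it is actually serving requests. This assumes the Node.js server inside the image listens on port 3000:

docker ps
docker logs -f my-node-app
curl http://localhost:3000

docker logs streams the application’s stdout and stderr (the -f flag follows the output), and curl hits the app through the published port.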

Connecting to Running Containers

Sometimes you need to interact directly with a running container, whether for debugging, maintenance, or running commands. Docker provides two main ways to connect to containers interactively:

To start a new container and immediately connect to it, use the following command:

docker run -it [yourImage] [shell]

For example:

docker run -it ubuntu:20.04 bash

This command does the following:

  • -i: Keeps STDIN open even if not attached, allowing you to interact with the container
  • -t: Allocates a pseudo-TTY, giving you a terminal-like experience
  • [yourImage]: Specifies the image to use (in this case, Ubuntu 20.04)
  • [shell]: Specifies the shell to use (in this case, bash)

This approach is useful when you want to start a new container specifically for interactive use. To connect to a container that’s already running, use the docker exec command:

docker exec -it [yourContainer] [shell]

For example:

docker exec -it my-running-container bash

This method is particularly useful for debugging or inspecting containers that are already running as part of your application. However, be cautious when making changes inside a running container. Remember that these changes are typically not persisted unless you’re using volumes or commit the changes to a new image.
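
If you do end up making changes inside a running container that you want to keep, one option is to commit the container’s filesystem to a new image with docker commit. This is a quick sketch with placeholder image names; for anything beyond ad-hoc experiments, updating the Dockerfile and rebuilding is the more reproducible route:

docker commit my-running-container my-debug-snapshot:1.0
docker run -it my-debug-snapshot:1.0 bash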

Multi-Stage Builds

Multi-stage builds allow you to create smaller, more efficient images. This technique is particularly useful for applications that require a build process, as it enables you to use one image for building your application and another for running it, without carrying over unnecessary build tools and dependencies to your final image.

In a multi-stage build, you use multiple FROM statements in your Dockerfile. Each FROM instruction can use a different base image, and each one begins a new stage of the build. You can selectively copy artifacts from one stage to another, using COPY --from=<stage_name_or_number>, leaving behind everything you don’t need in the final image.

The general syntax involves adding multiple FROM statements within your Dockerfile. The last FROM statement defines the final base image for your resulting Docker image.

Here’s a basic structure of a Dockerfile using multi-stage builds:

# Build stage
FROM node:14 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
RUN npm run build

# Production stage
FROM node:14-alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY package*.json ./
RUN npm install --only=production
CMD ["npm", "start"]

In this example:

  1. The first stage uses a full Node.js image to build the application.
  2. The second stage uses a lighter Alpine-based Node.js image for the production environment.
  3. Only the built artifacts and production dependencies are copied to the final image.

Concepts to keep in mind:

  1. Multiple FROM Statements: Each FROM instruction starts a new build stage.

  2. Naming Stages: You can name your stages using AS <name>. This makes it easier to reference them later.

  3. Copying Between Stages: Use COPY --from=<stage_name_or_number> to copy artifacts from one stage to another.

  4. Final Image: The last FROM statement defines the base image for your final Docker image.
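
Named stages can also be built on their own with the --target flag, which is handy when you want to debug the build stage without producing the final image. A short sketch, assuming the Dockerfile above (the tags are arbitrary):

docker build --target builder -t my-app:build-stage .
docker build -t my-app:1.0 .

The first command stops after the builder stage; the second runs all stages and tags the final, slimmer image.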

Volumes

Volumes are used to persist data generated by and used by Docker containers. They are especially useful for databases and stateful applications.

To create and use a volume:

  1. Create a volume:

    docker volume create my-volume
    
  2. Run a container with the volume:

    docker run -d -v my-volume:/data my-image
    
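You can list and inspect volumes to see where Docker stores their data, or use a bind mount to map a host directory straight into the container. A minimal sketch (the ./data path and my-image name are placeholders):

docker volume ls
docker volume inspect my-volume
docker run -d -v "$(pwd)/data:/data" my-image

Named volumes are managed by Docker and are the preferred option for databases; bind mounts are convenient in development when you want live access to files from the host.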

Container Networking

Docker provides various networking options to allow containers to communicate with each other and the outside world:

  1. Bridge Network: The default network for containers. Containers on the same bridge network can communicate with each other.

  2. Host Network: Removes network isolation between the container and the host system.

  3. Overlay Network: Enables communication between containers across multiple Docker hosts.

  4. Macvlan Network: Assigns a MAC address to a container, making it appear as a physical device on the network.

To create a custom network:

docker network create my-network

To run a container on a specific network:

docker run --network my-network my-image
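
On a user-defined network, containers can reach each other by name thanks to Docker’s built-in DNS. The sketch below reuses the nginx:1.23 image pulled earlier and the curlimages/curl image purely for testing; the names are arbitrary:

docker network create my-network
docker run -d --name web --network my-network nginx:1.23
docker run --rm --network my-network curlimages/curl http://web

The last command resolves the hostname web to the nginx container and prints its default welcome page.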

Docker Secrets

Docker Secrets provide a secure way to manage sensitive data such as passwords, API keys, and certificates. They are encrypted at rest and during transit in a Docker swarm.

To create a secret:

echo "mypassword" | docker secret create my_secret -

Secrets cannot be used with standalone containers; they are made available to services managed by Docker Swarm or defined in Docker Compose (see the next section). When a secret is granted to a container, it is mounted as a file at /run/secrets/<secret_name>.
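
As a sketch, the secret created above could be attached to a Swarm service and read from inside the container like this (the service and image names are placeholders, and the host is assumed to be part of a swarm, e.g. after running docker swarm init):

docker service create --name my-app --secret my_secret my-image
cat /run/secrets/my_secret

The cat command is what the application would run inside its container to read the secret’s value.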

Docker Compose

Docker Compose is a powerful tool for defining and running multi-container Docker applications. It allows you to use a YAML file to configure your application’s services, networks, and volumes, and then create and start all the services from your configuration with a single command.

Key Concepts

  1. Services: Containers that make up your application.
  2. Networks: How your containers communicate with each other.
  3. Volumes: Persistent data storage for your containers.

Installation

Docker Compose is included with Docker Desktop for Windows and Mac. For Linux, you may need to install it separately. Check the official Docker documentation for the most up-to-date installation instructions.

The “compose.yml” file

The compose.yml file is the heart of Docker Compose. It defines the services, networks, and volumes for your application. Here’s a basic structure:

version: '3'
services:
  web:
    build: .
    ports:
      - "5000:5000"
  redis:
    image: "redis:alpine"

Key Components of compose.yml

  1. version: Specifies the Compose file format version.
  2. services: Defines the containers to be run.
  3. build: Specifies the path to the Dockerfile.
  4. image: Specifies the image to start the container from.
  5. ports: Maps container ports to host ports.
  6. volumes: Mounts paths as volumes.
  7. environment: Adds environment variables.
  8. depends_on: Expresses dependency between services.

Basic Commands

  • docker compose up: Create and start containers
  • docker compose down: Stop and remove the containers and networks created by up (add --volumes or --rmi all to also remove volumes or images)
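
A typical development loop with these commands looks something like the sketch below; -d, --build, and -f are standard Compose flags, and web refers to the service defined in the example above:

docker compose up -d --build
docker compose ps
docker compose logs -f web
docker compose down

up -d --build rebuilds the images and starts the services in the background, ps lists them, logs -f follows a single service’s output, and down tears everything back down.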

Advanced Configuration

Environment Variables

You can use environment variables in your Compose file:

web:
  image: "webapp:${TAG}"

Use a .env file or export variables in your shell.
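
For example, a .env file placed next to the Compose file could provide the TAG value, which Compose picks up automatically:

TAG=1.2.3

You can then run docker compose config to print the resolved file and verify that ${TAG} was substituted.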

Extending Services

You can extend service definitions:

services:
  web:
    extends:
      file: common-services.yml
      service: webapp

Networks

Define custom networks for your services:

services:
  web:
    networks:
      - frontend
      - backend
networks:
  frontend:
    driver: bridge
  backend:
    driver: bridge

Volumes

Define named volumes for data persistence:

services:
  db:
    image: postgres
    volumes:
      - db-data:/var/lib/postgresql/data
volumes:
  db-data:

Docker Compose in Development vs Production

While Docker Compose is great for development and testing, it’s not typically used in production environments. For production, consider using orchestration platforms like Kubernetes or Docker Swarm.

However, you can use Compose for production with some adjustments:

  1. Use production-ready images
  2. Set appropriate environment variables
  3. Configure proper logging and monitoring
  4. Use secrets management for sensitive data

Best Practices

  1. Version Control: Keep your compose.yml in version control with your application code.
  2. Environment Separation: Use multiple Compose files for different environments (dev, test, prod).
  3. Don’t Abuse Extends: While useful, overusing extends can make your configuration hard to understand.
  4. Use Depends_on Wisely: Understand that depends_on doesn’t wait for a service to be “ready”, only for it to start.
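
If one service genuinely needs to wait for another to be ready, a common pattern in recent Compose versions is to combine depends_on with a healthcheck. A hedged sketch, assuming a Postgres database and the pg_isready utility that ships with the postgres image:

services:
  web:
    build: .
    depends_on:
      db:
        condition: service_healthy
  db:
    image: postgres:13
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U user"]
      interval: 5s
      timeout: 3s
      retries: 5

With this configuration, Compose starts web only after the db healthcheck reports healthy, rather than merely after the db container has started.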

Example: Web Application with Database and Cache

Here’s a more complex example of a compose.yml file for a web application with a database and cache:

version: '3'
services:
  web:
    build: .
    ports:
      - "8000:8000"
    depends_on:
      - db
      - redis
    environment:
      - DATABASE_URL=postgres://user:pass@db:5432/dbname
      - REDIS_URL=redis://redis:6379
  db:
    image: postgres:13
    volumes:
      - postgres_data:/var/lib/postgresql/data
    environment:
      - POSTGRES_DB=dbname
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=pass
  redis:
    image: redis:6
volumes:
  postgres_data:

This configuration sets up three services: a web application, a PostgreSQL database, and a Redis cache. It also defines a volume for persistent database storage.
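
With this file saved as compose.yml in the project root, you can bring the whole stack up and, for example, open a database shell inside the db service; the psql arguments below match the user and dbname values from the environment section:

docker compose up -d
docker compose exec db psql -U user dbname

docker compose exec runs a command inside an already-running service container, much like docker exec does for a single container.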

Wrap-up

Docker is a powerful tool with many more advanced features and use cases. As you become more comfortable with these basics, you can explore more complex scenarios and integrations to further improve your development and deployment workflows.

Remember to always refer to the official Docker documentation for the most up-to-date and detailed information.