Docker Architecture and Components

Lecture Overview

In this session, we'll explore the architecture of Docker and its key components. Understanding how Docker is structured internally will give you a solid foundation for working with containers effectively. You'll learn how the different pieces of Docker fit together to create a coherent system for building, shipping, and running containerized applications.

Docker Architecture at a Glance

Docker uses a client-server architecture with several distinct components working together. Before diving into each component, let's get a high-level overview of how Docker is structured:

┌──────────────────────────────────────────────────────────────────┐
│                           Host Machine                           │
│                                                                  │
│  ┌──────────────┐       ┌─────────────────────────────────────┐  │
│  │              │       │        Docker Host (daemon)         │  │
│  │ Docker Client│<─────>│                                     │  │
│  │              │       │ ┌─────────┐ ┌─────────┐ ┌─────────┐ │  │
│  └──────────────┘       │ │Container│ │Container│ │Container│ │  │
│                         │ └─────────┘ └─────────┘ └─────────┘ │  │
│                         └──────────────────┬──────────────────┘  │
└────────────────────────────────────────────┼─────────────────────┘
                                             │
                                             ▼
                                   ┌───────────────────┐
                                   │  Registry         │
                                   │  (e.g. Docker Hub)│
                                   └───────────────────┘

Think of Docker as a transportation system. The Docker Client is like a dispatching office that sends instructions. The Docker Daemon (Server) is like the transportation hub that manages all the vehicles (containers). The Registry is like a warehouse storing vehicle designs (images) that the hub requests when needed.

Let's now explore each component in detail to understand what it does and how it interacts with the others.

Docker Client

The Docker Client is the primary way users interact with Docker. It's what you're using when you run commands starting with docker in your terminal.

Functions of the Docker Client

Interprets user commands
Communicates with the Docker daemon
Presents results back to the user
Supports both command line and programmatic interfaces

The client can communicate with a Docker daemon running on the same machine (the default configuration) or connect to a remote daemon running on another system.

Docker Client in action: When you run a command like docker run nginx, the client:

Parses your command
Connects to the Docker daemon
Sends instructions to pull the nginx image (if needed) and create a container
Returns output from the daemon back to your terminal

The Docker client is somewhat analogous to a remote control for your TV. It doesn't do the actual work of displaying content (that's the TV's job), but it sends instructions that control what happens. Similarly, the Docker client doesn't run containers itself but instructs the daemon to do so.

Common Client Commands

The Docker client provides commands for the entire container lifecycle:

docker build - Build an image from a Dockerfile
docker pull - Download an image from a registry
docker run - Create and start a container
docker ps - List running containers
docker stop - Stop a running container
docker rm - Remove a container
docker rmi - Remove an image

Each of these commands gets translated by the client into API calls to the Docker daemon.
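To make that translation concrete, here is a small Python sketch that pairs each command above with the Docker Engine API endpoint it roughly corresponds to. The dictionary and the helper name api_calls_for are made up for this example, and the real client also sends an API version prefix, query parameters, and request bodies that are left out here.

```python
# Illustrative mapping from CLI subcommands to Engine API endpoints.
# Version prefixes, query parameters, and request bodies are omitted.
CLI_TO_API = {
    "build": [("POST", "/build")],
    "pull": [("POST", "/images/create")],
    "run": [("POST", "/containers/create"), ("POST", "/containers/{id}/start")],
    "ps": [("GET", "/containers/json")],
    "stop": [("POST", "/containers/{id}/stop")],
    "rm": [("DELETE", "/containers/{id}")],
    "rmi": [("DELETE", "/images/{name}")],
}


def api_calls_for(command):
    """Return the (method, path) pairs a CLI subcommand roughly maps to."""
    return CLI_TO_API[command]


for method, path in api_calls_for("run"):
    print(method, path)
```

Notice that docker run is really two requests (create, then start), which is why a container can exist in a "created" state without running.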

Docker Daemon (dockerd)

The Docker daemon (often referred to as dockerd) is the heart of Docker. It's a background service that manages everything related to containers on a system.

Responsibilities of the Docker Daemon

Listening for API requests from the Docker client
Managing Docker objects (images, containers, networks, volumes)
Communicating with other daemons to manage distributed Docker services
Building, running, and distributing containers

Think of the Docker daemon as a factory manager. It receives blueprints (images), creates products (containers), manages resources, and oversees the entire production process.

Technical insight: The Docker daemon exposes a REST API that the client and other programs can use to interact with it. This API-driven design makes Docker highly automatable and integrable with other systems.
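Because the API is plain HTTP, you can reach the daemon without the docker CLI at all. The sketch below assumes a Unix-like host where the daemon listens on the default socket /var/run/docker.sock; the class and function names are invented for this example. It calls the Engine API's /_ping endpoint and returns None when no daemon can be reached.

```python
import http.client
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP connection that talks over a Unix domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=5.0):
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def ping_daemon(socket_path="/var/run/docker.sock"):
    """Return the body of GET /_ping, or None if the daemon is unreachable."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", "/_ping")
        return conn.getresponse().read().decode()
    except (OSError, http.client.HTTPException):
        return None
    finally:
        conn.close()
```

On a machine with a running daemon and permission to read the socket, ping_daemon() should return the string OK.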

Daemon Configuration

The daemon can be configured in various ways to control:

Security settings
Default container parameters
Storage locations
Networking options
Logging and debugging information

In production environments, proper daemon configuration is crucial for security and performance.
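On Linux, these settings usually live in /etc/docker/daemon.json. As a rough illustration, the Python sketch below builds such a file. The keys are documented daemon.json options, but the values (and the storage path in particular) are example choices, not recommendations.

```python
import json

# Keys are documented daemon.json options; values are examples only.
daemon_config = {
    "data-root": "/mnt/docker-data",  # storage location (hypothetical path)
    "log-driver": "json-file",        # default logging driver for containers
    "log-opts": {"max-size": "10m", "max-file": "3"},  # rotate container logs
    "live-restore": True,             # keep containers running if the daemon stops
    "debug": False,                   # verbose daemon logging switched off
}


def render_daemon_json(config):
    """Serialize the configuration the way it would be written to disk."""
    return json.dumps(config, indent=2, sort_keys=True)


print(render_daemon_json(daemon_config))
```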

Real-world context: In a development team, each developer runs their own Docker daemon on their local machine. In a production environment, you might have multiple servers each running a Docker daemon, potentially managed by an orchestration tool like Kubernetes.

Docker Images

Docker images are read-only templates used to create containers. They contain everything needed to run an application: code, runtime, libraries, environment variables, and configuration files.

Key Characteristics of Images

Immutable - once built, an image doesn't change
Layered - composed of multiple filesystem layers
Shareable - can be pushed to and pulled from registries
Versioned - typically tagged with version information

The layered nature of images is one of Docker's most powerful features. Let's explore it further.

Layered File System

Docker images are built using a layered approach where each layer represents a set of filesystem changes:

┌───────────────────────┐
│   Application Code    │  <-- Top layer
├───────────────────────┤
│   Application Deps    │
├───────────────────────┤
│   Runtime (e.g. Node) │
├───────────────────────┤
│   Base OS (e.g. Alpine)│  <-- Bottom layer
└───────────────────────┘

Each layer only stores the differences from the previous layer. This approach has several advantages:

Efficient storage - common layers are shared between images
Faster transfers - only new or changed layers need to be transferred
Build caching - unchanged layers can be reused in subsequent builds

Analogy: Think of Docker images like a stack of transparent sheets. Each sheet has some content drawn on it, and when stacked together, they form a complete picture. If you want to create a similar image, you can reuse most of the stack and just replace or add sheets as needed, rather than drawing everything from scratch.
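The storage and transfer savings are easy to see in a toy model. In the Python sketch below, an image is just an ordered list of layer names (hypothetical stand-ins for real SHA256 digests), and pulling an image only fetches the layers that are not already stored locally.

```python
def layers_to_pull(image_layers, local_layers):
    """Return the layers of an image that are not stored locally, in order."""
    return [layer for layer in image_layers if layer not in local_layers]


# Hypothetical layer names standing in for real layer digests.
python_39 = ["base-os", "python-3.9-runtime", "pip-tools"]
my_app = ["base-os", "python-3.9-runtime", "app-deps", "app-code"]

local_store = set(python_39)  # python:3.9 was pulled earlier
missing = layers_to_pull(my_app, local_store)
print(missing)  # only the two application layers need to be transferred
```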

Image IDs and Tags

Each Docker image has a unique identifier (a SHA256 hash) and can have multiple human-readable tags:

$ docker images
REPOSITORY    TAG       IMAGE ID       CREATED       SIZE
nginx         latest    ad4c705f24d3   2 weeks ago   133MB
python        3.9       a8bd5b274a97   3 weeks ago   915MB
python        3.10      98f52028b399   3 weeks ago   920MB

In this example:

REPOSITORY - The name of the image
TAG - A version or variant identifier (e.g., "latest", "3.9")
IMAGE ID - A unique identifier for the image (docker images shows only the first 12 characters of the SHA256 hash)

Tags are crucial for version management. For example, python:3.9 and python:3.10 refer to different versions of Python, while both being part of the "python" repository.

Best practice: Never rely on the "latest" tag in production environments. Always specify exact version tags to ensure consistency and prevent unexpected changes when images are updated.
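The idea behind hash-based identifiers is content addressing: the same content always produces the same ID, and any change produces a different one. The Python sketch below illustrates only that idea. Docker hashes its own serialized image metadata, so the JSON encoding and the function names here are assumptions made for this example.

```python
import hashlib
import json


def content_id(config):
    """Derive a sha256 identifier from a configuration dictionary."""
    payload = json.dumps(config, sort_keys=True).encode()
    return "sha256:" + hashlib.sha256(payload).hexdigest()


def short_id(full_id):
    """Return the 12-character form that listings such as docker images show."""
    return full_id.split(":", 1)[1][:12]


config = {"Cmd": ["nginx", "-g", "daemon off;"], "Env": ["NGINX_VERSION=example"]}
full = content_id(config)
print(full)
print(short_id(full))
```

A tag such as python:3.9 can be moved to point at a different image later, while an ID derived from content cannot, which is why exact identifiers give stronger guarantees than tags.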

Containers

If images are the blueprints, containers are the running instances created from those blueprints. A container is a runnable instance of an image.

Container Characteristics

Isolated - runs in its own set of kernel namespaces with limited visibility of the host system
Lightweight - shares the host kernel rather than running a full OS
Portable - runs the same way regardless of the infrastructure
Ephemeral - designed to be disposable and replaceable

When you create a container from an image, Docker adds a writable layer on top of the immutable image layers. This allows the container to modify files while keeping the original image unchanged.

┌───────────────────────┐
│    Writable Layer     │  <-- Container-specific layer
├───────────────────────┤
│   Application Code    │  
├───────────────────────┤
│   Application Deps    │  <-- Image layers (read-only)
├───────────────────────┤
│   Runtime (e.g. Node) │
├───────────────────────┤
│   Base OS (e.g. Alpine)│
└───────────────────────┘

Analogy: Think of a container as a kitchen. The image provides all the appliances, utensils, and basic ingredients (like flour and sugar). When you start cooking (run the container), you create new dishes and modify the kitchen's state. Those changes live in the container's writable layer: they survive a stop and restart, but they are discarded when the container is removed, and every new container starts again from the original kitchen. If you want to keep your changes, you need to create a new "blueprint" (image) from the current state, or store the data outside the container.
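The writable layer behaves much like Python's collections.ChainMap: reads fall through to the lower mapping, while writes land only in the top one. The sketch below uses that as a stand-in for copy-on-write. The file paths and contents are invented, and a real storage driver works on files and directories rather than dictionary entries.

```python
from collections import ChainMap

# Read-only image content, flattened into path -> content for simplicity.
image = {
    "/etc/nginx/nginx.conf": "worker_processes 1;",
    "/usr/share/nginx/html/index.html": "<h1>Welcome</h1>",
}


def new_container(image_files):
    """Give each container its own empty writable layer on top of the image."""
    return ChainMap({}, image_files)


container_a = new_container(image)
container_b = new_container(image)

# The write lands in container A's writable layer only.
container_a["/usr/share/nginx/html/index.html"] = "<h1>Changed in A</h1>"

print(container_a["/usr/share/nginx/html/index.html"])
print(container_b["/usr/share/nginx/html/index.html"])
```

Container B and the image itself still see the original file, which is why many containers can safely share one image.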

Container Lifecycle

Containers have a distinct lifecycle with several states:

Created - Container is defined but not started
Running - Container processes are executing
Paused - Container processes are temporarily suspended
Stopped - Container processes have terminated but the container still exists (docker ps -a reports this state as "Exited")
Removed - Container is deleted along with its writable layer

Container lifecycle commands:

# Create and run a container
$ docker run --name my-nginx -d nginx

# Pause a running container
$ docker pause my-nginx

# Unpause a container
$ docker unpause my-nginx

# Stop a container
$ docker stop my-nginx

# Start a stopped container
$ docker start my-nginx

# Remove a container
$ docker rm my-nginx

Understanding this lifecycle is crucial for managing containers effectively, especially in production environments where automatic restarts and health checks become important.
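One way to internalize the lifecycle is to treat it as a small state machine. The Python sketch below is a simplified model: the state and command names follow the list and commands above, but the transition table is an approximation for teaching purposes and leaves out cases such as forced removal.

```python
# (current state, command) -> next state; a simplified view of the lifecycle.
TRANSITIONS = {
    ("created", "start"): "running",
    ("running", "pause"): "paused",
    ("paused", "unpause"): "running",
    ("running", "stop"): "stopped",
    ("stopped", "start"): "running",
    ("created", "rm"): "removed",
    ("stopped", "rm"): "removed",
}


def next_state(state, command):
    """Return the state after a command, or raise if the command is not allowed."""
    try:
        return TRANSITIONS[(state, command)]
    except KeyError:
        raise ValueError(f"cannot {command} a {state} container") from None


state = "created"
for command in ["start", "pause", "unpause", "stop", "start", "stop", "rm"]:
    state = next_state(state, command)
print(state)
```

In this model, as with the real CLI, removing a running container is rejected unless you stop it first.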

Docker Registries

Docker registries are services that store and distribute Docker images, organized into repositories. They play a crucial role in the "build once, run anywhere" philosophy of Docker.

Registry Types

Public registries - Like Docker Hub, which hosts a vast collection of community and official images
Private registries - For organizations to store proprietary images securely
Local registries - Run within an organization's network for faster access and better control

Docker Hub is the default registry that Docker uses when you run commands like docker pull without specifying a registry.
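The defaults are easier to remember once you see how a short name expands. The Python sketch below is a simplified parser written for this lecture, not Docker's own code: it fills in the default registry, the implicit library/ namespace used for official images, and the latest tag. Digest references (name@sha256:...) are deliberately not handled.

```python
def parse_image_ref(ref):
    """Split an image reference into (registry, repository, tag)."""
    registry = "docker.io"
    remainder = ref
    first, slash, rest = ref.partition("/")
    # The first path component is a registry only if it looks like a host name.
    if slash and ("." in first or ":" in first or first == "localhost"):
        registry, remainder = first, rest
    # A colon in the last path component separates the tag from the repository.
    last_component = remainder.rsplit("/", 1)[-1]
    if ":" in last_component:
        repository, tag = remainder.rsplit(":", 1)
    else:
        repository, tag = remainder, "latest"
    # Official images on Docker Hub live under the implicit "library/" namespace.
    if registry == "docker.io" and "/" not in repository:
        repository = "library/" + repository
    return registry, repository, tag


for ref in ["nginx", "python:3.9", "my-registry.example.com/my-app:1.0"]:
    print(ref, "->", parse_image_ref(ref))
```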

Analogy: Docker registries are like libraries or bookstores. Docker Hub is like a public library with books (images) anyone can borrow. Private registries are like personal bookshelves where you keep books that are special to you or your organization. When you need a book, you first check if it's on your bookshelf (cache), and if not, you go to the library (registry) to get it.

Working with Registries

Common operations with registries include:

# Pull an image from Docker Hub
$ docker pull nginx:latest

# Tag an image for a specific registry
$ docker tag my-app:1.0 my-registry.example.com/my-app:1.0

# Push an image to a registry
$ docker push my-registry.example.com/my-app:1.0

# Pull from a specific registry
$ docker pull my-registry.example.com/my-app:1.0

In enterprise environments, organizations often maintain their own registries for several reasons:

Security - Control over who can access images
Compliance - Ensure all images meet organizational standards
Performance - Faster image pulls over the internal network
Reliability - No dependency on external services

Best practice: For production applications, always use a private registry with proper access controls. Scan images for vulnerabilities before pushing them to your registry, and implement policies about which external images can be pulled.

Docker Storage

Docker provides several options for managing data in containers. Understanding these is crucial because containers are ephemeral by design - when a container is removed, any data that was written to its writable layer is lost.

Storage Options

Volumes - The preferred way to persist data in Docker
Bind mounts - Mount a host directory into a container
tmpfs mounts - Store data in the host's memory only

Docker Volumes

Volumes are the preferred mechanism for persisting data generated and used by Docker containers. Some key benefits of volumes include:

They are completely managed by Docker
They can be more safely shared among multiple containers
Volume drivers allow for storing volumes on remote hosts or cloud providers
They're easier to back up or migrate than bind mounts
They can be pre-populated with data from a container

┌─────────────────────────────────────────┐
│               Host System                │
│                                          │
│  ┌──────────────┐    ┌───────────────┐  │
│  │  Container A │    │  Container B  │  │
│  │              │    │               │  │
│  │              │    │               │  │
│  └──────┬───────┘    └───────┬───────┘  │
│         │                    │          │
│         │                    │          │
│         ▼                    ▼          │
│  ┌──────────────────────────────────┐   │
│  │            Volume                │   │
│  │                                  │   │
│  └──────────────────────────────────┘   │
│                                          │
└─────────────────────────────────────────┘

Working with volumes:

# Create a volume
$ docker volume create my-data

# Run a container with a volume
$ docker run -v my-data:/app/data nginx

# List volumes
$ docker volume ls

# Inspect a volume
$ docker volume inspect my-data

# Remove a volume
$ docker volume rm my-data

# Clean up unused volumes
$ docker volume prune

Analogy: Think of volumes like external hard drives for your containers. The container itself might be temporary, but the external drive (volume) persists and can be connected to different containers over time. This separation of compute (container) from storage (volume) is a fundamental pattern in cloud-native architecture.

Bind Mounts

Bind mounts have been around since the early days of Docker. They allow you to mount a file or directory on the host machine into a container. The main differences from volumes are:

Bind mounts depend on the host machine's filesystem structure
Non-Docker processes on the host can modify them directly
They're often used in development for live code reloading

Using bind mounts:

# Mount the current directory into a container (quoted so paths containing spaces work)
$ docker run -v "$(pwd)":/app nginx

tmpfs Mounts

tmpfs mounts are stored in the host system's memory only, never written to the host system's filesystem. This is useful for storing sensitive information that you don't want to persist.

Using tmpfs mounts:

# Create a container with a tmpfs mount
$ docker run --tmpfs /app/temp nginx

Best practice: For production applications, always use named volumes for persistent data and clearly document what data needs to persist. In development, bind mounts are often convenient for code changes, but volumes should still be used for databases and other stateful components.
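The -v flag serves both volumes and bind mounts, which can be confusing at first. The Python sketch below shows a simplified version of the rule for Linux-style paths: a source that is an absolute path is a bind mount, any other source is a named volume, and a lone container path creates an anonymous volume. The function name is made up, and options such as :ro and Windows drive letters are ignored.

```python
def classify_mount(spec):
    """Classify a -v/--volume argument (simplified, Linux-style paths only)."""
    parts = spec.split(":")
    if len(parts) == 1:
        # Only a container path was given, so Docker creates an anonymous volume.
        return {"type": "anonymous volume", "source": None, "target": parts[0]}
    source, target = parts[0], parts[1]
    kind = "bind mount" if source.startswith("/") else "named volume"
    return {"type": kind, "source": source, "target": target}


for spec in ["my-data:/app/data", "/home/me/project:/app", "/cache"]:
    print(spec, "->", classify_mount(spec)["type"])
```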

Docker Networking

Docker's networking subsystem is pluggable, using drivers. Several drivers exist by default, and you can install third-party drivers as well. Each driver offers specific features and capabilities.

Network Drivers

bridge - The default network driver. Containers can communicate with each other if they're on the same bridge network.
host - Removes network isolation between the container and the host. The container shares the host's networking namespace.
overlay - Connects multiple Docker daemons across different hosts, enabling swarm services to communicate.
macvlan - Assigns a MAC address to a container, making it appear as a physical device on your network.
none - Disables all networking for a container.

Bridge Networks

The bridge driver creates a private network internal to the host. Containers on this network can communicate with each other, and the host can forward traffic to the external world.

┌─────────────────────────────────────────────────────┐
│                  Host System                         │
│                                                      │
│  ┌────────────┐    ┌────────────┐    ┌────────────┐ │
│  │Container A │    │Container B │    │Container C │ │
│  │  172.17.0.2│    │  172.17.0.3│    │  172.17.0.4│ │
│  └─────┬──────┘    └─────┬──────┘    └─────┬──────┘ │
│        │                 │                 │        │
│        └─────────┬───────┴─────────┬───────┘        │
│                  │                 │                │
│           ┌──────┴─────────────────┴──────┐         │
│           │      Bridge Network           │         │
│           │         172.17.0.0/16         │         │
│           └───────────────┬───────────────┘         │
│                           │                         │
│                     ┌─────┴─────┐                   │
│                     │  eth0     │                   │
└─────────────────────┼───────────┼─────────────────┘
                      │           │
                      │  Internet │
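The addresses in the diagram follow from the subnet. As a rough sketch, the Python code below hands out addresses from 172.17.0.0/16 in order, treating the first usable address as the bridge's gateway. The sequential allocation is an assumption that matches the diagram; the real address manager also tracks released addresses and custom ranges.

```python
import ipaddress


def allocate_addresses(cidr, count):
    """Return the gateway and `count` container addresses from a bridge subnet."""
    hosts = iter(ipaddress.ip_network(cidr).hosts())
    gateway = next(hosts)  # first usable address, held by the bridge itself
    containers = [next(hosts) for _ in range(count)]
    return str(gateway), [str(address) for address in containers]


gateway, containers = allocate_addresses("172.17.0.0/16", 3)
print("gateway:", gateway)
print("containers:", containers)
```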

Working with networks:

# List networks
$ docker network ls

# Create a bridge network
$ docker network create my-network

# Run a container on a specific network
$ docker run -d --network=my-network --name=container1 nginx

# Connect a running container to a network
$ docker network connect my-network container2

# Inspect a network
$ docker network inspect my-network

# Disconnect a container from a network
$ docker network disconnect my-network container1

# Remove a network
$ docker network rm my-network

Real-world application: In a typical web application architecture, you might create a custom bridge network for your application. Your frontend container, backend API container, and database container would all connect to this network, allowing them to communicate with each other using their container names as hostnames, while isolating them from other containers on the system.

Container DNS

One important feature of Docker networking is automatic DNS resolution between containers. Containers on the same user-defined network can resolve each other by name.

Example: If you have two containers named web and db on the same network, the web container can connect to the db container simply by using the hostname db in its configuration.

# Create a network
$ docker network create app-network

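# Start a database container (the official postgres image will not start without a superuser password)
$ docker run -d --name db --network app-network -e POSTGRES_PASSWORD=postgres postgres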

# Start a web container that can connect to the database using hostname "db"
$ docker run -d --name web --network app-network -e DATABASE_URL=postgres://postgres:postgres@db:5432/postgres my-web-app

Best practice: Always create custom networks for your applications rather than using the default bridge network. This provides better isolation, automatic DNS resolution between containers, and more control over your network configuration.
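The difference between the default bridge and a user-defined network can be modeled in a few lines. The Python class below is a toy written for this lecture: it only captures the rule that name-based lookup works on user-defined networks, and the IP addresses are invented.

```python
class Network:
    """Toy model of container name resolution on a Docker network."""

    def __init__(self, name, user_defined):
        self.name = name
        self.user_defined = user_defined
        self.members = {}  # container name -> IP address

    def connect(self, container, ip):
        self.members[container] = ip

    def resolve(self, container):
        # Only user-defined networks offer lookup by container name.
        if not self.user_defined:
            return None
        return self.members.get(container)


app_network = Network("app-network", user_defined=True)
app_network.connect("db", "172.18.0.2")
app_network.connect("web", "172.18.0.3")

default_bridge = Network("bridge", user_defined=False)
default_bridge.connect("db", "172.17.0.2")

print(app_network.resolve("db"))
print(default_bridge.resolve("db"))
```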

Docker Compose

While not strictly part of the core Docker architecture, Docker Compose is an essential tool that works with Docker to define and run multi-container applications.

What is Docker Compose?

Docker Compose is a tool for defining and running multi-container Docker applications. With Compose, you use a YAML file to configure your application's services, networks, and volumes. Then, with a single command, you create and start all the services from your configuration.

Sample docker-compose.yml file:

version: '3'

services:
  web:
    build: ./web
    ports:
      - "5000:5000"
    volumes:
      - ./web:/code
    depends_on:
      - db
      - redis

  db:
    image: postgres:12
    volumes:
      - postgres_data:/var/lib/postgresql/data
    environment:
      - POSTGRES_PASSWORD=password

  redis:
    image: redis:6

volumes:
  postgres_data:

This file defines a three-service application:

A web service built from the Dockerfile in the ./web directory
A db service using the postgres:12 image
A redis service using the redis:6 image

It also configures:

Port mapping for the web service
Volume mounts for code and database data
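Service dependencies (depends_on controls start order only; it does not wait for db or redis to be ready to accept connections)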
Environment variables
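The depends_on entries form a small dependency graph, and the start order is a topological sort of that graph. The Python sketch below (3.9 or newer, for the graphlib module) reproduces the ordering for the sample file. The dictionary is typed in by hand rather than parsed from YAML, and this is an illustration of the idea, not how Compose itself is implemented.

```python
from graphlib import TopologicalSorter

# service -> services it depends on, copied from the sample file above
depends_on = {
    "web": {"db", "redis"},
    "db": set(),
    "redis": set(),
}


def startup_order(dependencies):
    """Return an order in which every service starts after its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


order = startup_order(depends_on)
print(order)
```

The relative order of db and redis is not fixed, since neither depends on the other; only web is guaranteed to come last.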

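Basic Docker Compose commands (shown with the standalone docker-compose binary; with Compose V2 the same subcommands are run as docker compose):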

# Start services
$ docker-compose up

# Start services in the background
$ docker-compose up -d

# Stop services
$ docker-compose down

# Stop services and remove volumes
$ docker-compose down -v

# View logs
$ docker-compose logs

# Run a command in a service
$ docker-compose exec web python manage.py migrate

Analogy: If Docker is like having individual appliances in your kitchen, Docker Compose is like having a single control panel that turns on all the appliances you need for a specific recipe, configured exactly as required. Instead of turning on the stove, then the mixer, then the blender individually, you just press one button labeled "Make Cake" and everything is set up correctly.

We'll explore Docker Compose in much more depth in tomorrow's session, but it's important to understand how it fits into the overall Docker architecture as a higher-level tool that works with the core components we've discussed.

How Components Work Together

Let's trace through a typical workflow to see how all these Docker components interact:

Example Workflow: Running a Container

Client Instruction: You issue docker run nginx in your terminal
Client Processing: The Docker client formats this as an API request to the daemon
Daemon Image Check: The daemon checks if the nginx image exists locally
Registry Interaction: If not found locally, the daemon pulls the image from Docker Hub
Image Download: The registry sends the image layers to the daemon
Container Creation: The daemon creates a new container based on the image
Storage Setup: The daemon sets up any necessary storage (volumes or bind mounts)
Network Configuration: The daemon connects the container to the appropriate network
Container Start: The daemon starts the container processes
Output Return: The daemon streams output back to the client

┌──────────┐         ┌──────────┐         ┌──────────┐
│  Client  │ ──────▶ │  Daemon  │ ◀─────▶ │ Registry │
└──────────┘         └──────────┘         └──────────┘
                          │
                          ▼
                     ┌──────────┐
                     │  Images  │
                     └──────────┘
                          │
                          ▼
                     ┌──────────┐
                     │Containers│
                     └──────────┘
                       ▲      ▲
                       │      │
                 ┌─────┘      └─────┐
                 │                  │
           ┌──────────┐      ┌──────────┐
           │ Volumes  │      │ Networks │
           └──────────┘      └──────────┘

This workflow demonstrates how the client, daemon, registry, images, containers, storage, and networking all work together to create a functioning containerized application.
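The workflow above can be condensed into a toy simulation. The Python function below is only a sketch: the step descriptions and the sets standing in for the local image cache and the registry are invented for this example, but the branching mirrors the image check and pull described in the steps.

```python
def docker_run(image, local_images, registry_images):
    """Return the steps taken for `docker run <image>` in this toy model."""
    steps = [f"client: send create and start requests for {image}"]
    if image in local_images:
        steps.append(f"daemon: {image} found locally")
    elif image in registry_images:
        steps.append(f"daemon: pull {image} from registry")
        local_images.add(image)  # the pulled image is now cached locally
    else:
        raise LookupError(f"{image} not found locally or in the registry")
    steps += [
        "daemon: create container",
        "daemon: set up storage",
        "daemon: connect network",
        "daemon: start container",
        "daemon: stream output to client",
    ]
    return steps


local = set()
registry = {"nginx"}
first_run = docker_run("nginx", local, registry)
second_run = docker_run("nginx", local, registry)
print(first_run[1])
print(second_run[1])
```

The second run skips the pull because the image is already in the local cache, which is why repeated docker run commands start much faster than the first one.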

Docker Architecture in Production Environments

While the architecture we've discussed applies to all Docker installations, production environments often add further components and considerations:

Container Orchestration

In production, Docker is often managed by an orchestration platform like:

Kubernetes - The most widely used container orchestration system
Docker Swarm - Docker's native clustering solution
Amazon ECS - AWS's container orchestration service

These platforms add capabilities for:

Scheduling containers across multiple hosts
Automatic scaling
Load balancing
Self-healing (restarting failed containers)
Rolling updates
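Scaling and self-healing both come down to a reconciliation loop: compare the desired state with what is actually running, then act on the difference. The Python sketch below shows one pass of such a loop. The data layout and action names are hypothetical and do not correspond to any particular orchestrator's API.

```python
def reconcile(desired_replicas, running):
    """Return the actions needed to move from the observed to the desired state."""
    healthy = [c for c in running if c["healthy"]]
    # Failed containers are always replaced (self-healing).
    actions = [("remove", c["name"]) for c in running if not c["healthy"]]
    missing = desired_replicas - len(healthy)
    if missing > 0:
        actions += [("start", f"new-{i}") for i in range(missing)]
    elif missing < 0:
        # Too many healthy replicas: stop the surplus (scale down).
        actions += [("stop", c["name"]) for c in healthy[missing:]]
    return actions


observed = [
    {"name": "web-1", "healthy": True},
    {"name": "web-2", "healthy": False},
]
print(reconcile(3, observed))
```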

Security Considerations

Production Docker deployments typically include:

Image scanning for vulnerabilities
Access controls for the Docker daemon
Network segmentation
Container resource limits
Read-only filesystem mounts
Non-root users inside containers
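Several of these measures are plain docker run flags. The Python sketch below assembles a hardened command line. The flags are real docker run options, but the default values and the function name are example choices, and an image has to be built to tolerate a read-only filesystem and a non-root user for this to work.

```python
import shlex


def hardened_run_args(image, user="1000:1000", memory="256m", cpus="0.5"):
    """Build a docker run command line with common hardening flags."""
    return [
        "docker", "run", "-d",
        "--read-only",                          # read-only root filesystem
        "--user", user,                         # run as a non-root user
        "--memory", memory,                     # memory limit
        "--cpus", cpus,                         # CPU limit
        "--cap-drop", "ALL",                    # drop all Linux capabilities
        "--security-opt", "no-new-privileges",  # block privilege escalation
        image,
    ]


args = hardened_run_args("my-app:1.0")
print(shlex.join(args))
```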

Monitoring and Logging

Comprehensive monitoring solutions are essential for Docker in production:

Container metrics (CPU, memory, network, disk usage)
Application metrics
Centralized logging
Container health checks
Alerting systems

Production architecture example: A typical production setup might include:

Multiple host machines running Docker
Kubernetes managing containers across those hosts
A private Docker registry secured with authentication
CI/CD pipelines that build, test, and deploy Docker images
Prometheus and Grafana for monitoring
ELK Stack or Loki for logging

Docker Architecture Evolution

Docker's architecture has evolved significantly since its initial release:

Major Architectural Changes

Separation of containerd - The core container runtime was extracted as a separate project
OCI Standards - Docker adopted Open Container Initiative standards for image and runtime specifications
BuildKit - A new, more efficient build system replaced the legacy builder
Rootless Mode - Support for running Docker without root privileges

Modern Docker architecture:

┌───────────────────────────────────────────────────┐
│                  Docker Engine                      │
│                                                     │
│  ┌─────────────┐      ┌──────────────┐             │
│  │   Docker    │      │   dockerd    │             │
│  │   Client    │◀────▶│   (daemon)   │             │
│  └─────────────┘      └──────┬───────┘             │
│                              │                      │
│                        ┌─────▼──────┐               │
│                        │ containerd │               │
│                        └─────┬──────┘               │
│                              │                      │
│                        ┌─────▼──────┐               │
│                        │ runc/runsc │               │
│                        └────────────┘               │
└───────────────────────────────────────────────────┘

In this modern architecture:

dockerd - The Docker daemon that manages the Docker API
containerd - A daemon that manages the complete container lifecycle
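runc - The default low-level container runtime, which implements the OCI runtime specification (runsc, shown alongside it in the diagram, is gVisor's alternative OCI runtime)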

This modular approach allows for more flexibility and enables other tools to leverage Docker's components. For example, Kubernetes can use containerd directly without the full Docker daemon.

Key Takeaways

Docker uses a client-server architecture with the client sending commands to the daemon
Images are read-only templates used to create containers
Containers are runnable instances of images with an additional writable layer
Registries store and distribute images
Docker provides several storage options, with volumes being the preferred method for persistent data
Docker's networking capabilities allow containers to communicate and be accessible from outside
Docker Compose simplifies managing multi-container applications
Production environments often add orchestration, security measures, and monitoring to the basic Docker architecture

Understanding these components and how they interact is fundamental to working effectively with Docker and designing containerized applications.

Looking Ahead

In our afternoon session, we'll begin hands-on work with Docker, where you'll see these architectural components in action. You will:

Run your first Docker container
Explore Docker Hub and public images
Create a basic Dockerfile
Build and run your own image

By the end of the day, you'll have practical experience with the core Docker components we've discussed in this theoretical session.

Discussion Questions

How does Docker's architecture compare to traditional virtualization solutions like VMware or VirtualBox?
Why is the layered approach to images important for efficiency in Docker?
What are the advantages and disadvantages of Docker's approach to container networking?
In what scenarios might Docker volumes be preferred over bind mounts, and vice versa?
How does understanding Docker's architecture help you design better containerized applications?

Additional Resources

Docker Architecture Overview - Official documentation
Docker Storage Overview - Detailed information about storage options
Docker Networking Overview - Comprehensive guide to Docker networks
containerd GitHub Repository - For those interested in the lower-level container runtime
Docker Architecture Explained - Visual explanation of Docker components