Lecture Overview
In this session, we'll explore the architecture of Docker and its key components. Understanding how Docker is structured internally will give you a solid foundation for working with containers effectively. You'll learn how the different pieces of Docker fit together to create a coherent system for building, shipping, and running containerized applications.
Docker Architecture at a Glance
Docker uses a client-server architecture with several distinct components working together. Before diving into each component, let's get a high-level overview of how Docker is structured:
┌─────────────────────────────────────────────────────────────────┐
│                          Host Machine                           │
│                                                                 │
│  ┌──────────────┐       ┌─────────────────────────────────────┐ │
│  │              │       │        Docker Host (daemon)         │ │
│  │ Docker Client│<─────>│                                     │ │
│  │              │       │ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ │
│  └──────────────┘       │ │Container│ │Container│ │Container│ │ │
│                         │ └─────────┘ └─────────┘ └─────────┘ │ │
│                         └──────────────────┬──────────────────┘ │
└────────────────────────────────────────────┼────────────────────┘
                                             │
                                             ▼
                                   ┌──────────────────┐
                                   │     Registry     │
                                   │ (e.g. Docker Hub)│
                                   └──────────────────┘
Think of Docker as a transportation system. The Docker Client is like a dispatch office that sends instructions. The Docker Daemon (server) is like the transportation hub that manages all the vehicles (containers). The Registry is like a warehouse storing vehicle designs (images) that can be requested when needed.
Let's now explore each component in detail to understand what it does and how it interacts with the others.
Docker Client
The Docker Client is the primary way users interact with Docker. It's what you're using when you run commands starting with docker in your terminal.
Functions of the Docker Client
- Interprets user commands
- Communicates with the Docker daemon
- Presents results back to the user
- Supports both command line and programmatic interfaces
The client can communicate with a Docker daemon running on the same machine (the default configuration) or connect to a remote daemon running on another system.
Docker Client in action: When you run a command like docker run nginx, the client:
- Parses your command
- Connects to the Docker daemon
- Sends instructions to pull the nginx image (if needed) and create a container
- Returns output from the daemon back to your terminal
The Docker client is somewhat analogous to a remote control for your TV. It doesn't do the actual work of displaying content (that's the TV's job), but it sends instructions that control what happens. Similarly, the Docker client doesn't run containers itself but instructs the daemon to do so.
Common Client Commands
The Docker client provides commands for the entire container lifecycle:
- docker build - Build an image from a Dockerfile
- docker pull - Download an image from a registry
- docker run - Create and start a container
- docker ps - List running containers
- docker stop - Stop a running container
- docker rm - Remove a container
- docker rmi - Remove an image
Each of these commands gets translated by the client into API calls to the Docker daemon.
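As a rough illustration, the sketch below pairs the commands above with the Docker Engine API endpoints the CLI calls for them. It is a simplified lookup table, not the client's actual code. Real requests also carry a version prefix such as /v1.43/, query parameters, and request bodies, and docker run can additionally issue attach, wait, and pull requests. The {id} and {name} placeholders stand for whatever container or image you name on the command line.

```python
# Simplified map from CLI commands to Docker Engine API requests.
# "{id}" and "{name}" are placeholders filled in from the command's arguments.
CLI_TO_API = {
    "build": [("POST", "/build")],
    "pull": [("POST", "/images/create?fromImage={name}")],
    "run": [("POST", "/containers/create"), ("POST", "/containers/{id}/start")],
    "ps": [("GET", "/containers/json")],
    "stop": [("POST", "/containers/{id}/stop")],
    "rm": [("DELETE", "/containers/{id}")],
    "rmi": [("DELETE", "/images/{name}")],
}


def api_requests(command: str, **args: str) -> list[tuple[str, str]]:
    """Return the (method, path) pairs a docker command roughly translates into."""
    if command not in CLI_TO_API:
        raise ValueError(f"no mapping for 'docker {command}'")
    return [(method, path.format(**args)) for method, path in CLI_TO_API[command]]


print(api_requests("ps"))
print(api_requests("stop", id="my-nginx"))
```

You can send the same kind of request by hand. For example, curl --unix-socket /var/run/docker.sock http://localhost/containers/json returns the JSON that docker ps formats as a table.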
Docker Daemon (dockerd)
The Docker daemon (often referred to as dockerd) is the heart of Docker. It's a background service that manages everything related to containers on a system.
Responsibilities of the Docker Daemon
- Listening for API requests from the Docker client
- Managing Docker objects (images, containers, networks, volumes)
- Communicating with other daemons to manage distributed Docker services
- Building, running, and distributing containers
Think of the Docker daemon as a factory manager. It receives blueprints (images), creates products (containers), manages resources, and oversees the entire production process.
Technical insight: The Docker daemon exposes a REST API that the client and other programs can use to interact with it. This API-driven design makes Docker highly automatable and integrable with other systems.
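To make the REST API concrete, here is a minimal Python sketch that speaks HTTP to the daemon over its Unix socket, using only the standard library. It assumes a Linux host where dockerd listens on the default /var/run/docker.sock and the current user is allowed to use that socket. The UnixHTTPConnection and get_json names are illustrative, not part of any Docker SDK.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """An HTTPConnection that talks to a Unix domain socket instead of TCP."""

    def __init__(self, socket_path: str):
        super().__init__("localhost")  # only used for the Host header
        self.socket_path = socket_path

    def connect(self) -> None:
        # Called automatically by request(); replaces the usual TCP connect.
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.connect(self.socket_path)


def get_json(path: str, socket_path: str = "/var/run/docker.sock") -> object:
    """Send GET <path> to the daemon's REST API and decode the JSON reply."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        body = response.read()
        if response.status != 200:
            raise RuntimeError(f"daemon returned HTTP {response.status}: {body!r}")
        return json.loads(body)
    finally:
        conn.close()
```

With a daemon running, get_json("/version")["ApiVersion"] reports the API version. get_json("/containers/json") returns the same list of running containers that docker ps prints.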
Daemon Configuration
The daemon can be configured in various ways to control:
- Security settings
- Default container parameters
- Storage locations
- Networking options
- Logging and debugging information
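On Linux these settings usually live in /etc/docker/daemon.json, which dockerd reads at startup. The Python sketch below assembles one illustrative configuration, with settings for each category in the list above, and serializes it to JSON. The keys are real daemon.json options. The values (address pool, ulimits, log sizes) are example choices, not recommendations, and the daemon has to be restarted to pick up changes.

```python
import json

# Illustrative settings for each category above; values are examples only.
daemon_config = {
    # Security: block inter-container traffic on the default bridge, remap container root
    "icc": False,
    "userns-remap": "default",
    # Default container parameters: open-file limit applied to new containers
    "default-ulimits": {"nofile": {"Name": "nofile", "Hard": 64000, "Soft": 64000}},
    # Storage location for images, containers, and volumes
    "data-root": "/var/lib/docker",
    # Networking: address pool Docker draws on when it allocates subnets for new networks
    "default-address-pools": [{"base": "10.10.0.0/16", "size": 24}],
    # Logging and debugging
    "log-driver": "json-file",
    "log-opts": {"max-size": "10m", "max-file": "3"},
    "debug": False,
}

daemon_json = json.dumps(daemon_config, indent=2)
print(daemon_json)  # save as /etc/docker/daemon.json, then restart dockerd
```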
In production environments, proper daemon configuration is crucial for security and performance.
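Real-world context: In a development team, each developer runs their own Docker daemon on their local machine. In a production environment, you might have multiple servers each running a Docker daemon, potentially managed by an orchestration tool. Docker Swarm works through each host's Docker daemon. Current Kubernetes releases instead talk to containerd or another compatible runtime directly, without going through the Docker daemon.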
Docker Images
Docker images are read-only templates used to create containers. They contain everything needed to run an application: code, runtime, libraries, environment variables, and configuration files.
Key Characteristics of Images
- Immutable - once built, an image doesn't change
- Layered - composed of multiple filesystem layers
- Shareable - can be pushed to and pulled from registries
- Versioned - typically tagged with version information
The layered nature of images is one of Docker's most powerful features. Let's explore it further.
Layered File System
Docker images are built using a layered approach where each layer represents a set of filesystem changes:
┌───────────────────────┐
│   Application Code    │ <-- Top layer
├───────────────────────┤
│   Application Deps    │
├───────────────────────┤
│  Runtime (e.g. Node)  │
├───────────────────────┤
│ Base OS (e.g. Alpine) │ <-- Bottom layer
└───────────────────────┘
Each layer only stores the differences from the previous layer. This approach has several advantages:
- Efficient storage - common layers are shared between images
- Faster transfers - only new or changed layers need to be transferred
- Build caching - unchanged layers can be reused in subsequent builds
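A toy model helps show why shared layers save space. In the sketch below, each layer is a Python dict of path-to-content changes, identified by a SHA256 digest of its contents, much as registries address layers by digest. The merged view is what a container would see, and the storage tally counts each distinct layer only once. Real layers are tar archives managed by a storage driver such as overlay2, so treat this purely as an illustration. The file paths and contents are made up.

```python
import hashlib
import json

Layer = dict[str, str]  # path -> file content (toy stand-in for a tar archive)


def layer_digest(layer: Layer) -> str:
    """Content-address a layer by hashing a canonical serialization of it."""
    canonical = json.dumps(layer, sort_keys=True).encode()
    return "sha256:" + hashlib.sha256(canonical).hexdigest()


def merged_view(layers: list[Layer]) -> Layer:
    """Apply layers bottom to top; a later layer's file replaces an earlier one.
    (Deletions are not modeled here.)"""
    view: Layer = {}
    for layer in layers:
        view.update(layer)
    return view


def stored_bytes(images: list[list[Layer]]) -> int:
    """Total content size when identical layers are stored once and shared."""
    unique = {layer_digest(layer): layer for image in images for layer in image}
    return sum(len(content) for layer in unique.values() for content in layer.values())


base_os = {"/etc/os-release": "Alpine Linux", "/bin/sh": "busybox"}
runtime = {"/usr/bin/node": "node-binary"}
app_v1 = {"/app/server.js": "console.log('v1')"}
app_v2 = {"/app/server.js": "console.log('v2')"}

image_v1 = [base_os, runtime, app_v1]
image_v2 = [base_os, runtime, app_v2]

print(merged_view(image_v2)["/app/server.js"])
shared = stored_bytes([image_v1, image_v2])
unshared = sum(stored_bytes([img]) for img in (image_v1, image_v2))
print(f"{shared} bytes with sharing vs {unshared} bytes without")
```

Only the top layer differs between the two images, so the base OS and runtime layers are stored once. This is also why rebuilding after an application change only produces, and transfers, a new top layer.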
Analogy: Think of Docker images like a stack of transparent sheets. Each sheet has some content drawn on it, and when stacked together, they form a complete picture. If you want to create a similar image, you can reuse most of the stack and just replace or add sheets as needed, rather than drawing everything from scratch.
Image IDs and Tags
Each Docker image has a unique identifier (a SHA256 hash) and can have multiple human-readable tags:
$ docker images
REPOSITORY   TAG       IMAGE ID       CREATED       SIZE
nginx        latest    ad4c705f24d3   2 weeks ago   133MB
python       3.9       a8bd5b274a97   3 weeks ago   915MB
python       3.10      98f52028b399   3 weeks ago   920MB
In this example:
- REPOSITORY - The name of the image
- TAG - A version or variant identifier (e.g., "latest", "3.9")
- IMAGE ID - A unique identifier for the image
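The IMAGE ID column shows a shortened content digest. In Docker's classic image store it is the SHA256 hash of the image's JSON configuration, and docker images displays only the first 12 hexadecimal characters. The containerd image store, available in newer Docker releases, uses a different digest. The sketch below reproduces that arithmetic for a made-up, heavily abbreviated configuration blob. Because the hash covers exact bytes, its result will not match any real image.

```python
import hashlib

# Hypothetical, heavily abbreviated image configuration; real ones are much longer.
config_blob = b'{"architecture":"amd64","os":"linux","rootfs":{"type":"layers","diff_ids":[]}}'


def image_id(blob: bytes) -> str:
    """Full image ID: the SHA256 digest of the configuration bytes."""
    return "sha256:" + hashlib.sha256(blob).hexdigest()


def short_id(full_id: str) -> str:
    """The 12-character form shown in the IMAGE ID column of docker images."""
    return full_id.removeprefix("sha256:")[:12]


full = image_id(config_blob)
print(full)
print(short_id(full))
```

To see the full, untruncated IDs, run docker images --no-trunc.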
Tags are crucial for version management. For example, python:3.9 and python:3.10 refer to different versions of Python, while both belong to the "python" repository.
Best practice: Never rely on the "latest" tag in production environments. Always specify exact version tags to ensure consistency and prevent unexpected changes when images are updated.
Containers
If images are the blueprints, containers are the running instances created from those blueprints. A container is a runnable instance of an image.
Container Characteristics
- Isolated - runs in its own set of namespaces, with limited visibility of the host system
- Lightweight - shares the host kernel rather than booting its own operating system
- Portable - runs the same way on any host with a compatible kernel and CPU architecture
- Ephemeral - designed to be disposable and replaceable
When you create a container from an image, Docker adds a writable layer on top of the immutable image layers. This allows the container to modify files while keeping the original image unchanged.
┌───────────────────────┐
│    Writable Layer     │ <-- Container-specific layer
├───────────────────────┤
│   Application Code    │
├───────────────────────┤
│   Application Deps    │ <-- Image layers (read-only)
├───────────────────────┤
│  Runtime (e.g. Node)  │
├───────────────────────┤
│ Base OS (e.g. Alpine) │
└───────────────────────┘
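The writable layer works by copy-on-write. Reads fall through to the highest layer that contains a file, writes land only in the container's layer, and deletions are recorded as markers that hide the lower copy. The Python class below is a toy model of that behavior, assuming dict-based layers. The ContainerFS class and its methods are hypothetical. Storage drivers such as overlay2 implement this with kernel filesystem features, including whiteout entries for deleted files, not with code like this.

```python
from typing import Optional

WHITEOUT = object()  # marker recording that a lower-layer file was deleted


class ContainerFS:
    """Toy copy-on-write view: read-only image layers plus one writable layer."""

    def __init__(self, image_layers: list[dict[str, str]]):
        self.image_layers = image_layers  # bottom to top; never modified
        self.writable: dict[str, object] = {}

    def read(self, path: str) -> Optional[str]:
        # Check the writable layer first, then the image layers from the top down.
        for layer in [self.writable, *reversed(self.image_layers)]:
            if path in layer:
                value = layer[path]
                return None if value is WHITEOUT else value
        return None

    def write(self, path: str, content: str) -> None:
        self.writable[path] = content  # the image layer's copy is left untouched

    def delete(self, path: str) -> None:
        self.writable[path] = WHITEOUT  # hides, but does not remove, lower copies


image = [
    {"/etc/nginx/nginx.conf": "default config"},
    {"/usr/share/nginx/html/index.html": "Welcome"},
]
fs = ContainerFS(image)
fs.write("/etc/nginx/nginx.conf", "custom config")
fs.delete("/usr/share/nginx/html/index.html")
print(fs.read("/etc/nginx/nginx.conf"))             # custom config
print(fs.read("/usr/share/nginx/html/index.html"))  # None
print(image[0]["/etc/nginx/nginx.conf"])            # default config
```

A second ContainerFS built from the same image list starts with the original files, just as a new container starts from the unchanged image.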
Analogy: Think of a container as a kitchen. The image provides all the appliances, utensils, and basic ingredients (like flour and sugar). When you start cooking (run the container), you create new dishes and change the state of the kitchen, and those changes survive a break in cooking (stopping the container). When the kitchen is torn down (the container is removed), the changes are gone, and a new kitchen built from the same blueprint starts in its original state. If you want to keep your changes, you need to create a new "blueprint" (image) from the current state.
Container Lifecycle
Containers have a distinct lifecycle with several states:
- Created - Container is defined but not started
- Running - Container processes are executing
- Paused - Container processes are temporarily suspended
- Stopped - Container processes have terminated but the container still exists
- Removed - Container is deleted along with its writable layer
Container lifecycle commands:
# Create and run a container
$ docker run --name my-nginx -d nginx
# Pause a running container
$ docker pause my-nginx
# Unpause a container
$ docker unpause my-nginx
# Stop a container
$ docker stop my-nginx
# Start a stopped container
$ docker start my-nginx
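# Stop and remove a container (docker rm refuses to remove a running container unless you add -f)
$ docker stop my-nginx
$ docker rm my-nginx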
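The states and commands above form a small state machine. The sketch below encodes the transitions covered in this section as a Python lookup table, so invalid sequences, such as unpausing a stopped container or removing a running one without -f, are rejected. It is a simplified model. Docker also has restarting and dead states, reports stopped containers as "exited", and docker rm -f can remove a running container.

```python
# (current state, command) -> next state, for the commands covered above.
TRANSITIONS = {
    ("none", "create"): "created",
    ("created", "start"): "running",
    ("running", "pause"): "paused",
    ("paused", "unpause"): "running",
    ("running", "stop"): "stopped",
    ("stopped", "start"): "running",
    ("created", "rm"): "removed",
    ("stopped", "rm"): "removed",
}


def apply(state: str, command: str) -> str:
    """Return the next state, or raise if the command is not valid in this state."""
    if command == "run":  # docker run = create + start
        return apply(apply(state, "create"), "start")
    try:
        return TRANSITIONS[(state, command)]
    except KeyError:
        raise ValueError(f"cannot '{command}' a container that is {state}") from None


state = "none"
for cmd in ["run", "pause", "unpause", "stop", "start", "stop", "rm"]:
    state = apply(state, cmd)
    print(f"docker {cmd:<8} -> {state}")
```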
Understanding this lifecycle is crucial for managing containers effectively, especially in production environments where automatic restarts and health checks become important.
Docker Registries
Docker registries are repositories for storing and distributing Docker images. They play a crucial role in the "build once, run anywhere" philosophy of Docker.
Registry Types
- Public registries - Like Docker Hub, which hosts a vast collection of community and official images
- Private registries - For organizations to store proprietary images securely
- Local registries - Run within an organization's network for faster access and better control
Docker Hub is the default registry that Docker uses when you run commands like docker pull without specifying a registry.
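When a reference omits parts, Docker fills in defaults. With no registry host, it uses Docker Hub (docker.io). A single-component name on Docker Hub lives under the library/ namespace, where the official images are kept. With no tag, it uses latest. The Python function below is a simplified sketch of that normalization. It treats the first path component as a registry host only if it contains a dot or colon or is localhost. It ignores digests (@sha256:...) and other edge cases that the real reference parser handles.

```python
def normalize(reference: str) -> str:
    """Expand a short image reference into registry/repository:tag form."""
    first, _, rest = reference.partition("/")
    # The first component is a registry host only if it looks like one.
    if rest and ("." in first or ":" in first or first == "localhost"):
        registry, remainder = first, rest
    else:
        registry, remainder = "docker.io", reference
    # Official images on Docker Hub live under the "library/" namespace.
    if registry == "docker.io" and "/" not in remainder:
        remainder = "library/" + remainder
    # A tag is a ":" in the last path component; default to "latest".
    if ":" not in remainder.rsplit("/", 1)[-1]:
        remainder += ":latest"
    return f"{registry}/{remainder}"


for ref in ["nginx", "python:3.10", "my-registry.example.com/my-app:1.0", "localhost:5000/tools"]:
    print(f"{ref:<36} -> {normalize(ref)}")
```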
Analogy: Docker registries are like libraries or bookstores. Docker Hub is like a public library with books (images) anyone can borrow. Private registries are like your organization's own bookshelves, holding books that are special to you. When you need a book, you first check whether you already have a copy at home (the local image cache). Only if you don't do you fetch it from the library or bookshelf that holds it (the registry).
Working with Registries
Common operations with registries include:
# Pull an image from Docker Hub
$ docker pull nginx:latest
# Tag an image for a specific registry
$ docker tag my-app:1.0 my-registry.example.com/my-app:1.0
# Push an image to a registry
$ docker push my-registry.example.com/my-app:1.0
# Pull from a specific registry
$ docker pull my-registry.example.com/my-app:1.0
In enterprise environments, organizations often maintain their own registries for several reasons:
- Security - Control over who can access images
- Compliance - Ensure all images meet organizational standards
- Performance - Faster image pulls over the internal network
- Reliability - No dependency on external services
Best practice: For production applications, always use a private registry with proper access controls. Scan images for vulnerabilities before pushing them to your registry, and implement policies about which external images can be pulled.
Docker Storage
Docker provides several options for managing data in containers. Understanding these is crucial because containers are ephemeral by design - when a container is removed, any data that was written to its writable layer is lost.
Storage Options
- Volumes - The preferred way to persist data in Docker
- Bind mounts - Mount a host directory into a container
- tmpfs mounts - Store data in the host's memory only
Docker Volumes
Volumes are the preferred mechanism for persisting data generated and used by Docker containers. Some key benefits of volumes include:
- They are completely managed by Docker
- They can be more safely shared among multiple containers
- Volume drivers allow for storing volumes on remote hosts or cloud providers
- They're easier to back up or migrate than bind mounts
- They can be pre-populated with data from a container
┌─────────────────────────────────────────┐
│               Host System               │
│                                         │
│   ┌──────────────┐   ┌──────────────┐   │
│   │  Container A │   │  Container B │   │
│   │              │   │              │   │
│   └──────┬───────┘   └──────┬───────┘   │
│          │                  │           │
│          ▼                  ▼           │
│   ┌──────────────────────────────────┐  │
│   │              Volume              │  │
│   └──────────────────────────────────┘  │
│                                         │
└─────────────────────────────────────────┘
Working with volumes:
# Create a volume
$ docker volume create my-data
# Run a container with a volume
$ docker run -v my-data:/app/data nginx
# List volumes
$ docker volume ls
# Inspect a volume
$ docker volume inspect my-data
# Remove a volume
$ docker volume rm my-data
# Clean up unused volumes
$ docker volume prune
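docker volume inspect prints a JSON array, which makes it easy to script against. The Python sketch below parses an abbreviated, hand-written sample of that output to find where the local driver keeps the volume's files on the host. Real output includes more fields, such as CreatedAt, Labels, and Options. On a live system you would pass it the command's actual output, for example subprocess.run(["docker", "volume", "inspect", "my-data"], capture_output=True, text=True).stdout.

```python
import json

# Abbreviated sample of `docker volume inspect my-data` output (hand-written, not captured).
SAMPLE_OUTPUT = """
[
    {
        "Driver": "local",
        "Mountpoint": "/var/lib/docker/volumes/my-data/_data",
        "Name": "my-data",
        "Scope": "local"
    }
]
"""


def mountpoints(inspect_output: str) -> dict[str, str]:
    """Map each inspected volume's name to its mountpoint on the host."""
    return {vol["Name"]: vol["Mountpoint"] for vol in json.loads(inspect_output)}


print(mountpoints(SAMPLE_OUTPUT))
```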
Analogy: Think of volumes like external hard drives for your containers. The container itself might be temporary, but the external drive (volume) persists and can be connected to different containers over time. This separation of compute (container) from storage (volume) is a fundamental pattern in cloud-native architecture.
Bind Mounts
Bind mounts have been around since the early days of Docker. They allow you to mount a file or directory on the host machine into a container. The main differences from volumes are:
- Bind mounts depend on the host machine's filesystem structure
- Non-Docker processes on the host can modify them directly
- They're often used in development for live code reloading
Using bind mounts:
# Mount the current directory into a container
$ docker run -v "$(pwd)":/app nginx
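Volumes and bind mounts share the -v source:destination[:options] syntax, and Docker tells them apart by the source. A host path (starting with /) produces a bind mount. A plain name refers to a named volume, which Docker creates if it does not exist yet. A lone container path gets an anonymous volume. This is why -v "$(pwd)":/app is a bind mount: the shell expands $(pwd) to an absolute path before Docker sees it. The function below is a simplified sketch of that rule for Linux-style paths. It does not model Windows paths, relative paths, or the more explicit --mount syntax.

```python
def classify_mount(spec: str) -> tuple[str, str, str]:
    """Classify a -v argument as ('bind' | 'volume' | 'anonymous', source, destination)."""
    parts = spec.split(":")
    if len(parts) == 1:  # only a container path: Docker creates an anonymous volume
        return ("anonymous", "", parts[0])
    source, destination = parts[0], parts[1]  # any third part holds options such as ro
    kind = "bind" if source.startswith("/") else "volume"
    return (kind, source, destination)


print(classify_mount("my-data:/app/data"))
print(classify_mount("/home/dev/project:/app"))
print(classify_mount("/home/dev/project:/app:ro"))
print(classify_mount("/app/data"))
```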
tmpfs Mounts
tmpfs mounts are stored in the host system's memory only, never written to the host system's filesystem. This is useful for storing sensitive information that you don't want to persist.
Using tmpfs mounts:
# Create a container with a tmpfs mount
$ docker run --tmpfs /app/temp nginx
Best practice: For production applications, always use named volumes for persistent data and clearly document what data needs to persist. In development, bind mounts are often convenient for code changes, but volumes should still be used for databases and other stateful components.
Docker Networking
Docker's networking subsystem is pluggable, using drivers. Several drivers exist by default, and you can install third-party drivers as well. Each driver offers specific features and capabilities.
Network Drivers
- bridge - The default network driver. Containers can communicate with each other if they're on the same bridge network.
- host - Removes network isolation between the container and the host. The container shares the host's networking namespace.
- overlay - Connects multiple Docker daemons across different hosts, enabling swarm services to communicate.
- macvlan - Assigns a MAC address to a container, making it appear as a physical device on your network.
- none - Disables all networking for a container.
Bridge Networks
The bridge driver creates a private network internal to the host. Containers on this network can communicate with each other, and the host can forward traffic to the external world.
┌─────────────────────────────────────────────────────┐
│                     Host System                     │
│                                                     │
│  ┌────────────┐   ┌────────────┐   ┌────────────┐   │
│  │Container A │   │Container B │   │Container C │   │
│  │ 172.17.0.2 │   │ 172.17.0.3 │   │ 172.17.0.4 │   │
│  └─────┬──────┘   └─────┬──────┘   └─────┬──────┘   │
│        │                │                │          │
│    ┌───┴────────────────┴────────────────┴───┐      │
│    │             Bridge Network              │      │
│    │              172.17.0.0/16              │      │
│    └────────────────────┬────────────────────┘      │
│                         │                           │
│                   ┌─────┴─────┐                     │
│                   │   eth0    │                     │
│                   └─────┬─────┘                     │
└─────────────────────────┼───────────────────────────┘
                          │
                     ┌────┴─────┐
                     │ Internet │
                     └──────────┘
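The addresses in the diagram follow from how the default bridge is usually set up. The subnet is 172.17.0.0/16, the first usable address (172.17.0.1) goes to the bridge itself and serves as the containers' gateway, and containers typically receive the next free addresses in order. The sketch below reproduces that arithmetic with Python's ipaddress module. It is a simplified model of Docker's IP address management, which also reuses freed addresses and may choose a different subnet if 172.17.0.0/16 is already in use on the host.

```python
import ipaddress

subnet = ipaddress.ip_network("172.17.0.0/16")
hosts = subnet.hosts()  # iterator over usable addresses, starting at 172.17.0.1
gateway = next(hosts)   # the bridge interface takes the first usable address

# Hand out the next free addresses to containers in start order (simplified).
containers = {name: next(hosts) for name in ["Container A", "Container B", "Container C"]}

print(f"gateway {gateway}, {subnet.num_addresses - 2} usable addresses in {subnet}")
for name, ip in containers.items():
    print(f"{name}: {ip}")
```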
Working with networks:
# List networks
$ docker network ls
# Create a bridge network
$ docker network create my-network
# Run a container on a specific network
$ docker run --network=my-network --name=container1 nginx
# Connect a running container to a network
$ docker network connect my-network container2
# Inspect a network
$ docker network inspect my-network
# Disconnect a container from a network
$ docker network disconnect my-network container1
# Remove a network
$ docker network rm my-network
Real-world application: In a typical web application architecture, you might create a custom bridge network for your application. Your frontend container, backend API container, and database container would all connect to this network, allowing them to communicate with each other using their container names as hostnames, while isolating them from other containers on the system.
Container DNS
One important feature of Docker networking is automatic DNS resolution between containers. Containers on the same user-defined network can resolve each other by name.
Example: If you have two containers named web and db on the same network, the web container can connect to the db container simply by using the hostname db in its configuration.
# Create a network
$ docker network create app-network
# Start a database container
$ docker run -d --name db --network app-network postgres
# Start a web container that can connect to the database using hostname "db"
$ docker run -d --name web --network app-network -e DATABASE_URL=postgres://postgres:postgres@db:5432/postgres my-web-app
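Inside the web container, the application only needs the hostname from that URL. Docker's embedded DNS server, reachable at 127.0.0.11 on user-defined networks, answers the lookup for db with the database container's current IP address. The Python sketch below shows the application side, assuming the app reads DATABASE_URL from its environment. Outside a container attached to app-network, the name db will normally not resolve, so resolve(host) only gives a useful answer inside the web container.

```python
import os
import socket
from urllib.parse import urlsplit

DEFAULT_URL = "postgres://postgres:postgres@db:5432/postgres"


def database_address(url: str) -> tuple[str, int]:
    """Extract the host name and port the app should connect to."""
    parts = urlsplit(url)
    return parts.hostname, parts.port or 5432


def resolve(host: str) -> list[str]:
    """Look up IPv4 addresses via the system resolver (Docker's embedded DNS in a container)."""
    infos = socket.getaddrinfo(host, None, family=socket.AF_INET, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


host, port = database_address(os.environ.get("DATABASE_URL", DEFAULT_URL))
print(f"connecting to {host}:{port}")
# Inside the web container, resolve(host) returns the db container's current IP address.
```

Because the lookup happens each time the app connects, the database container can be recreated with a new IP address without any change to the web container's configuration.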
Best practice: Always create custom networks for your applications rather than using the default bridge network. This provides better isolation, automatic DNS resolution between containers, and more control over your network configuration.
Docker Compose
While not strictly part of the core Docker architecture, Docker Compose is an essential tool that works with Docker to define and run multi-container applications.
What is Docker Compose?
Docker Compose is a tool for defining and running multi-container Docker applications. With Compose, you use a YAML file to configure your application's services, networks, and volumes. Then, with a single command, you create and start all the services from your configuration.
Sample docker-compose.yml file:
version: '3'
services:
  web:
    build: ./web
    ports:
      - "5000:5000"
    volumes:
      - ./web:/code
    depends_on:
      - db
      - redis
  db:
    image: postgres:12
    volumes:
      - postgres_data:/var/lib/postgresql/data
    environment:
      - POSTGRES_PASSWORD=password
  redis:
    image: redis:6
volumes:
  postgres_data:
This file defines a three-service application:
- A web service built from the Dockerfile in the ./web directory
- A db service using the postgres:12 image
- A redis service using the redis:6 image
It also configures:
- Port mapping for the web service
- Volume mounts for code and database data
- Service dependencies
- Environment variables
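The depends_on entries control start order. Compose starts a service's dependencies before the service itself and stops services in the reverse order. That ordering is a topological sort of the dependency graph, which the sketch below computes for the three services in the file using Python's graphlib module. A circular dependency would make TopologicalSorter raise CycleError. In this short form, depends_on only waits for the dependency containers to start, not for PostgreSQL to be ready to accept connections.

```python
from graphlib import TopologicalSorter

# depends_on from the docker-compose.yml above: service -> services it needs first
depends_on = {
    "web": {"db", "redis"},
    "db": set(),
    "redis": set(),
}

start_order = list(TopologicalSorter(depends_on).static_order())
stop_order = list(reversed(start_order))

print("start:", start_order)  # db and redis (in either order), then web
print("stop: ", stop_order)   # web first
```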
Basic Docker Compose commands:
# Start services
$ docker-compose up
# Start services in the background
$ docker-compose up -d
# Stop services
$ docker-compose down
# Stop services and remove volumes
$ docker-compose down -v
# View logs
$ docker-compose logs
# Run a command in a service
$ docker-compose exec web python manage.py migrate
Analogy: If Docker is like having individual appliances in your kitchen, Docker Compose is like having a single control panel that turns on all the appliances you need for a specific recipe, configured exactly as required. Instead of turning on the stove, then the mixer, then the blender individually, you just press one button labeled "Make Cake" and everything is set up correctly.
We'll explore Docker Compose in much more depth in tomorrow's session, but it's important to understand how it fits into the overall Docker architecture as a higher-level tool that works with the core components we've discussed.
How Components Work Together
Let's trace through a typical workflow to see how all these Docker components interact:
Example Workflow: Running a Container
- Client Instruction: You issue docker run nginx in your terminal
- Client Processing: The Docker client formats this as an API request to the daemon
- Daemon Image Check: The daemon checks if the nginx image exists locally
- Registry Interaction: If not found locally, the daemon pulls the image from Docker Hub
- Image Download: The registry sends the image layers to the daemon
- Container Creation: The daemon creates a new container based on the image
- Storage Setup: The daemon sets up any necessary storage (volumes or bind mounts)
- Network Configuration: The daemon connects the container to the appropriate network
- Container Start: The daemon starts the container processes
- Output Return: The daemon streams output back to the client
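The client, image-check, and registry steps involve a detail worth seeing. The CLI first asks the daemon to create the container, and only if the daemon reports that the image is missing does it request a pull and retry the create. The sketch below simulates that exchange against a stand-in daemon. The FakeDaemon class, its methods, and the ImageNotFound exception are hypothetical. They only mimic the shape of the Engine API calls (create, pull, start), not their real payloads, and they leave out the storage and network setup that dockerd performs around create and start.

```python
class ImageNotFound(Exception):
    """Raised by the fake daemon when an image is not available."""


class FakeDaemon:
    """Stand-in for dockerd that records the API calls it receives (hypothetical)."""

    def __init__(self, local_images: set[str], registry: set[str]):
        self.local_images, self.registry = local_images, registry
        self.calls: list[str] = []
        self.next_id = 0

    def create_container(self, image: str) -> str:
        self.calls.append(f"POST /containers/create (image={image})")
        if image not in self.local_images:
            raise ImageNotFound(image)
        self.next_id += 1
        return f"c{self.next_id}"

    def pull(self, image: str) -> None:
        self.calls.append(f"POST /images/create (fromImage={image})")
        if image not in self.registry:
            raise ImageNotFound(image)
        self.local_images.add(image)  # stands in for downloading layers from the registry

    def start(self, container_id: str) -> None:
        self.calls.append(f"POST /containers/{container_id}/start")


def docker_run(daemon: FakeDaemon, image: str) -> str:
    """Client side of `docker run`: create, pull and retry if the image is missing, start."""
    try:
        container_id = daemon.create_container(image)
    except ImageNotFound:
        daemon.pull(image)
        container_id = daemon.create_container(image)
    daemon.start(container_id)
    return container_id


daemon = FakeDaemon(local_images=set(), registry={"nginx:latest"})
docker_run(daemon, "nginx:latest")
docker_run(daemon, "nginx:latest")  # second run: the image is cached, so no pull
print("\n".join(daemon.calls))
```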
┌──────────┐         ┌──────────┐         ┌──────────┐
│  Client  │ ──────▶ │  Daemon  │ ◀─────▶ │ Registry │
└──────────┘         └────┬─────┘         └──────────┘
                          │
                          ▼
                     ┌──────────┐
                     │  Images  │
                     └────┬─────┘
                          │
                          ▼
                     ┌──────────┐
                     │Containers│
                     └──────────┘
                       ▲      ▲
                       │      │
                 ┌─────┘      └─────┐
                 │                  │
           ┌──────────┐        ┌──────────┐
           │ Volumes  │        │ Networks │
           └──────────┘        └──────────┘
This workflow demonstrates how the client, daemon, registry, images, containers, storage, and networking all work together to create a functioning containerized application.
Docker Architecture in Production Environments
While the architecture we've discussed applies to all Docker installations, production environments often bring further components and considerations:
Container Orchestration
In production, Docker is often managed by an orchestration platform like:
- Kubernetes - The most widely used container orchestration system
- Docker Swarm - Docker's native clustering solution
- Amazon ECS - AWS's container orchestration service
These platforms add capabilities for:
- Scheduling containers across multiple hosts
- Automatic scaling
- Load balancing
- Self-healing (restarting failed containers)
- Rolling updates
Security Considerations
Production Docker deployments typically include:
- Image scanning for vulnerabilities
- Access controls for the Docker daemon
- Network segmentation
- Container resource limits
- Read-only filesystem mounts
- Non-root users inside containers
Monitoring and Logging
Comprehensive monitoring solutions are essential for Docker in production:
- Container metrics (CPU, memory, network, disk usage)
- Application metrics
- Centralized logging
- Container health checks
- Alerting systems
Production architecture example: A typical production setup might include:
- Multiple host machines running Docker
- Kubernetes managing containers across those hosts
- A private Docker registry secured with authentication
- CI/CD pipelines that build, test, and deploy Docker images
- Prometheus and Grafana for monitoring
- ELK Stack or Loki for logging
Docker Architecture Evolution
Docker's architecture has evolved significantly since its initial release:
Major Architectural Changes
- Separation of containerd - The core container runtime was extracted as a separate project
- OCI Standards - Docker adopted Open Container Initiative standards for image and runtime specifications
- BuildKit - A new, more efficient build system replaced the legacy builder
- Rootless Mode - Support for running Docker without root privileges
Modern Docker architecture:
┌───────────────────────────────────────────────────┐
│                   Docker Engine                   │
│                                                   │
│  ┌─────────────┐      ┌──────────────┐            │
│  │   Docker    │      │   dockerd    │            │
│  │   Client    │◀────▶│   (daemon)   │            │
│  └─────────────┘      └──────┬───────┘            │
│                              │                    │
│                        ┌─────▼──────┐             │
│                        │ containerd │             │
│                        └─────┬──────┘             │
│                              │                    │
│                        ┌─────▼──────┐             │
│                        │ runc/runsc │             │
│                        └────────────┘             │
└───────────────────────────────────────────────────┘
In this modern architecture:
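- dockerd - The Docker daemon, which serves the Docker API and manages higher-level objects such as images, networks, and volumes
- containerd - A daemon that manages the complete container lifecycle
- runc - A container runtime that implements the OCI runtime specification (alternative OCI runtimes, such as gVisor's runsc shown in the diagram, can take its place)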
This modular approach allows for more flexibility and enables other tools to leverage Docker's components. For example, Kubernetes can use containerd directly without the full Docker daemon.
Key Takeaways
- Docker uses a client-server architecture with the client sending commands to the daemon
- Images are read-only templates used to create containers
- Containers are runnable instances of images with an additional writable layer
- Registries store and distribute images
- Docker provides sophisticated storage options, with volumes being the preferred method for persistent data
- Docker's networking capabilities allow containers to communicate and be accessible from outside
- Docker Compose simplifies managing multi-container applications
- Production environments often add orchestration, security measures, and monitoring to the basic Docker architecture
Understanding these components and how they interact is fundamental to working effectively with Docker and designing containerized applications.
Looking Ahead
In our afternoon session, we'll begin hands-on work with Docker, where you'll see these architectural components in action. We'll:
- Run your first Docker container
- Explore Docker Hub and public images
- Create a basic Dockerfile
- Build and run your own image
By the end of the day, you'll have practical experience with the core Docker components we've discussed in this theoretical session.
Discussion Questions
- How does Docker's architecture compare to traditional virtualization solutions like VMware or VirtualBox?
- Why is the layered approach to images important for efficiency in Docker?
- What are the advantages and disadvantages of Docker's approach to container networking?
- In what scenarios might Docker volumes be preferred over bind mounts, and vice versa?
- How does understanding Docker's architecture help you design better containerized applications?
Additional Resources
- Docker Architecture Overview - Official documentation
- Docker Storage Overview - Detailed information about storage options
- Docker Networking Overview - Comprehensive guide to Docker networks
- containerd GitHub Repository - For those interested in the lower-level container runtime
- Docker Architecture Explained - Visual explanation of Docker components