Lecture Overview
In this session, we'll explore how to create your own Docker images by writing Dockerfiles. A Dockerfile is a text document containing instructions that Docker uses to automatically build an image. Understanding Dockerfiles is a crucial skill for containerizing applications effectively. By the end of this session, you'll be able to create custom Docker images tailored to your specific application needs.
What is a Dockerfile?
A Dockerfile is a plain text file, conventionally named Dockerfile (with no file extension), that contains a series of instructions Docker uses to build an image. Other file names also work if you pass them to docker build with the -f flag.
Key Concepts
- Instructions: Commands that Docker executes during the build process
- Base Image: The starting point for your image, usually a minimal operating system or runtime environment
- Layers: Each instruction that changes the filesystem (RUN, COPY, ADD) adds a new layer to the image; the other instructions only record metadata
- Build Context: The set of files and directories available during the build process
Analogy: A Dockerfile is like a recipe for baking a cake. It starts with base ingredients (the base image), follows a sequence of steps (instructions), and each step transforms the ingredients in some way. The final result is a complete, ready-to-use cake (Docker image). Just as a good baker might adapt a recipe for different occasions, you'll customize your Dockerfile for different applications.
Why Create Custom Images?
While public images are convenient, custom images offer several advantages:
- Application-specific configuration: Tailored to your application's exact needs
- Dependency management: Include only the libraries and tools your application requires
- Consistency: Ensure development, testing, and production environments are identical
- Automation: Streamline deployment by automating environment setup
- Version control: Track changes to your environment alongside your code
Dockerfile Syntax and Structure
Dockerfiles use a simple, declarative syntax. Each line contains an instruction followed by arguments.
Basic Structure
# Comment
INSTRUCTION arguments
For example:
# Use Python 3.9 as base image
FROM python:3.9-slim
# Set working directory
WORKDIR /app
# Copy requirements file
COPY requirements.txt .
# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY . .
# Set environment variables
ENV PYTHONUNBUFFERED=1
# Command to run when container starts
CMD ["python", "app.py"]
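To make these syntax rules concrete, here is a minimal Python sketch of how Dockerfile text splits into instructions: blank lines and full-line comments are skipped, a trailing backslash joins a line with the next one, and the first word is the instruction name (Docker treats instruction names case-insensitively). The function parse_dockerfile is a hypothetical helper written for illustration, not Docker's parser; it ignores parser directives such as # syntax= and heredocs.

```python
def parse_dockerfile(text):
    """Split Dockerfile text into (INSTRUCTION, arguments) pairs.

    Simplified sketch: skips blank lines and full-line comments, joins
    lines that end in a backslash, and upper-cases the instruction name.
    """
    instructions = []
    pending = ""
    for raw in text.splitlines():
        line = raw.strip()
        # Blank lines and full-line comments are ignored, even inside a continuation.
        if not line or line.startswith("#"):
            continue
        if line.endswith("\\"):
            # Continuation: keep collecting until a line without a trailing backslash.
            pending += line[:-1].rstrip() + " "
            continue
        full = pending + line
        pending = ""
        name, _, args = full.partition(" ")
        instructions.append((name.upper(), args.strip()))
    return instructions


sample = """
# Use Python 3.9 as base image
FROM python:3.9-slim
WORKDIR /app
RUN apt-get update && \\
    apt-get install -y curl
CMD ["python", "app.py"]
"""

for name, args in parse_dockerfile(sample):
    print(f"{name:8} {args}")
```

The backslash-joined RUN comes out as a single instruction, which is why chaining commands with && and \ still counts as one build step.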
Common Instructions
| Instruction | Purpose |
|---|---|
| FROM | Sets the base image |
| RUN | Executes commands in a new layer during the build |
| COPY | Copies files from the build context into the image |
| ADD | Similar to COPY, but with additional features (URL support, auto-extraction of local tar archives) |
| WORKDIR | Sets the working directory for subsequent instructions |
| ENV | Sets environment variables |
| EXPOSE | Documents which ports the container listens on |
| CMD | Provides the default command to run when the container starts |
| ENTRYPOINT | Configures the container to run as an executable |
Best practice: Keep your Dockerfile commands in logical order, typically following this pattern: base image, dependencies, application code, configuration, and finally the command to run.
Dockerfile Instructions in Detail
Let's explore each common instruction in more depth:
FROM
The FROM instruction initializes a new build stage and sets the base image. It's typically the first instruction in a Dockerfile.
FROM [--platform=<platform>] <image>[:<tag>] [AS <name>]
Examples:
FROM python:3.9-slim
FROM ubuntu:20.04
FROM node:14-alpine AS build-stage
Note: FROM must come before every other instruction except ARG (comments and parser directives may also precede it). You can have multiple FROM instructions in a single Dockerfile for multi-stage builds (an advanced technique we'll cover later).
WORKDIR
The WORKDIR instruction sets the working directory for any subsequent RUN, CMD, ENTRYPOINT, COPY, and ADD instructions.
WORKDIR /path/to/directory
Example:
WORKDIR /app
Best practice: Always use absolute paths with WORKDIR. If a specified directory doesn't exist, Docker will create it. Use WORKDIR instead of a series of RUN cd /path commands for clarity and reliability.
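Relative WORKDIR paths are resolved against the previous WORKDIR, which is one reason absolute paths are easier to reason about. The minimal Python sketch below models that rule with posixpath; resolve_workdirs is a hypothetical helper, and the starting directory of / is an assumption (a base image may set its own WORKDIR).

```python
import posixpath


def resolve_workdirs(workdirs, start="/"):
    """Return the working directory after applying each WORKDIR in order.

    Simplified model: an absolute path replaces the current directory,
    and a relative path is joined onto the previous WORKDIR.
    """
    current = start
    for path in workdirs:
        # posixpath.join discards `current` when `path` is absolute.
        current = posixpath.normpath(posixpath.join(current, path))
    return current


print(resolve_workdirs(["/app"]))               # /app
print(resolve_workdirs(["/app", "src"]))        # /app/src
print(resolve_workdirs(["/app", "src", ".."]))  # /app
```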
COPY and ADD
These instructions copy files from the build context to the image.
COPY [--chown=<user>:<group>] <src> <dest>
ADD [--chown=<user>:<group>] <src> <dest>
Examples:
COPY requirements.txt .
COPY . /app/
ADD https://example.com/file.tar.gz /tmp/
COPY vs ADD: COPY simply copies files, while ADD adds URL support and automatic extraction of local tar archives (a tarball downloaded from a URL, like the example above, is copied as-is rather than extracted). Generally, use COPY unless you specifically need ADD's features, as COPY is more explicit.
RUN
The RUN instruction executes commands in a new layer and commits the results.
RUN <command>
RUN ["executable", "param1", "param2"]
Examples:
RUN apt-get update && apt-get install -y curl
RUN pip install --no-cache-dir -r requirements.txt
RUN ["bash", "-c", "echo $HOME"]
Best practice: Chain related commands in a single RUN instruction using && to reduce the number of layers. For package installations, include cleanup steps in the same RUN instruction to keep the image size down.
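The two RUN forms differ in whether a shell is involved. The shell form is run as /bin/sh -c "command", so variables such as $HOME are expanded; the exec form runs the program directly, so $HOME stays literal unless you start a shell yourself, as the bash example above does. This Python sketch reproduces the difference with subprocess on a Linux or macOS host; it assumes /bin/sh and echo are available, and uses a fixed HOME so the output is predictable.

```python
import os
import subprocess

# A fixed, minimal environment so the output does not depend on the host.
ENV = {"HOME": "/root", "PATH": os.defpath}


def run_shell_form(command):
    """Shell form (RUN echo $HOME): run as ["/bin/sh", "-c", command]."""
    result = subprocess.run(["/bin/sh", "-c", command], env=ENV,
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


def run_exec_form(argv):
    """Exec form (RUN ["echo", "$HOME"]): argv runs directly, with no shell."""
    result = subprocess.run(argv, env=ENV,
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


print(run_shell_form("echo $HOME"))               # /root: the shell expands $HOME
print(run_exec_form(["echo", "$HOME"]))           # $HOME: no shell, no expansion
print(run_exec_form(["sh", "-c", "echo $HOME"]))  # /root: exec form that starts a shell
```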
ENV
The ENV instruction sets environment variables that persist when a container is run.
ENV <key>=<value> ...
Examples:
ENV PYTHONUNBUFFERED=1
ENV NODE_ENV=production PORT=3000
Note: Environment variables set with ENV are available to containers created from the image, not just during the build process. They can be overridden at runtime with docker run -e.
EXPOSE
The EXPOSE instruction informs Docker that the container listens on specific network ports at runtime.
EXPOSE <port> [<port>/<protocol>...]
Examples:
EXPOSE 8080
EXPOSE 80/tcp 443/tcp
Note: EXPOSE doesn't actually publish the port. It functions as documentation between the person who builds the image and the person who runs the container. To actually publish the port when running the container, use docker run -p.
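The value passed to docker run -p reads HOST:CONTAINER, which is easy to mix up. The Python sketch below parses the common forms so the direction is explicit; parse_publish is a hypothetical helper, not Docker's own parser, and it deliberately leaves out port ranges and IPv6 addresses. The 0.0.0.0 default reflects Docker's default of publishing on all host interfaces.

```python
def parse_publish(spec):
    """Parse a simplified `docker run -p` value.

    Handles HOST_PORT:CONTAINER_PORT, IP:HOST_PORT:CONTAINER_PORT, a bare
    CONTAINER_PORT, and an optional /tcp or /udp suffix.
    """
    spec, _, protocol = spec.partition("/")
    parts = spec.split(":")
    if len(parts) == 1:
        host_ip, host_port, container_port = "", "", parts[0]
    elif len(parts) == 2:
        host_ip = ""
        host_port, container_port = parts
    elif len(parts) == 3:
        host_ip, host_port, container_port = parts
    else:
        raise ValueError(f"unsupported publish spec: {spec!r}")
    return {
        "host_ip": host_ip or "0.0.0.0",  # all interfaces by default
        # None means Docker picks a free host port
        "host_port": int(host_port) if host_port else None,
        "container_port": int(container_port),
        "protocol": protocol or "tcp",
    }


print(parse_publish("8080:80"))              # host 8080 -> container 80
print(parse_publish("127.0.0.1:5000:5000"))  # only reachable from the host itself
print(parse_publish("53/udp"))               # random host port, UDP
```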
CMD
The CMD instruction provides default commands for an executing container.
CMD ["executable", "param1", "param2"]
CMD command param1 param2
CMD ["param1", "param2"] (as default parameters to ENTRYPOINT)
Examples:
CMD ["python", "app.py"]
CMD nginx -g "daemon off;"
Note: There can only be one CMD instruction in a Dockerfile. If multiple are specified, only the last one takes effect. The CMD can be overridden by specifying a command when running the container with docker run image command.
ENTRYPOINT
The ENTRYPOINT instruction configures the container to run as an executable.
ENTRYPOINT ["executable", "param1", "param2"]
ENTRYPOINT command param1 param2
Examples:
ENTRYPOINT ["python", "app.py"]
ENTRYPOINT ["docker-entrypoint.sh"]
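ENTRYPOINT vs CMD: ENTRYPOINT specifies the executable to run, while CMD provides default arguments that can be overridden. They're often used together, with ENTRYPOINT providing the command and CMD providing default arguments. Use the exec (JSON array) form of ENTRYPOINT for this: the shell form ignores both CMD and any arguments passed to docker run.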
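The way ENTRYPOINT, CMD, and docker run arguments combine can be sketched as a small Python function. It follows the interaction rules in Docker's Dockerfile reference (shell form is wrapped in /bin/sh -c, run arguments replace CMD, exec-form ENTRYPOINT gets CMD appended, shell-form ENTRYPOINT ignores both); effective_command and its parameters are hypothetical names, and the --entrypoint override is not modeled.

```python
def effective_command(entrypoint=None, cmd=None, run_args=None):
    """Return the argv a container would start with.

    entrypoint / cmd: a list for exec form, a str for shell form, or None.
    run_args: extra arguments from `docker run IMAGE ARG...`.
    """
    def to_argv(value):
        # Shell form is wrapped as ["/bin/sh", "-c", value]; exec form is used as-is.
        if value is None:
            return []
        if isinstance(value, str):
            return ["/bin/sh", "-c", value]
        return list(value)

    # Arguments given to `docker run` replace CMD entirely.
    cmd_argv = list(run_args) if run_args else to_argv(cmd)

    if entrypoint is None:
        return cmd_argv
    if isinstance(entrypoint, str):
        # Shell-form ENTRYPOINT ignores CMD and docker run arguments.
        return to_argv(entrypoint)
    # Exec-form ENTRYPOINT: CMD (or the run arguments) are appended.
    return to_argv(entrypoint) + cmd_argv


print(effective_command(cmd=["python", "app.py"]))
print(effective_command(cmd=["python", "app.py"], run_args=["bash"]))
print(effective_command(entrypoint=["python"], cmd=["app.py"]))
print(effective_command(entrypoint=["python"], cmd=["app.py"], run_args=["other.py"]))
print(effective_command(entrypoint="python app.py", cmd=["ignored"]))
```

With ENTRYPOINT ["python"] and CMD ["app.py"], docker run image other.py runs python other.py, which is the "executable plus overridable default arguments" pattern described above.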
Other Instructions
- USER: Sets the user name or UID for subsequent instructions
- VOLUME: Creates a mount point for external volumes
- ARG: Defines build-time variables that can be passed during build
- LABEL: Adds metadata to the image
- HEALTHCHECK: Tells Docker how to test if the container is still working
Creating Your First Dockerfile
Let's create a basic Dockerfile for a Python web application using Flask.
Step 1: Create the Application Files
First, create a new directory for your project and set up the application files:
mkdir flask_docker_demo
cd flask_docker_demo
Create a file named app.py with the following content:
from flask import Flask

app = Flask(__name__)

@app.route('/')
def hello():
    return "Hello, Docker!"

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
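Create a file named requirements.txt with the following content (Werkzeug is pinned too, because Flask 2.0.x fails to import with Werkzeug 3.0 or later):
flask==2.0.1
werkzeug==2.0.3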
Step 2: Create the Dockerfile
Now, create a file named Dockerfile (with no extension) in the same directory:
# Use Python 3.9 slim as the base image
FROM python:3.9-slim
# Set the working directory in the container
WORKDIR /app
# Copy the requirements file into the container
COPY requirements.txt .
# Install the dependencies
RUN pip install --no-cache-dir -r requirements.txt
# Copy the rest of the application code
COPY . .
# Tell Docker that the container listens on port 5000
EXPOSE 5000
# Set environment variables
ENV FLASK_APP=app.py
ENV FLASK_ENV=production
ENV PYTHONUNBUFFERED=1
# Command to run when the container starts
CMD ["python", "app.py"]
Let's break down what each instruction does:
- FROM python:3.9-slim: Start with a slim Python 3.9 image
- WORKDIR /app: Set the working directory to /app
- COPY requirements.txt .: Copy only the requirements file first
- RUN pip install...: Install the Python dependencies
- COPY . .: Copy the rest of the application code
- EXPOSE 5000: Document that the container listens on port 5000
- ENV...: Set environment variables for Flask
- CMD ["python", "app.py"]: Run the application when the container starts
Step 3: Build the Docker Image
Now, let's build the Docker image from our Dockerfile:
docker build -t flask-demo .
The -t flag tags the image with a name, and the . specifies the build context (the current directory).
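You should see output showing the build progress, with each step corresponding to an instruction in the Dockerfile. The output below comes from Docker's legacy builder; BuildKit, the default builder in Docker Desktop and in Docker Engine 23.0 and later, shows a more compact progress display for the same steps: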
Sending build context to Docker daemon 4.096kB
Step 1/8 : FROM python:3.9-slim
---> 8c705081f50d
Step 2/8 : WORKDIR /app
---> Running in 6ce8e46488fd
Removing intermediate container 6ce8e46488fd
---> 9d53d0d5b1fd
Step 3/8 : COPY requirements.txt .
---> 7b90bead6fed
...
Successfully built f8b2f81f83a
Successfully tagged flask-demo:latest
Step 4: Run the Container
Now we can run a container from our newly built image:
docker run -p 5000:5000 --name my-flask-app flask-demo
This command starts a container named my-flask-app from the flask-demo image and maps port 5000 from the container to port 5000 on the host.
You should be able to access your Flask application by opening http://localhost:5000 in your web browser, which should display "Hello, Docker!"
Analogy: The process we just went through is like creating a specialized kitchen (the Docker image) designed specifically for making a particular dish (our Flask application). We started with a basic kitchen (the base image), added the tools and ingredients we need (dependencies), and provided the recipe (application code). Now, whenever we want to make that dish, we can simply "turn on" our specialized kitchen (run a container) and it's ready to go - no setup required!
Dockerfile Best Practices
Layer Optimization
Each RUN, COPY, and ADD instruction creates a new layer (other instructions only record metadata). To optimize your images:
- Combine related commands in a single RUN instruction to reduce layers
- Clean up in the same layer where you create files (e.g., remove package manager caches)
- Use multi-stage builds for complex applications (more on this later)
Before optimization:
RUN apt-get update
RUN apt-get install -y curl
RUN apt-get install -y nginx
RUN apt-get clean
After optimization:
RUN apt-get update && \
apt-get install -y curl nginx && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
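Why does cleanup have to happen in the same RUN? Layers are stacked and never rewritten: deleting a file in a later layer only records a deletion marker (a "whiteout") that hides it, while the earlier layer still ships the bytes. The Python sketch below models that with made-up sizes in MB; image_size and visible_files are hypothetical helpers, and the file paths are illustrative only.

```python
def image_size(layers):
    """Total size stored across all layers.

    Each layer maps path -> size; a size of None marks a deletion
    (whiteout), which hides the file but does not shrink earlier layers.
    """
    return sum(size for layer in layers for size in layer.values() if size is not None)


def visible_files(layers):
    """Files a container actually sees after stacking the layers in order."""
    files = {}
    for layer in layers:
        for path, size in layer.items():
            if size is None:
                files.pop(path, None)  # whiteout hides the file
            else:
                files[path] = size
    return files


# Hypothetical sizes in MB, mirroring the "before" and "after" examples above.
before = [
    {"/var/lib/apt/lists/pkgs": 40},   # RUN apt-get update
    {"/usr/bin/curl": 5},              # RUN apt-get install -y curl
    {"/var/lib/apt/lists/pkgs": None}, # cleanup in its own RUN
]
after = [
    {"/usr/bin/curl": 5},              # update, install, and cleanup in one RUN
]

print(image_size(before), image_size(after))         # 45 5
print(visible_files(before) == visible_files(after))  # True
```

Both images look identical from inside a container, but the unoptimized one still carries the package lists.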
Leveraging Build Cache
Docker caches the results of each instruction to speed up subsequent builds. To make the most of caching:
- Order instructions from least to most likely to change
- Copy dependency files (like requirements.txt) separately before copying the rest of the code
- Be aware that cache invalidation causes all subsequent instructions to be re-executed
Poor caching strategy:
COPY . /app
RUN pip install -r requirements.txt
Better caching strategy:
COPY requirements.txt /app/
RUN pip install -r requirements.txt
COPY . /app
This way, changing your application code doesn't trigger a reinstallation of all dependencies.
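The effect of instruction order on the cache can be simulated with chained hashes. In this simplified Python model (not Docker's actual cache implementation), each step's key covers the previous step's key, the instruction text, and, for COPY, the contents of the copied files; once one key changes, every later key changes too. cache_keys and rebuilt_steps are hypothetical helpers, and the file contents are made up.

```python
import hashlib


def cache_keys(steps, files):
    """Compute a chained cache key for each build step.

    steps: instruction strings, e.g. "COPY requirements.txt /app/".
    files: the build context as {filename: contents}.
    """
    keys = []
    previous = ""
    for step in steps:
        h = hashlib.sha256()
        h.update(previous.encode())
        h.update(step.encode())
        parts = step.split()
        if parts[0] == "COPY":
            source = parts[1]
            # "COPY ." copies the whole context; otherwise just the named file.
            names = sorted(files) if source == "." else [source]
            for name in names:
                h.update(name.encode())
                h.update(files[name].encode())
        previous = h.hexdigest()
        keys.append(previous)
    return keys


def rebuilt_steps(steps, old_files, new_files):
    """Steps whose cache key changed and would therefore run again."""
    old = cache_keys(steps, old_files)
    new = cache_keys(steps, new_files)
    return [step for step, a, b in zip(steps, old, new) if a != b]


old = {"requirements.txt": "flask==2.0.1\n", "app.py": "print('v1')\n"}
new = {**old, "app.py": "print('v2')\n"}  # only application code changed

poor = ["FROM python:3.9-slim", "COPY . /app", "RUN pip install -r requirements.txt"]
better = ["FROM python:3.9-slim", "COPY requirements.txt /app/",
          "RUN pip install -r requirements.txt", "COPY . /app"]

print(rebuilt_steps(poor, old, new))    # the COPY and the pip install both rerun
print(rebuilt_steps(better, old, new))  # only the final COPY reruns
```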
Security Best Practices
- Use specific version tags for base images, not latest
- Run containers as non-root users when possible
- Minimize the number of installed packages to reduce the attack surface
- Don't store secrets in your Dockerfile or image layers; supply them at runtime (for example, as environment variables) or use a secret management tool
- Scan your images for vulnerabilities before deployment
Running as a non-root user:
# Create a non-root user
RUN adduser --disabled-password --gecos "" appuser
# Switch to non-root user
USER appuser
# Make sure the user owns the application directory
COPY --chown=appuser:appuser . /app
General Best Practices
- Use .dockerignore files to exclude unnecessary files from the build context
- Keep images small by removing unnecessary files and using smaller base images
- Document your image with comments and LABEL instructions
- Test your Dockerfile in different environments before production use
- Prefer COPY over ADD unless you need the additional features
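Example .dockerignore file (patterns are matched from the root of the build context, so prefix a pattern with **/ to match it in subdirectories as well):
.git
.gitignore
.env
**/__pycache__
**/*.pyc
**/*.pyo
**/*.pyd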
.Python
env/
venv/
node_modules/
npm-debug.log
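Unlike .gitignore, .dockerignore patterns are anchored at the root of the build context: *.pyc matches only top-level files, while **/*.pyc matches at any depth. The Python sketch below approximates those rules (root-anchored patterns, ** spanning any number of directories, a matched directory excluding everything beneath it, ! re-including, and the last matching line winning). is_ignored is a hypothetical helper, and fnmatch's bracket syntax differs slightly from the Go matcher Docker actually uses.

```python
from fnmatch import fnmatchcase


def _matches(pattern, parts):
    """Match pattern segments against path segments; '**' spans any number of segments."""
    if not pattern:
        return not parts
    if pattern[0] == "**":
        return any(_matches(pattern[1:], parts[i:]) for i in range(len(parts) + 1))
    return (bool(parts) and fnmatchcase(parts[0], pattern[0])
            and _matches(pattern[1:], parts[1:]))


def is_ignored(path, dockerignore_lines):
    """Approximate whether a context-relative path is excluded from the build context."""
    parts = path.strip("/").split("/")
    ignored = False
    for line in dockerignore_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        negate = line.startswith("!")
        pattern = (line[1:] if negate else line).strip("/").split("/")
        # Check the path itself and every parent directory of it.
        if any(_matches(pattern, parts[:i]) for i in range(1, len(parts) + 1)):
            ignored = not negate  # the last matching line wins
    return ignored


print(is_ignored("src/__pycache__/m.pyc", ["__pycache__"]))     # False: root only
print(is_ignored("src/__pycache__/m.pyc", ["**/__pycache__"]))  # True
print(is_ignored("README.md", ["*.md", "!README.md"]))          # False: re-included
```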
Advanced Dockerfile Techniques
Multi-stage Builds
Multi-stage builds allow you to use multiple FROM statements in your Dockerfile. Each FROM instruction starts a new build stage, and you can selectively copy artifacts from one stage to another, leaving behind everything you don't need in the final image.
Example: Building a React frontend with Node.js, then serving it with NGINX:
# Build stage
FROM node:14 AS build
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
RUN npm run build
# Production stage
FROM nginx:alpine
COPY --from=build /app/build /usr/share/nginx/html
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]
This Dockerfile uses two stages: first, it builds the React application using Node.js, then it copies only the built files to an NGINX image for serving. The final image contains only NGINX and the built frontend files, not Node.js or the development dependencies.
ARG and Build-time Variables
The ARG instruction defines variables that can be passed at build time with the --build-arg flag.
Example: Configurable Python version:
# Define build arguments
ARG PYTHON_VERSION=3.9
# Use the argument in the FROM instruction
FROM python:${PYTHON_VERSION}-slim
# Later, you could build with a different version:
# docker build --build-arg PYTHON_VERSION=3.10 -t myapp .
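The substitution in the FROM line above can be modeled in a few lines of Python: ARG supplies a default, and a matching --build-arg value replaces it before ${PYTHON_VERSION} is expanded. resolve_from is a hypothetical helper built on string.Template, which handles the same ${NAME} syntax; it is a sketch of the behavior, not Docker's implementation.

```python
from string import Template


def resolve_from(line, arg_defaults, build_args=None):
    """Expand ${NAME} in a FROM line using ARG defaults and --build-arg overrides."""
    values = dict(arg_defaults)
    for name, value in (build_args or {}).items():
        # Only declared ARGs are used; undeclared build args are skipped here.
        if name in values:
            values[name] = value
    return Template(line).substitute(values)


line = "FROM python:${PYTHON_VERSION}-slim"
print(resolve_from(line, {"PYTHON_VERSION": "3.9"}))                             # FROM python:3.9-slim
print(resolve_from(line, {"PYTHON_VERSION": "3.9"}, {"PYTHON_VERSION": "3.10"})) # FROM python:3.10-slim
```

One scoping rule the sketch does not model: an ARG declared before FROM is only available to FROM lines. To use PYTHON_VERSION inside the stage (for example in a RUN), repeat ARG PYTHON_VERSION after the FROM.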
Using HEALTHCHECK
The HEALTHCHECK instruction tells Docker how to test that the container is still working properly.
Example: Healthcheck for a web server:
FROM nginx:alpine
COPY index.html /usr/share/nginx/html/
EXPOSE 80
HEALTHCHECK --interval=30s --timeout=3s \
CMD wget --no-verbose --tries=1 --spider http://localhost/ || exit 1
CMD ["nginx", "-g", "daemon off;"]
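This Dockerfile includes a health check that fetches the root page every 30 seconds and treats the check as failed if it takes longer than 3 seconds. If the check fails three times in a row (the default value of --retries), the container is marked unhealthy.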
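The status transitions can be sketched as a small Python state machine: a container starts as "starting", one passing check makes it "healthy", and only a run of consecutive failures equal to --retries makes it "unhealthy". health_status is a hypothetical helper; the --start-period grace window is not modeled.

```python
def health_status(results, retries=3):
    """Replay health-check results (True = passed) and return the final status."""
    status = "starting"
    failures = 0
    for passed in results:
        if passed:
            failures = 0          # any success resets the failure streak
            status = "healthy"
        else:
            failures += 1
            if failures >= retries:
                status = "unhealthy"
    return status


print(health_status([False, False]))              # starting
print(health_status([True, False, False]))        # healthy: only two failures in a row
print(health_status([True, False, False, False])) # unhealthy
```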
Environment-specific Dockerfiles
For applications that need different configurations in development and production, you can create multiple Dockerfiles or use build arguments to switch configurations.
Example: Development vs. Production:
# Dockerfile.dev
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
ENV FLASK_ENV=development
CMD ["flask", "run", "--host=0.0.0.0"]
# Dockerfile.prod
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
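# gunicorn is required by the CMD below; install it here unless requirements.txt already lists it
RUN pip install --no-cache-dir -r requirements.txt gunicorn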
COPY . .
ENV FLASK_ENV=production
RUN python -m compileall .
CMD ["gunicorn", "app:app", "-b", "0.0.0.0:5000"]
You would build these with:
docker build -f Dockerfile.dev -t myapp:dev .
docker build -f Dockerfile.prod -t myapp:prod .
Practical Examples for Different Use Cases
Python Web Application (Django)
# Use Python 3.9 as base image
FROM python:3.9-slim
# Set environment variables
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
# Set work directory
WORKDIR /app
# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy project
COPY . .
# Run migrations and start server
CMD ["sh", "-c", "python manage.py migrate && python manage.py runserver 0.0.0.0:8000"]
Node.js Application
# Use Node.js 16 as base image
FROM node:16-alpine
# Create app directory
WORKDIR /usr/src/app
# Install app dependencies
COPY package*.json ./
RUN npm install
# Bundle app source
COPY . .
# Expose port
EXPOSE 3000
# Start application
CMD ["node", "server.js"]
Go Application
# Build stage
FROM golang:1.17 AS build
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o app .
# Final stage
FROM alpine:3.14
WORKDIR /root/
COPY --from=build /app/app .
EXPOSE 8080
CMD ["./app"]
Java Spring Boot Application
# Build stage
FROM maven:3.8.3-openjdk-17 AS build
WORKDIR /app
COPY pom.xml .
RUN mvn dependency:go-offline
COPY src/ /app/src/
RUN mvn package -DskipTests
# Final stage
FROM openjdk:17-jdk-slim
WORKDIR /app
COPY --from=build /app/target/*.jar app.jar
EXPOSE 8080
ENTRYPOINT ["java", "-jar", "app.jar"]
Common Issues and Troubleshooting
Build Context Errors
If Docker complains about files not being found during COPY or ADD operations, check:
- The path is correct relative to the build context
- The file exists and has the right permissions
- The file isn't excluded by a .dockerignore rule
Layer Caching Issues
If changes to your code aren't being reflected in builds:
- Use docker build --no-cache to force a complete rebuild
- Check that you're copying files in the right order for optimal caching
- Ensure you're not copying generated files that might be out of date
Permission Problems
If your container has permission issues:
- Check that you're running as the right user (USER instruction)
- Ensure file permissions are set correctly when copying files
- Use --chown with COPY to set ownership
Container Won't Start
If your container exits immediately after starting:
- Check that your CMD or ENTRYPOINT is correct
- Open a shell in the image with docker run -it --rm --entrypoint sh image (use bash instead of sh if the image includes it; Alpine-based images usually don't)
- Check logs with docker logs container_id
- Ensure your application handles signals properly (especially SIGTERM)
Debugging Dockerfiles
To troubleshoot Dockerfile issues:
- With BuildKit, run docker build --progress=plain . to see the full output of every RUN step
- To inspect the state just before a failing step, temporarily comment out that instruction and everything after it, build, and open a shell in the result:
docker build -t debug-image .
docker run -it --rm debug-image sh
- Add RUN ls -la commands to check the state at various points
- Use docker history image_name to see layer sizes and commands
Building and Publishing Your Image
Building with Tags
To build an image with a specific tag:
docker build -t username/repository:tag .
For example:
docker build -t johndoe/flask-app:1.0 .
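When no tag is given, Docker uses latest, which is why the earlier build printed flask-demo:latest. The Python sketch below splits a reference into name and tag under that rule; split_reference is a hypothetical helper that ignores digests (@sha256:...). A colon only starts a tag when it comes after the last slash, so a registry port such as localhost:5000 is not mistaken for one.

```python
def split_reference(reference):
    """Split NAME[:TAG] into (name, tag), defaulting the tag to "latest"."""
    last_slash = reference.rfind("/")
    last_colon = reference.rfind(":")
    if last_colon > last_slash:
        return reference[:last_colon], reference[last_colon + 1:]
    return reference, "latest"


print(split_reference("johndoe/flask-app:1.0"))  # ('johndoe/flask-app', '1.0')
print(split_reference("flask-demo"))             # ('flask-demo', 'latest')
print(split_reference("localhost:5000/app"))     # ('localhost:5000/app', 'latest')
```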
Publishing to Docker Hub
To share your image on Docker Hub:
- Log in to Docker Hub:
docker login
- Push your image:
docker push username/repository:tag
Building for Multiple Architectures
To build for multiple CPU architectures (like amd64 and arm64):
docker buildx create --name mybuilder --use
docker buildx build --platform linux/amd64,linux/arm64 -t username/repository:tag --push .
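Note: buildx is the Docker CLI plugin for BuildKit; it ships with Docker Desktop and current Docker Engine packages, so no experimental setting is needed. Building for an architecture other than your machine's relies on QEMU emulation, which Docker Desktop sets up for you. The --push flag sends the multi-architecture result straight to the registry.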
Dockerfile Exercises
Exercise 1: Basic Static Website
Create a Dockerfile for a simple static website served by NGINX:
- Create an index.html file with some content
- Write a Dockerfile that:
- Uses NGINX as the base image
- Copies your HTML file to the right location
- Exposes port 80
- Build and run the image
- Access the website at http://localhost:80
Solution:
# Dockerfile
FROM nginx:alpine
COPY index.html /usr/share/nginx/html/
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]
# Build and run
docker build -t static-site .
docker run -p 80:80 static-site
Exercise 2: Python Data Science Environment
Create a Dockerfile for a Python data science environment:
- Write a Dockerfile that:
- Uses Python 3.9 as the base image
- Installs common data science packages (numpy, pandas, matplotlib)
- Sets up a working directory
- Starts a Python shell by default
- Build and run the image with a volume mount for your data
Solution:
# Dockerfile
FROM python:3.9-slim
RUN pip install --no-cache-dir numpy pandas matplotlib jupyter
WORKDIR /data
CMD ["python"]
# Build and run
docker build -t datascience .
docker run -it -v $(pwd):/data datascience
Exercise 3: Multi-stage Frontend Build
Create a multi-stage Dockerfile for a React application:
- Write a Dockerfile that:
- Uses Node.js to build the React application in the first stage
- Uses NGINX to serve the built files in the second stage
- Results in a small final image
Solution:
# Dockerfile
# Build stage
FROM node:14 AS build
WORKDIR /app
# Install dependencies
COPY package*.json ./
RUN npm install
# Copy and build app
COPY . .
RUN npm run build
# Production stage
FROM nginx:alpine
# Copy built files from build stage
COPY --from=build /app/build /usr/share/nginx/html
# Expose port
EXPOSE 80
# Start NGINX
CMD ["nginx", "-g", "daemon off;"]
Key Takeaways
- A Dockerfile is a text file containing instructions for building a Docker image
- Common instructions include FROM, RUN, COPY, WORKDIR, ENV, EXPOSE, and CMD
- Instructions that change the filesystem (RUN, COPY, ADD) create layers, which affect caching and image size
- Optimizing Dockerfiles involves minimizing layers, leveraging caching, and reducing image size
- Multi-stage builds allow you to create smaller, more efficient images
- Best practices include running as non-root, using specific tags, and not storing secrets in images
- Different applications may require different Dockerfile strategies
- Dockerfiles can be customized for different environments and use cases
With these concepts and techniques, you're now equipped to create custom Docker images tailored to your specific application needs!
Looking Ahead
In our next session, we'll dive deeper into building and running your own custom images. We'll apply what we've learned about Dockerfiles to create more complex applications and explore advanced techniques for optimizing and deploying them.
Discussion Questions
- How might you adapt the Flask Dockerfile we created to better suit a development environment? What about a production environment?
- What are the security implications of running containers as the root user? How would you modify a Dockerfile to run as a non-root user?
- How could multi-stage builds improve your application deployment workflow?
- What strategies would you use to minimize the size of your Docker images without sacrificing functionality?
- How would you adapt Dockerfile strategies for different types of applications (e.g., frontend vs. backend, static vs. dynamic)?
Additional Resources
- Dockerfile Reference - Official documentation for Dockerfile instructions
- Dockerfile Best Practices - Docker's official best practice guide
- Docker Java Labs - Examples for Java applications
- Python Speed Docker Guide - Specialized guide for Python in Docker
- Docker Build Command Reference - Detailed information on build options