Creating a Basic Dockerfile

Week 1, Wednesday - Afternoon Session

Lecture Overview

In this session, we'll explore how to create your own Docker images by writing Dockerfiles. A Dockerfile is a text document containing instructions that Docker uses to automatically build an image. Understanding Dockerfiles is a crucial skill for containerizing applications effectively. By the end of this session, you'll be able to create custom Docker images tailored to your specific application needs.

What is a Dockerfile?

A Dockerfile is a plain text file, named Dockerfile by default (with no file extension), that contains a series of instructions Docker follows to build an image.

Key Concepts

Analogy: A Dockerfile is like a recipe for baking a cake. It starts with base ingredients (the base image), follows a sequence of steps (instructions), and each step transforms the ingredients in some way. The final result is a complete, ready-to-use cake (Docker image). Just as a good baker might adapt a recipe for different occasions, you'll customize your Dockerfile for different applications.

Why Create Custom Images?

While public images are convenient, custom images offer several advantages:

Dockerfile Syntax and Structure

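Dockerfiles use a simple, line-oriented syntax. Each instruction starts on its own line and is followed by its arguments, and Docker processes the instructions in order from top to bottom. Instruction names are not case-sensitive, but by convention they are written in uppercase to distinguish them from arguments.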

Basic Structure

# Comment
INSTRUCTION arguments

For example:

# Use Python 3.9 as base image
FROM python:3.9-slim

# Set working directory
WORKDIR /app

# Copy requirements file
COPY requirements.txt .

# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

# Set environment variables
ENV PYTHONUNBUFFERED=1

# Command to run when container starts
CMD ["python", "app.py"]

Common Instructions

Instruction   Purpose
FROM          Sets the base image
RUN           Executes commands in the container during build
COPY          Copies files from the host to the container
ADD           Similar to COPY, but with additional features (URL support, auto-extraction)
WORKDIR       Sets the working directory for subsequent instructions
ENV           Sets environment variables
EXPOSE        Documents which ports the container listens on
CMD           Provides default command to run when container starts
ENTRYPOINT    Configures the container to run as an executable

Best practice: Keep your Dockerfile commands in logical order, typically following this pattern: base image, dependencies, application code, configuration, and finally the command to run.

Dockerfile Instructions in Detail

Let's explore each common instruction in more depth:

FROM

The FROM instruction initializes a new build stage and sets the base image. It's typically the first instruction in a Dockerfile.

FROM [--platform=<platform>] <image>[:<tag>] [AS <name>]

Examples:

FROM python:3.9-slim
FROM ubuntu:20.04
FROM node:14-alpine AS build-stage

Note: FROM must be the first instruction in the Dockerfile, apart from comments, parser directives, and ARG instructions that declare variables used in the FROM line. You can have multiple FROM instructions in a single Dockerfile for multi-stage builds (an advanced technique we'll cover later).

WORKDIR

The WORKDIR instruction sets the working directory for any subsequent RUN, CMD, ENTRYPOINT, COPY, and ADD instructions.

WORKDIR /path/to/directory

Example:

WORKDIR /app

Best practice: Always use absolute paths with WORKDIR. If a specified directory doesn't exist, Docker will create it. Use WORKDIR instead of a series of RUN cd /path commands for clarity and reliability.
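
To see why WORKDIR is more reliable than RUN cd, consider the following minimal sketch. The alpine:3.14 base image and the /srv/demo path are illustrative choices only and are not part of the project used in this session.

```dockerfile
FROM alpine:3.14

# Each RUN starts a fresh shell, so this cd is forgotten once the step ends
RUN mkdir -p /srv/demo && cd /srv/demo

# Still runs in the image's default working directory (/)
RUN pwd

# WORKDIR is recorded in the image and applies to every later instruction
WORKDIR /srv/demo

# Now runs in /srv/demo
RUN pwd
```

The second pwd only reports /srv/demo because of the WORKDIR line; the earlier cd had no lasting effect.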

COPY and ADD

These instructions copy files from the build context to the image.

COPY [--chown=<user>:<group>] <src> <dest>
ADD [--chown=<user>:<group>] <src> <dest>

Examples:

COPY requirements.txt .
COPY . /app/
ADD https://example.com/file.tar.gz /tmp/

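COPY vs ADD: COPY simply copies files from the build context, while ADD can also download from a URL and automatically extract a local tar archive. Note that ADD does not extract archives fetched from a URL, so the example above leaves file.tar.gz in /tmp/ as a compressed file. Generally, use COPY unless you specifically need ADD's features, as COPY is more explicit.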
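
The difference is easiest to see side by side. The sketch below assumes a hypothetical archive named vendor.tar.gz in the build context; the file name and the /opt/vendor path are made up for illustration.

```dockerfile
FROM alpine:3.14
WORKDIR /opt/vendor

# COPY leaves the archive untouched: the image gets /opt/vendor/vendor.tar.gz
COPY vendor.tar.gz .

# ADD unpacks a local tar archive into the destination directory
ADD vendor.tar.gz ./unpacked/
```

Because the extraction happens implicitly, a reader of the Dockerfile cannot tell from the ADD line alone whether a file is copied or unpacked, which is one reason COPY is usually preferred.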

RUN

The RUN instruction executes commands in a new layer and commits the results.

RUN <command>
RUN ["executable", "param1", "param2"]

Examples:

RUN apt-get update && apt-get install -y curl
RUN pip install --no-cache-dir -r requirements.txt
RUN ["bash", "-c", "echo $HOME"]

Best practice: Chain related commands in a single RUN instruction using && to reduce the number of layers. For package installations, include cleanup steps in the same RUN instruction to keep the image size down.
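
The two RUN forms shown in the syntax above also differ in how variables are handled. As a rough sketch (using the same python:3.9-slim base image as the rest of this session):

```dockerfile
FROM python:3.9-slim

# Shell form: runs via /bin/sh -c, so the shell expands $HOME
RUN echo $HOME

# Exec form: no shell is involved, so the literal text $HOME is printed
RUN ["echo", "$HOME"]

# Exec form with an explicit shell: bash performs the expansion
RUN ["bash", "-c", "echo $HOME"]
```

The exec form is parsed as a JSON array, so it needs double quotes around each element.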

ENV

The ENV instruction sets environment variables that persist when a container is run.

ENV <key>=<value> ...

Examples:

ENV PYTHONUNBUFFERED=1
ENV NODE_ENV=production PORT=3000

Note: Environment variables set with ENV are available to containers created from the image, not just during the build process. They can be overridden at runtime with docker run -e.
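
A small sketch of this behavior follows. The image name env-demo and the variable GREETING are hypothetical names chosen for the example.

```dockerfile
FROM alpine:3.14

# Default value baked into the image
ENV GREETING=hello

# The shell started by CMD reads the variable when the container starts
CMD ["sh", "-c", "echo $GREETING"]

# Build and run (shell commands, not part of the Dockerfile):
#   docker build -t env-demo .
#   docker run --rm env-demo                  # prints: hello
#   docker run --rm -e GREETING=hi env-demo   # prints: hi
```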

EXPOSE

The EXPOSE instruction informs Docker that the container listens on specific network ports at runtime.

EXPOSE <port> [<port>/<protocol>...]

Examples:

EXPOSE 8080
EXPOSE 80/tcp 443/tcp

Note: EXPOSE doesn't actually publish the port. It functions as documentation between the person who builds the image and the person who runs the container. To actually publish the port when running the container, use docker run -p.
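
As an illustration of the difference between documenting and publishing a port, here is a minimal sketch; the image name expose-demo and the host port 8080 are arbitrary choices.

```dockerfile
FROM nginx:alpine

# Documents the port; nothing is reachable from the host yet
EXPOSE 80

# Run (shell commands, not part of the Dockerfile):
#   docker build -t expose-demo .
#   docker run -d -p 8080:80 expose-demo   # host port 8080 -> container port 80
#   docker run -d -P expose-demo           # publishes every EXPOSEd port on a random host port
#   docker port <container>                # shows the resulting mappings
```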

CMD

The CMD instruction provides default commands for an executing container.

CMD ["executable", "param1", "param2"]
CMD command param1 param2
CMD ["param1", "param2"] (as default parameters to ENTRYPOINT)

Examples:

CMD ["python", "app.py"]
CMD nginx -g "daemon off;"

Note: Only one CMD instruction takes effect in a Dockerfile. If several are specified, the last one wins. The CMD can be overridden by specifying a command when running the container with docker run image command.

ENTRYPOINT

The ENTRYPOINT instruction configures the container to run as an executable.

ENTRYPOINT ["executable", "param1", "param2"]
ENTRYPOINT command param1 param2

Examples:

ENTRYPOINT ["python", "app.py"]
ENTRYPOINT ["docker-entrypoint.sh"]

ENTRYPOINT vs CMD: ENTRYPOINT specifies the executable to run, while CMD provides default arguments that can be overridden. They're often used together, with ENTRYPOINT providing the command and CMD providing default arguments.
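
The combination can be sketched with a deliberately trivial image. The image name greet is hypothetical, and echo stands in for a real application.

```dockerfile
FROM alpine:3.14

# Fixed part of the command line
ENTRYPOINT ["echo", "Hello,"]

# Default argument, replaced by anything passed after the image name
CMD ["Docker!"]

# Build and run (shell commands, not part of the Dockerfile):
#   docker build -t greet .
#   docker run --rm greet                     # prints: Hello, Docker!
#   docker run --rm greet students            # prints: Hello, students
#   docker run --rm --entrypoint ls greet /   # replaces the ENTRYPOINT and runs: ls /
```

Arguments given after the image name replace CMD only; changing the ENTRYPOINT itself requires the --entrypoint flag.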

Other Instructions

Creating Your First Dockerfile

Let's create a basic Dockerfile for a Python web application using Flask.

Step 1: Create the Application Files

First, create a new directory for your project and set up the application files:

mkdir flask_docker_demo
cd flask_docker_demo

Create a file named app.py with the following content:

from flask import Flask
app = Flask(__name__)

@app.route('/')
def hello():
    return "Hello, Docker!"

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)

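Create a file named requirements.txt with the following content (Werkzeug is pinned as well, because Flask 2.0.x fails to import with Werkzeug 3.x, which pip would otherwise install):

flask==2.0.1
werkzeug==2.0.3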

Step 2: Create the Dockerfile

Now, create a file named Dockerfile (with no extension) in the same directory:

# Use Python 3.9 slim as the base image
FROM python:3.9-slim

# Set the working directory in the container
WORKDIR /app

# Copy the requirements file into the container
COPY requirements.txt .

# Install the dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy the rest of the application code
COPY . .

# Tell Docker that the container listens on port 5000
EXPOSE 5000

# Set environment variables
ENV FLASK_APP=app.py
ENV FLASK_ENV=production
ENV PYTHONUNBUFFERED=1

# Command to run when the container starts
CMD ["python", "app.py"]

Let's break down what each instruction does:

  • FROM python:3.9-slim: Start with a slim Python 3.9 image
  • WORKDIR /app: Set the working directory to /app
  • COPY requirements.txt .: Copy only the requirements file first
  • RUN pip install...: Install the Python dependencies
  • COPY . .: Copy the rest of the application code
  • EXPOSE 5000: Document that the container listens on port 5000
  • ENV...: Set environment variables for Flask
  • CMD ["python", "app.py"]: Run the application when the container starts

Step 3: Build the Docker Image

Now, let's build the Docker image from our Dockerfile:

docker build -t flask-demo .

The -t flag tags the image with a name, and the . specifies the build context (the current directory).

You should see output showing the build progress, with each step corresponding to an instruction in the Dockerfile. The output below comes from the classic builder; newer Docker versions build with BuildKit by default and format the progress differently:

Sending build context to Docker daemon  4.096kB
Step 1/10 : FROM python:3.9-slim
 ---> 8c705081f50d
Step 2/10 : WORKDIR /app
 ---> Running in 6ce8e46488fd
Removing intermediate container 6ce8e46488fd
 ---> 9d53d0d5b1fd
Step 3/10 : COPY requirements.txt .
 ---> 7b90bead6fed
...
Successfully built f8b2f81f83a
Successfully tagged flask-demo:latest

Step 4: Run the Container

Now we can run a container from our newly built image:

docker run -p 5000:5000 --name my-flask-app flask-demo

This command starts a container named my-flask-app from the flask-demo image and maps port 5000 from the container to port 5000 on the host.

You should be able to access your Flask application by opening http://localhost:5000 in your web browser, which should display "Hello, Docker!"

Analogy: The process we just went through is like creating a specialized kitchen (the Docker image) designed specifically for making a particular dish (our Flask application). We started with a basic kitchen (the base image), added the tools and ingredients we need (dependencies), and provided the recipe (application code). Now, whenever we want to make that dish, we can simply "turn on" our specialized kitchen (run a container) and it's ready to go - no setup required!

Dockerfile Best Practices

Layer Optimization

Instructions that change the filesystem, chiefly RUN, COPY, and ADD, each add a new layer to the image; the remaining instructions only record metadata. To optimize your images:

Before optimization:

RUN apt-get update
RUN apt-get install -y curl
RUN apt-get install -y nginx
RUN apt-get clean

After optimization:

RUN apt-get update && \
    apt-get install -y curl nginx && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

Leveraging Build Cache

Docker caches the results of each instruction to speed up subsequent builds. To make the most of caching:

Poor caching strategy:

WORKDIR /app
COPY . .
RUN pip install -r requirements.txt

Better caching strategy:

WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .

This way, changing your application code doesn't trigger a reinstallation of all dependencies.

Security Best Practices

Running as a non-root user:

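# Create a non-root user (this adduser syntax is for Debian/Ubuntu-based images)
RUN adduser --disabled-password --gecos "" appuser

# Copy the application and give the new user ownership of the files
COPY --chown=appuser:appuser . /app

# Switch to the non-root user for later instructions and at runtime
USER appuser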
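
Put together with the Flask example from earlier, a complete non-root Dockerfile might look like the sketch below. It uses useradd rather than adduser, on the assumption of a Debian-based image such as python:3.9-slim; the user name appuser is arbitrary.

```dockerfile
FROM python:3.9-slim

# Create an unprivileged user (useradd ships with Debian-based images)
RUN useradd --create-home appuser

WORKDIR /app

# Dependencies are installed as root so they land in the system site-packages
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Application files are owned by the unprivileged user
COPY --chown=appuser:appuser . .

# Everything from here on, including the running container, uses appuser
USER appuser

EXPOSE 5000
CMD ["python", "app.py"]

# Check the runtime user (shell command, not part of the Dockerfile):
#   docker run --rm <image> whoami   # should print: appuser
```

Placing USER after the pip install step keeps the installation simple, since root can still write to the system-wide Python directories at that point.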

General Best Practices

Example .dockerignore file:

.git
.gitignore
.env
__pycache__
*.pyc
*.pyo
*.pyd
.Python
env/
venv/
node_modules/
npm-debug.log

Advanced Dockerfile Techniques

Multi-stage Builds

Multi-stage builds allow you to use multiple FROM statements in your Dockerfile. Each FROM instruction starts a new build stage, and you can selectively copy artifacts from one stage to another, leaving behind everything you don't need in the final image.

Example: Building a React frontend with Node.js, then serving it with NGINX:

# Build stage
FROM node:14 AS build
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
RUN npm run build

# Production stage
FROM nginx:alpine
COPY --from=build /app/build /usr/share/nginx/html
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]

This Dockerfile uses two stages: first, it builds the React application using Node.js, then it copies only the built files to an NGINX image for serving. The final image contains only NGINX and the built frontend files, not Node.js or the development dependencies.

ARG and Build-time Variables

The ARG instruction defines variables that can be passed at build time with the --build-arg flag.

Example: Configurable Python version:

# Define build arguments
ARG PYTHON_VERSION=3.9

# Use the argument in the FROM instruction
FROM python:${PYTHON_VERSION}-slim

# Later, you could build with a different version:
# docker build --build-arg PYTHON_VERSION=3.10 -t myapp .
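
One detail that often surprises people: an ARG declared before FROM is only visible to FROM lines. To use the value inside the build stage, it has to be declared again. A minimal sketch:

```dockerfile
# Declared before FROM: usable only in FROM lines
ARG PYTHON_VERSION=3.9

FROM python:${PYTHON_VERSION}-slim

# Redeclare without a value to bring the variable into this build stage
ARG PYTHON_VERSION

RUN echo "Base image tag: python:${PYTHON_VERSION}-slim"
```

Unlike ENV, build arguments are not set as environment variables in containers started from the image.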

Using HEALTHCHECK

The HEALTHCHECK instruction tells Docker how to test that the container is still working properly.

Example: Healthcheck for a web server:

FROM nginx:alpine
COPY index.html /usr/share/nginx/html/
EXPOSE 80
HEALTHCHECK --interval=30s --timeout=3s \
  CMD wget --no-verbose --tries=1 --spider http://localhost/ || exit 1
CMD ["nginx", "-g", "daemon off;"]

This Dockerfile includes a health check that tries to fetch the root page every 30 seconds and gives up on a probe after 3 seconds. After three consecutive failures (the default for --retries), Docker marks the container as unhealthy.
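
The same idea can be applied to the Flask application from earlier in this session. Slim Python images typically include neither wget nor curl, so this sketch assumes the probe is written in Python itself; the interval, start period, and timeout values are illustrative, not recommendations.

```dockerfile
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 5000

# urlopen raises on connection errors and HTTP error statuses,
# so the probe exits non-zero whenever the app is not answering
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD python -c "import urllib.request; urllib.request.urlopen('http://127.0.0.1:5000/', timeout=2)" || exit 1

CMD ["python", "app.py"]

# Inspect the result (shell command, not part of the Dockerfile):
#   docker inspect --format '{{.State.Health.Status}}' <container>
```

The reported status is one of starting, healthy, or unhealthy.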

Environment-specific Dockerfiles

For applications that need different configurations in development and production, you can create multiple Dockerfiles or use build arguments to switch configurations.

Example: Development vs. Production:

# Dockerfile.dev
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
ENV FLASK_ENV=development
CMD ["flask", "run", "--host=0.0.0.0"]

# Dockerfile.prod
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
ENV FLASK_ENV=production
RUN python -m compileall .
# gunicorn must be listed in requirements.txt for this command to work
CMD ["gunicorn", "app:app", "-b", "0.0.0.0:5000"]

You would build these with:

docker build -f Dockerfile.dev -t myapp:dev .
docker build -f Dockerfile.prod -t myapp:prod .
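
The build-argument alternative mentioned above could look like the following sketch. APP_ENV is a hypothetical argument name, and the Dockerfile otherwise mirrors the Flask example from this session.

```dockerfile
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

# APP_ENV is a hypothetical argument name; production is the default.
# Declaring it late keeps the dependency layers cached across both variants.
ARG APP_ENV=production

# Copy the build-time value into a runtime environment variable
ENV FLASK_ENV=${APP_ENV}

CMD ["python", "app.py"]

# Build both variants (shell commands, not part of the Dockerfile):
#   docker build -t myapp:prod .
#   docker build --build-arg APP_ENV=development -t myapp:dev .
```

A single parameterized Dockerfile avoids duplication, but separate files remain the clearer choice when the two environments differ in more than a few values.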

Practical Examples for Different Use Cases

Python Web Application (Django)

# Use Python 3.9 as base image
FROM python:3.9-slim

# Set environment variables
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1

# Set work directory
WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy project
COPY . .

# Run migrations and start server
CMD ["sh", "-c", "python manage.py migrate && python manage.py runserver 0.0.0.0:8000"]

Node.js Application

# Use Node.js 16 as base image
FROM node:16-alpine

# Create app directory
WORKDIR /usr/src/app

# Install app dependencies
COPY package*.json ./
RUN npm install

# Bundle app source
COPY . .

# Expose port
EXPOSE 3000

# Start application
CMD ["node", "server.js"]

Go Application

# Build stage
FROM golang:1.17 AS build

WORKDIR /app

COPY go.mod go.sum ./
RUN go mod download

COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o app .

# Final stage
FROM alpine:3.14

WORKDIR /root/

COPY --from=build /app/app .

EXPOSE 8080

CMD ["./app"]

Java Spring Boot Application

# Build stage
FROM maven:3.8.3-openjdk-17 AS build
WORKDIR /app
COPY pom.xml .
RUN mvn dependency:go-offline
COPY src/ /app/src/
RUN mvn package -DskipTests

# Final stage
FROM openjdk:17-jdk-slim
WORKDIR /app
COPY --from=build /app/target/*.jar app.jar
EXPOSE 8080
ENTRYPOINT ["java", "-jar", "app.jar"]

Common Issues and Troubleshooting

Build Context Errors

If Docker complains about files not being found during COPY or ADD operations, check:

Layer Caching Issues

If changes to your code aren't being reflected in builds:

Permission Problems

If your container has permission issues:

Container Won't Start

If your container exits immediately after starting:

Debugging Dockerfiles

To troubleshoot Dockerfile issues:

Building and Publishing Your Image

Building with Tags

To build an image with a specific tag:

docker build -t username/repository:tag .

For example:

docker build -t johndoe/flask-app:1.0 .

Publishing to Docker Hub

To share your image on Docker Hub:

  1. Log in to Docker Hub:
    docker login
  2. Push your image:
    docker push username/repository:tag

Building for Multiple Architectures

To build for multiple CPU architectures (like amd64 and arm64):

docker buildx create --name mybuilder --use
docker buildx build --platform linux/amd64,linux/arm64 -t username/repository:tag --push .

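Note: buildx is a Docker CLI plugin that builds with BuildKit and adds multi-architecture support. It is included with Docker Desktop and recent Docker Engine packages and is no longer experimental. Images for architectures other than your host's are typically built through emulation, which can be noticeably slower than a native build.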

Dockerfile Exercises

Exercise 1: Basic Static Website

Create a Dockerfile for a simple static website served by NGINX:

  1. Create an index.html file with some content
  2. Write a Dockerfile that:
    • Uses NGINX as the base image
    • Copies your HTML file to the right location
    • Exposes port 80
  3. Build and run the image
  4. Access the website at http://localhost:80

Solution:

# Dockerfile
FROM nginx:alpine
COPY index.html /usr/share/nginx/html/
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]

# Build and run (shell commands, not part of the Dockerfile)
docker build -t static-site .
docker run -p 80:80 static-site

Exercise 2: Python Data Science Environment

Create a Dockerfile for a Python data science environment:

  1. Write a Dockerfile that:
    • Uses Python 3.9 as the base image
    • Installs common data science packages (numpy, pandas, matplotlib)
    • Sets up a working directory
    • Starts a Python shell by default
  2. Build and run the image with a volume mount for your data

Solution:

# Dockerfile
FROM python:3.9-slim

RUN pip install --no-cache-dir numpy pandas matplotlib jupyter

WORKDIR /data

CMD ["python"]

# Build and run (shell commands, not part of the Dockerfile)
docker build -t datascience .
docker run -it -v "$(pwd)":/data datascience

Exercise 3: Multi-stage Frontend Build

Create a multi-stage Dockerfile for a React application:

  1. Write a Dockerfile that:
    • Uses Node.js to build the React application in the first stage
    • Uses NGINX to serve the built files in the second stage
    • Results in a small final image

Solution:

# Dockerfile
# Build stage
FROM node:14 AS build

WORKDIR /app

# Install dependencies
COPY package*.json ./
RUN npm install

# Copy and build app
COPY . .
RUN npm run build

# Production stage
FROM nginx:alpine

# Copy built files from build stage
COPY --from=build /app/build /usr/share/nginx/html

# Expose port
EXPOSE 80

# Start NGINX
CMD ["nginx", "-g", "daemon off;"]

Key Takeaways

With these concepts and techniques, you're now equipped to create custom Docker images tailored to your specific application needs!

Looking Ahead

In our next session, we'll dive deeper into building and running your own custom images. We'll apply what we've learned about Dockerfiles to create more complex applications and explore advanced techniques for optimizing and deploying them.

Discussion Questions

  1. How might you adapt the Flask Dockerfile we created to better suit a development environment? What about a production environment?
  2. What are the security implications of running containers as the root user? How would you modify a Dockerfile to run as a non-root user?
  3. How could multi-stage builds improve your application deployment workflow?
  4. What strategies would you use to minimize the size of your Docker images without sacrificing functionality?
  5. How would you adapt Dockerfile strategies for different types of applications (e.g., frontend vs. backend, static vs. dynamic)?

Additional Resources