Creating a Basic Dockerfile

Lecture Overview

In this session, we'll explore how to create your own Docker images by writing Dockerfiles. A Dockerfile is a text document containing instructions that Docker uses to automatically build an image. Understanding Dockerfiles is a crucial skill for containerizing applications effectively. By the end of this session, you'll be able to create custom Docker images tailored to your specific application needs.

What is a Dockerfile?

A Dockerfile is a plain text file named Dockerfile (with no file extension) that contains a series of instructions for Docker to build an image.

Key Concepts

Instructions: Commands that Docker executes during the build process
Base Image: The starting point for your image, usually a minimal operating system or runtime environment
Layers: Instructions that change the filesystem (RUN, COPY, ADD) each add a new layer to the image; the others only record metadata
Build Context: The set of files and directories available during the build process

Analogy: A Dockerfile is like a recipe for baking a cake. It starts with base ingredients (the base image), follows a sequence of steps (instructions), and each step transforms the ingredients in some way. The final result is a complete, ready-to-use cake (Docker image). Just as a good baker might adapt a recipe for different occasions, you'll customize your Dockerfile for different applications.

Why Create Custom Images?

While public images are convenient, custom images offer several advantages:

Application-specific configuration: Tailored to your application's exact needs
Dependency management: Include only the libraries and tools your application requires
Consistency: Ensure development, testing, and production environments are identical
Automation: Streamline deployment by automating environment setup
Version control: Track changes to your environment alongside your code

Dockerfile Syntax and Structure

Dockerfiles use a simple, declarative syntax. Each line contains an instruction followed by arguments.

Basic Structure

# Comment
INSTRUCTION arguments

For example:

# Use Python 3.9 as base image
FROM python:3.9-slim

# Set working directory
WORKDIR /app

# Copy requirements file
COPY requirements.txt .

# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

# Set environment variables
ENV PYTHONUNBUFFERED=1

# Command to run when container starts
CMD ["python", "app.py"]

Common Instructions

Instruction	Purpose
`FROM`	Sets the base image
`RUN`	Executes commands in a new layer during the build
`COPY`	Copies files from the build context into the image
`ADD`	Similar to COPY, but with additional features (URL sources, auto-extraction of local tar archives)
`WORKDIR`	Sets the working directory for subsequent instructions
`ENV`	Sets environment variables
`EXPOSE`	Documents which ports the container listens on
`CMD`	Provides default command to run when container starts
`ENTRYPOINT`	Configures the container to run as an executable

Best practice: Keep your Dockerfile commands in logical order, typically following this pattern: base image, dependencies, application code, configuration, and finally the command to run.

Dockerfile Instructions in Detail

Let's explore each common instruction in more depth:

FROM

The FROM instruction initializes a new build stage and sets the base image. It's typically the first instruction in a Dockerfile.

FROM [--platform=<platform>] <image>[:<tag>] [AS <name>]

Examples:

FROM python:3.9-slim
FROM ubuntu:20.04
FROM node:14-alpine AS build-stage

Note: A Dockerfile must begin with a FROM instruction; only comments, parser directives, and ARG instructions may come before it. You can have multiple FROM instructions in a single Dockerfile for multi-stage builds (an advanced technique we'll cover later).

WORKDIR

The WORKDIR instruction sets the working directory for any subsequent RUN, CMD, ENTRYPOINT, COPY, and ADD instructions.

WORKDIR /path/to/directory

Example:

WORKDIR /app

Best practice: Always use absolute paths with WORKDIR. If the specified directory doesn't exist, Docker creates it. Prefer WORKDIR over RUN cd /path: every RUN starts in a fresh shell, so a cd in one instruction has no effect on the next.
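
As a minimal sketch of why WORKDIR is more reliable than cd, the following Dockerfile prints the working directory before and after setting it. The alpine:3.14 base image and the /work path are arbitrary choices for illustration:

```dockerfile
FROM alpine:3.14

# The cd only applies inside this one RUN; the next instruction starts from / again
RUN mkdir /work && cd /work
RUN pwd

# WORKDIR persists for every later instruction and for the running container
WORKDIR /work
RUN pwd
```

Building with docker build --progress=plain . shows the first pwd reporting / and the second reporting /work.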

COPY and ADD

These instructions copy files from the build context to the image.

COPY [--chown=<user>:<group>] <src> <dest>
ADD [--chown=<user>:<group>] <src> <dest>

Examples:

COPY requirements.txt .
COPY . /app/
ADD https://example.com/file.tar.gz /tmp/

COPY vs ADD: COPY simply copies files from the build context, while ADD can also fetch a URL and automatically unpacks local tar archives (archives downloaded from a URL are not unpacked). Generally, use COPY unless you specifically need one of ADD's features, because COPY's behavior is more explicit.
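
To illustrate the difference, here is a small sketch. The archive name vendor.tar.gz is hypothetical and is assumed to sit in the build context:

```dockerfile
FROM alpine:3.14
WORKDIR /opt

# COPY stores the archive unchanged as /opt/vendor.tar.gz
COPY vendor.tar.gz .

# ADD unpacks a local tar archive, so /opt/vendor/ receives its contents
ADD vendor.tar.gz /opt/vendor/
```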

RUN

The RUN instruction executes commands in a new layer and commits the results.

RUN <command>
RUN ["executable", "param1", "param2"]

Examples:

RUN apt-get update && apt-get install -y curl
RUN pip install --no-cache-dir -r requirements.txt
RUN ["bash", "-c", "echo $HOME"]

Best practice: Chain related commands in a single RUN instruction using && to reduce the number of layers. For package installations, include cleanup steps in the same RUN instruction to keep the image size down.

ENV

The ENV instruction sets environment variables that persist when a container is run.

ENV <key>=<value> ...

Examples:

ENV PYTHONUNBUFFERED=1
ENV NODE_ENV=production PORT=3000

Note: Environment variables set with ENV are available to containers created from the image, not just during the build process. They can be overridden at runtime with docker run -e.
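
The runtime override can be seen with a tiny image. This is only a sketch; the variable name GREETING and the image name env-demo are made up for the example:

```dockerfile
FROM alpine:3.14

# Default value baked into the image
ENV GREETING=hello

# The shell expands $GREETING when the container starts, not at build time
CMD ["sh", "-c", "echo $GREETING"]

# Build and run:
#   docker build -t env-demo .
#   docker run --rm env-demo                        prints: hello
#   docker run --rm -e GREETING=goodbye env-demo    prints: goodbye
```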

EXPOSE

The EXPOSE instruction informs Docker that the container listens on specific network ports at runtime.

EXPOSE <port> [<port>/<protocol>...]

Examples:

EXPOSE 8080
EXPOSE 80/tcp 443/tcp

Note: EXPOSE doesn't actually publish the port. It functions as documentation between the person who builds the image and the person who runs the container. To actually publish the port when running the container, use docker run -p.

CMD

The CMD instruction provides the default command, or the default arguments to ENTRYPOINT, for containers started from the image.

# Exec form (preferred)
CMD ["executable", "param1", "param2"]
# Shell form
CMD command param1 param2
# Default parameters for ENTRYPOINT
CMD ["param1", "param2"]

Examples:

CMD ["python", "app.py"]
CMD nginx -g "daemon off;"

Note: There can only be one CMD instruction in a Dockerfile. If multiple are specified, only the last one takes effect. The CMD can be overridden by specifying a command when running the container with docker run image command.

ENTRYPOINT

The ENTRYPOINT instruction configures the container to run as an executable.

ENTRYPOINT ["executable", "param1", "param2"]
ENTRYPOINT command param1 param2

Examples:

ENTRYPOINT ["python", "app.py"]
ENTRYPOINT ["docker-entrypoint.sh"]

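ENTRYPOINT vs CMD: ENTRYPOINT specifies the executable to run, while CMD provides default arguments. Anything passed after the image name in docker run replaces CMD and is appended to an exec-form ENTRYPOINT; replacing the ENTRYPOINT itself requires the --entrypoint flag. The two are often used together, with ENTRYPOINT providing the command and CMD providing its default arguments.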
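
A minimal sketch of the two instructions working together (the image name greet is arbitrary):

```dockerfile
FROM alpine:3.14

# Fixed part of the command line
ENTRYPOINT ["echo", "Hello,"]

# Default argument, replaced by anything passed after the image name
CMD ["Docker"]

# Build and run:
#   docker build -t greet .
#   docker run --rm greet          prints: Hello, Docker
#   docker run --rm greet world    prints: Hello, world
```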

Other Instructions

USER: Sets the user name or UID for subsequent instructions and for the container at runtime
VOLUME: Creates a mount point for external volumes
ARG: Defines build-time variables that can be passed during build
LABEL: Adds metadata to the image
HEALTHCHECK: Tells Docker how to test if the container is still working
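
The following sketch shows how ARG, LABEL, VOLUME, and USER might appear together. The argument name APP_VERSION, the account name appuser, and the /data path are illustrative assumptions, not requirements:

```dockerfile
FROM python:3.9-slim

# Build-time variable with a default; override with --build-arg APP_VERSION=...
ARG APP_VERSION=0.1.0

# Metadata that shows up in docker inspect
LABEL org.opencontainers.image.version="${APP_VERSION}"

# Create an unprivileged account and a data directory it owns
RUN useradd --create-home appuser && \
    mkdir /data && chown appuser:appuser /data

# Mark /data as a mount point for external volumes
VOLUME /data

# Later instructions and the container itself run as appuser
USER appuser

CMD ["python", "--version"]
```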

Creating Your First Dockerfile

Let's create a basic Dockerfile for a Python web application using Flask.

Step 1: Create the Application Files

First, create a new directory for your project and set up the application files:

mkdir flask_docker_demo
cd flask_docker_demo

Create a file named app.py with the following content:

from flask import Flask
app = Flask(__name__)

@app.route('/')
def hello():
    return "Hello, Docker!"

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)

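Create a file named requirements.txt with the following content. Werkzeug is pinned alongside Flask because Flask 2.0.1 only sets a lower bound on it, and an unpinned install can pull in a newer Werkzeug release that this Flask version fails to import:

flask==2.0.1
werkzeug==2.0.3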

Step 2: Create the Dockerfile

Now, create a file named Dockerfile (with no extension) in the same directory:

# Use Python 3.9 slim as the base image
FROM python:3.9-slim

# Set the working directory in the container
WORKDIR /app

# Copy the requirements file into the container
COPY requirements.txt .

# Install the dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy the rest of the application code
COPY . .

# Tell Docker that the container listens on port 5000
EXPOSE 5000

# Set environment variables
ENV FLASK_APP=app.py
ENV FLASK_ENV=production
ENV PYTHONUNBUFFERED=1

# Command to run when the container starts
CMD ["python", "app.py"]

Let's break down what each instruction does:

FROM python:3.9-slim: Start with a slim Python 3.9 image
WORKDIR /app: Set the working directory to /app
COPY requirements.txt .: Copy only the requirements file first
RUN pip install...: Install the Python dependencies
COPY . .: Copy the rest of the application code
EXPOSE 5000: Document that the container listens on port 5000
ENV...: Set environment variables for Flask
CMD ["python", "app.py"]: Run the application when the container starts

Step 3: Build the Docker Image

Now, let's build the Docker image from our Dockerfile:

docker build -t flask-demo .

The -t flag tags the image with a name, and the . specifies the build context (the current directory).

With the legacy builder, you should see output like the following, with each step corresponding to an instruction in the Dockerfile. BuildKit, the default builder in current Docker releases, reports the same steps in a different, more compact format:

Sending build context to Docker daemon  4.096kB
Step 1/10 : FROM python:3.9-slim
 ---> 8c705081f50d
Step 2/10 : WORKDIR /app
 ---> Running in 6ce8e46488fd
Removing intermediate container 6ce8e46488fd
 ---> 9d53d0d5b1fd
Step 3/10 : COPY requirements.txt .
 ---> 7b90bead6fed
...
Successfully built f8b2f81f83a
Successfully tagged flask-demo:latest

Step 4: Run the Container

Now we can run a container from our newly built image:

docker run -p 5000:5000 --name my-flask-app flask-demo

This command starts a container named my-flask-app from the flask-demo image and maps port 5000 from the container to port 5000 on the host.

You should be able to access your Flask application by opening http://localhost:5000 in your web browser, which should display "Hello, Docker!"

Analogy: The process we just went through is like creating a specialized kitchen (the Docker image) designed specifically for making a particular dish (our Flask application). We started with a basic kitchen (the base image), added the tools and ingredients we need (dependencies), and provided the recipe (application code). Now, whenever we want to make that dish, we can simply "turn on" our specialized kitchen (run a container) and it's ready to go - no setup required!

Dockerfile Best Practices

Layer Optimization

Every RUN, COPY, and ADD instruction adds a layer to the image. To optimize your images:

Combine related commands in a single RUN instruction to reduce layers
Clean up in the same layer where you create files (e.g., remove package manager caches); deleting them in a later layer does not shrink the image
Use multi-stage builds for complex applications (more on this later)

Before optimization:

RUN apt-get update
RUN apt-get install -y curl
RUN apt-get install -y nginx
RUN apt-get clean

After optimization:

RUN apt-get update && \
    apt-get install -y curl nginx && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

Leveraging Build Cache

Docker caches the results of each instruction to speed up subsequent builds. To make the most of caching:

Order instructions from least to most likely to change
Copy dependency files (like requirements.txt) separately before copying the rest of the code
Be aware that cache invalidation causes all subsequent instructions to be re-executed

Poor caching strategy:

COPY . /app
RUN pip install -r /app/requirements.txt

Better caching strategy:

COPY requirements.txt /app/
RUN pip install -r /app/requirements.txt
COPY . /app

This way, changing your application code doesn't trigger a reinstallation of all dependencies.

Security Best Practices

Use specific version tags for base images, not latest
Run containers as non-root users when possible
Minimize the number of installed packages to reduce the attack surface
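Don't store secrets in your Dockerfile, including in ENV or ARG values, which remain visible in the image metadata and history (supply them at runtime with environment variables or use a secret management tool)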
Scan your images for vulnerabilities before deployment

Running as a non-root user:

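# Create a non-root user (these adduser flags are for Debian-based images; Alpine uses adduser -D)
RUN adduser --disabled-password --gecos "" appuser

# Copy the application and give the new user ownership of the copied files
COPY --chown=appuser:appuser . /app

# Switch to the non-root user for later instructions and at runtime
USER appuser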
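
For the point about secrets, one option is a BuildKit secret mount, which exposes a file to a single RUN step without writing it into any layer. The sketch below assumes BuildKit is the active builder; the secret id api_token and the file api_token.txt are hypothetical names:

```dockerfile
# syntax=docker/dockerfile:1
FROM alpine:3.14

# The secret is mounted at /run/secrets/api_token for this step only.
# test -s fails the build if the file is missing or empty.
RUN --mount=type=secret,id=api_token \
    test -s /run/secrets/api_token && echo "token available for this step"

# Build with:
#   docker build --secret id=api_token,src=./api_token.txt -t secret-demo .
```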

General Best Practices

Use .dockerignore files to exclude unnecessary files from the build context
Keep images small by removing unnecessary files and using smaller base images
Document your image with comments and LABEL instructions
Test your Dockerfile in different environments before production use
Prefer COPY over ADD unless you need the additional features

Example .dockerignore file:

.git
.gitignore
.env
__pycache__
*.pyc
*.pyo
*.pyd
.Python
env/
venv/
node_modules/
npm-debug.log

Advanced Dockerfile Techniques

Multi-stage Builds

Multi-stage builds allow you to use multiple FROM statements in your Dockerfile. Each FROM instruction starts a new build stage, and you can selectively copy artifacts from one stage to another, leaving behind everything you don't need in the final image.

Example: Building a React frontend with Node.js, then serving it with NGINX:

# Build stage
FROM node:14 AS build
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
RUN npm run build

# Production stage
FROM nginx:alpine
COPY --from=build /app/build /usr/share/nginx/html
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]

This Dockerfile uses two stages: first, it builds the React application using Node.js, then it copies only the built files to an NGINX image for serving. The final image contains only NGINX and the built frontend files, not Node.js or the development dependencies.

ARG and Build-time Variables

The ARG instruction defines variables that can be passed at build time with the --build-arg flag.

Example: Configurable Python version:

# Define build arguments
ARG PYTHON_VERSION=3.9

# Use the argument in the FROM instruction
FROM python:${PYTHON_VERSION}-slim

# Later, you could build with a different version:
# docker build --build-arg PYTHON_VERSION=3.10 -t myapp .
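
One detail worth knowing: an ARG declared before FROM is only visible to FROM lines. The following sketch extends the example with a hypothetical echo step and redeclares the argument inside the stage so that later instructions can read it:

```dockerfile
ARG PYTHON_VERSION=3.9
FROM python:${PYTHON_VERSION}-slim

# Redeclare without a value to bring the pre-FROM default into this stage
ARG PYTHON_VERSION
RUN echo "Base image uses Python ${PYTHON_VERSION}"
```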

Using HEALTHCHECK

The HEALTHCHECK instruction tells Docker how to test that the container is still working properly.

Example: Healthcheck for a web server:

FROM nginx:alpine
COPY index.html /usr/share/nginx/html/
EXPOSE 80
HEALTHCHECK --interval=30s --timeout=3s \
  CMD wget --no-verbose --tries=1 --spider http://localhost/ || exit 1
CMD ["nginx", "-g", "daemon off;"]

This Dockerfile includes a health check that tries to fetch the root page every 30 seconds. If the check fails three times in a row (the default --retries value), Docker marks the container as unhealthy.
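
The same idea can be applied to the Flask image from earlier. This is a sketch under two assumptions: the app listens on port 5000 as in that example, and the slim base image has no curl or wget, so the check uses Python's standard library instead:

```dockerfile
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 5000

# Exit code 0 means healthy; an exception from urlopen makes python exit non-zero
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD ["python", "-c", "import urllib.request; urllib.request.urlopen('http://127.0.0.1:5000/', timeout=2)"]

CMD ["python", "app.py"]
```

Once the container is running, docker ps reports the health state in its STATUS column.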

Environment-specific Dockerfiles

For applications that need different configurations in development and production, you can create multiple Dockerfiles or use build arguments to switch configurations.

Example: Development vs. Production:

# Dockerfile.dev
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
ENV FLASK_ENV=development
CMD ["flask", "run", "--host=0.0.0.0"]

# Dockerfile.prod (assumes gunicorn is listed in requirements.txt)
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
ENV FLASK_ENV=production
RUN python -m compileall .
CMD ["gunicorn", "app:app", "-b", "0.0.0.0:5000"]

You would build these with:

docker build -f Dockerfile.dev -t myapp:dev .
docker build -f Dockerfile.prod -t myapp:prod .
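
The build-argument alternative can be sketched as a single Dockerfile. The argument name APP_ENV is hypothetical, and this sketch only switches FLASK_ENV; it does not swap the server command the way the two separate files do:

```dockerfile
FROM python:3.9-slim
WORKDIR /app

# Defaults to production; override with --build-arg APP_ENV=development
ARG APP_ENV=production
ENV FLASK_ENV=${APP_ENV}

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

CMD ["python", "app.py"]

# Build each variant from the same file:
#   docker build -t myapp:prod .
#   docker build --build-arg APP_ENV=development -t myapp:dev .
```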

Practical Examples for Different Use Cases

Python Web Application (Django)

# Use Python 3.9 as base image
FROM python:3.9-slim

# Set environment variables
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1

# Set work directory
WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy project
COPY . .

# Run migrations and start Django's development server (use a WSGI server such as gunicorn in production)
CMD ["sh", "-c", "python manage.py migrate && python manage.py runserver 0.0.0.0:8000"]

Node.js Application

# Use Node.js 16 as base image
FROM node:16-alpine

# Create app directory
WORKDIR /usr/src/app

# Install app dependencies
COPY package*.json ./
RUN npm install

# Bundle app source
COPY . .

# Expose port
EXPOSE 3000

# Start application
CMD ["node", "server.js"]

Go Application

# Build stage
FROM golang:1.17 AS build

WORKDIR /app

COPY go.mod go.sum ./
RUN go mod download

COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o app .

# Final stage
FROM alpine:3.14

WORKDIR /root/

COPY --from=build /app/app .

EXPOSE 8080

CMD ["./app"]

Java Spring Boot Application

# Build stage
FROM maven:3.8.3-openjdk-17 AS build
WORKDIR /app
COPY pom.xml .
RUN mvn dependency:go-offline
COPY src/ /app/src/
RUN mvn package -DskipTests

# Final stage
FROM openjdk:17-jdk-slim
WORKDIR /app
COPY --from=build /app/target/*.jar app.jar
EXPOSE 8080
ENTRYPOINT ["java", "-jar", "app.jar"]

Common Issues and Troubleshooting

Build Context Errors

If Docker complains about files not being found during COPY or ADD operations, check:

The path is correct relative to the build context
The file exists and has the right permissions
The file isn't excluded by a .dockerignore rule

Layer Caching Issues

If changes to your code aren't being reflected in builds:

Use docker build --no-cache to force a complete rebuild
Check that you're copying files in the right order for optimal caching
Ensure you're not copying generated files that might be out of date

Permission Problems

If your container has permission issues:

Check that you're running as the right user (USER instruction)
Ensure file permissions are set correctly when copying files
Use --chown with COPY to set ownership

Container Won't Start

If your container exits immediately after starting:

Check that your CMD or ENTRYPOINT is correct
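Try running with docker run -it --rm image bash to get a shell inside (use sh for Alpine-based images, which do not include bash; if the image defines an ENTRYPOINT, use docker run -it --rm --entrypoint sh image)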
Check logs with docker logs container_id
Ensure your application handles signals properly (especially SIGTERM)
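
On the last point, the form of CMD affects whether the signal reaches your process at all. A sketch, assuming the app.py and requirements.txt from the Flask example:

```dockerfile
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

# Shell form (shown commented out): Docker runs /bin/sh -c "python app.py",
# so the shell is PID 1 and may not forward SIGTERM from docker stop.
# CMD python app.py

# Exec form: python itself is PID 1, so SIGTERM is delivered to it.
# The application still needs a handler to act on the signal.
CMD ["python", "app.py"]
```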

Debugging Dockerfiles

To troubleshoot Dockerfile issues:

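Use intermediate images to debug. A failed build produces no tagged image, but the legacy builder (deprecated, though still selectable with DOCKER_BUILDKIT=0 in many installations) prints an image ID after each successful step:

# Build with the legacy builder so each step prints an image ID (---> <id>)
DOCKER_BUILDKIT=0 docker build -t debug-image .
# Start a shell in the last image ID printed before the failure
docker run -it --rm <image_id> sh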

Add RUN ls -la commands to check the state at various points
Use docker history image_name to see layer sizes and commands

Building and Publishing Your Image

Building with Tags

To build an image with a specific tag:

docker build -t username/repository:tag .

For example:

docker build -t johndoe/flask-app:1.0 .

Publishing to Docker Hub

To share your image on Docker Hub:

Log in to Docker Hub:
```
docker login
```
Push your image:
```
docker push username/repository:tag
```

Building for Multiple Architectures

To build for multiple CPU architectures (like amd64 and arm64):

docker buildx create --name mybuilder --use
docker buildx build --platform linux/amd64,linux/arm64 -t username/repository:tag --push .

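Note: buildx is a Docker CLI plugin that builds with BuildKit and supports multiple architectures. It is included with Docker Desktop and available as a plugin package for Docker Engine; only older Docker versions treated it as experimental and required enabling experimental features.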

Dockerfile Exercises

Exercise 1: Basic Static Website

Create a Dockerfile for a simple static website served by NGINX:

Create an index.html file with some content
Write a Dockerfile that:
- Uses NGINX as the base image
- Copies your HTML file to the right location
- Exposes port 80
Build and run the image
Access the website at http://localhost:80

Solution:

# Dockerfile
FROM nginx:alpine
COPY index.html /usr/share/nginx/html/
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]

# Build and run
docker build -t static-site .
docker run -p 80:80 static-site

Exercise 2: Python Data Science Environment

Create a Dockerfile for a Python data science environment:

Write a Dockerfile that:
- Uses Python 3.9 as the base image
- Installs common data science packages (numpy, pandas, matplotlib)
- Sets up a working directory
- Starts a Python shell by default
Build and run the image with a volume mount for your data

Solution:

# Dockerfile
FROM python:3.9-slim

RUN pip install --no-cache-dir numpy pandas matplotlib

WORKDIR /data

CMD ["python"]

# Build and run
docker build -t datascience .
docker run -it -v "$(pwd)":/data datascience

Exercise 3: Multi-stage Frontend Build

Create a multi-stage Dockerfile for a React application:

Write a Dockerfile that:
- Uses Node.js to build the React application in the first stage
- Uses NGINX to serve the built files in the second stage
- Results in a small final image

Solution:

# Dockerfile
# Build stage
FROM node:14 AS build

WORKDIR /app

# Install dependencies
COPY package*.json ./
RUN npm install

# Copy and build app
COPY . .
RUN npm run build

# Production stage
FROM nginx:alpine

# Copy built files from build stage
COPY --from=build /app/build /usr/share/nginx/html

# Expose port
EXPOSE 80

# Start NGINX
CMD ["nginx", "-g", "daemon off;"]

Key Takeaways

A Dockerfile is a text file containing instructions for building a Docker image
Common instructions include FROM, RUN, COPY, WORKDIR, ENV, EXPOSE, and CMD
Each instruction is a cacheable build step, and RUN, COPY, and ADD add layers to the image, which affects caching and image size
Optimizing Dockerfiles involves minimizing layers, leveraging caching, and reducing image size
Multi-stage builds allow you to create smaller, more efficient images
Best practices include running as non-root, using specific tags, and not storing secrets in images
Different applications may require different Dockerfile strategies
Dockerfiles can be customized for different environments and use cases

With these concepts and techniques, you're now equipped to create custom Docker images tailored to your specific application needs!

Looking Ahead

In our next session, we'll dive deeper into building and running your own custom images. We'll apply what we've learned about Dockerfiles to create more complex applications and explore advanced techniques for optimizing and deploying them.

Discussion Questions

How might you adapt the Flask Dockerfile we created to better suit a development environment? What about a production environment?
What are the security implications of running containers as the root user? How would you modify a Dockerfile to run as a non-root user?
How could multi-stage builds improve your application deployment workflow?
What strategies would you use to minimize the size of your Docker images without sacrificing functionality?
How would you adapt Dockerfile strategies for different types of applications (e.g., frontend vs. backend, static vs. dynamic)?

Additional Resources

Dockerfile Reference - Official documentation for Dockerfile instructions
Dockerfile Best Practices - Docker's official best practice guide
Docker Java Labs - Examples for Java applications
Python Speed Docker Guide - Specialized guide for Python in Docker
Docker Build Command Reference - Detailed information on build options