Python in Docker Containers

Week 2: Python Fundamentals - Docker Containerization Deep Dive

Session Overview

Welcome to our deep dive into Python and Docker! Today, we'll explore how containerization revolutionizes Python development and deployment. We'll learn why Docker has become essential in modern Python workflows, how to set up Python environments in containers, and best practices for Python-based Docker applications.

Understanding Docker and Containerization

Before diving into Python-specific aspects, let's ensure we have a solid understanding of what Docker provides:

What is Docker?

Docker is a platform that packages applications and their dependencies into standardized units called containers. These containers are isolated, lightweight, and contain everything needed to run an application, including code, runtime, system tools, and libraries.

Container vs. Virtual Machine

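Containers are often compared to virtual machines, but they operate differently. A virtual machine runs a complete guest operating system, with its own kernel, on top of a hypervisor. A container shares the host's kernel and isolates only the application's processes, filesystem, and network.

This makes containers significantly lighter and faster to start than VMs, while still isolating applications from one another (though less strictly than a VM does).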
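
One way to see the shared kernel in practice is to print the kernel release from Python. The sketch below uses only the standard library; the function name kernel_info is just illustrative. Run on a Linux host and then inside a container on that host, both runs are expected to report the same kernel release. On Docker Desktop for macOS or Windows, the container reports the kernel of Docker's helper VM instead.

```python
import platform


def kernel_info() -> dict:
    """Return the operating system name and kernel release seen by this process."""
    return {
        "system": platform.system(),    # e.g. "Linux"
        "release": platform.release(),  # kernel release string
    }


if __name__ == "__main__":
    info = kernel_info()
    print(f"{info['system']} kernel {info['release']}")
```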

Analogy: Apartments vs. Houses

Think of the difference between containers and VMs like apartments versus houses:

  • Containers (Apartments) share core infrastructure (foundation, plumbing, electrical systems) but have their own private living spaces
  • VMs (Houses) have completely independent infrastructure, making them larger and more resource-intensive

Containers are more efficient when you need many isolated environments that can share core resources.

Why Use Docker for Python Development?

Docker solves several persistent challenges in Python development:

The "Works on My Machine" Problem

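One of the most common issues in software development is code that runs perfectly on one machine but fails on another. This usually happens because the machines differ in:

  • Python interpreter version
  • Installed package versions
  • System libraries and operating system
  • Environment variables and configuration

Docker containers package the entire runtime environment, ensuring consistent behavior across development, testing, and production.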
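
To make such differences visible, it can help to print a small "fingerprint" of an environment and compare it between machines (or between host and container). This is a minimal sketch using only the standard library; the function name and the package names passed to it are illustrative choices, not part of any tool.

```python
import platform
import sys
from importlib import metadata


def environment_fingerprint(packages):
    """Collect the interpreter, platform, and package versions for comparison."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "executable": sys.executable,
        "packages": versions,
    }


if __name__ == "__main__":
    import json

    print(json.dumps(environment_fingerprint(["numpy", "flask"]), indent=2))
```

Two machines that print different fingerprints are candidates for a "works on my machine" bug; two containers started from the same image should print the same one.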

Python-Specific Benefits

Real-World Example: Data Science Workflows

Data scientists often face the "dependency nightmare" when collaborating on models. One team member might use NumPy 1.18 with Python 3.7, while another uses NumPy 1.20 with Python 3.9, leading to subtle bugs. With Docker, they can specify:

FROM python:3.9-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

CMD ["python", "model_training.py"]

Provided requirements.txt pins exact versions, everyone builds and runs the same environment, which eliminates these inconsistencies.

Analogy: Docker as Recipe Boxes

Think of Docker containers like recipe boxes that include not just the recipe (your code) but also all the ingredients (dependencies), cooking tools (runtime), and even the cooking environment (system libraries):

  • Without Docker: "Make this cake" but everyone has different flour, ovens at different temperatures, and varying measuring cups
  • With Docker: "Here's a complete box with the exact ingredients, tools, and instructions" - ensuring the same cake every time

Getting Started with Python in Docker

Official Python Docker Images

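Official Python images are available on Docker Hub. The variants used in this session are:

  • python:3.10 (full): Debian-based and includes common build tools; the largest variant
  • python:3.10-slim: Debian-based with only the packages needed to run Python
  • python:3.10-alpine: based on Alpine Linux; the smallest variant, but it uses musl libc instead of glibc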

Running Python in a Docker Container

Let's start with the simplest possible example - running a Python interpreter in a container:

# Pull the Python image (if not already present)
docker pull python:3.10

# Run an interactive Python shell
docker run -it python:3.10

# You should now see the Python REPL
>>> print("Hello from containerized Python!")
Hello from containerized Python!
>>> exit()

Running a Python Script in a Container

Create a file named hello_docker.py with the following content:

import platform
import sys

print("Hello from Python in Docker!")
print(f"Python version: {sys.version}")
print(f"Platform: {platform.platform()}")

Now run this script in a container:

# Assuming hello_docker.py is in your current directory
docker run -v "$(pwd):/app" -w /app python:3.10 python hello_docker.py

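This command:

  • Mounts your current directory into the container at /app (-v "$(pwd):/app")
  • Sets /app as the working directory inside the container (-w /app)
  • Uses the python:3.10 image
  • Runs python hello_docker.py inside the container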

Creating a Python Dockerfile

While running commands directly is useful for simple cases, most projects need a custom Docker image. This is done by creating a Dockerfile.

Basic Python Dockerfile

Create a file named Dockerfile (no extension) in your project directory:

# Use an official Python runtime as a parent image
FROM python:3.10-slim

# Set the working directory in the container
WORKDIR /app

# Copy the current directory contents (including requirements.txt) into the container at /app
COPY . /app/

# Install any needed packages specified in requirements.txt
RUN pip install --no-cache-dir -r requirements.txt

# Document that the app listens on port 5000 (publish it with -p at run time)
EXPOSE 5000

# Define environment variable
ENV NAME=World

# Run app.py when the container launches
CMD ["python", "app.py"]

Create a simple app.py:

import os
from flask import Flask

app = Flask(__name__)

@app.route('/')
def hello():
    name = os.environ.get('NAME', 'World')
    return f'Hello, {name}!'

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)

And a requirements.txt file:

flask==2.0.1

Building and Running the Docker Image

# Build the image
docker build -t my-python-app .

# Run the container
docker run -p 5000:5000 my-python-app

Your Flask application should now be running at http://localhost:5000

Best Practices for Python Dockerfiles

  1. Use Specific Versions: Always specify exact versions in requirements.txt (e.g., flask==2.0.1 not just flask)
  2. Layer Caching: Copy and install requirements before copying the rest of the code to leverage Docker's caching mechanism
  3. Non-Root User: For production, run as a non-root user for security
  4. Multi-Stage Builds: Use for compiling extensions or reducing image size
  5. Environment Variables: Use ENV for configuration that can change

Here's an improved version of our Dockerfile with these practices:

FROM python:3.10-slim

# Set environment variables
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
ENV PIP_NO_CACHE_DIR=1

# Create a non-root user
RUN useradd -m appuser

# Set the working directory
WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install -r requirements.txt

# Copy the application
COPY . .

# Change ownership to non-root user
RUN chown -R appuser:appuser /app

# Switch to non-root user
USER appuser

# Expose port
EXPOSE 5000

# Run the application
CMD ["python", "app.py"]
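
The first practice, pinning exact versions, is easy to check automatically. The following is a minimal sketch of such a check, not a full requirements parser: it treats any line without name==version as unpinned and skips pip options such as -r or -e. The function name and regular expression are illustrative.

```python
import re

# Matches "name==version", optionally with extras such as "name[extra]==version".
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[A-Za-z0-9._,\s-]+\])?\s*==\s*[^\s;]+")


def unpinned_requirements(text):
    """Return requirement lines that do not pin an exact version with '=='."""
    problems = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not PINNED.match(line):
            problems.append(line)
    return problems


if __name__ == "__main__":
    sample = "flask==2.0.1\nrequests>=2.0\nnumpy\n"
    print(unpinned_requirements(sample))  # prints ['requests>=2.0', 'numpy']
```

A check like this could run in CI before docker build, so that an unpinned dependency is caught before it produces a different image on the next rebuild.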

Python with Docker Compose

For applications with multiple services (e.g., web server, database, cache), Docker Compose simplifies management.

Creating a Docker Compose File

Create a file named docker-compose.yml:

version: '3'

services:
  web:
    build: .
    ports:
      - "5000:5000"
    volumes:
      - .:/app
    environment:
      - DATABASE_URL=postgresql://postgres:postgres@db:5432/postgres
    depends_on:
      - db
  
  db:
    image: postgres:13
    volumes:
      - postgres_data:/var/lib/postgresql/data
    environment:
      - POSTGRES_PASSWORD=postgres
      - POSTGRES_USER=postgres
      - POSTGRES_DB=postgres
    ports:
      - "5432:5432"

volumes:
  postgres_data:
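
Inside the web container, the application reads DATABASE_URL from its environment. Note that the host part is db, the Compose service name, which Docker's internal DNS resolves to the database container. Here is a small sketch of how the URL could be taken apart with the standard library; the function name is illustrative, and a real project would usually hand the URL straight to its database driver or ORM.

```python
import os
from urllib.parse import urlsplit


def parse_database_url(url):
    """Split a database URL into the pieces a driver typically needs."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,  # "db" is the Compose service name
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


if __name__ == "__main__":
    default = "postgresql://postgres:postgres@db:5432/postgres"
    settings = parse_database_url(os.environ.get("DATABASE_URL", default))
    print(settings["host"], settings["port"], settings["database"])
```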

Using Docker Compose

# Start all services
docker-compose up

# Start in detached mode
docker-compose up -d

# Stop all services
docker-compose down

# Rebuild images and start
docker-compose up --build

Real-World Example: Python Web Application Stack

A production-ready Python web application might include:

version: '3'

services:
  nginx:
    image: nginx:latest
    ports:
      - "80:80"
    volumes:
      - ./nginx/conf.d:/etc/nginx/conf.d
      - static_volume:/app/static
      - media_volume:/app/media
    depends_on:
      - web

  web:
    build: .
    command: gunicorn myapp.wsgi:application --bind 0.0.0.0:8000
    volumes:
      - .:/app
      - static_volume:/app/static
      - media_volume:/app/media
    environment:
      - DATABASE_URL=postgresql://postgres:postgres@db:5432/postgres
      - REDIS_URL=redis://redis:6379/0
    depends_on:
      - db
      - redis

  db:
    image: postgres:13
    volumes:
      - postgres_data:/var/lib/postgresql/data
    environment:
      - POSTGRES_PASSWORD=postgres
      - POSTGRES_USER=postgres
      - POSTGRES_DB=postgres

  redis:
    image: redis:6

volumes:
  postgres_data:
  static_volume:
  media_volume:

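This setup includes:

  • nginx: a reverse proxy on port 80 that can serve static and media files from the shared volumes
  • web: the Python application, served by Gunicorn on port 8000
  • db: PostgreSQL, with a named volume for persistent data
  • redis: Redis, for example as a cache or message broker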

Analogy: Docker Compose as an Orchestra Conductor

If individual Docker containers are like musicians, Docker Compose is like an orchestra conductor:

  • Each container (musician) knows how to play its own part
  • Docker Compose (conductor) coordinates when each starts, stops, and how they work together
  • The conductor ensures everyone is playing in the right order and at the right time
  • The score (docker-compose.yml) defines exactly how everything should work together

Just as a conductor makes it easier to manage a complex orchestra, Docker Compose makes it easier to manage complex multi-container applications.

Development Workflow with Python and Docker

Hot Reloading for Development

One challenge when developing with Docker is seeing code changes reflected immediately. Here's how to set up hot reloading:

version: '3'

services:
  web:
    build: .
    command: python -m flask run --host=0.0.0.0 --port=5000
    volumes:
      - .:/app
    ports:
      - "5000:5000"
    environment:
      - FLASK_ENV=development
      - FLASK_APP=app.py

With this setup:

  1. The local directory is mounted into the container
  2. Flask's development server automatically reloads when files change
  3. Changes made on your host machine are immediately reflected

Debugging Python in Docker

For debugging, you can:

  1. Use simple print statements
  2. Mount debugger configurations
  3. Use remote debugging

For remote debugging with debugpy:

# In your code
import debugpy

# Enable debugger attachment
debugpy.listen(("0.0.0.0", 5678))
print("Waiting for debugger to attach...")
debugpy.wait_for_client()

And in your docker-compose.yml:

services:
  web:
    # ...
    ports:
      - "5000:5000"
      - "5678:5678"  # Debugger port

Testing in Docker

Creating a separate service for testing ensures isolation:

services:
  web:
    # ... your web service config

  test:
    build: .
    command: pytest
    volumes:
      - .:/app
    environment:
      - DATABASE_URL=postgresql://postgres:postgres@db:5432/test_db
    depends_on:
      - db

Run tests with:

docker-compose run test

Advanced Python Docker Patterns

Multi-Stage Builds for Python Applications

Multi-stage builds can significantly reduce image size, especially for applications with build dependencies:

# Build stage
FROM python:3.10 AS builder

WORKDIR /app

COPY requirements.txt .
# Build wheels for the requirements and their dependencies
RUN pip wheel --no-cache-dir --wheel-dir /app/wheels -r requirements.txt

# Final stage
FROM python:3.10-slim

WORKDIR /app

# Copy built wheels from builder stage
COPY --from=builder /app/wheels /wheels
COPY --from=builder /app/requirements.txt .

# Install packages from wheels
RUN pip install --no-cache-dir /wheels/*

COPY . .

CMD ["python", "app.py"]

Python Applications with C Extensions

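Many Python packages (NumPy, Pandas, etc.) include C extensions. pip usually installs prebuilt wheels for them, but when no wheel matches your platform or Python version, it compiles from source and needs build tools in the image: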

FROM python:3.10

# Install build dependencies
RUN apt-get update && apt-get install -y \
    build-essential \
    python3-dev \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .
CMD ["python", "app.py"]

Using Alpine-based Images

Alpine-based images are much smaller but require special handling:

FROM python:3.10-alpine

# Install build dependencies
RUN apk add --no-cache \
    gcc \
    musl-dev \
    python3-dev

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .
CMD ["python", "app.py"]

Be aware that Alpine uses musl libc instead of glibc, which can cause subtle compatibility issues with some Python packages.
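
When a package misbehaves on one image but not another, it can be useful to check which C library the interpreter reports. The sketch below relies on platform.libc_ver() from the standard library. That function mainly recognises glibc, so an empty result is treated here as "unknown, possibly musl"; this is a heuristic, not a reliable musl detector.

```python
import platform


def libc_description():
    """Describe the C library the interpreter reports being linked against."""
    name, version = platform.libc_ver()
    if name:
        return f"{name} {version}".strip()
    # An empty result is common on musl-based systems such as Alpine,
    # and also on non-Linux platforms.
    return "unknown (possibly musl)"


if __name__ == "__main__":
    print(f"C library: {libc_description()}")
```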

Production Optimizations

FROM python:3.10-slim

ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1 \
    PIP_NO_CACHE_DIR=1

# Create user
RUN useradd -m appuser

WORKDIR /app

COPY requirements.txt .
RUN pip install -r requirements.txt

COPY . .
RUN chown -R appuser:appuser /app

USER appuser

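# Add health check (python:3.10-slim does not include curl, so use Python itself;
# the application must expose a /health route for this check to pass)
HEALTHCHECK --interval=30s --timeout=30s --start-period=5s --retries=3 \
  CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:5000/health', timeout=5)" || exit 1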

EXPOSE 5000
CMD ["python", "app.py"]
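
The health check in this Dockerfile probes http://localhost:5000/health. The same probe can be written as a small Python script, which avoids depending on extra tools inside the image. This is a minimal sketch using only the standard library; the function names are illustrative, and the /health route itself still has to be added to the application.

```python
import sys
import urllib.error
import urllib.request


def is_healthy(url, timeout=3.0):
    """Return True when the URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # Covers refused connections, timeouts, HTTP errors, and malformed URLs.
        return False


def main(argv):
    """Return a process exit code: 0 when healthy, 1 otherwise."""
    url = argv[1] if len(argv) > 1 else "http://localhost:5000/health"
    return 0 if is_healthy(url) else 1
```

Saved as healthcheck.py (a hypothetical filename) with a final line sys.exit(main(sys.argv)), the script could back a HEALTHCHECK instruction of the form CMD python healthcheck.py.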

Real-World Python Docker Applications

Data Science and Machine Learning

Data science workflows benefit greatly from containerization:

FROM python:3.10

WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Install Jupyter
RUN pip install jupyterlab

# Copy project files
COPY . .

# Expose Jupyter port
EXPOSE 8888

# Start Jupyter
CMD ["jupyter", "lab", "--ip=0.0.0.0", "--port=8888", "--no-browser", "--allow-root", "--NotebookApp.token=''"]

With Docker Compose, you can create a complete data science environment:

version: '3'

services:
  jupyter:
    build: .
    ports:
      - "8888:8888"
    volumes:
      - .:/app
      - ./data:/app/data
  
  postgres:
    image: postgres:13
    environment:
      - POSTGRES_PASSWORD=postgres
    volumes:
      - postgres_data:/var/lib/postgresql/data
  
  mlflow:
    image: ghcr.io/mlflow/mlflow:v2.1.1
    ports:
      - "5000:5000"
    volumes:
      - ./mlruns:/mlruns
    command: mlflow server --host 0.0.0.0

volumes:
  postgres_data:

Django Web Applications

Django projects often include multiple services:

version: '3'

services:
  web:
    build: .
    command: gunicorn myproject.wsgi:application --bind 0.0.0.0:8000
    volumes:
      - .:/app
      - static_volume:/app/static
      - media_volume:/app/media
    ports:
      - "8000:8000"
    environment:
      - DATABASE_URL=postgresql://postgres:postgres@db:5432/postgres
      - CELERY_BROKER_URL=redis://redis:6379/0
    depends_on:
      - db
      - redis
  
  db:
    image: postgres:13
    volumes:
      - postgres_data:/var/lib/postgresql/data
    environment:
      - POSTGRES_PASSWORD=postgres
  
  redis:
    image: redis:6
  
  celery:
    build: .
    command: celery -A myproject worker -l info
    volumes:
      - .:/app
    environment:
      - DATABASE_URL=postgresql://postgres:postgres@db:5432/postgres
      - CELERY_BROKER_URL=redis://redis:6379/0
    depends_on:
      - web
      - db
      - redis

volumes:
  postgres_data:
  static_volume:
  media_volume:

Microservices with Python

In a microservices architecture, each service can be its own Python container:

version: '3'

services:
  auth_service:
    build: ./auth_service
    ports:
      - "5000:5000"
    environment:
      - DATABASE_URL=postgresql://postgres:postgres@db:5432/auth_db
  
  product_service:
    build: ./product_service
    ports:
      - "5001:5000"
    environment:
      - DATABASE_URL=postgresql://postgres:postgres@db:5432/product_db
      - AUTH_SERVICE_URL=http://auth_service:5000
  
  order_service:
    build: ./order_service
    ports:
      - "5002:5000"
    environment:
      - DATABASE_URL=postgresql://postgres:postgres@db:5432/order_db
      - PRODUCT_SERVICE_URL=http://product_service:5000
      - AUTH_SERVICE_URL=http://auth_service:5000
  
  db:
    image: postgres:13
    volumes:
      - postgres_data:/var/lib/postgresql/data
    environment:
      - POSTGRES_PASSWORD=postgres

volumes:
  postgres_data:
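
Within each service, the URLs of the other services arrive as environment variables such as AUTH_SERVICE_URL. A service might build request URLs from them roughly as follows. This is a sketch: the helper name is illustrative, and the /verify endpoint is hypothetical, used here only to show the joining.

```python
import os
from urllib.parse import urljoin


def service_url(env_var, path, environ=None):
    """Build a URL for another service from an environment variable."""
    environ = os.environ if environ is None else environ
    base = environ.get(env_var)
    if not base:
        raise RuntimeError(f"{env_var} is not set")
    # Normalise slashes so the path is appended to the base URL.
    return urljoin(base.rstrip("/") + "/", path.lstrip("/"))


if __name__ == "__main__":
    demo_env = {"AUTH_SERVICE_URL": "http://auth_service:5000"}
    # "/verify" is a hypothetical endpoint used only for illustration.
    print(service_url("AUTH_SERVICE_URL", "/verify", demo_env))
    # prints http://auth_service:5000/verify
```

Failing fast when the variable is missing tends to surface a misconfigured Compose file at start-up rather than on the first request.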

Analogy: Microservices as Specialized Shops

Think of a microservices architecture like a shopping center with specialized stores instead of a single department store:

  • Each shop (microservice) specializes in one thing and does it well
  • Shops can be updated or replaced individually without affecting others
  • Shops communicate with each other when necessary (e.g., the tailor might send customers to the shoe store)
  • The shopping center can add new stores or remove underperforming ones easily

Docker makes it easy to manage all these specialized services, ensuring they can work together while remaining independently maintainable.

Verifying Python in Docker

Core Verification Steps

When setting up Python in Docker, verify the following:

  1. Python Version: Ensure the container uses the expected Python version
  2. Package Installation: Verify dependencies are correctly installed
  3. Environment Variables: Check that environment variables are properly set
  4. Filesystem Access: Confirm volumes are mounted correctly
  5. Network Connectivity: Test that services can communicate

Verification Script

Create a file named verify_environment.py:

#!/usr/bin/env python3
import sys
import os
import platform
import subprocess
import importlib.util

def verify_python_version():
    print(f"Python version: {sys.version}")
    print(f"Python executable: {sys.executable}")
    print(f"Platform: {platform.platform()}")

def verify_packages(required_packages):
    print("\nPackage verification:")
    for package in required_packages:
        try:
            spec = importlib.util.find_spec(package)
            if spec is None:
                print(f"❌ {package} is NOT installed")
            else:
                module = importlib.import_module(package)
                version = getattr(module, '__version__', 'unknown')
                print(f"✅ {package} is installed (version: {version})")
        except ImportError:
            print(f"❌ {package} is NOT installed")

def verify_environment_variables(required_vars):
    print("\nEnvironment variables:")
    for var in required_vars:
        value = os.environ.get(var)
        if value:
            print(f"✅ {var} is set to: {value}")
        else:
            print(f"❌ {var} is NOT set")

def verify_filesystem_access(paths):
    print("\nFilesystem access:")
    for path in paths:
        if os.path.exists(path):
            print(f"✅ {path} exists and is accessible")
            if os.path.isdir(path):
                try:
                    test_file = os.path.join(path, 'test_write.txt')
                    with open(test_file, 'w') as f:
                        f.write('test')
                    os.remove(test_file)
                    print(f"✅ {path} is writable")
                except Exception as e:
                    print(f"❌ {path} is NOT writable: {e}")
        else:
            print(f"❌ {path} does NOT exist or is NOT accessible")

def verify_network_connectivity(endpoints):
    print("\nNetwork connectivity:")
    for endpoint in endpoints:
        try:
            result = subprocess.run(['curl', '-s', '-o', '/dev/null', '-w', '%{http_code}', endpoint], 
                                   capture_output=True, text=True, timeout=5)
            status = result.stdout.strip()
            if status.startswith('2') or status.startswith('3'):
                print(f"✅ {endpoint} is reachable (status: {status})")
            else:
                print(f"❌ {endpoint} returned status: {status}")
        except (subprocess.SubprocessError, OSError) as e:
            # OSError also covers a missing curl binary (slim images do not ship it)
            print(f"❌ {endpoint} is NOT reachable: {e}")

if __name__ == "__main__":
    verify_python_version()
    
    # Customize these lists for your application
    verify_packages(['flask', 'requests', 'sqlalchemy', 'numpy'])
    verify_environment_variables(['DATABASE_URL', 'FLASK_ENV'])
    verify_filesystem_access(['/app', '/app/data', '/tmp'])
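    # Only HTTP endpoints belong here: db:5432 speaks the PostgreSQL protocol,
    # so curl cannot report a meaningful HTTP status for it
    verify_network_connectivity(['http://localhost:5000', 'https://pypi.org'])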

Run this script in your container:

docker run -it --rm my-python-app python verify_environment.py

Using Docker Compose for Verification

Add a verification service to your docker-compose.yml:

services:
  # ... your existing services
  
  verify:
    build: .
    command: python verify_environment.py
    depends_on:
      - web
      - db
    environment:
      - DATABASE_URL=postgresql://postgres:postgres@db:5432/postgres
      - FLASK_ENV=development

Run it with:

docker-compose run verify
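
As written, the verification script prints ✅ or ❌ but always exits with status 0, so docker-compose run verify reports success even when a check fails. One way to make failures visible to CI is to collect the results and turn them into an exit code. The sketch below assumes each check is reduced to a boolean; the function name and the example checks are illustrative.

```python
def exit_code(results):
    """Return 0 when every named check passed, otherwise 1."""
    failed = [name for name, passed in results.items() if not passed]
    for name in failed:
        print(f"FAILED: {name}")
    return 1 if failed else 0


if __name__ == "__main__":
    import os

    # Hypothetical results; in practice each verify_* function would return a bool.
    results = {
        "DATABASE_URL set": bool(os.environ.get("DATABASE_URL")),
        "/tmp writable": os.access("/tmp", os.W_OK),
    }
    print("exit code:", exit_code(results))
```

In the real script, the value would be passed to sys.exit() so that the container's exit status reflects the outcome.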

Troubleshooting Python in Docker

Common Issues and Solutions

Issue: Module Not Found Error
  Possible causes:
  • Package not in requirements.txt
  • Build failed silently
  Solutions:
  • Update requirements.txt
  • Rebuild with --no-cache

Issue: Permission Denied
  Possible causes:
  • Volume mount permissions
  • Running as non-root user
  Solutions:
  • Fix host permissions
  • Use chown in Dockerfile

Issue: Connection Refused
  Possible causes:
  • Service not ready
  • Incorrect port mapping
  • Binding to 127.0.0.1 instead of 0.0.0.0
  Solutions:
  • Use wait-for scripts
  • Check port mappings
  • Bind to 0.0.0.0

Issue: Memory Errors
  Possible causes:
  • Container memory limits
  • Large data processing
  Solutions:
  • Increase container memory
  • Optimize code for memory usage
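
The "wait-for" idea mentioned under Connection Refused can be sketched in a few lines of Python: keep retrying a TCP connection until the port accepts it or a deadline passes. This is a minimal illustration with an assumed function name. An open port only shows that something is listening; it does not guarantee the service (for example PostgreSQL) is ready to answer queries.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Retry a TCP connection until it succeeds or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Covers refused connections, timeouts, and DNS failures.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For the Compose examples in this session, a call such as wait_for_port("db", 5432) at application start-up would pause until the database container accepts connections, and return False if it never does.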

Debugging Commands

# View container logs
docker logs container_name

# Enter a running container
docker exec -it container_name bash

# Inspect a container
docker inspect container_name

# Check resource usage
docker stats

# View networks
docker network ls

Interactive Debugging Session

To debug a failing container:

# Start the container with a different command
docker run -it --entrypoint=bash my-python-app

# Or for a failed container, commit its state to a new image and debug
docker commit failed_container debug_image
docker run -it --entrypoint=bash debug_image

Wrapping Up and Next Steps

Today we've covered the fundamentals of using Python in Docker containers, from basic setup to production-ready configurations. Docker containers have revolutionized Python development by providing consistent, isolated environments that solve many traditional deployment challenges.

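Key Takeaways

  • Containers package code, runtime, and dependencies so an application behaves the same in development, testing, and production
  • Pin dependency versions, and copy requirements.txt before the application code to benefit from layer caching
  • Docker Compose coordinates multi-container stacks such as a web app with PostgreSQL and Redis
  • Slim and multi-stage images reduce image size; Alpine images are smaller still but can cause musl compatibility issues
  • For production, run as a non-root user and add a health check
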
Practice Exercises

  1. Create a Dockerfile for a simple Flask application
  2. Set up a development environment with code hot-reloading
  3. Build a multi-container application with Python, PostgreSQL, and Redis
  4. Implement the verification script to test your Docker environment
  5. Optimize your Docker image size using multi-stage builds

In our next session, we'll build on these containerization concepts to explore how to effectively manage Python dependencies in Docker and implement best practices for production deployments.