Docker Hub and Public Images

Week 1, Wednesday - Afternoon Session

Lecture Overview

In this session, we'll explore Docker Hub — Docker's official registry for container images — and learn how to find, use, and work with public images. Rather than building every image from scratch, we can leverage the vast ecosystem of pre-built images to accelerate our development. By the end of this session, you'll understand how to find appropriate images, evaluate their quality and security, and incorporate them into your projects effectively.

Introduction to Docker Hub

Docker Hub is the world's largest library and community for container images. It serves as the default registry for Docker, meaning that when you run commands like docker pull python, Docker automatically looks for images on Docker Hub.
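To make the "default registry" behavior concrete, here is a minimal shell sketch of how a short image name expands into a fully qualified reference. The function name normalize_image is hypothetical (it is not a Docker command), and the logic is a simplification of Docker's real parsing rules, assuming a POSIX shell.

```shell
# Hypothetical helper, not a Docker command: expand an image reference
# roughly the way Docker resolves it (simplified).
normalize_image() (
  ref=$1

  # Split off the first path component, if there is one.
  case $ref in
    */*) first=${ref%%/*}; rest=${ref#*/} ;;
    *)   first=; rest=$ref ;;
  esac

  # The first component is a registry host only if it looks like one
  # (contains a dot or a port, or is "localhost"). Otherwise Docker Hub
  # (docker.io) is implied.
  case $first in
    *.*|*:*|localhost) registry=$first; path=$rest ;;
    *)                 registry=docker.io; path=$ref ;;
  esac

  # On Docker Hub, names without a namespace live under "library/".
  if [ "$registry" = docker.io ]; then
    case $path in
      */*) ;;
      *)   path=library/$path ;;
    esac
  fi

  # Without a tag or digest, Docker falls back to the "latest" tag.
  last=${path##*/}
  case $last in
    *:*|*@*) ;;
    *)       path=$path:latest ;;
  esac

  printf '%s/%s\n' "$registry" "$path"
)

normalize_image python               # docker.io/library/python:latest
normalize_image jwilder/nginx-proxy  # docker.io/jwilder/nginx-proxy:latest
```

So docker pull python and docker pull docker.io/library/python:latest refer to the same image.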

What is Docker Hub?

Analogy: Docker Hub is like a massive public library for container images. Just as a library catalogs books by different authors on different subjects, Docker Hub catalogs images from different providers for different applications. Some books are written by renowned authors (official images), some by community members (community images), and some are in special collections with restricted access (private repositories).

Key Concepts

Let's head over to Docker Hub and explore its features.

Accessing Docker Hub

You can access Docker Hub through your web browser at https://hub.docker.com/. When you visit, you'll see a search bar, featured content, and various categories of images.

While you can browse Docker Hub without an account, a free account lets you do more, such as pushing your own images, which we cover later in this session.

Finding Images on Docker Hub

Docker Hub contains millions of images, so finding the right one is an important skill. Let's explore different methods for finding images.

Using the Docker CLI

You can search for images directly from your terminal using the docker search command:

docker search nginx

This returns a list of images matching the search term, along with each image's description, star count, and whether it is an official image or an automated build.

Example output:

NAME                              DESCRIPTION                                     STARS     OFFICIAL   AUTOMATED
nginx                             Official build of Nginx.                          16831     [OK]       
jwilder/nginx-proxy               Automated Nginx reverse proxy for docker con...   2122                 [OK]
richarvey/nginx-php-fpm           Container running Nginx + PHP-FPM capable of...   818                  [OK]
jc21/nginx-proxy-manager          Docker container for managing Nginx proxy ho...   248                  
...

While convenient, this method provides limited information. For more detailed information, it's better to use the Docker Hub website.
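The CLI search can be narrowed with a few flags. The sketch below wraps docker search in a small helper. The name search_official is hypothetical, while --filter, --limit, and --format are existing docker search options. Running it requires a working Docker installation and network access.

```shell
# Hypothetical helper: show only official images for a search term,
# printing each image's name and star count.
search_official() {
  docker search \
    --filter is-official=true \
    --limit 5 \
    --format '{{.Name}}: {{.StarCount}}' \
    "$1"
}

# Usage:
#   search_official nginx
```

A filter such as --filter stars=100 can be used in the same way to hide images with few stars.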

Using the Docker Hub Website

The Docker Hub website provides a more comprehensive search experience with more details about each image:

  1. Go to https://hub.docker.com/search
  2. Enter your search term in the search bar
  3. Use filters like "Official Images" or "Verified Publishers" to narrow results
  4. Sort results by relevance, stars, or recency

Understanding Search Results

When you search on Docker Hub, you'll see several pieces of information that help you evaluate an image:

Example search: If you search for "PostgreSQL" on Docker Hub, you'll see multiple results including:

  • postgres (Official Image): The official PostgreSQL image
  • bitnami/postgresql (Verified Publisher): Bitnami's PostgreSQL image
  • Various community images with specific configurations

Evaluating Image Quality and Security

Not all images are created equal. When selecting an image, you should consider several factors to ensure quality and security.

Image Trust Hierarchy

Docker Hub has a hierarchy of image trustworthiness:

  1. Official Images: Most trustworthy, maintained by Docker and upstream vendors
  2. Verified Publisher Images: Created by trusted partners with a verified badge
  3. Community Images: Created by individual users, varying in quality and security

Analogy: Think of this like medication sources. Official images are like FDA-approved medications from established pharmaceutical companies. Verified publisher images are like supplements from reputable brands with quality certifications. Community images range from carefully formulated products by knowledgeable herbalists to unknown substances mixed in someone's garage — they require more scrutiny before use.

Key Evaluation Criteria

When evaluating an image, consider:

Image Details Page

Clicking on an image in Docker Hub takes you to its details page, which provides:

Best Practice: Always prefer official images when available. They're maintained by Docker and the software vendors, follow best practices, are regularly updated for security, and provide clear documentation.

Understanding Image Tags

Tags are how Docker identifies specific versions of an image. They're crucial for reproducibility and stability in your projects.

What are Tags?

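A tag is a label applied to an image in a repository, indicating a specific version or variant. When you pull an image without specifying a tag, Docker uses the latest tag by default. Note that latest is only a naming convention: it points to whatever the maintainer last published under that tag, which is not necessarily the newest release.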

Common Tagging Conventions

Many repositories follow these conventions:

Common Image Variants

Many official images offer different variants with varying tradeoffs:

Example tags for Python:

  • python:3.9 - Python 3.9 on Debian
  • python:3.9-slim - Smaller variant of Python 3.9
  • python:3.9-alpine - Python 3.9 on Alpine Linux (smallest)
  • python:3.9-windowsservercore - Python 3.9 on Windows Server Core

Viewing Available Tags

You can see all available tags for an image on its Docker Hub page. For example, for Python:

  1. Go to https://hub.docker.com/_/python
  2. Click on the "Tags" tab

You'll see a list of all available tags along with their size and architecture support.

Best Practice: Always use specific version tags in production environments, never latest. This ensures reproducibility and prevents unexpected changes when images are updated.
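One way to enforce this rule is a small check that flags unpinned references, for example in a pre-deployment script. The following is a sketch only: is_pinned is a hypothetical helper name, and it treats any explicit tag other than latest, or any sha256 digest, as pinned.

```shell
# Hypothetical helper: succeed if the image reference is pinned to a
# specific tag or digest, fail if it would resolve to "latest".
is_pinned() {
  # Look only at the last path component so that a registry port
  # (e.g. localhost:5000/app) is not mistaken for a tag.
  last=${1##*/}
  case $last in
    *@sha256:*) return 0 ;;  # pinned by digest
    *:latest)   return 1 ;;  # explicit latest
    *:*)        return 0 ;;  # explicit version tag
    *)          return 1 ;;  # no tag: Docker falls back to latest
  esac
}

for image in nginx python:3.9-slim postgres:13-alpine; do
  if is_pinned "$image"; then
    echo "ok       $image"
  else
    echo "unpinned $image"
  fi
done
# unpinned nginx
# ok       python:3.9-slim
# ok       postgres:13-alpine
```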

Pulling and Using Public Images

Now that we understand how to find and evaluate images, let's look at how to pull and use them effectively.

Pulling Images

To download an image from Docker Hub, use the docker pull command:

docker pull [repository]:[tag]

Examples:

# Pull nginx without a tag (Docker pulls nginx:latest)
docker pull nginx

# Pull a specific version of Python
docker pull python:3.9-slim

# Pull PostgreSQL version 13 with Alpine Linux
docker pull postgres:13-alpine

The image will be downloaded to your local Docker environment, where it can be used to run containers.

Viewing Local Images

To see which images you have downloaded locally:

docker images

Output example:

REPOSITORY   TAG         IMAGE ID       CREATED       SIZE
nginx        latest      605c77e624dd   3 days ago    142MB
python       3.9-slim    8c7051081f50   5 days ago    124MB
postgres     13-alpine   87180a7e49e8   1 week ago    213MB

Running Containers from Images

Once you've pulled an image, you can run a container from it:

docker run [options] [repository]:[tag] [command]

If you haven't explicitly pulled the image, docker run will automatically pull it for you.

Examples of running containers from public images:

# Run an nginx web server
docker run -d -p 8080:80 --name my-nginx nginx

# Run a PostgreSQL database
docker run -d -p 5432:5432 \
  -e POSTGRES_PASSWORD=mysecretpassword \
  --name my-postgres \
  postgres:13-alpine

# Run a Python container with an interactive shell
docker run -it --rm python:3.9-slim python

Exploring Popular Official Images

Let's explore some of the most popular official images on Docker Hub and how they can be used in your projects.

NGINX

NGINX is a high-performance web server and reverse proxy.

Basic usage:

docker run -d -p 8080:80 --name webserver nginx

Serving custom content:

docker run -d -p 8080:80 -v "$(pwd)/html:/usr/share/nginx/html" nginx

Custom configuration:

docker run -d -p 8080:80 -v "$(pwd)/nginx.conf:/etc/nginx/nginx.conf:ro" nginx

Real-world application: You can use NGINX as a reverse proxy in front of your Python web applications. It can handle SSL termination, static file serving, and load balancing, allowing your application to focus on business logic.

PostgreSQL

PostgreSQL is a powerful, open-source relational database.

Basic usage:

docker run -d \
  -p 5432:5432 \
  -e POSTGRES_PASSWORD=mysecretpassword \
  --name my-postgres \
  postgres

Data persistence:

docker run -d \
  -p 5432:5432 \
  -e POSTGRES_PASSWORD=mysecretpassword \
  -e POSTGRES_USER=myuser \
  -e POSTGRES_DB=mydb \
  -v postgres_data:/var/lib/postgresql/data \
  --name my-postgres \
  postgres:13-alpine

Initialization scripts:

docker run -d \
  -p 5432:5432 \
  -e POSTGRES_PASSWORD=mysecretpassword \
  -v postgres_data:/var/lib/postgresql/data \
  -v "$(pwd)/init.sql:/docker-entrypoint-initdb.d/init.sql" \
  --name my-postgres \
  postgres:13-alpine

Real-world application: In a typical web application, you might use a PostgreSQL container for your database, connected to your Python application container. The database data can be persisted using a Docker volume.

Python

The official Python image provides Python runtimes.

Interactive Python shell:

docker run -it --rm python:3.9 python

Running a Python script:

docker run -it --rm -v "$(pwd):/app" -w /app python:3.9 python script.py

Running a Flask web application:

docker run -it --rm \
  -p 5000:5000 \
  -v "$(pwd):/app" \
  -w /app \
  -e FLASK_APP=app.py \
  -e FLASK_DEBUG=1 \
  python:3.9-slim \
  sh -c "pip install -r requirements.txt && flask run --host=0.0.0.0"

Real-world application: The Python image is typically used as a base for creating your own application images. You would start with a Python base image, add your application code and dependencies, and build a custom image.

Redis

Redis is an in-memory data structure store, used as a database, cache, and message broker.

Basic usage:

docker run -d -p 6379:6379 --name my-redis redis

With persistence:

docker run -d \
  -p 6379:6379 \
  -v redis_data:/data \
  --name my-redis \
  redis redis-server --appendonly yes

With custom configuration:

docker run -d \
  -p 6379:6379 \
  -v "$(pwd)/redis.conf:/usr/local/etc/redis/redis.conf" \
  --name my-redis \
  redis redis-server /usr/local/etc/redis/redis.conf

Real-world application: Redis is commonly used alongside web applications for caching, session storage, and background job queues. For example, you might use Redis with Celery to handle asynchronous tasks in a Python web application.

Understanding Docker Image Size and Optimization

Image size is an important consideration for performance, network transfer times, and resource usage. Let's explore this aspect of public images.

Why Image Size Matters

Size Comparison of Different Variants

Let's compare the sizes of different Python image variants:

docker pull python:3.9
docker pull python:3.9-slim
docker pull python:3.9-alpine
docker images

You might see output like:

REPOSITORY   TAG         IMAGE ID       CREATED       SIZE
python       3.9         f88b2f81f83a   2 weeks ago   915MB
python       3.9-slim    8c705081f50d   2 weeks ago   124MB
python       3.9-alpine  d4d9c6317a1a   2 weeks ago   45MB
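To put numbers on the difference, the sketch below reads docker images-style output and prints each tag's size as a multiple of the smallest one. The helper name compare_sizes is hypothetical, and it assumes every size is reported in MB, as in the sample output above.

```shell
# Hypothetical helper: read `docker images` output on stdin and print each
# tag with its size relative to the smallest image. Assumes all sizes are
# given in MB.
compare_sizes() {
  awk '
    NR > 1 {                         # skip the header row
      size = $NF
      sub(/MB$/, "", size)           # "915MB" -> "915"
      tags[++n] = $2
      sizes[n] = size + 0
      if (n == 1 || sizes[n] < min) min = sizes[n]
    }
    END {
      for (i = 1; i <= n; i++)
        printf "%s %.1fx\n", tags[i], sizes[i] / min
    }
  '
}

# Usage:
#   docker images python | compare_sizes
```

With the sample output above, this prints 20.3x for 3.9, 2.8x for 3.9-slim, and 1.0x for 3.9-alpine.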

The size differences are substantial! But what are the tradeoffs?

Tradeoffs for Different Variants

Full (e.g., python:3.9)
  Pros:
  • Includes all build tools
  • Includes many system packages
  • Maximum compatibility
  Cons:
  • Very large size
  • Larger attack surface
  • Slower to download and start

Slim (e.g., python:3.9-slim)
  Pros:
  • Much smaller than full
  • Good compatibility
  • Based on Debian
  Cons:
  • Missing some build tools
  • May need additional packages

Alpine (e.g., python:3.9-alpine)
  Pros:
  • Extremely small
  • Small attack surface (minimal package set)
  • Fast to download and start
  Cons:
  • Uses musl libc instead of glibc
  • Compatibility issues with some C extensions
  • Compilation can be challenging

Best Practice: For Python applications in development, python:3.x-slim is often a good balance of size and compatibility. For production, Alpine-based images are excellent if you've tested thoroughly with your dependencies. The full image is rarely necessary but can be useful for complex build environments.

Working with Image Layers

Understanding how Docker images are constructed from layers helps you optimize image size and reuse.

What are Image Layers?

Docker images are composed of multiple layers, each representing a set of filesystem changes. Layers are:

Analogy: Image layers are like transparencies stacked on top of each other. Each transparency adds something to the final image, but you can see through to the layers below. When Docker builds an image, it's as if it's laying down these transparencies one at a time, with each new layer potentially modifying what's visible in the layers below.

Inspecting Image Layers

To see the layers that make up an image, use the docker history command:

docker history nginx:latest

You'll see output like:

IMAGE          CREATED        CREATED BY                                      SIZE      COMMENT
605c77e624dd   7 days ago     /bin/sh -c #(nop)  CMD ["nginx" "-g" "daemon…   0B        
<missing>      7 days ago     /bin/sh -c #(nop)  STOPSIGNAL SIGQUIT           0B        
<missing>      7 days ago     /bin/sh -c #(nop)  EXPOSE 80                    0B        
<missing>      7 days ago     /bin/sh -c #(nop)  ENTRYPOINT ["/docker-ent…   0B        
<missing>      7 days ago     /bin/sh -c #(nop) COPY file:09a214a3e07c919a…   4.61kB    
<missing>      7 days ago     /bin/sh -c #(nop) COPY file:0fd5fca330dcd6a7…   1.04kB    
<missing>      7 days ago     /bin/sh -c #(nop) COPY file:0b866ff3fc1ef5b0…   1.96kB    
<missing>      7 days ago     /bin/sh -c #(nop) COPY file:65504f71f5855ca0…   1.2kB     
<missing>      7 days ago     /bin/sh -c set -x     && addgroup --system -…   61.1MB    
...

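Each row corresponds to one build instruction, showing when it was created, the command that created it, and its size. Rows with a size of 0B (such as CMD and EXPOSE) only record metadata; only rows with a non-zero size add filesystem content.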

Layer Sharing and Caching

One of the powerful features of Docker's layer system is the ability to share and reuse layers between images: a layer that several images have in common is downloaded and stored only once.

Example of layer sharing: If you have both nginx:latest and nginx:1.21 images, they likely share many layers. Docker only stores the unique layers for each tag, saving disk space.
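You can check layer sharing yourself by comparing the layer digests that docker image inspect reports for two local images. This is a rough sketch: the helper names image_layers and count_shared_layers are hypothetical, and both images must already be pulled.

```shell
# Hypothetical helper: print the layer digests of a local image, one per line.
image_layers() {
  docker image inspect \
    --format '{{range .RootFS.Layers}}{{println .}}{{end}}' "$1"
}

# Hypothetical helper: count the layer digests two local images have in common.
count_shared_layers() (
  a=$(mktemp)
  b=$(mktemp)
  image_layers "$1" | sort > "$a"
  image_layers "$2" | sort > "$b"
  # comm -12 keeps only lines present in both sorted files;
  # grep -c . counts the non-empty ones.
  comm -12 "$a" "$b" | grep -c .
  rm -f "$a" "$b"
)

# Usage (after pulling both images):
#   count_shared_layers nginx:latest nginx:1.21
```

A result greater than zero means the two tags share at least one layer on disk.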

Multi-Architecture Images

Docker Hub supports multi-architecture images, which allows the same image name to work across different CPU architectures.

Understanding Multi-Architecture Images

Multi-architecture images are actually a collection of images for different architectures, tied together with a manifest list. When you pull an image, Docker automatically selects the version that matches your system's architecture.

Common architectures include:

Checking Architecture Support

To see which architectures an image supports, look at its Docker Hub page under the Tags section. For official images, you'll often see multiple architectures listed for each tag.

Practical application: Multi-architecture support is increasingly important with the growing adoption of ARM-based servers and Apple Silicon Macs. Most official images now support multiple architectures without any special configuration on your part.

Dealing with Architecture Mismatches

Sometimes you might need to run an image that doesn't have a build for your architecture. In these cases, you can request a specific platform with the --platform flag:

docker run --platform linux/amd64 -d nginx

This forces Docker to pull and run the amd64 version, even on an ARM machine (with emulation if available).
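Platform strings follow an os/architecture pattern, and the architecture names differ slightly from what uname -m prints. The sketch below maps a few common machine types to the matching --platform value. The function name docker_platform is hypothetical, and the list is not exhaustive.

```shell
# Hypothetical helper: translate a machine type (default: this machine's
# `uname -m`) into the platform string used by --platform.
docker_platform() {
  machine=${1:-$(uname -m)}
  case $machine in
    x86_64|amd64)  echo linux/amd64 ;;
    aarch64|arm64) echo linux/arm64 ;;
    armv7l)        echo linux/arm/v7 ;;
    *)
      echo "unknown machine type: $machine" >&2
      return 1
      ;;
  esac
}

docker_platform x86_64   # linux/amd64
docker_platform arm64    # linux/arm64
```

For example, an Apple Silicon Mac reports arm64, so images are pulled as linux/arm64 unless you pass --platform linux/amd64.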

Advanced Docker Hub Features

Docker Hub offers several advanced features that can enhance your development workflow.

Automated Builds

Docker Hub can automatically build images from source code repositories:

  1. Link your GitHub or Bitbucket account to Docker Hub
  2. Set up a repository with a Dockerfile
  3. Configure build rules (which branches/tags to build)
  4. Docker Hub will automatically build and publish images when you push changes

Using Docker Hub for Your Own Images

To push your own images to Docker Hub:

  1. Create an account and log in:
    docker login
  2. Tag your image with your username:
    docker tag my-app username/my-app:1.0
  3. Push the image to Docker Hub:
    docker push username/my-app:1.0

Organization Accounts

For team projects, Docker Hub offers organization accounts that allow:

Real-world usage: In a professional setting, your organization might have a Docker Hub organization account where all your custom images are stored. CI/CD pipelines can automatically build and push images to this account, and developers can pull these images as needed.

Alternatives to Docker Hub

While Docker Hub is the default and most popular registry, there are several alternatives worth knowing about:

Public Registries

Private Registry Options

Using Alternative Registries

To pull from an alternative registry, include the registry hostname in the image name:

# Pull from GitHub Container Registry
docker pull ghcr.io/username/image:tag

# Pull from Google Container Registry
docker pull gcr.io/project-id/image:tag

Best Practice: In enterprise environments, it's common to use a private registry for your custom images. This gives you more control over security, access, and availability.

Practical Examples

Let's put our knowledge of Docker Hub and public images to practical use with some real-world examples.

Example 1: Setting Up a Web Development Environment

Let's create a simple web development environment with NGINX, Python, and PostgreSQL:

# Create a network for the containers to communicate
docker network create webdev

# Start a PostgreSQL database
docker run -d \
  --name postgres \
  --network webdev \
  -e POSTGRES_PASSWORD=devpassword \
  -e POSTGRES_USER=devuser \
  -e POSTGRES_DB=devdb \
  -v pg_data:/var/lib/postgresql/data \
  postgres:13-alpine

# Start a Python container for development
docker run -it --rm \
  --name python \
  --network webdev \
  -v "$(pwd)/app:/app" \
  -w /app \
  -p 5000:5000 \
  python:3.9-slim \
  bash

# In the Python container, you can now install dependencies and run your app
# pip install -r requirements.txt
# python app.py

# In a separate terminal, start NGINX as a reverse proxy
docker run -d \
  --name nginx \
  --network webdev \
  -p 8080:80 \
  -v "$(pwd)/nginx.conf:/etc/nginx/conf.d/default.conf" \
  nginx:alpine

With this setup, NGINX can proxy requests to your Python application, which can connect to the PostgreSQL database.
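The nginx.conf file mounted above is not shown in this example. As a starting point, the sketch below writes a minimal reverse-proxy configuration. The helper name write_nginx_conf is hypothetical, and the configuration assumes your application listens on 0.0.0.0:5000 inside the container named python.

```shell
# Hypothetical helper: write a minimal reverse-proxy config for Example 1.
# The target file defaults to ./nginx.conf.
write_nginx_conf() {
  cat > "${1:-nginx.conf}" <<'EOF'
server {
    listen 80;

    location / {
        # "python" is the container name used in the docker run command;
        # Docker's DNS resolves it on the user-defined webdev network.
        proxy_pass http://python:5000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
EOF
}

# Usage: run once in the project directory before starting the nginx container.
#   write_nginx_conf
```

NGINX resolves the upstream name when it starts, so the python container should already be running on the webdev network, otherwise the nginx container is likely to exit with a "host not found in upstream" error.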

Example 2: Data Analysis Environment

Let's set up a data analysis environment using jupyter/datascience-notebook, a community image published by the Jupyter project:

docker run -it --rm \
  -p 8888:8888 \
  -v "$(pwd)/notebooks:/home/jovyan/work" \
  jupyter/datascience-notebook

This launches a Jupyter notebook with scientific computing libraries. You can access it by opening the URL displayed in the console (typically http://localhost:8888 with a token).

Example 3: WordPress Blog

Let's set up a WordPress blog with MySQL using official images:

# Create a network
docker network create wordpress

# Start MySQL
docker run -d \
  --name wordpress-db \
  --network wordpress \
  -e MYSQL_ROOT_PASSWORD=rootpassword \
  -e MYSQL_DATABASE=wordpress \
  -e MYSQL_USER=wordpress \
  -e MYSQL_PASSWORD=wordpress \
  -v mysql_data:/var/lib/mysql \
  mysql:5.7

# Start WordPress
docker run -d \
  --name wordpress \
  --network wordpress \
  -p 8080:80 \
  -e WORDPRESS_DB_HOST=wordpress-db \
  -e WORDPRESS_DB_USER=wordpress \
  -e WORDPRESS_DB_PASSWORD=wordpress \
  -e WORDPRESS_DB_NAME=wordpress \
  -v wordpress_data:/var/www/html \
  wordpress

You can then access your WordPress site at http://localhost:8080.

Security Considerations

When using public images, security should be a top consideration:

Image Security Best Practices

Vulnerability Scanning

Docker Desktop includes Docker Scout, which can scan images for known vulnerabilities:

docker scout quickview nginx:latest

This shows a summary of known vulnerabilities in the image.

docker scout cves nginx:latest

This shows more detailed information about the CVEs (Common Vulnerabilities and Exposures) in the image.

Best Practice: Integrate image scanning into your CI/CD pipeline to automatically check for vulnerabilities before deployment.
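A pipeline step for this could look like the sketch below. The helper name scan_gate is hypothetical. It relies on the --exit-code and --only-severity options of docker scout cves, which make the command return a non-zero status when matching vulnerabilities are found. It assumes Docker Scout is installed and authenticated on the CI runner.

```shell
# Hypothetical CI helper: fail when the image has critical or high
# severity CVEs according to Docker Scout.
scan_gate() {
  if docker scout cves --exit-code --only-severity critical,high "$1"; then
    echo "PASS $1"
  else
    # A non-zero status can also mean the scan itself could not run.
    echo "FAIL $1" >&2
    return 1
  fi
}

# Usage in a pipeline step:
#   scan_gate username/my-app:1.0 || exit 1
```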

Key Takeaways

With this knowledge, you're well-equipped to effectively find, evaluate, and use public Docker images in your projects!

Looking Ahead

In our next session, we'll learn how to create our own custom Docker images by writing Dockerfiles. This will allow us to package our applications into containers with precisely the dependencies and configuration we need.

Discussion Questions

  1. What criteria would you use to decide between the full, slim, and alpine variants of an image for your project?
  2. How might you approach evaluating a community image that doesn't have an official alternative?
  3. What are the security implications of using public images in a production environment? How would you mitigate these risks?
  4. How could the layered architecture of Docker images help optimize the build and deployment process in a CI/CD pipeline?
  5. In what scenarios might you prefer a specialized image over a more generic base image that you customize yourself?

Additional Resources