Docker Hub and Public Images

Lecture Overview

In this session, we'll explore Docker Hub — Docker's official registry for container images — and learn how to find, use, and work with public images. Rather than building every image from scratch, we can leverage the vast ecosystem of pre-built images to accelerate our development. By the end of this session, you'll understand how to find appropriate images, evaluate their quality and security, and incorporate them into your projects effectively.

Introduction to Docker Hub

Docker Hub is the world's largest library and community for container images. It serves as the default registry for Docker, meaning that when you run commands like docker pull python, Docker automatically looks for images on Docker Hub.

What is Docker Hub?

A cloud-based registry service for Docker images
A central repository where developers can share and find container images
Both a public registry for open-source images and a private registry for teams and organizations
A hub for official images maintained by Docker and software vendors

Analogy: Docker Hub is like a massive public library for container images. Just as a library catalogs books by different authors on different subjects, Docker Hub catalogs images from different providers for different applications. Some books are written by renowned authors (official images), some by community members (community images), and some are in special collections with restricted access (private repositories).

Key Concepts

Image Repository: A collection of related images, usually representing the same application with different versions
Official Images: Curated, well-documented images maintained by Docker
Verified Publisher Images: Created and maintained by commercial entities that partner with Docker
Community Images: Created and maintained by individual Docker Hub users
Tags: Identifiers for specific versions of an image

Let's head over to Docker Hub and explore its features.

Accessing Docker Hub

You can access Docker Hub through your web browser at https://hub.docker.com/. When you visit, you'll see a search bar, featured content, and various categories of images.

While you can browse Docker Hub without an account, creating a free account allows you to:

Push your own images to Docker Hub
Create private repositories
Star and follow your favorite images
Join Docker teams and organizations

Finding Images on Docker Hub

Docker Hub contains millions of images, so finding the right one is an important skill. Let's explore different methods for finding images.

Using the Docker CLI

You can search for images directly from your terminal using the docker search command:

docker search nginx

This returns a list of images related to the search term, along with information such as:

The name of the repository
A brief description
The number of stars (indicating popularity)
Whether it's an official image
Whether it's automated (built automatically from a linked source repository; recent Docker CLI versions deprecate this column and may omit it)

Example output:

NAME                              DESCRIPTION                                     STARS     OFFICIAL   AUTOMATED
nginx                             Official build of Nginx.                          16831     [OK]       
jwilder/nginx-proxy               Automated Nginx reverse proxy for docker con...   2122                 [OK]
richarvey/nginx-php-fpm           Container running Nginx + PHP-FPM capable of...   818                  [OK]
jc21/nginx-proxy-manager          Docker container for managing Nginx proxy ho...   248                  
...

While convenient, this method provides limited information. For more detailed information, it's better to use the Docker Hub website.
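The CLI search can still be narrowed a little before switching to the website. The sketch below wraps docker search with its --filter, --limit, and --format options. The function name search_official, the result limit, and the star threshold are illustrative choices, not part of Docker.

```shell
# search_official TERM [MIN_STARS]
# Lists official images matching TERM with at least MIN_STARS stars (default 0).
# The function name and defaults are illustrative.
search_official() {
  term=$1
  min_stars=${2:-0}
  docker search \
    --filter is-official=true \
    --filter "stars=$min_stars" \
    --limit 10 \
    --format '{{.Name}}: {{.StarCount}}' \
    "$term"
}

# Usage (needs network access to Docker Hub):
#   search_official nginx 100
```

Filtering on the server side keeps the output short, but it still shows far less than the image's page on the website.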

Using the Docker Hub Website

The Docker Hub website provides a more comprehensive search experience with more details about each image:

Go to https://hub.docker.com/search
Enter your search term in the search bar
Use filters like "Official Images" or "Verified Publishers" to narrow results
Sort results by relevance, stars, or recency

Understanding Search Results

When you search on Docker Hub, you'll see several pieces of information that help you evaluate an image:

Official/Verified Badge: Indicates trusted images
Pull Count: How many times the image has been downloaded
Star Count: How many users have starred (favorited) the repository
Last Updated: When the repository was last updated
Short Description: Brief summary of what the image contains

Example search: If you search for "PostgreSQL" on Docker Hub, you'll see multiple results including:

postgres (Official Image): The official PostgreSQL image
bitnami/postgresql (Verified Publisher): Bitnami's PostgreSQL image
Various community images with specific configurations

Evaluating Image Quality and Security

Not all images are created equal. When selecting an image, you should consider several factors to ensure quality and security.

Image Trust Hierarchy

Docker Hub has a hierarchy of image trustworthiness:

Official Images: Most trustworthy, maintained by Docker and upstream vendors
Verified Publisher Images: Created by trusted partners with a verified badge
Community Images: Created by individual users, varying in quality and security

Analogy: Think of this like medication sources. Official images are like FDA-approved medications from established pharmaceutical companies. Verified publisher images are like supplements from reputable brands with quality certifications. Community images range from carefully formulated products by knowledgeable herbalists to unknown substances mixed in someone's garage — they require more scrutiny before use.

Key Evaluation Criteria

When evaluating an image, consider:

Maintainer Reputation: Is it from an official source or trusted publisher?
Documentation Quality: Are the usage and configuration well documented?
Update Frequency: How recently was the image updated?
Community Engagement: High pull counts and stars suggest wider usage
User Feedback: Look for feedback from other users, for example in the issue tracker of the image's source repository
Open Source: Can you see how the image is built? Is there a Dockerfile?
Security Scanning: Does the image have any known vulnerabilities?

Image Details Page

Clicking on an image in Docker Hub takes you to its details page, which provides:

A detailed description of the image
Usage instructions
Available tags (versions)
Dockerfile source (for some images)
Environment variables and other configuration options

Best Practice: Always prefer official images when available. They're maintained by Docker and the software vendors, follow best practices, are regularly updated for security, and provide clear documentation.

Understanding Image Tags

Tags are how Docker identifies specific versions of an image. They're crucial for reproducibility and stability in your projects.

What are Tags?

A tag is a label applied to an image in a repository, indicating a specific version or variant. When you pull an image without specifying a tag, Docker uses the latest tag by default.
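To illustrate how a short name expands, the following sketch mimics the way Docker fills in the parts of an image reference that are left out: the docker.io registry, the library/ namespace used for official images, and the latest tag. The function name normalize_ref is hypothetical and the rules are simplified; the Docker CLI performs this expansion itself.

```shell
# normalize_ref REF: prints REF with registry, namespace, and tag made explicit.
# Simplified sketch of Docker's defaulting rules; not a Docker command.
normalize_ref() {
  ref=$1
  first=${ref%%/*}
  case "$ref" in
    */*)
      # The first component is a registry host only if it looks like one.
      case "$first" in
        *.*|*:*|localhost) registry=$first; rest=${ref#*/} ;;
        *) registry=docker.io; rest=$ref ;;
      esac ;;
    *) registry=docker.io; rest=$ref ;;
  esac
  # Official images live under the implicit library/ namespace on Docker Hub.
  if [ "$registry" = docker.io ]; then
    case "$rest" in
      */*) ;;
      *) rest=library/$rest ;;
    esac
  fi
  # Add the default tag when neither a tag nor a digest is present.
  last=${rest##*/}
  case "$last" in
    *:*|*@*) ;;
    *) rest=$rest:latest ;;
  esac
  printf '%s/%s\n' "$registry" "$rest"
}

normalize_ref python            # prints docker.io/library/python:latest
normalize_ref python:3.9-slim   # prints docker.io/library/python:3.9-slim
```

Seeing the expanded form makes it clear that docker pull python silently chooses both a registry and a version for you.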

Common Tagging Conventions

Many repositories follow these conventions:

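latest: The default tag, used when none is specified; by convention it points to the most recent stable release, but nothing enforces this
Version numbers: Major.Minor.Patch, often published at several levels of precision (e.g., 13.0.1, 3.9)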
Date-based: Using dates as tags (e.g., 20210701)
Variant indicators: Often appended to version (e.g., 3.9-slim, 13-alpine)

Common Image Variants

Many official images offer different variants with varying tradeoffs:

alpine: Based on Alpine Linux, very small footprint but may have compatibility issues
slim: Smaller than default but larger than alpine, good balance of compatibility and size
buster/bullseye/etc.: Based on specific Debian releases
windowsservercore/nanoserver: Windows-based variants

Example tags for Python:

python:3.9 - Python 3.9 on Debian
python:3.9-slim - Smaller variant of Python 3.9
python:3.9-alpine - Python 3.9 on Alpine Linux (smallest)
python:3.9-windowsservercore - Python 3.9 on Windows Server Core

Viewing Available Tags

You can see all available tags for an image on its Docker Hub page. For example, for Python:

Go to https://hub.docker.com/_/python
Click on the "Tags" tab

You'll see a list of all available tags along with their size and architecture support.

Best Practice: Always use specific version tags in production environments, never latest. This ensures reproducibility and prevents unexpected changes when images are updated.
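The pinning rule can be checked mechanically, for example in a review script. The helper below is a minimal sketch: is_pinned is a hypothetical name, and it only inspects the text of the reference, not what the registry actually serves.

```shell
# is_pinned REF: succeeds if REF carries an explicit tag other than "latest"
# or a content digest (name@sha256:...). Illustrative helper, not a Docker command.
is_pinned() {
  last=${1##*/}               # drop registry/namespace so a host:port colon is ignored
  case "$last" in
    *@sha256:*) return 0 ;;   # digest: refers to exactly one image
    *:latest)   return 1 ;;   # explicit, but still floating
    *:*)        return 0 ;;   # explicit version tag
    *)          return 1 ;;   # no tag means :latest
  esac
}

for ref in nginx python:3.9-slim postgres:latest; do
  if is_pinned "$ref"; then echo "ok       $ref"; else echo "floating $ref"; fi
done
```

Keep in mind that even a version tag such as 3.9-slim can be rebuilt and moved by its maintainers; referencing an image by digest is the stricter form of pinning.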

Pulling and Using Public Images

Now that we understand how to find and evaluate images, let's look at how to pull and use them effectively.

Pulling Images

To download an image from Docker Hub, use the docker pull command:

docker pull [repository]:[tag]

Examples:

# Pull the latest version of nginx
docker pull nginx

# Pull a specific version of Python
docker pull python:3.9-slim

# Pull PostgreSQL version 13 with Alpine Linux
docker pull postgres:13-alpine

The image will be downloaded to your local Docker environment, where it can be used to run containers.

Viewing Local Images

To see which images you have downloaded locally:

docker images

Output example:

REPOSITORY   TAG         IMAGE ID       CREATED       SIZE
nginx        latest      605c77e624dd   3 days ago    142MB
python       3.9-slim    8c7051081f50   5 days ago    124MB
postgres     13-alpine   87180a7e49e8   1 week ago    213MB

Running Containers from Images

Once you've pulled an image, you can run a container from it:

docker run [options] [repository]:[tag] [command]

If you haven't explicitly pulled the image, docker run will automatically pull it for you.

Example running containers from public images:

# Run an nginx web server
docker run -d -p 8080:80 --name my-nginx nginx

# Run a PostgreSQL database
docker run -d -p 5432:5432 \
  -e POSTGRES_PASSWORD=mysecretpassword \
  --name my-postgres \
  postgres:13-alpine

# Run a Python container with an interactive shell
docker run -it --rm python:3.9-slim python

Exploring Popular Official Images

Let's explore some of the most popular official images on Docker Hub and how they can be used in your projects.

NGINX

NGINX is a high-performance web server and reverse proxy.

Basic usage:

docker run -d -p 8080:80 --name webserver nginx

Serving custom content:

docker run -d -p 8080:80 -v "$(pwd)/html:/usr/share/nginx/html" nginx

Custom configuration:

docker run -d -p 8080:80 -v "$(pwd)/nginx.conf:/etc/nginx/nginx.conf:ro" nginx

Real-world application: You can use NGINX as a reverse proxy in front of your Python web applications. It can handle SSL termination, static file serving, and load balancing, allowing your application to focus on business logic.

PostgreSQL

PostgreSQL is a powerful, open-source relational database.

Basic usage:

docker run -d \
  -p 5432:5432 \
  -e POSTGRES_PASSWORD=mysecretpassword \
  --name my-postgres \
  postgres

Data persistence:

docker run -d \
  -p 5432:5432 \
  -e POSTGRES_PASSWORD=mysecretpassword \
  -e POSTGRES_USER=myuser \
  -e POSTGRES_DB=mydb \
  -v postgres_data:/var/lib/postgresql/data \
  --name my-postgres \
  postgres

Initialization scripts:

docker run -d \
  -p 5432:5432 \
  -e POSTGRES_PASSWORD=mysecretpassword \
  -v postgres_data:/var/lib/postgresql/data \
  -v "$(pwd)/init.sql:/docker-entrypoint-initdb.d/init.sql" \
  --name my-postgres \
  postgres

Real-world application: In a typical web application, you might use a PostgreSQL container for your database, connected to your Python application container. The database data can be persisted using a Docker volume.
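One practical detail when connecting an application to a freshly started database container: the container is reported as running a few seconds before PostgreSQL accepts connections. A small retry loop is one way to bridge that gap. The sketch below is a generic helper (the name retry is hypothetical); the usage line assumes the my-postgres container and the myuser/mydb settings from the examples above, and relies on the pg_isready tool that ships with PostgreSQL.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND...
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted.
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "gave up after $n attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}

# Usage, assuming the my-postgres container from the examples above is running:
#   retry 30 1 docker exec my-postgres pg_isready -U myuser -d mydb
```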

Python

The official Python image provides Python runtimes.

Interactive Python shell:

docker run -it --rm python:3.9 python

Running a Python script:

docker run -it --rm -v "$(pwd):/app" -w /app python:3.9 python script.py

Running a Flask web application:

docker run -it --rm \
  -p 5000:5000 \
  -v "$(pwd):/app" \
  -w /app \
  -e FLASK_APP=app.py \
  -e FLASK_DEBUG=1 \
  python:3.9-slim \
  sh -c "pip install -r requirements.txt && flask run --host=0.0.0.0"

Real-world application: The Python image is typically used as a base for creating your own application images. You would start with a Python base image, add your application code and dependencies, and build a custom image.

Redis

Redis is an in-memory data structure store, used as a database, cache, and message broker.

Basic usage:

docker run -d -p 6379:6379 --name my-redis redis

With persistence:

docker run -d \
  -p 6379:6379 \
  -v redis_data:/data \
  --name my-redis \
  redis redis-server --appendonly yes

With custom configuration:

docker run -d \
  -p 6379:6379 \
  -v "$(pwd)/redis.conf:/usr/local/etc/redis/redis.conf" \
  --name my-redis \
  redis redis-server /usr/local/etc/redis/redis.conf

Real-world application: Redis is commonly used alongside web applications for caching, session storage, and background job queues. For example, you might use Redis with Celery to handle asynchronous tasks in a Python web application.

Understanding Docker Image Size and Optimization

Image size is an important consideration for performance, network transfer times, and resource usage. Let's explore this aspect of public images.

Why Image Size Matters

Download time: Smaller images are faster to pull from registries
Startup time: Smaller images can lead to faster container startup
Resource usage: Smaller images use less disk space
Security surface: Smaller images often have fewer unnecessary packages

Size Comparison of Different Variants

Let's compare the sizes of different Python image variants:

docker pull python:3.9
docker pull python:3.9-slim
docker pull python:3.9-alpine
docker images

You might see output like:

REPOSITORY   TAG         IMAGE ID       CREATED       SIZE
python       3.9         f88b2f81f83a   2 weeks ago   915MB
python       3.9-slim    8c705081f50d   2 weeks ago   124MB
python       3.9-alpine  d4d9c6317a1a   2 weeks ago   45MB
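To put numbers on the difference, a few lines of shell arithmetic are enough. This is a minimal sketch using the sample sizes from the listing above; percent_smaller is a hypothetical helper name, and your own sizes will differ.

```shell
# percent_smaller BASE_SIZE VARIANT_SIZE
# Prints how much smaller VARIANT is than BASE as a whole percentage (rounded down).
# Both sizes must use the same unit, for example MB or bytes.
percent_smaller() {
  echo $(( ($1 - $2) * 100 / $1 ))
}

percent_smaller 915 124   # slim vs. full: prints 86
percent_smaller 915 45    # alpine vs. full: prints 95
```

For exact figures on your machine, docker image inspect --format '{{.Size}}' python:3.9 prints the image size in bytes, which can be passed straight to the helper.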

The size differences are substantial! But what are the tradeoffs?

Tradeoffs for Different Variants

Variant	Pros	Cons
Full (e.g., `python:3.9`)	Includes all build tools; includes many system packages; maximum compatibility	Very large size; larger attack surface; slower to download and start
Slim (e.g., `python:3.9-slim`)	Much smaller than full; good compatibility; based on Debian	Missing some build tools; may need additional packages
Alpine (e.g., `python:3.9-alpine`)	Extremely small; minimal package set, so a smaller attack surface; fast to download and start	Uses musl libc instead of glibc; compatibility issues with some C extensions; compilation can be challenging

Best Practice: For Python applications in development, python:3.x-slim is often a good balance of size and compatibility. For production, Alpine-based images are excellent if you've tested thoroughly with your dependencies. The full image is rarely necessary but can be useful for complex build environments.

Working with Image Layers

Understanding how Docker images are constructed from layers helps you optimize image size and reuse.

What are Image Layers?

Docker images are composed of multiple layers, each representing a set of filesystem changes. Layers are:

Created by instructions in a Dockerfile (instructions that change the filesystem, such as RUN, COPY, and ADD, each create a layer; others only record metadata)
Read-only once created
Cached and reused when possible
Stacked on top of each other to form the complete image

Analogy: Image layers are like transparencies stacked on top of each other. Each transparency adds something to the final image, but you can see through to the layers below. When Docker builds an image, it's as if it's laying down these transparencies one at a time, with each new layer potentially modifying what's visible in the layers below.

Inspecting Image Layers

To see the layers that make up an image, use the docker history command:

docker history nginx:latest

You'll see output like:

IMAGE          CREATED        CREATED BY                                      SIZE      COMMENT
605c77e624dd   7 days ago     /bin/sh -c #(nop)  CMD ["nginx" "-g" "daemon…   0B        
<missing>      7 days ago     /bin/sh -c #(nop)  STOPSIGNAL SIGQUIT           0B        
<missing>      7 days ago     /bin/sh -c #(nop)  EXPOSE 80                    0B        
<missing>      7 days ago     /bin/sh -c #(nop)  ENTRYPOINT ["/docker-ent…   0B        
<missing>      7 days ago     /bin/sh -c #(nop) COPY file:09a214a3e07c919a…   4.61kB    
<missing>      7 days ago     /bin/sh -c #(nop) COPY file:0fd5fca330dcd6a7…   1.04kB    
<missing>      7 days ago     /bin/sh -c #(nop) COPY file:0b866ff3fc1ef5b0…   1.96kB    
<missing>      7 days ago     /bin/sh -c #(nop) COPY file:65504f71f5855ca0…   1.2kB     
<missing>      7 days ago     /bin/sh -c set -x     && addgroup --system -…   61.1MB    
...

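Each row represents a step in the image's build history, showing when it was created, the command that created it, and its size. Rows with a size of 0B changed only metadata and added no filesystem layer. The <missing> entries in the IMAGE column simply mean that no separate image ID is stored locally for that intermediate step.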

Layer Sharing and Caching

One of the powerful features of Docker's layer system is the ability to share and reuse layers between images. For example:

If two images are based on Ubuntu, they share the Ubuntu base layers
When you build an image, Docker reuses cached layers if the instructions haven't changed
This sharing makes pulls faster and reduces disk usage

Example of layer sharing: If you have both nginx:latest and nginx:1.21 images, they may share layers, for instance when both were built on the same base image. Docker stores each shared layer only once, saving disk space.
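Whether two local images really share layers can be checked by comparing their layer digests. The sketch below counts how many leading layers two lists have in common, on the assumption that layers are shared from the base upward. The function name shared_layers and the file names are hypothetical; the usage lines read the digests from the RootFS.Layers field of docker image inspect.

```shell
# shared_layers FILE_A FILE_B
# Each file lists one layer digest per line, base layer first.
# Prints how many leading layers the two lists have in common.
shared_layers() {
  paste "$1" "$2" | awk -F '\t' '
    $1 != "" && $1 == $2 { n++; next }
    { exit }
    END { print n + 0 }
  '
}

# Usage, assuming both images are present locally:
#   fmt='{{range .RootFS.Layers}}{{println .}}{{end}}'
#   docker image inspect --format "$fmt" nginx:latest > a.txt
#   docker image inspect --format "$fmt" nginx:1.21   > b.txt
#   shared_layers a.txt b.txt
```

A result of 0 is not unusual for tags built at different times, since even the base layer may have been rebuilt in between.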

Multi-Architecture Images

Docker Hub supports multi-architecture images, which allows the same image name to work across different CPU architectures.

Understanding Multi-Architecture Images

Multi-architecture images are actually a collection of images for different architectures, tied together with a manifest list. When you pull an image, Docker automatically selects the version that matches your system's architecture.

Common architectures include:

amd64: Standard 64-bit x86 PCs and servers
arm64/aarch64: 64-bit ARM (Apple M1/M2, AWS Graviton, etc.)
arm/v7: 32-bit ARM (older Raspberry Pi, etc.)
windows/amd64: Windows on 64-bit x86

Checking Architecture Support

To see which architectures an image supports, look at its Docker Hub page under the Tags section. For official images, you'll often see multiple architectures listed for each tag.

Practical application: Multi-architecture support is increasingly important with the growing adoption of ARM-based servers and Apple Silicon Macs. Most official images now support multiple architectures without any special configuration on your part.

Dealing with Architecture Mismatches

Sometimes you might need to run an image that doesn't have a build for your architecture. In these cases:

Docker Desktop for Mac supports transparent emulation of x86_64 images on Apple Silicon
You can explicitly request an image for a specific architecture using the --platform flag:

docker run --platform linux/amd64 -d nginx

This forces Docker to pull and run the amd64 version, even on an ARM machine (with emulation if available).
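The value passed to --platform follows an os/architecture pattern. As a rough sketch, the helper below maps the machine name reported by uname -m to the matching Docker platform string for the Linux architectures listed earlier. The function name platform_for is hypothetical, and the mapping is deliberately incomplete.

```shell
# platform_for MACHINE: maps a `uname -m` value to a Docker --platform string.
# Covers only a few common Linux architectures; anything else is an error.
platform_for() {
  case "$1" in
    x86_64|amd64)  echo linux/amd64 ;;
    aarch64|arm64) echo linux/arm64 ;;
    armv7l)        echo linux/arm/v7 ;;
    *) echo "unknown architecture: $1" >&2; return 1 ;;
  esac
}

# Usage:
#   platform_for "$(uname -m)"
#   docker run --rm --platform "$(platform_for x86_64)" nginx nginx -v
```

To see which platforms a tag actually provides without opening the website, docker buildx imagetools inspect nginx:latest prints the entries of its manifest list.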

Advanced Docker Hub Features

Docker Hub offers several advanced features that can enhance your development workflow.

Automated Builds

Docker Hub can automatically build images from source code repositories (at the time of writing, this feature requires a paid Docker subscription):

Link your GitHub or Bitbucket account to Docker Hub
Set up a repository with a Dockerfile
Configure build rules (which branches/tags to build)
Docker Hub will automatically build and publish images when you push changes

Using Docker Hub for Your Own Images

To push your own images to Docker Hub:

Create an account and log in:
```
docker login
```
Tag your image with your username:
```
docker tag my-app username/my-app:1.0
```
Push the image to Docker Hub:
```
docker push username/my-app:1.0
```
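The tag and push steps are often combined in a small script so that the repository name is built the same way every time. The following is a minimal sketch, assuming docker login has already succeeded; the function name publish and its argument order are illustrative.

```shell
# publish LOCAL_IMAGE USERNAME VERSION
# Tags LOCAL_IMAGE as USERNAME/LOCAL_IMAGE:VERSION and pushes it.
# Assumes `docker login` has already succeeded; stops if tagging fails.
publish() {
  local_image=$1
  target="$2/$1:$3"
  docker tag "$local_image" "$target" &&
    docker push "$target"
}

# Usage:
#   publish my-app username 1.0
```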

Organization Accounts

For team projects, Docker Hub offers organization accounts that allow:

Shared access to repositories
Team management
Role-based access control
Private repositories

Real-world usage: In a professional setting, your organization might have a Docker Hub organization account where all your custom images are stored. CI/CD pipelines can automatically build and push images to this account, and developers can pull these images as needed.

Alternatives to Docker Hub

While Docker Hub is the default and most popular registry, there are several alternatives worth knowing about:

Public Registries

GitHub Container Registry (ghcr.io): Integrated with GitHub accounts and actions
Quay.io: Red Hat's container registry with advanced security features
Google Artifact Registry (pkg.dev): Integrated with Google Cloud; replaces the older Google Container Registry (gcr.io)
Amazon Elastic Container Registry (ECR Public): AWS's public registry

Private Registry Options

Azure Container Registry: Microsoft's private registry service
Amazon ECR (Private): AWS's private registry service
Google Artifact Registry (Private): Google Cloud's private registry
Harbor: Open-source registry with security scanning
Self-hosted Docker Registry: Run your own registry server

Using Alternative Registries

To pull from an alternative registry, include the registry hostname in the image name:

# Pull from GitHub Container Registry
docker pull ghcr.io/username/image:tag

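# Pull from Google Artifact Registry, the successor to Google Container Registry
# (the region prefix, project, and repository are placeholders)
docker pull us-docker.pkg.dev/project-id/repository/image:tag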

Best Practice: In enterprise environments, it's common to use a private registry for your custom images. This gives you more control over security, access, and availability.

Practical Examples

Let's put our knowledge of Docker Hub and public images to practical use with some real-world examples.

Example 1: Setting Up a Web Development Environment

Let's create a simple web development environment with NGINX, Python, and PostgreSQL:

# Create a network for the containers to communicate
docker network create webdev

# Start a PostgreSQL database
docker run -d \
  --name postgres \
  --network webdev \
  -e POSTGRES_PASSWORD=devpassword \
  -e POSTGRES_USER=devuser \
  -e POSTGRES_DB=devdb \
  -v pg_data:/var/lib/postgresql/data \
  postgres:13-alpine

# Start a Python container for development
docker run -it --rm \
  --name python \
  --network webdev \
  -v "$(pwd)/app:/app" \
  -w /app \
  -p 5000:5000 \
  python:3.9-slim \
  bash

# In the Python container, you can now install dependencies and run your app
# pip install -r requirements.txt
# python app.py

# In a separate terminal, start NGINX as a reverse proxy
docker run -d \
  --name nginx \
  --network webdev \
  -p 8080:80 \
  -v "$(pwd)/nginx.conf:/etc/nginx/conf.d/default.conf" \
  nginx:alpine

With this setup, NGINX can proxy requests to your Python application, which can connect to the PostgreSQL database.
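The nginx.conf file mounted in the last step is not shown above. One possible minimal version, written out by a small shell helper, is sketched below. It assumes the application listens on port 5000 inside the container named python, as in this example; the function name write_proxy_conf and the choice of forwarded headers are illustrative.

```shell
# write_proxy_conf FILE [UPSTREAM]
# Writes a minimal NGINX server block that forwards all requests to UPSTREAM.
# The default upstream matches the container name and port used in this example.
write_proxy_conf() {
  upstream=${2:-python:5000}
  cat > "$1" <<EOF
server {
    listen 80;

    location / {
        proxy_pass http://$upstream;
        proxy_set_header Host \$host;
        proxy_set_header X-Real-IP \$remote_addr;
    }
}
EOF
}

# Usage, from the project directory before starting the nginx container:
#   write_proxy_conf nginx.conf
```

Two details matter for this to work: the application must bind to 0.0.0.0 rather than 127.0.0.1, and the python container should already be running on the webdev network, because NGINX resolves the upstream host name when it starts.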

Example 2: Data Analysis Environment

Let's set up a data analysis environment using the Jupyter image:

docker run -it --rm \
  -p 8888:8888 \
  -v "$(pwd)/notebooks:/home/jovyan/work" \
  jupyter/datascience-notebook

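This launches a Jupyter notebook with scientific computing libraries. You can access it by opening the URL displayed in the console (typically http://localhost:8888 with a token). Note that the Jupyter project now publishes new builds of this image on Quay.io (quay.io/jupyter/datascience-notebook), so the copy on Docker Hub may be out of date.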

Example 3: WordPress Blog

Let's set up a WordPress blog with MySQL using official images:

# Create a network
docker network create wordpress

# Start MySQL
docker run -d \
  --name wordpress-db \
  --network wordpress \
  -e MYSQL_ROOT_PASSWORD=rootpassword \
  -e MYSQL_DATABASE=wordpress \
  -e MYSQL_USER=wordpress \
  -e MYSQL_PASSWORD=wordpress \
  -v mysql_data:/var/lib/mysql \
  mysql:8.0

# Start WordPress
docker run -d \
  --name wordpress \
  --network wordpress \
  -p 8080:80 \
  -e WORDPRESS_DB_HOST=wordpress-db \
  -e WORDPRESS_DB_USER=wordpress \
  -e WORDPRESS_DB_PASSWORD=wordpress \
  -e WORDPRESS_DB_NAME=wordpress \
  -v wordpress_data:/var/www/html \
  wordpress

You can then access your WordPress site at http://localhost:8080.

Security Considerations

When using public images, security should be a top consideration:

Image Security Best Practices

Use official or verified images whenever possible
Prefer specific tags over latest to ensure you know what you're running
Keep images updated regularly to get security patches
Use minimal images when possible (alpine/slim variants) to reduce attack surface
Scan images for vulnerabilities using tools like Docker Scout, Snyk, or Trivy
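Don't bake secrets into images; supply them at runtime through environment variables or, preferably, a secrets management tool, since environment variables are visible to anyone who can inspect the container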

Vulnerability Scanning

Docker Desktop includes Docker Scout, which can scan images for known vulnerabilities:

docker scout quickview nginx:latest

This shows a summary of known vulnerabilities in the image.

docker scout cves nginx:latest

This shows more detailed information about the CVEs (Common Vulnerabilities and Exposures) in the image.

Best Practice: Integrate image scanning into your CI/CD pipeline to automatically check for vulnerabilities before deployment.
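As a rough sketch of such a pipeline step, the function below asks Docker Scout to return a non-zero exit code when it finds critical or high severity CVEs, which most CI systems treat as a failed job. The function name scan_gate and the severity threshold are assumptions for illustration; it requires the Docker Scout CLI plugin to be installed.

```shell
# scan_gate IMAGE
# Fails (non-zero exit) when Docker Scout reports critical or high CVEs in IMAGE.
scan_gate() {
  docker scout cves --exit-code --only-severity critical,high "$1"
}

# Usage in a pipeline step (the image name is a placeholder):
#   scan_gate username/my-app:1.0 || exit 1
```

Starting with a strict threshold and relaxing it case by case tends to be easier than tightening a permissive one later.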

Key Takeaways

Docker Hub is the default registry for Docker images, with millions of available images
Official images are curated by Docker and are the most trustworthy option
Tags specify image versions and variants, with conventions like version, version-slim, and version-alpine
When evaluating images, consider maintainer reputation, documentation quality, update frequency, and community engagement
Different image variants offer tradeoffs between size, features, and compatibility
Multi-architecture images allow the same image to work across different CPU architectures
Images are composed of layers, which can be shared and reused between images
Security is a critical consideration when using public images - prefer official images and scan for vulnerabilities

With this knowledge, you're well-equipped to effectively find, evaluate, and use public Docker images in your projects!

Looking Ahead

In our next session, we'll learn how to create our own custom Docker images by writing Dockerfiles. This will allow us to package our applications into containers with precisely the dependencies and configuration we need.

Discussion Questions

What criteria would you use to decide between the full, slim, and alpine variants of an image for your project?
How might you approach evaluating a community image that doesn't have an official alternative?
What are the security implications of using public images in a production environment? How would you mitigate these risks?
How could the layered architecture of Docker images help optimize the build and deployment process in a CI/CD pipeline?
In what scenarios might you prefer a specialized image over a more generic base image that you customize yourself?

Additional Resources

Docker Hub Documentation - Official guide to using Docker Hub
Docker Pull Reference - Detailed information on the pull command
Docker Image Management - Best practices for managing images
Docker Search Reference - How to search for images from the CLI
Official Images Repository - GitHub repository for official Docker images