Python Full Stack Web Developer Course

Week 3: Python Fundamentals (Part 2)

Friday Morning: Package Management Deep Dive

The Foundation of Modern Development: Python Package Management

Welcome to our deep dive into Python package management! Today we're exploring a skill that professional Python developers rely on every day: installing, pinning, and maintaining third-party packages in a reliable, repeatable way.

As we prepare to build web applications, effective package management becomes the foundation upon which our entire development process rests. Understanding these concepts thoroughly will save you countless hours of debugging, improve collaboration with your team, and ensure your applications remain maintainable and secure.

The Package Management Ecosystem

Analogy: If Python is a vast kitchen for creating culinary masterpieces, packages are pre-prepared ingredients and tools created by chefs worldwide. Package management is the inventory system that helps you source, organize, and maintain these ingredients efficiently.

At its core, Python package management involves:

Real-world Impact: Instagram runs one of the world's largest Python web applications, and a codebase of that size depends on a large number of third-party packages. Disciplined package management is part of what allows a large engineering team to collaborate on a product used by billions of people.

PyPI: The Python Package Index

The Python Package Index (PyPI) is the official repository for third-party Python packages. Think of it as the App Store or Play Store for Python software.

Key Facts:

Browsing PyPI:

PyPI offers several ways to discover packages:

Metaphor: PyPI is like a massive library where anyone can contribute books (packages), and anyone can borrow them. The quality and usefulness vary, but the collection as a whole represents one of Python's greatest strengths.

pip: The Standard Package Installer

pip is Python's standard package installer and the primary tool most developers use to install packages from PyPI.

Basic pip Usage

# Installing a package
pip install requests

# Installing a specific version
pip install requests==2.28.1

# Upgrading a package
pip install --upgrade requests

# Uninstalling a package
pip uninstall requests

# Installing multiple packages from a file
pip install -r requirements.txt

Version Specifiers

Understanding version specifiers is crucial for reliable package management:

Specifier   Meaning                     Example
==          Exact version               requests==2.28.1
>=          Greater than or equal to    requests>=2.28.1
>           Greater than                requests>2.28.1
<=          Less than or equal to       requests<=2.28.1
<           Less than                   requests<2.28.1
~=          Compatible release          requests~=2.28.1 (equivalent to >=2.28.1, <2.29.0)

With ~=, only the last segment you write is allowed to increase: ~=2.28.1 means >=2.28.1, <2.29.0, while ~=2.28 means >=2.28, <3.0. It does not always mean "below the next major version."

Analogy: Version specifiers are like recipe instructions. == means "use exactly 2 cups of flour," while >= means "use at least 2 cups of flour." The ~= specifier is like saying "use about 2 cups of flour, but definitely not 3 cups."
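
To make the ~= specifier concrete, here is a minimal sketch that expands a compatible-release specifier into the two bounds it implies. The helper name compatible_release_bounds is hypothetical (it is not part of pip), and it assumes plain numeric versions such as 2.28.1; pre-release and post-release tags are ignored.

```python
from typing import Tuple


def compatible_release_bounds(version: str) -> Tuple[str, str]:
    """Expand `~=version` into an equivalent (lower, upper) pair of bounds.

    Hypothetical helper for illustration; handles plain numeric versions only.
    """
    parts = [int(piece) for piece in version.split(".")]
    if len(parts) < 2:
        raise ValueError("~= needs at least two segments, e.g. 2.28 or 2.28.1")

    # Drop the last segment, then bump the segment that is now last.
    upper = parts[:-1]
    upper[-1] += 1

    lower_bound = ">=" + version
    upper_bound = "<" + ".".join(str(piece) for piece in upper)
    return lower_bound, upper_bound


print(compatible_release_bounds("2.28.1"))  # ('>=2.28.1', '<2.29')
print(compatible_release_bounds("2.28"))    # ('>=2.28', '<3')
```

The number of segments you write matters: ~=2.28.1 stays within the 2.28.x series, while ~=2.28 accepts any 2.x release from 2.28 onward.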

Advanced pip Commands

# See what's installed
pip list

# Show details about a package
pip show requests

# Find outdated packages
pip list --outdated

# Download without installing
pip download requests

# Install from GitHub
pip install git+https://github.com/username/repository.git

# Install in development mode (editable)
pip install -e .

requirements.txt: Dependency Documentation

The requirements.txt file is the standard way to document project dependencies in Python.

Basic Structure

# requirements.txt example
flask==2.0.1
sqlalchemy>=1.4.0,<2.0.0
requests~=2.28.1
python-dotenv==0.19.0
# Comments are supported
# You can also specify development dependencies in a separate file

Creating and Using requirements.txt

# Generate from current environment
pip freeze > requirements.txt

# Install from requirements file
pip install -r requirements.txt

# Combine with virtual environment activation
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
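
To see what kind of information pip freeze works from, the sketch below lists the distributions visible to the current interpreter in name==version form, using the standard library's importlib.metadata (Python 3.8+). The function name freeze_like is hypothetical, and this only approximates pip's output: pip applies extra rules, for example around editable installs.

```python
from importlib import metadata
from typing import List


def freeze_like() -> List[str]:
    """Return 'name==version' lines for distributions visible to this interpreter."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip entries with broken or missing metadata
            pins.add(f"{name}=={version}")
    return sorted(pins, key=str.lower)


for line in freeze_like():
    print(line)
```

Run it inside an activated virtual environment and the list should be short and specific to your project, which is the main reason to freeze from a venv rather than from a system-wide Python.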

Best Practices

Real-world Example: Flask Web Application

# requirements.txt for a Flask web application

# Web Framework
flask==2.0.1
flask-wtf==1.0.0
flask-login==0.5.0

# Database
sqlalchemy==1.4.23
flask-sqlalchemy==2.5.1
flask-migrate==3.1.0
psycopg2-binary==2.9.1  # PostgreSQL driver

# API and HTTP
requests==2.28.1
urllib3==1.26.6

# Environment and Config
python-dotenv==0.19.0
pyyaml==6.0

# Security
flask-bcrypt==0.7.1
pyjwt==2.1.0

# Forms and Validation
email-validator==1.1.3
marshmallow==3.13.0

Analogy: A requirements.txt file is like a shopping list for your application. It ensures that everyone on your team gets exactly the same ingredients, and future you doesn't forget a critical component when setting up on a new machine.
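
If you want to check whether a requirements file really gives everyone "exactly the same ingredients," one option is to flag every entry that is not pinned with ==. The following is a simplified sketch: find_unpinned and the sample contents (including base.txt) are hypothetical, and the comment handling would mis-parse URLs that contain a # fragment.

```python
from typing import List

# Made-up file contents for the demonstration.
SAMPLE = """\
# Web Framework
flask==2.0.1
sqlalchemy>=1.4.0,<2.0.0
requests~=2.28.1
-r base.txt
python-dotenv==0.19.0  # config loading
"""


def find_unpinned(requirements_text: str) -> List[str]:
    """Return requirement lines that do not pin an exact version with ==."""
    unpinned = []
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments (simplified)
        if not line or line.startswith("-"):      # skip blanks and options like -r
            continue
        if "==" not in line:
            unpinned.append(line)
    return unpinned


print(find_unpinned(SAMPLE))
# ['sqlalchemy>=1.4.0,<2.0.0', 'requests~=2.28.1']
```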

Understanding Dependency Resolution

One of the most challenging aspects of package management is dependency resolution - ensuring all packages work together harmoniously.

The Dependency Graph

Metaphor: Package dependencies form a tree or graph. Your direct requirements are the trunk, their dependencies are branches, and so on to the leaves. The dependency resolver's job is to find a configuration where all packages can coexist.

Your Application
├── Flask 2.0.1
│   ├── Werkzeug 2.0.1
│   ├── Jinja2 3.0.1
│   │   └── MarkupSafe 2.0.1
│   ├── itsdangerous 2.0.1
│   └── click 8.0.1
└── SQLAlchemy 1.4.23
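
The tree above can be modeled as a plain dictionary, which makes it easy to ask "what does this package pull in, directly or indirectly?" This is a minimal sketch using the names from the diagram; the DEPENDENCIES mapping is written by hand here, whereas real tools read it from package metadata.

```python
from typing import Dict, List, Set

# Hand-written copy of the tree shown above (versions omitted).
DEPENDENCIES: Dict[str, List[str]] = {
    "Your Application": ["Flask", "SQLAlchemy"],
    "Flask": ["Werkzeug", "Jinja2", "itsdangerous", "click"],
    "Jinja2": ["MarkupSafe"],
}


def transitive_dependencies(package: str, graph: Dict[str, List[str]]) -> Set[str]:
    """Collect every package reachable from `package` by following dependency edges."""
    seen: Set[str] = set()
    stack = list(graph.get(package, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; also protects against cycles
        seen.add(current)
        stack.extend(graph.get(current, []))
    return seen


print(sorted(transitive_dependencies("Flask", DEPENDENCIES)))
# ['Jinja2', 'MarkupSafe', 'Werkzeug', 'click', 'itsdangerous']
```

Note that MarkupSafe appears in the result even though Flask never lists it directly. Indirect dependencies like this are why a resolver has to consider the whole graph.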

Dependency Hell

"Dependency Hell" occurs when packages have conflicting requirements:

# Conflict example
PackageA requires SomeLibrary>=2.0.0
PackageB requires SomeLibrary<2.0.0

# These cannot be satisfied simultaneously!
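
The conflict can be checked mechanically: take the releases that exist and keep only those that satisfy every requirement at once. The sketch below is a toy version of that filtering step, not pip's resolver. The function names and the list of SomeLibrary releases are made up, and only simple numeric versions with the operators >=, <=, ==, > and < are handled.

```python
from typing import List, Tuple

Version = Tuple[int, ...]


def parse(version: str) -> Version:
    """Turn '2.0.0' into (2, 0, 0) so versions compare numerically."""
    return tuple(int(piece) for piece in version.split("."))


def matches(version: str, specifier: str) -> bool:
    """Check one version against one simple specifier such as '>=2.0.0'."""
    for operator in (">=", "<=", "==", ">", "<"):  # two-character operators first
        if specifier.startswith(operator):
            target = parse(specifier[len(operator):])
            candidate = parse(version)
            return {
                ">=": candidate >= target,
                "<=": candidate <= target,
                "==": candidate == target,
                ">": candidate > target,
                "<": candidate < target,
            }[operator]
    raise ValueError(f"unsupported specifier: {specifier}")


def satisfying(candidates: List[str], specifiers: List[str]) -> List[str]:
    """Return the candidate versions that satisfy all specifiers."""
    return [v for v in candidates if all(matches(v, s) for s in specifiers)]


available = ["1.8.0", "1.9.2", "2.0.0", "2.1.0"]  # made-up releases of SomeLibrary
print(satisfying(available, [">=2.0.0"]))            # ['2.0.0', '2.1.0']
print(satisfying(available, ["<2.0.0"]))             # ['1.8.0', '1.9.2']
print(satisfying(available, [">=2.0.0", "<2.0.0"]))  # []
```

Each requirement is satisfiable on its own, but the combined list is empty, which is exactly the situation described above.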

Common Resolution Strategies:

pip's Dependency Resolver

Since pip 20.3 (released in 2020), pip includes a new dependency resolver that:

# The new resolver may give messages like:
ERROR: Cannot install example-package because these package versions have conflicting dependencies.

The conflict is caused by:
    package-a 2.0.0 depends on somelib>=1.0.0
    package-b 3.0.0 depends on somelib<1.0.0

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip to attempt to solve the dependency conflict

Real-world Example: In a machine learning project, you might face conflicts between TensorFlow, which requires a specific NumPy version range, and another data science library with different NumPy requirements. The resolver helps identify and resolve these conflicts.

Beyond Basic pip: Advanced Package Management Tools

While pip is sufficient for many projects, more complex applications can benefit from advanced tools:

pip-tools

pip-tools provides a way to maintain pinned dependencies with more control:

# Install pip-tools
pip install pip-tools

# Create a requirements.in file listing only your high-level dependencies.
# Contents of requirements.in:
#   flask>=2.0.0
#   sqlalchemy

# Compile to pinned requirements.txt
pip-compile requirements.in

# Install pinned dependencies
pip install -r requirements.txt

# Update when needed
pip-compile --upgrade requirements.in

Benefits:

conda

conda is both a package manager and environment manager, popular in data science:

# Create an environment
conda create --name myenv python=3.9

# Activate environment
conda activate myenv

# Install packages
conda install numpy pandas scikit-learn

# Install from specific channels
conda install -c conda-forge matplotlib

# Create environment from file
conda env create -f environment.yml

Unique Features:

Metaphor: If pip is like a specialized kitchen supplier, conda is like a general contractor who can bring in materials and tools from anywhere, not just culinary sources.

Package Management in Web Development

Web development projects have specific package management needs and challenges:

Web Development Dependencies

Web applications typically require several categories of packages:

Development vs. Production Dependencies

Web projects often distinguish between different types of dependencies:

# requirements-dev.txt
-r requirements.txt  # Include production dependencies

# Testing
pytest==7.0.0
pytest-flask==1.2.0
coverage==6.3.1

# Development tools
black==22.1.0  # Code formatting
flake8==4.0.1  # Linting
mypy==0.931    # Type checking
flask-debugtoolbar==0.13.1

# Documentation
sphinx==4.4.0

Docker and Package Management

For containerized web applications, package management integrates with Docker:

# Example Dockerfile with multi-stage build
FROM python:3.10-slim AS builder

WORKDIR /app

# Install build dependencies
RUN apt-get update && apt-get install -y --no-install-recommends gcc

# Copy requirements
COPY requirements.txt .

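# Build wheels for the listed requirements. --no-deps skips transitive dependencies,
# so this assumes requirements.txt is fully pinned (for example, generated by pip freeze)
RUN pip wheel --no-cache-dir --no-deps --wheel-dir /app/wheels -r requirements.txt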

# Final stage
FROM python:3.10-slim

WORKDIR /app

# Copy wheels from builder stage
COPY --from=builder /app/wheels /wheels
COPY --from=builder /app/requirements.txt .

# Install dependencies from the prebuilt wheels
RUN pip install --no-cache-dir /wheels/*

# Copy application code
COPY . .

# Assumes gunicorn is listed in requirements.txt
CMD ["gunicorn", "app:app"]

Benefits of this approach:

Security in Package Management

Security is a critical aspect of package management, especially for web applications:

Vulnerability Management

Python packages can contain security vulnerabilities that need monitoring:

# Check for security vulnerabilities
pip install safety
safety check -r requirements.txt

# Alternative: pip-audit, a separate tool maintained by the PyPA (it is not built into pip)
pip install pip-audit
pip-audit -r requirements.txt

Supply Chain Attacks

Analogy: Supply chain attacks are like food contamination in the distribution system. The issue isn't with your cooking or your restaurant, but with an ingredient that was compromised before it reached you.

Notable examples include:

Best Security Practices

Example: Using pip hash verification

# Generate requirements with hashes
pip-compile --generate-hashes requirements.in

# Results in a file like:
Flask==2.0.1 \
    --hash=sha256:7b2fb8e039275d8d98092bd3eb6c72920be4aeb2aca440dea70f5a9c1a800432 \
    --hash=sha256:cb90f62f1d8e4dc4621f52106613488b5ba826b2e1e10a33eac92f723093ab6a
Werkzeug==2.0.1 \
    --hash=sha256:1de1db30d010ff1af14a009224ec49ab2329ad2cde454c8a708130642d579c42 \
    --hash=sha256:6c1ec5ce6d102ddeebd4acf72ee09c9449aeded860b0ba4c2d9b02327833f5dd

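Real-world Impact: In 2018, event-stream, a popular npm (JavaScript) package, was compromised when a new maintainer added a malicious dependency that targeted a cryptocurrency wallet application. The same kind of attack is possible on PyPI. With pinned versions and hash checking, pip refuses to install any file whose hash is not listed in your requirements file, so a newly published or tampered release cannot slip in until you deliberately update your pins.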
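
Conceptually, hash checking compares the SHA-256 digest of each downloaded file against the values pinned in the requirements file. The sketch below shows that comparison with the standard library's hashlib. The helper names and the file name are hypothetical, and pip's own implementation (enabled with --require-hashes or by having hashes in the file) differs in detail.

```python
import hashlib
import os
import tempfile


def sha256_of(path: str) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_any_pin(path: str, allowed_hashes) -> bool:
    """True if the file's digest is one of the pinned hashes."""
    return sha256_of(path) in allowed_hashes


# Demonstration with a throwaway file standing in for a downloaded package.
with tempfile.TemporaryDirectory() as workdir:
    artifact = os.path.join(workdir, "example-1.0.tar.gz")  # hypothetical file name
    with open(artifact, "wb") as handle:
        handle.write(b"original release contents")
    pinned = {sha256_of(artifact)}  # what the requirements file would record

    print(matches_any_pin(artifact, pinned))  # True

    with open(artifact, "wb") as handle:
        handle.write(b"tampered release contents")
    print(matches_any_pin(artifact, pinned))  # False
```

Any change to the file, however small, produces a different digest, so the second check fails.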

Creating and Distributing Your Own Packages

As your web development skills grow, you might create reusable components worth sharing:

Basic Package Structure

my_package/
├── setup.py           # Package metadata and dependencies
├── README.md          # Documentation
├── LICENSE            # License information
├── requirements.txt   # Development dependencies
├── my_package/        # Actual package code
│   ├── __init__.py    # Makes it a package
│   ├── module1.py     # Code modules
│   └── module2.py
└── tests/             # Test code
    ├── __init__.py
    ├── test_module1.py
    └── test_module2.py

setup.py Example

from setuptools import setup, find_packages

setup(
    name="my-web-utils",
    version="0.1.0",
    packages=find_packages(),
    install_requires=[
        "requests>=2.25.0",
        "beautifulsoup4>=4.9.0",
    ],
    author="Your Name",
    author_email="your.email@example.com",
    description="A collection of web utilities",
    long_description=open("README.md", encoding="utf-8").read(),
    long_description_content_type="text/markdown",
    url="https://github.com/yourusername/my-web-utils",
    classifiers=[
        "Programming Language :: Python :: 3",
        "License :: OSI Approved :: MIT License",
        "Operating System :: OS Independent",
    ],
    python_requires=">=3.7",
)

Building and Publishing

# Install build tools
pip install build twine

# Build the package
python -m build

# Upload to PyPI (test server first)
twine upload --repository-url https://test.pypi.org/legacy/ dist/*

# Upload to real PyPI
twine upload dist/*

Modern Packaging with pyproject.toml

Modern Python packaging declares project metadata in pyproject.toml (the [project] table is standardized by PEP 621) instead of setup.py:

[build-system]
requires = ["setuptools>=61.0", "wheel"]  # setuptools 61+ is needed to read the [project] table
build-backend = "setuptools.build_meta"

[project]
name = "my-web-utils"
version = "0.1.0"
authors = [
    {name = "Your Name", email = "your.email@example.com"},
]
description = "A collection of web utilities"
readme = "README.md"
requires-python = ">=3.7"
classifiers = [
    "Programming Language :: Python :: 3",
    "License :: OSI Approved :: MIT License",
    "Operating System :: OS Independent",
]
dependencies = [
    "requests>=2.25.0",
    "beautifulsoup4>=4.9.0",
]

[project.urls]
"Homepage" = "https://github.com/yourusername/my-web-utils"
"Bug Tracker" = "https://github.com/yourusername/my-web-utils/issues"

Real-world Example: Many popular web development tools like Django extensions, Flask plugins, or utility libraries started as internal tools that developers decided to share with the community.

Package Management in Production

Production deployment adds additional considerations to package management:

Reproducible Builds

Ensuring exact reproduction of environments between development and production:

# Creating a wheel directory
pip wheel -r requirements.txt -w ./wheels

# Installing from wheels
pip install --no-index --find-links=./wheels -r requirements.txt
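
One way to confirm that two environments really match is to compare their pip freeze output. The sketch below does this for two freeze listings passed in as strings. The function names and the sample listings are hypothetical, and only plain name==version lines are compared.

```python
from typing import Dict, Tuple


def parse_pins(freeze_output: str) -> Dict[str, str]:
    """Map lowercase package name to pinned version for each 'name==version' line."""
    pins = {}
    for line in freeze_output.splitlines():
        line = line.strip()
        if "==" in line and not line.startswith("#"):
            name, version = line.split("==", 1)
            pins[name.lower()] = version
    return pins


def find_drift(dev: str, prod: str) -> Dict[str, Tuple[str, str]]:
    """Return packages whose versions differ, as name -> (dev_version, prod_version)."""
    dev_pins, prod_pins = parse_pins(dev), parse_pins(prod)
    drift = {}
    for name in sorted(set(dev_pins) | set(prod_pins)):
        dev_version = dev_pins.get(name, "missing")
        prod_version = prod_pins.get(name, "missing")
        if dev_version != prod_version:
            drift[name] = (dev_version, prod_version)
    return drift


# Made-up freeze output from two environments.
dev_freeze = "flask==2.0.1\nrequests==2.28.1\npytest==7.0.0\n"
prod_freeze = "flask==2.0.1\nrequests==2.27.0\n"
print(find_drift(dev_freeze, prod_freeze))
# {'pytest': ('7.0.0', 'missing'), 'requests': ('2.28.1', '2.27.0')}
```

An empty result means the two listings pin the same versions. A difference in a development-only tool such as pytest may be expected, but a difference in a runtime dependency is worth investigating.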

Air-Gapped Environments

Some production environments have no internet access:

# Download all dependencies on a connected machine
pip download -r requirements.txt -d ./packages

# Transfer ./packages directory to air-gapped environment

# Install in the air-gapped environment
pip install --no-index --find-links=./packages -r requirements.txt

Private Package Repositories

For proprietary code or additional security, use private repositories:

# Configure pip to use private repository
pip config set global.index-url https://pypi.internal-company.com/simple

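# Include credentials if needed. Note: this stores the password in plain text
# in pip's config file, so prefer a keyring or a limited-scope token where possible
pip config set global.index-url https://user:pass@pypi.internal-company.com/simple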

Metaphor: A private PyPI server is like having your own specialty grocery store where you control exactly what ingredients are available and can add your own proprietary spice blends.

Practical Exercise: Building a Web Utility Package

Let's put our knowledge into practice by creating a simple web utility package that could be reused across projects:

Project: Create a Web Scraping Utility Package

  1. Set up the project structure:
    mkdir -p web_utils/web_utils
    touch web_utils/web_utils/__init__.py
    touch web_utils/web_utils/scraper.py
    touch web_utils/web_utils/parser.py
    touch web_utils/setup.py
    touch web_utils/README.md
    touch web_utils/LICENSE
  2. Create a virtual environment:
    cd web_utils
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install development dependencies:
    pip install requests beautifulsoup4 pytest black
  4. Create the scraper module (web_utils/scraper.py):
    import requests
    from typing import Dict, Optional
    
    class WebScraper:
        """A simple web scraper utility."""
        
        def __init__(self, user_agent: Optional[str] = None):
            """Initialize the scraper with optional user agent."""
            self.session = requests.Session()
            if user_agent:
                self.session.headers.update({"User-Agent": user_agent})
            else:
                # Default user agent
                self.session.headers.update({
                    "User-Agent": "WebUtils/1.0 (https://github.com/yourusername/web-utils)"
                })
        
        def get_page(self, url: str) -> str:
            """Get the content of a webpage as text."""
            response = self.session.get(url)
            response.raise_for_status()  # Raise exception for 4XX/5XX responses
            return response.text
        
        def get_json(self, url: str, params: Optional[Dict] = None) -> Dict:
            """Get JSON data from an API endpoint."""
            response = self.session.get(url, params=params)
            response.raise_for_status()
            return response.json()
        
        def download_file(self, url: str, local_path: str) -> None:
            """Download a file from a URL to a local path."""
            with self.session.get(url, stream=True) as response:
                response.raise_for_status()
                with open(local_path, 'wb') as f:
                    for chunk in response.iter_content(chunk_size=8192):
                        f.write(chunk)
  5. Create the parser module (web_utils/parser.py):
    from bs4 import BeautifulSoup
    from typing import List, Dict, Optional
    from urllib.parse import urljoin
    
    class HtmlParser:
        """A utility for parsing HTML content."""
        
        @staticmethod
        def extract_links(html_content: str, base_url: Optional[str] = None) -> List[Dict]:
            """Extract all links from an HTML document.
            
            Args:
                html_content: HTML content as string
                base_url: Optional base URL to resolve relative URLs
                
            Returns:
                List of dictionaries with 'text' and 'url' keys
            """
            soup = BeautifulSoup(html_content, 'html.parser')
            links = []
            
            for a_tag in soup.find_all('a', href=True):
                url = a_tag['href']
                # Resolve relative URLs against base_url when one is provided;
                # urljoin handles '/path', 'page.html', '../up' and absolute URLs
                if base_url:
                    url = urljoin(base_url, url)
                    
                links.append({
                    'text': a_tag.get_text(strip=True),
                    'url': url
                })
                
            return links
        
        @staticmethod
        def extract_text(html_content: str, selector: str) -> List[str]:
            """Extract text content matching a CSS selector.
            
            Args:
                html_content: HTML content as string
                selector: CSS selector to match elements
                
            Returns:
                List of text strings from matching elements
            """
            soup = BeautifulSoup(html_content, 'html.parser')
            elements = soup.select(selector)
            return [element.get_text(strip=True) for element in elements]
        
        @staticmethod
        def extract_table(html_content: str, table_selector: str = "table") -> List[Dict]:
            """Extract data from an HTML table into a list of dictionaries.
            
            Args:
                html_content: HTML content as string
                table_selector: CSS selector to find the table
                
            Returns:
                List of dictionaries with column names as keys
            """
            soup = BeautifulSoup(html_content, 'html.parser')
            table = soup.select_one(table_selector)
            
            if not table:
                return []
                
            rows = table.find_all('tr')
            if not rows:
                return []
                
            # Extract headers
            headers = [th.get_text(strip=True) for th in rows[0].find_all(['th', 'td'])]
            
            # Extract data rows
            data = []
            for row in rows[1:]:
                cells = row.find_all(['td', 'th'])
                if len(cells) == len(headers):
                    row_data = {}
                    for i, cell in enumerate(cells):
                        row_data[headers[i]] = cell.get_text(strip=True)
                    data.append(row_data)
                    
            return data
  6. Update the __init__.py file to expose the classes:
    from .scraper import WebScraper
    from .parser import HtmlParser
    
    __version__ = "0.1.0"
  7. Create setup.py:
    from setuptools import setup, find_packages
    
    with open("README.md", "r", encoding="utf-8") as fh:
        long_description = fh.read()
    
    setup(
        name="web-utils",
        version="0.1.0",
        author="Your Name",
        author_email="your.email@example.com",
        description="Utility functions for web scraping and parsing",
        long_description=long_description,
        long_description_content_type="text/markdown",
        url="https://github.com/yourusername/web-utils",
        packages=find_packages(),
        classifiers=[
            "Programming Language :: Python :: 3",
            "License :: OSI Approved :: MIT License",
            "Operating System :: OS Independent",
        ],
        python_requires=">=3.7",
        install_requires=[
            "requests>=2.25.0",
            "beautifulsoup4>=4.9.0",
        ],
    )
  8. Create a basic README.md:
    # Web Utils
    
    A collection of utilities for web scraping and HTML parsing.
    
    ## Installation
    
    ```
    pip install web-utils
    ```
    
    ## Usage
    
    ```python
    from web_utils import WebScraper, HtmlParser
    
    # Create a scraper
    scraper = WebScraper()
    
    # Get a web page
    html = scraper.get_page("https://example.com")
    
    # Extract all links
    parser = HtmlParser()
    links = parser.extract_links(html)
    for link in links:
        print(f"{link['text']}: {link['url']}")
    
    # Extract data from a table
    table_data = parser.extract_table(html)
    print(table_data)
    ```
    
    ## License
    
    MIT
    
  9. Install the package in development mode:
    pip install -e .
  10. Build the package:
    pip install build
    python -m build
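
When working with links like those collected in step 5, relative hrefs come in several forms, and the standard library's urllib.parse.urljoin resolves all of them against a base URL. A short sketch follows; the resolve_links helper and the example URLs are hypothetical.

```python
from typing import List
from urllib.parse import urljoin


def resolve_links(base_url: str, hrefs: List[str]) -> List[str]:
    """Resolve each href against base_url, leaving absolute URLs unchanged."""
    return [urljoin(base_url, href) for href in hrefs]


base = "https://example.com/docs/guide/"
for resolved in resolve_links(base, ["/about", "intro.html", "../api/", "https://other.example/x"]):
    print(resolved)
# https://example.com/about
# https://example.com/docs/guide/intro.html
# https://example.com/docs/api/
# https://other.example/x
```

A test along these lines is a good first candidate for the tests/ directory of the package, since it needs no network access.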

This exercise demonstrates:

Real-world Application: This pattern is how libraries like requests-html, newspaper3k, and other specialized web utilities are structured. These tools can save you significant time in future web projects.

Conclusion and Best Practices

As we've explored, package management is a fundamental skill for Python web developers. To wrap up, here are key best practices to follow:

Package Management Checklist

Moving Forward

As we continue our journey into web development with Python, effective package management will be a recurring theme. The concepts we've covered today will apply to Flask, Django, and all other web frameworks we'll explore.

In the next sessions, we'll build on this foundation as we dive into specific web development topics, where we'll encounter and utilize many of the packages we've discussed today.

Additional Resources