Python Full Stack Web Developer Course

Week 2: Python Fundamentals (Part 1)

Friday Morning: Package Management with pip

Introduction to Package Management

Welcome to our session on package management with pip! In Python development, the ability to efficiently manage external libraries and dependencies is just as important as writing good code. Today, we'll dive deep into pip, Python's official package manager, and explore how it can streamline your development process.

Think of pip like a skilled librarian for your code. Just as a librarian helps you find and borrow books from a vast collection, pip helps you discover, install, and manage Python packages from the Python Package Index (PyPI) and other sources. Without package management, you'd need to manually download and install every library you want to use, track their versions, and handle their dependencies yourself—a tedious and error-prone process.

What is pip and PyPI?

pip (which stands for "pip installs packages") is Python's official package installer. It connects to package repositories, downloads packages, and handles the installation process including managing dependencies.

PyPI (the Python Package Index) is the main repository of Python software, hosting hundreds of thousands of projects. Think of PyPI as an enormous shared library where developers publish their code for others to use.

Real-World Analogy: If Python is like a carpentry workshop, then pip is your supplier who delivers tools and materials. PyPI is the massive warehouse where these tools are stored. Just as a carpenter doesn't manufacture their own screws or saws, a Python developer doesn't need to write everything from scratch—they can leverage existing tools from PyPI.

Key Benefits of Using pip:

Verifying and Installing pip

Most Python installations come with pip pre-installed. Let's first check if pip is already installed and what version you have:

pip --version

If you need to install or upgrade pip, here's how:

On Windows:

python -m ensurepip --upgrade

On macOS/Linux:

python3 -m ensurepip --upgrade

Alternative method using get-pip.py:

# Download the installation script
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py

# Run the script to install pip
python get-pip.py

Note: On some Linux distributions, you might need to install pip through the distribution's own package manager instead:

# Debian/Ubuntu
sudo apt update
sudo apt install python3-pip

# CentOS/RHEL
sudo yum install python3-pip

Basic pip Commands

Installing Packages

The most common pip command is install, which downloads and installs packages from PyPI:

pip install package_name

Install a specific version:

pip install package_name==1.2.3

Install the latest version in a specific range:

pip install "package_name>=1.2.0,<2.0.0"
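
To make the range syntax concrete, here is a minimal sketch of how a specifier such as ">=1.2.0,<2.0.0" can be checked against a version. It is a simplification for illustration only: the function names are made up, it handles only plain dotted numbers, and pip itself follows the much richer PEP 440 rules.

```python
import operator

# Operators this sketch understands. Two-character symbols come first
# so that ">=" is matched before ">".
_OPERATORS = [
    (">=", operator.ge),
    ("<=", operator.le),
    ("==", operator.eq),
    ("!=", operator.ne),
    (">", operator.gt),
    ("<", operator.lt),
]


def parse_version(text, width=4):
    """Turn '1.2.3' into (1, 2, 3, 0), padding with zeros so '1.2' equals '1.2.0'."""
    parts = [int(part) for part in text.strip().split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def satisfies(version, specifier):
    """Check a version against a comma-separated specifier; every clause must hold."""
    candidate = parse_version(version)
    for clause in specifier.split(","):
        clause = clause.strip()
        for symbol, compare in _OPERATORS:
            if clause.startswith(symbol):
                if not compare(candidate, parse_version(clause[len(symbol):])):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


print(satisfies("1.5.2", ">=1.2.0,<2.0.0"))  # True
print(satisfies("2.0.0", ">=1.2.0,<2.0.0"))  # False: the upper bound is exclusive
```

The second call shows why "<2.0.0" is a common upper bound: it admits every 1.x release while excluding the next major version.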

Install multiple packages at once:

pip install package1 package2 package3

Listing Installed Packages

See what packages are currently installed:

pip list

To choose the output format explicitly (columns is the default; freeze and json are also available):

pip list --format=columns

Showing Package Information

Get detailed information about a package:

pip show package_name

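This command displays information such as the package's name, version, summary, home page, author, license, install location, the packages it requires, and the packages that require it.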
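
The same kind of metadata can also be read from within Python. The following is a minimal sketch, assuming Python 3.8 or newer (where importlib.metadata is part of the standard library); the helper name describe_package is made up for illustration.

```python
from importlib import metadata


def describe_package(name):
    """Return a few 'pip show'-style fields, or None if the package is not installed."""
    try:
        meta = metadata.metadata(name)
    except metadata.PackageNotFoundError:
        return None
    return {
        "name": meta.get("Name"),
        "version": meta.get("Version"),
        "summary": meta.get("Summary"),
        # Direct dependencies declared by the package (empty list if none).
        "requires": metadata.requires(name) or [],
    }


print(describe_package("pip"))
print(describe_package("surely-not-installed-example-0xyz"))  # None
```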

Searching for Packages

Search for packages on PyPI:

pip search search_term

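Note: The pip search command no longer works against PyPI. The search API it relied on was disabled because of the load it placed on the service, so the command now returns an error. Search for packages on the PyPI website (https://pypi.org) instead.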
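
When you already know a project's name, one alternative is PyPI's JSON endpoint, which returns a project's metadata. This is a hedged sketch rather than a replacement for search: the function names are made up, and calling latest_version requires network access.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def pypi_json_url(project):
    """Build the URL of PyPI's JSON metadata document for a project."""
    return f"https://pypi.org/pypi/{quote(project)}/json"


def latest_version(project, timeout=10):
    """Fetch the latest released version of a project from PyPI (needs network)."""
    with urlopen(pypi_json_url(project), timeout=timeout) as response:
        return json.load(response)["info"]["version"]


print(pypi_json_url("requests"))
# Uncomment to query PyPI:
# print(latest_version("requests"))
```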

Uninstalling Packages

Remove a package:

pip uninstall package_name

Upgrading Packages

Update a package to the latest version:

pip install --upgrade package_name

Upgrade pip itself:

pip install --upgrade pip

Managing Dependencies with Requirements Files

For projects with multiple dependencies, manually installing each package becomes cumbersome. Requirements files solve this by listing all dependencies in a single file.

Creating a requirements.txt File

A requirements.txt file is simply a text file listing packages, one per line. For example:

# requirements.txt
requests==2.28.1
flask==2.2.2
sqlalchemy>=1.4.0,<2.0.0
pillow

You can also generate a requirements file from your current environment:

pip freeze > requirements.txt

Note: pip freeze outputs all installed packages, including dependencies. This is useful for replicating environments exactly, but might include more packages than your project directly needs.
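
To see where those lines come from, here is a rough approximation of pip freeze built on the standard library. It is only a sketch: the real command also handles editable installs and packages installed from URLs, and freeze_lines is a made-up name.

```python
from importlib import metadata


def freeze_lines():
    """Return 'name==version' lines for every installed distribution, sorted by name."""
    lines = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip broken installs that lack metadata
            lines.add(f"{name}=={version}")
    return sorted(lines, key=str.lower)


for line in freeze_lines():
    print(line)
```

Run inside a fresh virtual environment, the output is short; run against a long-lived global interpreter, it usually lists far more than any one project needs.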

Installing from a Requirements File

Install all packages listed in a requirements file:

pip install -r requirements.txt

Real-World Analogy: If pip is your supplier, then a requirements file is like a shopping list. Instead of ordering items one by one over the phone, you simply send your complete list, and the supplier delivers everything you need in one go.

Best Practices for requirements.txt

Example of a Well-Structured Requirements File

# Core dependencies
flask==2.2.2         # Web framework
sqlalchemy==1.4.46   # Database ORM
pyjwt==2.6.0         # JWT handling for authentication

# API integrations
requests==2.28.1     # HTTP client
stripe==5.0.0        # Payment processing

# Development only
pytest==7.2.0        # Testing framework
black==22.12.0       # Code formatting
flake8==6.0.0        # Linting
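
The comments in a file like this are ignored by pip. As an illustration of the comment rules (a line starting with #, or whitespace followed by #, begins a comment), here is a simplified parser. It is a sketch with a made-up function name and does not handle options such as -r or --hash, or line continuations.

```python
def parse_requirements(text):
    """Return requirement specifiers from requirements-file text, ignoring comments."""
    requirements = []
    for raw_line in text.splitlines():
        # Whitespace followed by '#' starts an inline comment.
        line = raw_line.split(" #", 1)[0].strip()
        if not line or line.startswith("#"):
            continue  # blank line or full-line comment
        requirements.append(line)
    return requirements


sample = """\
# Core dependencies
flask==2.2.2         # Web framework
sqlalchemy==1.4.46   # Database ORM

# API integrations
requests==2.28.1     # HTTP client
"""

print(parse_requirements(sample))
# ['flask==2.2.2', 'sqlalchemy==1.4.46', 'requests==2.28.1']
```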

Using pip with Virtual Environments

Virtual environments provide isolated Python environments for your projects, ensuring dependency conflicts don't occur between different projects. Let's see how pip works with virtual environments.

Creating a Virtual Environment

Using the built-in venv module (Python 3.3+):

# Create a virtual environment
python -m venv myenv

# Activate it on Windows
myenv\Scripts\activate

# Activate it on macOS/Linux
source myenv/bin/activate

With the virtual environment activated, any packages you install with pip will be installed only in this environment, not globally.
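
If you are ever unsure whether an environment is active, Python itself can tell you. A small sketch (the function name is made up) that relies on the way venv sets sys.prefix:

```python
import sys


def in_virtualenv():
    """True when the running interpreter belongs to a venv-created environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the Python it was created from.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
print("packages install under:", sys.prefix)
```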

Installing Packages in a Virtual Environment

Once your virtual environment is activated, you use pip normally:

# Your prompt should show the active environment
(myenv) $ pip install requests flask

# Verify the installations
(myenv) $ pip list

Creating a requirements.txt from a Virtual Environment

Capture the state of your virtual environment for recreation later:

(myenv) $ pip freeze > requirements.txt

Recreating an Environment

To recreate the environment on another machine or after deletion:

# Create a fresh virtual environment
python -m venv new_env

# Activate it
source new_env/bin/activate  # or new_env\Scripts\activate on Windows

# Install the same packages
(new_env) $ pip install -r requirements.txt

Real-World Analogy: A virtual environment is like a separate workshop for each project. Each workshop has its own tools (packages) that don't interfere with tools in other workshops. This means you can have one project using Flask 1.0 and another using Flask 2.0 without any conflicts.

Advanced pip Features

Installing from Various Sources

pip can install packages from more than just PyPI:

From a Git repository:

pip install git+https://github.com/user/repository.git

From a specific branch or commit:

pip install git+https://github.com/user/repository.git@branch_name
pip install git+https://github.com/user/repository.git@commit_hash

From a local directory in editable (development) mode, so that changes to the source take effect without reinstalling:

pip install -e /path/to/directory

From a .tar.gz or .whl file:

pip install /path/to/package.tar.gz
pip install /path/to/package.whl

Using Alternative Package Indexes

Use a different package index instead of PyPI:

pip install --index-url https://alternative-pypi.org/simple/ package_name

Add an extra index while keeping PyPI:

pip install --extra-index-url https://alternative-pypi.org/simple/ package_name

Downloading Without Installing

Download a package without installing it:

pip download package_name -d /path/to/download/directory

Installing in User Mode

Install a package in the user's home directory without requiring admin privileges:

pip install --user package_name

Viewing the Dependency Tree

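pip show lists a package's direct dependencies in its Requires field, and the packages that depend on it in Required-by:

pip show package_name

Adding --files also lists the files the package installed. For a full dependency tree, you can use the pipdeptree package: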

pip install pipdeptree
pipdeptree

Understanding Dependency Resolution

One of pip's most important features is dependency resolution—automatically figuring out what other packages are needed when you install something.

How Dependency Resolution Works

  1. When you pip install package_a, pip first checks what dependencies package_a requires
  2. It then checks if those dependencies are already installed
  3. If not, it adds them to the installation queue
  4. This process continues recursively for all dependencies
  5. pip tries to find a set of package versions that satisfy all requirements
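
The steps above can be sketched as a small backtracking search. This is a toy model, not pip's actual resolver: the index contents, package names, and function names are all hypothetical, and versions are plain tuples with a (minimum, exclusive maximum) range per dependency.

```python
# Hypothetical index: package -> version -> {dependency: (minimum, exclusive maximum)}
INDEX = {
    "app": {(1, 0): {"web": ((2, 0), (3, 0)), "db": ((1, 0), (2, 0))}},
    "web": {(2, 2): {"util": ((1, 0), (1, 5))}, (2, 1): {"util": ((1, 0), (2, 0))}},
    "db": {(1, 4): {"util": ((1, 5), (2, 0))}},
    "util": {(1, 2): {}, (1, 6): {}},
}

# The conflict discussed in this section: no package_c version fits both ranges.
CONFLICT = {
    "package_a": {(1, 0): {"package_c": ((2, 0), (999,))}},
    "package_b": {(1, 0): {"package_c": ((0,), (2, 0))}},
    "package_c": {(1, 5): {}, (2, 1): {}},
}

ANY = ((0,), (999,))  # accepts every version in these toy indexes


def allowed(version, bounds):
    low, high = bounds
    return low <= version < high


def resolve(pending, chosen, index):
    """Satisfy every (package, bounds) pair in pending; return a dict or None."""
    if not pending:
        return chosen
    (name, bounds), rest = pending[0], pending[1:]
    if name in chosen:
        # Already picked: the earlier choice must also fit this requirement.
        return resolve(rest, chosen, index) if allowed(chosen[name], bounds) else None
    # Try the newest candidate first, then fall back to older ones.
    for version in sorted(index.get(name, {}), reverse=True):
        if not allowed(version, bounds):
            continue
        dependencies = list(index[name][version].items())
        result = resolve(rest + dependencies, {**chosen, name: version}, index)
        if result is not None:
            return result
    return None  # no version of this package works, so the caller backtracks


solution = resolve([("app", ANY)], {}, INDEX)
conflict = resolve([("package_a", ANY), ("package_b", ANY)], {}, CONFLICT)
print(solution)  # web 2.2 is tried first, fails on util, and web 2.1 is used instead
print(conflict)  # None: the two requirements on package_c cannot both be met
```

In the first call the search initially picks the newest web release, discovers that its util range clashes with what db needs, and backs up to try the older web release. That retry is the essence of step 5.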

Dependency Conflicts

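Sometimes packages have incompatible requirements. For example, package_a may require package_c>=2.0.0 while package_b requires package_c<2.0.0.

In this case, pip cannot satisfy both requirements and will show an error. You might need to loosen the version ranges you specified, or choose different versions of package_a or package_b whose requirements overlap.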

Example of a Dependency Conflict

ERROR: Cannot install package_a and package_b because these package versions have conflicting dependencies.

The conflict is caused by:
    package_a requires package_c>=2.0.0
    package_b requires package_c<2.0.0

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip to attempt to solve the dependency conflict

Real-World Analogy: Dependency resolution is like planning a dinner party where some guests have dietary restrictions. If one guest is allergic to nuts and another only eats dishes with nuts, you have an irreconcilable conflict. Similarly, pip tries to "prepare a meal" (your environment) that satisfies all package "dietary requirements" (dependencies).

Configuring pip

You can customize pip's behavior through configuration files or environment variables.

Configuration Files

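pip reads configuration files from several locations. When the same option appears in more than one of the files below, later entries in this list override earlier ones:

  1. Global (system-wide): /etc/pip.conf (Unix) or C:\ProgramData\pip\pip.ini (Windows)
  2. User-specific: ~/.config/pip/pip.conf (Unix) or %APPDATA%\pip\pip.ini (Windows)
  3. Virtualenv-specific: myenv/pip.conf or myenv\pip.ini

In addition, the PIP_CONFIG_FILE environment variable can point pip at an extra configuration file. pip has no --config command-line option.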

Example Configuration File

[global]
timeout = 60
index-url = https://pypi.org/simple
trusted-host = pypi.org
              files.pythonhosted.org

[install]
require-virtualenv = true
no-cache-dir = false

[freeze]
timeout = 10

Environment Variables

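You can also set configuration via environment variables named PIP_<UPPER_CONFIG_NAME>: the option name is upper-cased and dashes become underscores, so --require-virtualenv becomes PIP_REQUIRE_VIRTUALENV: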

# Set default timeout
export PIP_TIMEOUT=60

# Require virtualenv for installations
export PIP_REQUIRE_VIRTUALENV=true

Note: Environment variables take precedence over configuration files, and command-line options take precedence over both.
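
The naming rule and the precedence order can be modelled in a few lines. This is an illustrative model with made-up function names, not pip's own code; it assumes the order command line, then environment variable, then configuration file.

```python
import os


def env_var_name(option):
    """Map a pip option name such as 'require-virtualenv' to its environment variable."""
    return "PIP_" + option.upper().replace("-", "_")


def effective_value(option, cli_options, config_file, environ=os.environ):
    """Pick a value: command line first, then environment, then config file."""
    if option in cli_options:
        return cli_options[option]
    variable = env_var_name(option)
    if variable in environ:
        return environ[variable]
    return config_file.get(option)


config = {"timeout": "15"}
fake_env = {"PIP_TIMEOUT": "60"}

print(env_var_name("require-virtualenv"))                              # PIP_REQUIRE_VIRTUALENV
print(effective_value("timeout", {}, config, environ={}))               # 15
print(effective_value("timeout", {}, config, environ=fake_env))         # 60
print(effective_value("timeout", {"timeout": "5"}, config, fake_env))   # 5
```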

Useful Configuration Options

Security Best Practices with pip

Using third-party packages introduces security considerations. Here are best practices to minimize risks:

1. Keep pip Updated

Regularly update pip itself to get security fixes:

pip install --upgrade pip

2. Verify Package Sources

Use trusted package sources and verify the integrity of packages:

# Refuse to install anything whose download does not match a hash pinned in the file
pip install --require-hashes -r requirements.txt

3. Use Hash-Checking Mode

In a requirements file, you can specify exact file hashes to ensure you get exactly what you expect:

requests==2.28.1 --hash=sha256:7c5599b102feddaa661c826c56ab4fee28bfd17f5abca1ebbe3e7f19d7c97983
flask==2.2.2 --hash=sha256:642c450d19c4ad482f96729bd2a8f6d32554aa1e231f4f6b4e7e5264b16cca2b
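
Conceptually, hash checking means computing a digest of the downloaded file and comparing it with the pinned value. The sketch below shows that comparison with the standard library; the function names are made up, and pip performs this check itself when hashes are present.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 hex digest of a file without loading it all into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pinned_hash(path, pinned):
    """Compare a file against a pin written as 'sha256:<hex digest>'."""
    algorithm, _, expected = pinned.partition(":")
    if algorithm != "sha256":
        raise ValueError("this sketch only handles sha256 pins")
    return sha256_of(path) == expected.lower()


# Demonstration with a throwaway file standing in for a downloaded archive.
contents = b"example archive contents"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(contents)

good_pin = "sha256:" + hashlib.sha256(contents).hexdigest()
bad_pin = "sha256:" + "0" * 64
print(matches_pinned_hash(tmp.name, good_pin))  # True
print(matches_pinned_hash(tmp.name, bad_pin))   # False
os.remove(tmp.name)
```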

4. Scan for Vulnerabilities

Use tools like safety or pip-audit to scan your dependencies for known vulnerabilities:

pip install safety
safety check

# Or with pip-audit
pip install pip-audit
pip-audit

5. Minimize Dependencies

Each dependency increases your attack surface. Regularly review and remove unnecessary packages.

6. Use Dependency Lockfiles

Tools like pip-tools can generate comprehensive lock files with exact versions, and with hashes when pip-compile is run with --generate-hashes:

pip install pip-tools
pip-compile requirements.in  # Generates requirements.txt with pinned versions
pip-sync                    # Installs exactly what's in requirements.txt

7. Be Cautious with Pre-releases

Avoid pre-release versions in production unless necessary:

# Don't use this in production
pip install --pre package_name

Real-World Analogy: Security with dependencies is like food safety in a restaurant. You need to trust your suppliers (package authors), inspect deliveries (verify hashes), check for recalls (vulnerability scanning), and maintain proper storage (isolation with virtual environments).

Modern Alternatives and Complementary Tools

While pip is the standard package manager for Python, several modern tools enhance or complement its functionality:

1. pipenv

Combines pip, virtual environments, and a lock file mechanism:

pip install pipenv

# Create project with virtual environment
pipenv install

# Add packages
pipenv install requests flask

# Add development packages
pipenv install --dev pytest

# Run commands in the virtual environment
pipenv run python app.py

# Activate the environment shell
pipenv shell

Key Benefits:

2. Poetry

Modern packaging and dependency management:

pip install poetry

# Create a new project
poetry new my_project

# Add dependencies
poetry add requests flask

# Add development dependencies
poetry add --group dev pytest  # Poetry 1.2+; older releases used --dev

# Install all dependencies
poetry install

# Run commands
poetry run python app.py

# Activate the environment shell (Poetry 2.0+ needs the poetry-plugin-shell plugin)
poetry shell

Key Benefits:

3. pip-tools

Lightweight approach to dependency pinning:

pip install pip-tools

# Create a requirements.in file with high-level dependencies
printf "flask\nrequests\n" > requirements.in

# Compile it to a pinned requirements.txt
pip-compile requirements.in

# Install the pinned dependencies
pip-sync requirements.txt

Key Benefits:

4. conda

Package, dependency, and environment management system, particularly popular in data science:

# Create a new environment
conda create -n myenv python=3.10

# Activate it
conda activate myenv

# Install packages
conda install numpy pandas matplotlib

# Create an environment from a file
conda env create -f environment.yml

Key Benefits:

Comparison of Tools

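| Feature                   | pip             | pipenv | Poetry | pip-tools | conda                 |
|---------------------------|-----------------|--------|--------|-----------|-----------------------|
| Virtual Environments      | No (needs venv) | Yes    | Yes    | No        | Yes                   |
| Lock Files                | No              | Yes    | Yes    | Yes       | Yes (environment.yml) |
| Dev vs. Prod Dependencies | No              | Yes    | Yes    | Limited   | No                    |
| Packaging                 | No              | No     | Yes    | No        | No                    |
| Non-Python Dependencies   | No              | No     | No     | No        | Yes                   |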

When to Choose What:

Real-World Examples

Example 1: Setting Up a Flask Web Application

# Create and activate a virtual environment
python -m venv flask_app_env
source flask_app_env/bin/activate  # or flask_app_env\Scripts\activate on Windows

# Install Flask and related packages
pip install flask flask-sqlalchemy flask-login flask-wtf

# Freeze the dependencies
pip freeze > requirements.txt

# Create a simple app.py
echo "from flask import Flask
app = Flask(__name__)

@app.route('/')
def hello():
    return 'Hello, World!'

if __name__ == '__main__':
    app.run(debug=True)" > app.py

# Run the application
python app.py

Example 2: Setting Up a Data Science Environment

# Create and activate a virtual environment
python -m venv data_science_env
source data_science_env/bin/activate  # or data_science_env\Scripts\activate on Windows

# Install data science packages
pip install numpy pandas matplotlib scikit-learn jupyter

# Create a requirements file with version constraints
echo "numpy>=1.20.0,<2.0.0
pandas>=1.3.0,<2.0.0
matplotlib>=3.4.0,<4.0.0
scikit-learn>=1.0.0,<2.0.0
jupyter>=1.0.0,<2.0.0" > requirements.txt

# In the future, you can recreate this environment with:
# pip install -r requirements.txt

# Launch Jupyter Notebook
jupyter notebook

Example 3: Managing a Production Web Service with pipenv

# Install pipenv
pip install pipenv

# Initialize a new project
mkdir web_service
cd web_service

# Set up pipenv environment
pipenv install flask gunicorn psycopg2-binary requests

# Add development dependencies
pipenv install --dev pytest pytest-cov black flake8

# Create a simple app.py
echo "from flask import Flask, jsonify
app = Flask(__name__)

@app.route('/api/status')
def status():
    return jsonify({'status': 'ok'})

if __name__ == '__main__':
    app.run()" > app.py

# Create a Procfile for deployment
echo "web: gunicorn app:app" > Procfile

# Run the application with pipenv
pipenv run python app.py

# Run tests
pipenv run pytest

# Format code
pipenv run black .

Example 4: Creating and Publishing a Package with Poetry

# Install poetry
pip install poetry

# Create a new library project
poetry new my_library

# Navigate to the project
cd my_library

# Add dependencies
poetry add requests

# Add development dependencies
poetry add --group dev pytest black  # Poetry 1.2+; older releases used --dev

# Update the pyproject.toml with metadata
# Edit in your text editor...

# Build the package
poetry build

# Publish to PyPI (you'll need credentials)
poetry publish

Troubleshooting Common pip Issues

1. Permission Errors

Problem: "Permission denied" when installing packages

Solutions:

2. Package Not Found

Problem: "No matching distribution found for package_name"

Solutions:

3. Version Conflicts

Problem: "Cannot install X and Y because these package versions have conflicting dependencies"

Solutions:

4. Installation Fails with Build Errors

Problem: Packages with C extensions fail to build

Solutions:

5. SSL Certificate Errors

Problem: "SSL: CERTIFICATE_VERIFY_FAILED" when downloading packages

Solutions:

6. Cached Wheels Not Updated

Problem: Changes to a package don't appear after reinstalling

Solutions:

Debugging Tips

Exercise: Setting Up a Project with Dependencies

Let's apply what we've learned by setting up a web scraping project with proper dependency management.

Project Requirements:

Step 1: Set Up the Project Structure

# Create project directory
mkdir web_scraper
cd web_scraper

# Create a virtual environment
python -m venv scraper_env

# Activate the environment
source scraper_env/bin/activate  # or scraper_env\Scripts\activate on Windows

Step 2: Install Dependencies

# Install packages for web scraping
pip install requests beautifulsoup4 lxml

# Additional utility packages
pip install pandas tqdm
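
A quick way to confirm the installation worked is to check that each package can be imported. The sketch below uses made-up names and assumes the package list from this exercise. Note that the install name and the import name can differ: beautifulsoup4 is imported as bs4.

```python
from importlib.util import find_spec

# Install name on PyPI -> module name used in import statements.
REQUIRED = {
    "requests": "requests",
    "beautifulsoup4": "bs4",
    "lxml": "lxml",
    "pandas": "pandas",
    "tqdm": "tqdm",
}


def missing_packages(required):
    """Return the install names whose import module cannot be found."""
    return [package for package, module in required.items() if find_spec(module) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages; try: pip install " + " ".join(missing))
else:
    print("All scraper dependencies are importable.")
```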

Step 3: Create a requirements.txt File

# Generate requirements.txt with exact versions
pip freeze > requirements.txt

# Alternatively, create a more flexible requirements.txt
echo "requests>=2.28.0,<3.0.0
beautifulsoup4>=4.10.0,<5.0.0
lxml>=4.9.0,<5.0.0
pandas>=1.4.0,<2.0.0
tqdm>=4.64.0,<5.0.0" > requirements.txt

Step 4: Create a Simple Web Scraper

Create a file named scraper.py with the following content:

import requests
from bs4 import BeautifulSoup
import pandas as pd
from tqdm import tqdm
import time
import argparse

def scrape_quotes(pages=1):
    """Scrape quotes from quotes.toscrape.com"""
    base_url = "https://quotes.toscrape.com/page/{}/"
    quotes = []
    
    for page in tqdm(range(1, pages + 1), desc="Scraping pages"):
        response = requests.get(base_url.format(page), timeout=10)
        if response.status_code != 200:
            print(f"Failed to fetch page {page}")
            continue
            
        soup = BeautifulSoup(response.text, 'lxml')
        quotes_on_page = soup.select(".quote")
        
        for quote in quotes_on_page:
            text = quote.select_one(".text").get_text()
            author = quote.select_one(".author").get_text()
            tags = [tag.get_text() for tag in quote.select(".tag")]
            
            quotes.append({
                "text": text,
                "author": author,
                "tags": ", ".join(tags)
            })
            
        # Be nice to the server
        time.sleep(0.5)
        
    return quotes

def main():
    parser = argparse.ArgumentParser(description="Scrape quotes from quotes.toscrape.com")
    parser.add_argument("--pages", type=int, default=1, help="Number of pages to scrape")
    parser.add_argument("--output", type=str, default="quotes.csv", help="Output file name")
    args = parser.parse_args()
    
    print(f"Scraping {args.pages} pages from quotes.toscrape.com")
    quotes = scrape_quotes(args.pages)
    
    if quotes:
        df = pd.DataFrame(quotes)
        df.to_csv(args.output, index=False)
        print(f"Scraped {len(quotes)} quotes and saved to {args.output}")
    else:
        print("No quotes found")

if __name__ == "__main__":
    main()

Step 5: Test the Script

# Run the scraper with default settings (1 page)
python scraper.py

# Scrape multiple pages
python scraper.py --pages 3

# Save to a different file
python scraper.py --pages 2 --output famous_quotes.csv

Step 6: Document the Project

Create a README.md file:

# Web Scraper Project

A simple web scraper for quotes.toscrape.com.

## Setup

1. Create a virtual environment:
   ```
   python -m venv scraper_env
   source scraper_env/bin/activate  # or scraper_env\Scripts\activate on Windows
   ```

2. Install dependencies:
   ```
   pip install -r requirements.txt
   ```

## Usage

Run the scraper:
```
python scraper.py --pages 5 --output quotes.csv
```

Arguments:
- `--pages`: Number of pages to scrape (default: 1)
- `--output`: Output CSV file name (default: quotes.csv)

This exercise demonstrates:

Conclusion

Package management is a fundamental skill for Python developers, enabling you to leverage the vast ecosystem of open-source libraries. By mastering pip and understanding how to manage dependencies effectively, you'll be able to:

As you continue your Python journey, the skills you've learned today will become increasingly valuable. Modern software development relies heavily on package management, and these concepts apply across virtually all programming ecosystems.

In future sessions, we'll build on this foundation as we work with more complex projects that integrate multiple packages into cohesive applications.

Remember: "Standing on the shoulders of giants" is the essence of package management. By leveraging the work of others through packages, you can focus on solving your unique problems rather than reinventing solutions to common challenges.

Additional Resources