The most surprising thing about pip in GitHub Actions is that its default caching behavior often actively hinders efficient dependency installation, leading to slower builds and wasted compute.

Let’s see this in action. Imagine a simple Python project with a requirements.txt file. We want to install these dependencies in a GitHub Actions workflow, and we want it to be fast.

name: Python Package

on: [push]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4
    - name: Set up Python 3.10
      uses: actions/setup-python@v5
      with:
        python-version: "3.10"
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install -r requirements.txt

This looks straightforward. We check out the code, set up Python, and then install dependencies. Now, let’s add caching to speed things up. A common approach is to cache the pip download cache.

name: Python Package with Cache

on: [push]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4
    - name: Set up Python 3.10
      uses: actions/setup-python@v5
      with:
        python-version: "3.10"
    - name: Cache pip downloads
      uses: actions/cache@v3
      with:
        path: ~/.cache/pip
        key: ${{ runner.os }}-pip-${{ hashFiles('**/requirements.txt') }}
        restore-keys: |
          ${{ runner.os }}-pip-
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install -r requirements.txt

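Here, we’re using actions/cache to store the contents of ~/.cache/pip. The key is specific to the operating system and the hash of our requirements.txt. This should make subsequent runs faster if our dependencies haven’t changed. When requirements.txt does change, the exact key misses and restore-keys falls back to the most recent cache whose key starts with the ${{ runner.os }}-pip- prefix.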
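To make the key mechanics concrete, here is a small Python sketch. It is not the actual implementation of actions/cache or hashFiles: the function names cache_key and pick_cache are hypothetical, the hash is a plain SHA-256 of the file contents (hashFiles combines per-file hashes differently), and the list of stored keys is assumed to be ordered newest first.

```python
import hashlib
from typing import List, Optional


def cache_key(runner_os: str, requirements_text: str) -> str:
    """Build a key shaped like '<os>-pip-<sha256 of the requirements text>'."""
    digest = hashlib.sha256(requirements_text.encode("utf-8")).hexdigest()
    return f"{runner_os}-pip-{digest}"


def pick_cache(key: str, restore_prefixes: List[str], stored: List[str]) -> Optional[str]:
    """Choose which stored cache to restore (simplified model).

    An exact key match wins; otherwise the first stored key that starts
    with one of the restore prefixes is used. `stored` is assumed to be
    ordered newest first.
    """
    if key in stored:
        return key
    for prefix in restore_prefixes:
        for candidate in stored:
            if candidate.startswith(prefix):
                return candidate
    return None


# Hypothetical requirements contents before and after an edit.
old_key = cache_key("Linux", "django>=3.0,<4.0\n")
new_key = cache_key("Linux", "django>=3.0,<4.0\nrequests\n")

# Only the old cache exists, so the new key misses and the prefix falls back to it.
restored = pick_cache(new_key, ["Linux-pip-"], stored=[old_key])
print(restored == old_key)  # True
```

The fallback is what lets a run with a changed requirements.txt start from an older cache rather than an empty one.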

But here’s the catch: pip’s cache isn’t just about downloaded wheels. It also holds cached HTTP responses from the package index and wheels that pip built locally from source distributions. When this cache is restored from an earlier run, pip may end up working from information that no longer matches PyPI: the version it is looking for may have been superseded, or the cached metadata may be stale.

The problem is that when pip believes it already has a suitable package, whether from its cache or from a stale view of the index, its default is not to re-fetch it from the remote index. This means that even if requirements.txt has changed, or a newer version that satisfies the version specifier is available, pip might stick with an older, locally cached version or with information derived from a stale index. This is especially problematic if your requirements.txt uses version ranges (e.g., django>=3.0,<4.0), where pip needs to consult the latest index to pick the best available version within that range.
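To see why a range is sensitive to the view of the index, here is a minimal sketch. It is not pip’s resolver and does not implement full PEP 440 version handling: versions are treated as plain dotted integers, best_match is a hypothetical helper, and both version lists are made up for illustration.

```python
from typing import List, Optional, Tuple


def parse(version: str) -> Tuple[int, ...]:
    """Turn '3.2.1' into (3, 2, 1). Only plain dotted integers are supported."""
    return tuple(int(part) for part in version.split("."))


def best_match(available: List[str], lower: str, upper: str) -> Optional[str]:
    """Return the highest version v with lower <= v < upper, or None if none fits."""
    candidates = [v for v in available if parse(lower) <= parse(v) < parse(upper)]
    return max(candidates, key=parse, default=None)


# Two views of the same (made-up) index: one stale, one current.
stale_index = ["2.2.28", "3.0", "3.1.5", "3.2.1"]
fresh_index = stale_index + ["3.2.25", "4.0"]

print(best_match(stale_index, "3.0", "4.0"))  # 3.2.1
print(best_match(fresh_index, "3.0", "4.0"))  # 3.2.25
```

The specifier is identical in both calls; only the list of known versions differs, and that alone changes which release gets picked.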

The real power comes from understanding how pip resolves dependencies and how to nudge it towards always consulting the most up-to-date information. The actions/cache action is excellent for storing downloaded wheels, which are the actual .whl files. However, the metadata that pip uses to find those wheels or decide which wheels to download is often what causes issues when cached.
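Before deciding which parts of the cache to keep, it helps to look at what is actually in it. The sketch below is one way to do that, assuming the cache lives at ~/.cache/pip as in the workflow above; summarize_cache is a hypothetical helper, and the subdirectory names you see will depend on your pip version.

```python
from pathlib import Path
from typing import Dict


def summarize_cache(root: Path) -> Dict[str, int]:
    """Map each top-level entry under `root` to its total size in bytes."""
    summary: Dict[str, int] = {}
    for entry in sorted(root.iterdir()):
        if entry.is_dir():
            # Sum the sizes of all regular files below this subdirectory.
            size = sum(f.stat().st_size for f in entry.rglob("*") if f.is_file())
        else:
            size = entry.stat().st_size
        summary[entry.name] = size
    return summary


if __name__ == "__main__":
    cache_root = Path.home() / ".cache" / "pip"  # assumed location, as in the workflow
    if cache_root.is_dir():
        for name, size in summarize_cache(cache_root).items():
            print(f"{name}: {size} bytes")
    else:
        print(f"No pip cache found at {cache_root}")
```

Running this as a workflow step right after the cache is restored shows which directories came back and how large they are.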

To truly optimize, we need to ensure pip is always refreshing its understanding of available packages. The most direct way to do this is by telling pip to ignore its existing cache for dependency resolution and instead fetch fresh data from PyPI.

Consider this refined approach:

name: Python Package Optimized Cache

on: [push]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4
    - name: Set up Python 3.10
      uses: actions/setup-python@v5
      with:
        python-version: "3.10"
    - name: Cache pip downloads (wheels only)
      uses: actions/cache@v3
      with:
        path: |
          ~/.cache/pip/wheels
          ~/.cache/pip/http
        key: ${{ runner.os }}-pip-wheels-${{ hashFiles('**/requirements.txt') }}
        restore-keys: |
          ${{ runner.os }}-pip-wheels-
    - name: Install dependencies with fresh index
      run: |
        python -m pip install --upgrade pip
        pip install --no-cache-dir --upgrade --force-reinstall -r requirements.txt

In this version, we’re more selective with caching:

  1. Selective Caching: We only cache ~/.cache/pip/wheels and ~/.cache/pip/http. wheels holds wheels that pip built locally from source distributions, and http holds cached HTTP responses from package indexes, including the package files pip downloaded (recent pip releases store these under http-v2 instead). By isolating these, we reduce the risk of restoring anything else from an old cache.
  2. --no-cache-dir: This flag disables pip’s cache for the command. pip neither reads from nor writes to ~/.cache/pip, so every index page and package file is fetched from the network and resolution always starts from fresh data.
  3. --upgrade --force-reinstall: --upgrade tells pip to move each listed package to the newest version that satisfies requirements.txt, and --force-reinstall reinstalls every package even if the installed version is already current. Together they ensure that if requirements.txt has changed or new versions are available, pip will pick them up.

The mechanical advantage here is that --no-cache-dir forces pip to hit the remote index at every resolution step, so it always sees the latest available versions, and --upgrade --force-reinstall makes sure it acts on that information. There is a real trade-off, though: because --no-cache-dir bypasses the cache entirely, the restored wheels and http directories are not consulted by this command, and every package is downloaded again. We are buying freshness at the cost of download time. If download time matters more, dropping --no-cache-dir lets pip reuse the restored files while --upgrade still asks the index for the newest matching versions. Either way, the goal is to keep pip from getting stuck with outdated metadata that would otherwise lead to incorrect or stale installations, and ultimately, slower or broken builds.

The next hurdle you’ll likely encounter is managing build dependencies versus runtime dependencies, especially with packages that compile C extensions.
