Python CPU vs I/O Bound: Choose the Right Concurrency (2026)

Python’s concurrency model is widely misunderstood. Many people believe the Global Interpreter Lock (GIL) makes threading useless for performance, but CPython releases the GIL while a thread is blocked on I/O, which is why threads are surprisingly effective for I/O-bound tasks.

Let’s see what I mean. Imagine we have a task that involves fetching data from a bunch of websites. This is a classic I/O-bound problem: our program spends most of its time waiting for network responses, not crunching numbers.

Here’s a synchronous, single-threaded approach:

import requests
import time

def fetch_url(url):
    try:
        response = requests.get(url, timeout=10)
        print(f"Fetched {url} in {response.elapsed.total_seconds():.2f}s")
        return url, response.status_code
    except requests.exceptions.RequestException as e:
        print(f"Error fetching {url}: {e}")
        return url, None

urls = [
    "https://www.google.com",
    "https://www.github.com",
    "https://www.python.org",
    "https://www.stackoverflow.com",
    "https://www.reddit.com",
]

start_time = time.time()
results = [fetch_url(url) for url in urls]
end_time = time.time()

print(f"\nSynchronous execution took {end_time - start_time:.2f} seconds.")

When you run this, you’ll see each URL being fetched one after another. The requests.get call blocks until it gets a response. If one site is slow, everything grinds to a halt. The total time is roughly the sum of all individual request times.

Now, let’s introduce threading. This is where the GIL often causes confusion. In CPython, the GIL is a mutex (a lock) that protects access to Python objects, preventing multiple native threads from executing Python bytecode at the same time within a single process. For CPU-bound tasks (heavy computation), this means threads won’t give you true parallelism on multi-core processors. For I/O-bound tasks, the GIL is rarely the bottleneck, because a thread releases it while it waits on I/O.

import requests
import time
import threading

def fetch_url_threaded(url, results, lock):
    try:
        response = requests.get(url, timeout=10)
        with lock:
            print(f"Fetched {url} in {response.elapsed.total_seconds():.2f}s")
        results.append((url, response.status_code))
    except requests.exceptions.RequestException as e:
        with lock:
            print(f"Error fetching {url}: {e}")
        results.append((url, None))

urls = [
    "https://www.google.com",
    "https://www.github.com",
    "https://www.python.org",
    "https://www.stackoverflow.com",
    "https://www.reddit.com",
]

results = []
threads = []
lock = threading.Lock() # To safely print from multiple threads

start_time = time.time()
for url in urls:
    thread = threading.Thread(target=fetch_url_threaded, args=(url, results, lock))
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join() # Wait for all threads to complete
end_time = time.time()

print(f"\nThreaded execution took {end_time - start_time:.2f} seconds.")

Run this and compare the times. You’ll typically see a dramatic improvement. Why? Because a thread that calls requests.get spends most of its time blocked on the network. CPython releases the GIL around blocking I/O calls (network, disk, etc.), so while one thread waits, the operating system can schedule another thread and that thread can acquire the GIL and run. You don’t get parallel execution of Python bytecode, but you do get concurrent waiting, which is exactly what you want for I/O-bound work. The total time is closer to the duration of the longest individual request, not the sum of all of them.
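To see the "sum versus longest" effect without depending on the network, here is a minimal sketch that stands in time.sleep for the blocking request. The names fake_fetch, run_sequential, run_threaded and the DELAYS values are made up for illustration; they are not part of the example above. It uses ThreadPoolExecutor from the standard library instead of managing Thread objects by hand.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-request latencies in seconds (stand-ins for real network waits).
DELAYS = [0.3, 0.1, 0.2, 0.25, 0.15]

def fake_fetch(delay):
    """Simulate a blocking I/O call; time.sleep releases the GIL while it waits."""
    time.sleep(delay)
    return delay

def run_sequential(delays):
    """Call fake_fetch one after another and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = [fake_fetch(d) for d in delays]
    return results, time.perf_counter() - start

def run_threaded(delays):
    """Call fake_fetch in a thread pool and return (results, elapsed seconds)."""
    start = time.perf_counter()
    # One worker per task so every wait overlaps with the others.
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        results = list(pool.map(fake_fetch, delays))  # map preserves input order
    return results, time.perf_counter() - start

seq_results, seq_elapsed = run_sequential(DELAYS)
thr_results, thr_elapsed = run_threaded(DELAYS)
print(f"sequential: {seq_elapsed:.2f}s (sum of delays = {sum(DELAYS):.2f}s)")
print(f"threaded:   {thr_elapsed:.2f}s (longest delay = {max(DELAYS):.2f}s)")
```

The sequential time should land near the sum of the delays and the threaded time near the longest one, plus a small amount of thread start-up overhead.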

This is the core mental model:

CPU-bound: Heavy computation. Threads don’t help much due to the GIL. Use multiprocessing to get true parallelism by running tasks in separate processes, each with its own Python interpreter and GIL.
I/O-bound: Waiting for external resources (network, disk, databases). Threads are excellent here because the GIL is released during I/O waits, allowing other threads to make progress.
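The CPU-bound half of this model can be checked with a small experiment. The sketch below runs the same pure-Python loop through a thread pool and a process pool. The function names (sum_of_squares, timed_map) and the workload size are assumptions chosen for illustration. Exact timings depend on your machine and on the interpreter build, so none are shown. On a standard GIL-enabled CPython you would expect the process pool to finish faster once the workload is large enough to outweigh process start-up cost.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def sum_of_squares(n):
    """Pure-Python loop: CPU-bound, and it holds the GIL while it runs."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def timed_map(executor_cls, func, inputs):
    """Run func over inputs with the given executor; return (results, seconds)."""
    start = time.perf_counter()
    with executor_cls(max_workers=len(inputs)) as pool:
        results = list(pool.map(func, inputs))
    return results, time.perf_counter() - start

if __name__ == "__main__":  # guard so worker processes can import this module safely
    inputs = [2_000_000] * 4  # arbitrary workload size; tune for your machine
    thread_results, thread_time = timed_map(ThreadPoolExecutor, sum_of_squares, inputs)
    process_results, process_time = timed_map(ProcessPoolExecutor, sum_of_squares, inputs)
    assert thread_results == process_results  # same answers, different wall-clock time
    print(f"threads:   {thread_time:.2f}s")
    print(f"processes: {process_time:.2f}s")
```

Both executors share the same interface, so switching between them is a one-word change once you know which kind of task you have.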

The real levers you control are:

The task type: Is it compute-heavy or waiting-heavy?
The concurrency primitive: threading for I/O-bound, multiprocessing for CPU-bound.
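The asyncio library: For I/O-bound applications with a very large number of concurrent operations, asyncio offers an event-driven approach that runs everything on a single thread using an event loop. Each task is a lightweight coroutine rather than an OS thread, so it usually scales to far more simultaneous connections than one thread per task. It’s cooperative multitasking (tasks yield at each await) rather than preemptive multitasking, which means a blocking call such as requests.get would stall the whole loop; asyncio needs async-aware libraries.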
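As a rough illustration of the asyncio lever, the sketch below overlaps several simulated waits on a single thread. asyncio.sleep stands in for a real non-blocking network call, and fake_fetch, main and the DELAYS values are hypothetical names and numbers used only for this example.

```python
import asyncio
import time

DELAYS = [0.3, 0.1, 0.2, 0.25, 0.15]  # hypothetical latencies in seconds

async def fake_fetch(delay):
    """Simulate non-blocking I/O; await hands control back to the event loop."""
    await asyncio.sleep(delay)
    return delay

async def main(delays):
    start = time.perf_counter()
    # gather runs all coroutines concurrently on one thread
    # and returns their results in input order.
    results = await asyncio.gather(*(fake_fetch(d) for d in delays))
    return results, time.perf_counter() - start

results, elapsed = asyncio.run(main(DELAYS))
print(f"asyncio: {elapsed:.2f}s for {len(results)} tasks on a single thread")
```

As with threads, the elapsed time should be close to the longest delay rather than the sum, but here no extra threads are created.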

The surprising part is how requests (or any blocking I/O library) works seamlessly with threading. You don’t need special "async" versions of requests to benefit from threading for I/O. Under the hood, requests.get ends up in CPython’s socket layer, which releases the GIL before making the blocking system call and reacquires it when the call returns. The OS handles the waiting, and other Python threads are free to run in the meantime. The GIL is simply not held during the I/O wait.

The next step is understanding how to manage many concurrent I/O operations efficiently, which leads to asyncio.