Python Concurrency: Threads, Processes, asyncio (Part 12)

Your programs so far have done one thing at a time, and for eleven lessons that was correct. But consider a script that downloads a hundred web pages: each request spends two seconds waiting for a server, and almost nothing computing. Done one at a time, that is more than three minutes of your program sitting on its hands. Concurrency is the family of techniques for overlapping the waiting, so that the same hundred pages can finish in a few seconds. This lesson is the plain-spoken, honest introduction: what concurrency is and is not, the three tools Python offers, and exactly when to reach for each.

Concurrency has a reputation for difficulty, and the reputation is earned in general but avoidable for you, because the difficulty lives almost entirely in shared mutable state: threads writing to the same data simultaneously. The modern patterns this lesson teaches, executor maps and async tasks, are designed so you rarely touch shared state at all. The goal today is not mastery of every primitive; it is a correct mental model and two working recipes, which puts you ahead of a surprising share of working programmers.

What you will learn in Part 12

I/O-bound versus CPU-bound: the one diagnosis that decides everything
Threads with ThreadPoolExecutor: overlap waiting in four lines
The GIL, explained honestly and without panic
Processes for genuinely parallel computation
asyncio: async, await, and gather for thousands of concurrent waits
How to choose between threads, processes, and async, every time

Note

Before you start

You need functions from Part 4, exceptions from Part 7, and the generator intuition from Part 9, because async functions pause and resume exactly the way generators taught you. This is the most conceptual lesson in the course; read slowly and run everything.

1. The diagnosis: are you waiting or computing?

Every performance problem in this territory starts with one question. An I/O-bound program spends its time waiting for things outside the CPU: network responses, disk reads, database replies, a user. A CPU-bound program spends its time computing: image processing, cryptography, training models, crunching numbers. The distinction decides your tool with almost no exceptions: I/O-bound work wants threads or asyncio, which interleave the waiting; CPU-bound work wants processes, which recruit more processor cores. Misdiagnose, and your concurrency makes nothing faster while adding every cost.

The decision table for this entire topic
Your bottleneck	Tool	Why it works
Waiting on many network calls or files	threads or asyncio	While one task waits, another runs; the waiting overlaps
Heavy computation on multiple cores	multiprocessing	Separate processes sidestep the GIL and use all cores
Thousands of simultaneous connections	asyncio	Tasks are far cheaper than threads at large scale
A handful of blocking calls to speed up	ThreadPoolExecutor	Smallest change to existing code, four lines
Simple script, no real waiting	none	Concurrency adds complexity; earn it with a real bottleneck
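
One rough way to make the diagnosis on your own machine is to compare wall-clock time with the CPU time your process actually used. The sketch below is an illustration, not a standard tool: the helper name diagnose, the two sample jobs, and the 0.5 threshold are all assumptions chosen for the demo.

```python
import time

def diagnose(func, *args):
    """Run func once and compare wall-clock time with CPU time used."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func(*args)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    # Assumption: 0.5 is an arbitrary rule-of-thumb threshold, not a standard.
    label = "CPU-bound" if cpu > 0.5 * wall else "I/O-bound"
    return label, wall, cpu

def wait_a_bit():
    time.sleep(0.2)                       # stand-in for waiting on a server

def crunch():
    sum(i * i for i in range(1_000_000))  # stand-in for real computation

for job in (wait_a_bit, crunch):
    label, wall, cpu = diagnose(job)
    print(f"{job.__name__}: wall={wall:.2f}s cpu={cpu:.2f}s -> {label}")
```

If the CPU time is a small fraction of the wall-clock time, the program was mostly waiting, and threads or asyncio are the candidates. If the two numbers are close, it was computing, and processes are the candidate.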

2. Threads: overlapping the waiting

A thread is an independent flow of execution inside your program; the operating system switches between threads, and when one blocks on a network read, others proceed. Raw thread management is fiddly, so modern Python wraps it in an executor: create a pool, map your function over your inputs, collect results in order. These four lines are the single most useful concurrency recipe in the language, and with enough workers they shrink the three-minute downloader to a few seconds without restructuring anything.

from concurrent.futures import ThreadPoolExecutor
import time

def fetch(url):                  # stand-in for a real network call
    time.sleep(1)                # the waiting we want to overlap
    return f"{url}: ok"

urls = [f"https://site/{n}" for n in range(8)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(fetch, urls))
elapsed = time.perf_counter() - start

print(results[0], "...")
print(f"8 one-second fetches in {elapsed:.1f}s")   # ~1.0s, not 8

Now the famous asterisk: the Global Interpreter Lock. CPython, the standard interpreter, allows only one thread to execute Python bytecode at any instant. For I/O-bound work this barely matters, because waiting threads release the lock, which is why the downloader above genuinely speeds up. For CPU-bound work it matters completely: eight threads crunching numbers take turns holding the lock and finish no faster, sometimes slower. The GIL is not a scandal; it is a design trade-off, and recent Python versions are gradually offering a GIL-free build. The guidance for you stands regardless: threads for waiting, processes for computing.
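
To see the GIL's effect for yourself, a minimal sketch like the following runs the same pure-Python computation sequentially and then through a thread pool. The function crunch and the workload sizes are made up for the demo, and the timings depend on your machine and interpreter build, so no expected numbers are shown.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def crunch(n: int) -> int:
    """Pure-Python arithmetic: holds the GIL the whole time it runs."""
    total = 0
    for i in range(n):
        total += i * i
    return total

work = [300_000] * 4                      # four identical CPU-bound jobs

start = time.perf_counter()
sequential = [crunch(n) for n in work]
seq_time = time.perf_counter() - start

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(crunch, work))
thr_time = time.perf_counter() - start

print(f"sequential: {seq_time:.2f}s")
print(f"4 threads:  {thr_time:.2f}s")     # on a standard CPython build, expect no real gain
```

On a standard CPython build the two timings should land close together, which is the whole point: the threads did the same work, one at a time.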

Processes are the heavy tool: separate Python interpreters with separate memory, coordinated by the multiprocessing module or, more pleasantly, the same executor API with ProcessPoolExecutor swapped in. Each process has its own GIL, so eight processes genuinely use eight cores. The costs are real, slower startup and data copied between processes rather than shared, which is why you reserve them for actual computation. When Part 15 trains neural networks, the libraries underneath are doing a similar kind of parallel work in optimized native code, which is also the deeper lesson: in data work, you mostly ride parallelism that libraries built for you.
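
The executor swap described above can be sketched as follows. The prime-counting function is a deliberately slow, hypothetical stand-in for real computation, and the input sizes are arbitrary. Run this as a script on your own machine; the __main__ guard matters because worker processes may re-import the file.

```python
from concurrent.futures import ProcessPoolExecutor
import os

def count_primes(limit: int) -> int:
    """Count primes below limit by trial division (deliberately CPU-heavy)."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

def main() -> None:
    limits = [20_000] * 8                 # eight independent chunks of work
    # Same API as ThreadPoolExecutor, but each worker is a separate process.
    with ProcessPoolExecutor(max_workers=os.cpu_count()) as pool:
        totals = list(pool.map(count_primes, limits))
    print(totals)

if __name__ == "__main__":                # required so workers do not rerun main()
    main()
```

The function and its arguments are sent to the workers by pickling, so they need to be defined at module level, as count_primes is here.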

A practical question every executor user faces immediately: how many workers? For I/O-bound thread pools, the answer is governed by the waiting, not your core count; eight, sixteen, or more threads are reasonable when each spends its life blocked on a server, though the polite ceiling is what the remote service can bear. For process pools, more workers than CPU cores buys nothing and costs memory, so os.cpu_count() is the natural default. In both cases the honest method is the one this course keeps teaching: pick a sensible number, measure with perf_counter, adjust once, and stop tuning.
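
The measure-and-adjust method can be sketched in a few lines. Here fake_fetch and time_pool are hypothetical helpers, and the 0.05-second sleep stands in for a real network wait.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fake_fetch(_):
    time.sleep(0.05)                      # stand-in for waiting on a server

def time_pool(workers: int, jobs: int = 16) -> float:
    """Time how long `jobs` simulated fetches take with `workers` threads."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(fake_fetch, range(jobs)))
    return time.perf_counter() - start

for workers in (1, 4, 16):
    print(f"{workers:>2} workers: {time_pool(workers):.2f}s")
```

With work that is pure waiting, the total should shrink roughly in proportion to the worker count until workers outnumber jobs. Real services flatten out sooner, which is exactly what the measurement is for.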

Checkpoint

Your script resizes 10,000 photos and maxes out one CPU core. Which tool actually helps?

3. asyncio: concurrency as a language feature

The third tool moves concurrency into the language itself. An async def function is a coroutine function: calling it creates a coroutine object rather than running the body, exactly the deferred behavior you learned from generators in Part 9, and that is no coincidence, because coroutines grew directly out of generator machinery. Inside a coroutine, await marks the points where the function may pause; while it is paused, the event loop, a scheduler built into asyncio, runs other coroutines. One thread, thousands of tasks, switching only at awaits: that is the whole model.

import asyncio

async def fetch(url: str) -> str:
    await asyncio.sleep(1)          # a polite, non-blocking wait
    return f"{url}: ok"

async def main() -> None:
    urls = [f"https://site/{n}" for n in range(100)]
    results = await asyncio.gather(*(fetch(u) for u in urls))
    print(f"{len(results)} fetches done")

asyncio.run(main())                 # ~1 second for all 100

Read main carefully, because it contains the two idioms that carry async Python. gather takes many coroutines, runs them concurrently, and returns all results in the order you passed them; the generator expression feeding it is your Part 9 skill again. And asyncio.run is the bridge from the ordinary world: it starts the event loop, runs one coroutine to completion, and cleans up. The rules that follow from the model: use await only inside async def; never call blocking functions like time.sleep inside a coroutine, use the async equivalents instead; and remember that async helps only when there is waiting to overlap. It does nothing for computation.
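
Two of those points are easy to check directly: calling a coroutine function only builds a coroutine object, and gather returns results in argument order even when a later task finishes first. This is a small sketch; the helper delayed and its delays are invented for the demo.

```python
import asyncio

async def delayed(value: str, delay: float) -> str:
    await asyncio.sleep(delay)
    return value

async def main() -> list[str]:
    coro = delayed("x", 0)                # nothing has run yet
    print(type(coro).__name__)            # coroutine
    print(await coro)                     # x  (now the body runs)

    # "fast" finishes first, but results follow the order of the arguments.
    return await asyncio.gather(delayed("slow", 0.2), delayed("fast", 0.01))

results = asyncio.run(main())
print(results)                            # ['slow', 'fast']
```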

When should a beginner actually choose asyncio over threads? Scale and ecosystem. At a dozen concurrent waits, the thread pool is simpler and your libraries do not need to cooperate. At hundreds or thousands, or when you build with frameworks that are async-native, asyncio wins decisively: tasks cost almost nothing, and the await points make the switching visible in the code. Modern web backends live there, which is why our advanced track devotes a full lesson to it; the async Python deep dive with httpx picks up exactly where today ends, with real network calls and production patterns.

Checkpoint

What does await actually do inside an async function?

4. Practice: a concurrent download simulator

The playground below runs real asyncio in your browser, simulating a batch downloader with variable delays, a concurrency limit, and per-task error handling, the honest shape of production async code. One browser-specific note that is itself a good lesson: this page already runs inside an event loop, so instead of asyncio.run(main()) we await main() directly, exactly what you would do in a Jupyter notebook, while on your own machine the asyncio.run form from section 3 is correct.

Python playground

import asyncio, random

async def fetch(name: str, sem: asyncio.Semaphore) -> str:
    async with sem:                      # at most 3 in flight at once
        delay = random.uniform(0.1, 0.5)
        await asyncio.sleep(delay)
        if random.random() < 0.2:        # some downloads fail
            raise ConnectionError(f"{name} timed out")
        return f"{name} done in {delay:.2f}s"

async def main() -> None:
    sem = asyncio.Semaphore(3)
    tasks = [fetch(f"file{n:02}", sem) for n in range(8)]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    for r in results:
        if isinstance(r, Exception):
            print("FAILED:", r)
        else:
            print(r)

# In this browser playground (and in Jupyter) an event loop is
# already running, so we await directly. In a normal script use:
#   asyncio.run(main())
await main()

# Exercises:
# 1. Change the Semaphore to 1 and to 8. Watch how the total time changes.
# 2. Add one retry for failed fetches before reporting failure.
# 3. Time the whole batch: import time, call time.perf_counter()
#    before and after, and print the total.

Two production-grade details hide in that toy. The semaphore caps simultaneous work, which real servers require of you; unlimited concurrency is a denial-of-service attack with extra steps. And return_exceptions=True turns gather into a collector of outcomes rather than a single point of failure, the Part 7 philosophy, failures as values to handle, scaled to a hundred tasks. Keep both habits and your first real async program will already look senior.
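
The outcome-collecting habit can be seen in isolation with a deterministic version of the same idea, written in the script form with asyncio.run. The job function and its every-third-job failure rule are invented for the demo.

```python
import asyncio

async def job(n: int) -> int:
    await asyncio.sleep(0.01)
    if n % 3 == 0:                        # made-up rule: every third job fails
        raise ValueError(f"job {n} failed")
    return n * 10

async def collect() -> tuple[list[int], list[Exception]]:
    outcomes = await asyncio.gather(
        *(job(n) for n in range(1, 7)),
        return_exceptions=True,           # exceptions come back as values
    )
    ok = [r for r in outcomes if not isinstance(r, Exception)]
    failed = [r for r in outcomes if isinstance(r, Exception)]
    return ok, failed

ok, failed = asyncio.run(collect())
print(ok)                                 # [10, 20, 40, 50]
print([str(e) for e in failed])           # ['job 3 failed', 'job 6 failed']
```

Without return_exceptions=True, the first ValueError would propagate out of the await on gather and you would never see the four successes.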

Common mistakes to avoid

✕ Adding threads to a CPU-bound loop and seeing zero speedup.

✓ The GIL serializes Python computation across threads. Diagnose first: waiting wants threads or async, computing wants ProcessPoolExecutor.

✕ Calling time.sleep or other blocking functions inside async code.

✓ A blocking call freezes the entire event loop and every task on it. Inside async def, use await asyncio.sleep and async libraries; one blocking call undoes everything.

✕ Calling an async function like a normal one and wondering why nothing happened.

✓ Calling an async def function returns a coroutine object; nothing runs until it is awaited or passed to asyncio.run or gather. The warning "coroutine was never awaited" means exactly this.

✕ Sharing and mutating data structures across threads casually.

✓ That is where concurrency horror stories live. Prefer the patterns shown: map inputs to outputs, collect results, mutate nothing shared. If you must share, you need locks, and that is a sign to redesign.
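
The blocking-call mistake is easy to measure. In this sketch, blocking_io is a hypothetical stand-in for a library call that cannot be awaited; the first version calls it directly inside a coroutine, and the second hands it to a worker thread with asyncio.to_thread (available from Python 3.9).

```python
import asyncio
import time

def blocking_io() -> str:
    time.sleep(0.1)                       # stand-in for a blocking library call
    return "done"

async def blocks_the_loop() -> str:
    return blocking_io()                  # mistake: freezes every other task

async def offloads() -> str:
    return await asyncio.to_thread(blocking_io)   # runs in a worker thread

async def timed(factory, count: int = 3) -> float:
    """Run `count` copies concurrently and return the elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(factory() for _ in range(count)))
    return time.perf_counter() - start

async def main() -> tuple[float, float]:
    return await timed(blocks_the_loop), await timed(offloads)

blocked, offloaded = asyncio.run(main())
print(f"blocking inside coroutines: {blocked:.2f}s")    # roughly 0.3s
print(f"offloaded with to_thread:   {offloaded:.2f}s")  # roughly 0.1s
```

The blocked version runs its three calls back to back even though gather was asked to run them concurrently, because the loop never gets a chance to switch.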

Frequently asked questions

Is the GIL going away?

Python now ships an optional free-threaded build where the GIL can be disabled, maturing steadily since 3.13. The ecosystem is migrating gradually. The diagnosis discipline you learned today remains correct either way, and that is why this lesson teaches it first.

Do I need concurrency for my scripts?

Mostly no, and that is a fine answer. Reach for it when a real bottleneck appears, many slow network calls being the classic. Concurrency is a tool for measured problems, not a default posture.

asyncio or threads for my first concurrent program?

ThreadPoolExecutor.map for a quick speedup of existing blocking code; asyncio when you start a new program around network calls, or when concurrency counts grow past dozens. Many working programs use both, async at the core, a thread pool for stubborn blocking libraries.

How does this relate to the web frameworks I keep hearing about?

Directly: many modern Python web servers run on asyncio event loops, and in async frameworks a request handler is typically a coroutine. When you continue into our FastAPI series after this track, today's mental model is the foundation everything sits on.

5. Recap and what comes next

You can now diagnose I/O-bound versus CPU-bound and let the diagnosis choose the tool: ThreadPoolExecutor to overlap a handful of waits, ProcessPoolExecutor for real multi-core computation, asyncio with gather, semaphores, and exception collection for concurrency at scale. You understand the GIL as a trade-off rather than a mystery, and you know the cardinal sins, blocking the loop and sharing mutable state, well enough to avoid them by design.

Next is the lesson that quietly changes your standing as a programmer: Part 13, testing with pytest, where you learn to prove your code works and keep it working while you change it. The Concurrency lesson in the Learn Python app below reviews today's decision table in quiz form, and the full syllabus is on the series hub.

Pro tip

Before adding any concurrency, measure: time.perf_counter() around the slow part, and find out whether you are waiting or computing. Ten lines of measurement routinely save a hundred lines of misapplied threads, and profiling before optimizing is the most adult habit in programming.

Practice on the go

Learn Python, the free Android app

Every topic in this series lives in the app too: bite-size lessons, runnable examples, quizzes, mini projects, and an offline Python playground that runs on your phone.
