Python Generators and Iterators Explained (Part 9)

This is the lesson where Python's curtain comes up. You have looped over lists, strings, files, dictionaries, ranges, and pathlib trees, and the same for ... in syntax handled all of them. That is not coincidence; it is a protocol, a small contract any object can sign, and today you learn to read it and then to sign it yourself with generators. The payoff is concrete: after this lesson you can process files larger than your memory, build data pipelines that do no work until asked, and understand why range(1_000_000_000) costs almost no memory.

If earlier parts taught you more syntax than ideas, this one is the reverse: two ideas, iteration as a protocol and laziness as a strategy, carried by very little new syntax. Take it slowly and run everything. Once generators click, a large share of the standard library starts to read differently, because so much of it is built on exactly this protocol.


What you will learn in Part 9

The iteration protocol: what for loops actually do
iter and next, and why files, dicts, and ranges all loop
Generator functions: pausing and resuming with yield
Generator expressions, the lazy sibling of comprehensions
Pipelines: chaining generators for constant-memory data flow
A working glimpse of itertools, the iterator toolbox


Note

Before you start

You need comprehensions from Part 5, functions from Part 4, and the file loop from Part 7, because this lesson explains the machinery underneath all three.

1. What a for loop really does

Here is the secret in one paragraph. When you write for item in thing, Python calls iter(thing) to get an iterator, an object whose only job is to remember position and produce the next value. Then, each pass of the loop, Python calls next(iterator) and binds the result to your name. When the iterator is exhausted, it raises a special exception, StopIteration, which the for loop catches silently and uses as its signal to end. That is everything: a getter, a stepper, and a stop signal. You can drive the machinery by hand, and you should, once, right now.

colors = ["red", "green", "blue"]

it = iter(colors)         # ask the list for an iterator
print(next(it))           # red
print(next(it))           # green
print(next(it))           # blue
# next(it) now raises StopIteration: the loop's secret stop sign

# So this loop...
for c in colors:
    print(c)
# ...is exactly: it = iter(colors), then next(it) until StopIteration.
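To make that equivalence concrete, here is a sketch of the same loop rewritten with an explicit while, try, and except. It mirrors the behavior described above rather than CPython's actual internals, and manual_for is a name invented for this illustration.

```python
colors = ["red", "green", "blue"]

def manual_for(iterable):
    """Collect items in the order a for loop would visit them, using iter/next by hand."""
    seen = []
    it = iter(iterable)            # step 1: ask for an iterator
    while True:
        try:
            item = next(it)        # step 2: request the next value
        except StopIteration:      # step 3: the stop signal ends the loop
            break
        seen.append(item)          # the loop body would run here
    return seen

print(manual_for(colors))          # ['red', 'green', 'blue']
print(manual_for("hi"))            # ['h', 'i']
```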

Every loopable object you have met (lists, strings, tuples, sets, dicts, files, ranges, zip and enumerate results, pathlib globs) signs this same contract, which is why one for syntax serves them all. The contract also explains behaviors that previously looked like quirks. An iterator can be consumed only once, which is why a zip object yields its pairs a single time and then sits empty, and why a file object picks up where the last loop stopped. Iterators are position plus a promise, nothing more.
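Both quirks are easy to reproduce. This sketch uses io.StringIO as a stand-in for a real file (an assumption made so the example needs no file on disk); it follows the same iterator protocol, so the second loop resumes where the first one broke off.

```python
import io

pairs = zip(["a", "b"], [1, 2])
first_zip = list(pairs)    # [('a', 1), ('b', 2)]
second_zip = list(pairs)   # []: the zip iterator is already exhausted
print(first_zip, second_zip)

# io.StringIO stands in for a real file: it follows the same iterator protocol.
log = io.StringIO("line 1\nline 2\nline 3\n")

first_pass = []
for line in log:
    first_pass.append(line.strip())
    break                  # leave the loop after one line

second_pass = [line.strip() for line in log]   # resumes at "line 2"

print(first_pass)          # ['line 1']
print(second_pass)         # ['line 2', 'line 3']
```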

One distinction keeps learners honest here. An iterable is anything you can ask for an iterator: a list, a string, a dict. An iterator is the position-keeping object you get back. A list is iterable but is not itself an iterator, which is why you can loop over the same list twice: each loop asks for a fresh iterator. A zip object, by contrast, is its own iterator, so it runs dry after one pass. When something mysteriously "loops empty the second time", you are holding an iterator; when it loops fresh every time, you are holding an iterable that mints new ones. That single distinction resolves a remarkable number of confused afternoons.
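One quick way to tell the two apart relies on the rule that an iterator's __iter__ returns the iterator itself. The helper is_iterator below is a name made up for this sketch, not a built-in.

```python
def is_iterator(obj):
    """True when iter(obj) hands back obj itself, the mark of an iterator."""
    return iter(obj) is obj

numbers = [1, 2, 3]
zipped = zip(numbers, numbers)

print(is_iterator(numbers))   # False: a list mints a fresh iterator each time
print(is_iterator(zipped))    # True: zip is its own iterator

print(sum(numbers), sum(numbers))             # 6 6: two fresh passes
print(len(list(zipped)), len(list(zipped)))   # 3 0: the second pass is empty
```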

2. Generators: writing your own iterators with yield

Writing the protocol by hand as a class is verbose, so Python provides the shortcut that made the feature famous: any function containing the keyword yield becomes a generator function. Calling it runs none of its body; it returns a generator object, an iterator, immediately. Each next() runs the body until the next yield, hands out that value, and freezes the function mid-line, locals intact, until asked again. It is a function that can pause, and once you see one trace, you have it forever.

def countdown(n):
    print("starting!")          # runs on FIRST next(), not at call time
    while n > 0:
        yield n                 # hand out a value, freeze right here
        n -= 1
    print("done")

gen = countdown(3)              # nothing printed yet: just a generator
print(next(gen))                # starting!  then 3
print(next(gen))                # 2
print(next(gen))                # 1
# A fourth next(gen) would print "done", then raise StopIteration.

for n in countdown(2):          # for drives it like any iterator
    print("got", n)

Read the output order again, because it carries the whole concept: the body did not run at call time, the first next() executed down to the first yield, and each later next() resumed exactly where the function froze. The function's state, here just n, persists between values without any class, list, or global. A generator is the accumulator pattern turned inside out: instead of building the whole result and returning it, you emit one piece at a time, on demand.
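For contrast, here is a sketch of what the same sequence costs without yield: a class that stores its position in an attribute and raises StopIteration itself. Countdown and countdown_quiet are names invented for this comparison, and the prints from countdown are left out so only the values are compared.

```python
class Countdown:
    """Hand-written iterator producing the same values as countdown(n)."""

    def __init__(self, n):
        self.n = n                 # position must live in an attribute

    def __iter__(self):
        return self                # an iterator returns itself

    def __next__(self):
        if self.n <= 0:
            raise StopIteration    # signal exhaustion explicitly
        value = self.n
        self.n -= 1
        return value


def countdown_quiet(n):
    """The same sequence as a generator: yield replaces all the bookkeeping."""
    while n > 0:
        yield n
        n -= 1


print(list(Countdown(3)))          # [3, 2, 1]
print(list(countdown_quiet(3)))    # [3, 2, 1]
```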

Checkpoint

What does calling a generator function do?

3. Why lazy wins: memory and infinity

The practical case for generators is memory. A list of a hundred million integers occupies gigabytes; a generator that yields them occupies at most a few hundred bytes, because it stores how to make the next value, not the values themselves. This is precisely how the file loop from Part 7 reads ten-gigabyte logs serenely. range works on the same principle: it is not a generator, but it stores only its start, stop, and step and computes each number on request, which is why counting to a billion costs almost no memory. Laziness also unlocks something lists cannot represent at all: infinite sequences. A generator may yield forever, and the consumer simply stops asking when it has enough.

def fibonacci():
    a, b = 0, 1
    while True:                 # infinite on purpose
        yield a
        a, b = b, a + b

fib = fibonacci()
firsts = [next(fib) for _ in range(10)]
print(firsts)                   # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

# The consumer decides when to stop, not the producer:
for n in fibonacci():
    if n > 1000:
        print("first past 1000:", n)   # 1597
        break

Generator expressions bring laziness to the comprehension syntax you already love: replace the square brackets with parentheses and nothing is computed until consumed. Feeding one directly into a function like sum is the idiom to learn: no intermediate list, constant memory. You have actually used it already; the playground in Part 7 contained sum(s for _, s in good), and you read it without flinching. When you want the values once, generate; when you need them stored, listify.

squares_list = [n * n for n in range(1_000_000)]   # builds a million ints
squares_gen  = (n * n for n in range(1_000_000))   # builds a recipe

print(sum(squares_gen))            # consumes lazily, constant memory
print(sum(n * n for n in range(1_000_000)))   # same, inline, idiomatic

Laziness also explains a family of built-ins you have been using without naming the pattern: sum, max, min, any, all, and sorted accept any iterable, which means they happily drink straight from a generator. any(score > 90 for score in scores) stops at the first hit, short-circuiting exactly like the and/or of Part 2, and all() stops at the first miss. The combination, a generator expression inside one of these consumers, is among the most idiomatic lines in the language: it states a question over a stream of data. With any and all you pay for only as much of the stream as the answer requires; sum, max, and min must read every item but hold only one at a time, while sorted has to store everything, because it returns a new list.
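You can watch the short-circuit happen by wrapping the data in a generator that records every value it hands out. The scores and the scores_seen helper are made up for this sketch.

```python
checked = []   # records every score the consumer actually pulled

def scores_seen(scores):
    """Yield scores one at a time, logging each into `checked`."""
    for score in scores:
        checked.append(score)
        yield score

scores = [72, 95, 88, 61]

print(any(s > 90 for s in scores_seen(scores)))   # True
print(checked)                                    # [72, 95]: stopped at the first hit

checked.clear()
print(all(s > 80 for s in scores_seen(scores)))   # False
print(checked)                                    # [72]: stopped at the first miss
```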

4. Pipelines: generators feeding generators

The professional payoff of this lesson is the pipeline: a chain of small generators, each consuming the previous one, each lazy, so a single line of data flows through every stage from source to sink without anything being stored whole. This is the architecture of log processing, ETL jobs, and data cleaning everywhere, and it composes from pieces you can test alone, the Part 4 design lesson wearing its work clothes. Watch one record travel the entire chain before the next record even leaves the source.

def read_lines(text):
    for line in text.splitlines():
        yield line

def strip_blanks(lines):
    for line in lines:
        stripped = line.strip()      # strip once, reuse the result
        if stripped:
            yield stripped

def parse_scores(lines):
    for line in lines:
        name, score = line.split(",")
        yield name, int(score)

raw = "Amina,87\n\nZane,91\nRuwan,78\n\n"
pipeline = parse_scores(strip_blanks(read_lines(raw)))

for name, score in pipeline:       # one record flows through all stages
    print(f"{name}: {score}")
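To see one record travel the whole chain before the next one leaves the source, here is a traced copy of the same three stages, each appending a note to a shared events list. The logging is a debugging aid added for this sketch, not something real stages need, and the input is shortened to two records.

```python
events = []   # shared trace, filled in the order work actually happens

def read_lines(text):
    for line in text.splitlines():
        events.append(f"read {line!r}")
        yield line

def strip_blanks(lines):
    for line in lines:
        stripped = line.strip()
        if stripped:
            events.append(f"kept {stripped!r}")
            yield stripped

def parse_scores(lines):
    for line in lines:
        name, score = line.split(",")
        events.append(f"parsed {name}")
        yield name, int(score)

raw = "Amina,87\n\nZane,91\n"
for name, score in parse_scores(strip_blanks(read_lines(raw))):
    events.append(f"consumed {name}")

for event in events:
    print(event)   # Amina finishes every stage before Zane is even read
```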

The standard library ships a toolbox for exactly this style, itertools: islice takes the first n items of any iterator, chain glues iterators together, takewhile and dropwhile cut streams on conditions, and count is an infinite range. You do not need them all today; you need to know the box exists, and to feel zero surprise when you meet them in real codebases, because every one of them is just the protocol from section 1 wearing a different hat.
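A small tour of the tools named above, all from the standard itertools module. The fibonacci generator from section 3 is repeated so the block runs on its own; islice is the usual way to take a finite slice of an infinite stream.

```python
from itertools import chain, count, dropwhile, islice, takewhile

print(list(islice(count(10), 3)))                  # [10, 11, 12]
print(list(chain("ab", [1, 2])))                   # ['a', 'b', 1, 2]
print(list(takewhile(lambda n: n < 4, count())))   # [0, 1, 2, 3]
print(list(dropwhile(lambda n: n < 4, range(7))))  # [4, 5, 6]

def fibonacci():
    a, b = 0, 1
    while True:                    # infinite on purpose
        yield a
        a, b = b, a + b

print(list(islice(fibonacci(), 10)))   # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```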

Checkpoint

You need to total the prices of ten million order records read from a file. Which idiom is memory-safe?

If you want to feel the memory claim rather than take it on faith, the standard library will oblige: sys.getsizeof reports an object's own size in bytes, and comparing a million-element list against the generator that would produce the same values is a one-line experiment with a four-orders-of-magnitude punchline. It is also a fair moment for honesty about limits: laziness defers work, it does not delete it, and a pipeline that ultimately consumes everything still pays full computational price. What you save is memory and time-to-first-result, which for interactive tools and large files is precisely what matters.
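Here is one way to run that experiment. The exact byte counts depend on your Python version and platform, so no outputs are claimed. Note that sys.getsizeof is shallow: for the list it counts only the array of references, not the million int objects they point to, so the true gap is even wider than the printed ratio.

```python
import sys

as_list = [n * n for n in range(1_000_000)]   # a million values, all stored
as_gen = (n * n for n in range(1_000_000))    # a recipe, nothing computed yet

list_bytes = sys.getsizeof(as_list)   # the list's reference array only
gen_bytes = sys.getsizeof(as_gen)     # the paused generator object

print(f"list:      {list_bytes:>12,} bytes")
print(f"generator: {gen_bytes:>12,} bytes")
print(f"ratio:     {list_bytes // gen_bytes:>12,}x")
```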

5. Practice: a log analysis pipeline

The playground below assembles everything into a miniature log analyzer: a generator that produces log lines, a filter stage, a parser stage, and a consumer that aggregates with Counter from Part 8, all lazy end to end. The exercises ask you to add stages, the way real pipelines grow. Note how testable each stage is: feed any stage a small list and inspect the output, no files, no setup, a fact Part 13 will exploit gleefully.
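A minimal sketch of that shape follows, assuming a simple "time LEVEL message" line format. The stage names and sample lines are hypothetical, not the playground's own code; Counter comes from collections as in Part 8.

```python
from collections import Counter

SAMPLE_LOG = [                       # hypothetical log lines for the sketch
    "10:00:01 INFO user login",
    "10:00:07 ERROR db timeout",
    "",
    "10:01:12 WARNING disk 85% full",
    "10:02:45 ERROR db timeout",
    "10:03:00 INFO user logout",
]

def produce(lines):
    """Source stage: yield raw lines one at a time."""
    for line in lines:
        yield line

def only_problems(lines):
    """Filter stage: keep WARNING and ERROR lines (blank lines never match)."""
    for line in lines:
        if " ERROR " in line or " WARNING " in line:
            yield line

def parse(lines):
    """Parser stage: split each line into (level, message)."""
    for line in lines:
        _time, level, message = line.split(" ", 2)
        yield level, message

# Consumer: Counter pulls records through every stage, one at a time.
counts = Counter(parse(only_problems(produce(SAMPLE_LOG))))
for (level, message), n in counts.most_common():
    print(f"{n} x {level}: {message}")
```

Each stage can be tested alone by handing it a short list, for example list(parse(["10:00:07 ERROR db timeout"])).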

Python playground

Common mistakes to avoid

✕Looping over a generator twice and finding the second loop silent.

✓Iterators are one-shot: once exhausted, they stay empty. Recreate the generator, or store the values in a list when you genuinely need multiple passes.
✕Expecting print(my_generator) to show the values.

✓You will see <generator object ...>, a recipe, not a meal. Wrap it in list() to inspect during debugging, but remember that consumes it.
✕Putting a return with a value in a generator and expecting it to be yielded.

✓return in a generator just stops it (the value hides inside StopIteration). Values come out only through yield.
✕Using a generator where you need len(), indexing, or sorting.

✓Generators have no length and no positions; they only know "next". The moment you need random access or size, materialize with list() and accept the memory cost knowingly.
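The return mistake above is the least intuitive, so here is a short sketch of where a generator's return value actually goes; numbers_then_summary is a hypothetical name for the demonstration.

```python
def numbers_then_summary():
    """Yield two numbers, then return a value that is NOT yielded."""
    yield 1
    yield 2
    return "finished"          # becomes StopIteration.value

print(list(numbers_then_summary()))   # [1, 2]: the returned string never appears

gen = numbers_then_summary()
next(gen)                      # 1
next(gen)                      # 2
try:
    next(gen)
except StopIteration as stop:
    returned = stop.value      # the return value rides on the exception
print(returned)                # finished
```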

Frequently asked questions

When should I choose a generator over a list?

Generate when data is large, infinite, or passed straight to a consumer like sum, max, or a file writer. Store a list when you need length, indexing, multiple passes, or sorting. The brackets you type are a memory decision now, not just punctuation.

Are generators faster than lists?

They start producing instantly and use almost no memory, which often makes programs faster overall, but the per-item cost is similar, and for small inputs a list comprehension is often slightly faster. The honest claim is not speed, it is scalability: the same code handles ten rows or ten billion.

What is yield from?

Delegation: yield from sub_generator() yields everything the inner generator produces, useful when composing pipelines out of pipelines. File it as recognition knowledge for reading library code.
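A minimal sketch with hypothetical stage names: csv_lines delegates to two smaller generators, and csv_lines_longhand spells out the same behavior with plain loops. For simple value passing the two are equivalent; yield from additionally forwards send() and throw() and evaluates to the sub-generator's return value, which this sketch does not use.

```python
def header():
    yield "name,score"

def rows(records):
    for name, score in records:
        yield f"{name},{score}"

def csv_lines(records):
    """Delegate to two sub-generators; yield from passes their values through."""
    yield from header()
    yield from rows(records)

def csv_lines_longhand(records):
    """The same output written with explicit loops."""
    for line in header():
        yield line
    for line in rows(records):
        yield line

data = [("Amina", 87), ("Zane", 91)]
print(list(csv_lines(data)))   # ['name,score', 'Amina,87', 'Zane,91']
```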

Do generators relate to async code?

Deeply: Python's async/await machinery grew directly out of generators' ability to pause and resume. Part 12 introduces asyncio, and the freeze-and-resume intuition you built today is most of the mental model you will need.

6. Recap and what comes next

You now know what every for loop in your career has actually been doing: iter, next, and StopIteration. You can write generator functions that pause at yield, choose between brackets and parentheses as a deliberate memory decision, chain generators into pipelines that process unlimited data in constant space, and recognize itertools as the toolbox for exactly this style. Files, ranges, zip, and pathlib globs have all quietly become the same thing to you, and that compression of concepts is what learning a language deeply feels like.

Next the course turns to communication: Part 10, type hints, where you learn to write down what your functions accept and return, catching bugs before running anything and making editors dramatically smarter. The Iterators and Generators lesson in the Learn Python app below includes trace-the-output quizzes that pair perfectly with today's material, and the full syllabus is on the series hub.


Pro tip

When a generator confuses you, drive it manually in the REPL: g = my_gen(), then next(g), next(g), next(g), watching every print and value. Thirty seconds of manual stepping replaces any amount of theorizing about where it pauses.
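The same manual stepping can be scripted. This sketch repeats the countdown generator from section 2 so it runs alone, and uses inspect.getgeneratorstate and inspect.getgeneratorlocals from the standard library to show the pause and the preserved locals between steps.

```python
import inspect

def countdown(n):
    print("starting!")
    while n > 0:
        yield n
        n -= 1
    print("done")

g = countdown(2)
print(inspect.getgeneratorstate(g))    # GEN_CREATED: the body has not run yet
print(next(g))                         # starting!, then 2
print(inspect.getgeneratorstate(g))    # GEN_SUSPENDED: frozen at the yield
print(inspect.getgeneratorlocals(g))   # {'n': 2}: locals survive the pause
print(next(g))                         # 1
try:
    next(g)                            # prints done, then raises StopIteration
except StopIteration:
    print("exhausted")
print(inspect.getgeneratorstate(g))    # GEN_CLOSED
```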

Practice on the go

Learn Python, the free Android app

Every topic in this series lives in the app too: bite-size lessons, runnable examples, quizzes, mini projects, and an offline Python playground that runs on your phone.
