pytest Tutorial: Testing Python from Zero (Part 13)

Here is a question that separates hobbyists from professionals: how do you know your code works? Until now your answer has been running it and looking, which is honest but does not scale. Change a function in a thousand-line project and looking cannot tell you which of the other forty functions you just broke. Automated tests answer the question permanently: small programs that check your programs, run in seconds, after every change, forever. This lesson teaches pytest, the tool the Python world has standardized on, and the habits that make tests an accelerant rather than a chore.

One reframe before the syntax, because attitude decides whether testing sticks. Tests are not bureaucracy added after the real work; they are how you go faster. With a test suite, you refactor fearlessly, upgrade dependencies casually, and accept your own six-month-old code without re-reading it, because the suite vouches for it. Every lesson in this course has quietly prepared you: pure functions from Part 4 that map inputs to outputs are precisely the code tests love most. Today the preparation pays off.

What you will learn in Part 13

assert: the one statement at the heart of all testing
pytest conventions: files, functions, and zero boilerplate
Reading failure output, which is where pytest earns its fame
Covering edge cases with parametrize
Testing exceptions with raises, and files with tmp_path
The red-green-refactor rhythm professionals actually use

Note

Before you start

You need functions from Part 4 and exceptions from Part 7. pytest itself cannot run inside the browser playground, so this lesson pairs the real terminal workflow with a playground that mimics a test runner, and every example transfers verbatim to your machine.

1. assert: the atom of testing

Python has a statement built for claims: assert takes an expression, does nothing when it is truthy, and raises AssertionError when it is not. Every test you will ever write reduces to asserts: call your function with known input, assert the output equals what you expect. That is the entire theory of unit testing, one sentence long. Everything pytest adds is organization, discovery, and spectacular failure reporting around this one statement.

def grade_for(score: int) -> str:
    if score >= 75:
        return "A"
    elif score >= 65:
        return "B"
    elif score >= 50:
        return "C"
    return "F"

assert grade_for(80) == "A"
assert grade_for(75) == "A"      # boundary: exactly 75
assert grade_for(74) == "B"      # boundary: just below
assert grade_for(0) == "F"
print("all claims hold")

Look at which inputs those asserts chose: not random scores but the boundaries, exactly 75, just below 75, and the extreme. Bugs live at edges: the off-by-one thresholds, the empty list, the zero, the None. Choosing edge-seeking inputs is the actual skill of testing. The mechanics you can learn in an afternoon; the instinct for where code breaks grows with every bug you meet, and tests are how you bottle each one so it never bites twice.
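To make the failing side of assert concrete, here is a minimal sketch using the same grade ladder. The optional message after the comma is standard Python; the try/except is only there so the demo keeps running and you can see the error text.

```python
def grade_for(score: int) -> str:
    # Same ladder as the example above.
    if score >= 75:
        return "A"
    if score >= 65:
        return "B"
    if score >= 50:
        return "C"
    return "F"


# A true claim passes silently.
assert grade_for(50) == "C"

# A false claim raises AssertionError; the optional message after the comma
# becomes the error text. Catching it here only keeps the demo running.
try:
    assert grade_for(49) == "C", f"expected C, got {grade_for(49)}"
except AssertionError as err:
    message = str(err)

print(message)
```

In a real test you would not catch the error; letting it escape is exactly how the test runner learns that the claim failed.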

2. pytest: conventions instead of ceremony

pytest, installed with pip install pytest, turns asserts into a managed suite through three conventions: test files are named test_something.py, test functions start with test_, and the body is plain asserts, with no classes, no registration, and no boilerplate. Run pytest in your project folder and it discovers every test, runs them all, and reports: a green dot per pass and a rich report per failure. That report is the reason pytest won. pytest rewrites the assert statements in your test files so that, on failure, it can show the actual values on each side of the comparison, and most failures are diagnosed before your coffee cools.

# test_grades.py
from grades import grade_for

def test_top_grade():
    assert grade_for(80) == "A"

def test_boundary_75_is_a():
    assert grade_for(75) == "A"

def test_failing_score():
    assert grade_for(20) == "F"

$ pytest
=========== test session starts ===========
collected 3 items

test_grades.py ..F                    [100%]

================ FAILURES =================
______________ test_failing_score _________
    def test_failing_score():
>       assert grade_for(20) == "F"
E       AssertionError: assert 'S' == 'F'
E         (diff shown by pytest)
========= 1 failed, 2 passed in 0.04s =====

Read that failure block once and you know everything: which test, which line, and the crucial fact that grade_for(20) returned "S" when the test expected "F", meaning someone added an S grade to the ladder and the test caught the behavior change instantly. This is the contract suites enforce: behavior changes loudly instead of silently, and the team decides whether the code or the expectation is wrong. Either answer is fine; not knowing is what suites abolish.
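The lesson does not show the changed grades.py, so the following is purely a hypothetical version that would produce the report above: a new "S" band (the threshold of 20 is an assumption made up for this sketch) slipped in below "C".

```python
def grade_for(score: int) -> str:
    if score >= 75:
        return "A"
    if score >= 65:
        return "B"
    if score >= 50:
        return "C"
    if score >= 20:          # hypothetical new band, not from the lesson
        return "S"
    return "F"


# The two passing tests still hold; the third expectation no longer does.
print(grade_for(80), grade_for(75), grade_for(20))
```

Whether the fix is to revert the band or to update test_failing_score depends on which one reflects the intended behavior.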

Checkpoint

What makes a function easy to test?

3. parametrize: one test, many cases

The grade ladder needs eight or nine boundary checks, and copying a test function nine times is exactly the duplication Part 4 taught you to refuse. pytest's parametrize decorator runs one test body across a table of cases, each reported individually, so a single failure names its exact input. The test reads like a specification of the function, which is the quiet ideal of this whole discipline: tests as the executable documentation of what the code promises.

import pytest
from grades import grade_for

@pytest.mark.parametrize("score, expected", [
    (100, "A"), (75, "A"),          # top band and its floor
    (74, "B"), (65, "B"),           # next band, both edges
    (64, "C"), (50, "C"),
    (49, "F"), (0, "F"),
])
def test_grade_bands(score, expected):
    assert grade_for(score) == expected
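To see what the decorator saves you, here is a rough hand-rolled equivalent in plain Python. It is only a sketch of the idea: real parametrize also gives every row its own test ID and its own line in the report, which this loop does not.

```python
def grade_for(score: int) -> str:
    if score >= 75:
        return "A"
    if score >= 65:
        return "B"
    if score >= 50:
        return "C"
    return "F"


CASES = [
    (100, "A"), (75, "A"),
    (74, "B"), (65, "B"),
    (64, "C"), (50, "C"),
    (49, "F"), (0, "F"),
]

# Run the same check over every row and remember which rows disagree.
failures = []
for score, expected in CASES:
    actual = grade_for(score)
    if actual != expected:
        failures.append((score, expected, actual))

print(f"{len(CASES) - len(failures)} passed, {len(failures)} failed")
```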

Two more tools complete the everyday kit. When a function should raise, asserting the happy path is not enough; pytest.raises asserts the failure contract from Part 7: with pytest.raises(ValueError): withdraw(100, -5). And when code touches files, the tmp_path fixture hands your test a fresh, automatically cleaned temporary folder, so the file-processing skills of Part 7 become testable without littering your disk. Fixtures go far deeper (shared setup, databases, fake servers), but raises and tmp_path cover a beginner's first year honestly.

import pytest
from billing import withdraw, load_scores

def test_negative_amount_rejected():
    with pytest.raises(ValueError):
        withdraw(balance=100, amount=-5)

def test_load_scores_skips_bad_lines(tmp_path):
    f = tmp_path / "grades.txt"
    f.write_text("Amina,87\nbroken\nZane,91\n")
    scores = load_scores(f)
    assert scores == [("Amina", 87), ("Zane", 91)]
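The billing module itself is not shown in this lesson. As a minimal sketch, assuming withdraw returns the new balance and load_scores reads "name,score" lines, functions like these would satisfy both tests. The exact rules (rejecting zero, the insufficient-funds check, skipping non-numeric scores) are assumptions for illustration.

```python
import tempfile
from pathlib import Path


def withdraw(balance: float, amount: float) -> float:
    """Return the new balance; reject amounts that make no sense."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    if amount > balance:
        raise ValueError("insufficient funds")  # assumed rule
    return balance - amount


def load_scores(path: Path) -> list[tuple[str, int]]:
    """Read 'name,score' lines, skipping malformed ones."""
    scores = []
    for line in Path(path).read_text().splitlines():
        name, sep, raw = line.partition(",")
        if not sep or not raw.strip().isdigit():
            continue  # no comma, or the score is not a whole number
        scores.append((name.strip(), int(raw)))
    return scores


# Quick check outside pytest, with a throwaway folder standing in for tmp_path.
with tempfile.TemporaryDirectory() as folder:
    f = Path(folder) / "grades.txt"
    f.write_text("Amina,87\nbroken\nZane,91\n")
    loaded = load_scores(f)

print(loaded)
```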

The word fixture deserves a proper definition since you just used one. A fixture is a named piece of setup that pytest builds and injects when a test asks for it by parameter name, which is exactly how tmp_path appeared without an import. You can define your own with the @pytest.fixture decorator (a sample inventory, a populated gradebook, a parsed config), and by default every test that names it receives a fresh copy, which is how suites share setup without sharing state. File the full machinery under "learn when needed"; recognizing the injection pattern is what today requires.
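Here is a small sketch of a home-made fixture, assuming pytest is installed. The gradebook data and the names make_gradebook and gradebook are hypothetical, chosen only to illustrate the injection pattern.

```python
import pytest


def make_gradebook() -> dict[str, list[int]]:
    """Hypothetical sample data, built from scratch on every call."""
    return {"Amina": [87, 92], "Zane": [91]}


@pytest.fixture
def gradebook() -> dict[str, list[int]]:
    # By default pytest calls this once for each test that lists
    # `gradebook` as a parameter, so every test gets its own copy.
    return make_gradebook()


def test_amina_has_two_scores(gradebook):
    assert len(gradebook["Amina"]) == 2


def test_mutation_stays_private(gradebook):
    gradebook["Zane"].append(0)      # changes only this test's copy
    assert gradebook["Zane"] == [91, 0]
```

Because the fixture rebuilds the data each time, the second test can modify its gradebook without affecting the first, whatever order they run in.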

As suites grow, a little organization keeps them pleasant. The convention is a tests folder beside your code with one test file per module (test_parsing.py exercising parsing.py), so the suite mirrors the codebase and nobody hunts for anything. pytest's -k flag runs a subset by name match, for example pytest -k slug while you work on slugify, and -x stops at the first failure when you want to fix one thing at a time. Later, a coverage tool will show which lines no test touches; treat the number as a flashlight for forgotten corners, not a score to maximize, because a meaningless assert can inflate coverage without protecting anything.

4. The rhythm: red, green, refactor

Tools in hand, here is how professionals actually move, a loop with three beats. Red: write a small failing test that pins down the next bit of behavior you want. Green: write the simplest code that passes it. Refactor: clean up with the suite standing guard. The discipline of starting red matters more than it looks, because a test you have never seen fail proves nothing; seeing it fail first proves it can. Walk one full cycle below, the same walkthrough rhythm as the package stepper in Part 8.

Guided walkthrough

One red-green-refactor cycle, beat by beat

Red: Specify the next behavior with a failing test

We want slugify() to turn titles into URL slugs. No code exists yet, so this fails immediately, and that failure is the specification.

def test_slugify_basic():
    assert slugify("Learn Python!") == "learn-python"

$ pytest -q
F  NameError: name 'slugify' is not defined

Green: Write the simplest passing code

Lowercase, strip non-word characters with Part 11 regex, join with hyphens. No cleverness; passing is the only goal of this beat.

import re

def slugify(title: str) -> str:
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)

$ pytest -q
.  1 passed

Red: Pin the next edge case

What about extra spaces and separators such as slashes? Add the cases to a parametrize table and watch them fail or pass honestly.

import pytest

@pytest.mark.parametrize("title, slug", [
    ("Learn Python!", "learn-python"),
    ("  Hello   World  ", "hello-world"),
    ("A/B Testing 101", "a-b-testing-101"),
])
def test_slugify(title, slug):
    assert slugify(title) == slug

Green: Adjust until the table passes

The regex approach already handles these. When a case fails instead, you fix the function, not the test, unless the expectation itself was wrong.

$ pytest -q
...  3 passed in 0.03s

Refactor: Clean up under green

Rename, extract, simplify, rerun. The suite makes refactoring safe, and safe refactoring keeps the codebase young. This beat is the payoff of the other two.

$ pytest -q   # after every small cleanup
...  3 passed
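As one possible outcome of the refactor beat, here is a sketch that names the pattern and compiles it once. The constant name _WORD is an arbitrary choice; the behavior is meant to be identical to the green version, which the three table cases confirm.

```python
import re

# Compiled once at import time; a run of lowercase letters or digits is one word.
_WORD = re.compile(r"[a-z0-9]+")


def slugify(title: str) -> str:
    """Turn a title into a lowercase, hyphen-separated URL slug."""
    return "-".join(_WORD.findall(title.lower()))


# The same three cases as the parametrize table, checked without pytest.
for title, slug in [
    ("Learn Python!", "learn-python"),
    ("  Hello   World  ", "hello-world"),
    ("A/B Testing 101", "a-b-testing-101"),
]:
    assert slugify(title) == slug
```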

Checkpoint

Why should you watch a new test fail before making it pass?

5. Practice: a test runner in the playground

pytest itself needs a real terminal, but its essence (discover functions named test_*, run them, report asserts) fits in about twenty lines of the Python you now know, reflection included. The playground below contains exactly that: a micro-runner, a function under test with a planted bug, and a small suite. Find the bug by reading the failures, fix it, and go green: the full professional loop in miniature. Then, on your machine, pip install pytest and run the real thing on the slugify example above.

Python playground

def word_stats(text: str) -> dict:
    """Return count and average length of words. One bug hides here."""
    words = text.split()
    total_len = sum(len(w) for w in words)
    return {
        "count": len(words),
        "avg_len": total_len / len(words),   # hint: what if no words?
        "longest": max(words, key=len) if words else "",
    }

# --- a micro test suite ---
def test_basic_counts():
    s = word_stats("learn python by doing")
    assert s["count"] == 4, f"count was {s['count']}"

def test_longest_word():
    assert word_stats("a bb ccc")["longest"] == "ccc"

def test_empty_text():
    s = word_stats("")
    assert s["count"] == 0 and s["avg_len"] == 0

# --- a micro pytest: discover, run, report ---
passed = failed = 0
for name, fn in sorted(globals().items()):
    if name.startswith("test_") and callable(fn):
        try:
            fn()
            print(f"PASS  {name}")
            passed += 1
        except Exception as err:
            print(f"FAIL  {name}: {err!r}")
            failed += 1
print(f"\n{passed} passed, {failed} failed")

# Exercises:
# 1. Fix word_stats so test_empty_text passes (guard the division).
# 2. Add test_punctuation asserting "hi, there!" has longest "there!".
#    Should it? Decide, and make the code match your decision.
# 3. Add a test for a single word, the other classic edge.

Exercise 2 smuggles in the deepest idea of the lesson: the test forced a design decision, whether punctuation belongs to words, and that is normal. Tests interrogate your intentions while the code is still soft. Teams discover most of their specification exactly this way, one awkward test case at a time, long before users do it for them in production.

Common mistakes to avoid

✕ Testing only the happy path with friendly inputs.
✓ Bugs live at edges: empty input, zero, one item, boundaries, malformed data. One edge test is worth five happy ones; the parametrize table makes edges cheap.

✕ Writing tests that depend on each other or on run order.
✓ Each test builds its own world (tmp_path, fresh objects) and asserts independently. Order-dependent suites rot fast and fail mysteriously.

✕ Asserting giant blobs, like a whole report string.
✓ Assert the meaningful facts: counts, totals, presence of key lines. Blob asserts break on every cosmetic change and teach the team to ignore failures.

✕ Skipping the failing-first step and trusting green.
✓ Deliberately break the code once and confirm the test goes red. Ten seconds of paranoia validates the entire safety net.
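The advice about blob asserts can be sketched as follows. build_report is a hypothetical function invented for this illustration; the point is that the test pins individual facts rather than comparing the whole string.

```python
def build_report(scores: dict[str, int]) -> str:
    """Hypothetical report builder, used only for this illustration."""
    lines = ["Grade report", "============"]
    for name, score in sorted(scores.items()):
        lines.append(f"{name}: {score}")
    lines.append(f"Average: {sum(scores.values()) / len(scores):.1f}")
    return "\n".join(lines)


def test_report_facts():
    report = build_report({"Zane": 91, "Amina": 87})
    lines = report.splitlines()
    # Assert what matters, not the exact layout of the whole report.
    assert "Amina: 87" in lines
    assert "Zane: 91" in lines
    assert lines[-1] == "Average: 89.0"
    assert lines.index("Amina: 87") < lines.index("Zane: 91")  # sorted by name


test_report_facts()
```

Changing the title or the underline would leave this test green, while a wrong average or a missing student would turn it red.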

Frequently asked questions

How much should a beginner test?

Every function with logic worth keeping: parsing, calculation, decisions. Skip trivial glue. The honest minimum that changes your life is a handful of edge-case tests on the functions you fear changing.

What is the unittest module I see in older tutorials?

The standard library framework that preceded pytest, class-based and more ceremonial. pytest runs unittest suites fine, and most new Python projects choose pytest; this course does too.

Should I write tests before or after the code?

Strict test-first is one school; the honest industry answer is both, fluidly. What is non-negotiable is that the test exists, that you saw it fail at least once, and that it runs on every change.

How do tests run automatically on every change?

Continuous integration: services run your suite on every push and block merges on red. When you reach our advanced series, the testing lesson there wires pytest into exactly that pipeline for a real web API.

6. Recap and what comes next

You now hold the professional's safety net: assert as the atom, pytest's conventions and luminous failure reports, parametrize for edge tables, raises for failure contracts, tmp_path for file work, and the red-green-refactor rhythm that turns all of it into a working style. Combined with type hints from Part 10, your code now declares its contracts and proves them, which is, in one sentence, what professional Python means. When you later test web services, the FastAPI testing lesson in our production series extends today directly.

And with that, the foundation track is complete. The final three lessons are the destination the whole syllabus has been walking toward, starting with Part 14, machine learning foundations with scikit-learn, where the data skills of Parts 5, 9, and 11 train their first real model. The Testing lesson in the Learn Python app below mirrors today's material, and the full syllabus is on the series hub.

Pro tip

Adopt the bug-to-test reflex today: every time you find a bug, write the test that exposes it, watch it fail, then fix the code and watch it pass. Your suite becomes a museum of every mistake you have ever made, which is precisely why none of them can quietly return.

Practice on the go

Learn Python, the free Android app

Every topic in this series lives in the app too: bite-size lessons, runnable examples, quizzes, mini projects, and an offline Python playground that runs on your phone.
