LLMs and RAG in Python: An Introduction (Part 16)

Welcome to the final lesson of the track, and to the technology that probably brought many of you here in the first place. Large language models, the engines behind modern AI assistants, are not magic and not a different discipline: they are the neural networks of Part 15 scaled up and trained on text, and using them from Python is easier than most things you have already done in this course. This lesson gives you the working concepts, your first real API call, and a toy retrieval system you can run right now. Then it does something deliberately different. Because this blog already teaches the advanced material in depth, it hands you the map.

That is worth saying plainly. Where Parts 14 and 15 went deep because the topics deserved fresh ground, this lesson stays intentionally light and then routes you into two full series we have already published: a twelve-part track that builds production LLM applications and an eight-part track on connecting AI agents to your tools. Sixteen lessons brought you from print("Hello, world!") to the doorstep of that material, and today you walk through it with everything you need.

★

What you will learn in Part 16

What an LLM actually is, in the language of Part 15
Tokens, context windows, and why both shape everything
Prompts as the new programming surface, with working patterns
Your first LLM API call from Python, the modern way
What RAG is, with a toy retrieval engine you run in the browser
The complete map of where to go next, series by series

Info

Who this is for

Anyone who has completed the track, especially Parts 14 and 15, whose concepts this lesson reuses constantly. The API example needs a key from an LLM provider; everything else, including the retrieval playground, runs free in the page.

1. What an LLM is, in one honest paragraph

Take the ideas you already own: a neural network with weights learned by gradient descent on a loss. Scale the network to billions of weights, make the training data a vast slice of human text, and make the task deceptively simple: predict the next token, over and over, across that entire corpus. To get good at that one game, the network is forced to internalize grammar, facts, styles, and reasoning patterns, and generating text is just playing the game forward: predict a token, append it, predict the next. Chat assistants receive extra fine-tuning on top so that they follow instructions, but next-token prediction remains the foundation. Finally, recall the transfer learning insight from Part 15, a pretrained giant adapted cheaply to new tasks; the LLM era is that idea at planetary scale, where the adaptation is often nothing more than the words you send it.
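
Next-token prediction is easiest to feel at toy scale. The sketch below is a hypothetical stand-in for an LLM, nowhere near a neural network: it counts which word follows which in a tiny corpus (a bigram model), then generates by repeatedly predicting the most likely next word and appending it. Real models predict tokens rather than whole words and use learned weights rather than counts, but the generation loop has the same shape.

```python
from collections import Counter, defaultdict

def train_bigrams(corpus: str) -> dict[str, Counter]:
    """Count, for every word, which words follow it (a toy 'model')."""
    words = corpus.lower().split()
    follows: dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def generate(model: dict[str, Counter], start: str, steps: int) -> str:
    """Play the game forward: predict a word, append it, predict again."""
    output = [start]
    for _ in range(steps):
        candidates = model.get(output[-1])
        if not candidates:                  # no known continuation: stop early
            break
        next_word, _count = candidates.most_common(1)[0]  # greedy: most frequent
        output.append(next_word)
    return " ".join(output)

corpus = "the cat sat on the mat the cat ate the fish"
model = train_bigrams(corpus)
print(generate(model, "the", 4))            # prints "the cat sat on the"
```

Always picking the single most likely word is called greedy decoding; real systems usually sample from the predicted probabilities instead, which is why the same prompt can produce different answers.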

Two practical concepts govern everything you will do with these models. Tokens are the units text is chopped into, roughly three-quarters of an English word each; models read and write in tokens, and you pay per token. The context window is the model's working memory: the maximum number of tokens it can consider at once. It is large today but finite, and that limit is why long documents need the retrieval techniques in section 4. When an assistant seems to forget the start of a long conversation, you are usually watching the oldest messages drop out of a finite context window, not a bug.
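
Both ideas can be made concrete with a rough sketch. Real tokenizers are model-specific, so the `estimate_tokens` helper below is an assumption: it uses the common rule of thumb of about four characters of English per token. `trim_history`, also hypothetical, shows in simplified form what a chat application typically does when a conversation outgrows its budget: keep the newest messages that fit and drop the oldest.

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about 4 characters of English per token."""
    return max(1, len(text) // 4)

def trim_history(messages: list[str], budget: int) -> list[str]:
    """Keep the most recent messages whose estimated tokens fit the budget."""
    kept: list[str] = []
    used = 0
    for message in reversed(messages):       # walk from newest to oldest
        cost = estimate_tokens(message)
        if used + cost > budget:
            break                             # everything older is dropped
        kept.append(message)
        used += cost
    return list(reversed(kept))               # restore chronological order

history = ["a" * 40, "b" * 40, "c" * 40]      # roughly 10 tokens each
print(trim_history(history, budget=25))      # the oldest message falls out
```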

Checkpoint

At its core, what task is a large language model trained on?

2. Prompts: programming in plain language

You direct an LLM with a prompt, and prompting rewards exactly the skills this course drilled: precision, structure, and decomposed problems. The patterns that earn their keep daily are to give the model a role, state the task explicitly, supply the relevant context, constrain the output format, and show an example when format matters. Vague in, vague out; the discipline is the same one that made you name your variables honestly in Part 1. Assemble a few prompts from those five parts and notice how naturally the structure carries over from everything you have practiced.

3. Your first LLM call from Python

Now the moment the whole course has quietly been building toward. Calling a model is ordinary Python: install a package, read an API key from the environment (the os.environ habit from Part 8), send messages, and receive text. The example uses the Anthropic SDK and Claude; other providers' SDKs differ mostly in naming and details. Count how many of your skills appear in a few lines: imports, f-strings, dictionaries and lists, environment variables, file reading, and a type-hinted function you could build on for real use.

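# pip install anthropic          # then set the ANTHROPIC_API_KEY environment variable
import os
from anthropic import Anthropic

# Reading the key explicitly fails fast with a KeyError if it is missing;
# Anthropic() with no arguments would also pick it up from the environment.
client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

def summarize(text: str) -> str:
    response = client.messages.create(
        model="claude-opus-4-8",     # use a current model ID from the provider's docs
        max_tokens=300,               # upper bound on the length of the reply
        messages=[{
            "role": "user",
            "content": f"Summarize in two sentences:\n\n{text}",
        }],
    )
    return response.content[0].text  # text of the reply's first content block

with open("article.txt", encoding="utf-8") as f:
    article = f.read()
print(summarize(article))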

Production reality adds three concerns you are already equipped for: API calls sometimes fail, which is Part 7's exceptions territory; they cost money per token, so you measure and cap usage; and they are slow enough that concurrent calls call for Part 12's asyncio. Those three threads are exactly where our advanced series picks up, starting with the full lesson on the request loop, tokens, cost, and retries, and continuing with structured outputs (getting reliable JSON from a model), which turns Part 11's validation discipline loose on AI responses.
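
Two of those concerns fit in one small sketch. It deliberately avoids the real SDK: `fake_model_call` is a hypothetical stand-in that fails on its first attempt for each prompt, so you can watch the retry path run without an API key. `with_retries` retries with exponential backoff plus a little random jitter, and an `asyncio.Semaphore` caps how many calls are in flight at once, the pattern from Part 12. In real code you would catch your SDK's specific exception types rather than the made-up `TransientError`, and check its documentation for built-in retry settings.

```python
import asyncio
import random

class TransientError(Exception):
    """Stand-in for a rate-limit or network error (hypothetical)."""

attempts: dict[str, int] = {}

async def fake_model_call(prompt: str) -> str:
    """Hypothetical model call: fails once per prompt, then succeeds."""
    attempts[prompt] = attempts.get(prompt, 0) + 1
    await asyncio.sleep(0.01)                 # pretend network latency
    if attempts[prompt] == 1:
        raise TransientError(f"temporary failure for {prompt!r}")
    return f"summary of {prompt}"

async def with_retries(prompt: str, retries: int = 3,
                       base_delay: float = 0.05) -> str:
    """Retry transient failures, doubling the base wait after each attempt."""
    for attempt in range(retries + 1):
        try:
            return await fake_model_call(prompt)
        except TransientError:
            if attempt == retries:
                raise                         # out of retries: let it surface
            delay = base_delay * 2 ** attempt
            await asyncio.sleep(delay + random.uniform(0, delay))  # jitter
    raise AssertionError("unreachable")

async def summarize_all(prompts: list[str], max_concurrent: int = 2) -> list[str]:
    """Run many calls concurrently, but never more than max_concurrent at once."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def limited(prompt: str) -> str:
        async with semaphore:
            return await with_retries(prompt)

    return await asyncio.gather(*(limited(p) for p in prompts))

results = asyncio.run(summarize_all(["doc1", "doc2", "doc3"]))
print(results)                                # results come back in input order
```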

4. RAG: giving the model your documents

LLMs know what their training data contained, which means they know nothing about your notes, your company wiki, or anything written yesterday. Retrieval-Augmented Generation is the standard fix, and the idea is disarmingly simple: store your documents, retrieve the few passages most relevant to the user's question, and paste them into the prompt as context before asking. The model answers from your material rather than its memory, citations become possible, and the context window stays manageable because you send only the relevant slice.

Real systems retrieve with embeddings, vectors that place similar meanings near each other, produced by neural networks doing exactly what Part 15 taught. But the architecture is visible with a much humbler retriever, and building one is the perfect capstone exercise: the playground below implements a tiny RAG pipeline over a personal notes collection using bag-of-words similarity, pure Part 5 and Part 11 Python. It is the same shape as the RAG Notes Search mini project in the Learn Python app, and the same skeleton our advanced series upgrades into a production RAG service with chunking, embeddings, and vector search.

Python playground

import math
import re
from collections import Counter

NOTES = {
    "venv":    "Create a virtual environment with python -m venv .venv "
               "and activate it before installing packages with pip.",
    "fstring": "F-strings format values inline: f'{name} scored {score}' "
               "and support format specs like {price:,.2f}.",
    "pytest":  "Run pytest in the project folder; it discovers files "
               "named test_*.py and functions starting with test_.",
    "asyncio": "Use asyncio.gather to run many coroutines concurrently "
               "and a Semaphore to limit how many run at once.",
}

def vectorize(text: str) -> Counter:
    # Bag of words: lowercase alphabetic tokens mapped to their counts.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def similarity(a: Counter, b: Counter) -> float:
    # Cosine similarity; indexing a Counter with an unseen word returns 0.
    dot = sum(a[w] * b[w] for w in a)
    mag = math.sqrt(sum(v*v for v in a.values())) * \
          math.sqrt(sum(v*v for v in b.values()))
    return dot / mag if mag else 0.0

def retrieve(question: str, k: int = 2) -> list[str]:
    # Rank every note by similarity, keep the top k, and drop notes
    # that share no words with the question (score 0).
    qv = vectorize(question)
    scored = sorted(((similarity(qv, vectorize(text)), text)
                     for text in NOTES.values()),
                    key=lambda pair: pair[0], reverse=True)
    return [text for score, text in scored[:k] if score > 0]

question = "how do I run my tests?"
context = retrieve(question)
print("retrieved context:")
for c in context:
    print(" -", c)

prompt = ("Answer using ONLY this context:\n"
          + "\n".join(context)
          + f"\n\nQuestion: {question}")
print("\nprompt we would send to the LLM:\n")
print(prompt)

# Exercises:
# 1. Ask about virtual environments and check the right note wins.
# 2. Add three notes of your own and ask questions against them.
# 3. Return (score, text) pairs and print the scores. Where does
#    bag-of-words retrieval embarrass itself? (Try synonyms.)

Exercise 3 is the honest one: ask about "isolated project dependencies" and bag-of-words gives the venv note a score of zero, ranking the pytest note first only because the question and that note share the word "project". It matches words, not meaning. That precise failure is what embeddings are designed to fix, and you now understand both the problem and the shape of the solution, which is more than many people shipping AI features can claim. Beyond RAG lies the next rung: models that take actions through tools you define, taught in the agent lesson of our production series and, for the standard protocol that connects agents to real systems, in the entire Build an MCP Server in Python series.
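
The core of tool use is less mysterious than it sounds: the model replies with a request naming a tool and its arguments, your code runs the matching Python function, and the result goes back to the model. The sketch below shows only the dispatch step, under loud assumptions: the JSON shape `{"tool": ..., "args": ...}` and the `get_word_count` tool are hypothetical, and real providers define their own tool-call formats, which the agent lesson and the MCP series cover properly.

```python
import json
from collections.abc import Callable
from typing import Any

def get_word_count(text: str) -> int:
    """A tiny example tool (hypothetical)."""
    return len(text.split())

# Registry of the only functions the model is allowed to trigger, keyed by name.
TOOLS: dict[str, Callable[..., Any]] = {"get_word_count": get_word_count}

def dispatch(model_reply: str) -> Any:
    """Parse a (hypothetical) tool request and run the matching function."""
    request = json.loads(model_reply)
    name = request["tool"]
    if name not in TOOLS:                     # never run names you did not register
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**request.get("args", {}))

reply = '{"tool": "get_word_count", "args": {"text": "retrieval first, generation second"}}'
print(dispatch(reply))                        # prints 4
```

The registry is the important design choice: the model can only ask, and your code decides what is actually allowed to run.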

Checkpoint

What problem does RAG solve?

5. The map: where to go from here

This is the routing section, and it is the most important part of the lesson, because whether learning sticks depends above all on whether you keep building. Three roads leave this trailhead, all on this blog, all hands-on in the style you are now used to. Choose by appetite, not by obligation.

Your three roads from here
If you want to...	Go to	You will build
Ship real AI-powered web applications	From Python to Production LLM Apps (12 parts)	FastAPI services, reliable JSON from models, streaming, RAG with real embeddings, and an agent API
Connect AI agents to your own tools and data	Build an MCP Server in Python (8 parts)	A tested, secured, deployed MCP server that Claude and other agents can actually call
Cement fundamentals with pocket-size practice	The Learn Python app (below)	All 24 topics of this track as lessons, quizzes, and 19 mini projects with an offline playground

The natural sequence for most readers: start the production LLM track at part one, where the modern tooling lesson upgrades your Part 8 environment skills with uv and friends, then let the async deep dive grow your Part 12 foundations into production concurrency. The MCP series fits beautifully after, or alongside if agents are your itch. And whatever road you take, the app in your pocket keeps the fundamentals warm with five-minute sessions; spaced repetition is unreasonably effective, and you have a syllabus-shaped app for exactly that.

Common mistakes to avoid

✕Pasting API keys into your code and committing them.

✓Keys live in environment variables or a .env file that never enters version control. One leaked key is one unpleasant bill; os.environ from Part 8 is the habit.
✕Trusting model output the way you trust a function's return value.

✓Models fabricate fluently. Validate structured outputs like Part 11 taught, ground factual answers with RAG, and keep a human in the loop where mistakes cost.
✕Sending entire documents when a question concerns one paragraph.

✓Tokens are money and context windows are finite. Retrieval first, generation second; that is the entire RAG insight in operational form.
✕Stopping here because the track ended.

✓Knowledge unexercised evaporates in weeks. Pick the next series today, schedule the first lesson, and keep the app within thumb's reach for the gaps between.
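
Validating structured output, the fix for the second mistake above, can be sketched with the standard library alone. Assume you asked the model to reply with JSON containing a `summary` string and a `confidence` number between 0 and 1 (both field names are hypothetical). `parse_reply` treats the reply as untrusted input: it rejects invalid JSON, missing fields, and wrong types instead of passing them along. Libraries such as Pydantic do the same job more thoroughly.

```python
import json
from dataclasses import dataclass

@dataclass
class Summary:
    summary: str
    confidence: float

def parse_reply(raw: str) -> Summary:
    """Validate a model's JSON reply before the rest of the program trusts it."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model did not return valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    summary = data.get("summary")
    confidence = data.get("confidence")
    if not isinstance(summary, str) or not summary.strip():
        raise ValueError("'summary' must be a non-empty string")
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("'confidence' must be a number")
    if not 0 <= confidence <= 1:
        raise ValueError("'confidence' must be between 0 and 1")
    return Summary(summary, float(confidence))

good = parse_reply('{"summary": "RAG grounds answers.", "confidence": 0.9}')
print(good)
```

When validation fails, a common next step is to re-ask the model with the error message included, up to a small retry limit.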

Frequently asked questions

Do I need to understand transformers to build with LLMs?

No more than you needed compiler theory to learn Python. The mental model from this lesson (next-token prediction, tokens, context windows, prompts), plus the engineering discipline of this course, is what building requires. Architecture internals are a fascinating elective, not a prerequisite.

Are LLM APIs expensive to learn on?

Experimenting costs little: a short call costs anywhere from a fraction of a cent to a few cents, depending on the model, and the request-loop lesson in the advanced track teaches measuring and capping from day one. The free playground exercises here cost nothing at all.

Can I run models locally instead of using APIs?

Yes: smaller open-weight models run on consumer hardware through tools like Ollama, with the same prompt-in, text-out shape. The concepts in this lesson transfer directly; start with APIs for simplicity, and go local when privacy or cost calls for it.

Sixteen parts later, am I actually a Python programmer now?

You write functions with type hints and tests, wield the standard library, process real data, understand concurrency, and have trained models of your own. That is not aspiration; that is a skill set. The remaining ingredient is mileage, and the three roads above are made of it.

6. The view from the summit

Look back for a moment, because the distance deserves it. Part 1 taught you what a variable is. Since then: decisions and loops, functions, the four great data structures, classes, exceptions and files, the standard library, generators, type hints, regex and data formats, concurrency, testing, machine learning, deep learning, and today, the models reshaping the industry, called from Python you can read. Every lesson ran in your browser, every concept got a quiz, and every topic lives in the app on your phone for the long game of retention.

The series ends; the syllabus does not. The series hub keeps all sixteen lessons organized for revisiting, the production track and the MCP track are waiting, and the Learn Python app below carries the whole curriculum in your pocket. Thank you for doing the work: every playground run, every quiz, every exercise. Good luck out there. Now go build something.

💡

Pro tip

Tonight, while it is fresh: get an API key, run the summarize() example on a real article, and then point the RAG playground pattern at five of your own notes. The gap between "finished a course" and "has built with LLMs" is one evening, and crossing it tonight makes everything that follows feel inevitable.

Practice on the go

Learn Python, the free Android app

Every topic in this series lives in the app too: bite-size lessons, runnable examples, quizzes, mini projects, and an offline Python playground that runs on your phone.
