Machine Learning with scikit-learn for Beginners (Part 14)

Thirteen lessons of Python were the climb; this is the view. Machine learning sounds like a different universe (mathematics, GPUs, research papers), but here is the working truth: a beginner with solid Python, exactly the Python you now have, can train a real model that makes real predictions in under twenty lines and genuinely understand every one of them. This lesson does precisely that with scikit-learn, one of the most widely used libraries for introducing people to machine learning, and the playground below trains the model live, in your browser.

What makes this lesson different from the average ML quickstart is that we refuse to hand-wave. You will not just call fit and cheer; you will understand what a feature is, why we hide data from our own model, what accuracy does and does not tell you, and why models that memorize fail at the only job that matters. These foundations are exactly the ones the deep learning of Part 15 and the language models of Part 16 stand on, which is why this series gives them a full lesson instead of a paragraph.

What you will learn in Part 14

What machine learning actually is: rules learned from examples
Features, labels, and the shape of a learning problem
Classification versus regression, and which your problem is
The train/test split: why models must be graded on unseen data
fit, predict, score: the scikit-learn rhythm used by every model
Overfitting: the disease, the symptoms, and the basic cures

Who this is for

Anyone who finished Parts 1 to 13, especially 5, 9, and 11; no mathematics beyond arithmetic is assumed. If you only skim one section, make it the train/test split: it is the single idea that separates people who do ML from people who fool themselves with it.

1. Programming in reverse

Everything you have written so far follows one shape: you study the problem, you write the rules, the computer applies them to data. The grade ladder of Part 2 is rules first, answers second. Machine learning runs the arrow backwards: you supply examples of inputs with their correct answers, and the algorithm writes the rules, a learned function from input to output that you never spell out. The win is enormous for problems where the rules are unwritable by hand: nobody can type out the if-statements that recognize a face, detect a fraudulent transaction, or distinguish spam from a newsletter, yet examples of each are abundant.
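As a small preview of that reversed arrow, here is a hedged sketch: a pass mark written by hand, next to a model that recovers roughly the same rule from examples. The data is a toy set generated from the hand-written rule purely for illustration, and DecisionTreeClassifier is only explained properly in section 4.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Rules first: a human decides the threshold and types it in.
def passed_by_hand(score):
    return score >= 60

# Examples first: the algorithm only sees scores with their answers.
# (Toy data, generated from the rule above for illustration.)
scores = np.arange(0, 101).reshape(-1, 1)        # one feature per row
answers = (scores.ravel() >= 60).astype(int)     # 1 = pass, 0 = fail

learner = DecisionTreeClassifier(max_depth=1, random_state=0)
learner.fit(scores, answers)

# The single learned question is "score <= threshold?"
learned_threshold = learner.tree_.threshold[0]
print(learned_threshold)                 # 59.5
print(learner.predict([[59], [60]]))     # [0 1]
```

Nobody told the model about the number 60; it placed its cut halfway between the highest failing score and the lowest passing one.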

The vocabulary is smaller than its reputation. Each example is a row of features, the measurable input values, paired with a label, the correct answer. Learning from labeled examples is supervised learning, the kind this lesson covers and the kind behind most deployed ML. When the label is a category (spam or not, which flower species), the task is classification; when the label is a number (tomorrow's temperature, a house price), it is regression. Asking those two questions, supervised or not, category or number, is the first skill of the field, and it becomes a five-second habit once you have it.
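The vocabulary can be made concrete with a few arrays. This is a minimal sketch with made-up numbers; the variable names are hypothetical, and type_of_target is a scikit-learn utility used here only to show how the library itself tells a category label from a numeric one.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Features: one row per example, one column per measurable value.
# (Made-up numbers: word count and link count for four emails.)
X_emails = np.array([[120, 0], [15, 7], [300, 1], [22, 9]])

# Classification: the label is a category.
y_spam = np.array([0, 1, 0, 1])                      # 0 = not spam, 1 = spam

# Regression: the label is a number on a continuous scale.
y_temperature = np.array([21.5, 19.0, 23.2, 18.4])   # degrees, made up

print(X_emails.shape)                   # (4, 2): 4 examples, 2 features
print(type_of_target(y_spam))           # binary
print(type_of_target(y_temperature))    # continuous
```

A result of "binary" or "multiclass" points to classification, and "continuous" points to regression.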

Checkpoint

Predicting the selling price of a used laptop from its specs is...

2. Meet the dataset: 150 flowers

Every field has its hello world; machine learning's is the Iris dataset: 150 real flowers, published in 1936, each described by four features (sepal length and width, petal length and width, in centimeters) and one label among three species, 50 flowers of each. It is small enough to inspect by eye, real enough to be honest, and it is also the dataset behind the Iris ML Classifier mini project in the Learn Python app, so today you are building the same model the app carries in your pocket. In scikit-learn's terms, the features form a table X with 150 rows and 4 columns, and the labels an array y of 150 species codes; that X and y shape is universal, and once you can put any problem into it, you can model that problem.

from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

print(X.shape)                # (150, 4): 150 examples, 4 features
print(iris.feature_names)     # sepal/petal length and width
print(iris.target_names)      # ['setosa' 'versicolor' 'virginica']
print(X[0], "->", iris.target_names[y[0]])
# [5.1 3.5 1.4 0.2] -> setosa
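Before modeling, it is worth a quick look at what X and y contain. The sketch below counts the flowers per species and prints the range of each feature; it uses only NumPy calls on the arrays loaded above.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

# How many flowers carry each species code (0, 1, 2)?
counts = np.bincount(y)
print(counts)                          # [50 50 50]

# Smallest and largest value of each feature, in centimeters
for name, low, high in zip(iris.feature_names, X.min(axis=0), X.max(axis=0)):
    print(f"{name}: {low} to {high}")
```

Equal counts per species mean the dataset is balanced, which is part of why plain accuracy is a fair first metric here.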

3. The most important idea: hide some data

Here is the idea this lesson exists to plant. If you train a model on all 150 flowers and then measure its accuracy on those same 150 flowers, you have measured memory, not learning, like grading students on the exact worksheet they studied. The fix is the train/test split: set aside a slice of the data, train only on the rest, and grade exclusively on the hidden slice. The score on unseen data, generalization, is the only score that predicts how the model behaves in the real world, and every honest ML result you will ever read was measured this way.

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.25,      # hide a quarter for the exam
    random_state=42,     # reproducible shuffle
    stratify=y,          # keep species proportions equal in both parts
)
print(len(X_train), "to learn from,", len(X_test), "for the exam")
# 112 to learn from, 38 for the exam
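To see what stratify=y buys you, count the species on each side of the split. This is a small check using the same split parameters as above; the exact per-species counts are not asserted here, only that they stay within one flower of each other.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)

train_counts = np.bincount(y_train)    # flowers per species in the training part
test_counts = np.bincount(y_test)      # flowers per species in the hidden part
print("train per species:", train_counts)
print("test  per species:", test_counts)
```

Try removing stratify=y and rerunning: the counts can drift apart, which on a small dataset makes the exam slightly unfair to one species.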

4. fit, predict, score: the scikit-learn rhythm

scikit-learn's genius is one interface for everything. Every model is an object (your Part 6 literacy at work) with fit to learn from training data, predict to answer for new rows, and score to grade itself. Swap the model class and the rest of the script does not change, which turns trying alternatives, the daily reality of ML work, into a one-line edit. We start with a decision tree, a model that learns nested if-questions about the features: the grade ladder of Part 2, discovered from data instead of typed by hand. That makes it the perfect first model, because you can literally print what it learned.

from sklearn.tree import DecisionTreeClassifier, export_text

model = DecisionTreeClassifier(max_depth=3, random_state=42)
model.fit(X_train, y_train)                  # learning happens here

print(model.score(X_test, y_test))           # ~0.97 on UNSEEN flowers

new_flower = [[5.0, 3.4, 1.5, 0.2]]
pred = model.predict(new_flower)[0]
print("prediction:", iris.target_names[pred])   # setosa

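print(export_text(model, feature_names=iris.feature_names))
# Abridged, illustrative output (classes print as codes 0, 1, 2;
# the exact features and thresholds can vary with the split):
# |--- petal length (cm) <= 2.45
# |   |--- class: 0                      <- setosa
# |--- petal length (cm) >  2.45
# |   |--- petal width (cm) <= 1.75 ...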

Pause on that printed tree, because something quietly profound happened: the algorithm discovered, from examples alone, that petal length under 2.45 cm identifies setosa perfectly, a rule a botanist would also tell you, and it found it in milliseconds without ever being told what a petal is. That is machine learning in one screenful. It also previews the field's great trade-off: this tree is fully explainable, while the neural networks of Part 15 trade that transparency for the power to learn rules no one can print.
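You do not have to take the rule on trust. The check below applies "petal length at most 2.45 cm" by hand to all 150 flowers and compares it with the true labels; it assumes the column order shown in iris.feature_names, where petal length is the third column.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

petal_length = X[:, 2]                     # third column: petal length (cm)
rule_says_setosa = petal_length <= 2.45
truly_setosa = y == 0                      # species code 0 is setosa

print(np.array_equal(rule_says_setosa, truly_setosa))    # True
print(petal_length[truly_setosa].max())    # 1.9, the longest setosa petal
print(petal_length[~truly_setosa].min())   # 3.0, the shortest non-setosa petal
```

The gap between 1.9 and 3.0 is why a single question separates setosa cleanly; the other two species overlap and need the deeper branches.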

5. Overfitting: when models memorize

Now the disease every practitioner learns to fear. Remove the depth limit and the tree will grow until it classifies every training flower perfectly, carving hyper-specific rules around individual examples, including their noise. Training accuracy hits 100 percent; test accuracy stalls or drops. The model has overfit: memorized the worksheet, failed the exam. The signature is always the same, a wide gap between training and test scores, and you now know how to detect it, because you measure both. On a dataset as small and clean as Iris the gap can be modest; on noisier data it is often dramatic. The basic cures are simple: a simpler model (the max_depth you saw) and more data. The advanced ones, regularization and friends, are refinements of those two instincts.

deep = DecisionTreeClassifier(random_state=42)      # no limits
deep.fit(X_train, y_train)
print("train:", deep.score(X_train, y_train))       # 1.00, suspicious
print("test: ", deep.score(X_test, y_test))         # lower: the gap

shallow = DecisionTreeClassifier(max_depth=3, random_state=42)
shallow.fit(X_train, y_train)
print("train:", shallow.score(X_train, y_train))    # ~0.97
print("test: ", shallow.score(X_test, y_test))      # ~0.97: healthy
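It can help to put the gap and the size of each tree on one line. The sketch below wraps the comparison in a small helper (report is a hypothetical name, not part of scikit-learn) and uses the tree's get_depth and get_n_leaves methods to show how much more structure the unlimited tree grows.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)

def report(name, model):
    """Fit the model, then print train score, test score, gap, and tree size."""
    model.fit(X_train, y_train)
    train = model.score(X_train, y_train)
    test = model.score(X_test, y_test)
    print(f"{name:8s} train={train:.2f} test={test:.2f} "
          f"gap={train - test:+.2f} "
          f"depth={model.get_depth()} leaves={model.get_n_leaves()}")
    return train, test

deep_scores = report("deep", DecisionTreeClassifier(random_state=42))
shallow_scores = report(
    "shallow", DecisionTreeClassifier(max_depth=3, random_state=42))
```

More leaves means more, narrower rules; when the extra leaves do not improve the test score, they are memorization rather than learning.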

Checkpoint

A model scores 99% on training data and 71% on the test set. What are you looking at?

6. The whole workflow, end to end

Guided walkthrough

The five-step loop behind every ML project

Frame: Define features, label, and problem type

What is one example? Which measurable values are features, what is the label, and is it a category or a number? Misframing here wastes every later step.

Example: one flower
Features: 4 measurements (cm)
Label: species (3 classes)
Problem: supervised classification

Split: Hide the exam before studying

Call train_test_split before anything else touches the data. Decisions made after peeking at test data leak the exam into studying, the cardinal sin of the field.

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25,
    random_state=42, stratify=y)

Fit: Train a simple baseline first

A small tree or similar humble model sets the bar. Fancy models must beat the baseline to justify their complexity; often they barely do.

model = DecisionTreeClassifier(max_depth=3, random_state=42)
model.fit(X_tr, y_tr)

Evaluate: Grade on unseen data, and look past accuracy

score() summarizes; the classification report shows per-class behavior, which is where real datasets hide their problems, especially imbalanced ones.

from sklearn.metrics import classification_report
print(classification_report(
    y_te, model.predict(X_te),
    target_names=iris.target_names))
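A confusion matrix is a compact companion to the report: it shows which species get mistaken for which. The sketch below rebuilds the same split and baseline tree so it runs on its own; the row and column layout follows scikit-learn's confusion_matrix convention.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_tr, X_te, y_tr, y_te = train_test_split(
    iris.data, iris.target, test_size=0.25,
    random_state=42, stratify=iris.target)

model = DecisionTreeClassifier(max_depth=3, random_state=42)
model.fit(X_tr, y_tr)

# Rows are the true species, columns are the predicted species.
matrix = confusion_matrix(y_te, model.predict(X_te))
for name, row in zip(iris.target_names, matrix):
    print(f"{name:>10s}", row)
```

Numbers on the diagonal are correct predictions; anything off the diagonal names the exact confusion, typically versicolor versus virginica on this dataset.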

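Iterate: Adjust one thing, re-evaluate, repeat

Change the model, a hyperparameter, or the features; measure again on the same split. ML practice is this loop run patiently, with held-out data as referee. In this lesson the test set plays that role to keep things simple; on a real project, repeated tuning against the test set slowly leaks it, so tune with cross-validation or a separate validation slice and save the test set for the final grade.

for depth in (1, 2, 3, 5, None):
    # fixed random_state so that only max_depth changes between runs
    m = DecisionTreeClassifier(max_depth=depth, random_state=42)
    m.fit(X_tr, y_tr)
    print(depth, m.score(X_te, y_te))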
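One way to iterate without spending the test set is to compare depths by cross-validation on the training data and touch the test set only once. This is a hedged sketch of that idea, not the lesson's own procedure; the choice of 5 folds and the names cv_means and best_depth are assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)

cv_means = {}
for depth in (1, 2, 3, 5, None):
    m = DecisionTreeClassifier(max_depth=depth, random_state=42)
    # 5-fold cross-validation on the TRAINING data only
    cv_means[depth] = cross_val_score(m, X_tr, y_tr, cv=5).mean()
    print(depth, round(cv_means[depth], 3))

# Pick the depth with the best average validation score...
best_depth = max(cv_means, key=cv_means.get)

# ...then grade that single choice on the test set, once.
final = DecisionTreeClassifier(max_depth=best_depth, random_state=42)
final.fit(X_tr, y_tr)
final_test_score = final.score(X_te, y_te)
print("chosen depth:", best_depth, "final test accuracy:", final_test_score)
```

The test score is now an honest estimate, because no decision was made by looking at it.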

7. Practice: train it yourself, right here

The playground below runs the complete workflow live in your browser (scikit-learn loads on first run, so give it a moment) and adds a second model, k-nearest-neighbors, which classifies a flower by asking which species its closest training examples belong to. The overfitting comparison is left as the first exercise. The exercises are the real lesson: change one thing at a time and watch the referee, exactly as the walkthrough in section 6 prescribed. This is the same model behind the app's Iris mini project, so you can compare answers on the bus.

Python playground

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_tr, X_te, y_tr, y_te = train_test_split(
    iris.data, iris.target, test_size=0.25,
    random_state=42, stratify=iris.target)

tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(X_tr, y_tr)
print(f"decision tree  test accuracy: {tree.score(X_te, y_te):.2%}")

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_tr, y_tr)
print(f"k-nearest (5)  test accuracy: {knn.score(X_te, y_te):.2%}")

mystery = [[6.1, 2.8, 4.7, 1.2]]
print("tree says:", iris.target_names[tree.predict(mystery)[0]])
print("knn  says:", iris.target_names[knn.predict(mystery)[0]])

# Exercises:
# 1. Remove max_depth and print BOTH train and test accuracy for the
#    tree. Find the overfitting gap with your own eyes.
# 2. Try n_neighbors = 1 and 50. Why is 1 overfitting-flavored and
#    50 underfitting-flavored, on 112 training flowers?
# 3. Train on only the first two features (iris.data[:, :2]).
#    How much accuracy do petals turn out to be worth?

A closing note on what scikit-learn covers beyond today: regression models for numeric targets, clustering for unlabeled data, pipelines that bundle preprocessing with models, and cross-validation, which rotates the exam slice for sturdier grades on small datasets. All of it follows the fit/predict/score rhythm you now own. The CSV skills of Part 11 are how real datasets arrive, the testing discipline of Part 13 is how data code stays trustworthy, and the next lesson takes the one step scikit-learn does not: models that build their own features.
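To illustrate that the rhythm carries over to numeric targets, here is a minimal regression sketch. The data is synthetic (study hours and exam scores invented with a known slope of 5 and intercept of 40), and LinearRegression stands in for any regressor; note that for regressors score reports R squared rather than accuracy.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data: score = 5 * hours + 40, plus random noise.
rng = np.random.default_rng(0)
hours = rng.uniform(0, 10, size=(200, 1))            # one feature
exam_score = 5.0 * hours.ravel() + 40.0 + rng.normal(0, 2.0, size=200)

# Same split idea; stratify is for categories, so it is not used here.
X_tr, X_te, y_tr, y_te = train_test_split(
    hours, exam_score, test_size=0.25, random_state=42)

reg = LinearRegression()
reg.fit(X_tr, y_tr)                                  # same fit
r2 = reg.score(X_te, y_te)                           # same score, but R^2
print("learned slope:", round(reg.coef_[0], 2))
print("learned intercept:", round(reg.intercept_, 2))
print("R^2 on unseen data:", round(r2, 3))
print("predicted score for 6 hours:", reg.predict([[6.0]])[0])
```

The learned slope and intercept should land close to the 5 and 40 used to generate the data, which is the regression version of the tree rediscovering the petal rule.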

Common mistakes to avoid

✕ Evaluating on training data and reporting the number with a straight face.
✓ Always split first and report test accuracy. The training score is a diagnostic for overfitting, never a result.

✕ Letting information from the test set influence any decision before the final grade.
✓ Choosing features or models by peeking at test results is leakage, the field's quiet epidemic. Decide on training data; touch the test set once, at the end.

✕ Chasing a fancy model before establishing a simple baseline.
✓ Fit the small tree first. If the impressive model cannot beat it convincingly on the test set, the complexity is cost without benefit.

✕ Trusting accuracy alone on imbalanced data.
✓ A model predicting "no fraud" for everything scores 99% when fraud is 1% of rows. Read the per-class report; accuracy is a summary, not the truth.
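The last mistake is easy to reproduce. The sketch below builds a made-up label column with 1% fraud and scores a DummyClassifier that always predicts the most frequent class; the features are zeros because this model ignores them, and it is scored on the data it was fit on only because nothing is being learned.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

# Made-up labels: 10 fraud rows (1) among 1000 transactions (1%).
y_true = np.array([1] * 10 + [0] * 990)
X_dummy = np.zeros((1000, 1))          # placeholder features, ignored

lazy = DummyClassifier(strategy="most_frequent")
lazy.fit(X_dummy, y_true)
y_pred = lazy.predict(X_dummy)         # "no fraud" for every row

print(accuracy_score(y_true, y_pred))  # 0.99
print(recall_score(y_true, y_pred))    # 0.0: not one fraud case caught
```

Recall for the fraud class exposes what accuracy hides: the model is 99% accurate and entirely useless for the job it was built for.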

Frequently asked questions

How much math do I need to go further?

To use ML well: arithmetic, the willingness to read a chart, and the experimental discipline this lesson taught. To invent new methods: linear algebra, calculus, and probability. Build working things first; the math lands far better once you have intuition to attach it to.

Where do I find datasets to practice on?

scikit-learn ships several starters like Iris and digits. Beyond those, public portals like Kaggle offer thousands of real CSV datasets, and Part 11 gave you everything needed to load and clean them into X and y.

When is classic ML the right tool versus deep learning?

Tabular data, rows and columns of features, is classic ML territory, and trees and their ensembles often match or beat neural networks there while training in seconds. Deep learning earns its cost on images, audio, and language, exactly where Part 15 picks up.

What is the app's Iris mini project compared to this lesson?

The same dataset and model family, packaged to run on your phone with the steps annotated. Doing the lesson here and the project there, a day apart, is deliberate spaced repetition, one of the best-supported study techniques available to you.

8. Recap and what comes next

You trained your first real models and, more importantly, you trained them honestly: features and labels, classification versus regression, a stratified train/test split, the fit/predict/score rhythm, a printed decision tree you could explain to anyone, the overfitting gap and its cures, and the five-step loop that every ML project on earth runs. The mystique is gone; what remains is a craft, and you have started it.

Next, the model gets a brain made of layers: Part 15, deep learning with TensorFlow and Keras, where neurons, weights, and gradient descent stop being buzzwords and a network learns to read clothing photos. The Iris ML Classifier mini project in the Learn Python app below is today's perfect homework, and the syllabus lives on the series hub.

Pro tip

Keep a one-line experiment log whenever you touch a model: what you changed, and the test score. Five experiments without notes teach nothing; five with notes teach exactly what matters on your data, and the habit scales to every ML project you will ever run.
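One possible shape for such a log is a CSV file with one row per experiment, which reuses the Part 11 skills. This is a minimal sketch: log_experiment, the column names, and the scores in the demo are all hypothetical placeholders, and the demo writes to a temporary folder so it leaves nothing behind.

```python
import csv
import tempfile
from datetime import datetime
from pathlib import Path

def log_experiment(path, change, test_score):
    """Append one row (timestamp, what changed, test score) to a CSV log."""
    path = Path(path)
    is_new = not path.exists()
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["when", "change", "test_score"])   # header once
        writer.writerow([
            datetime.now().isoformat(timespec="seconds"),
            change,
            f"{test_score:.4f}",
        ])

# Demo with placeholder scores (not real results).
with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "experiments.csv"
    log_experiment(log_path, "baseline tree, max_depth=3", 0.95)
    log_experiment(log_path, "removed max_depth", 0.93)
    logged = log_path.read_text(encoding="utf-8").splitlines()

print(len(logged))      # 3: one header line plus two experiments
```

In a real project you would point log_experiment at a file next to your script and call it right after every model.score.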

Practice on the go

Learn Python, the free Android app

Every topic in this series lives in the app too: bite-size lessons, runnable examples, quizzes, mini projects, and an offline Python playground that runs on your phone.
