Overfitting and Underfitting

👋

Welcome Back!

Now that you’ve learned how to train and test models, it’s important to understand why they sometimes fail to generalize. In this lesson, we’ll explore overfitting and underfitting: how a model can be too complex or too simple, and how to strike the right balance for accurate predictions.

📉

Underfitting

Underfitting happens when the model is too simple and fails to capture the data's pattern. It performs poorly on both training and testing data.

For example, if the true relationship is a curve but the model fits a straight line, it will miss important trends.
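Here is a minimal sketch of that situation. It assumes made-up data where the true relationship is exactly y = x², and it uses NumPy's `Polynomial.fit` for least-squares fitting. The data, variable names, and the `mse` helper are illustrative choices, not part of the lesson.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: the average of the squared prediction errors."""
    return float(np.mean((y_true - y_pred) ** 2))

# Hypothetical data: the true relationship is a curve, y = x^2 (no noise).
x_train = np.arange(-3.0, 4.0)      # -3, -2, ..., 3
y_train = x_train ** 2
x_test = x_train[:-1] + 0.5         # unseen points between the training points
y_test = x_test ** 2

# Too simple: a straight line (degree-1 polynomial) fitted by least squares.
line = np.polynomial.Polynomial.fit(x_train, y_train, deg=1)
# Flexible enough: a degree-2 polynomial can represent the curve.
curve = np.polynomial.Polynomial.fit(x_train, y_train, deg=2)

line_train, line_test = mse(y_train, line(x_train)), mse(y_test, line(x_test))
curve_train, curve_test = mse(y_train, curve(x_train)), mse(y_test, curve(x_test))

print(f"line : train MSE = {line_train:.2f}, test MSE = {line_test:.2f}")
print(f"curve: train MSE = {curve_train:.2f}, test MSE = {curve_test:.2f}")
```

The straight line's error is high on both sets (MSE 12.00 on the training points and about 7.40 on the new points), while the degree-2 curve brings both down to essentially zero. Poor performance on training and test data alike is the signature of underfitting.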

An underfitted model is like a student who studies only the basics and can't even answer the practice questions well.

📈

Overfitting

Overfitting is when the model is too complex and fits the training data too closely, including noise. It predicts training data very well but fails on new data.

Think of a student who memorizes answers to practice questions but can't handle new ones. An overfitted model might have a very wiggly curve that hits every training point exactly.
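The "wiggly curve" can be sketched with a polynomial that has as many coefficients as there are training points. This example assumes made-up data: a simple trend y = x plus a fixed noise pattern of ±0.5, chosen so the run is reproducible. All names are illustrative.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: the average of the squared prediction errors."""
    return float(np.mean((y_true - y_pred) ** 2))

# Hypothetical data: trend y = x plus a fixed, made-up noise pattern
# (+0.5, -0.5, +0.5, ...) instead of random noise.
x_train = np.arange(8.0)                     # 0, 1, ..., 7
noise = 0.5 * (-1.0) ** np.arange(8)
y_train = x_train + noise

# New points between the training points, scored against the true trend.
x_test = x_train[:-1] + 0.5                  # 0.5, 1.5, ..., 6.5
y_test = x_test

# Too complex: with 8 points, a degree-7 polynomial can pass through all of them,
# so it reproduces the noise as well as the trend.
wiggly = np.polynomial.Polynomial.fit(x_train, y_train, deg=7)

wiggly_train = mse(y_train, wiggly(x_train))
wiggly_test = mse(y_test, wiggly(x_test))

print(f"train MSE = {wiggly_train:.2f}")
print(f"test  MSE = {wiggly_test:.2f}")
```

The curve hits every training point (training MSE of essentially 0), yet between those points it swings far from the trend, giving a test MSE of roughly 2.7. For scale, the noise itself only shifts each point by 0.5, so the model's errors on new data are much larger than the noise it memorized.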

✨

Just Right

The goal is to find a model that generalizes well: it captures the true patterns without memorizing noise. In a plot, this looks like a smooth curve through the data.

It's like a good student who understands the material and can answer new questions, not just the ones already practiced. We adjust the model's complexity (for example, by adding or removing features) to reach this balance.

📊

Visualizing Fit

Imagine your data plotted as points on a graph. An underfitting model might draw a flat or overly simple line through them (missing the trend), while an overfitting model would zig-zag to hit every point.

A well-fitted model draws a smooth curve that goes near most points without twisting to hit all of them.
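The three pictures can also be compared numerically. The sketch below assumes made-up data (a curved trend plus a fixed ±0.5 noise pattern) and uses the polynomial degree as a stand-in for model complexity. The labels and names are illustrative.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: the average of the squared prediction errors."""
    return float(np.mean((y_true - y_pred) ** 2))

def trend(x):
    """Hypothetical true relationship: a parabola centred at x = 3.5."""
    return (x - 3.5) ** 2

# Training points carry a fixed, made-up noise pattern; test points lie
# between them and are scored against the true trend.
x_train = np.arange(8.0)
y_train = trend(x_train) + 0.5 * (-1.0) ** np.arange(8)
x_test = x_train[:-1] + 0.5
y_test = trend(x_test)

degrees = {"too simple": 1, "about right": 2, "too complex": 7}
results = {}
for label, degree in degrees.items():
    model = np.polynomial.Polynomial.fit(x_train, y_train, deg=degree)
    results[label] = (mse(y_train, model(x_train)), mse(y_test, model(x_test)))
    print(f"{label:<12} (degree {degree}): "
          f"train MSE = {results[label][0]:6.2f}, test MSE = {results[label][1]:6.2f}")

# The model with the lowest error on unseen points generalizes best.
best = min(results, key=lambda label: results[label][1])
print("lowest test error:", best)
```

Training error keeps falling as the degree rises (about 21.24, then 0.24, then 0.00). Test error falls and then rises again (about 13.57, then 0.01, then 2.71), so the degree-2 model is the one that goes near the points without twisting to hit them.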

⚖️

Bias vs Variance

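Underfitting corresponds to high bias: the model makes overly strong, simplifying assumptions and misses real patterns. Overfitting corresponds to high variance: the model is so sensitive to the particular training data that its predictions change substantially when that data changes. We need to balance the two.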

For example, adding more data can reduce variance (less overfitting), and making the model a bit more flexible can reduce bias (less underfitting).
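For squared error, this balance can be written down exactly. The decomposition below is the standard one, stated under assumptions the lesson does not spell out: the data follow y = f(x) + ε with zero-mean noise of variance σ², the fitted model is written as f̂, and the expectation is taken over different training sets and the noise.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

A model that is too simple has a large first term, a model that is too flexible has a large second term, and no model can remove the third.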

🛡️

Avoiding Overfitting

To prevent overfitting, we can reduce model complexity or gather more data. Techniques like regularization (adding a penalty for complexity to the training objective) and early stopping (halting training once validation performance stops improving) help too.
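As one concrete form of regularization, here is a minimal sketch of ridge (L2) regression on polynomial features, solved in closed form with NumPy. The data, the helper names `poly_features` and `ridge_fit`, and the penalty values are assumptions chosen for illustration. Other penalties, such as L1, follow the same idea.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: the average of the squared prediction errors."""
    return float(np.mean((y_true - y_pred) ** 2))

def poly_features(x, degree):
    """Design matrix with columns 1, x, x^2, ..., x^degree."""
    return np.vander(x, degree + 1, increasing=True)

def ridge_fit(X, y, lam):
    """Minimise ||X w - y||^2 + lam * ||w||^2 using the closed-form solution.

    For simplicity the intercept column is penalised as well.
    """
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Hypothetical data: trend y = 2x on [-1, 1] plus a fixed, made-up noise pattern.
x_train = np.linspace(-1.0, 1.0, 8)
y_train = 2.0 * x_train + 0.5 * (-1.0) ** np.arange(8)
X_train = poly_features(x_train, degree=7)   # flexible enough to memorise 8 points

weight_norms, train_errors = [], []
for lam in [0.0, 0.01, 1.0]:
    w = ridge_fit(X_train, y_train, lam)
    weight_norms.append(float(np.linalg.norm(w)))
    train_errors.append(mse(y_train, X_train @ w))
    print(f"lambda = {lam:<5}: ||w|| = {weight_norms[-1]:8.2f}, "
          f"train MSE = {train_errors[-1]:.4f}")
```

As the penalty `lam` grows, the coefficient norm shrinks and the training error rises: the model gives up some training accuracy in exchange for a smoother, less wiggly curve. How much penalty actually helps on new data depends on the problem, so `lam` is usually tuned on held-out data.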

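Cross-validation can also help us notice overfitting early: the model is trained and evaluated several times on different splits of the data, and a large gap between training and validation scores is a warning sign.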
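A minimal sketch of k-fold cross-validation, written by hand with NumPy so each step is visible. The helper `k_fold_mse`, the synthetic data, and the candidate degrees are assumptions for illustration. Libraries such as scikit-learn offer ready-made versions.

```python
import numpy as np

def k_fold_mse(x, y, degree, k=5, seed=0):
    """Average held-out MSE of a degree-`degree` polynomial over k folds."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(x))        # shuffle the sample indices once
    folds = np.array_split(indices, k)       # k roughly equal validation sets
    fold_errors = []
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        # Fit on k-1 folds, then score on the fold that was held out.
        model = np.polynomial.Polynomial.fit(x[train_idx], y[train_idx], deg=degree)
        fold_errors.append(np.mean((y[val_idx] - model(x[val_idx])) ** 2))
    return float(np.mean(fold_errors))

# Hypothetical data: a quadratic trend with random noise (fixed seed).
rng = np.random.default_rng(42)
x = np.linspace(-3.0, 3.0, 30)
y = x ** 2 + rng.normal(0.0, 0.3, size=x.shape)

# The same seed inside k_fold_mse means every degree is scored on the same splits.
cv_scores = {degree: k_fold_mse(x, y, degree) for degree in (1, 2, 10)}
for degree, score in cv_scores.items():
    print(f"degree {degree:>2}: cross-validated MSE = {score:.3f}")
```

Every point is used for validation exactly once, so the averaged score estimates how each candidate would behave on new data. The degree-1 model scores far worse than the degree-2 model here because it underfits. A model whose training error sits far below its cross-validated error is a likely overfitter.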

Quiz: Overfit or Underfit

If a model gets 100% accuracy on training data but only 50% on new data, this model is:

A. Underfitting

B. Perfect

C. Overfitting

D. Suffering from data leakage

Fill in the Blank

A model that is too simple and cannot capture the underlying pattern is experiencing ___.

💡 Choose the correct word from the options below to complete the sentence.

Underfitting

Overfitting

Bias

Variance

Reflection

💭

Think of learning to play a song on guitar. What would underfitting and overfitting look like in this scenario?

How would you adjust your practice to get it "just right" (enough practice for mastery, but not rote memorization)?

Lesson Completed!

Excellent work! You now understand overfitting and underfitting, two concepts that are crucial for building effective ML models!