Evaluating Models

👋

Welcome Back!

After understanding the pitfalls of overfitting and underfitting, we now focus on measuring a model’s performance. You’ll learn key metrics such as accuracy, precision, and recall, which help determine whether a model is truly effective for its intended task.

🎯

Accuracy

Accuracy is the fraction of all predictions that are correct. If a model correctly labels 95 out of 100 test examples, its accuracy is 95%.

Accuracy is easy to understand but can be misleading when classes are imbalanced. For example, if 99% of emails are not spam, a model that always predicts "not spam" is 99% accurate yet never catches a single spam message.
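To see the imbalance problem in numbers, here is a minimal sketch in plain Python. The `accuracy` helper and the 100-email dataset are made up for illustration; they are not part of any library.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical inbox: 99 non-spam emails (label 0) and 1 spam email (label 1).
y_true = [0] * 99 + [1]

# A "model" that ignores its input and always predicts "not spam".
always_not_spam = [0] * 100

baseline_accuracy = accuracy(y_true, always_not_spam)
spam_caught = sum(t == 1 and p == 1 for t, p in zip(y_true, always_not_spam))

print(baseline_accuracy)  # 0.99
print(spam_caught)        # 0
```

The score looks excellent, but the count of caught spam shows the model does nothing useful for the class we care about.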

🔍

Precision

Precision is the fraction of positive predictions that were actually correct. In spam detection: of all emails flagged as spam, what fraction were truly spam?

A high precision means few false alarms. It answers: "When the model predicts positive, how often is it right?"
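As a rough sketch, precision can be computed by counting true positives (TP) and false positives (FP) and taking TP / (TP + FP). The `precision` helper, the labels, and the choice to return 0.0 when nothing is flagged are assumptions made for this example.

```python
def precision(y_true, y_pred, positive=1):
    """Of everything predicted positive, what fraction really is positive?"""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # correct flags
    fp = sum(t != positive and p == positive for t, p in pairs)  # false alarms
    if tp + fp == 0:
        return 0.0  # assumed convention: no positive predictions at all
    return tp / (tp + fp)


# Hypothetical labels: 1 = spam, 0 = not spam.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]  # 3 emails flagged, 2 of them truly spam

spam_precision = precision(y_true, y_pred)
print(round(spam_precision, 3))  # 0.667
```

Here one of the three flagged emails was a false alarm, so precision is 2 out of 3.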

🎣

Recall

Recall (also called sensitivity) is the fraction of actual positives that the model correctly identified. Using the spam example: of all real spam emails, what fraction did the model catch?

A high recall means few missed cases. It answers: "How many of the actual positives did we find?"
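A similar sketch works for recall: count true positives (TP) and false negatives (FN) and take TP / (TP + FN). The `recall` helper and the data are illustrative assumptions, as is returning 0.0 when there are no actual positives.

```python
def recall(y_true, y_pred, positive=1):
    """Of everything that really is positive, what fraction did we find?"""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # found
    fn = sum(t == positive and p != positive for t, p in pairs)  # missed
    if tp + fn == 0:
        return 0.0  # assumed convention: no actual positives in the data
    return tp / (tp + fn)


# Hypothetical labels: 1 = spam, 0 = not spam.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0]  # only 1 of the 4 spam emails is caught

spam_recall = recall(y_true, y_pred)
print(spam_recall)  # 0.25
```

Every flag this model raised was correct, yet it still missed three of the four spam emails, which is exactly what a low recall tells us.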

⚖️

Combining Metrics

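Often we look at both precision and recall. A model might have high precision but low recall (it is rarely wrong when it predicts positive, but it misses many positives), or high recall but low precision (it finds most positives, but raises many false alarms).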

The F1 score combines them by taking their harmonic mean, 2 × precision × recall / (precision + recall), which stays low unless both values are high. It is useful when we need a single performance measure for an imbalanced problem.
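The sketch below compares the harmonic mean with a plain average for a model with perfect precision but poor recall. The `f1_score` helper and the two input values are assumptions chosen for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # assumed convention: avoid dividing by zero
    return 2 * precision * recall / (precision + recall)


# Hypothetical model: every flag is right, but most positives are missed.
p, r = 1.0, 0.25

f1 = f1_score(p, r)
simple_average = (p + r) / 2

print(round(f1, 3))    # 0.4
print(simple_average)  # 0.625
```

The harmonic mean is pulled toward the weaker of the two values, so a model cannot earn a high F1 by doing well on only one of them.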

📊

Other Metrics

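For regression tasks, we use metrics like Mean Squared Error (MSE) or Mean Absolute Error (MAE) to measure average prediction error. MSE averages the squared errors, so large misses are penalized more heavily; MAE averages the absolute errors. For both, lower values mean a better fit.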
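Here is a small sketch of both regression metrics in plain Python. The helper names and the four data points are made up for this example.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between truth and prediction."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between truth and prediction."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical targets and predictions; the errors are 1, 0, -2, 0.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 7.0]

mse = mean_squared_error(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)

print(mse)  # 1.25
print(mae)  # 0.75
```

The single error of size 2 contributes 4 to the MSE sum but only 2 to the MAE sum, which is why MSE reacts more strongly to large misses.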

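For classification beyond precision and recall, ROC-AUC summarizes how well a model's scores separate the two classes. The ROC curve plots true positive rate against false positive rate across decision thresholds, and the AUC is the area under that curve: 0.5 is no better than random guessing, and 1.0 is perfect separation. Different problems need different metrics.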
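One way to build intuition for ROC-AUC is its pairwise interpretation: it equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes it that way on a tiny made-up dataset. The `roc_auc` helper is an illustration, and this brute-force loop is only practical for small inputs.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical labels and model scores (higher score = more likely positive).
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

auc = roc_auc(y_true, scores)
print(auc)  # 0.75
```

Three of the four positive/negative pairs are ordered correctly; the only mistake is the positive scored 0.35 falling below the negative scored 0.4.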

🤔

Choosing Metrics

The choice depends on the problem. For medical diagnoses, missing a disease (low recall) may be worse than a false alarm, so we might prioritize recall.

For email spam, missing a few spam messages is acceptable, but sending an important email to the spam folder (a false positive) is costly, so precision is key. Always consider the cost of each kind of error in your application.
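In practice, many classifiers output a score, and we choose a decision threshold above which an example counts as positive. The sketch below assumes such a setup; the scores, labels, and `precision_recall_at` helper are all hypothetical. It shows how moving the threshold trades precision against recall.

```python
def precision_recall_at(y_true, scores, threshold):
    """Precision and recall when scores >= threshold are predicted positive."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0  # assumed convention
    recall = tp / (tp + fn) if tp + fn else 0.0     # assumed convention
    return precision, recall


# Hypothetical labels (1 = positive) and model scores.
y_true = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.95, 0.85, 0.6, 0.55, 0.4, 0.3, 0.2, 0.1]

# A low threshold flags more cases: nothing is missed, but false alarms rise.
low_p, low_r = precision_recall_at(y_true, scores, threshold=0.25)

# A high threshold flags only confident cases: no false alarms, more misses.
high_p, high_r = precision_recall_at(y_true, scores, threshold=0.8)

print(round(low_p, 3), low_r)  # 0.667 1.0
print(high_p, high_r)          # 1.0 0.5
```

A disease screen would lean toward the low threshold to keep recall high, while a spam filter would lean toward the high threshold to protect precision.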

Quiz: Precision vs Recall

Which metric measures the fraction of actual positive examples that the model correctly identified?

A. Accuracy

B. Precision

C. Recall

D. F1 Score

Fill in the Blank

In classification, precision tells us how many predicted positives were correct, while ___ tells us how many actual positives were found.

💡 Choose the correct word from the options below to complete the sentence.

Recall

Accuracy

F1

MSE

Reflection

💭

Imagine a test that screens for a serious disease. Would it be more important for the test to have high precision or high recall?

Explain your reasoning (hint: consider the consequences of missing a diagnosis versus false alarms).

Lesson Completed!

Outstanding! You now understand how to evaluate ML models using accuracy, precision, recall, and more!