Data, Features, and Labels

⏱️ 15 min 📚 Lesson 4 of 11
👋

Welcome Back!

Now that you understand the main ML tasks, it's time to look under the hood at the data that makes learning possible. You'll learn about features (the input attributes of each example) and labels (the correct answers that guide learning in supervised tasks).

🔢

What are Features?

Features are the input attributes we use for learning. They are measurable properties of the data.

For example, to predict house prices, features might include square footage, number of bedrooms, and neighborhood. In a model of student performance, features could be study hours and attendance.

Good, relevant features help the model make accurate predictions.

🏷️

What are Labels?

In supervised learning, each example has a label: the correct answer we want to predict.

For instance, in a spam filter, the label is "spam" or "not spam" for each email. In a house price model, the label is the actual sale price of the house.

During training, the model learns to connect features to these labels.

📊

Datasets and Examples

A dataset is a collection of many examples (also called records or observations). Each example has feature values and possibly a label.

Think of a spreadsheet where each row is one example. For instance, a dataset for car sales might have columns "mileage", "age", "brand" (features) and "price" (label), with each row representing one car.
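The spreadsheet picture above can be sketched in a few lines of plain Python. This is only an illustration: the rows and their values are invented, and the helper name `split_features_and_labels` is hypothetical, not part of any library.

```python
# Hypothetical car-sales rows; every value here is invented for illustration.
cars = [
    {"mileage": 42000, "age": 3, "brand": "Toyota", "price": 18500},
    {"mileage": 88000, "age": 7, "brand": "Ford", "price": 9200},
    {"mileage": 15000, "age": 1, "brand": "Honda", "price": 24000},
]

FEATURE_COLUMNS = ["mileage", "age", "brand"]  # model inputs
LABEL_COLUMN = "price"                         # the answer we want to predict


def split_features_and_labels(rows, feature_columns, label_column):
    """Separate each row (example) into its feature values and its label."""
    features = [{col: row[col] for col in feature_columns} for row in rows]
    labels = [row[label_column] for row in rows]
    return features, labels


X, y = split_features_and_labels(cars, FEATURE_COLUMNS, LABEL_COLUMN)
print(X[0])  # {'mileage': 42000, 'age': 3, 'brand': 'Toyota'}
print(y)     # [18500, 9200, 24000]
```

Each entry in `X` lines up with the entry at the same position in `y`, which is how one row of the spreadsheet stays together as a single example.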

Data Quality and Quantity

A model's performance depends on both the quality and the quantity of its training data. More high-quality data usually helps the model generalize better to new examples.

If the data has errors, bias, or missing values, the model may learn the wrong patterns. To train a robust model, it's important to clean the data and make sure the examples represent all relevant groups.
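A minimal sketch of two basic checks, assuming missing values are stored as `None`: counting gaps per column and counting how many examples each group contributes. The rows, the "region" column, and the helper names are hypothetical, and dropping incomplete rows is only one of several possible cleaning strategies.

```python
from collections import Counter

# Hypothetical rows; None marks a missing value.
rows = [
    {"income": 52000, "credit_score": 710, "region": "north"},
    {"income": None, "credit_score": 640, "region": "north"},
    {"income": 38000, "credit_score": None, "region": "south"},
    {"income": 61000, "credit_score": 690, "region": "north"},
]


def count_missing(rows):
    """Count how many values are missing in each column."""
    missing = Counter()
    for row in rows:
        for column, value in row.items():
            if value is None:
                missing[column] += 1
    return dict(missing)


def drop_incomplete(rows):
    """Keep only the rows that have a value in every column."""
    return [row for row in rows if all(v is not None for v in row.values())]


def group_counts(rows, column):
    """Count examples per group, to spot under-represented groups."""
    return dict(Counter(row[column] for row in rows))


print(count_missing(rows))           # {'income': 1, 'credit_score': 1}
print(len(drop_incomplete(rows)))    # 2
print(group_counts(rows, "region"))  # {'north': 3, 'south': 1}
```

Here half the rows are incomplete and one region has a single example, so both cleaning and collecting more data from the smaller group would be worth considering.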

💾

Example Dataset

Imagine a dataset to predict loan defaults. Each row could include features like "income", "credit score", "loan amount" and a label "default" (yes/no).

We feed these rows into a model, which learns how the features relate to the label "default". This structured data (features and labels) is what supervised ML algorithms learn from.
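To make "learning how the features relate to the label" concrete, here is a deliberately tiny sketch. It assumes a one-feature threshold rule on credit score, which is a toy stand-in rather than a realistic credit model, and all rows and function names are invented for illustration.

```python
# Hypothetical loan rows; every value is invented for illustration.
loans = [
    {"income": 30000, "credit_score": 580, "loan_amount": 12000, "default": True},
    {"income": 45000, "credit_score": 620, "loan_amount": 15000, "default": True},
    {"income": 52000, "credit_score": 690, "loan_amount": 10000, "default": False},
    {"income": 75000, "credit_score": 720, "loan_amount": 20000, "default": False},
    {"income": 64000, "credit_score": 760, "loan_amount": 8000, "default": False},
]


def fit_threshold(rows, feature, label):
    """Find the cutoff on one feature that misclassifies the fewest rows.

    The rule being fitted is: predict True when the feature value is
    below the cutoff.
    """
    values = sorted({row[feature] for row in rows})
    # Candidate cutoffs are the midpoints between neighboring values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(cutoff):
        return sum((row[feature] < cutoff) != row[label] for row in rows)

    return min(candidates, key=errors)


def predict_default(credit_score, cutoff):
    """Apply the learned rule to a new applicant."""
    return credit_score < cutoff


cutoff = fit_threshold(loans, "credit_score", "default")
print(cutoff)                        # 655.0
print(predict_default(600, cutoff))  # True
print(predict_default(700, cutoff))  # False
```

The "training" step is just the search for the best cutoff using the labeled rows. Real algorithms combine many features at once, but the idea is the same: use labeled examples to find a rule that maps features to labels.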

🔤

Feature Types

Features can be numeric (like height or temperature) or categorical (like color or category).

Numeric features are often scaled (e.g. rescaled to a common range such as 0 to 1) so that features with large values don't dominate features with small ones. Categorical features (like brand or country) are usually converted into a numerical form, for example by one-hot encoding.

Handling different feature types is part of preparing data for ML.
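As a rough illustration of both preparation steps, the sketch below uses min-max scaling for a numeric feature and one-hot encoding for a categorical one. These are common choices but not the only ones; the function names and sample values are assumptions made for this example.

```python
def min_max_scale(values):
    """Rescale numbers linearly so the smallest becomes 0 and the largest 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no range to scale over.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot_encode(values):
    """Turn each category into a list with a single 1 in its category's slot."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return encoded, categories


# Hypothetical feature columns.
mileage = [15000, 42000, 88000]
brand = ["Toyota", "Ford", "Honda"]

scaled = min_max_scale(mileage)
encoded, categories = one_hot_encode(brand)

print([round(v, 2) for v in scaled])  # [0.0, 0.37, 1.0]
print(categories)                     # ['Ford', 'Honda', 'Toyota']
print(encoded)                        # [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
```

One-hot encoding avoids assigning arbitrary numbers such as Ford = 1, Honda = 2, which could suggest an ordering between brands that does not exist.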

Quiz: Data Concepts

Which of these is a feature in a machine learning dataset for predicting car prices?

A
The predicted price
B
The car's mileage
C
The model's overall accuracy
D
The size of the training set

Fill in the Blank

Each row in a dataset (with features and a label) is called an ___ or example.

💡 Choose the correct word from below to complete the sentence.
Instance
Column
Algorithm
Model

Reflection

💭

Think of a dataset you know (for example, sports statistics or movie ratings).

What could be some useful features and labels to build an ML model with that data? Describe them briefly.

Lesson Completed!

Excellent! You now understand features, labels, and datasets—the building blocks of ML. Keep it up!