After understanding the main ML tasks, it’s time to look under the hood at the data that makes learning possible. You’ll learn about features, the input attributes of each example, and labels, which guide the learning process in supervised tasks.
Features are the input attributes we use for learning. They are measurable properties of the data.
For example, to predict house prices, features might include square footage, number of bedrooms, and neighborhood. In a study on students, features could be study hours and attendance.
Good, relevant features help the model make accurate predictions.
In supervised learning, each example has a label: the correct answer we want to predict.
For instance, in a spam filter, the label is "spam" or "not spam" for each email. In a house price model, the label is the actual sale price of the house.
During training, the model learns to connect features to these labels.
A dataset is a collection of many examples (also called records or observations). Each example has feature values and possibly a label.
Think of a spreadsheet where each row is one example. For instance, a dataset for car sales might have columns "mileage", "age", "brand" (features) and "price" (label), with each row representing one car.
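To make the spreadsheet picture concrete, here is a minimal sketch in plain Python. The car rows and their values are made up for illustration, and the names FEATURE_COLUMNS, LABEL_COLUMN, X, and y are just common conventions, not something this lesson requires.

```python
# Hypothetical car-sales rows; every value here is invented for illustration.
cars = [
    {"mileage": 42000, "age": 3, "brand": "Toyota", "price": 18500},
    {"mileage": 88000, "age": 7, "brand": "Ford", "price": 9200},
    {"mileage": 15000, "age": 1, "brand": "Honda", "price": 24000},
]

FEATURE_COLUMNS = ["mileage", "age", "brand"]  # inputs the model sees
LABEL_COLUMN = "price"                         # answer the model should predict

# X holds the feature values of each example; y holds the matching labels.
X = [[row[column] for column in FEATURE_COLUMNS] for row in cars]
y = [row[LABEL_COLUMN] for row in cars]

# Each position in X lines up with the same position in y: one row, one car.
for features, label in zip(X, y):
    print(features, "->", label)
```

Keeping features and labels in two aligned lists mirrors how many ML libraries expect their input, but a list of rows works just as well for understanding the idea.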
A model's performance depends on both the quality and the quantity of its training data. More data usually helps, as long as the extra examples are accurate and representative.
If the data has errors, bias, or missing values, the model may learn incorrectly. It's important to clean the data and ensure examples represent all relevant groups to train a robust model.
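A quick data check can be sketched in a few lines. The example below assumes missing values are stored as None and uses invented records with a hypothetical "region" column to stand in for a group you want represented. It counts missing values, drops incomplete rows, and then looks at how many examples each group has left.

```python
from collections import Counter

# Hypothetical records; None marks a missing value.
rows = [
    {"income": 52000, "credit_score": 710, "region": "north"},
    {"income": None, "credit_score": 640, "region": "south"},
    {"income": 61000, "credit_score": None, "region": "north"},
    {"income": 45000, "credit_score": 690, "region": "north"},
]

def count_missing(rows):
    """Count missing (None) values per column."""
    missing = Counter()
    for row in rows:
        for column, value in row.items():
            if value is None:
                missing[column] += 1
    return dict(missing)

def drop_incomplete(rows):
    """Keep only the rows in which every column has a value."""
    return [row for row in rows if all(v is not None for v in row.values())]

missing_per_column = count_missing(rows)
clean_rows = drop_incomplete(rows)

# After cleaning, check that each group still has examples.
group_counts = Counter(row["region"] for row in clean_rows)

print(missing_per_column)
print(group_counts)
```

In this toy data, dropping incomplete rows removes the only "south" example, so the cleaned dataset no longer represents that group. Dropping rows is only one possible cleaning strategy, and it is worth checking what it does to your groups.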
Imagine a dataset to predict loan defaults. Each row could include features like "income", "credit score", "loan amount" and a label "default" (yes/no).
We feed these examples into a model, which learns how the features relate to the "default" label. This structured data (features and labels) is what ML algorithms learn from.
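To see what "learning from features and labels" can look like, here is a deliberately tiny sketch. It is not a realistic credit model: it assumes invented loan records and learns a single cut-off on one feature (a "decision stump"), choosing the threshold that makes the fewest mistakes on the training examples. The function names fit_threshold and predict_default are hypothetical.

```python
# Hypothetical loan records; all values are invented for illustration.
examples = [
    {"income": 30000, "credit_score": 580, "loan_amount": 15000, "default": True},
    {"income": 45000, "credit_score": 620, "loan_amount": 20000, "default": True},
    {"income": 52000, "credit_score": 690, "loan_amount": 10000, "default": False},
    {"income": 75000, "credit_score": 720, "loan_amount": 25000, "default": False},
    {"income": 38000, "credit_score": 650, "loan_amount": 12000, "default": False},
]

def fit_threshold(examples, feature, label):
    """Find the cut-off t for the rule 'predict True when feature < t'
    that makes the fewest mistakes on the training examples."""
    values = sorted({ex[feature] for ex in examples})
    # Candidate cut-offs are the midpoints between neighbouring feature values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best_t, best_errors = None, len(examples) + 1
    for t in candidates:
        errors = sum((ex[feature] < t) != ex[label] for ex in examples)
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t, best_errors

def predict_default(example, feature, threshold):
    """Apply the learned rule to a new example."""
    return example[feature] < threshold

threshold, training_errors = fit_threshold(examples, "credit_score", "default")

# A new applicant without a label: the model predicts one from the features.
new_applicant = {"income": 41000, "credit_score": 600, "loan_amount": 18000}
prediction = predict_default(new_applicant, "credit_score", threshold)
print(threshold, training_errors, prediction)
```

Real models combine many features and are evaluated on data they did not see during training, but the basic loop is the same: use labeled examples to pick a rule, then apply that rule to new feature values.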
Features can be numeric (like height or temperature) or categorical (like color or category).
Numeric features might be scaled (e.g. rescaling values to a common range such as 0 to 1) so that features with large raw values do not dominate the others. Categorical features (like brand or country) might be converted into a numerical form.
Handling different feature types is part of preparing data for ML.
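As an illustration, two common preparation steps are min-max scaling for numeric features and one-hot encoding for categorical ones. The lesson does not name specific techniques, so treat these as one reasonable choice among several. The sketch below uses plain Python and invented values; min_max_scale and one_hot_encode are hypothetical helper names.

```python
def min_max_scale(values):
    """Rescale numbers linearly so the smallest becomes 0.0 and the largest 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no range to scale over.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def one_hot_encode(values):
    """Turn each category into a list with a single 1 marking that category."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded

# Invented example columns: one numeric feature, one categorical feature.
mileage = [15000, 42000, 88000]
brand = ["Honda", "Toyota", "Ford"]

scaled_mileage = min_max_scale(mileage)
categories, brand_vectors = one_hot_encode(brand)

print(scaled_mileage)
print(categories)
print(brand_vectors)
```

One-hot encoding avoids implying an order between categories, which would happen if brands were simply numbered 1, 2, 3. In practice, ML libraries provide ready-made versions of both steps.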
Which of these is a feature in a machine learning dataset for predicting car prices?
Each row in a dataset (with features and a label) is called an ___ or example.
Think of a dataset you know (for example, sports statistics or movie ratings).
What could be some useful features and labels to build an ML model with that data? Describe them briefly.
Excellent! You now understand features, labels, and datasets: the building blocks of ML. Keep it up!