Why is it great to master Multiple Linear Regression?
Mastering Multiple Linear Regression offers several advantages and insights into the field of supervised learning and predictive modeling:
- Fundamental predictive modeling technique: Multiple Linear Regression is a fundamental supervised learning technique used for predicting continuous outcomes. By mastering this method, you gain a solid foundation in predictive modeling, which is widely applicable across various domains.
- Understanding the differences from discrete classifiers: Multiple Linear Regression provides a different perspective compared to discrete classifiers. It focuses on predicting continuous values rather than discrete classes, offering a deeper understanding of regression-based modeling approaches.
- Supervised learning at its core: Multiple Linear Regression falls under the category of supervised learning, where models learn from labeled training data to make predictions. By mastering this technique, you enhance your understanding of supervised learning tasks and gain valuable insights into building regression models.
- Similarities between linear and discrete classifiers: Exploring Multiple Linear Regression allows you to draw parallels between linear classifiers and discrete classifiers. Understanding these similarities can provide a cohesive understanding of different types of classifiers and their underlying principles.
- Hands-on experience with multiple linear regression: This learning opportunity will provide hands-on experience with Multiple Linear Regression. Through practical exercises and examples, you will gain valuable insights into implementing and working with this technique, reinforcing your understanding and building practical skills.
What will be covered in this lesson?
In this lesson, you will dive into the world of Multiple Linear Regression, exploring its concepts, techniques, and practical applications. The following topics will be covered:
- Introduction to Multiple Linear Regression: You will learn the basics of Multiple Linear Regression, including its key components, assumptions, and the mathematical formulation that underlies this technique.
- Differences from discrete classifiers: You will gain a clear understanding of how Multiple Linear Regression differs from discrete classifiers, focusing on the prediction of continuous outcomes and the use of regression-based methodologies.
- Supervised learning and regression tasks: The lesson will emphasize the supervised learning nature of Multiple Linear Regression, highlighting its role in predictive modeling and its applications in various regression tasks.
- Similarities between linear and discrete classifiers: You will explore the similarities between linear classifiers and discrete classifiers, identifying commonalities in terms of modeling techniques, decision boundaries, and underlying principles.
- Hands-on experience: Through hands-on exercises and practical examples, you will have the opportunity to apply Multiple Linear Regression to real-world datasets. This hands-on experience will deepen your understanding and equip you with the skills to implement and analyze regression models effectively.
By the end of this lesson, you will have a comprehensive understanding of Multiple Linear Regression, its applications, and its relationship with discrete classifiers. You will also have hands-on experience working with this technique, enhancing your proficiency in supervised learning and predictive modeling.
Step 1: What is Multiple Linear Regression?
Multiple Linear Regression is a supervised learning task: learning a mapping from input points to a continuous value.
Wow. What does that mean?
That might not help on its own, but put simply: it is Linear Regression with multiple explanatory variables.
Let’s start simple. Simple Linear Regression is the case most tutorials show first: one input variable (explanatory variable) and one output value (response value).
An example could be: if the temperature is X degrees, we expect to sell Y ice creams. That is, the model tries to predict how many ice creams we sell given a temperature.
Now, we know that factors other than temperature can have a high impact on ice cream sales. Say, whether it is rainy or sunny, or what time of year it is (it might be tourist season or not).
Hence, a simple model like that might not give a very accurate estimate.
Hence, we would like a model with more input variables (explanatory variables). When there is more than one, it is called Multiple Linear Regression.
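In equation form, Simple Linear Regression fits y = a·x + b, while Multiple Linear Regression fits y = b0 + b1·x1 + … + bn·xn, one coefficient per explanatory variable. A minimal sketch with made-up ice cream data (temperature and a rain indicator; the numbers are purely illustrative):

```python
from sklearn.linear_model import LinearRegression

# Illustrative data: [temperature in degrees, rainy (0/1)] -> ice creams sold
X = [[20, 0], [25, 0], [30, 0], [22, 1], [28, 1], [32, 0]]
y = [40, 55, 70, 30, 48, 75]

model = LinearRegression()
model.fit(X, y)

# One coefficient per explanatory variable, plus an intercept
print(model.coef_, model.intercept_)
print(model.predict([[27, 0]]))
```

On this toy data the temperature coefficient comes out positive and the rain coefficient negative, matching the intuition that warm, dry days sell more ice cream.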
Step 2: Get Example Data
Let’s take a look at some house price data.
```python
import pandas as pd

data = pd.read_csv('https://raw.githubusercontent.com/LearnPythonWithRune/MachineLearningWithPython/main/files/house_prices.csv')
print(data.head())
```
Notice: you can also download the file locally from GitHub, which makes it faster to run every time.
The output should give the following data.
The goal is given a row of data we want to predict the House Unit Price. That is, given all but the last column in a row, can we predict the House Unit Price (the last column).
Step 3: Plot the data
Just for fun – let’s make a scatter plot of all the houses with Latitude and Longitude.
```python
import matplotlib.pyplot as plt

# Plot the location (Longitude, Latitude) of each house
fig, ax = plt.subplots()
ax.scatter(x=data['Longitude'], y=data['Latitude'])
plt.show()
```
This gives the following plot.
This shows you where the houses are located, which can be interesting because house prices can be dependent on location.
Intuitively, longitude and latitude should not be linearly correlated with the house price, at least not in the bigger picture.
Step 4: Correlation of the features
Before we make the Multiple Linear Regression, let’s see how the features (the columns) correlate.
This is interesting. Look at the bottom row, the correlations with House Unit Price. It shows that Distance to MRT station is negatively correlated: the farther from an MRT station, the lower the price. This might not be surprising.
More surprising is that Latitude and Longitude are comparably highly correlated with the House Unit Price.
This might be the case for this particular dataset.
Step 5: Check the Quality of the dataset
For the Linear Regression model to perform well, you need to check that the data quality is good. If the input data is of poor quality (missing data, outliers, wrong values, duplicates, etc.) then the model will not be very reliable.
Here we will only check for missing values.
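Missing values can be checked with pandas’ isnull combined with sum. A sketch on a small made-up frame with one missing value (on the real data it is simply data.isnull().sum()):

```python
import pandas as pd
import numpy as np

# Illustrative frame; np.nan marks a missing value in 'House age'
df = pd.DataFrame({
    'House age': [13.3, 35.5, np.nan],
    'House unit price': [37.9, 42.2, 47.3],
})

# Count missing values per column
print(df.isnull().sum())
```

Each row of the output is a column name followed by its count of missing values.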
```
Transaction                     0
House age                       0
Distance to MRT station         0
Number of convenience stores    0
Latitude                        0
Longitude                       0
House unit price                0
dtype: int64
```
This tells us that there are no missing values.
If you want to learn more about Data Quality, then check out the free course on Data Science. In that course you will learn more about Data Quality and how it impacts the accuracy of your model.
Step 6: Create a Multiple Linear Regression Model
First we need to divide the data into input variables X (explanatory variables) and output values y (response values).
Then we split it into a training and a testing dataset. We create the model, fit it, use it to predict on the test dataset, and get a score.
```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

X = data.iloc[:, :-1]
y = data.iloc[:, -1]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, test_size=.15)

lin = LinearRegression()
lin.fit(X_train, y_train)

y_pred = lin.predict(X_test)
print(r2_score(y_test, y_pred))
```
For this run it gave 0.68.
Is that good or bad? Well, good question. A perfect match is 1, but that should not be expected. The worst score you can get is minus infinity, so we are far from that.
To get an idea of how good it is, we need to compare it with variations of the model.
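One way to compare with variations is to train the model on different feature subsets and compare their R² scores on the same test set. A sketch on synthetic data (the data generation and the feature subset are made up for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

# Synthetic stand-in: the target depends on the first two features,
# the third column is pure noise
n = 200
X = rng.normal(size=(n, 3))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=n)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, test_size=.15)

# Full model vs. a model trained on only the first feature
full = LinearRegression().fit(X_train, y_train)
subset = LinearRegression().fit(X_train[:, :1], y_train)

r2_full = r2_score(y_test, full.predict(X_test))
r2_sub = r2_score(y_test, subset.predict(X_test[:, :1]))
print(r2_full, r2_sub)
```

Comparing the two scores shows how much the extra features contribute, which is the kind of variation that puts a single R² value in context.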
In the free Data Science course we explore how to select features and evaluate models. It is a great idea to look into that.
Want to learn more?
In the next lesson you will learn about Reinforcement Learning.
This is part of a FREE 10h Machine Learning course with Python.
- 15 video lessons – which explain Machine Learning concepts, demonstrate models on real data, introduce projects and show a solution (YouTube playlist).
- 30 Jupyter Notebooks – with the full code and explanation from the lectures and projects (GitHub).
- 15 projects – with step guides to help you structure your solutions and solution explained in the end of video lessons (GitHub).
Do you know what the 5 key success factors every programmer must have?
How is it possible that some people become programmers so fast, while others struggle for years and still fail?
Not only do they learn Python 10 times faster, they also solve complex problems with ease.
What separates them from the rest?
I identified these 5 success factors that every programmer must have to succeed:
- Collaboration: sharing your work with others and receiving help with any questions or challenges you may have.
- Networking: the ability to connect with the right people and leverage their knowledge, experience, and resources.
- Support: receiving feedback on your work and asking questions without feeling intimidated or judged.
- Accountability: staying motivated and accountable to your learning goals by surrounding yourself with others who are also committed to learning Python.
- Feedback from the instructor: receiving feedback and support from an instructor with years of experience in the field.
I know how important these success factors are for growth and progress in mastering Python.
That is why I want to make them available to anyone struggling to learn or who just wants to improve faster.
With the Python Circle community, you can take advantage of 5 key success factors every programmer must have.
Be part of something bigger and join the Python Circle community.