
# A Smooth Introduction to Linear Regression using pandas

## Master the Power of Linear Regression: Unlock Predictive Insights with Visual Understanding

Linear regression is a fundamental machine learning algorithm that allows you to model the relationship between variables and make predictions based on their linear association. By mastering linear regression, you gain the ability to uncover valuable insights, predict outcomes, and solve a wide range of real-world problems.

Why It’s Great to Master Linear Regression:

• Predictive Power: Linear regression enables you to make accurate predictions by establishing the relationship between input variables and the target variable.
• Interpretability: With linear regression, you can easily interpret the coefficients and understand the impact of each variable on the target variable.
• Versatility: Linear regression is widely applicable and can be used in various domains, such as finance, marketing, healthcare, and social sciences.
• Foundation for Advanced Techniques: Linear regression serves as a foundation for more advanced regression techniques and machine learning algorithms.

## Topics Covered in This Tutorial

• Visual Understanding of Linear Regression: Gain an intuitive understanding of how linear regression works and its underlying assumptions through visual explanations.
• Real-World Example: Apply linear regression to a real dataset, guiding you through the process of model training, evaluation, and prediction.
• Supplementary Video: Access an informative video that further enhances your understanding of linear regression and its practical applications.

By mastering linear regression, you empower yourself with a powerful tool for prediction and analysis, enabling you to extract valuable insights and drive data-informed decisions.

## Step 1: What is Linear Regression?

Put simply, Linear Regression can be described as follows.

• Given input data (independent variables), can we predict the output (dependent variable)?
• It is a mapping from an input point to a continuous value.

I like to show it visually.

The goal of Linear Regression is to find the best-fitting line through the data. Some data points will lie closer to the line than others, and those points are fitted better.

The predictions will be on the line. That is, once you have fitted your Linear Regression model, every value it predicts for new input lies on the line.
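To make the idea of a best-fitting line concrete, here is a minimal sketch using NumPy on made-up data (the numbers are invented for illustration and are not from this tutorial's dataset). It fits a straight line by least squares with `np.polyfit` and then predicts new values, which by construction lie on that line.

```python
import numpy as np

# Made-up data: points scattered around the line y = 2x + 1
rng = np.random.default_rng(seed=0)
x = np.linspace(0, 10, 50)
y = 2 * x + 1 + rng.normal(scale=1.0, size=x.size)

# Fit a degree-1 polynomial (a straight line) by least squares
slope, intercept = np.polyfit(x, y, deg=1)

# Predictions for new inputs are simply points on the fitted line
x_new = np.array([2.5, 7.0])
y_pred = slope * x_new + intercept

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(y_pred)
```

The fitted slope and intercept should land close to the values used to generate the data, but not exactly on them, because of the added noise.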

While this sounds simple, Linear Regression is one of the most widely used models and is very valuable in practice.

## Step 2: Correlation and Linear Regression

Often there is a bit of confusion between Linear Regression and Correlation. But they do different things.

Correlation is a single number describing the relationship between two variables, while Linear Regression is an equation used to predict values.

• Correlation: A single measure of the relationship between two variables.
• Linear Regression: An equation used for prediction.
• Similarities: Both describe the relationship between variables.
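The difference can be sketched in code. The example below uses synthetic height and weight values (an assumption for illustration, not the tutorial's dataset): `Series.corr` gives one number, while `LinearRegression` gives a slope and an intercept you can predict with. For a single input variable, the squared correlation equals the model's R-squared score.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data (values are invented for illustration)
rng = np.random.default_rng(seed=1)
height = rng.normal(loc=66, scale=4, size=500)
weight = 6 * height - 235 + rng.normal(scale=12, size=500)
df = pd.DataFrame({'Height': height, 'Weight': weight})

# Correlation: one number summarising the strength of the linear relationship
r = df['Height'].corr(df['Weight'])

# Linear Regression: an equation (slope and intercept) usable for prediction
model = LinearRegression().fit(df[['Height']], df['Weight'])
slope = model.coef_[0]
intercept = model.intercept_

print(f"correlation r = {r:.3f}")
print(f"Weight = {slope:.2f} * Height + {intercept:.2f}")

# With a single input variable, r squared equals the model's R-squared score
r_squared = model.score(df[['Height']], df['Weight'])
print(np.isclose(r ** 2, r_squared))
```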

## Step 3: Example

Let’s try an example.

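```python
import pandas as pd
# Placeholder path (an assumption): a CSV with 'Height' and 'Weight' columns
data = pd.read_csv('weight-height.csv')
data.plot.scatter(x='Height', y='Weight', alpha=.1)
```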

This data looks correlated. What would a Linear Regression prediction of it look like?

We can use scikit-learn (`sklearn`).

### Linear Regression

• The Linear Regression model takes a collection of observations
• Each observation has features (or variables).
• The features the model takes as input are called independent (often denoted with `X`)
• The feature the model outputs is called dependent (often denoted with `y`)

```python
from sklearn.linear_model import LinearRegression

# Creating a Linear Regression model on our data
lin = LinearRegression()
lin.fit(data[['Height']], data['Weight'])

# Creating a plot: scatter of the data with the fitted line in red
ax = data.plot.scatter(x='Height', y='Weight', alpha=.1)
ax.plot(data['Height'], lin.predict(data[['Height']]), c='r')
```
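Once a model is fitted, it can also predict values for inputs it has not seen. The sketch below is self-contained and uses a synthetic stand-in for `data` (the values and the `new_people` frame are assumptions for illustration). It shows that new observations are passed as a DataFrame with the same feature column used in `fit`, and that each prediction is a point on the fitted line.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the tutorial's data (values are invented)
rng = np.random.default_rng(seed=2)
height = rng.normal(loc=66, scale=4, size=300)
data = pd.DataFrame({
    'Height': height,
    'Weight': 6 * height - 235 + rng.normal(scale=12, size=300),
})

lin = LinearRegression()
lin.fit(data[['Height']], data['Weight'])

# New observations must have the same feature column(s) used in fit
new_people = pd.DataFrame({'Height': [60.0, 66.0, 72.0]})
predicted = lin.predict(new_people)

# Each prediction is a point on the fitted line: coef * Height + intercept
on_line = lin.coef_[0] * new_people['Height'] + lin.intercept_
print(predicted)
print(np.allclose(predicted, on_line))
```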

To measure how well the model fits the data, the R-squared score (the coefficient of determination) is often used. You can get it directly from the model with the following code.

```python
lin.score(data[['Height']], data['Weight'])
```

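This gives 0.855, meaning that Height explains about 85.5% of the variance in Weight on this data. An R-squared of 1.0 would be a perfect fit, so the score is a convenient number for comparing models or samples.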
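The R-squared score can also be computed by hand, which may help clarify what the number means. This is a minimal sketch on synthetic data (invented values, not the tutorial's dataset): it compares the model's squared errors with the squared errors of always predicting the mean, and checks the result against `score`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: one feature (height-like) and a noisy linear target
rng = np.random.default_rng(seed=3)
X = rng.normal(loc=66, scale=4, size=(200, 1))
y = 6 * X[:, 0] - 235 + rng.normal(scale=12, size=200)

lin = LinearRegression().fit(X, y)
y_pred = lin.predict(X)

# Residual sum of squares: error left over after using the model
ss_res = np.sum((y - y_pred) ** 2)
# Total sum of squares: error of always predicting the mean of y
ss_tot = np.sum((y - y.mean()) ** 2)

r2_manual = 1 - ss_res / ss_tot

# The manual value should match the model's built-in score
print(r2_manual, lin.score(X, y))
```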

In the next lesson you will learn How to Clean Data using pandas DataFrames in this Data Science course.

This is one lesson of a 15-part Expert Data Science Blueprint course with the following resources.

• 15 video lessons – covers the Data Science Workflow and concepts, demonstrates everything on real data, introduces projects, and shows a solution (YouTube video).
• 30 Jupyter Notebooks – with the full code and explanation from the lectures and projects (GitHub).
• 15 projects – structured with the Data Science Workflow and a solution explained at the end of the video lessons (GitHub).
