Show what Linear Regression is visually and demonstrate it on data.

Simply said, you can describe Linear Regression as follows.

- Given input data (independent variables), can we predict the output (dependent variable)?
- It is the mapping from an input point to a continuous value.

I like to show it visually.

The goal of Linear Regression is to find the best-fitting line through the data. Not every point lies on the line: points closer to it are fitted better than points further away.

The predictions will be on the line. That is, when you have fitted your Linear Regression model, it will predict new values to be on the line.
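To make "best fitting line" concrete, here is a minimal sketch of an ordinary least squares fit for a single input variable in plain Python. The data points and the helper name `fit_line` are made up for illustration and are not part of the dataset used later.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up example data (roughly y = 2x)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)

# Every prediction lies exactly on the fitted line
predictions = [slope * x + intercept for x in xs]
# Residuals: how far each observed value is from the line
residuals = [y - p for y, p in zip(ys, predictions)]

print(slope, intercept)
```

The residuals are the vertical distances between the observed points and the line. Least squares picks the slope and intercept that make the sum of their squares as small as possible.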

While this sounds simple, Linear Regression is one of the most widely used models and delivers a lot of value in practice.

There is often some confusion between Linear Regression and Correlation, but they do different things.

Correlation is a single number describing the relationship between two variables, while Linear Regression is an equation used to predict values.

- Correlation
  - Single measure of relationship between two variables.
- Linear Regression
  - An equation used for prediction.
- Similarities
  - Both describe a relationship between variables.
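The difference can be sketched in a few lines of plain Python. The numbers below are made up for illustration. Correlation ends up as one number, while Linear Regression ends up as an equation you can feed new inputs into.

```python
import math

# Made-up example data
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)

# Correlation (Pearson): a single unitless number between -1 and 1
r = sxy / math.sqrt(sxx * syy)

# Linear Regression: an equation y = slope * x + intercept
slope = sxy / sxx
intercept = mean_y - slope * mean_x


def predict(x):
    """Use the fitted equation to predict y for a new x."""
    return slope * x + intercept


print(f"correlation r = {r:.3f}")
print(f"regression: y = {slope:.2f} * x + {intercept:.2f}")
print(f"prediction for x = 6: {predict(6.0):.2f}")
```

The two are related: with one input variable, the slope equals the correlation scaled by the ratio of the spreads of `ys` and `xs`. Only the regression can produce a prediction for a new input.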

Let’s try an example.

```
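import pandas as pd

# Load the height/weight dataset directly from GitHub
data = pd.read_csv('https://raw.githubusercontent.com/LearnPythonWithRune/DataScienceWithPython/main/files/weight-height.csv')
# Scatter plot of the raw data; a low alpha makes dense regions stand out
data.plot.scatter(x='Height', y='Weight', alpha=.1)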
```

This data looks correlated. What would a Linear Regression prediction of it look like?

We can use scikit-learn (sklearn).

- scikit-learn: Machine Learning in Python.
- `LinearRegression`: ordinary least squares Linear Regression.

- The **Linear Regression model** takes a collection of **observations**.
- Each **observation** has **features** (or variables).
- The **features** the model takes as input are called **independent** (often denoted with `X`).
- The **feature** the model outputs is called **dependent** (often denoted with `y`).

```
from sklearn.linear_model import LinearRegression
# Creating a Linear Regression model on our data
lin = LinearRegression()
lin.fit(data[['Height']], data['Weight'])
# Creating a plot
ax = data.plot.scatter(x='Height', y='Weight', alpha=.1)
ax.plot(data['Height'], lin.predict(data[['Height']]), c='r')
```
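Once a model is fitted, the line itself can be read from the `coef_` and `intercept_` attributes, and `predict` can be called on inputs the model has not seen. The sketch below uses a handful of made-up numbers instead of the dataset above so that it runs without a download. Note that `X` is two-dimensional (one row per observation, one column per feature) while `y` is one-dimensional.

```python
from sklearn.linear_model import LinearRegression

# Made-up numbers for illustration, NOT the dataset from the article
X = [[60.0], [62.0], [64.0], [66.0], [68.0]]  # one feature per observation (2D)
y = [110.0, 120.0, 130.0, 140.0, 150.0]       # one target per observation (1D)

model = LinearRegression()
model.fit(X, y)

# The fitted line is y = coef_[0] * x + intercept_
slope = model.coef_[0]
intercept = model.intercept_

# Predict for an input that was not in the training data
new_prediction = model.predict([[65.0]])[0]

print(slope, intercept, new_prediction)
```

The made-up points lie exactly on a straight line, so the model recovers that line and the prediction for the new input falls on it.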

To measure how well the model fits the data, the r-squared score (the coefficient of determination) is often used. You can access it directly on the model with the following code.

```
lin.score(data[['Height']], data['Weight'])
```

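This will give 0.855. R-squared is the share of the variation in `Weight` that the model explains from `Height`: 1.0 is a perfect fit, and a model that always predicts the mean weight scores 0. On its own it is just a number, but it lets you compare this fit to other models or samples.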
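For intuition, r-squared can be computed by hand as one minus the ratio of the residual sum of squares to the total sum of squares, which is the definition scikit-learn documents for `score`. The values below are made up for illustration and are not taken from the dataset.

```python
# Made-up observed values and the predictions of some fitted line
y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [2.5, 3.5, 6.5, 7.5]

mean_y = sum(y_true) / len(y_true)

# Residual sum of squares: variation the model did NOT explain
ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
# Total sum of squares: variation around the mean of the observed values
ss_tot = sum((y - mean_y) ** 2 for y in y_true)

r_squared = 1 - ss_res / ss_tot
print(r_squared)
```

If the predictions matched the observed values exactly, `ss_res` would be 0 and r-squared would be 1.0.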

Want to learn more about Data Science to become a successful Data Scientist?

This is one lesson of a 15 part Expert Data Science Blueprint course with the following resources.

- **15 video lessons** – cover the Data Science Workflow and concepts, demonstrate everything on real data, introduce projects and show a solution (**YouTube video**).
- **30 Jupyter Notebooks** – with the full code and explanation from the lectures and projects (GitHub).
- **15 projects** – structured with the Data Science Workflow and a solution explained at the end of the video lessons (GitHub).
