# A Smooth Introduction to Linear Regression using pandas

## What will we cover?

We will show visually what Linear Regression is and then demonstrate it on data.

## Step 1: What is Linear Regression

Simply said, you can describe Linear Regression as follows.

• Given input data (independent variables), can we predict the output (dependent variable)?
• It is a mapping from an input point to a continuous value.

I like to show it visually.

The goal of Linear Regression is to find the best-fitting line through the data. Some data points will lie closer to the line than others, and those points are fitted better.

The predictions will be on the line. That is, when you have fitted your Linear Regression model, it will predict new values to be on the line.
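As a rough illustration of "predictions are on the line", here is a minimal sketch using NumPy's `np.polyfit` on made-up numbers. The arrays `x` and `y` and the value `new_x` are assumptions chosen only for this example.

```python
import numpy as np

# Made-up toy data: y is roughly 2 * x + 1 with a little noise
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])

# Fit a straight line y = slope * x + intercept by least squares
slope, intercept = np.polyfit(x, y, deg=1)

# A prediction for a new input is simply the point on that line
new_x = 6.0
predicted_y = slope * new_x + intercept

print(f"slope={slope:.2f}, intercept={intercept:.2f}, prediction={predicted_y:.2f}")
```

Note that none of the original points has to lie exactly on the line; only the predictions do.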

While this sounds simple, Linear Regression is one of the most widely used models and is very valuable in practice.

## Step 2: Correlation and Linear Regression

Linear Regression and Correlation are often confused with each other, but they do different things.

Correlation is a single number describing the relationship between two variables, while Linear Regression is an equation used to predict values.

• Correlation
    • A single measure of the relationship between two variables.
• Linear Regression
    • An equation used for prediction.
• Similarities
    • Both describe a relationship between variables.
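To make the difference concrete, here is a small sketch on made-up numbers. The DataFrame `toy` and its values are assumptions for illustration only. It uses the standard identity that, with a single input variable, the least-squares slope equals the correlation times the ratio of the standard deviations.

```python
import pandas as pd

# Made-up toy data, for illustration only
toy = pd.DataFrame({
    "Height": [60.0, 62.0, 65.0, 68.0, 70.0, 73.0],
    "Weight": [110.0, 120.0, 140.0, 155.0, 170.0, 190.0],
})

# Correlation: a single number between -1 and 1
r = toy["Height"].corr(toy["Weight"])

# Linear Regression: an equation, Weight = slope * Height + intercept
slope = r * toy["Weight"].std() / toy["Height"].std()
intercept = toy["Weight"].mean() - slope * toy["Height"].mean()

print(f"correlation r = {r:.3f}")
print(f"Weight = {slope:.2f} * Height + {intercept:.2f}")
```

The correlation tells you how strongly the two columns move together. The equation is what lets you plug in a new height and get a predicted weight.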

## Step 3: Example

Let’s try an example.

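```python
import pandas as pd

# `data` is assumed to be a DataFrame with 'Height' and 'Weight' columns,
# loaded beforehand (the loading step is not shown here)
data.plot.scatter(x='Height', y='Weight', alpha=.1)
```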

This data looks correlated. What would a Linear Regression prediction of it look like?

We can use scikit-learn (`sklearn`).

### Linear Regression

• The Linear Regression model takes a collection of observations.
• Each observation has features (or variables).
• The features the model takes as input are called independent variables (often denoted with `X`).
• The feature the model outputs is called the dependent variable (often denoted with `y`).

```python
from sklearn.linear_model import LinearRegression

# Create a Linear Regression model and fit it to our data.
# X must be 2-dimensional (hence the double brackets); y is a single column.
lin = LinearRegression()
lin.fit(data[['Height']], data['Weight'])

# Plot the data and draw the fitted line in red on top of it
ax = data.plot.scatter(x='Height', y='Weight', alpha=.1)
ax.plot(data['Height'], lin.predict(data[['Height']]), c='r')
```
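Once a model is fitted, you can inspect the line it found and predict values for new inputs. The following is a minimal sketch on made-up numbers. The DataFrame `toy`, the name `model`, and the height of 66 are assumptions for illustration and not the article's dataset.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up toy data; the article's real `data` DataFrame is not reproduced here
toy = pd.DataFrame({
    "Height": [60.0, 62.0, 65.0, 68.0, 70.0, 73.0],
    "Weight": [110.0, 120.0, 140.0, 155.0, 170.0, 190.0],
})

model = LinearRegression()
model.fit(toy[["Height"]], toy["Weight"])

# The fitted line: Weight = coef_[0] * Height + intercept_
slope = model.coef_[0]
intercept = model.intercept_

# Predict the weight for a new, unseen height
new_heights = pd.DataFrame({"Height": [66.0]})
predicted = model.predict(new_heights)[0]

print(f"Weight = {slope:.2f} * Height + {intercept:.2f}")
print(f"Predicted weight at height 66: {predicted:.1f}")
```

The prediction is exactly the value of the fitted equation at the new height, which is why it lies on the red line in the plot.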

To measure how well the line fits the data, the R-squared score (the coefficient of determination) is often used. You can get it directly from the model with the following code.

```python
lin.score(data[['Height']], data['Weight'])
```

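This gives 0.855, which means that about 85.5% of the variance in Weight is explained by Height. On its own it is just a number, but you can use it to compare against other models or samples.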
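To see what `score` computes, here is a minimal sketch that calculates R-squared by hand on made-up numbers and compares it with the model's own value. The DataFrame `toy` and the variable names are assumptions for illustration only. It also shows the link back to Step 2: with a single feature, R-squared equals the squared correlation.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up toy data, for illustration only
toy = pd.DataFrame({
    "Height": [60.0, 62.0, 65.0, 68.0, 70.0, 73.0],
    "Weight": [110.0, 120.0, 140.0, 155.0, 170.0, 190.0],
})
X, y = toy[["Height"]], toy["Weight"]
model = LinearRegression().fit(X, y)

# R-squared = 1 - (residual sum of squares) / (total sum of squares)
residuals = y - model.predict(X)
ss_res = (residuals ** 2).sum()
ss_tot = ((y - y.mean()) ** 2).sum()
r2_manual = 1 - ss_res / ss_tot

# With a single feature, R-squared also equals the squared correlation
r2_from_corr = toy["Height"].corr(toy["Weight"]) ** 2

print(r2_manual, model.score(X, y), r2_from_corr)
```

All three printed numbers should agree up to floating-point rounding.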