# A Smooth Introduction to Linear Regression using pandas

## What will we cover?

We will show visually what Linear Regression is and then demonstrate it on data.

## Step 1: What is Linear Regression

Simply said, you can describe Linear Regression as follows.

• Given input data (independent variables), can we predict the output (dependent variable)?
• It is a mapping from an input point to a continuous value.

I like to show it visually.

The goal of Linear Regression is to find the best-fitting line. Some data points will be fitted better than others because they lie closer to the line.

The predictions will be on the line. That is, when you have fitted your Linear Regression model, it will predict new values to be on the line.

While this sounds simple, Linear Regression is one of the most widely used models and delivers a lot of value in practice.
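To make the idea of a best-fitting line concrete, here is a minimal sketch using NumPy. The toy data, the random seed, and the helper name `predict` are assumptions made for illustration and are not part of the lesson's dataset.

```python
import numpy as np

# Hypothetical toy data: y roughly follows 2*x + 1, plus some noise
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.size)

# Least-squares fit of a straight line (a polynomial of degree 1)
slope, intercept = np.polyfit(x, y, deg=1)

def predict(new_x):
    """Return the point on the fitted line for each input value."""
    return slope * new_x + intercept

# Predictions for new inputs always land on the fitted line
new_points = np.array([2.5, 7.0, 12.0])
predictions = predict(new_points)

print("slope:", slope, "intercept:", intercept)
print("predictions:", predictions)
```

The fitted slope and intercept should come out close to the values used to generate the data, and every prediction is just a point on that line.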

## Step 2: Correlation and Linear Regression

Linear Regression and Correlation are often confused, but they do different things.

Correlation is a single number describing the relationship between two variables, while Linear Regression is an equation used to predict values.

• Correlation
  • Single measure of relationship between two variables.
• Linear Regression
  • An equation used for prediction.
• Similarities
  • Describes relationship between variables
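The difference can be sketched in code: correlation gives one number, while Linear Regression gives a slope and an intercept you can predict with. The data below is synthetic and the column names `x` and `y` are placeholders chosen for this example. With a single input variable the two are tied together, because the slope equals the correlation scaled by the ratio of the standard deviations.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic data for illustration only
rng = np.random.default_rng(1)
x = rng.normal(loc=170, scale=10, size=200)
y = 0.9 * x - 90 + rng.normal(scale=5, size=200)
df = pd.DataFrame({'x': x, 'y': y})

# Correlation: a single number between -1 and 1
r = df['x'].corr(df['y'])

# Linear Regression: an equation y = slope * x + intercept
model = LinearRegression().fit(df[['x']], df['y'])
slope = model.coef_[0]
intercept = model.intercept_

# With one input variable: slope = r * std(y) / std(x)
slope_from_r = r * df['y'].std() / df['x'].std()

print("correlation:", r)
print("slope:", slope, "intercept:", intercept)
print("slope derived from correlation:", slope_from_r)
```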

## Step 3: Example

Let’s try an example.

```python
import pandas as pd

# Assumes `data` is an already loaded DataFrame with 'Height' and 'Weight' columns
data.plot.scatter(x='Height', y='Weight', alpha=.1)
```

This data looks correlated. What would a Linear Regression prediction of it look like?

We can use scikit-learn (sklearn).

### Linear Regression

• The Linear Regression model takes a collection of observations
• Each observation has features (or variables).
• The features the model takes as input are called independent (often denoted with `X`)
• The feature the model outputs is called dependent (often denoted with `y`)

```python
from sklearn.linear_model import LinearRegression

# Create a Linear Regression model and fit it to our data.
# X must be 2-D, hence the double brackets; y is a 1-D Series.
lin = LinearRegression()
lin.fit(data[['Height']], data['Weight'])

# Plot the data with the fitted line on top in red
ax = data.plot.scatter(x='Height', y='Weight', alpha=.1)
ax.plot(data['Height'], lin.predict(data[['Height']]), c='r')
```
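The fitted model is just the equation of a line, which you can read off through the `coef_` and `intercept_` attributes of `LinearRegression`. The sketch below uses a synthetic stand-in for `data` so that it runs on its own. The numbers in it are made up and are not the dataset used in this lesson.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for `data` (NOT the lesson's dataset)
rng = np.random.default_rng(42)
height = rng.normal(loc=66, scale=4, size=500)
weight = 6.0 * height - 240 + rng.normal(scale=10, size=500)
data = pd.DataFrame({'Height': height, 'Weight': weight})

lin = LinearRegression()
lin.fit(data[['Height']], data['Weight'])

# The line is: Weight = slope * Height + intercept
slope = lin.coef_[0]
intercept = lin.intercept_
print(f"Weight = {slope:.2f} * Height + {intercept:.2f}")

# Predicting new values places them on that line
new_heights = pd.DataFrame({'Height': [60.0, 70.0]})
predicted = lin.predict(new_heights)
manual = slope * new_heights['Height'].to_numpy() + intercept
print(predicted, manual)
```

The model's predictions and the values computed by hand from the slope and intercept agree, which is what "the predictions will be on the line" means in practice.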

To measure how well the line fits the data, the R-squared score (the coefficient of determination) is often used. You can get it directly from the model with the following code.

```python
lin.score(data[['Height']], data['Weight'])
```

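This will give 0.855, which means the line explains about 85.5% of the variance in Weight. A score of 1 would be a perfect fit, and you can use the number to compare against other models or samples.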
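To see what the score measures, here is a sketch that computes R-squared by hand as 1 minus the ratio of the residual sum of squares to the total sum of squares, and compares it with `lin.score`. The data is again a synthetic stand-in, so the resulting value will differ from the 0.855 reported for the lesson's dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for `data` (NOT the lesson's dataset)
rng = np.random.default_rng(7)
height = rng.normal(loc=66, scale=4, size=500)
weight = 6.0 * height - 240 + rng.normal(scale=10, size=500)
data = pd.DataFrame({'Height': height, 'Weight': weight})

X = data[['Height']]
y = data['Weight']
lin = LinearRegression().fit(X, y)
y_pred = lin.predict(X)

# R-squared = 1 - (unexplained variation / total variation)
ss_res = ((y - y_pred) ** 2).sum()    # squared distances to the line
ss_tot = ((y - y.mean()) ** 2).sum()  # squared distances to the mean
r_squared = 1 - ss_res / ss_tot

print("manual R-squared:", r_squared)
print("lin.score:       ", lin.score(X, y))
```

With a single input variable, this value also equals the square of the correlation between the two columns.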

This is one lesson of a 15-part Expert Data Science Blueprint course with the following resources.

• 15 video lessons – cover the Data Science Workflow and concepts, demonstrate everything on real data, introduce projects, and show a solution (YouTube video).
• 30 Jupyter Notebooks – with the full code and explanation from the lectures and projects (GitHub).
• 15 projects – structured with the Data Science Workflow, with a solution explained at the end of the video lessons (GitHub).
