Learn NumPy Basics and Linear Regression with Python

How to use NumPy with Linear Regression in Python

In this tutorial, you will learn some basic NumPy. The best way to learn something new is to combine it with something useful. Therefore you will use NumPy while creating your first Machine Learning project, a Linear Regression model with Python.

Step 1: What is NumPy?

NumPy is the fundamental package for scientific computing in Python.


Well, that is how it is stated on the official NumPy page.

Maybe a better question is, what do you use NumPy for and why?

Well, the main tool you use from NumPy is the NumPy array. Arrays are quite similar to Python lists, just with a few restrictions.

  1. It can only contain one data type. That is, if a NumPy array has integers, then all entries can only be integers.
  2. The size is fixed. That is, you cannot add or remove entries in place, as you can with a Python list. (The values stored in the entries can still be changed.)
  3. If it is a multi-dimension array, all sub-arrays must be of the same shape. That is, you cannot have something similar to a Python list of lists, where the first sub-list is of length 3, the second of length 7, and so on. They all must have the same length (or shape).
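The three restrictions above can be seen in a few lines. This is a minimal sketch; the variable names (mixed, a, b, grid) are made up for illustration.

```python
import numpy as np

# 1. One data type: mixing integers and a float converts every entry to float64.
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)        # float64

# 2. Fixed size: np.append does not grow the array in place.
#    It returns a new, larger array and leaves the original untouched.
a = np.array([1, 2, 3])
b = np.append(a, 4)
print(a.shape, b.shape)   # (3,) (4,)

# 3. Same shape for all sub-arrays: a rectangular list of lists becomes a 2-D array.
grid = np.array([[1, 2, 3], [4, 5, 6]])
print(grid.shape)         # (2, 3)
```

A list of lists with different lengths does not fit this model. Depending on the NumPy version, trying to convert one either raises an error or falls back to an array of Python objects.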

Why would anyone use them, you might ask? They are more restrictive than Python lists.

Interestingly, making a data structure more restrictive, as NumPy arrays are, can make it more efficient (faster).


Think about it. Because every entry has the same type and size, NumPy knows more about the data structure and does not need to check each entry while it computes.
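If you want to check the speed claim yourself, a rough timing sketch could look like this. The array size and the number of repetitions are arbitrary choices, and the measured times will vary from machine to machine, so treat the numbers as an indication only.

```python
import timeit
import numpy as np

n = 100_000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)  # explicit 64-bit integers to avoid overflow

# The same computation (sum of squares) with a Python list and with a NumPy array.
list_result = sum(v * v for v in py_list)
arr_result = int((np_arr * np_arr).sum())

# Time 10 repetitions of each version.
list_time = timeit.timeit(lambda: sum(v * v for v in py_list), number=10)
arr_time = timeit.timeit(lambda: (np_arr * np_arr).sum(), number=10)
print(f"list: {list_time:.4f}s, array: {arr_time:.4f}s")
```

Both versions compute the same value. On most setups the NumPy version should be noticeably faster.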

Step 2: Some NumPy array basics we will use for our Machine Learning project

After this, and after learning some basic Machine Learning theory, you will use NumPy for Linear Regression in Python.

A NumPy array can be created from a list.

import numpy as np
a1 = np.array([1, 2, 3, 4])
a1

In a notebook, this will display.

array([1, 2, 3, 4])

The data type of a NumPy array can be read from its dtype attribute.

a1.dtype

On most platforms it will show dtype('int64'). That is, the full array has only one type, int64, which is a 64-bit integer. That is also different from Python integers, where you cannot specify the size of the integers. Here you can have int8, int16, int32, int64, and more. Again restrictions, which make it more efficient.
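As a small sketch of what the integer size means in practice, you can pass a dtype when creating the array and inspect how many bytes it uses. The names small and big are only for illustration.

```python
import numpy as np

# The same four values stored as 8-bit and as 64-bit integers.
small = np.array([1, 2, 3, 4], dtype=np.int8)
big = np.array([1, 2, 3, 4], dtype=np.int64)

# itemsize is the number of bytes per entry, nbytes the total for the array.
print(small.dtype, small.itemsize, small.nbytes)  # int8 1 4
print(big.dtype, big.itemsize, big.nbytes)        # int64 8 32
```

The smaller type uses less memory, but it can also hold a smaller range of values (int8 covers -128 to 127).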


a1.shape

The above gives the shape, here, (4,). Notice that the number of entries is fixed once the array is created. You cannot append to it in place, as you can with a Python list.

Let’s create another NumPy array and try a few things.

a1 = np.array([1, 2, 3, 4])
a2 = np.array([5, 6, 7, 8])
print(a1*2)
print(a1*a2)
print(a1 + a2)

Which results in.

[2 4 6 8]
[ 5 12 21 32]
[ 6  8 10 12]

With a little inspection, you will realize that the first (a1*2) multiplies each entry by 2. The second (a1*a2) multiplies the entries pairwise. The third (a1 + a2) adds the entries pairwise.
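It can help to compare this with what the same operators do on Python lists. A minimal sketch, with numbers and arr as illustrative names:

```python
import numpy as np

numbers = [1, 2, 3, 4]
arr = np.array(numbers)

# On a list, * repeats the list and + concatenates two lists.
print(numbers * 2)                    # [1, 2, 3, 4, 1, 2, 3, 4]
print(numbers + [5, 6, 7, 8])         # [1, 2, 3, 4, 5, 6, 7, 8]

# On a NumPy array, the same operators work entry by entry.
print(arr * 2)                        # [2 4 6 8]
print(arr + np.array([5, 6, 7, 8]))   # [ 6  8 10 12]
```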

Step 3: What is Machine Learning?

You need to understand some fundamental things about Machine Learning to use NumPy for Linear Regression in Python.

  • In the classical computing model, everything is programmed into the algorithms. This has the limitation that all decision logic needs to be understood before usage. And if things change, we need to modify the program.
  • With the modern computing model (Machine Learning) this paradigm is changing. We feed the algorithms with data, and based on that data, the program makes the decisions.

How Machine Learning Works

  • On a high level, you can divide Machine Learning into two phases.
    • Phase 1: Learning
    • Phase 2: Prediction
  • The learning phase (Phase 1) can be divided into substeps.
  • It all starts with a training set (training data). This data set should represent the type of data that the Machine Learning model will be used to predict from in Phase 2 (prediction).
  • The pre-processing step is about cleaning up data. While Machine Learning is awesome, it cannot figure out what good data looks like. You need to do the cleaning as well as transform data into the desired format.
  • Then for the magic, the learning step. There are three main paradigms in machine learning.
    • Supervised: where you tell the algorithm what categories each data item is in. Each data item from the training set is tagged with the right answer.
    • Unsupervised: where the data is not tagged with answers, and the learning algorithm has to find the structure in the data itself.
    • Reinforcement: where the algorithm learns by trial and error, based on rewards for past actions.
  • Finally, the testing is done to see if the model is good. For this, the available data is divided into a training set and a test set before the learning. The test set is used to see how well the model predicts on data it has not seen. If it does not predict well, a new model might be necessary.
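The division into a training set and a test set can be sketched with train_test_split from scikit-learn. The data below is made up purely for illustration, and the 25% test size and random_state=0 are arbitrary choices.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 20 samples with one feature each, and the answers y = 2x + 1.
X = np.arange(20).reshape((-1, 1))
y = 2 * X.ravel() + 1

# Hold back 25% of the samples for testing. random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

print(len(X_train), len(X_test))  # 15 5
```

The model learns only from X_train and y_train. X_test and y_test are kept aside until the testing step.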

Then the prediction begins.

Step 4: NumPy with Linear Regression in Python

Let’s try to use a Machine Learning model. One of the first models you will meet is the Linear Regression model.

Simply said, this model tries to fit data into a straight line. The best way to understand that is to see it visually with one explanatory variable. That is, given a value (explanatory variable), can you predict the scalar response (the value you want to predict)?

Say, given the temperature (explanatory variable), can you predict the sale of ice cream? Assuming there is a linear relationship, can you determine that? A guess is, the hotter it is, the more ice cream is sold. But whether a linear model is a good predictor is beyond the scope here.

Let’s try with some simple data.

But first, we need to import a few libraries.

import numpy as np
from sklearn.linear_model import LinearRegression

Then we generate some simple data.

x = [i for i in range(10)]
y = [i for i in range(10)]

In this case, x and y are perfectly correlated. That is not realistic, but it is enough to demonstrate the steps. This part is equivalent to the Get data step.

Here x is the explanatory variable and y is the scalar response we want to predict.

When you train the model, you give it pairs of explanatory values and scalar responses. This is needed, as the model has to learn the relationship from examples.

After the learning, you can predict data. But let’s prepare the data for the learning. This is the Pre-processing.

X = np.array(x).reshape((-1, 1))
Y = np.array(y).reshape((-1, 1))

Notice, this is a very simple step, and we only need to convert the data into the correct format.
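To see what the reshape does, you can compare the shapes before and after. This is a small sketch. The names flat and column are only for illustration.

```python
import numpy as np

x = list(range(10))

flat = np.array(x)               # one-dimensional: 10 entries
column = flat.reshape((-1, 1))   # two-dimensional: 10 rows, 1 column

print(flat.shape)    # (10,)
print(column.shape)  # (10, 1)
print(column[:3])    # the first three rows, one value per row
```

The -1 tells NumPy to work out the number of rows itself from the number of entries. scikit-learn expects the explanatory data as a two-dimensional array with one row per sample and one column per variable, which is why the flat array is turned into a column.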

Then we can train the model (train model).

lin_regressor = LinearRegression()
lin_regressor.fit(X, Y)

Here we will skip the test model step, as the data is simple.

To predict data we can call the model.

Y_pred = lin_regressor.predict(X)
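The call above predicts on the same values the model was trained on. As a sketch of predicting on new values, assuming the same data as in this tutorial, you can pass any array with the same column format. The values 10 and 12.5 and the names X_new and Y_new are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same data as in the tutorial: y equals x for the values 0 to 9.
X = np.arange(10).reshape((-1, 1))
Y = np.arange(10).reshape((-1, 1))

lin_regressor = LinearRegression()
lin_regressor.fit(X, Y)

# New explanatory values the model has not seen, as a column (one row per value).
X_new = np.array([[10], [12.5]])
Y_new = lin_regressor.predict(X_new)
print(Y_new.round(3))
```

Since the training data lies exactly on the line y = x, the predictions should be very close to 10 and 12.5.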

The full code is together here.

import numpy as np
from sklearn.linear_model import LinearRegression
x = [i for i in range(10)]
y = [i for i in range(10)]
X = np.array(x).reshape((-1, 1))
Y = np.array(y).reshape((-1, 1))
lin_regressor = LinearRegression()
lin_regressor.fit(X, Y)
Y_pred = lin_regressor.predict(X)

Step 5: Visualize the result of your Linear Regression model with NumPy in Python

You can visualize the data and the prediction as follows (see more about matplotlib here).

import matplotlib.pyplot as plt
alpha = str(round(lin_regressor.intercept_[0], 5))
beta = str(round(lin_regressor.coef_[0][0], 5))
fig, ax = plt.subplots()
ax.set_title(f"Alpha {alpha}, Beta {beta}")
ax.scatter(X, Y)
ax.plot(X, Y_pred, c='r')
plt.show()

Alpha is called the constant or intercept and gives the value where the regression line crosses the y-axis.

Beta is called the coefficient or slope and measures the steepness of the linear regression.
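For one explanatory variable, alpha and beta can also be computed directly with NumPy using the standard least-squares formulas. This is a sketch on made-up data (y = 2x + 1), not a description of how scikit-learn computes them internally.

```python
import numpy as np

# Made-up data lying exactly on the line y = 2x + 1.
x = np.array([0, 1, 2, 3, 4], dtype=float)
y = np.array([1, 3, 5, 7, 9], dtype=float)

# Slope: how x and y vary together, divided by how much x varies.
beta = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
# Intercept: chosen so the line passes through the point of means.
alpha = y.mean() - beta * x.mean()
print(alpha, beta)  # 1.0 2.0

# np.polyfit with degree 1 fits the same straight line (slope first, then intercept).
slope, intercept = np.polyfit(x, y, 1)
print(round(intercept, 5), round(slope, 5))
```

Both approaches should agree with each other up to small rounding differences.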

Next step

If you want a real project with Linear Regression, then check out the video at the top of the post, which is part of a full course.

The project will look at car specs to see if there is a connection.

Want to learn more Python? This is part of an 8-hour FREE video course with full explanations, projects on each level, and guided solutions.

The course is structured with the following resources to improve your learning experience.

  • 17 video lessons teaching you everything you need to know to get started with Python.
  • 34 Jupyter Notebooks with lesson code and projects.
  • 2 FREE eBooks to support your Python learning.

See the full FREE course page here.

If you instead want to learn more about Machine Learning, then check out my Machine Learning with Python course.

  • 15 video lessons teaching you all aspects of Machine Learning
  • 30 Jupyter Notebooks with lesson code and projects
  • 10 hours of FREE video content to support your learning journey.

Go to the course page for details.
