PyTorch Model to Detect Handwriting for Beginners

What will we cover?

  • What is PyTorch
  • PyTorch vs TensorFlow
  • Get started with PyTorch
  • Work with image classification

Step 1: What is PyTorch?

PyTorch is an optimized tensor library for deep learning using GPUs and CPUs.

What does that mean?

Well, PyTorch is an open source machine learning library and is used for computer vision and natural language processing. It is primarily developed by Facebook’s AI Research Lab.

Step 2: PyTorch and TensorFlow

People often worry about choosing the right framework so they don't waste time.

You probably do the same – but don't worry: if you use either PyTorch or TensorFlow, you are on the right track. They are the two most popular Deep Learning frameworks, and if you learn one, you will have an easy time switching to the other later.

PyTorch was released in 2016 by Facebook's AI Research Lab, while TensorFlow was released in 2015 by the Google Brain team.

Both are good choices for Deep Learning.

Step 3: PyTorch and prepared datasets

PyTorch comes with a long list of prepared datasets; you can see them all in the torchvision documentation.

We will look at the MNIST dataset for handwritten digit-recognition.

In the video above we also look at the CIFAR10 dataset, which consists of 32×32 images in 10 classes.

You can get a dataset by using torchvision.

from torchvision import datasets

data_path = 'downloads/'
mnist = datasets.MNIST(data_path, train=True, download=True)

Step 4: Getting and preparing the data

First we need to get the data and prepare it by turning the images into tensors and normalizing them.

Transforming and Normalizing

  • Images are PIL objects in the MNIST dataset
  • They need to be transformed to tensors (the datatype PyTorch works with)
    • torchvision has the transformation transforms.ToTensor(), which turns NumPy arrays and PIL images into tensors
  • Then you need to normalize the images
    • This requires the mean value and the standard deviation of the data
  • Then we can apply the normalization
    • torchvision has transforms.Normalize, which takes the mean and standard deviation
from torchvision import datasets
from torchvision import transforms
import torch
import torch.nn as nn
from torch import optim
import matplotlib.pyplot as plt

data_path = 'downloads/'
mnist = datasets.MNIST(data_path, train=True, download=True)
mnist_val = datasets.MNIST(data_path, train=False, download=True)

mnist = datasets.MNIST(data_path, train=True, download=False, transform=transforms.ToTensor())

imgs = torch.stack([img_t for img_t, _ in mnist], dim=3)

print('get mean')
print(imgs.view(1, -1).mean(dim=1))

print('get standard deviation')
print(imgs.view(1, -1).std(dim=1))

Then we can use those values to make the transformation.

mnist = datasets.MNIST(data_path, train=True, download=False, 
                       transform=transforms.Compose([
                           transforms.ToTensor(),
                           transforms.Normalize((0.1307,),
                                                (0.3081,))]))

mnist_val = datasets.MNIST(data_path, train=False, download=False, 
                       transform=transforms.Compose([
                           transforms.ToTensor(),
                           transforms.Normalize((0.1307,),
                                                (0.3081,))]))

Step 5: Creating and testing a Model

The model we will use is a simple feed-forward network with two hidden layers, which we can define as follows.

input_size = 784 # 28*28
hidden_sizes = [128, 64]
output_size = 10

model = nn.Sequential(nn.Linear(input_size, hidden_sizes[0]),
                     nn.ReLU(),
                     nn.Linear(hidden_sizes[0], hidden_sizes[1]),
                     nn.ReLU(),
                     nn.Linear(hidden_sizes[1], output_size),
                     nn.LogSoftmax(dim=1))

Then we can train the model as follows

train_loader = torch.utils.data.DataLoader(mnist, batch_size=64,
                                           shuffle=True)

optimizer = optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.NLLLoss()

n_epochs = 10
for epoch in range(n_epochs):
    for imgs, labels in train_loader:
        optimizer.zero_grad()

        batch_size = imgs.shape[0]
        output = model(imgs.view(batch_size, -1))

        loss = loss_fn(output, labels)

        loss.backward()

        optimizer.step()
    print("Epoch: %d, Loss: %f" % (epoch, float(loss)))

And finally, test our model.

val_loader = torch.utils.data.DataLoader(mnist_val, batch_size=64,
                                           shuffle=True)


correct = 0
total = 0
with torch.no_grad():
    for imgs, labels in val_loader:
        batch_size = imgs.shape[0]
        outputs = model(imgs.view(batch_size, -1))
        _, predicted = torch.max(outputs, dim=1)
        total += labels.shape[0]
        correct += int((predicted == labels).sum())
print("Accuracy: %f" % (correct / total))

This reaches an accuracy of 96.44%.

Want to learn more?

Want better results? Try using a CNN model.

This is part of a FREE 10h Machine Learning course with Python.

  • 15 video lessons – which explain Machine Learning concepts, demonstrate models on real data, and introduce projects and solutions (YouTube playlist).
  • 30 Jupyter Notebooks – with the full code and explanations from the lectures and projects (GitHub).
  • 15 projects – with step guides to help you structure your solutions, with solutions explained at the end of the video lessons (GitHub).

Convolutional Neural Network: Detect Handwriting

What will we cover?

  • Understand what Convolutional Neural Network (CNN) is
  • The strength of CNN
  • How to use it to detect handwriting
  • Extract features from pictures
  • Learn Convolution, Pooling and Flatten
  • How to create a CNN

Step 1: What is Computer Vision?

Computational methods for analyzing and understanding digital images.

An example could be detecting handwriting.

Assuming familiarity with Deep Neural Networks, a naive approach would be to map each pixel to an input node, add some hidden layers, and then classify from the output.

If you are new to Artificial Neural Networks or Deep Neural Networks, read the guides on those first.

But actually, the network is not interested in the specifics of individual pixels. Also, if an image is shifted just one pixel to the left, every input changes, which influences the network. Hence, this approach is not very good.

Step 2: What is Image Convolution?

Image Convolution is applying a filter that adds each pixel value of an image to its neighbors, weighted according to a kernel matrix.
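The definition above can be sketched in a few lines of NumPy. This is a minimal illustration, not a library implementation; note that, as in CNNs, it slides the kernel without flipping it (strictly cross-correlation). The image and kernel values are made up for the example.

```python
import numpy as np

def convolve2d(image, kernel):
    """Apply a filter: each output pixel is the sum of the neighborhood
    weighted by the kernel (valid mode, kernel not flipped)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# A vertical edge-detection kernel on a small image with a hard edge
image = np.array([[0, 0, 9, 9],
                  [0, 0, 9, 9],
                  [0, 0, 9, 9],
                  [0, 0, 9, 9]], dtype=float)
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)
print(convolve2d(image, kernel))  # strong response along the edge
```

On a flat region the kernel sums to zero, so the output is zero – only the edge produces a response.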

A few techniques are given here.

Pooling

  • Reducing the size of an input by sampling from regions in the input
  • Basically, reducing the size of the image

Max-Pooling

  • Pooling by choosing the maximum value in each region
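Max-pooling can also be sketched with NumPy. This is a minimal illustration (2×2 regions, values made up), not how a framework implements it internally:

```python
import numpy as np

def max_pool(image, size=2):
    """Max-pooling: keep the maximum of each non-overlapping size x size
    region, reducing width and height by a factor of `size`."""
    h, w = image.shape
    return image[:h - h % size, :w - w % size] \
        .reshape(h // size, size, w // size, size) \
        .max(axis=(1, 3))

image = np.array([[1, 2, 5, 6],
                  [3, 4, 7, 8],
                  [9, 1, 2, 3],
                  [4, 5, 6, 7]])
print(max_pool(image))  # [[4 8] [9 7]]
```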

Step 3: What is Convolutional Neural Network (CNN)?

A Convolutional Neural Network (CNN) is a Neural Network that uses convolution for analyzing images.

The idea of a CNN is as follows.

  • We have an input image
  • Apply convolution – possibly several, to extract several features of the image (feature maps)
  • Apply pooling (this reduces the input)
  • Then flatten it out and feed it into a traditional network

Step 4: Handwriting detection with CNN

We will use the MNIST database, a classic large dataset of handwritten digits.

Here is the code, with some comments.

import tensorflow as tf
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

# https://en.wikipedia.org/wiki/MNIST_database
mnist = tf.keras.datasets.mnist

# Read the data
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Scale it to values 0 - 1
x_train = x_train / 255.0
x_test = x_test / 255.0

y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

x_train = x_train.reshape(x_train.shape[0], x_train.shape[1], x_train.shape[2], 1)
x_test = x_test.reshape(x_test.shape[0], x_test.shape[1], x_test.shape[2], 1)

# Creating a model
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(.5))
model.add(Dense(10, activation='softmax'))

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10)

model.evaluate(x_test, y_test)

Which gives an accuracy of about 98%.

Want to learn more?

Want to compare your result with a model using PyTorch?

This is part of a FREE 10h Machine Learning course with Python.


Deep Neural Network: A Guide to Deep Learning

What will we cover?

  • Understand Deep Neural Network (DNN)
  • How algorithms calculate weights in DNN
  • Show tools to visually understand what DNN can solve

Step 1: What is Deep Neural Network?

Be sure to read the Artificial Neural Network Guide.

The adjective “deep” in deep learning refers to the use of multiple layers in the network (Wiki).

Usually having two or more hidden layers counts as deep.

Deep learning (also known as deep structured learning) is part of a broader family of machine learning methods based on artificial neural networks with representation learning.

Step 2: How to train and difficulties in training DNN

Training an Artificial Neural Network only relies on finding the weights from the input to the output nodes. In a Deep Neural Network (DNN) this becomes a bit more complex and requires more techniques.

To do that we need backpropagation, which is an algorithm for training Neural Networks with hidden layers (DNN).

  • Algorithm
    • Start with a random choice of weights
    • Repeat
      • Calculate error for output layer
      • For each layer – starting with output layer
        • Propagate error back one layer
        • Update weights
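The steps above can be sketched numerically for a tiny network. This is a minimal illustration (sizes, data, and the sigmoid activation are chosen here for the example); a finite-difference check confirms the backpropagated gradients are correct:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward(x, W1, W2):
    h = sigmoid(W1 @ x)   # hidden layer activations
    y = sigmoid(W2 @ h)   # output activation
    return h, y

def loss(x, t, W1, W2):
    _, y = forward(x, W1, W2)
    return 0.5 * ((y - t) ** 2).item()

def backprop(x, t, W1, W2):
    """Calculate the error at the output layer, propagate it back one
    layer, and return the gradients for each weight matrix."""
    h, y = forward(x, W1, W2)
    delta_out = (y - t) * y * (1 - y)             # error at output layer
    delta_hid = (W2.T @ delta_out) * h * (1 - h)  # error one layer back
    return np.outer(delta_hid, x), np.outer(delta_out, h)

rng = np.random.default_rng(0)
x, t = np.array([0.5, -0.2]), np.array([1.0])
W1 = rng.normal(size=(2, 2))  # start with random weights
W2 = rng.normal(size=(1, 2))

g1, g2 = backprop(x, t, W1, W2)

# Check one backpropagated gradient against a finite-difference estimate
eps = 1e-6
Wp, Wm = W1.copy(), W1.copy()
Wp[0, 0] += eps
Wm[0, 0] -= eps
numeric = (loss(x, t, Wp, W2) - loss(x, t, Wm, W2)) / (2 * eps)
print(abs(numeric - g1[0, 0]) < 1e-8)  # True
```

In practice a framework does this for you (e.g. loss.backward() in PyTorch), but the mechanics are the same: errors flow backwards, one layer at a time.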

A problem you will encounter is overfitting, which means fitting too closely to the training data and not generalizing well.

That is, you fit the model to the training data, but the model will not predict well on data that does not come from your training data.

To deal with that, dropout is a common technique.

  • Temporarily remove units – selected at random – from the network to prevent over-reliance on certain units
  • Typical dropout values are 20%–50%
  • Dropout tends to perform better on larger networks
  • Dropout at each layer of the network has shown good results
  • Original paper

Step 3: Play around with it

To learn more about fitting, check out the TensorFlow Playground.

Ideas to try:

  • If you have no hidden layers then you can only fit with straight lines.
  • If you add hidden layers you can model the XOR function.

Step 4: A DNN model of XOR

Let’s go crazy and fit an XOR dataset with a DNN model.

import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.models import Sequential
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

data = pd.read_csv('https://raw.githubusercontent.com/LearnPythonWithRune/MachineLearningWithPython/main/files/xor.csv')

fig, ax = plt.subplots()
ax.scatter(x=data['x'], y=data['y'], c=data['class id'])
plt.show()

This is the data we want to fit.

Then let's create the model.

Remember to try inserting the dropout layers (commented out below) and play around with them.

X_train, X_test, y_train, y_test = train_test_split(data[['x', 'y']], data['class id'], random_state=42)

accuracies = []

for i in range(5):
    tf.random.set_seed(i)
    
    model = Sequential()
    model.add(Dense(6, input_dim=2, activation='relu'))
    # model.add(Dropout(.2))
    model.add(Dense(4, activation='relu'))
    # model.add(Dropout(.2))
    model.add(Dense(1, activation='sigmoid'))
   
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    
    model.fit(X_train, y_train, epochs=100, batch_size=32, verbose=0)
    _, accuracy = model.evaluate(X_test, y_test)
    accuracies.append(accuracy*100)
    
print(sum(accuracies)/len(accuracies))

This results in an average accuracy of about 98%.

Want to learn more?

This is part of a FREE 10h Machine Learning course with Python.


Artificial Neural Network: The Ultimate Machine Learning Technique

What will we cover?

  • Understand Neural Networks
  • How you can model other machine learning techniques
  • Activation functions
  • How to make simple OR function
  • Different ways to calculate weights
  • What Batch sizes and Epochs are

Step 1: What is Artificial Neural Network

Artificial Neural Networks are computing systems inspired by the biological neural networks that constitute animal brains.

They are often just called Neural Networks.

The first Neural Network we look at is a simple one: two input nodes connected to one output node, where w1 and w2 are the weights on the connections from the input nodes to the output node.

It can also be represented with a function: h(x1, x2) = w0 + w1*x1 + w2*x2

This is a simple calculation, and the goal of the network is to find optimal weights. But we are still missing something. We need an activation function. That is, how to interpret the output.

Here are some possible activation functions.

  • Step function: g(x) = 1 if x ≥ 0, else 0
  • Rectified linear unit (ReLU): g(x) = max(0, x)
  • Sigmoid activation function: g(x) = 1/(1 + exp(−x))
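These three functions are one-liners in Python; a quick sketch to see what each returns:

```python
import math

def step(x):
    # 1 if x >= 0, else 0
    return 1 if x >= 0 else 0

def relu(x):
    # max(0, x)
    return max(0, x)

def sigmoid(x):
    # 1 / (1 + exp(-x))
    return 1 / (1 + math.exp(-x))

print(step(-2), step(0))   # 0 1
print(relu(-2), relu(3))   # 0 3
print(sigmoid(0))          # 0.5
```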

Step 2: How to model the OR function

Take the weights w0 = -1 and w1 = w2 = 1, and let's analyze the network with the activation function g given by the step function.

  • x1 = 0 and x2 = 0: g(-1 + x1 + x2) = g(-1 + 0 + 0) = g(-1) = 0
  • x1 = 1 and x2 = 0: g(-1 + x1 + x2) = g(-1 + 1 + 0) = g(0) = 1
  • x1 = 0 and x2 = 1: g(-1 + x1 + x2) = g(-1 + 0 + 1) = g(0) = 1
  • x1 = 1 and x2 = 1: g(-1 + x1 + x2) = g(-1 + 1 + 1) = g(1) = 1

Exactly like the OR function.
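The analysis above can be verified with a few lines of code (weights taken from the analysis):

```python
def g(x):
    # step activation function
    return 1 if x >= 0 else 0

def or_network(x1, x2):
    # weights w0 = -1, w1 = w2 = 1 from the analysis above
    return g(-1 + 1 * x1 + 1 * x2)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, '->', or_network(x1, x2))
```

The output matches the OR truth table exactly.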

Step 3: Neural Network in the General Case and how to Calculate Weights

In general, a Neural Network can have any number of input and output nodes, where each input node is connected to each output node.

We will later learn about Deep Neural Networks – where we can have any number of layers – but for now, let's focus on Neural Networks with only an input and an output layer.

To calculate weights there are several options.

Gradient Descent

  • An approach for calculating the weights (wiki)
  • An algorithm for minimizing the loss when training neural networks

Pseudo algorithm

  • Start with a random choice of weights
  • Repeat:
    • Calculate the gradient based on all data points: the direction that will lead to decreasing loss
    • Update the weights according to the gradient

Trade-off

  • Expensive to calculate for all data points

Stochastic Gradient Descent

Pseudo algorithm

  • Start with a random choice of weights
  • Repeat:
    • Calculate the gradient based on one data point: the direction that will lead to decreasing loss
    • Update the weights according to the gradient

Mini-Batch Gradient Descent

Pseudo algorithm

  • Start with a random choice of weights
  • Repeat:
    • Calculate the gradient based on one small batch of data points: the direction that will lead to decreasing loss
    • Update the weights according to the gradient
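The three variants above differ only in how much data each gradient step uses. A minimal mini-batch sketch on made-up data (fitting the line y = 2x + 1; the learning rate, batch size, and epoch count are illustrative choices):

```python
import numpy as np

# Synthetic data from the line y = 2x + 1 with a little noise
rng = np.random.default_rng(42)
X = rng.uniform(-1, 1, size=200)
y = 2 * X + 1 + rng.normal(scale=0.01, size=200)

w, b = 0.0, 0.0          # start with arbitrary weights
lr, batch_size = 0.1, 32

for epoch in range(200):
    order = rng.permutation(len(X))      # shuffle before each pass
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        err = w * X[idx] + b - y[idx]
        # gradient of the mean squared error on this mini-batch
        w -= lr * 2 * np.mean(err * X[idx])
        b -= lr * 2 * np.mean(err)

print(w, b)  # should approach w ≈ 2, b ≈ 1
```

Setting batch_size to 1 gives Stochastic Gradient Descent; setting it to len(X) gives full-batch Gradient Descent.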

Step 4: Perceptron

 The perceptron is an algorithm for supervised learning of binary classifiers. A binary classifier is a function which can decide whether or not an input, represented by a vector of numbers, belongs to some specific class.

  • It is only capable of learning a linearly separable decision boundary.
  • It cannot model the XOR function (for that we need multi-layer perceptrons, i.e. multi-layer neural networks).
  • It can take multiple inputs and map them linearly to one output with an activation function.

Let's try an example to show it.

import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Sequential
import matplotlib.pyplot as plt

data = np.random.randn(200, 3)
data[:100, :2] += (10, 10)
data[:100, 2] = 0
data[100:, 2] = 1

fig, ax = plt.subplots()
ax.scatter(x=data[:,0], y=data[:,1], c=data[:,2])
plt.show()

This data should make it simple to validate whether we can create a Neural Network model that separates the two classes.

Step 5: Creating a Neural Network

First let’s create a train and test set.

X = data[:,:2]
y = data[:,2]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

Then we need to create the model and set batch size and epochs.

  • Batch size: a set of N samples.
  • Epoch: an arbitrary cutoff, generally defined as “one pass over the entire dataset”.
model = Sequential()
model.add(Dense(1, input_dim=2, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

model.fit(X_train, y_train, epochs=1000, batch_size=32, verbose=0)

model.evaluate(X_test, y_test)

Which should give 1.000 (100%) accuracy.

This can be visualized as follows.

y_pred = model.predict(X)
y_pred = np.where(y_pred < .5, 0, 1)

fig, ax = plt.subplots()
ax.scatter(x=X[:,0], y=X[:,1], c=y_pred)
plt.show()

In the video we also show how to visualize the prediction in a different way.

Want to learn more?

This is part of a FREE 10h Machine Learning course with Python.


Master Unsupervised Learning with k-Means Clustering

What will we cover?

In this lesson we will learn about Unsupervised learning.

  • Understand how Unsupervised Learning is different from Supervised Learning
  • How it can organize data without knowledge
  • Understand how k-Means Clustering works
  • Train a k-Means Clustering model

Step 1: What is Unsupervised Learning?

Machine Learning is often divided into 3 main categories.

  • Supervised: where you tell the algorithm what categories each data item is in. Each data item from the training set is tagged with the right answer.
  • Unsupervised: where the learning algorithm is not told the categories and must find structure in the data itself.
  • Reinforcement: teaches the machine to think for itself based on past action rewards.

From this we see that Unsupervised Learning is one of the main categories.

Unsupervised learning is a type of algorithm that learns patterns from untagged data. The hope is that through mimicry, which is an important mode of learning in people, the machine is forced to build a compact internal representation of its world and then generate imaginative content from it. In contrast to supervised learning where data is tagged by an expert, e.g. as a “ball” or “fish”, unsupervised methods exhibit self-organization that captures patterns as probability densities…

https://en.wikipedia.org/wiki/Unsupervised_learning

Step 2: k-Means Clustering

What is clustering?

Organize a set of objects into groups in such a way that similar objects tend to be in the same group.

What is k-Means Clustering?

Algorithm for clustering data based on repeatedly assigning points to clusters and updating those clusters’ centers.

An example of how it works in steps:

  • First we choose random cluster centroids (hollow points), then assign each point to the nearest centroid.
  • Then we update each centroid to be the center of its assigned points.
  • Repeat.

This can be repeated a fixed number of times or until the centroid positions change only slightly.
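The steps above can be sketched from scratch in a few lines. This is a minimal illustration (with a deterministic initialization from the first k points, chosen here for reproducibility); below we use scikit-learn's KMeans instead:

```python
import numpy as np

def k_means(data, k, n_iter=10):
    """Minimal k-means: assign points to the nearest centroid,
    then move each centroid to the mean of its assigned points."""
    centroids = data[:k].copy()          # simple deterministic init
    for _ in range(n_iter):
        # distance from every point to every centroid
        dist = np.linalg.norm(data[:, None] - centroids[None, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = data[labels == j].mean(axis=0)
    return labels, centroids

data = np.array([[0., 0.], [0., 1.], [10., 10.], [10., 11.]])
labels, centroids = k_means(data, k=2)
print(labels)  # first two points in one cluster, last two in the other
```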

Step 3: Create an Example

Let’s create some random data to demonstrate it.

import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

# Generate some numbers
data = np.random.randn(400,2)
data[:100] += 5, 5
data[100:200] += 10, 10
data[200:300] += 10, 5
data[300:] += 5, 10

fig, ax = plt.subplots()

ax.scatter(x=data[:,0], y=data[:,1])
plt.show()

This shows some random data in 4 clusters.

Then the following code demonstrates how it works. You can change max_iter to set the number of iterations – try it with 1, 2, 3, etc.

model = KMeans(n_clusters=4, init='random', random_state=42, max_iter=10, n_init=1)

model.fit(data)

y_pred = model.predict(data)

fig, ax = plt.subplots()
ax.scatter(x=data[:,0], y=data[:,1], c=y_pred)
ax.scatter(x=model.cluster_centers_[:,0], y=model.cluster_centers_[:,1], c='r')
plt.show()

After the 1st iteration the cluster centers are not yet optimal; after 10 iterations they are all in place.

Want to learn more?

This is part of a FREE 10h Machine Learning course with Python.


Reinforcement Learning Explained with Real Problem and Code from Scratch

What will we cover?

  • Understand how Reinforcement Learning works
  • Learn about Agent and Environment
  • How it iterates and gets rewards based on action
  • How to continuously learn new things
  • Create own Reinforcement Learning from scratch

Step 1: Reinforcement Learning simply explained

Reinforcement Learning

Reinforcement Learning is like training a dog. You and the dog speak different languages, which makes it difficult to explain to the dog what you want.

A common way to train a dog is similar to Reinforcement Learning: when the dog does something good, it gets a reward. This teaches the dog that this is the behaviour you want.

Said differently, relating it to the illustration above: the Agent is the dog. The dog is exposed to an Environment, called a state. Based on this, the Agent (the dog) takes an Action. Depending on whether you (the owner) like the Action, you Reward the Agent.

The goal of the Agent is to get the most Reward. This makes it possible for you, the owner, to get the desired behaviour by adjusting the Reward according to the Actions.

Step 2: Markov Decision Process

The model for decision-making represents States (from the Environment), Actions (from the Agent), and the Rewards.

Written a bit mathematical.

  • S is the set of States
  • Actions(s) is the set of Actions available in state s
  • The transition model P(s', s, a)
  • The Reward function R(s, a, s')

Step 3: Q-Learning

Q-learning is a model-free reinforcement learning algorithm to learn the value of an action in a particular state. It does not require a model of the environment (hence “model-free”), and it can handle problems with stochastic transitions and rewards without requiring adaptations. (wiki)

This can be modeled by a learning function Q(s, a), which estimates the value of performing action a when in state s.

It works as follows

  • Start with Q(s, a) = 0 for all s, a
  • Update Q when we take an action

Q(s, a) = Q(s, a) + α(reward + γ max_a' Q(s', a') − Q(s, a)) = (1 − α) Q(s, a) + α(reward + γ max_a' Q(s', a'))
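A single numeric update may make the formula concrete. The values of alpha and gamma match the ones used in the code below; reward and Q-values are made up for the illustration:

```python
alpha, gamma = 0.5, 0.5

def q_update(q_sa, reward, max_q_next):
    # Q(s,a) = (1 - alpha) * Q(s,a) + alpha * (reward + gamma * max_a' Q(s',a'))
    return (1 - alpha) * q_sa + alpha * (reward + gamma * max_q_next)

# Starting from Q(s,a) = 0, taking an action that gives reward 1
# where the best next-state value is 0:
print(q_update(0.0, 1.0, 0.0))  # 0.5
```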

The ϵ-Greedy Decision Making

The idea behind it is to either explore or exploit.

  • With probability ϵ take a random move
  • Otherwise, take the action a with maximum Q(s, a)

Let’s demonstrate it with code.

Step 4: Code Example

Assume we have the following Environment

Environment
  • You start at a random point.
  • You can either move left or right.
  • You lose if you hit a red box
  • You win if you hit the green box

Quite simple, but how can you program an Agent using Reinforcement Learning? And how can you do it from scratch?

A great way is to use an object representing the field (environment).

import numpy as np
import random

class Field:
    def __init__(self):
        self.states = [-1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
        self.state = random.randrange(0, len(self.states))
        
    def done(self):
        if self.states[self.state] != 0:
            return True
        else:
            return False
        
    # action: 0 => left
    # action: 1 => right
    def get_possible_actions(self):
        actions = [0, 1]
        if self.state == 0:
            actions.remove(0)
        if self.state == len(self.states) - 1:
            actions.remove(1)
        return actions

    def update_next_state(self, action):
        if action == 0:
            if self.state == 0:
                return self.state, -10
            self.state -= 1
        if action == 1:
            if self.state == len(self.states) - 1:
                return self.state, -10
            self.state += 1
        
        reward = self.states[self.state]
        return self.state, reward

field = Field()
q_table = np.zeros((len(field.states), 2))

alpha = .5
epsilon = .5
gamma = .5

for _ in range(10000):
    field = Field()
    while not field.done():
        actions = field.get_possible_actions()
        if random.uniform(0, 1) < epsilon:
            action = random.choice(actions)
        else:
            action = np.argmax(q_table[field.state])
            
        cur_state = field.state
        next_state, reward = field.update_next_state(action)
        
        q_table[cur_state, action] = (1 - alpha)*q_table[cur_state, action] + alpha*(reward + gamma*np.max(q_table[next_state]))
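After training, the greedy policy is simply the argmax over each row of q_table. A sketch with a hand-made stand-in table (the real q_table comes from the training loop above; these values are made up):

```python
import numpy as np

# Hypothetical stand-in for a trained q_table: 4 states, 2 actions
# (0 = left, 1 = right); real values come from the training loop above.
q_table = np.array([[0.1, 0.4],
                    [0.2, 0.6],
                    [0.3, 0.9],
                    [0.0, 0.0]])

policy = np.argmax(q_table, axis=1)
print(policy)  # best action per state: [1 1 1 0]
```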

Step 5: A more complex example

Check out the video to see a more complex example.

Want to learn more?

This is part of a FREE 10h Machine Learning course with Python.


How to use Multiple Linear Regression to Predict House Prices

What will we cover?

  • Learn about Multiple Linear Regression
  • Understand difference from discrete classifier
  • Understand it is Supervised learning task
  • Get insight into how similar a linear classifier is to discrete classifier
  • Hands-on experience with multiple linear regression

Step 1: What is Multiple Linear Regression?

Multiple Linear Regression is a Supervised learning task of learning a mapping from input point to a continuous value.

Wow. What does that mean?

This might not say much at first, but it is simply Linear Regression with multiple explanatory variables.

Let's start simple – Simple Linear Regression is the case most people meet first. It has one input variable (the explanatory variable) and one output value (the response value).

An example could be: if the temperature is X degrees, we expect to sell Y ice creams. That is, we try to predict how many ice creams we sell given a temperature.

Now, factors other than the temperature can have a high impact on ice cream sales – say, whether it is rainy or sunny, or what time of year it is (it might be tourist season or not).

Hence, a simple model like that might not give a very accurate estimate.

Therefore we would like a model with more input variables (explanatory variables). When there is more than one, it is called Multiple Linear Regression.

Step 2: Get Example Data

Let’s take a look at some house price data.

import pandas as pd

data = pd.read_csv('https://raw.githubusercontent.com/LearnPythonWithRune/MachineLearningWithPython/main/files/house_prices.csv')
print(data.head())

Notice – you can also download the file locally from GitHub, which will make it faster to run every time.

The output should give the following data.

The goal is: given a row of data, we want to predict the House Unit Price. That is, given all but the last column in a row, can we predict the House Unit Price (the last column)?

Step 3: Plot the data

Just for fun – let's make a scatter plot of all the houses by Latitude and Longitude.

import matplotlib.pyplot as plt

fig, ax = plt.subplots()

ax.scatter(x=data['Longitude'], y=data['Latitude'])
plt.show()

This gives the following plot.

This shows where the houses are located, which can be interesting because house prices can depend on location.

Intuitively, longitude and latitude should not be linearly correlated to the house price – at least not in the bigger picture.

Step 4: Correlation of the features

Before we make the Multiple Linear Regression, let’s see how the features (the columns) correlate.

data.corr()

Which gives.

This is interesting. Look at the lowest row for the correlations with House Unit Price. It shows that Distance to MRT station is negatively correlated – that is, the farther from an MRT station, the lower the price. This might not be surprising.

More surprising is that Latitude and Longitude are actually comparatively highly correlated to the House Unit Price.

This might just be the case for this particular dataset.

Step 5: Check the Quality of the dataset

For the Linear Regression model to perform well, you need to check that the data quality is good. If the input data is of poor quality (missing data, outliers, wrong values, duplicates, etc.) then the model will not be very reliable.

Here we will only check for missing values.

data.isnull().sum()

Which gives.

Transaction                     0
House age                       0
Distance to MRT station         0
Number of convenience stores    0
Latitude                        0
Longitude                       0
House unit price                0
dtype: int64

This tells us that there are no missing values.

If you want to learn more about Data Quality, then check out the free course on Data Science. In that course you will learn more about Data Quality and how it impacts the accuracy of your model.

Step 6: Create a Multiple Linear Regression Model

First we need to divide them into input variables X (explanatory variables) and output values y (response values).

Then we split it into a training and a testing dataset. We create the model, fit it, use it to predict on the test dataset, and get a score.

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

X = data.iloc[:,:-1]
y = data.iloc[:,-1]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, test_size=.15)

lin = LinearRegression()
lin.fit(X_train, y_train)

y_pred = lin.predict(X_test)

print(r2_score(y_test, y_pred))

For this run it gave 0.68.

Is that good or bad? Well, good question. A perfect match gives 1, but that should not be expected. The worst possible score is minus infinity – so we are far from that.

To get an idea of how good it is, we need to compare it with variations of the model.
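One simple comparison point is a baseline model that always predicts the mean of the targets – by construction its R² is 0, so any useful model must beat that. A small sketch with R² computed by hand (the numbers are made up; above we used sklearn's r2_score):

```python
import numpy as np

def r2_score(y_true, y_pred):
    """R² = 1 - residual sum of squares / total sum of squares."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - ss_res / ss_tot

y_true = np.array([10., 20., 30., 40.])

# Predicting the mean for every sample gives R² = 0
baseline = np.full_like(y_true, y_true.mean())
print(r2_score(y_true, baseline))  # 0.0

# Perfect predictions give R² = 1
print(r2_score(y_true, y_true))  # 1.0
```

So a score of 0.68 sits well above the mean baseline, but there is room to improve towards 1.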

In the free Data Science course we explore how to select features and evaluate models. It is a great idea to look into that.

Want to learn more?

This is part of a FREE 10h Machine Learning course with Python.
