PyTorch Model to Detect Handwriting for Beginners

What will we cover?

  • What is PyTorch
  • PyTorch vs TensorFlow
  • Get started with PyTorch
  • Work with image classification

Step 1: What is PyTorch?

PyTorch is an optimized tensor library for deep learning using GPUs and CPUs.

What does that mean?

Well, PyTorch is an open source machine learning library and is used for computer vision and natural language processing. It is primarily developed by Facebook’s AI Research Lab.
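As a first taste (a minimal sketch, not part of the lesson itself), everything in PyTorch revolves around tensors, which behave much like NumPy arrays but can also be moved to a GPU:

```python
import torch

# Create a 2x3 tensor and do a couple of basic operations
t = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])
print(t.shape)        # torch.Size([2, 3])
print((t * 2).sum())  # tensor(42.)
```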

Step 2: PyTorch and TensorFlow

People often worry about picking the wrong framework and wasting time.

You probably do the same – but don’t worry: if you use either PyTorch or TensorFlow, you are on the right track. They are the two most popular Deep Learning frameworks, and if you learn one, you will have an easy time switching to the other later.

PyTorch was released in 2016 by Facebook’s AI Research Lab, while TensorFlow was released in 2015 by the Google Brain team.

Both are good choices for Deep Learning.

Step 3: PyTorch and prepared datasets

PyTorch comes with a long list of prepared datasets, which you can browse in the torchvision documentation.

We will look at the MNIST dataset for handwritten digit recognition.

In the video above we also look at the CIFAR10 dataset, which consists of 32×32 images in 10 classes.

You can get a dataset by using torchvision.

from torchvision import datasets

data_path = 'downloads/'
# Download the MNIST training set to data_path (no-op if already downloaded)
mnist = datasets.MNIST(data_path, train=True, download=True)

Step 4: Getting and preparing the data

First we need to get the data and prepare it by turning the images into tensors and normalizing them.

Transforming and Normalizing

  • Images are PIL objects in the MNIST dataset
  • They need to be transformed to tensors (the datatype PyTorch works with)
    • torchvision has the transformation transforms.ToTensor(), which turns NumPy arrays and PIL images into tensors
  • Then you need to normalize the images
    • This requires determining the mean value and the standard deviation
  • Then we can apply the normalization
    • torchvision has transforms.Normalize, which takes the mean and standard deviation
from torchvision import datasets
from torchvision import transforms
import torch
import torch.nn as nn
from torch import optim
import matplotlib.pyplot as plt

data_path = 'downloads/'
mnist = datasets.MNIST(data_path, train=True, download=True)
mnist_val = datasets.MNIST(data_path, train=False, download=True)

# Reload the training set with images converted to tensors
mnist = datasets.MNIST(data_path, train=True, download=False, transform=transforms.ToTensor())

# Stack all images into one tensor and compute the mean and standard deviation
imgs = torch.stack([img_t for img_t, _ in mnist], dim=3)
print('get mean')
print(imgs.view(1, -1).mean(dim=1))
print('get standard deviation')
print(imgs.view(1, -1).std(dim=1))

Then we can use those values to make the transformation.

mnist = datasets.MNIST(data_path, train=True, download=False,
                       transform=transforms.Compose([
                           transforms.ToTensor(),
                           transforms.Normalize((0.1307,),
                                                (0.3081,))]))
mnist_val = datasets.MNIST(data_path, train=False, download=False,
                       transform=transforms.Compose([
                           transforms.ToTensor(),
                           transforms.Normalize((0.1307,),
                                                (0.3081,))]))

Step 5: Creating and testing a Model

The model we will use has an input layer of 784 nodes (one per pixel), two hidden layers of 128 and 64 nodes, and an output layer of 10 nodes (one per digit).

We can model that as follows.

input_size = 784  # 28*28 pixels per image
hidden_sizes = [128, 64]
output_size = 10
model = nn.Sequential(nn.Linear(input_size, hidden_sizes[0]),
                     nn.ReLU(),
                     nn.Linear(hidden_sizes[0], hidden_sizes[1]),
                     nn.ReLU(),
                     nn.Linear(hidden_sizes[1], output_size),
                     nn.LogSoftmax(dim=1))
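Before training, it can help to push a dummy batch through the model to confirm the shapes work out (a small sketch, assuming the same architecture as above):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(),
                      nn.Linear(128, 64), nn.ReLU(),
                      nn.Linear(64, 10), nn.LogSoftmax(dim=1))

dummy = torch.randn(5, 1, 28, 28)   # a fake batch of 5 "images"
out = model(dummy.view(5, -1))      # flatten each image to 784 values
print(out.shape)                    # torch.Size([5, 10])
print(out.exp().sum(dim=1))         # probabilities per image sum to 1
```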

Then we can train the model as follows.

train_loader = torch.utils.data.DataLoader(mnist, batch_size=64,
                                           shuffle=True)
optimizer = optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.NLLLoss()
n_epochs = 10
for epoch in range(n_epochs):
    for imgs, labels in train_loader:
        optimizer.zero_grad()
        batch_size = imgs.shape[0]
        output = model(imgs.view(batch_size, -1))
        loss = loss_fn(output, labels)
        loss.backward()
        optimizer.step()
    print("Epoch: %d, Loss: %f" % (epoch, float(loss)))

And finally, test our model.

val_loader = torch.utils.data.DataLoader(mnist_val, batch_size=64,
                                           shuffle=True)

correct = 0
total = 0
with torch.no_grad():
    for imgs, labels in val_loader:
        batch_size = imgs.shape[0]
        outputs = model(imgs.view(batch_size, -1))
        _, predicted = torch.max(outputs, dim=1)
        total += labels.shape[0]
        correct += int((predicted == labels).sum())
print("Accuracy: %f" % (correct / total))

Reaching an accuracy of 96.44%.

Want to learn more?

Want better results? Try using a CNN model.

This is part of a FREE 10h Machine Learning course with Python.

  • 15 video lessons – which explain Machine Learning concepts, demonstrate models on real data, introduce projects and show a solution (YouTube playlist).
  • 30 Jupyter Notebooks – with the full code and explanation from the lectures and projects (GitHub).
  • 15 projects – with step-by-step guides to help you structure your solutions, and solutions explained at the end of the video lessons (GitHub).

Convolutional Neural Network: Detect Handwriting

What will we cover?

  • Understand what a Convolutional Neural Network (CNN) is
  • The strengths of CNNs
  • How to use one to detect handwriting
  • Extract features from pictures
  • Learn Convolution, Pooling and Flatten
  • How to create a CNN

Step 1: What is Computer Vision?

Computational methods for analyzing and understanding digital images.

An example could be detecting handwriting.

Assuming familiarity with Deep Neural Networks, a naive approach would be to map each pixel of the image to an input node, add some hidden layers, and read off the detected digit at the output.

If you are new to Artificial Neural Networks or Deep Neural Networks, see the Deep Neural Network guide.

But the network is not actually interested in the specifics of the individual pixels. Also, if the image is shifted just 1 pixel to the left, every input changes, which would influence the network. Hence, this approach is not very good.

Step 2: What is Image Convolution?

Image convolution applies a filter to an image: each output pixel is the sum of the corresponding input pixel and its neighbors, weighted according to a kernel matrix.
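As a small illustration (not from the lesson, which uses Keras later), here is a convolution of a tiny image with a 3×3 vertical edge-detection kernel, using PyTorch’s F.conv2d:

```python
import torch
import torch.nn.functional as F

# A 5x5 "image": a bright vertical stripe on a dark background
img = torch.zeros(1, 1, 5, 5)
img[0, 0, :, 2] = 1.0

# A Sobel-style kernel that responds to vertical edges
kernel = torch.tensor([[[[-1.0, 0.0, 1.0],
                         [-2.0, 0.0, 2.0],
                         [-1.0, 0.0, 1.0]]]])

# Each output pixel is the weighted sum of a 3x3 neighborhood
out = F.conv2d(img, kernel)
print(out.squeeze())  # strong responses on the stripe's edges
```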

Two related techniques are pooling and max-pooling.

Pooling

  • Reducing the size of an input by sampling from regions in the input
  • Basically reducing the size of the image

Max-Pooling

  • Pooling by choosing the maximum value in each region
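A minimal sketch of 2×2 max-pooling (again using PyTorch for illustration): each 2×2 region of the input collapses to its largest value, halving both dimensions.

```python
import torch
import torch.nn.functional as F

img = torch.tensor([[[[1.0, 3.0, 2.0, 4.0],
                      [5.0, 6.0, 1.0, 2.0],
                      [7.0, 2.0, 9.0, 1.0],
                      [3.0, 4.0, 5.0, 6.0]]]])

out = F.max_pool2d(img, kernel_size=2)  # keep the maximum of each 2x2 region
print(out.squeeze())  # tensor([[6., 4.], [7., 9.]])
```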

Step 3: What is Convolutional Neural Network (CNN)?

A Convolutional Neural Network (CNN) is a Neural Network that uses convolution for analyzing images.

The idea of a CNN is as follows.

  • We have an input image
  • Apply convolution – possibly several, to get several features of the image (feature maps)
  • Apply pooling (this reduces the input size)
  • Then flatten it out and feed it into a traditional (dense) network

Step 4: Handwriting detection with CNN

We will use the MNIST database, a classic large dataset of handwritten digits.

Here is the code, with some comments.

import tensorflow as tf
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
# https://en.wikipedia.org/wiki/MNIST_database
mnist = tf.keras.datasets.mnist
# Read the data
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Scale it to values 0 - 1
x_train = x_train / 255.0
x_test = x_test / 255.0
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
x_train = x_train.reshape(x_train.shape[0], x_train.shape[1], x_train.shape[2], 1)
x_test = x_test.reshape(x_test.shape[0], x_test.shape[1], x_test.shape[2], 1)
# Creating a model
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(.5))
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10)
model.evaluate(x_test, y_test)

This gives an accuracy of about 98%.

Want to learn more?

Want to compare your result with a model using PyTorch?

This is part of a FREE 10h Machine Learning course with Python.


Deep Neural Network: A Guide to Deep Learning

What will we cover?

  • Understand Deep Neural Networks (DNN)
  • How algorithms calculate weights in a DNN
  • Tools to visually understand what a DNN can solve

Step 1: What is Deep Neural Network?

Be sure to read the Artificial Neural Network Guide.

The adjective “deep” in deep learning refers to the use of multiple layers in the network (Wiki).

Usually having two or more hidden layers counts as deep.

Deep learning (also known as deep structured learning) is part of a broader family of machine learning methods based on artificial neural networks with representation learning.

Step 2: How to train and difficulties in training DNN

Training an Artificial Neural Network relies on finding the weights from the input to the output nodes. In a Deep Neural Network (DNN) this becomes a bit more complex and requires more techniques.

The key technique is backpropagation, an algorithm for training Neural Networks with hidden layers (DNNs).

  • Algorithm
    • Start with a random choice of weights
    • Repeat
      • Calculate error for output layer
      • For each layer – starting with output layer
        • Propagate error back one layer
        • Update weights
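To make the update rule concrete, here is a hand-rolled sketch for a single weight (no hidden layers, so the "propagate back" step is trivial; the data and learning rate are made up for illustration):

```python
# Learn w in y = w * x from examples of the true rule y = 2x
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w = 0.0      # start with an arbitrary initial weight
lr = 0.05    # learning rate

for epoch in range(100):
    for x, y in data:
        error = w * x - y    # how far off the prediction is
        w -= lr * error * x  # nudge the weight opposite the error gradient

print(round(w, 3))  # converges close to 2.0
```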

A problem you will encounter is overfitting, which means fitting too closely to the training data and not generalizing well.

That is, the model fits the training data, but will not predict well on data not coming from the training data.

To deal with that, dropout is a common technique.

  • Temporarily remove units – selected at random – from the network to prevent over-reliance on certain units
  • A dropout value of 20%–50% is common
  • Dropout gives better performance on larger networks
  • Dropout at each layer of the network has shown good results
  • Original paper: “Dropout: A Simple Way to Prevent Neural Networks from Overfitting” (Srivastava et al., 2014)

Step 3: Play around with it

To learn more about fitting, check out the TensorFlow Playground (playground.tensorflow.org).

Ideas to try:

  • If you have no hidden layers, you can only fit with straight lines.
  • If you add hidden layers, you can model the XOR function.

Step 4: A DNN model of XOR

Let’s go crazy and fit an XOR dataset with a DNN model.

import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.models import Sequential
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

# Load the XOR dataset and plot the two classes
data = pd.read_csv('https://raw.githubusercontent.com/LearnPythonWithRune/MachineLearningWithPython/main/files/xor.csv')
fig, ax = plt.subplots()
ax.scatter(x=data['x'], y=data['y'], c=data['class id'])
plt.show()

This is the data we want to fit.

Then let’s create the model.

Remember to uncomment the dropout lines and play around with them.

# Train 5 models with different seeds and average their test accuracy
X_train, X_test, y_train, y_test = train_test_split(data[['x', 'y']], data['class id'], random_state=42)
accuracies = []
for i in range(5):
    tf.random.set_seed(i)

    model = Sequential()
    model.add(Dense(6, input_dim=2, activation='relu'))
    # model.add(Dropout(.2))
    model.add(Dense(4, activation='relu'))
    # model.add(Dropout(.2))
    model.add(Dense(1, activation='sigmoid'))

    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

    model.fit(X_train, y_train, epochs=100, batch_size=32, verbose=0)
    _, accuracy = model.evaluate(X_test, y_test)
    accuracies.append(accuracy*100)

print(sum(accuracies)/len(accuracies))

Resulting in an accuracy of about 98%.

Want to learn more?

This is part of a FREE 10h Machine Learning course with Python.
