Step 2: How to train a DNN and the difficulties in training it
Training an Artificial Neural Network mostly comes down to finding the weights from the input to the output nodes. In a Deep Neural Network (DNN) this becomes a bit more complex and requires more techniques.
To do that we need backpropagation, which is an algorithm for training Neural Networks with hidden layers (DNN).
Algorithm
Start with a random choice of weights
Repeat
Calculate error for output layer
For each layer – starting with output layer
Propagate error back one layer
Update weights
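To make the algorithm concrete, here is a minimal NumPy sketch of backpropagation for a small network trained on the XOR data. The layer sizes, learning rate, and number of iterations are illustrative choices, not taken from the lesson.
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))  # start with a random choice of weights
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

for _ in range(10000):  # repeat
    a1 = sigmoid(X @ W1 + b1)                 # forward pass through the hidden layer
    a2 = sigmoid(a1 @ W2 + b2)                # forward pass to the output layer
    delta2 = (a2 - y) * a2 * (1 - a2)         # calculate error for output layer
    delta1 = (delta2 @ W2.T) * a1 * (1 - a1)  # propagate error back one layer
    W2 -= 0.5 * a1.T @ delta2                 # update weights
    b2 -= 0.5 * delta2.sum(axis=0, keepdims=True)
    W1 -= 0.5 * X.T @ delta1
    b1 -= 0.5 * delta1.sum(axis=0, keepdims=True)

print(a2.round(2))  # typically approaches [0, 1, 1, 0]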
A problem you will encounter is overfitting, which means the model fits too closely to the training data and does not generalize well.
That is, you fit the model to the training data, but the model will not predict well on data not coming from your training data.
To deal with that, dropout is a common technique.
Temporarily remove units – selected at random – from the neural network to prevent over-reliance on certain units
Dropout value of 20%-50%
Better performance when dropout is used on a larger network
Dropout at each layer of the network has shown good results.
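As a sketch of how this could look in Keras (the layer sizes, input shape, and dropout rates below are illustrative assumptions, not taken from the lesson):
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

model = Sequential([
    Dense(128, activation='relu', input_shape=(10,)),  # input shape is an assumption
    Dropout(0.5),  # temporarily removes 50% of this layer's units during training
    Dense(64, activation='relu'),
    Dropout(0.3),
    Dense(1, activation='sigmoid')
])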
This is part of a FREE 10h Machine Learning course with Python.
15 video lessons – which explain Machine Learning concepts, demonstrate models on real data, introduce projects and show a solution (YouTube playlist).
30 Jupyter Notebooks – with the full code and explanation from the lectures and projects (GitHub).
15 projects – with step guides to help you structure your solutions, and solutions explained at the end of the video lessons (GitHub).
How you can model other machine learning techniques
Activation functions
How to model a simple OR function
Different ways to calculate weights
What Batch sizes and Epochs are
Step 1: What is an Artificial Neural Network?
Artificial Neural Networks are computing systems inspired by the biological neural networks that constitute animal brains.
Often just called Neural Network.
The first Neural Network is the following simple network.
Where w1 and w2 are weights and the nodes on the left represent input nodes and the node on the right is the output node.
It can also be represented with a function: h(x1, x2) = w0 + w1*x1 + w2*x2
This is a simple calculation, and the goal of the network is to find optimal weights. But we are still missing something. We need an activation function. That is, how to interpret the output.
We see the weights are w1 = w2 = 1 and the bias is w0 = -1. Let’s analyse it with the activation function g, given by the step function: g(x) = 1 if x ≥ 0, otherwise 0.
x1 = 0 and x2=0 then we have g(-1 + x1 + x2) = g(-1 + 0 + 0) = g(-1) = 0
x1 = 1 and x2=0 then we have g(-1 + x1 + x2) = g(-1 + 1 + 0) = g(0) = 1
x1 = 0 and x2=1 then we have g(-1 + x1 + x2) = g(-1 + 0 + 1) = g(0) = 1
x1 = 1 and x2=1 then we have g(-1 + x1 + x2) = g(-1 + 1 + 1) = g(1) = 1
Exactly like the OR function.
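A quick way to verify the table above in Python (the step function g returns 1 for inputs of at least 0, matching the calculations):
def g(x):
    return 1 if x >= 0 else 0  # step function: g(-1) = 0, g(0) = 1

def h(x1, x2):
    return g(-1 + 1*x1 + 1*x2)  # w0 = -1, w1 = w2 = 1

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, '=>', h(x1, x2))  # prints the OR truth table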
Step 3: Neural Network in the General Case and how to Calculate Weights
In general a Neural Network can have any number of input and output nodes, where each input node is connected with each output node.
We will later learn about Deep Neural Networks – where we can have any number of layers – but for now, let’s focus only on Neural Networks with an input and an output layer.
Gradient Descent – an algorithm for minimizing the loss when training neural networks
Pseudo algorithm
Start with a random choice of weights
Repeat:
Calculate the gradient based on all data points: the direction that will lead to decreasing loss
Update weights according to the gradient
Trade-off
Expensive to calculate for all data points
Stochastic Gradient Descent
Pseudo algorithm
Start with a random choice of weights
Repeat:
Calculate the gradient based on one data point: the direction that will lead to decreasing loss
Update weights according to the gradient
Mini-Batch Gradient Descent
Pseudo algorithm
Start with a random choice of weights
Repeat:
Calculate the gradient based on one small batch of data points: the direction that will lead to decreasing loss
Update weights according to the gradient
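A sketch of the Mini-Batch variant for a simple linear model with squared loss (the data, batch size, and learning rate are illustrative); Batch Gradient Descent would use all rows per step, and Stochastic Gradient Descent a single row:
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([2.0, -3.0]) + rng.normal(scale=0.1, size=100)

w = rng.normal(size=2)  # start with a random choice of weights
alpha = 0.1             # learning rate

for _ in range(200):  # repeat
    batch = rng.choice(len(X), size=10, replace=False)      # one small batch of data points
    grad = 2 * X[batch].T @ (X[batch] @ w - y[batch]) / 10  # gradient of the squared loss
    w -= alpha * grad                                       # update weights according to the gradient

print(w)  # typically close to [2, -3]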
Step 4: Perceptron
The perceptron is an algorithm for supervised learning of binary classifiers. A binary classifier is a function which can decide whether or not an input, represented by a vector of numbers, belongs to some specific class.
It is only capable of learning a linearly separable decision boundary.
It cannot model the XOR function – for that we need multi-layer perceptrons (multi-layer neural networks).
It can take multiple inputs and map linearly to one output with an activation function.
Let’s try an example to show it.
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Sequential
import matplotlib.pyplot as plt
data = np.random.randn(200, 3)  # 200 random points with 3 columns
data[:100, :2] += (10, 10)      # shift the first 100 points to separate the classes
data[:100, 2] = 0               # class 0
data[100:, 2] = 1               # class 1
fig, ax = plt.subplots()
ax.scatter(x=data[:,0], y=data[:,1], c=data[:,2])
plt.show()
With this data it should be simple to validate whether we can create a Neural Network model that separates the two classes.
Step 5: Creating a Neural Network
First let’s create a train and test set.
X = data[:,:2]
y = data[:,2]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
Then we need to create the model and set batch size and epochs.
Batch size: a set of N samples.
Epoch: an arbitrary cutoff, generally defined as “one pass over the entire dataset”.
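The model code itself is not included in this excerpt; a minimal sketch of what it could look like for this two-feature data (the layer setup, optimizer, batch size, and epochs below are illustrative assumptions):
model = Sequential()
model.add(Input(shape=(2,)))               # two input features
model.add(Dense(1, activation='sigmoid'))  # single output node for the two classes
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=32, epochs=10)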
In the video we also show how to visualize the prediction in a different way.
Supervised: where you tell the algorithm what categories each data item is in. Each data item from the training set is tagged with the right answer.
Unsupervised: the learning algorithm is not told what to do with the data and must find the structure itself.
Reinforcement: teaches the machine to think for itself based on past action rewards.
Where we see that Unsupervised is one of the main groups.
Unsupervised learning is a type of algorithm that learns patterns from untagged data. The hope is that through mimicry, which is an important mode of learning in people, the machine is forced to build a compact internal representation of its world and then generate imaginative content from it. In contrast to supervised learning where data is tagged by an expert, e.g. as a “ball” or “fish”, unsupervised methods exhibit self-organization that captures patterns as probability densities…
After the 1st iteration the cluster centers are not optimal; after 10 iterations they are all in place.
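The iterating cluster centers described above match k-means clustering; a minimal sketch with sklearn could look as follows (the data here is illustrative):
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
data = np.vstack([rng.normal(loc=(0, 0), size=(100, 2)),   # two illustrative blobs
                  rng.normal(loc=(5, 5), size=(100, 2))])

model = KMeans(n_clusters=2, n_init=10)
model.fit(data)

fig, ax = plt.subplots()
ax.scatter(x=data[:, 0], y=data[:, 1], c=model.labels_)  # cluster assignments
ax.scatter(x=model.cluster_centers_[:, 0], y=model.cluster_centers_[:, 1], c='r')  # final centers
plt.show()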
Reinforcement Learning is like training a dog. You and the dog speak different languages, which makes it difficult to explain to the dog what you want.
A common way to train a dog works like Reinforcement Learning: when the dog does something good, it gets a reward. This teaches the dog to repeat the behaviour you want.
Said differently, relating it to the illustration above: the Agent is the dog. The dog is exposed to an Environment in a given state. Based on this, the Agent (the dog) takes an Action. Based on whether you (the owner) like the Action, you Reward the Agent.
The goal of the Agent is to get the most Reward. This makes it possible for you, the owner, to get the desired behaviour by adjusting the Reward according to the Actions.
Step 2: Markov Decision Process
The model for decision-making represents States (from the Environment), Actions (from the Agent), and the Rewards.
Written a bit more mathematically:
S is the set of States
Actions(s) is the set of Actions when in state s
The transition model is P(s' | s, a)
The Reward function R(s, a, s’)
Step 3: Q-Learning
Q-learning is a model-free reinforcement learning algorithm to learn the value of an action in a particular state. It does not require a model of the environment (hence “model-free”), and it can handle problems with stochastic transitions and rewards without requiring adaptations. (wiki)
This can be modeled by a learning function Q(s, a), which estimates the value of performing action a when in state s.
import numpy as np
import random

class Field:
    def __init__(self):
        # -1: bad terminal state, 1: goal state, 0: neutral
        self.states = [-1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
        self.state = random.randrange(0, len(self.states))

    def done(self):
        if self.states[self.state] != 0:
            return True
        else:
            return False

    # action: 0 => left
    # action: 1 => right
    def get_possible_actions(self):
        actions = [0, 1]
        if self.state == 0:
            actions.remove(0)
        if self.state == len(self.states) - 1:
            actions.remove(1)
        return actions

    def update_next_state(self, action):
        if action == 0:
            if self.state == 0:
                return self.state, -10
            self.state -= 1
        if action == 1:
            if self.state == len(self.states) - 1:
                return self.state, -10
            self.state += 1
        reward = self.states[self.state]
        return self.state, reward

field = Field()
q_table = np.zeros((len(field.states), 2))

alpha = .5    # learning rate
epsilon = .5  # probability of exploring
gamma = .5    # discount factor

for _ in range(10000):
    field = Field()
    while not field.done():
        actions = field.get_possible_actions()
        if random.uniform(0, 1) < epsilon:
            action = random.choice(actions)           # explore
        else:
            action = np.argmax(q_table[field.state])  # exploit
        cur_state = field.state
        next_state, reward = field.update_next_state(action)
        q_table[cur_state, action] = (1 - alpha)*q_table[cur_state, action] + alpha*(reward + gamma*np.max(q_table[next_state]))
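After training, the greedy policy can be read directly from the table:
print(np.argmax(q_table, axis=1))  # preferred action per state (0 => left, 1 => right)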
Step 4: A More Complex Example
Check out the video to see a more complex example.
Get insight into how similar a linear classifier is to a discrete classifier
Hands-on experience with multiple linear regression
Step 1: What is Multiple Linear Regression?
Multiple Linear Regression is a Supervised learning task of learning a mapping from an input point to a continuous value.
Wow. What does that mean?
That might not say it all, but it is simply Linear Regression with multiple explanatory variables.
Let’s start simple – Simple Linear Regression is the case most tutorials show first. It has one input variable (explanatory variable) and one output value (response value).
An example could be: if the temperature is X degrees, we expect to sell Y ice creams. That is, we try to predict how many ice creams we sell if we are given a temperature.
Now we know that there are factors other than the temperature that might have a high impact on ice cream sales. Say, whether it is rainy or sunny, or what time of year it is – it might be tourist season or not.
Hence, a simple model like that might not give a very accurate estimate.
Hence, we would like to model having more input variables (explanatory variables). When we have more than one it is called Multiple Linear Regression.
Step 2: Get Example Data
Let’s take a look at some house price data.
import pandas as pd
data = pd.read_csv('https://raw.githubusercontent.com/LearnPythonWithRune/MachineLearningWithPython/main/files/house_prices.csv')
print(data.head())
Notice – you can also download the file from GitHub and keep it locally. This will make it faster to run every time.
The output should give the following data.
The goal is, given a row of data, to predict the House Unit Price. That is, given all but the last column in a row, can we predict the House Unit Price (the last column)?
Step 3: Plot the data
Just for fun – let’s make a scatter plot of all the houses with Latitude and Longitude.
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.scatter(x=data['Longitude'], y=data['Latitude'])
plt.show()
This gives the following plot.
This shows you where the houses are located, which can be interesting because house prices can be dependent on location.
It should be intuitive that the longitude and latitude are not linearly correlated with the house price – at least not in the bigger picture.
Step 4: Correlation of the features
Before we make the Multiple Linear Regression, let’s see how the features (the columns) correlate.
data.corr()
Which gives.
This is interesting. Look at the lowest row for the correlations with House Unit Price. It shows that Distance to MRT station is negatively correlated – that is, the longer the distance to an MRT station, the lower the price. This might not be surprising.
More surprising is that Latitude and Longitude are actually comparatively highly correlated with the House Unit Price.
This might be the case for this particular dataset.
Step 5: Check the Quality of the dataset
For the Linear Regression model to perform well, you need to check that the data quality is good. If the input data is of poor quality (missing data, outliers, wrong values, duplicates, etc.) then the model will not be very reliable.
Here we will only check for missing values.
data.isnull().sum()
Which gives.
Transaction 0
House age 0
Distance to MRT station 0
Number of convenience stores 0
Latitude 0
Longitude 0
House unit price 0
dtype: int64
This tells us that there are no missing values.
If you want to learn more about Data Quality, then check out the free course on Data Science. In that course you will learn more about Data Quality and how it impacts the accuracy of your model.
Step 6: Create a Multiple Linear Regression Model
First we need to divide them into input variables X (explanatory variables) and output values y (response values).
Then we split it into a training and testing dataset. We create the model, fit it, use it to predict the test dataset, and get a score.
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
X = data.iloc[:,:-1]
y = data.iloc[:,-1]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, test_size=.15)
lin = LinearRegression()
lin.fit(X_train, y_train)
y_pred = lin.predict(X_test)
print(r2_score(y_test, y_pred))
For this run it gave 0.68.
Is that good or bad? Well, good question. The perfect match is 1, but that should not be expected. The worst score you can get is minus infinity – so we are far from that.
In order to get an idea about it – we need to compare it with variations.
In the free Data Science course we explore how to select features and evaluate models. It is a great idea to look into that.
Use the Support Vector Machine (SVM) model on data.
Explore the result of SVM on classification data.
Step 1: What is a Maximum Margin Separator?
Boundary that maximizes the distances between any of the data points (Wiki)
The problem can be illustrated as follows.
Looking at the image to the left we separate all the red dots from the blue dots. This separation is perfect. But we know that this line might not be ideal if more dots are coming. Imagine another blue dot is added (right image).
Could we have chosen a better line of separation?
As you see above, there is a better line to choose from the start – the one that is farthest from all the points.
Step 2: What is Support Vector Machine (SVM)?
The Support Vector Machine solves the separation problem stated above.
But basically, it is all about classifying data. That is, given a collection of data and a set of categories for this data, the model helps classify data into the correct categories.
For example, with facial expressions you might have the categories happy, sad, surprised, and angry. Then, given an image of a face, the model can place it into one of the categories.
How does it do it?
Well, you need training data with correct labels.
In this tutorial we will make a gentle introduction to classification based on simple data.
Step 3: Gender classification based on height and hair length
Let’s consider a list of measured heights and hair lengths with the given gender.
import pandas as pd
url = 'https://raw.githubusercontent.com/LearnPythonWithRune/MachineLearningWithPython/main/files/gender.csv'
data = pd.read_csv(url)
print(data.head())
Resulting in this.
Height Hair length Gender
0 151 99 F
1 193 8 M
2 150 123 F
3 176 0 M
4 188 11 M
Step 4: Visualize the data
You can visualize the result as follows.
import pandas as pd
import matplotlib.pyplot as plt
url = 'https://raw.githubusercontent.com/LearnPythonWithRune/MachineLearningWithPython/main/files/gender.csv'
data = pd.read_csv(url)
data['Class'] = data['Gender'].apply(lambda x: 'r' if x == 'F' else 'b')
data = data.iloc[:25]
fig, ax = plt.subplots()
ax.scatter(x=data['Height'], y=data['Hair length'], c=data['Class'])
plt.show()
Where we only keep the first 25 points to simplify the plot.
Step 5: Creating an SVC model
We will use Sklearn’s SVC (Support Vector Classification, docs) model to fit the data.
import pandas as pd
import numpy as np
from sklearn import svm
url = 'https://raw.githubusercontent.com/LearnPythonWithRune/MachineLearningWithPython/main/files/gender.csv'
data = pd.read_csv(url)
data['Class'] = data['Gender'].apply(lambda x: 'r' if x == 'F' else 'b')
X = data[['Height', 'Hair length']]
y = data['Gender']
y = np.array([0 if gender == 'M' else 1 for gender in y])
clf = svm.SVC(kernel='linear')
clf.fit(X, y)
Step 6: Visualize the model
We create a “box” to color the model prediction.
import pandas as pd
import numpy as np
from sklearn import svm
import matplotlib.pyplot as plt
url = 'https://raw.githubusercontent.com/LearnPythonWithRune/MachineLearningWithPython/main/files/gender.csv'
data = pd.read_csv(url)
data['Class'] = data['Gender'].apply(lambda x: 'r' if x == 'F' else 'b')
X = data[['Height', 'Hair length']]
y = data['Gender']
y = np.array([0 if gender == 'M' else 1 for gender in y])
clf = svm.SVC(kernel='linear')
clf.fit(X, y)
X_test = np.random.rand(10000, 2)
X_test = X_test*(70, 140) + (140, 0)  # scale to heights in [140, 210) and hair lengths in [0, 140)
y_pred = clf.predict(X_test)
fig, ax = plt.subplots()
ax.scatter(x=X_test[:,0], y=X_test[:,1], c=y_pred, alpha=.25)
y_color = ['r' if value == 0 else 'b' for value in y]
ax.scatter(x=X['Height'], y=X['Hair length'], c=y_color)
plt.show()
Resulting in.
Said differently, if you have some items you need to classify, it could be books you want to put in categories, say fiction, non-fiction, etc.
Then, if you were given a pile of books with the right categories assigned to them, how can you make a function (the machine learning model) that can guess the right category for other books without labels?
Supervised learning simply means that, in the learning phase, the algorithm (the one creating the model) is given examples with correct labels.
Notice that supervised learning is not restricted to classification problems – it can predict anything.
If you are new to Machine Learning, I advise you start with this tutorial.
The classification problem is a supervised learning task of learning a function mapping an input point to a discrete category.
There is binary classification and multiclass classification, where the binary maps into two classes, and the multiclass maps into 3 or more classes.
I find it easiest to understand with examples.
Assume we want to predict if it will rain or not rain tomorrow. This is a binary classification problem, because we map into two classes: rain or no rain.
To train the model we need already labelled historic data.
Hence, the task is: given rows of historic data with correct labels, train a machine learning model (a Linear Classifier in this case) with this data. Then, after that, see how well it can predict future data (without the right class label).
Step 3: Linear Classification explained mathematically and visually
Some like the math behind an algorithm. If you are not one of them, focus on the visual part – it will give you the understanding you need.
Mathematically, the task of Supervised Learning with the example data above is to find a function f(humidity, pressure) that predicts rain or no rain.
Examples
f(93, 999.7) = rain
f(49, 1015.5) = no rain
f(79, 1031.1) = no rain
The goal of Supervised Learning is to approximate the function f – the approximation function is often denoted h.
Why not identify f precisely? Well, because it is not ideal: this would be an overfitted function that would predict the historic data 100% accurately but would fail to predict future values well.
As we work with Linear Classifiers, we want the function to be linear.
That is, we want the approximation function h to be of the form:
x_1: Humidity
x_2: Pressure
h(x_1, x_2) = w_0 + w_1*x_1 + w_2*x_2
Hence, the goal is to optimize values w_0, w_1, w_2, to find the best classifier.
What does all this math mean?
Well, it means we have a linear classifier that makes decisions based on the value of a linear combination of the characteristics.
The above diagram shows how it would classify with a line whether it will predict rain or not. On the left side is the classified historic data, and the line shows the optimized line found by the machine learning algorithm.
On the right side, we have new input data (without a label); with this line, it would be classified as rain (assuming blue means rain).
Step 4: What is the Perceptron Classifier?
The Perceptron Classifier is a linear algorithm that can be applied to binary classification.
It learns iteratively by adding new knowledge to an already existing line.
The learning rate is given by alpha, and the learning rule is as follows (don’t worry if you don’t understand it – it is not important).
Given data point x and y update each weight according to this.
w_i = w_i + alpha*(y - h_w(x))*x_i
The rule can also be stated as follows.
w_i = w_i + alpha*(actual value - estimated value)*x_i
Said in words, it adjusts the weights according to the actual values: every time a new value comes, it adjusts the weights to fit better.
Given the line after it has been adjusted to all the training data – then it is ready to predict.
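A minimal NumPy sketch of this learning rule (the data reuses the example values above; the learning rate and number of passes are illustrative):
import numpy as np

def step(z):
    return 1 if z >= 0 else 0

X = np.array([[1, 93, 999.7], [1, 49, 1015.5], [1, 79, 1031.1]])  # leading 1 so w_0 acts as the bias
y = np.array([1, 0, 0])  # 1 = rain, 0 = no rain

w = np.zeros(3)
alpha = 0.01
for _ in range(100):
    for x_i, y_i in zip(X, y):
        h = step(w @ x_i)
        w += alpha * (y_i - h) * x_i  # w_i = w_i + alpha*(y - h_w(x))*x_i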
Let’s try this on real data.
Step 5: Get the Weather data we will use to train a Perceptron model with
You can get all the code in a Jupyter Notebook with the csv file here.
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.linear_model import Perceptron
import matplotlib.pyplot as plt
Notice that in the Notebook we have an added line %matplotlib inline, which you should add if you run in a Notebook. The code here will be aligned with PyCharm or a similar IDE.
Then let’s read the data.
data = pd.read_csv('files/weather.csv', parse_dates=True, index_col=0)
print(data.head())
If you want to read the data directly from GitHub and not download the weather.csv file, you can do that as follows.
data = pd.read_csv('https://raw.githubusercontent.com/LearnPythonWithRune/MachineLearningWithPython/main/files/weather.csv', parse_dates=True, index_col=0)
print(data.head())
Printing data.isnull().sum() (see the full listing in Step 8) shows how many rows in each column have null values (missing values). We want to work with only two features (columns) to keep our classification simple. Obviously, we need to keep RainTomorrow, as it carries the class label.
We select the features we want and drop the rows with null-values as follows.
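This snippet matches the full listing in Step 8:
dataset = data[['Humidity3pm', 'Pressure3pm', 'RainTomorrow']].dropna()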
The next step is to split the dataset into features and labels.
But we also want to rename the labels from No and Yes to be numeric.
X = dataset[['Humidity3pm', 'Pressure3pm']]
y = dataset['RainTomorrow']
y = np.array([0 if value == 'No' else 1 for value in y])
Then we do the splitting as follows, where we put a random_state in order to be able to reproduce the split. This is often a great idea: if you use randomness and encounter a problem, you can reproduce it.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
This has divided the features into a train and test set (X_train, X_test), and the labels into a train and test (y_train, y_test) dataset.
Step 8: Train the Perceptron model and measure accuracy
Finally we want to create the model, fit it (train it), predict on the test data, and print the accuracy score.
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.linear_model import Perceptron
import matplotlib.pyplot as plt
data = pd.read_csv('https://raw.githubusercontent.com/LearnPythonWithRune/MachineLearningWithPython/main/files/weather.csv', parse_dates=True, index_col=0)
print(data.head())
print(data.isnull().sum())
dataset = data[['Humidity3pm', 'Pressure3pm', 'RainTomorrow']].dropna()
X = dataset[['Humidity3pm', 'Pressure3pm']]
y = dataset['RainTomorrow']
y = np.array([0 if value == 'No' else 1 for value in y])
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = Perceptron(random_state=0)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print(accuracy_score(y_test, y_pred))
print(sum(y == 0)/len(y))  # baseline: the fraction of 'No rain' days
fig, ax = plt.subplots()
X_data = X.to_numpy()
y_all = clf.predict(X_data)
ax.scatter(x=X_data[:,0], y=X_data[:,1], c=y_all, alpha=.25)
plt.show()
fig, ax = plt.subplots()
ax.scatter(x=X_data[:,0], y=X_data[:,1], c=y, alpha=.25)
plt.show()
Finally, we will explore a Supervised Machine Learning model called k-Nearest-Neighbors (KNN) classifier to get an understanding through practical application.
Goal of Lesson
Understand the difference between Classical Computing and Machine Learning
Know the 3 main categories of Machine Learning
Dive into Supervised Learning
Classification with 𝑘-Nearest-Neighbors Classifier (KNN)
How to classify data
What are the challenges with cleaning data
Create a project on real data with 𝑘-Nearest-Neighbor Classifier
Step 1: What is Machine Learning?
Classical Computing vs Machine Learning
In the classical computing model everything is programmed into the algorithms.
This has the limitation that all decision logic needs to be understood before usage.
And if things change, we need to modify the program.
With the modern computing model (Machine Learning) this paradigm changes.
We feed the algorithms (models) with data.
Based on that data, the algorithms (models) make decisions in the program.
Imagine you needed to teach your child how to ride a bicycle.
In the classical computing sense, you would instruct your child how to use a specific muscle in all cases. That is, if you lose balance to the right, then activate the third muscle in your right leg. You need instructions for all muscles in all situations.
That is a lot of instructions and chances are, you forget specific situations.
Machine Learning feeds the child data: it will fall, it will fail – but eventually it will figure it out itself, without instructions on how to use the specific muscles in the body.
Well, that is actually how most of us learn to ride a bike.
Step 2: How Machine Learning Works
On a high level, Machine Learning is divided into two phases.
Learning phase: Where the algorithm (model) learns in a training environment. Like, when you support your child learning to ride the bike, like catching the child while falling not to hit too hard.
Prediction phase: Where the algorithm (model) is applied on real data. This is when the child can bike on its own.
The Learning Phase is often divided into a few steps.
Phase 1: Learning
Get Data: Identify relevant data for the problem you want to solve. This data set should represent the type of data that the Machine Learning model will predict from in Phase 2 (prediction).
Pre-processing: This step is about cleaning up data. While Machine Learning is awesome, it cannot figure out what good data looks like. You need to do the cleaning as well as transform the data into a desired format.
Train model: This is where the magic happens, the learning step (Train model). There are three main paradigms in machine learning.
Supervised: where you tell the algorithm what categories each data item is in. Each data item from the training set is tagged with the right answer.
Unsupervised: the learning algorithm is not told what to do with the data and must find the structure itself.
Reinforcement: teaches the machine to think for itself based on past action rewards.
Test model: Finally, the testing is done to see if the model is good. The training data was divided into a test set and training set. The test set is used to see if the model can predict from it. If not, a new model might be necessary.
The Prediction Phase can be illustrated as follows.
Phase 2: Prediction
Step 3: Supervised Learning explained with Example
Supervised learning can be explained as follows.
Given a dataset of input-output pairs, learn a function to map inputs to outputs.
There are different tasks – but we start by focusing on Classification, where supervised classification is the task of learning a function mapping an input point to a discrete category.
Now the best way to understand new things is to relate it to something we already understand.
Consider the following data.
Given the Humidity and Pressure for a given day, can we predict if it will rain or not?
How will a Supervised Classification algorithm work?
Learning Phase: Given a set of historical data to train the model – like the data above, given rows of Humidity and Pressure and the label Rain or No Rain. Let the algorithm work with the data and figure it out.
Note: we leave out pre-processing and testing the model here.
Prediction Phase: Let the algorithm get new data – like in the morning you read the Humidity and Pressure and let the algorithm predict whether it will rain or not on that given day.
Written mathematically, it is the task to find a function 𝑓 as follows.
Ideally: 𝑓(ℎ𝑢𝑚𝑖𝑑𝑖𝑡𝑦,𝑝𝑟𝑒𝑠𝑠𝑢𝑟𝑒)
Examples:
𝑓(93,999.7) = Rain
𝑓(49,1015.5) = No Rain
𝑓(79,1031.1) = No Rain
Goal: Approximate the function 𝑓 – the approximation function is often denoted ℎ
Step 4: Visualize the data we want to fit
We will use pandas to work with data, which is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.
The data we want to work with can be downloaded from here and stored locally. Or you can access it directly as follows.
import pandas as pd
file_dest = 'https://raw.githubusercontent.com/LearnPythonWithRune/MachineLearningWithPython/main/files/weather.csv'
data = pd.read_csv(file_dest, parse_dates=True, index_col=0)
First let’s visualize the data we want to work with.
The goal is to make a model which can predict the Blue or Red dots.
Step 5: The k-Nearest-Neighbors Classifier
Nearest-Neighbor Classification: given an input, choose the class of the nearest data point.
𝑘-Nearest-Neighbors Classification
Given an input, choose the most common class out of the 𝑘 nearest data points
Let’s try to implement a model. We will use sklearn for that.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
dataset = data[['Humidity3pm', 'Pressure3pm', 'RainTomorrow']]  # 'data' was loaded in Step 4
dataset_clean = dataset.dropna()
X = dataset_clean[['Humidity3pm', 'Pressure3pm']]
y = dataset_clean['RainTomorrow']
y = np.array([0 if value == 'No' else 1 for value in y])
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)  # split before fitting
neigh = KNeighborsClassifier()
neigh.fit(X_train, y_train)
y_pred = neigh.predict(X_test)
accuracy_score(y_test, y_pred)
This actually covers what you need. Make sure to have the data (the DataFrame loaded in Step 4) available here.
Check out this video explaining all steps in more depth. Also, it includes a guideline for making your first project with Machine Learning along with a solution for it.
We will learn what Reinforcement Learning is and how it works. Then, by using Object-Oriented Programming techniques (more about Object-Oriented Programming), we implement a Reinforcement Learning model to solve the problem of figuring out where to pick up and drop off an item on a field.
Reinforcement Learning teaches the machine to think for itself based on past action rewards.
Basically, the Reinforcement Learning algorithm tries to predict actions that give rewards and avoid punishment.
It is like training a dog. You and the dog do not talk the same language, but the dog learns how to act based on rewards (and punishment, which I do not advise or advocate).
Hence, if a dog is rewarded for a certain action in a given situation, then next time it is exposed to a similar situation it will act the same.
Translate that to Reinforcement Learning.
The agent is the dog that is exposed to the environment.
Then the agent encounters a state.
The agent performs an action to transition to a new state.
Then after the transition the agent receives a reward or penalty (punishment).
This forms a policy to create a strategy to choose actions in a given state.
What algorithms are used for Reinforcement Learning?
The most common algorithms for Reinforcement Learning are:
Q-Learning: is a model-free reinforcement learning algorithm to learn a policy telling an agent what action to take under what circumstances.
Temporal Difference: refers to a class of model-free reinforcement learning methods which learn by bootstrapping from the current estimate of the value function.
Deep Adversarial Network: is a technique employed in the field of machine learning which attempts to fool models through malicious input.
We will focus on the Q-learning algorithm as it is easy to understand as well as powerful.
How does the Q-learning algorithm work?
As already noted, I just love this algorithm. It is “easy” to understand and powerful as you will see.
The Q-Learning algorithm has a Q-table (a Matrix of dimension state x actions – don’t worry if you do not understand what a Matrix is, you will not need the mathematical aspects of it – it is just an indexed “container” with numbers).
The agent (or Q-Learning algorithm) will be in a state.
Then in each iteration the agent needs to take an action.
The agent will continuously update the reward in the Q-table.
The learning can come from either exploiting or exploring.
This translates into the following pseudo algorithm for the Q-Learning.
The agent is in a given state and needs to choose an action.
Algorithm
Initialise the Q-table to all zeros
Iterate
Agent is in state state.
With probability epsilon choose to explore, else exploit.
If explore, then choose a random action.
If exploit, then choose the best action based on the current Q-table.
Update the Q-table from the new reward to the previous state.
As you can see, we have introduced the following variables:
epsilon: the probability to take a random action, which is done to explore new territory.
alpha: the learning rate, i.e. how big a step the algorithm takes in each iteration; it should be in the interval from 0 to 1.
gamma: is the discount factor used to balance the immediate and future reward. This value is usually between 0.8 and 0.99
reward: is the feedback on the action and can be any number. Negative is penalty (or punishment) and positive is a reward.
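These variables come together in the Q-learning update rule, which is the same rule implemented by the Q-learning code earlier in this article:
Q(state, action) = (1 - alpha)*Q(state, action) + alpha*(reward + gamma*max(Q(next_state)))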
Step 2: The problem we want to solve
Here we have a description of the task we want to solve.
To keep it simple, we create a field of size 10×10 positions. In that field there is an item that needs to be picked up and moved to a drop-off point.
At each position there are 6 different actions that can be taken.
Action 0: Go South if on field.
Action 1: Go North if on field.
Action 2: Go East if on field (Please notice, I mixed up East and West (East is Left here)).
Action 3: Go West if on field (Please notice, I mixed up East and West (West is right here)).
Action 4: Pickup item (it can try even if it is not there)
Action 5: Drop-off item (it can try even if it does not have it)
Based on these actions we will make a reward system.
If the agent tries to go off the field, punish with -10 in reward.
If the agent makes a (legal) move, punish with -1 in reward, as we do not want to encourage endless walking around.
If the agent tries to pick up item, but it is not there or it has it already, punish with -10 in reward.
If the agent picks up the item at the correct place, reward with 20.
If the agent tries to drop off the item in the wrong place or does not have the item, punish with -10 in reward.
If the agent drops off the item in the correct place, reward with 20.
That translates into the following code. I prefer to implement this code, as I think the standard libraries that provide similar frameworks hide some important details. As an example, and shown later, how do you map this into a state in the Q-table?
Step 3: Implementing the field
First we need a way to represent the field, representing the environment our model lives in. This is defined in Step 2 and could be implemented as follows.
class Field:
    def __init__(self, size, item_pickup, item_drop_off, start_position):
        self.size = size
        self.item_pickup = item_pickup
        self.item_drop_off = item_drop_off
        self.position = start_position
        self.item_in_car = False

    def get_number_of_states(self):
        return self.size*self.size*self.size*self.size*2

    def get_state(self):
        state = self.position[0]*self.size*self.size*self.size*2
        state = state + self.position[1]*self.size*self.size*2
        state = state + self.item_pickup[0]*self.size*2
        state = state + self.item_pickup[1]*2
        if self.item_in_car:
            state = state + 1
        return state

    def make_action(self, action):
        (x, y) = self.position
        if action == 0:  # Go South
            if y == self.size - 1:
                return -10, False
            else:
                self.position = (x, y + 1)
                return -1, False
        elif action == 1:  # Go North
            if y == 0:
                return -10, False
            else:
                self.position = (x, y - 1)
                return -1, False
        elif action == 2:  # Go East
            if x == 0:
                return -10, False
            else:
                self.position = (x - 1, y)
                return -1, False
        elif action == 3:  # Go West
            if x == self.size - 1:
                return -10, False
            else:
                self.position = (x + 1, y)
                return -1, False
        elif action == 4:  # Pickup item
            if self.item_in_car:
                return -10, False
            elif self.item_pickup != (x, y):
                return -10, False
            else:
                self.item_in_car = True
                return 20, False
        elif action == 5:  # Drop off item
            if not self.item_in_car:
                return -10, False
            elif self.item_drop_off != (x, y):
                self.item_pickup = (x, y)
                self.item_in_car = False
                return -10, False
            else:
                return 20, True
Step 4: A Naive approach to solve it (NON-Machine Learning)
A naive approach would be to just take random actions and hope for the best. This is obviously not optimal, but it is nice to have as a baseline to compare with.
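The naive_solution function itself is not shown in this excerpt; a minimal sketch consistent with the description (take random actions until done and count the steps; the field size and item positions are illustrative assumptions) could be:
import random

def naive_solution():
    field = Field(size=10, item_pickup=(9, 0), item_drop_off=(0, 9), start_position=(0, 0))
    done = False
    steps = 0
    while not done:
        action = random.randrange(0, 6)  # pick one of the 6 actions at random
        reward, done = field.make_action(action)
        steps += 1
    return steps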
To make an estimate on how many steps it takes you can run this code.
runs = [naive_solution() for _ in range(100)]
print(sum(runs)/len(runs))
Where we use List Comprehension (learn more about list comprehension). This gave 143579.21. Notice, you most likely will get something different, as there is a high level of randomness involved.
Step 5: Implementing our Reinforcement Learning Model
Here we give the algorithm for what we need to implement.
Algorithm
Initialise the Q-table to all zeros
Iterate
Agent is in state state.
With probability epsilon choose to explore, else exploit.
If explore, then choose a randomaction.
If exploit, then choose the bestaction based on the current Q-table.
Update the Q-table from the new reward to the previous state.
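A sketch of that training loop for the Field class above (the hyperparameters and item positions are illustrative assumptions):
import numpy as np
import random

alpha = 0.1    # learning rate
gamma = 0.6    # discount factor
epsilon = 0.1  # probability of exploring

field = Field(size=10, item_pickup=(9, 0), item_drop_off=(0, 9), start_position=(0, 0))
q_table = np.zeros((field.get_number_of_states(), 6))  # initialise the Q-table to all zeros

for _ in range(10000):  # iterate
    field = Field(size=10, item_pickup=(9, 0), item_drop_off=(0, 9), start_position=(0, 0))
    done = False
    while not done:
        state = field.get_state()
        if random.uniform(0, 1) < epsilon:
            action = random.randrange(0, 6)     # explore: choose a random action
        else:
            action = np.argmax(q_table[state])  # exploit: choose the best action from the Q-table
        reward, done = field.make_action(action)
        next_state = field.get_state()
        # update the Q-table from the new reward to the previous state
        q_table[state, action] = (1 - alpha)*q_table[state, action] + alpha*(reward + gamma*np.max(q_table[next_state]))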
In this tutorial you will learn some basic NumPy. The best way to learn something new is to combine it with something useful. Therefore you will use the NumPy while creating your first Machine Learning project.
Step 1: What is NumPy?
NumPy is the fundamental package for scientific computing in Python.
Well, that is how it is stated on the official NumPy page.
Maybe a better question is, what do you use NumPy for and why?
Well, the main tool you use from NumPy is the NumPy array. Arrays are quite similar to Python lists, just with a few restrictions.
It can only contain one data type. That is, if a NumPy array has integers, then all entries can only be integers.
The size cannot change (immutable). That is, you cannot add or remove entries, like in a Python list.
If it is a multi-dimensional array, all sub-arrays must be of the same shape. That is, you cannot have something similar to a Python list of lists, where the first sub-list is of length 3, the second of length 7, and so on. They must all have the same length (or shape).
Why would anyone use them, you might ask? They are more restrictive than Python lists.
Actually, and funnily enough, making the data structure more restrictive, as with NumPy arrays, can make it more efficient (faster).
Why?
Well, think about it. You know more about the data structure, and hence, do not need to make many additional checks.
Step 2: A little NumPy array basics we will use for our Machine Learning project
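The creation of the array a1 is not included in this excerpt; based on the dtype and shape reported below, it could have been created like this (the contents are an assumption):
import numpy as np

a1 = np.array([1, 2, 3, 4])  # assumed contents; dtype int64 and shape (4,) match the output below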
The data type of a NumPy array can be given as follows.
print(a1.dtype)
It will print dtype(‘int64’). That is, the full array has only one type, int64, which are 64 bit integers. That is also different from Python integers, where you actually cannot specify the size of the integers. Here you can have int8, int16, int32, int64, and more. Again restrictions, which makes it more efficient.
print(a1.shape)
The above gives the shape, here, (4,). Notice, that this shape cannot be changed, because the data structure is immutable.
Let’s create another NumPy array and try a few things.
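The code is not included in this excerpt; a sketch matching the description below (the contents of a2 are an assumption):
a2 = np.array([5, 6, 7, 8])  # assumed contents
print(a1*2)     # multiplies each entry by 2
print(a1*a2)    # multiplies the entries pairwise
print(a1 + a2)  # adds the entries pairwise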
With a little inspection you will realize that the first (a1*2) multiplies with 2 in each entry. The second (a1*a2) multiplies the entries pairwise. The third (a1 + a2) adds the entries pairwise.
Step 3: What is Machine Learning?
In the classical computing model everything is programmed into the algorithms. This has the limitation that all decision logic needs to be understood before usage. And if things change, we need to modify the program.
With the modern computing model (Machine Learning) this paradigm changes. We feed the algorithms with data, and based on that data, the decisions are made in the program.
How Machine Learning Works
On a high level you can divide Machine Learning into two phases.
Phase 1: Learning
Phase 2: Prediction
The learning phase (Phase 1) can be divided into substeps.
It all starts with a training set (training data). This data set should represent the type of data that the Machine Learning model should predict from in Phase 2 (prediction).
The pre-processing step is about cleaning up data. While Machine Learning is awesome, it cannot figure out what good data looks like. You need to do the cleaning as well as transform the data into a desired format.
Then for the magic, the learning step. There are three main paradigms in machine learning.
Supervised: where you tell the algorithm what categories each data item is in. Each data item from the training set is tagged with the right answer.
Unsupervised: the learning algorithm is not told what to do with the data and must find the structure itself.
Reinforcement: teaches the machine to think for itself based on past action rewards.
Finally, the testing is done to see if the model is good. The training data was divided into a test set and training set. The test set is used to see if the model can predict from it. If not, a new model might be necessary.
Then the prediction begins.
Step 4: A Linear Regression Model
Let’s try to use a Machine Learning model. One of the first models you will meet is the Linear Regression model.
Simply said, this model tries to fit data to a straight line. The best way to understand that is to see it visually with one explanatory variable. That is, given a value (explanatory variable), can you predict the scalar response (the value you want to predict)?
Say, given the temperature (explanatory variable), can you predict the sale of ice cream? Assuming there is a linear relationship, can you determine that? A guess is, the hotter it is, the more ice cream is sold. But whether a linear model is a good predictor is beyond the scope here.
Let’s try with some simple data.
But first we need to import a few libraries.
import numpy as np
from sklearn.linear_model import LinearRegression
Then we generate some simple data.
x = [i for i in range(10)]
y = [i for i in range(10)]
In this case it will be fully correlated, but it only serves to demonstrate. This part is equivalent to the Get data step.
Here x is the explanatory variable and y the scalar response we want to predict.
When you train the model, you give it input pairs of explanatory and scalar response. This is needed, as the model needs to learn.
After the learning you can predict data. But let’s prepare the data for the learning. This is the Pre-processing.
X = np.array(x).reshape((-1, 1))
Y = np.array(y).reshape((-1, 1))
Notice, this is a very simple step; we only need to convert the data into the correct format.
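The training (fit) step is not shown at this point in the excerpt, but it matches the full listing below:
lin_regressor = LinearRegression()
lin_regressor.fit(X, Y)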
Here we will skip the test model step, as the data is simple.
To predict data we can call the model.
Y_pred = lin_regressor.predict(X)
The full code put together:
import numpy as np
from sklearn.linear_model import LinearRegression
x = [i for i in range(10)]
y = [i for i in range(10)]
X = np.array(x).reshape((-1, 1))
Y = np.array(y).reshape((-1, 1))
lin_regressor = LinearRegression()
lin_regressor.fit(X, Y)
Y_pred = lin_regressor.predict(X)