Reinforcement Learning Explained with Real Problem and Code from Scratch

What will we cover?

  • Understand how Reinforcement Learning works
  • Learn about Agent and Environment
  • How it iterates and gets rewards based on action
  • How to continuously learn new things
  • Create own Reinforcement Learning from scratch

Step 1: Reinforcement Learning simply explained

Reinforcement Learning

Reinforcement Learning is like training a dog. You and the dog talk different languages. This makes it difficult to explain the dog what you want.

A common way to train a dog is like Reinforcement Learning. When the dog does something good, it get’s a reward. This teaches the dog that you want it to do it.

Said differently, if we relate it to the illustration above. The Agent is the dog. The dog is exposed to an Environment called a state. Based on this Agent (the dog) takes an Action. Based on whether you (the owner) likes the Action, you Reward the Agent.

The goal of the Agent is to get the most Reward. This way it makes it possible for you the owner to get the desired behaviour with adjusting the Reward according to the Actions.

Step 2: Markov Decision Process

The model for decision-making represents States (from the Environment), Actions (from the Agent), and the Rewards.

Written a bit mathematical.

  • S is the set of States
  • Actions(s) is the set of Actions when in state s
  • The transition model is P(s´, s, a)
  • The Reward function R(s, a, s’)

Step 3: Q-Learning

Q-learning is a model-free reinforcement learning algorithm to learn the value of an action in a particular state. It does not require a model of the environment (hence “model-free”), and it can handle problems with stochastic transitions and rewards without requiring adaptations. (wiki)

This can be modeled by a learning function Q(s, a), which estimates the value of performing action a when in state s.

It works as follows

  • Start with Q(s, a) = 0 for all s, a
  • Update Q when we take an action

𝑄(𝑠,𝑎)=𝑄(𝑠,𝑎)+𝛼(Q(s,a)=Q(s,a)+α(reward+𝛾max(𝑠′,𝑎′)−𝑄(𝑠,𝑎))=(1−𝛼)𝑄(𝑠,𝑎)+𝛼(+γmax(s′,a′)−Q(s,a))=(1−α)Q(s,a)+α(reward+𝛾max(𝑠′,𝑎′))+γmax(s′,a′))

The ϵ-Greedy Decision Making

The idea behind it is to either explore or exploit

  • With probability ϵ take a random move
  • Otherwise, take action 𝑎a with maximum 𝑄(𝑠,𝑎)

Let’s demonstrate it with code.

Step 3: Code Example

Assume we have the following Environment

Environment
  • You start at a random point.
  • You can either move left or right.
  • You loose if you hit a red box
  • You win if you hit the green box

Quite simple, but how can you program an Agent using Reinforcement Learning? And how can you do it from scratch.

The great way is to use an object representing the field (environment).

Field representing the Environment

To implement it all there are some background resources if needed.

Programming Notes:

What if there are more states?

import numpy as np
import random

class Field:
    def __init__(self):
        self.states = [-1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
        self.state = random.randrange(0, len(self.states))
        
    def done(self):
        if self.states[self.state] != 0:
            return True
        else:
            return False
        
    # action: 0 => left
    # action: 1 => right
    def get_possible_actions(self):
        actions = [0, 1]
        if self.state == 0:
            actions.remove(0)
        if self.state == len(self.states) - 1:
            actions.remove(1)
        return actions

    def update_next_state(self, action):
        if action == 0:
            if self.state == 0:
                return self.state, -10
            self.state -= 1
        if action == 1:
            if self.state == len(self.states) - 1:
                return self.state, -10
            self.state += 1
        
        reward = self.states[self.state]
        return self.state, reward

field = Field()
q_table = np.zeros((len(field.states), 2))

alpha = .5
epsilon = .5
gamma = .5

for _ in range(10000):
    field = Field()
    while not field.done():
        actions = field.get_possible_actions()
        if random.uniform(0, 1) < epsilon:
            action = random.choice(actions)
        else:
            action = np.argmax(q_table[field.state])
            
        cur_state = field.state
        next_state, reward = field.update_next_state(action)
        
        q_table[cur_state, action] = (1 - alpha)*q_table[cur_state, action] + alpha*(reward + gamma*np.max(q_table[next_state]))

Step 4: A more complex Example

Check out the video to see a More complex example.

Want to learn more?

This is part of a FREE 10h Machine Learning course with Python.

  • 15 video lessons – which explain Machine Learning concepts, demonstrate models on real data, introduce projects and show a solution (YouTube playlist).
  • 30 JuPyter Notebooks – with the full code and explanation from the lectures and projects (GitHub).
  • 15 projects – with step guides to help you structure your solutions and solution explained in the end of video lessons (GitHub).

How to use Multiple Linear Regression to Predict House Prices

What will we cover?

  • Learn about Multiple Linear Regression
  • Understand difference from discrete classifier
  • Understand it is Supervised learning task
  • Get insight into how similar a linear classifier is to discrete classifier
  • Hands-on experience with multiple linear regression

Step 1: What is Multiple Linear Regression?

Multiple Linear Regression is a Supervised learning task of learning a mapping from input point to a continuous value.

Wow. What does that mean?

This might not help all, but it is the case of a Linear Regression, where there are multiple explanatory variables.

Let’s start simple – Simple Linear Regression is the case most show first. It is given one input variable (explanatory variable) and one output value (response value).

An example could be – if the temperatur is X degrees, we expect to sell Y ice creams. That is, it is trying to predict how many ice creams we sell if we are given a temperature.

Now we know that there are other factors that might have high impact other that the temperature when selling ice cream. Say, is it rainy or sunny. What time of year it is, say, it might be turist season or not.

Hence, a simple model like that might not give a very accurate estimate.

Hence, we would like to model having more input variables (explanatory variables). When we have more than one it is called Multiple Linear Regression.

Step 2: Get Example Data

Let’s take a look at some house price data.

import pandas as pd

data = pd.read_csv('https://raw.githubusercontent.com/LearnPythonWithRune/MachineLearningWithPython/main/files/house_prices.csv')
print(data.head())

Notice – you can also download the file locally from the GitHub. This will make it faster to run every time.

The output should be giving the following data.

The goal is given a row of data we want to predict the House Unit Price. That is, given all but the last column in a row, can we predict the House Unit Price (the last column).

Step 3: Plot the data

Just for fun – let’s make a scatter plot of all the houses with Latitude and Longitude.

import matplotlib.pyplot as plt

fig, ax = plt.subplots()

ax.scatter(x=data['Longitude'], y=data['House unit price'])
plt.show()

This gives the following plot.

This shows you where the houses are located, which can be interesting because house prices can be dependent on location.

Somehow it should be intuitive that the longitude and latitude should not be linearly correlated to the house price – at least not in the bigger picture.

Step 4: Correlation of the features

Before we make the Multiple Linear Regression, let’s see how the features (the columns) correlate.

data.corr()

Which gives.

This is interesting. Look at the lowest row for the correlations with House Unit Price. It shows that Distance to MRT stations negatively correlated – that is, the longer to a MRT station the lower price. This might not be surprising.

More surprising is that Latitude and Longitude are actually comparably high correlated to the House Unit Price.

This might be the case for this particular dataset.

Step 5: Check the Quality of the dataset

For the Linear Regression model to perform well, you need to check that the data quality is good. If the input data is of poor quality (missing data, outliers, wrong values, duplicates, etc.) then the model will not be very reliable.

Here we will only check for missing values.

data.isnull().sum()

Which gives.

Transaction                     0
House age                       0
Distance to MRT station         0
Number of convenience stores    0
Latitude                        0
Longitude                       0
House unit price                0
dtype: int64

This tells us that there are no missing values.

If you want to learn more about Data Quality, then check out the free course on Data Science. In that course you will learn more about Data Quality and how it impacts the accuracy of your model.

Step 6: Create a Multiple Linear Regression Model

First we need to divide them into input variables X (explanatory variables) and output values y (response values).

Then we split it into a training and testing dataset. We create the model, we fit it, we use it predict the test dataset and get a score.

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

X = data.iloc[:,:-1]
y = data.iloc[:,-1]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, test_size=.15)

lin = LinearRegression()
lin.fit(X_train, y_train)

y_pred = lin.predict(X_test)

print(r2_score(y_test, y_pred))

For this run it gave 0.68.

Is that good or bad? Well, good question. The perfect match is 1, but that should not be expected. The worse score you can get is minus infinite – so we are far from that.

In order to get an idea about it – we need to compare it with variations.

In the free Data Science course we explore how to select features and evaluate models. It is a great idea to look into that.

Want to learn more?

This is part of a FREE 10h Machine Learning course with Python.

  • 15 video lessons – which explain Machine Learning concepts, demonstrate models on real data, introduce projects and show a solution (YouTube playlist).
  • 30 JuPyter Notebooks – with the full code and explanation from the lectures and projects (GitHub).
  • 15 projects – with step guides to help you structure your solutions and solution explained in the end of video lessons (GitHub).

Support-Vector Machine: Classify using Sklearn

What will we cover?

In this tutorial we will cover the following.

  • Learn about the problem of seperation
  • The idea to maximize the distance
  • Work with examples to demonstrate the issue
  • Use the Support Vector Machine (SVM) model on data.
  • Explore the result of SVM on classification data.

Step 1: What is Maximum Margin Separator?

Boundary that maximizes the distances between any of the data points (Wiki)

The problem can be illustrated as follows.

Looking at the image to the left we separate all the red dots from the blue dots. This separation is perfect. But we know that this line might not be ideal if more dots are coming. Imagine another blue dot is added (right image).

Could we have chosen the a better line of separation?

As you see above – there is a better line to chose from the start. The one that is the longest from all points.

Step 2: What is Support Vector Machine (SVM)?

The Support Vector Machine solves the separation problem stated above.

In machine learningsupport-vector machines (SVMs, also support-vector networks) are supervised learning models with associated learning algorithms that analyze data for classification and regression analysis (source: wiki).

But what do we use SVM for?

  • Classify data.
  • Face detection
  • Classification of images
  • Handwriting recognition
  • Inverse geosounding problem
  • Facial expression
  • Text classification

Among things.

But basically, it is all about classifying data. That is, given a collection of data and a set of categories for this data, the model helps classifies data into the correct categories.

Example of facial expression you might have categories of happy, sad, surprised, and angry. Then given an image of a face it can categorize it into one of the categories.

How does it do it?

Well, you need training data with correct labels.

In this tutorial we will make a gentle introduction to classification based on simple data.

Step 3: Gender classification based on height and heir length

Let’s consider the a list of measured height and hair lengths with the given gender.

import pandas as pd

url = 'https://raw.githubusercontent.com/LearnPythonWithRune/MachineLearningWithPython/main/files/gender.csv'
data = pd.read_csv(url)

print(data.head())

Resulting in this.

   Height  Hair length Gender
0     151           99      F
1     193            8      M
2     150          123      F
3     176            0      M
4     188           11      M

Step 4: Visualize the data

You can visualize the result as follows.

import pandas as pd
import matplotlib.pyplot as plt

url = 'https://raw.githubusercontent.com/LearnPythonWithRune/MachineLearningWithPython/main/files/gender.csv'
data = pd.read_csv(url)

data['Class'] = data['Gender'].apply(lambda x: 'r' if x == 'F' else 'b')

data = data.iloc[:25]

fig, ax = plt.subplots()

ax.scatter(x=data['Height'], y=data['Hair length'], c=data['Class'])
plt.show()

Where we only keep the first 25 points to simplify the plot.

Step 5: Creating a SVC model

We will use Sklearns SVC (Support Vector Classification (docs)) model to fit the data.

import pandas as pd
import numpy as np
from sklearn import svm

url = 'https://raw.githubusercontent.com/LearnPythonWithRune/MachineLearningWithPython/main/files/gender.csv'
data = pd.read_csv(url)

data['Class'] = data['Gender'].apply(lambda x: 'r' if x == 'F' else 'b')

X = data[['Height', 'Hair length']]
y = data['Gender']
y = np.array([0 if gender == 'M' else 1 for gender in y])

clf = svm.SVC(kernel='linear')
clf.fit(X, y)

Step 6: Visualize the model

We create a “box” to color the model prediction.

import pandas as pd
import numpy as np
from sklearn import svm
import matplotlib.pyplot as plt

url = 'https://raw.githubusercontent.com/LearnPythonWithRune/MachineLearningWithPython/main/files/gender.csv'
data = pd.read_csv(url)

data['Class'] = data['Gender'].apply(lambda x: 'r' if x == 'F' else 'b')

X = data[['Height', 'Hair length']]
y = data['Gender']
y = np.array([0 if gender == 'M' else 1 for gender in y])

clf = svm.SVC(kernel='linear')
clf.fit(X, y)

X_test = np.random.rand(10000, 2)
X_test = X_test*(70, 140) + (140, 0)

y_pred = clf.predict(X_test)

fig, ax = plt.subplots()

ax.scatter(x=X_test[:,0], y=X_test[:,1], c=y_pred, alpha=.25)
y_color = ['r' if value == 0 else 'b' for value in y]
ax.scatter(x=X['Height'], y=X['Hair length'], c=y_color)
plt.show()

Resulting in.

Want to learn more?

This is part of a FREE 10h Machine Learning course with Python.

  • 15 video lessons – which explain Machine Learning concepts, demonstrate models on real data, introduce projects and show a solution (YouTube playlist).
  • 30 JuPyter Notebooks – with the full code and explanation from the lectures and projects (GitHub).
  • 15 projects – with step guides to help you structure your solutions and solution explained in the end of video lessons (GitHub).

Linear Classifier From Scratch Explained on Real Project

What will we cover?

The goal is to learn about Supervised Learning and explore how to use it for classification.

This includes learning

  • What is Supervised Learning
  • Understand the classification problem
  • What is the Perceptron classifier
  • How to use the Perceptron classifier as a linear classifier

Step 1: What is Supervised Learning?

Supervised learning (SL) is the machine learning task of learning a function that maps an input to an output based on example input-output pairs

wikipedia.org

Said differently, if you have some items you need to classify, it could be books you want to put in categories, say fiction, non-fiction, etc.

Then if you were given a pile of books with the right categories given to them, how can you make a function (the machine learning model), which on other books without labels can guess the right category.

Supervised learning simply means, that in the learning phase, the algorithm (the one creating the model) is given examples with correct labels.

Notice, that supervised learning does not only restrict to classification problems, but it could predict anything.

If you are new to Machine Learning, I advise you start with this tutorial.

Step 2: What is the classification problem?

The classification problem is a supervised learning task of getting a function mapping an input point to a discrete category.

There is binary classification and multiclass classification, where the binary maps into two classes, and the multi classmaps into 3 or more classes.

I find it easiest to understand with examples.

Assume we want to predict if will rain or not rain tomorrow. This is a binary classification problem, because we map into two classes: rain or no rain.

To train the model we need already labelled historic data.

Hence, the task is given rows of historic data with correct labels, train a machine learning model (a Linear Classifier in this case) with this data. Then after that, see how good it can predict future data (without the right class label).

Step 3: Linear Classification explained mathematically and visually

Some like the math behind an algorithm. If you are not one of them, focus on the visual part – it will give you the understanding you need.

The task of Supervised Learning mathematically can be explained simply with the example data above to find a function f(humidity, pressure) to predict rain or no rain.

Examples

  • f(93, 000.7) = rain
  • f(49, 1015.5) = no rain
  • f(79, 1031.1) = no rain

The goal of Supervised Learning is to approximate the function f – the approximation function is often denoted h.

Why not identify f precisely? Well, because it is not ideal, as this would be an overfitted function, that would predict the historic data 100% accurate, but would fail to predict future values very well.

As we work with Linear Classifiers, we want the function to be linear.

That is, we want the approximation function h, to be on the form.

  • x_1: Humidity
  • x_2: Pressure
  • h(x_1, x_2) = w_0 + w_1*x_1 + w_2*x_2

Hence, the goal is to optimize values w_0, w_1, w_2, to find the best classifier.

What does all this math mean?

Well, that it is a linear classifier that makes decisions based on the value of a linear combination of the characteristics.

The above diagram shows how it would classify with a line whether it will predict rain or not. On the left side, this is the data classified from historic data, and the line shows an optimized line done by the machine learning algorithm.

On the right side, we have a new input data (without label), then with this line, it would classify it as rain (assuming blue means rain).

Step 4: What is the Perceptron Classifier?

The Perceptron Classifier is a linear algorithm that can be applied to binary classification.

It learns iteratively by adding new knowledge to an already existing line.

The learning rate is given by alpha, and the learning rule is as follows (don’t worry if you don’t understand it – it is not important).

  • Given data point x and y update each weight according to this.
    • w_i = w_i + alpha*(y – h_w(x)) X x_i

The rule can also be stated as follows.

  • w_i = w_i + alpha(actual value – estimated value) X x_i

Said in words, it adjusted the values according to the actual values. Every time a new values comes, it adjusts the weights to fit better accordingly.

Given the line after it has been adjusted to all the training data – then it is ready to predict.

Let’s try this on real data.

Step 5: Get the Weather data we will use to train a Perceptron model with

You can get all the code in a Jupyter Notebook with the csv file here.

This can be downloaded from the GitHub in a zip file by clicking here.

First let’s just import all the libraries used.

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.linear_model import Perceptron
import matplotlib.pyplot as plt

Notice that in the Notebook we have an added line %matplotlib inline, which you should add if you run in a Notebook. The code here will be aligned with PyCharm or a similar IDE.

Then let’s read the data.

data = pd.read_csv('files/weather.csv', parse_dates=True, index_col=0)
print(data.head())

If you want to read the data directly from GitHub and not download the weather.csv file, you can do that as follows.

data = pd.read_csv('https://raw.githubusercontent.com/LearnPythonWithRune/MachineLearningWithPython/main/files/weather.csv', parse_dates=True, index_col=0)
print(data.head())

This will result in an output similar to this.

            MinTemp  MaxTemp  Rainfall  ...  RainToday  RISK_MM RainTomorrow
Date                                    ...                                 
2008-02-01     19.5     22.4      15.6  ...        Yes      6.0          Yes
2008-02-02     19.5     25.6       6.0  ...        Yes      6.6          Yes
2008-02-03     21.6     24.5       6.6  ...        Yes     18.8          Yes
2008-02-04     20.2     22.8      18.8  ...        Yes     77.4          Yes
2008-02-05     19.7     25.7      77.4  ...        Yes      1.6          Yes

Step 6: Select features and Clean the Weather data

We want to investigate the data and figure out how much missing data there.

A great way to do that is to use isnull().

print(data.isnull().sum())

This results in the following output.

MinTemp             3
MaxTemp             2
Rainfall            6
Evaporation        51
Sunshine           16
WindGustDir      1036
WindGustSpeed    1036
WindDir9am         56
WindDir3pm         33
WindSpeed9am       26
WindSpeed3pm       25
Humidity9am        14
Humidity3pm        13
Pressure9am        20
Pressure3pm        19
Cloud9am          566
Cloud3pm          561
Temp9am             4
Temp3pm             4
RainToday           6
RISK_MM             0
RainTomorrow        0
dtype: int64

This shows how many rows in each column has null value (missing values). We want to work only with a two features (columns), to keep our classification simple. Obviously, we need to keep RainTomorrow, as that is carrying the label of the class.

We select the features we want and drop the rows with null-values as follows.

dataset = data[['Humidity3pm', 'Pressure3pm', 'RainTomorrow']].dropna()

Step 7: Split into trading and test data

The next step we need to do is to split the dataset into a features and labels.

But we also want to rename the labels from No and Yes to be numeric.

X = dataset[['Humidity3pm', 'Pressure3pm']]
y = dataset['RainTomorrow']
y = np.array([0 if value == 'No' else 1 for value in y])

Then we do the splitting as follows, where we but a random_state in order to be able to reproduce. This is often a great idea, if you randomness and encounter a problem, then you can reproduce it.

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

This has divided the features into a train and test set (X_train, X_test), and the labels into a train and test (y_train, y_test) dataset.

Step 8: Train the Perceptron model and measure accuracy

Finally we want to create the model, fit it (train it), predict on the training data, and print the accuracy score.

clf = Perceptron(random_state=0)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print(accuracy_score(y_test, y_pred))

This gives an accuracy of 0.773 or 77,3% accuracy.

Is that good?

Well what if it rains 22.7% of the time? And the model always predicts No rain?

Well, then it is correct 77.3% of the time.

Let’s just check for that.

Well, it is not raining in 74.1% of the time.

print(sum(y == 0)/len(y))

Is that a good model? Well, I find the binary classifiers a bit tricky because of this problem. The best way to get an idea is to visualize it.

Step 9: Visualize the model predictions

To visualize the data we can do the following.

fig, ax = plt.subplots()
X_data = X.to_numpy()
y_all = clf.predict(X_data)
ax.scatter(x=X_data[:,0], y=X_data[:,1], c=y_all, alpha=.25)
plt.show()

This results in the following output.

Finally, let’s visualize the actual data to compare.

ax.scatter(x=X_data[:,0], y=X_data[:,1], c=y, alpha=.25)
plt.show()

Resulting in.

Here is the full code.

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.linear_model import Perceptron
import matplotlib.pyplot as plt
data = pd.read_csv('https://raw.githubusercontent.com/LearnPythonWithRune/MachineLearningWithPython/main/files/weather.csv', parse_dates=True, index_col=0)
print(data.head())
print(data.isnull().sum())
dataset = data[['Humidity3pm', 'Pressure3pm', 'RainTomorrow']].dropna()
X = dataset[['Humidity3pm', 'Pressure3pm']]
y = dataset['RainTomorrow']
y = np.array([0 if value == 'No' else 1 for value in y])
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = Perceptron(random_state=0)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print(accuracy_score(y_test, y_pred))
print(sum(y == 0)/len(y))
fig, ax = plt.subplots()
X_data = X.to_numpy()
y_all = clf.predict(X_data)
ax.scatter(x=X_data[:,0], y=X_data[:,1], c=y_all, alpha=.25)
plt.show()
fig, ax = plt.subplots()
ax.scatter(x=X_data[:,0], y=X_data[:,1], c=y, alpha=.25)
plt.show()

Want to learn more?

This is part of a FREE 10h Machine Learning course with Python.

  • 15 video lessons – which explain Machine Learning concepts, demonstrate models on real data, introduce projects and show a solution (YouTube playlist).
  • 30 JuPyter Notebooks – with the full code and explanation from the lectures and projects (GitHub).
  • 15 projects – with step guides to help you structure your solutions and solution explained in the end of video lessons (GitHub).

What is Machine Learning? Exemplified with k-Nearest-Neighbors Classifier (KNN) to Predict Weather Forecast

What will we cover?

This tutorial will explain what Machine Learning is by comparing it to classical programming. Then how Machine Learning works and the three main categories of Machine Learning: Supervised, Unsupervised, and Reinforcement Learning.

Finally, we will explore a Supervised Machine Learning model called k-Nearest-Neighbors (KNN) classifier to get an understanding through practical application.

Goal of Lesson

  • Understand the difference between Classical Computing and Machine Learning
  • Know the 3 main categories of Machine Learning
  • Dive into Supervised Learning
  • Classification with 𝑘-Nearest-Neighbors Classifier (KNN)
  • How to classify data
  • What are the challenges with cleaning data
  • Create a project on real data with 𝑘-Nearest-Neighbor Classifier

Step 1: What is Machine Learning?

Classical Computing vs Machine Learning
  • In the classical computing model every thing is programmed into the algorithms. 
    • This has the limitation that all decision logic need to be understood before usage. 
    • And if things change, we need to modify the program.
  • With the modern computing model (Machine Learning) this paradigm is changes. 
    • We feed the algorithms (models) with data.
    • Based on that data, the algorithms (models) make decisions in the program.

Imagine you needed to teach your child how to bike a bicycle.

In the classical computing sense, you will instruct your child how to use a specific muscle in all cases. That is, if you lose balance to the right, then activate the your third muscle in your right leg. You need instructions for all muscles in all situations.

That is a lot of instructions and chances are, you forget specific situations.

Machine Learning feeds the child data, that is it will fall, it will fail – but eventually, it will figure it out itself, without instructions on how to use the specific muscles in the body.

Well, that is actually how most learn how to bike.

Step 2: How Machine Learning Works

On a high level, Machine Learning is divided into two phases.

  • Learning phase: Where the algorithm (model) learns in a training environment. Like, when you support your child learning to ride the bike, like catching the child while falling not to hit too hard.
  • Prediction phase: Where the algorithm (model) is applied on real data. This is when the child can bike on its own.

The Learning Phase is often divided into a few steps.

Phase 1: Learning
  • Get Data: Identify relevant data for the problem you want to solve. This data set should represent the type of data that the Machine Learn model will use to predict from in Phase 2 (predction).
  • Pre-processing: This step is about cleaning up data. While the Machine Learning is awesome, it cannot figure out what good data looks like. You need to do the cleaning as well as transforming data into a desired format.
  • Train model: This is where the magic happens, the learning step (Train model). There are three main paradigms in machine learning.
    • Supervised: where you tell the algorithm what categories each data item is in. Each data item from the training set is tagged with the right answer.
    • Unsupervised: is when the learning algorithm is not told what to do with it and it should make the structure itself.
    • Reinforcement: teaches the machine to think for itself based on past action rewards.
  • Test model: Finally, the testing is done to see if the model is good. The training data was divided into a test set and training set. The test set is used to see if the model can predict from it. If not, a new model might be necessary.

The Prediction Phase can be illustrated as follows.

Phase 2: Prediction

Step 3: Supervised Learning explained with Example

Supervised learning can be be explained as follows.

Given a dataset of input-output pairs, learn a function to map inputs to outputs.

There are different tasks – but we start to focus on Classification. Where supervised classification is the task of learning a function mapping an input point to a discrete category.

Now the best way to understand new things is to relate it to something we already understand.

Consider the following data.

Given the Humidity and Pressure for a given day can we predict if it will rain or not.

How will a Supervised Classification algorithm work?

Learning Phase: Given a set of historical data to train the model – like the data above, given rows of Humidity and Pressure and the label Rain or No Rain. Let the algorithm work with the data and figure it out.

Note: we leave out pre-processing and testing the model here.

Prediction Phase: Let the algorithm get new data – like in the morning you read Humidity and Pressure and let the algorithm predict if will rain or not that given day.

Written mathematically, it is the task to find a function 𝑓 as follows.

Ideally: 𝑓(ℎ𝑢𝑚𝑖𝑑𝑖𝑡𝑦,𝑝𝑟𝑒𝑠𝑠𝑢𝑟𝑒)

Examples:

  • 𝑓(93,999.7) = Rain
  • 𝑓(49,1015.5) = No Rain
  • 𝑓(79,1031.1) = No Rain

Goal: Approximate the function 𝑓 – the approximation function is often denoted 

Step 4: Visualize the data we want to fit

We will use pandas to work with data, which is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.

The data we want to work with can be downloaded from a here and stored locally. Or you can access it directly as follows.

import pandas as pd

file_dest = 'https://raw.githubusercontent.com/LearnPythonWithRune/MachineLearningWithPython/main/files/weather.csv'
data = pd.read_csv(file_dest, parse_dates=True, index_col=0)

First lets’s visualize the data we want to work with.

import matplotlib.pyplot as plt
import pandas as pd

file_dest = 'https://raw.githubusercontent.com/LearnPythonWithRune/MachineLearningWithPython/main/files/weather.csv'
data = pd.read_csv(file_dest, parse_dates=True, index_col=0)

dataset = data[['Humidity3pm', 'Pressure3pm', 'RainTomorrow']]

fig, ax = plt.subplots()

dataset[dataset['RainTomorrow'] == 'No'].plot.scatter(x='Humidity3pm', y='Pressure3pm', c='b', alpha=.25, ax=ax)
dataset[dataset['RainTomorrow'] == 'Yes'].plot.scatter(x='Humidity3pm', y='Pressure3pm', c='r', alpha=.25, ax=ax)

plt.show()

Resulting in.

Blue dots is no rain, Red dots is rain

The goal is to make a mode which can predict Blue or Red dots.

Step 5: The k-Nearest-Neighbors Classifier

Given an input, choose the class of nearest datapoint.

𝑘-Nearest-Neighbors Classification

  • Given an input, choose the most common class out of the 𝑘 nearest data points

Let’s try to implement a model. We will use sklearn for that.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

dataset_clean = dataset.dropna()

X = dataset_clean[['Humidity3pm', 'Pressure3pm']]
y = dataset_clean['RainTomorrow']
y = np.array([0 if value == 'No' else 1 for value in y])

neigh = KNeighborsClassifier()
neigh.fit(X_train, y_train)
y_pred = neigh.predict(X_test)
accuracy_score(y_test, y_pred)

This actually covers what you need. Make sure to have the dataset data from the previous step available here.

To visualize the code you can run the following.

fig, ax = plt.subplots()

y_map = neigh.predict(X_map)

ax.scatter(x=X_map[:,0], y=X_map[:,1], c=y_map, alpha=.25)
plt.show()

Want more help?

Check out this video explaining all steps in more depth. Also, it includes a guideline for making your first project with Machine Learning along with a solution for it.

This is part of a FREE 10h Machine Learning course with Python.

  • 15 video lessons – which explain Machine Learning concepts, demonstrate models on real data, introduce projects and show a solution (YouTube playlist).
  • 30 JuPyter Notebooks – with the full code and explanation from the lectures and projects (GitHub).
  • 15 projects – with step guides to help you structure your solutions and solution explained in the end of video lessons (GitHub).

Capstone Project: Reinforcement Learning from Scratch with Python

What will we cover?

We will learn what Reinforcement Learning is and how it works. Then by using Object-Oriented Programming technics (more about Object-Oriented Programming), we implement a Reinforcement Model to solve the problem of figuring out where to pick up and drop of item on a field.

Step 1: What is Reinforcement Learning?

Reinforcement Learning is one of the 3 main categories of Machine Learning (get started with Machine Learning here) and is concerned with how intelligent agents ought to take actions in an environment in order to maximize the notion of cumulative reward.

How Reinforcement Learning works

Reinforcement Learning teaches the machine to think for itself based on past action rewards.

  • Basically, the Reinforcement Learning algorithm tries to predict actions that gives rewards and avoids punishment.
  • It is like training a dog. You and the dog do not talk the same language, but the dogs learns how to act based on rewards (and punishment, which I do not advise or advocate).
  • Hence, if a dog is rewarded for a certain action in a given situation, then next time it is exposed to a similar situation it will act the same.
  • Translate that to Reinforcement Learning.
    • The agent is the dog that is exposed to the environment.
    • Then the agent encounters a state.
    • The agent performs an action to transition to a new state.
    • Then after the transition the agent receives a reward or penalty (punishment).
    • This forms a policy to create a strategy to choose actions in a given state.

What algorithms are used for Reinforcement Learning?

  • The most common algorithm for Reinforcement Learning are.
    • Q-Learning: is a model-free reinforcement learning algorithm to learn a policy telling an agent what action to take under what circumstances.
    • Temporal Difference: refers to a class of model-free reinforcement learning methods which learn by bootstrapping from the current estimate of the value function.
    • Deep Adversarial Network: is a technique employed in the field of machine learning which attempts to fool models through malicious input.
  • We will focus on the Q-learning algorithm as it is easy to understand as well as powerful.

How does the Q-learning algorithm work?

  • As already noted, I just love this algorithm. It is “easy” to understand and powerful as you will see.
  • The Q-Learning algorithm has a Q-table (a Matrix of dimension state x actions – don’t worry if you do not understand what a Matrix is, you will not need the mathematical aspects of it – it is just an indexed “container” with numbers).
  • The agent (or Q-Learning algorithm) will be in a state.
  • Then in each iteration the agent needs take an action.
  • The agent will continuously update the reward in the Q-table.
  • The learning can come from either exploiting or exploring.
  • This translates into the following pseudo algorithm for the Q-Learning.
  • The agent is in a given stateºº and needs to choose an action.

Algorithm

  • Initialise the Q-table to all zeros
  • Iterate
    • Agent is in state state.
    • With probability epsilon choose to explore, else exploit.
      • If explore, then choose a random action.
      • If exploit, then choose the best action based on the current Q-table.
    • Update the Q-table from the new reward to the previous state.
    • Q[state, action] = (1 – alpha) * Q[state, action] + alpha * (reward + gamma * max(Q[new_state]) — Q[state, action])

Variables

As you can se, we have introduced the following variables.

  • epsilon: the probability to take a random action, which is done to explore new territory.
  • alpha: is the learning rate that the algorithm should make in each iteration and should be in the interval from 0 to 1.
  • gamma: is the discount factor used to balance the immediate and future reward. This value is usually between 0.8 and 0.99
  • reward: is the feedback on the action and can be any number. Negative is penalty (or punishment) and positive is a reward.

Step 2: The problem we want to solve

Here we have a description of task we want to solve.

  • To keep it simple, we create a field of size 10×10 positions. In that field there is an item that needs to be picked up and moved to a drop-off point.
  • At each position there are 6 different actions that can be taken.
    • Action 0: Go South if on field.
    • Action 1: Go North if on field.
    • Action 2: Go East if on field (Please notice, I mixed up East and West (East is Left here)).
    • Action 3: Go West if on field (Please notice, I mixed up East and West (West is right here)).
    • Action 4: Pickup item (it can try even if it is not there)
    • Action 5: Drop-off item (it can try even if it does not have it)
  • Based on these actions we will make a reward system.
    • If the agent tries to go off the field, punish with -10 in reward.
    • If the agent makes a (legal) move, punish with -1 in reward, as we do not want to encourage endless walking around.
    • If the agent tries to pick up item, but it is not there or it has it already, punish with -10 in reward.
    • If the agent picks up the item correct place, reward with 20.
    • If agent tries to drop-off item in wrong place or does not have the item, punish with -10 in reward.
    • If the agent drops-off item in correct place, reward with 20.
  • That translates into the following code. I prefer to implement this code, as I think the standard libraries that provide similar frameworks hide some important details. As an example, and shown later, how do you map this into a state in the Q-table?

Step 3: Implementing the field

First we need a way to represent the field, representing the environment our model lives in. This is defined in Step 2 and could be implemented as follows.

class Field:
    def __init__(self, size, item_pickup, item_drop_off, start_position):
        self.size = size
        self.item_pickup = item_pickup
        self.item_drop_off = item_drop_off
        self.position = start_position
        self.item_in_car = False
        
    def get_number_of_states(self):
        return self.size*self.size*self.size*self.size*2
    
    def get_state(self):
        state = self.position[0]*self.size*self.size*self.size*2
        state = state + self.position[1]*self.size*self.size*2
        state = state + self.item_pickup[0]*self.size*2
        state = state + self.item_pickup[1]*2
        if self.item_in_car:
            state = state + 1
        return state
        
    def make_action(self, action):
        (x, y) = self.position
        if action == 0:  # Go South
            if y == self.size - 1:
                return -10, False
            else:
                self.position = (x, y + 1)
                return -1, False
        elif action == 1:  # Go North
            if y == 0:
                return -10, False
            else:
                self.position = (x, y - 1)
                return -1, False
        elif action == 2:  # Go East
            if x == 0:
                return -10, False
            else:
                self.position = (x - 1, y)
                return -1, False
        elif action == 3:  # Go West
            if x == self.size - 1:
                return -10, False
            else:
                self.position = (x + 1, y)
                return -1, False
        elif action == 4:  # Pickup item
            if self.item_in_car:
                return -10, False
            elif self.item_pickup != (x, y):
                return -10, False
            else:
                self.item_in_car = True
                return 20, False
        elif action == 5:  # Drop off item
            if not self.item_in_car:
                return -10, False
            elif self.item_drop_off != (x, y):
                self.item_pickup = (x, y)
                self.item_in_car = False
                return -10, False
            else:
                return 20, True

Step 4: A Naive approach to solve it (NON-Machine Learning)

A naive approach would to just take random actions and hope for the best. This is obviously not optimal, but nice to have as a base line to compare with.

def naive_solution():
    size = 10
    item_start = (0, 0)
    item_drop_off = (9, 9)
    start_position = (0, 9)
    
    field = Field(size, item_start, item_drop_off, start_position)
    done = False
    steps = 0
    
    while not done:
        action = random.randint(0, 5)
        reward, done = field.make_action(action)
        steps = steps + 1
    
    return steps

To make an estimate on how many steps it takes you can run this code.

runs = [naive_solution() for _ in range(100)]
print(sum(runs)/len(runs))

Where we use List Comprehension (learn more about list comprehension). This gave 143579.21. Notice, you most likely will get something different, as there is a high level of randomness involved.

Step 5: Implementing our Reinforcement Learning Model

Here we give the algorithm for what we need to implement.

Algorithm

  • Initialise the Q-table to all zeros
  • Iterate
    • Agent is in state state.
    • With probability epsilon choose to explore, else exploit.
      • If explore, then choose a random action.
      • If exploit, then choose the best action based on the current Q-table.
    • Update the Q-table from the new reward to the previous state.
    • Q[state, action] = (1 – alpha) * Q[state, action] + alpha * (reward + gamma * max(Q[new_state]) — Q[state, action])

Then we end up with the following code to train our Q-table.

size = 10
item_start = (0, 0)
item_drop_off = (9, 9)
start_position = (0, 9)

field = Field(size, item_start, item_drop_off, start_position)

number_of_states = field.get_number_of_states()
number_of_actions = 6

q_table = np.zeros((number_of_states, number_of_actions))

epsilon = 0.1
alpha = 0.1
gamma = 0.6

for _ in range(10000):
    field = Field(size, item_start, item_drop_off, start_position)
    done = False
    
    while not done:
        state = field.get_state()
        if random.uniform(0, 1) < epsilon:
            action = random.randint(0, 5)
        else:
            action = np.argmax(q_table[state])
            
        reward, done = field.make_action(action)
        # Q[state, action] = (1 – alpha) * Q[state, action] + alpha * (reward + gamma * max(Q[new_state]) — Q[state, action])
        
        new_state = field.get_state()
        new_state_max = np.max(q_table[new_state])
        
        q_table[state, action] = (1 - alpha)*q_table[state, action] + alpha*(reward + gamma*new_state_max - q_table[state, action])

Then we can apply our model as follows.

def reinforcement_learning():
    epsilon = 0.1
    alpha = 0.1
    gamma = 0.6
    
    field = Field(size, item_start, item_drop_off, start_position)
    done = False
    steps = 0
    
    while not done:
        state = field.get_state()
        if random.uniform(0, 1) < epsilon:
            action = random.randint(0, 5)
        else:
            action = np.argmax(q_table[state])
            
        reward, done = field.make_action(action)
        # Q[state, action] = (1 – alpha) * Q[state, action] + alpha * (reward + gamma * max(Q[new_state]) — Q[state, action])
        
        new_state = field.get_state()
        new_state_max = np.max(q_table[new_state])
        
        q_table[state, action] = (1 - alpha)*q_table[state, action] + alpha*(reward + gamma*new_state_max - q_table[state, action])
        
        steps = steps + 1
    
    return steps

And evaluate it as follows.

runs_rl = [reinforcement_learning() for _ in range(100)]
print(sum(runs_rl)/len(runs_rl))

This resulted in 47.45. Again, you should get something different.

But a comparison to taking random moves (Step 4) it is a factor 3000 better.

Want more?

Want to learn more Python, then this is part of a 8 hours FREE video course with full explanations, projects on each levels, and guided solutions.

The course is structured with the following resources to improve your learning experience.

  • 17 video lessons teaching you everything you need to know to get started with Python.
  • 34 Jupyter Notebooks with lesson code and projects.
  • A FREE 70+ pages eBook with all the learnings from the lessons.

See the full FREE course page here.

If you instead want to learn more about Machine Learning. Do not worry.

Then check out my Machine Learning with Python course.

  • 15 video lessons teaching you all aspects of Machine Learning
  • 30 JuPyter Notebooks with lesson code and projects
  • 10 hours FREE video content to support your learning journey.

Go to the course page for details.

Learn NumPy Basics with your first Machine Learning Project

What will we cover?

In this tutorial you will learn some basic NumPy. The best way to learn something new is to combine it with something useful. Therefore you will use the NumPy while creating your first Machine Learning project.

Step 1: What is NumPy?

NumPy is the fundamental package for scientific computing in Python.

NumPy.org

Well, that is how it is stated on the official NumPy page.

Maybe a better question is, what do you use NumPy for and why?

Well, the main tool you use from NumPy is the NumPy array. Arrays are quite similar to Python lists, just with a few restrictions.

  1. It can only contain one data type. That is, if a NumPy array has integers, then all entries can only be integers.
  2. The size cannot change (immutable). That is, you can not add or remove entries, like in a Python list.
  3. If it is a multi-dimension array, all sub-arrays must be of same shape. That is, you cannot have something similar to a Python list of list, where the first sub-list is of length 3, the second of length 7, and so on. They all must have same length (or shape).

Why would anyone use them, you might ask? They are more restrictive than Python lists.

Actually, and funny enough, making the data structures more restrictive, like NumPy arrays, can make it more efficient (faster).

Why?

Well, think about it. You know more about the data structure, and hence, do not need to make many additional checks.

Step 2: A little NumPy array basics we will use for our Machine Learning project

A NumPy array can be created of a list.

import numpy as np

a1 = np.array([1, 2, 3, 4])
print(a1)

Which will print.

array([1, 2, 3, 4])

The data type of a NumPy array can be given as follows.

print(a1.dtype)

It will print dtype(‘int64’). That is, the full array has only one type, int64, which are 64 bit integers. That is also different from Python integers, where you actually cannot specify the size of the integers. Here you can have int8, int16, int32, int64, and more. Again restrictions, which makes it more efficient.

print(a1.shape)

The above gives the shape, here, (4,). Notice, that this shape cannot be changed, because the data structure is immutable.

Let’s create another NumPy array and try a few things.

a1 = np.array([1, 2, 3, 4])
a2 = np.array([5, 6, 7, 8])

print(a1*2)
print(a1*a2)
print(a1 + a2)

Which results in.

array([2, 4, 6, 8])
array([ 5, 12, 21, 32])
array([ 6,  8, 10, 12])

With a little inspection you will realize that the first (a1*2) multiplies with 2 in each entry. The second (a1*a2) multiplies the entries pairwise. The third (a1 + a2) adds the entries pairwise.

Step 3: What is Machine Learning?

  • In the classical computing model every thing is programmed into the algorithms. This has the limitation that all decision logic need to be understood before usage. And if things change, we need to modify the program.
  • With the modern computing model (Machine Learning) this paradigm is changes. We feed the algorithms with data, and based on that data, we do the decisions in the program.

How Machine Learning Works

  • On a high level you can divide Machine Learning into two phases.
    • Phase 1: Learning
    • Phase 2: Prediction
  • The learing phase (Phase 1) can be divided into substeps.
  • It all starts with a training set (training data). This data set should represent the type of data that the Machine Learn model should be used to predict from in Phase 2 (predction).
  • The pre-processing step is about cleaning up data. While the Machine Learning is awesome, it cannot figure out what good data looks like. You need to do the cleaning as well as transforming data into a desired format.
  • Then for the magic, the learning step. There are three main paradigms in machine learning.
    • Supervised: where you tell the algorithm what categories each data item is in. Each data item from the training set is tagged with the right answer.
    • Unsupervised: is when the learning algorithm is not told what to do with it and it should make the structure itself.
    • Reinforcement: teaches the machine to think for itself based on past action rewards.
  • Finally, the testing is done to see if the model is good. The training data was divided into a test set and training set. The test set is used to see if the model can predict from it. If not, a new model might be necessary.

Then the prediction begins.

Step 4: A Linear Regression Model

Let’s try to use a Machine Learning model. One of the first model you will meet is the Linear Regression model.

Simply said, this model tries to fit data to a straight line. The best way to understand that, is to see it visually with one explanatory variable. That is, given a value (explanatory variable), can you predict the scalar response (the value you want to predict.

Say, given the temperature (explanatory variable), can you predict the sale of ice cream. Assuming there is a linear relationship, can you determine that? A guess is, the hotter it is, the more ice cream is sold. But whether a leaner model is a good predictor, is beyond the scope here.

Let’s try with some simple data.

But first we need to import a few libraries.

from sklearn.linear_model import LinearRegression

Then we generate some simple data.

x = [i for i in range(10)]
y = [i for i in range(10)]

For the case, it will be fully correlated, but it will only demonstrate it. This part is equivalent to the Get data step.

But x is the explanatory variable and y the scalar response we want to predict.

When you train the model, you give it input pairs of explanatory and scalar response. This is needed, as the model needs to learn.

After the learning you can predict data. But let’s prepare the data for the learning. This is the Pre-processing.

X = np.array(x).reshape((-1, 1))
Y = np.array(y).reshape((-1, 1))

Notice, this is very simple step, and we only need to convert the data into the correct format.

Then we can train the model (train model).

lin_regressor = LinearRegression()
lin_regressor.fit(X, Y)

Here we will skip the test model step, as the data is simple.

To predict data we can call the model.

Y_pred = lin_regressor.predict(X)

The full code together here.

from sklearn.linear_model import LinearRegression

x = [i for i in range(10)]
y = [i for i in range(10)]

X = np.array(x).reshape((-1, 1))
Y = np.array(y).reshape((-1, 1))

lin_regressor = LinearRegression()
lin_regressor.fit(X, Y)

Y_pred = lin_regressor.predict(X)

Step 5: Visualize the result

You can visualize the data and the prediction as follows (see more about matplotlib here).

import matplotlib.pyplot as plt

alpha = str(round(lin_regressor.intercept_[0], 5))
beta = str(round(lin_regressor.coef_[0][0], 5))

fig, ax = plt.subplots()

ax.set_title(f"Alpha {alpha}, Beta {beta}")
ax.scatter(X, Y)
ax.plot(X, Y_pred, c='r')

Alpha is called constant or intercept and measures the value where the regression line crosses the y-axis.

Beta is called coefficient or slope and measures the steepness of the linear regression.

Next step

If you want a real project with Linear Regression, then check out the video in the top of the post, which is part of a full course.

The project will look at car specs to see if there is a connection.

Want to learn more Python, then this is part of a 8 hours FREE video course with full explanations, projects on each levels, and guided solutions.

The course is structured with the following resources to improve your learning experience.

  • 17 video lessons teaching you everything you need to know to get started with Python.
  • 34 Jupyter Notebooks with lesson code and projects.
  • A FREE 70+ pages eBook with all the learnings from the lessons.

See the full FREE course page here.

If you instead want to learn more about Machine Learning. Do not worry.

Then check out my Machine Learning with Python course.

  • 15 video lessons teaching you all aspects of Machine Learning
  • 30 JuPyter Notebooks with lesson code and projects
  • 10 hours FREE video content to support your learning journey.

Go to the course page for details.

How to Learn Python for Data Science

What will we cover?

  • Is Python the correct language to learn for a Data Scientist?
  • How much Python do you need to learn as a Data Scientist?
  • How to learn Python fast?
  • How long does it take to become good at Python?
  • How to get started with Python?

Is Python the correct language to learn for a Data Scientist?

That is a good question to ask yourself. You want to become a Data Scientist, maybe you have some experience, but feel weak in the programming aspect, or maybe you start from scratch.

If I was to start my journey as a Data Scientist one of the questions I would ask myself, is, do I have the tools for it.

R is often high on the scale of programming language and environment to use as a Data Scientist. The language R is designed for effective data handling, operations on arrays and matrices, has data analysis tools, graphical facilities, and well established environment.

That sounds like all we need, so why bother looking further?

In the top there is a battle between two candidates: Python vs R.

Actually, Python is a general purpose language that has a wide aspects of uses, not only Data Scientist. Also, web services, game development, big data backend systems processing high volume data, just to mention a few.

With this description, it looks like R is tailored for Data Science, while Python is used for everything. The choice seems easy – do you want a tool made for the purpose, or something for general purpose?

Funny enough, as it might seem at first, Python has become more popular than R. Why is that?

A few reasons why Python is more popular than R.

  • Python is easy to use and learn.
  • Python has powerfull fast libraries.
  • Python has a huge community and it is easy to get help.
  • Python has easy data handling tools for reading and generating spreadsheets, parquet files, csv files, web scraping, sql databasis, and much more.
  • Python has great Machine Learning libraries developed by giants like Google (tensorflow) and Facebook (PyTorch).
  • Python support graphical data representation with libraries like Matplotlib.
  • Python has SciKit-learn for predictive data analysis.
  • Python has easy to use data representation with NumPy and pandas.

…and the list could go on.

Python is also a great fit when you want to build tailored-made system, which integrate up against any other platform or service, like automatically get data from various sources.

Do I need a Computer Science degree to use Python?

Python is programming and programmers have computer science degrees. Do you need one to become a good Data Scientist?

The short answer is: No.

A Computer Science degrees will enable you to build anything. Let’s try to think of it differently.

Think of transportation – car, busses, bikes, trains, which can move you from A to B. People without a driving license can use busses and trains. All they need is to know how to buy a ticket, understand a schedule to find out to get from A to B. If you get a driver license, then you can driver your own car. Finally, if you are a car mechanics, you can repair and possibly build your own car.

Similarly, a computer science degree will enable you to build cars, busses, trains, and more, which other people can use. A Data Scientist is like a person with a driver license, and you don’t need to be able to repair a car to drive it. That is, you only need to understand and navigate the dashboard in the car.

Data Science is the same., you need to understand the things you use, but you do not need to be able to build them yourself.

But wait! You might object. It is still programming, when I use the things I use.

Yes, but the level of programming is simple and you use the complicated things like you use a car without being a car mechanics.

Feel more comfortable?

How to Learn Python Fast?

Now you are ready and know what you want – how to get there fastest without wasting time.

Maybe one question before that .

Can everybody learn Python? Do you need special skills?

I have so far never met anyone, which could not learn Python to the level of Data Science – and honestly, also for the level of Computer Scientist. It is just a question about dedication and interest to get to the last steps.

But becoming a Data Scientist using Python is not a problem.

The question is more how to learn it fast? The best way to answer that is to look at some of the most common pitfalls that make people learn it slower and some give up on the way.

Pitfall 1: I understand the solution when I see, but why couldn’t I figure it out – am I stupid?

Did you ever learn a new language – a speaking one – like English. If you are non-native English, then you started learning English at once. Remember that?

First you started understanding a few words. Then you started to understand full sentences when people where speaking English, but you could barely express yourself in English yourself. It took time to get there.

Programming is the same – at first you can read and understand the solutions to your problem, it takes time for you to be able to express yourself in programming language.

The feeling you have while trying to solve a programming problem for long time, but not succeeding can be devastating. Then when you see the solution and it looks simple, then you start to feel stupid.

But stop there – this is normal. You learn first to understand code before you can express yourself in code. Just like learning a new speaking language.

We have all been there – and we still get there – just with different more complex problems. It will never end, you will just become comfortable about it and the challenges you face will be more and more complex.

Pitfall 2: Get distracted when it gets tough

When something gets difficult the easy exit is to quit and start something new easier.

Maybe you think, this is too difficult for me – I am not smart enough. This is more fun, so I start this now.

The truth is, that every talented programmer on planet earth has multiple times been stuck at a problem for days – not being able to solve it – if it was a bug or just a difficult problem to solve does not matter. But they have all been struggling with a problem for long time.

This can be quite difficult to deal with as a beginner. You sit with a problem, which does not seem hard and you feel like everyone else can solve it – the logical conclusion is that you are not smart enough, right?

Then you might change to another programming project – and think that is fine, you will still learn programming.

But the truth is, that solving hard problems or finding bugs is not easy. It takes time and you will learn a lot from it. Escaping to another project will not teach you as much as the difficult ones.

The best programmers are the ones that never give up when it gets tough. This is what the highly paid consultant are paid for, solving problems where other give up.

Pitfall 3: Different sources of learning

This is often difficult to understand in the beginning. But there are many styles in programming.

When you know people and been working professionally with them in a development environment for long time, you can actually see who coded it. Their style falls through.

Why does that matter?

In the beginning it does. Because, what most also fail to understand in the beginning is, that you can solve problems in endless ways. There is often no perfect solution for a problem, only different solutions which different tradeoffs.

As a beginner, you want to learn programming and you will not see the differences in styles. But if you starte learning from one person, then another one, then yet another one, then it becomes difficult.

This has never been more relevant in the age where so many people share online learning.

Again, it is like learning English with a specific dialect and different vocabulary. It is difficult in the beginning to distinguish between them, and difficult to see it matters. But in the long run you will speak English optimized for your environment.

Keep focused learning from one source. Do not change from one place to another all the time. Master the basics from one place until you are comfortable about it.

Pitfall 4: Comparing yourself to others

We often compare our learning journeys to others. You need to know if you are doing good or bad, if you need to adjust your approach or not.

This sounds good, right?

You need to keep in touch with reality and not waste time.

This is a major pitfall. You will see solutions to your problems, which are solved more elegant. There will be people that ‘just started’ and are already typing in code like you would never dream of.

This is devastating. Do you not have what it takes?

As hard as it is to accept, that you are not the fastest learner, and you need to work harder than others to reach the same. It is just as hard to realize, that the people you compare yourself with are often the top-of-the-top.

We all have our own journey. Mine is different from yours. I was good at one thing in the beginning, but you are awesome at something I never understood.

Accept that we all have our own journey – there will be times when you feel like the only one not understanding something simple (or at least I did that many times) – but other times when you actually understand something extremely complex.

We often miss these aspects, because we always compare ourselves to the brightest person in our context in any moment. That might be different persons from time to time.

Further, in the days of internet, the environment you compare yourself to is huge.

As you see, this comparison is not fair and will do you no good.

Accept that your journey is yours alone. Comparisons with others do not help you.

How long does it take to become a good Python programmer

I wish there was a simple answer to that. Unfortunately it is not that easy to answer.

First of all, what are your initial expectations and how will they evolve over time. Often people are fine with just some simple skills, but when they learn more they want to master more and it never stops.

It is natural. The problem is, your expectations to feeling successful moves along the way.

Secondly, is the dedication to it. You need to spend time on solving problems.

Experience shows, that either you need to burn for learning programming or you need it to solve you daily challenges.

It sounds like you need to keep motivated. And yes, you do. But the good news is, it is very rewarding and fulfilling to program. You are building something, you are creating something, you are the creator of something amazing. That feeling is awesome.

Does that mean it is just fun all the way from the beginning to end. Not at all, did you read the pitfalls above? Well, if you didn’t, go read them.

What I am saying is, it is a journey that will never end. The journey will sometimes feel bumpy, but the results are rewarding.

The more time you spend, the faster and better results you will get.

But how to keep motivation?

  • Remind yourself daily, that there are pitfall and all the best in the world have been there.
  • Keep it playful – the majority of the time it is joyful to program.
  • Accept it as a learning journey that will never end.

How to get started with Python for Data Science?

On this page there are a lot of resources available to get started with both Python and Data Science.

To help you further there are structured free courses you can follow with everything prepared.

Start Python for FREE

There is a full 8 hours video corse for Python.

  • 17 video lessons teaching you everything you need to know to get started with Python.
  • 34 Jupyter Notebooks with lesson code and projects.
  • A FREE eBook with all the learnings from the lessons.
Get started today with Python for FREE

Start Machine Learning for FREE

Another great free resource is the 10 hours free Machine Learning course.

  • 15 video lessons – which explain Machine Learning concepts, demonstrate models on real data, introduce projects and show a solution (YouTube playlist).
  • 30 JuPyter Notebooks – with the full code and explanation from the lectures and projects (GitHub).
  • 15 projects – with step guides to help you structure your solutions and solution explained in the end of video lessons (GitHub).
Get started with Machine Learning with Python for FREE

Pandas and Folium: Categorize GDP Growth by Country and Visualize on Map in 3 Easy Steps

What will we cover in this tutorial?

  • We will gather data from wikipedia.org List of countries by past and projected GDP using pandas.
  • First step will be get the data and merge the correct tables together.
  • Next step is using Machine Learning with Linear regression model to estimate the growth of each country GDP.
  • Final step is to visualize the growth rates on a leaflet map using folium.

Step 1: Get the data and merge it

The data is available on wikipedia on List of countries by past and projected GDP. We will focus on data from 1990 to 2019.

At first glance on the page you notice that the date is not gathered in one table.

From wikipedia.org

The first task will be to merge the three tables with the data from 1990-1999, 2000-2009, and 2010-2019.

The data can be collected by pandas read_html function. If you are new to this you can read this tutorial.

import pandas as pd

# The URL we will read our data from
url = 'https://en.wikipedia.org/wiki/List_of_countries_by_past_and_projected_GDP_(nominal)'
# read_html returns a list of tables from the URL
tables = pd.read_html(url)

# Merge the tables into one table
merge_index = 'Country (or dependent territory)'
table = tables[9].merge(tables[12], how="left", left_on=[merge_index], right_on=[merge_index])
table = table.merge(tables[15], how="left", left_on=[merge_index], right_on=[merge_index])

print(table)

The call to read_html will return all the tables in a list. By inspecting the results you will notice that we are interested in table 9, 12 and 15 and merge them. The output of the above will be.

     Country (or dependent territory)       1990       1991       1992       1993       1994       1995       1996       1997       1998       1999        2000        2001        2002        2003        2004        2005        2006        2007        2008        2009        2010        2011        2012        2013        2014        2015        2016        2017        2018        2019
0                         Afghanistan        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN         NaN         NaN      4367.0      4514.0      5146.0      6167.0      6925.0      8556.0     10297.0     12066.0     15325.0     17890.0     20296.0     20170.0     20352.0     19687.0     19454.0     20235.0     19585.0     19990.0
1                             Albania     2221.0     1333.0      843.0     1461.0     2361.0     2882.0     3200.0     2259.0     2560.0     3209.0      3483.0      3928.0      4348.0      5611.0      7185.0      8052.0      8905.0     10675.0     12901.0     12093.0     11938.0     12896.0     12323.0     12784.0     13238.0     11393.0     11865.0     13055.0     15202.0     15960.0
2                             Algeria    61892.0    46670.0    49217.0    50963.0    42426.0    42066.0    46941.0    48178.0    48188.0    48845.0     54749.0     54745.0     56761.0     67864.0     85327.0    103198.0    117027.0    134977.0    171001.0    137054.0    161207.0    199394.0    209005.0    209703.0    213518.0    164779.0    159049.0    167555.0    180441.0    183687.0
3                              Angola    11236.0    10891.0     8398.0     6095.0     4438.0     5539.0     6535.0     7675.0     6506.0     6153.0      9130.0      8936.0     12497.0     14189.0     19641.0     28234.0     41789.0     60449.0     84178.0     75492.0     82471.0    104116.0    115342.0    124912.0    126777.0    102962.0     95337.0    122124.0    107316.0     92191.0
4                 Antigua and Barbuda      459.0      482.0      499.0      535.0      589.0      577.0      634.0      681.0      728.0      766.0       825.0       796.0       810.0       850.0       912.0      1013.0      1147.0      1299.0      1358.0      1216.0      1146.0      1140.0      1214.0      1194.0      1273.0      1353.0      1460.0      1516.0      1626.0      1717.0
5                           Argentina   153205.0   205515.0   247987.0   256365.0   279150.0   280080.0   295120.0   317549.0   324242.0   307673.0    308491.0    291738.0    108731.0    138151.0    164922.0    199273.0    232892.0    287920.0    363545.0    334633.0    424728.0    527644.0    579666.0    611471.0    563614.0    631621.0    554107.0    642928.0    518092.0    477743.0
6                             Armenia        NaN        NaN      108.0      835.0      648.0     1287.0     1597.0     1639.0     1892.0     1845.0      1912.0      2118.0      2376.0      2807.0      3577.0      4900.0      6384.0      9206.0     11662.0      8648.0      9260.0     10142.0     10619.0     11121.0     11610.0     10529.0     10572.0     11537.0     12411.0     13105.0

Step 2: Use linear regression to estimate the growth over the last 30 years

In this section we will use Linear regression from the scikit-learn library, which is a simple prediction tool.

If you are new to Machine Learning we recommend you read this tutorial on Linear regression.

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

import numpy as np

# The URL we will read our data from
url = 'https://en.wikipedia.org/wiki/List_of_countries_by_past_and_projected_GDP_(nominal)'
# read_html returns a list of tables from the URL
tables = pd.read_html(url)

# Merge the tables into one table
merge_index = 'Country (or dependent territory)'
table = tables[9].merge(tables[12], how="left", left_on=[merge_index], right_on=[merge_index])
table = table.merge(tables[15], how="left", left_on=[merge_index], right_on=[merge_index])

row = table.iloc[1]
X = table.columns[1:].to_numpy().reshape(-1, 1)
X = X.astype(int)
Y = 1 + row.iloc[1:].pct_change()
Y = Y.cumprod().fillna(1.0).to_numpy()
Y = Y.reshape(-1, 1)

regr = LinearRegression()
regr.fit(X, Y)

Y_pred = regr.predict(X)

plt.scatter(X, Y)
plt.plot(X, Y_pred, color='red')
plt.show()

Which will result in the following plot.

Linear regression model applied on data from wikipedia.org

Which shows that the model approximates a line through the 30 years of data to estimate the growth of the country’s GDP.

Notice that we use the product (cumprod) of pct_change to be able to compare the data. If we used the data directly, we would not be possible to compare it.

We will do that for all countries to get a view of the growth. We are using the coefficient of the line, which indicates the growth rate.

import pandas as pd
from sklearn.linear_model import LinearRegression
import numpy as np

# The URL we will read our data from
url = 'https://en.wikipedia.org/wiki/List_of_countries_by_past_and_projected_GDP_(nominal)'
# read_html returns a list of tables from the URL
tables = pd.read_html(url)

# Merge the tables into one table
merge_index = 'Country (or dependent territory)'
table = tables[9].merge(tables[12], how="left", left_on=[merge_index], right_on=[merge_index])
table = table.merge(tables[15], how="left", left_on=[merge_index], right_on=[merge_index])

coef = []
countries = []

for index, row in table.iterrows():
    #print(row)
    X = table.columns[1:].to_numpy().reshape(-1, 1)
    X = X.astype(int)
    Y = 1 + row.iloc[1:].pct_change()
    Y = Y.cumprod().fillna(1.0).to_numpy()
    Y = Y.reshape(-1, 1)

    regr = LinearRegression()
    regr.fit(X, Y)

    coef.append(regr.coef_[0][0])
    countries.append(row[merge_index])

data = pd.DataFrame(list(zip(countries, coef)), columns=['Country', 'Coef'])

print(data)

Which results in the following output (or the first few lines).

                              Country      Coef
0                         Afghanistan  0.161847
1                             Albania  0.243493
2                             Algeria  0.103907
3                              Angola  0.423919
4                 Antigua and Barbuda  0.087863
5                           Argentina  0.090837
6                             Armenia  4.699598

Step 3: Merge the data to a leaflet map using folium

The last step is to merge the data together with the leaflet map using the folium library. If you are new to folium we recommend you read this tutorial.

import pandas as pd
import folium
import geopandas
from sklearn.linear_model import LinearRegression
import numpy as np

# The URL we will read our data from
url = 'https://en.wikipedia.org/wiki/List_of_countries_by_past_and_projected_GDP_(nominal)'
# read_html returns a list of tables from the URL
tables = pd.read_html(url)

# Merge the tables into one table
merge_index = 'Country (or dependent territory)'
table = tables[9].merge(tables[12], how="left", left_on=[merge_index], right_on=[merge_index])
table = table.merge(tables[15], how="left", left_on=[merge_index], right_on=[merge_index])

coef = []
countries = []

for index, row in table.iterrows():
    X = table.columns[1:].to_numpy().reshape(-1, 1)
    X = X.astype(int)
    Y = 1 + row.iloc[1:].pct_change()
    Y = Y.cumprod().fillna(1.0).to_numpy()
    Y = Y.reshape(-1, 1)

    regr = LinearRegression()
    regr.fit(X, Y)

    coef.append(regr.coef_[0][0])
    countries.append(row[merge_index])

data = pd.DataFrame(list(zip(countries, coef)), columns=['Country', 'Coef'])

# Read the geopandas dataset
world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))
# Replace United States of America to United States to fit the naming in the table
world = world.replace('United States of America', 'United States')

# Merge the two DataFrames together
table = world.merge(data, how="left", left_on=['name'], right_on=['Country'])


# Clean data: remove rows with no data
table = table.dropna(subset=['Coef'])

# We have 10 colors available resulting into 9 cuts.
table['Cat'] = pd.qcut(table['Coef'], 9, labels=[0, 1, 2, 3, 4, 5, 6, 7, 8])

print(table)

# Create a map
my_map = folium.Map()

# Add the data
folium.Choropleth(
    geo_data=table,
    name='choropleth',
    data=table,
    columns=['Country', 'Cat'],
    key_on='feature.properties.name',
    fill_color='YlGn',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Growth of GDP since 1990',
    threshold_scale=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
).add_to(my_map)
my_map.save('gdp_growth.html')

There is a twist in the way it is done. Instead of using a linear model to represent the growth rate on the map, we chose to add them in categories. The reason is that otherwise most countries group in small segment.

Here we have used the qcut to add them in each equal sized group.

This should result in an interactive html page looking something like this.

End result.

Simple Machine Learning Trading Bot in Python – Evaluating how it Performs

What will we cover in this tutorial

  • To create a machine learning trading bot in Python
  • How to build a simple Reinforcement Learning Trading bot.
  • The idea behind the Reinforcement Learning trading bot
  • Evaluate how the trading bot performs

Machine Learning and Trading?

First thing first. Machine Learning trading bot? Machine Learning can be used for various things in regards to trading.

Well, good to set our expectations. This tutorial is also experimental and does not claim to make a bullet-proof Machine Learning Trading bot that will make you rich. I strongly advice you not to use it for automated trading.

This tutorial is only intended to test and learn about how a Reinforcement Learning strategy can be used to build a Machine Learning Trading Bot.

Step 1: The idea behind the Reinforcement Learning strategy

I wanted to test how a Reinforcement Learning algorithm would do in the market.

First let us understand what Reinforcement Learning is. Reinforcement learning teaches the machine to think for itself based on past action rewards.

Reinforcement Learning (in Machine Learning) teaches the machine to think based on past action rewards.
Reinforcement Learning (in Machine Learning) teaches the machine to think based on past action rewards.

It is like training a dog. You and the dog do not talk the same language, but the dogs learns how to act based on rewards (and punishment, which I do not advise or advocate). 

Hence, if a dog is rewarded for a certain action in a given situation, then next time it is exposed to a similar situation it will act the same. 

Translate that to Reinforcement Learning. 

  • The agent is the dog that is exposed to the environment
  • Then the agent encounters a state
  • The agent performs an action to transition from that state to a new state
  • Then after the transition the agent receives a reward or penalty(punishment).
  • This forms a policy to create a strategy to choose actions in a given state

That turns out to fit well with trading, or potentially? That is what I want to investigate.

Step 2: The idea behind how to use Reinforcement Learning in Trading

The environment in trading could be translated to rewards and penalties (punishment). You win or loose on the stock market, right?

But we also want to simplify the environment for the bot, not to make it too complex. Hence, in this experiment, the bot is only knows 1 stock and has to decide to buy, keep or sell.

Said differently.

  • The trading bot (agent) is exposed to the stock history (environment).
  • Then the trading bot (agent) encounters the new stock price (state).
  • The trading bot (agent) then performs a choice to keep, sell or buy (action), which brings it to a new state.
  • Then the trading bot (agent) will receives a reward based on the value difference from day to day.

The reward will often first be encountered after some time, hence, the feedback from steps after should be set high. Or at least, that is my expectation.

Step 3: Understand Q-learning as the Reinforcement Learning model

The Q-learning model is easy to understand and has potential to be very powerful. Of course, it is not better than the design of it. But before we can design it, we need to understand the mechanism behind it.

Q-Learning algorithm (Reinforcement / Machine Learning) - exploit or explore - Update Q-table
Q-Learning algorithm (Reinforcement / Machine Learning) – exploit or explore – Update Q-table

The Q-Learning algorithm has a Q-table (a Matrix of dimension state x actions – don’t worry if you do not understand what a Matrix is, you will not need the mathematical aspects of it – it is just an indexed “container” with numbers).

  • The agent (or Q-Learning algorithm) will be in a state.
  • Then in each iteration the agent needs take an action.
  • The agent will continuously update the reward in the Q-table.
  • The learning can come from either exploiting or exploring.

This translates into the following pseudo algorithm for the Q-Learning. 

The agent is in a given state and needs to choose an action.

  • Initialise the Q-table to all zeros
  • Iterate:
    • Agent is in state state.
    • With probability epsilon choose to explore, else exploit.
      • If explore, then choose a random action.
      • If exploit, then choose the best action based on the current Q-table.
    • Update the Q-table from the new reward to the previous state.
      • Q[stateaction] = (1 – alpha) * Q[stateaction] + alpha * (rewardgamma * max(Q[new_state]) — Q[state, action])

As you can se, we have introduced the following variables.

  • epsilon: the probability to take a random action, which is done to explore new territory.
  • alpha: is the learning rate that the algorithm should make in each iteration and should be in the interval from 0 to 1.
  • gamma: is the discount factor used to balance the immediate and future reward. This value is usually between 0.8 and 0.99
  • reward: is the feedback on the action and can be any number. Negative is penalty (or punishment) and positive is a reward.

Step 4: The choices we need to take

Based on that, we need to see how the algorithm should map the stock information to a state. We want the model to be fairly simple and not have too many states, as it will take long time to populate it with data.

There are many parameters to choose from here. As we do not want to tell the algorithm what to do, we still need to feed it what what we find as relevant data.

In this case it was the following.

  • Volatility of the share.
  • The percentage change of the daily short mean (average over last 20 days).
  • Then the percentage of the daily long mean (average over the last 100 days).
  • The daily long mean, which is the average over the last 100 days.
  • The volume of the sales that day.

These values need to be calculated for the share we use. That can be done by the following code.

import pandas_datareader as pdr
import numpy as np


VALUE = 'Adj Close'
ID = 'id'
NAME = 'name'
DATA = 'data'


def get_data(name, years_ago):
    start = dt.datetime.now() - relativedelta(years=years_ago)
    end = dt.datetime.now()
    df = pdr.get_data_yahoo(name, start, end)
    return df


def process():
    stock = {ID: stock, NAME: 'AAPL'}

    stock[DATA] = get_data(stock[ID], 20)

# Updatea it will all values
    stock[DATA]['Short Mean'] = stock[DATA][VALUE].rolling(window=short_window).mean()
    stock[DATA]['Long Mean'] = stock[DATA][VALUE].rolling(window=long_window).mean()

    stock[DATA]['Daily Change'] = stock[DATA][VALUE].pct_change()
    stock[DATA]['Daily Short Change'] = stock[DATA]['Short Mean'].pct_change()
    stock[DATA]['Daily Long Change'] = stock[DATA]['Long Mean'].pct_change()
    stock[DATA]['Volatility'] = stock[DATA]['Daily Change'].rolling(75).std()*np.sqrt(75)

As you probably notice, this will create a challenge. You need to put them into bins, that is a fixed number of “boxes” to fit in.

def process():
    #...
    # Let's put data in bins
    stock[DATA]['Vla bin'] = pd.cut(stock[DATA]['Volatility'], bins=STATES_DIM, labels=False)
    stock[DATA]['Srt ch bin'] = pd.cut(stock[DATA]['Daily Short Change'], bins=STATES_DIM, labels=False)
    stock[DATA]['Lng ch bin'] = pd.cut(stock[DATA]['Daily Long Change'], bins=STATES_DIM, labels=False)
    # stock[DATA]['Srt mn bin'] = pd.cut(stock[DATA]['Short Mean'], bins=DIM, labels=False)
    stock[DATA]['Lng mn bin'] = pd.cut(stock[DATA]['Long Mean'], bins=STATES_DIM, labels=False)
    stock[DATA]['Vol bin'] = pd.cut(stock[DATA]['Volume'], bins=STATES_DIM, labels=False)

This will quantify the 5 dimensions into STATES_DIM, which you can define to what you think is appropriate.

Step 5: How to model it

This can be done by creating an environment, that will play the role as your trading account.

class Account:
    def __init__(self, cash=1000000, brokerage=0.001):
        self.cash = cash
        self.brokerage = brokerage
        self.stocks = 0
        self.stock_id = None
        self.has_stocks = False

    def get_value(self, row):
        if self.has_stocks:
            return self.cash + row[VALUE] * self.stocks
        else:
            return self.cash

    def buy_stock(self, stock_id, row):
        if self.has_stocks:
            return
        self.stock_id = stock_id
        self.stocks = int(self.cash // (row[VALUE]*(1.0 + self.brokerage)))
        self.cash -= self.stocks*row[VALUE]*1.001
        self.has_stocks = True
        self.print_status(row, "Buy")

    def sell_stock(self, row):
        if not self.has_stocks:
            return
        self.print_status(row, "Sell")
        self.cash += self.stocks * (row[VALUE]*(1.0 - self.brokerage))
        self.stock_id = None
        self.stocks = 0
        self.has_stocks = False

    def print_status(self, row, title="Status"):
        if self.has_stocks:
            print(title, self.stock_id, "TOTAL:", self.cash + self.stocks*float(row[VALUE]))
            print(" - ", row.name, "price", row[VALUE])
            print(" - ", "Short", row['Daily Short Change'])
            print(" - ", "Long", row['Daily Long Change'])
        else:
            print(title, "TOTAL", self.cash)

Then it should be iterated over a time where the trading bot can decide what to do.

def process():
    # Now let's prepare our model
    q_learning = QModel()
    account = Account()

    state = None
    reward = 0.0
    action = 0
    last_value = 0.0
    for index, row in stock[DATA].iterrows():
        if state is not None:
            # The reward is the immediate return
            reward = account.get_value(row) - last_value
            # You update the day after the action, when you know the results of your actions
            q_learning.update_reward(row, account.has_stocks, action, state, reward)
        action, state = q_learning.get_action(row, account.has_stocks)

        if action == 0:
            pass
        elif action == 1:
            if account.has_stocks:
                account.sell_stock(row)
            else:
                account.buy_stock(stock[ID], row)
        last_value = account.get_value(row)
    account.print_status(row)
    q_learning.save_pickle()
    return last_value

This code will do what ever the trading bot tells you to do.

Step 6: The Q-learning model

Now to the core of the thing. The actual trading bot, that knows nothing about trading. But can we train it to earn money on trading and how much? We will see that later.

class QModel:
    def __init__(self, alpha=0.5, gamma=0.7, epsilon=0.1):
        self.alpha = alpha
        self.gamma = gamma
        self.epsilon = epsilon

        self.states_per_dim = STATES_DIM
        self.dim = 5
        self.states = (self.states_per_dim ** self.dim) * 2
        self.actions = 2
        self.pickle = "q_model7.pickle"
        self.q_table = np.zeros((self.states, self.actions))
        if os.path.isfile(self.pickle):
            print("Loading pickle")
            with open(self.pickle, "rb") as f:
                self.q_table = pickle.load(f)

    def save_pickle(self):
        with open(self.pickle, "wb") as f:
            pickle.dump(self.q_table, f)

    def get_state(self, row, has_stock):
        dim = []
        dim.append(int(row['Vla bin']))
        dim.append(int(row['Srt ch bin']))
        dim.append(int(row['Lng ch bin']))
        dim.append(int(row['Lng mn bin']))
        dim.append(int(row['Vol bin']))
        for i in range(len(dim)):
            if dim[i] is None:
                dim[i] = 0
        dimension = 0
        if has_stock:
            dimension = 1 * (self.states_per_dim ** self.dim)
        dimension += dim[4] * (self.states_per_dim ** 4)
        dimension += dim[3] * (self.states_per_dim ** 3)
        dimension += dim[2] * (self.states_per_dim ** 2)
        dimension += dim[1] * (self.states_per_dim ** 1)
        dimension += dim[0]
        return dimension

    def get_action(self, row, has_stock):
        state = self.get_state(row, has_stock)

        if random.uniform(0, 1) < self.epsilon:
            action = random.randrange(0, self.actions)
        else:
            action = np.argmax(self.q_table[state])
        return action, state

    def update_reward(self, row, has_stock, last_action, last_state, reward):
        next_state = self.get_state(row, has_stock)

        old_value = self.q_table[last_state, last_action]
        next_max = np.max(self.q_table[next_state])

        new_value = (1 - self.alpha) * old_value + self.alpha * (reward + self.gamma * next_max)
        self.q_table[last_state, last_action] = new_value

Now we have the full code to try it out (the full code is at the end of the tutorial).

Step 7: Training the model

Now we need to train the model.

For that purpose, I have made a list of 134 stocks that I used and placed them in a CSV file.

Then the training is simply to read 1 of the 134 stocks in with 10 years of historical data. Find an 1 year window and run the algorithm on it.

The repeat.

f __name__ == "__main__":
    # source: http://www.nasdaqomxnordic.com/shares/listed-companies/copenhagen
    csv_stock_file = 'DK-Stocks.csv'

    while True:
        iterations = 1000
        for i in range(iterations):
            # Go at most 9 years back, as we only have 10 years available and need 1 year of data
            days_back = random.randrange(0, 9*365)
            process(csv_stock_file)

Then let it run and run and run and run again.

Step 8: Testing the algorithm

Of course, the testing should be done on unknown data. That is a stock it does not know. But you cannot also re-run on the same stock, as it will learn from it (unless you do not save the state from it).

Hence, I chose a good performing stock to see how it would do, to see if it could beat the buy-first-day-and-sell-last-day strategy.

The results of the trading bot on Apple stocks.

The return of 1,000,000$ investment with the Trading Bot was approximately 1,344,500$. This is a return on 34% for one year. Comparing that with the stock price itself.

Stock price was 201.55$ on July 1st 2019 and 362.09$ on June 30th, 2020. This would give the following return (0,10% in brokerage should be included in calculations as the Trading bot pays that on each sell and buy).

  • 1,792,847$

That does not look that good. That means that a simple strategy to buy on day one and sell on the last day would return more than the bot.

Of course, you can’t conclude it is not possible to do better on other stocks, but for this case it was not impressive.

Variations and next step

There are many variable to adjust, I especially think I set the gamma too low. There are other parameters to use to make the state. Can remove some, that might be making noice, and add ones that are more relevant. Also, the number of bins can be adjusted. That the bins are made independent of each other, might also be a problem.

Also read the tutorial on reinforcement learning.