Reinforcement Learning Explained with Real Problem and Code from Scratch

What will we cover?

  • Understand how Reinforcement Learning works
  • Learn about Agent and Environment
  • How it iterates and gets rewards based on action
  • How to continuously learn new things
  • Create your own Reinforcement Learning model from scratch

Step 1: Reinforcement Learning simply explained

Reinforcement Learning

Reinforcement Learning is like training a dog. You and the dog speak different languages, which makes it difficult to explain to the dog what you want.

A common way to train a dog works just like Reinforcement Learning. When the dog does something good, it gets a reward. This teaches the dog that you want it to repeat that behaviour.

Said differently, in terms of the illustration above: the Agent is the dog. The Environment presents the dog with a State. Based on that State, the Agent (the dog) takes an Action. Depending on whether you (the owner) like the Action, you Reward the Agent.

The goal of the Agent is to collect as much Reward as possible. This lets you, the owner, shape the desired behaviour by adjusting the Reward according to the Actions.
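The loop described above can be sketched in a few lines of Python. This is only a minimal illustration: the names `run_episode` and `owner`, the state labels, and the reward values are all made up for this example.

```python
def run_episode(policy, step, start_state, max_steps=10):
    """Run the Agent/Environment loop once and return the total Reward."""
    state, total_reward = start_state, 0
    for _ in range(max_steps):
        action = policy(state)                     # Agent chooses an Action
        state, reward, done = step(state, action)  # Environment returns new State and Reward
        total_reward += reward
        if done:
            break
    return total_reward


def owner(state, action):
    """Hypothetical Environment: the owner rewards 'sit' and ends the session."""
    if action == "sit":
        return "sitting", 1, True
    return "standing", 0, False


print(run_episode(lambda state: "sit", owner, "standing"))   # prints 1
print(run_episode(lambda state: "bark", owner, "standing"))  # prints 0
```

A dog that sits collects the Reward, while a dog that only barks collects nothing, which is the signal the Agent learns from.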

Step 2: Markov Decision Process

A Markov Decision Process is a model for decision-making that represents States (from the Environment), Actions (from the Agent), and Rewards.

Written a bit more mathematically:

  • S is the set of States
  • Actions(s) is the set of Actions when in state s
  • The transition model is P(s' | s, a): the probability of ending up in state s' after taking action a in state s
  • The Reward function is R(s, a, s')
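To make the four ingredients concrete, here is a small sketch of a Markov Decision Process written as plain Python. The 3-state chain, the action names, and the reward values are assumptions chosen for illustration; they are not taken from the environment used later.

```python
# Hypothetical 3-state chain: state 0 is a losing box, state 2 is the winning box.
STATES = [0, 1, 2]


def actions(s):
    """Actions(s): the moves available in state s (none in terminal states)."""
    return [] if s in (0, 2) else ["left", "right"]


def transition(s_next, s, a):
    """P(s' | s, a): deterministic here, so the probability is 1 or 0."""
    target = s - 1 if a == "left" else s + 1
    return 1.0 if s_next == target else 0.0


def reward(s, a, s_next):
    """R(s, a, s'): -1 for reaching the losing box, +1 for the winning box."""
    return {0: -1, 2: 1}.get(s_next, 0)


print(actions(1))                 # ['left', 'right']
print(transition(2, 1, "right"))  # 1.0
print(reward(1, "right", 2))      # 1
```

In a stochastic environment `transition` would return probabilities between 0 and 1 that sum to 1 over all next states.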

Step 3: Q-Learning

Q-learning is a model-free reinforcement learning algorithm to learn the value of an action in a particular state. It does not require a model of the environment (hence “model-free”), and it can handle problems with stochastic transitions and rewards without requiring adaptations. (wiki)

This can be modeled by a learning function Q(s, a), which estimates the value of performing action a when in state s.

It works as follows:

  • Start with Q(s, a) = 0 for all s, a
  • Update Q when we take an action
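The update step is commonly written as Q(s, a) ← (1 − α)·Q(s, a) + α·(r + γ·max Q(s', a')), where r is the reward received, s' is the next state, α is the learning rate and γ is the discount factor. The helper below is a sketch of that rule; the function name `q_update` and its parameter names are chosen for this example.

```python
def q_update(q_sa, reward, max_q_next, alpha, gamma):
    """Blend the old estimate with the new one: reward plus discounted best future value."""
    return (1 - alpha) * q_sa + alpha * (reward + gamma * max_q_next)


# Starting from Q = 0, a reward of 1 with nothing known about the next state:
print(q_update(q_sa=0.0, reward=1.0, max_q_next=0.0, alpha=0.5, gamma=0.5))  # 0.5

# No immediate reward, but the next state is already known to be worth 1:
print(q_update(q_sa=0.0, reward=0.0, max_q_next=1.0, alpha=0.5, gamma=0.5))  # 0.25
```

The second call shows how value propagates backwards: a state that leads to a good state becomes valuable itself, scaled down by γ.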


The ϵ-Greedy Decision Making

The idea behind it is to either explore or exploit

  • With probability ϵ take a random move
  • Otherwise, take the action a with maximum Q(s, a)
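The two rules above fit in a single small function. This is a sketch using only the standard library; the name `epsilon_greedy` and its signature are assumptions for this example.

```python
import random


def epsilon_greedy(q_values, actions, epsilon, rng=random):
    """Pick an action: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.choice(actions)                   # explore: random move
    return max(actions, key=lambda a: q_values[a])   # exploit: best known action


# q_values[a] is the current estimate Q(s, a) for the state we are in.
print(epsilon_greedy([0.1, 0.9], actions=[0, 1], epsilon=0.0))  # always 1 (pure exploitation)
```

One design choice here is that the greedy branch only considers the allowed actions. With epsilon set to 0 the Agent never explores, and with epsilon set to 1 it never uses what it has learned.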

Let’s demonstrate it with code.

Step 3: Code Example

Assume we have the following Environment

  • You start at a random point.
  • You can either move left or right.
  • You lose if you hit a red box
  • You win if you hit the green box

Quite simple, but how can you program an Agent using Reinforcement Learning? And how can you do it from scratch?

A good way is to use an object representing the field (environment).

Field representing the Environment

To implement it all there are some background resources if needed.

Programming Notes:

What if there are more states?

import numpy as np
import random

class Field:
    def __init__(self):
        self.states = [-1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
        self.state = random.randrange(0, len(self.states))
    def done(self):
        # The episode ends as soon as the Agent stands on a non-zero box (-1 or 1)
        return self.states[self.state] != 0
    # action: 0 => left
    # action: 1 => right
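    def get_possible_actions(self):
        actions = [0, 1]
        if self.state == 0:
            actions.remove(0)  # cannot move left from the leftmost box
        if self.state == len(self.states) - 1:
            actions.remove(1)  # cannot move right from the rightmost box
        return actions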

    def update_next_state(self, action):
        if action == 0:
            if self.state == 0:
                return self.state, -10
            self.state -= 1
        if action == 1:
            if self.state == len(self.states) - 1:
                return self.state, -10
            self.state += 1
        reward = self.states[self.state]
        return self.state, reward

field = Field()
q_table = np.zeros((len(field.states), 2))

alpha = .5
epsilon = .5
gamma = .5

for _ in range(10000):
    field = Field()
    while not field.done():
        actions = field.get_possible_actions()
        if random.uniform(0, 1) < epsilon:
            action = random.choice(actions)  # explore: random allowed move
        else:
            action = np.argmax(q_table[field.state])  # exploit: best known move
        cur_state = field.state
        next_state, reward = field.update_next_state(action)
        q_table[cur_state, action] = (1 - alpha)*q_table[cur_state, action] + alpha*(reward + gamma*np.max(q_table[next_state]))
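Once training has finished, the learned behaviour can be read off the table by taking the best action in every state. The helper below is a sketch: `greedy_policy` is a name made up for this example, and the Q-values passed to it are invented to show the call, not results from the training loop.

```python
def greedy_policy(q_table, labels=("left", "right")):
    """Return the highest-valued action label for every state (row)."""
    policy = []
    for row in q_table:
        values = list(row)
        # On a tie, max() keeps the first index, so untrained rows default to "left".
        best = max(range(len(values)), key=values.__getitem__)
        policy.append(labels[best])
    return policy


# Made-up Q-values for three states, only to show the call.
example_q = [[0.0, 0.5], [0.25, 0.1], [0.0, 0.0]]
print(greedy_policy(example_q))  # ['right', 'left', 'left']
```

The function accepts any sequence of rows, so it should work with a NumPy array such as `q_table` as well as with nested lists.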

Step 4: A more complex Example

Check out the video to see a more complex example.

Want to learn more?

This is part of a FREE 10h Machine Learning course with Python.

  • 15 video lessons – which explain Machine Learning concepts, demonstrate models on real data, introduce projects and show a solution (YouTube playlist).
  • 30 JuPyter Notebooks – with the full code and explanation from the lectures and projects (GitHub).
  • 15 projects – with step guides to help you structure your solutions and solution explained in the end of video lessons (GitHub).

How to use Multiple Linear Regression to Predict House Prices

What will we cover?

  • Learn about Multiple Linear Regression
  • Understand how it differs from a discrete classifier
  • Understand that it is a Supervised learning task
  • Get insight into how similar a linear classifier is to a discrete classifier
  • Hands-on experience with multiple linear regression

Step 1: What is Multiple Linear Regression?

Multiple Linear Regression is a Supervised learning task of learning a mapping from an input point to a continuous value.

Wow. What does that mean?

This might not help everyone, but it is the case of Linear Regression where there are multiple explanatory variables.

Let’s start simple. Simple Linear Regression is the case most people see first. It has one input variable (explanatory variable) and one output value (response value).

An example could be: if the temperature is X degrees, we expect to sell Y ice creams. That is, the model tries to predict how many ice creams we sell given a temperature.

Now, we know that factors other than the temperature can have a high impact on ice cream sales. Say, whether it is rainy or sunny, or what time of year it is, for instance whether it is tourist season or not.

Hence, a simple model like that might not give a very accurate estimate.

We would therefore like a model with more input variables (explanatory variables). When there is more than one, it is called Multiple Linear Regression.
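Written as an equation, a Multiple Linear Regression model with n explanatory variables has the form below. The symbols are chosen for this illustration: y is the response value, x_1 to x_n are the explanatory variables, β_0 is the intercept and β_1 to β_n are the coefficients the model learns.

```latex
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n
```

In the ice cream example, x_1 could be the temperature and x_2 an indicator for rain. With n = 1 the equation reduces to Simple Linear Regression.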

Step 2: Get Example Data

Let’s take a look at some house price data.

import pandas as pd

data = pd.read_csv('')

Note: you can also download the file from GitHub and read it locally. This will make it faster to run every time.

The output should show the following data.

The goal is, given a row of data, to predict the House Unit Price. That is, given all but the last column in a row, can we predict the House Unit Price (the last column)?

Step 3: Plot the data

Just for fun, let’s make a scatter plot of all the houses by Latitude and Longitude.

import matplotlib.pyplot as plt

fig, ax = plt.subplots()

ax.scatter(x=data['Longitude'], y=data['Latitude'])

This gives the following plot.

This shows you where the houses are located, which can be interesting because house prices can be dependent on location.

Intuitively, longitude and latitude should not be linearly correlated with the house price, at least not in the bigger picture.

Step 4: Correlation of the features

Before we make the Multiple Linear Regression, let’s see how the features (the columns) correlate.
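One way to compute pairwise correlations is the pandas method `DataFrame.corr()`. The sketch below runs it on a tiny made-up frame; the two column names are borrowed from the dataset, but the numbers are invented for illustration only.

```python
import pandas as pd

# Invented values: price drops as the distance to the station grows.
toy = pd.DataFrame({
    "Distance to MRT station": [100.0, 500.0, 1000.0, 2000.0],
    "House unit price": [50.0, 40.0, 30.0, 10.0],
})

corr = toy.corr()  # Pearson correlation between every pair of columns
print(corr)
```

Applied to the house price data, the same call would be `data.corr()`.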


Which gives.

This is interesting. Look at the bottom row for the correlations with House Unit Price. It shows that Distance to MRT station is negatively correlated with the price: the farther from an MRT station, the lower the price. This might not be surprising.

More surprising is that Latitude and Longitude are actually comparatively highly correlated with the House Unit Price.

This might be the case for this particular dataset.

Step 5: Check the Quality of the dataset

For the Linear Regression model to perform well, you need to check that the data quality is good. If the input data is of poor quality (missing data, outliers, wrong values, duplicates, etc.) then the model will not be very reliable.

Here we will only check for missing values.
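A common way to count missing values per column in pandas is `isnull().sum()`. The sketch below uses a small made-up frame with one deliberately missing entry; the values are invented for illustration.

```python
import pandas as pd

# Invented values with one missing House age.
toy = pd.DataFrame({
    "House age": [10.0, None, 5.0],
    "Latitude": [24.98, 24.97, 24.96],
})

missing = toy.isnull().sum()  # number of missing entries in each column
print(missing)
```

Applied to the house price data, the same call would be `data.isnull().sum()`.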


Which gives.

Transaction                     0
House age                       0
Distance to MRT station         0
Number of convenience stores    0
Latitude                        0
Longitude                       0
House unit price                0
dtype: int64

This tells us that there are no missing values.

If you want to learn more about Data Quality, then check out the free course on Data Science. In that course you will learn more about Data Quality and how it impacts the accuracy of your model.

Step 6: Create a Multiple Linear Regression Model

First we need to divide the data into input variables X (explanatory variables) and output values y (response values).

Then we split it into a training and a testing dataset. We create the model, fit it, use it to predict the test dataset, and get a score.

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

X = data.iloc[:,:-1]
y = data.iloc[:,-1]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, test_size=.15)

lin = LinearRegression()
lin.fit(X_train, y_train)

y_pred = lin.predict(X_test)

print(r2_score(y_test, y_pred))

For this run it gave 0.68.

Is that good or bad? Well, good question. A perfect match scores 1, but that should not be expected. The score has no lower bound (it can be arbitrarily negative), so we are far from the worst case.

To get an idea of how good it is, we need to compare it with variations of the model.
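To see where these bounds come from, it can help to compute the score by hand. The sketch below uses the usual definition of R² as 1 minus the ratio of the residual sum of squares to the total sum of squares. The function name `r2` and the three-point example are made up for illustration.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0]
print(r2(y_true, [1.0, 2.0, 3.0]))  # 1.0: perfect predictions
print(r2(y_true, [2.0, 2.0, 2.0]))  # 0.0: always predicting the mean
print(r2(y_true, [3.0, 2.0, 1.0]))  # -3.0: worse than predicting the mean
```

A score of 0 corresponds to a model that always predicts the mean of the test values, which makes it a natural baseline to compare against.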

In the free Data Science course we explore how to select features and evaluate models. It is a great idea to look into that.
