Machine Learning

Reinforcement Learning Explained with Real Problem and Code from Scratch

Why is it great to master Reinforcement Learning?

Reinforcement Learning offers several advantages and opens up exciting opportunities in the field of machine learning:

  1. Versatile learning approach: Reinforcement Learning provides a versatile approach to learning, where agents can learn from their interactions with an environment. This flexibility allows them to adapt and tackle a wide range of complex tasks.
  2. Decision-making in dynamic environments: Reinforcement Learning equips machines with the ability to make decisions in dynamic and uncertain environments. By mastering this technique, you gain the skills to build intelligent systems that can navigate and excel in complex real-world scenarios.
  3. Continuous learning and adaptation: Reinforcement Learning emphasizes continuous learning and adaptation. Agents can update their strategies based on feedback and experiences, making them capable of handling evolving environments and improving their performance over time.
  4. Applications in various domains: Reinforcement Learning has broad applications across different domains. It is used in robotics, game playing, recommendation systems, autonomous vehicles, and more. Mastering this technique opens doors to diverse career opportunities and exciting projects.

What will be covered in this lesson?

In this lesson, you will explore the foundations and practical aspects of Reinforcement Learning. The following topics will be covered:

  1. Understanding how Reinforcement Learning works: You will gain a comprehensive understanding of the fundamental concepts and principles of Reinforcement Learning, including the agent-environment interaction and the reward mechanism.
  2. Learning about agents and environments: You will dive into the core components of Reinforcement Learning, understanding the role of agents and environments in the learning process. This knowledge forms the basis for building effective RL systems.
  3. Iteration and reward mechanism: You will learn how the iterative process of Reinforcement Learning unfolds, as agents take actions, receive rewards, and update their strategies. This iterative feedback loop is crucial for agents to learn and improve their decision-making abilities.
  4. Continuous learning and exploration: The lesson will cover techniques and algorithms that enable agents to continuously learn, explore new strategies, and adapt to changing environments. You will discover methods for balancing exploration and exploitation to maximize long-term rewards.
  5. Building your own Reinforcement Learning model: Through hands-on exercises and practical examples, you will have the opportunity to build your own Reinforcement Learning model from scratch. This hands-on experience will solidify your understanding and provide you with the skills to apply RL in real-world scenarios.

By the end of this lesson, you will have a strong foundation in Reinforcement Learning and be equipped to apply these techniques to solve complex problems, paving the way for exciting opportunities in the field of machine learning.

Step 1: Reinforcement Learning simply explained

Reinforcement Learning

Reinforcement Learning is like training a dog. You and the dog speak different languages, which makes it difficult to explain to the dog what you want.

A common way to train a dog works just like Reinforcement Learning: when the dog does something good, it gets a reward. This teaches the dog that the behaviour is what you want.

Said differently, relating it to the illustration above: the Agent is the dog. The dog is exposed to an Environment, called a state. Based on this, the Agent (the dog) takes an Action. Depending on whether you (the owner) like the Action, you Reward the Agent.

The goal of the Agent is to collect the most Reward. This makes it possible for you, the owner, to get the desired behaviour by adjusting the Reward according to the Actions.
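This interaction loop can be sketched in a few lines of Python. This is a minimal sketch; the `CoinFlipEnv` toy environment and the random agent below are illustrative placeholders, not part of the lesson's later code:

```python
import random

class CoinFlipEnv:
    """Toy environment: reward +1 if the action matches a hidden coin flip."""
    def step(self, action):
        coin = random.randint(0, 1)
        return 1 if action == coin else 0

env = CoinFlipEnv()
total_reward = 0
for _ in range(100):
    action = random.randint(0, 1)  # the agent takes an action
    reward = env.step(action)      # the environment returns a reward
    total_reward += reward         # the agent's goal: maximize this
```

A real agent would use the observed rewards to pick better actions over time, which is exactly what the rest of the lesson builds up to.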

Step 2: Markov Decision Process

The model for decision-making represents States (from the Environment), Actions (from the Agent), and the Rewards.

Written a bit more mathematically:

  • S is the set of States
  • Actions(s) is the set of Actions when in state s
  • The transition model is P(s′ | s, a), the probability of reaching state s′ from state s by taking action a
  • The Reward function is R(s, a, s′)
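One way to make these definitions concrete is to write a tiny MDP as plain Python data. This is a sketch; the two-state example below is made up for illustration:

```python
# A toy two-state MDP written as plain data structures.
states = ["A", "B"]

def actions(s):
    # Actions(s): the set of actions available in state s.
    return ["stay", "move"]

# Transition model P(s' | s, a) as nested dicts:
# transition[s][a] maps each next state s' to its probability.
transition = {
    "A": {"stay": {"A": 1.0}, "move": {"B": 1.0}},
    "B": {"stay": {"B": 1.0}, "move": {"A": 1.0}},
}

# Reward function R(s, a, s'): reward 1 for ending up in state B.
def reward(s, a, s_next):
    return 1 if s_next == "B" else 0
```

In this deterministic toy example every transition has probability 1.0, but the same structure supports stochastic transitions by spreading probability over several next states.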

Step 3: Q-Learning

Q-learning is a model-free reinforcement learning algorithm to learn the value of an action in a particular state. It does not require a model of the environment (hence “model-free”), and it can handle problems with stochastic transitions and rewards without requiring adaptations. (wiki)

This can be modeled by a learning function Q(s, a), which estimates the value of performing action a when in state s.

It works as follows:

  • Start with Q(s, a) = 0 for all s, a
  • Each time we take an action, update Q(s, a) using the observed reward and the estimated value of the next state
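The update itself is the standard Q-learning rule, where α is the learning rate, γ the discount factor, r the reward received, and s′ the next state:

```latex
Q(s, a) \leftarrow (1 - \alpha)\, Q(s, a) + \alpha \left( r + \gamma \max_{a'} Q(s', a') \right)
```

This is exactly the update applied in the code example later in this lesson.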


The ϵ-Greedy Decision Making

The idea behind it is to either explore or exploit:

  • With probability ϵ, take a random move
  • Otherwise, take the action a with maximum Q(s, a)
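As a minimal sketch, ϵ-greedy action selection can be written as a standalone function. The `epsilon_greedy` helper below is illustrative, not part of the lesson's later code:

```python
import random
import numpy as np

def epsilon_greedy(q_table, state, actions, epsilon):
    """With probability epsilon explore (random action),
    otherwise exploit (action with the highest Q-value)."""
    if random.uniform(0, 1) < epsilon:
        return random.choice(actions)
    # Like the lesson's code, the exploit branch takes argmax over all actions.
    return int(np.argmax(q_table[state]))

q = np.array([[0.0, 1.0]])                        # one state, two actions
best = epsilon_greedy(q, 0, [0, 1], epsilon=0.0)  # epsilon=0: always exploit
```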

Let’s demonstrate it with code.

Step 4: Code Example

Assume we have the following Environment

  • You start at a random point.
  • You can either move left or right.
  • You lose if you hit a red box.
  • You win if you hit the green box.

Quite simple, but how can you program an Agent using Reinforcement Learning? And how can you do it from scratch?

A great way is to use an object representing the field (the environment).

Field representing the Environment


import numpy as np
import random

class Field:
    def __init__(self):
        # -1 marks the red (losing) box, 1 the green (winning) box.
        self.states = [-1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
        self.state = random.randrange(0, len(self.states))

    def done(self):
        # The episode ends when the agent stands on a non-zero square.
        if self.states[self.state] != 0:
            return True
        return False

    # action: 0 => left
    # action: 1 => right
    def get_possible_actions(self):
        actions = [0, 1]
        if self.state == 0:
            actions.remove(0)
        if self.state == len(self.states) - 1:
            actions.remove(1)
        return actions

    def update_next_state(self, action):
        if action == 0:
            if self.state == 0:
                return self.state, -10
            self.state -= 1
        if action == 1:
            if self.state == len(self.states) - 1:
                return self.state, -10
            self.state += 1
        reward = self.states[self.state]
        return self.state, reward

field = Field()
q_table = np.zeros((len(field.states), 2))

alpha = .5
epsilon = .5
gamma = .5

for _ in range(10000):
    field = Field()
    while not field.done():
        actions = field.get_possible_actions()
        if random.uniform(0, 1) < epsilon:
            action = random.choice(actions)
        else:
            action = np.argmax(q_table[field.state])
        cur_state = field.state
        next_state, reward = field.update_next_state(action)
        q_table[cur_state, action] = (1 - alpha)*q_table[cur_state, action] + alpha*(reward + gamma*np.max(q_table[next_state]))
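After training, the learned policy is read off the table by taking the greedy action in each state. A minimal, self-contained sketch of that step (the Q-values below are made up for illustration, not the actual learned values):

```python
import numpy as np

# Example learned Q-table for 4 states x 2 actions (illustrative numbers).
q_table = np.array([
    [0.0, 0.0],   # terminal square, values never updated
    [0.1, 0.5],   # moving right is better here
    [0.2, 0.9],   # moving right is better here
    [0.0, 0.0],   # terminal square
])

# The greedy policy: in each state pick the action with the highest Q-value.
policy = np.argmax(q_table, axis=1)   # 0 => left, 1 => right
```

In the actual example above you would apply the same `np.argmax(q_table[field.state])` lookup inside a loop until `field.done()` returns True.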

Step 5: A more complex example

Check out the video to see a more complex example.

Watch tutorial

Want to learn more?

In the next lesson you will learn Unsupervised Learning with k-Means Clustering.

This is part of a FREE 10h Machine Learning course with Python.

  • 15 video lessons – which explain Machine Learning concepts, demonstrate models on real data, introduce projects and show a solution (YouTube playlist).
  • 30 Jupyter Notebooks – with the full code and explanation from the lectures and projects (GitHub).
  • 15 projects – with step guides to help you structure your solutions, with solutions explained at the end of the video lessons (GitHub).
