Why is it great to master Reinforcement Learning?
Reinforcement Learning offers several advantages and opens up exciting opportunities in the field of machine learning:
- Versatile learning approach: Reinforcement Learning provides a versatile approach to learning, where agents can learn from their interactions with an environment. This flexibility allows them to adapt and tackle a wide range of complex tasks.
- Decision-making in dynamic environments: Reinforcement Learning equips machines with the ability to make decisions in dynamic and uncertain environments. By mastering this technique, you gain the skills to build intelligent systems that can navigate and excel in complex real-world scenarios.
- Continuous learning and adaptation: Reinforcement Learning emphasizes continuous learning and adaptation. Agents can update their strategies based on feedback and experiences, making them capable of handling evolving environments and improving their performance over time.
- Applications in various domains: Reinforcement Learning has broad applications across different domains. It is used in robotics, game playing, recommendation systems, autonomous vehicles, and more. Mastering this technique opens doors to diverse career opportunities and exciting projects.
What will be covered in this lesson?
In this lesson, you will explore the foundations and practical aspects of Reinforcement Learning. The following topics will be covered:
- Understanding how Reinforcement Learning works: You will gain a comprehensive understanding of the fundamental concepts and principles of Reinforcement Learning, including the agent-environment interaction and the reward mechanism.
- Learning about agents and environments: You will dive into the core components of Reinforcement Learning, understanding the role of agents and environments in the learning process. This knowledge forms the basis for building effective RL systems.
- Iteration and reward mechanism: You will learn how the iterative process of Reinforcement Learning unfolds, as agents take actions, receive rewards, and update their strategies. This iterative feedback loop is crucial for agents to learn and improve their decision-making abilities.
- Continuous learning and exploration: The lesson will cover techniques and algorithms that enable agents to continuously learn, explore new strategies, and adapt to changing environments. You will discover methods for balancing exploration and exploitation to maximize long-term rewards.
- Building your own Reinforcement Learning model: Through hands-on exercises and practical examples, you will have the opportunity to build your own Reinforcement Learning model from scratch. This hands-on experience will solidify your understanding and provide you with the skills to apply RL in real-world scenarios.
By the end of this lesson, you will have a strong foundation in Reinforcement Learning and be equipped to apply these techniques to solve complex problems, paving the way for exciting opportunities in the field of machine learning.
Step 1: Reinforcement Learning simply explained

Reinforcement Learning is like training a dog. You and the dog speak different languages, which makes it difficult to explain to the dog what you want.
A common way to train a dog works like Reinforcement Learning. When the dog does something good, it gets a reward. This teaches the dog that you want it to repeat that behaviour.
Put differently, and relating it to the illustration above: the Agent is the dog. The Environment presents the dog with a situation, called a State. Based on this State, the Agent (the dog) takes an Action. Depending on whether you (the owner) like the Action, you Reward the Agent.
The goal of the Agent is to collect as much Reward as possible. This lets you, the owner, get the desired behaviour by adjusting the Reward according to the Actions.
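The loop described above can be sketched in a few lines of Python. This is only an illustrative toy, not part of the lesson's code: the class `DogTrainingEnvironment`, the commands, and the two policies are hypothetical names made up for this sketch.

```python
import random


class DogTrainingEnvironment:
    """Hypothetical toy environment: the owner gives a command (the State)."""
    commands = ["sit", "down"]

    def __init__(self):
        self.state = random.choice(self.commands)

    def step(self, action):
        # The owner rewards the dog only when the Action matches the command.
        reward = 1 if action == self.state else 0
        self.state = random.choice(self.commands)  # the next command
        return self.state, reward


def run_episode(env, policy, steps=10):
    """Run the State -> Action -> Reward loop and return the total Reward."""
    total_reward = 0
    for _ in range(steps):
        action = policy(env.state)     # the Agent picks an Action from the State
        _, reward = env.step(action)   # the Environment answers with a Reward
        total_reward += reward
    return total_reward


def obedient(state):
    return state    # always does what was asked


def stubborn(state):
    return "sit"    # ignores the command


print(run_episode(DogTrainingEnvironment(), obedient))  # 10
print(run_episode(DogTrainingEnvironment(), stubborn))  # varies between runs
```

The obedient policy collects the maximum Reward, which is exactly the behaviour the owner is trying to encourage.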
Step 2: Markov Decision Process
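A Markov Decision Process is a model for decision-making that represents States (from the Environment), Actions (from the Agent), and Rewards.
Written a bit more mathematically:
- S is the set of States
- Actions(s) is the set of Actions available in state s
- The transition model is P(s', s, a), the probability of ending up in state s' after taking action a in state s
- The Reward function is R(s, a, s'), the reward received for going from state s to state s' with action a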
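As a rough illustration, the four ingredients above could be written down in Python for a tiny, deterministic world. The four-cell row, the string action names, and the reward values are assumptions chosen for this sketch; they are not taken from the lesson.

```python
# Hypothetical 4-cell row: cell 0 is a losing box, cell 3 is the winning box.
S = [0, 1, 2, 3]
TERMINAL = {0, 3}


def actions(s):
    """Actions(s): the moves available in state s (none once the game is over)."""
    return [] if s in TERMINAL else ["left", "right"]


def P(s_next, s, a):
    """Transition model P(s', s, a): probability of reaching s_next from s with a.

    This toy world is deterministic, so the probability is either 1.0 or 0.0.
    """
    target = s - 1 if a == "left" else s + 1
    return 1.0 if s_next == target else 0.0


def R(s, a, s_next):
    """Reward function R(s, a, s'): +1 for winning, -1 for losing, 0 otherwise."""
    if s_next == 3:
        return 1
    if s_next == 0:
        return -1
    return 0


print(actions(1))          # ['left', 'right']
print(P(2, 1, "right"))    # 1.0
print(R(2, "right", 3))    # 1
```

In a stochastic environment, `P` would spread probability over several next states, but for each state and action the probabilities would still sum to 1.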
Step 3: Q-Learning
Q-learning is a model-free reinforcement learning algorithm to learn the value of an action in a particular state. It does not require a model of the environment (hence “model-free”), and it can handle problems with stochastic transitions and rewards without requiring adaptations. (Wikipedia)
This can be modeled by a learning function Q(s, a), which estimates the value of performing action a when in state s.
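It works as follows:
- Start with Q(s, a) = 0 for all s, a
- Update Q every time we take an action:
$$Q(s, a) = Q(s, a) + \alpha\left(\text{reward} + \gamma \max_{a'} Q(s', a') - Q(s, a)\right) = (1 - \alpha)\,Q(s, a) + \alpha\left(\text{reward} + \gamma \max_{a'} Q(s', a')\right)$$
Here α is the learning rate, γ is the discount factor, s' is the state reached after taking action a in state s, and the maximum is taken over the actions a' available in s'.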
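To make the update rule concrete, here is a small worked example of single updates. The helper name `q_update` and the numbers fed into it are assumptions for illustration; only the formula itself comes from the lesson.

```python
def q_update(q_sa, reward, max_q_next, alpha, gamma):
    """One Q-learning update for a single (state, action) pair."""
    return (1 - alpha) * q_sa + alpha * (reward + gamma * max_q_next)


alpha, gamma = 0.5, 0.5

# First visit: Q starts at 0, and the move lands on a winning cell (reward 1).
q = q_update(0.0, reward=1, max_q_next=0.0, alpha=alpha, gamma=gamma)
print(q)  # 0.5

# Second visit of the same (state, action): the estimate moves closer to 1.
q = q_update(q, reward=1, max_q_next=0.0, alpha=alpha, gamma=gamma)
print(q)  # 0.75

# A neighbouring state gets no direct reward, but inherits discounted value.
q_neighbour = q_update(0.0, reward=0, max_q_next=q, alpha=alpha, gamma=gamma)
print(q_neighbour)  # 0.1875
```

The last line shows how value propagates backwards: a move that only leads towards the winning cell still gets a positive estimate, scaled down by γ.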
The ε-Greedy Decision Making
The idea behind it is to either explore or exploit:
- With probability ε, take a random move
- Otherwise, take the action a with maximum Q(s, a)
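The two rules above fit into one small function. This is a minimal sketch rather than the lesson's own implementation: the name `epsilon_greedy` and its parameters are assumptions, and it restricts the greedy choice to the allowed actions, which is a design choice of this sketch.

```python
import random


def epsilon_greedy(q_values, actions, epsilon):
    """Choose an action index from `actions`.

    q_values: Q(s, a) for every action a in the current state
    actions:  indices of the actions allowed in the current state
    epsilon:  probability of exploring instead of exploiting
    """
    if random.random() < epsilon:
        return random.choice(actions)                # explore: random move
    return max(actions, key=lambda a: q_values[a])   # exploit: best known move


q_values = [0.1, 0.9]   # hypothetical values for actions 0 and 1
print(epsilon_greedy(q_values, [0, 1], epsilon=0.0))  # 1
print(epsilon_greedy(q_values, [0], epsilon=0.0))     # 0
```

With `epsilon=0.0` the agent never explores, and with `epsilon=1.0` it always moves at random; values in between trade off the two.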
Let’s demonstrate it with code.
Step 3: Code Example
Assume we have the following Environment

- You start at a random point.
- You can move either left or right.
- You lose if you hit a red box.
- You win if you hit the green box.
Quite simple, but how can you program an Agent using Reinforcement Learning? And how can you do it from scratch?
A good way is to use an object representing the field (the environment).

To implement it all, there are some background resources if needed.
Programming Notes:
- Libraries used
  - numpy – scientific computing with Python (Lecture on NumPy)
  - random – pseudo-random generators
- Functionality and concepts used
  - Object-Oriented Programming (OOP): Lecture on Object Oriented Programming
What if there are more states?
import numpy as np
import random


class Field:
    def __init__(self):
        # -1 is a red box (lose), 1 is the green box (win), 0 is an empty cell
        self.states = [-1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
        # Start at a random position
        self.state = random.randrange(0, len(self.states))

    def done(self):
        # The episode ends when the agent stands on a red or green box
        return self.states[self.state] != 0

    # action: 0 => left
    # action: 1 => right
    def get_possible_actions(self):
        actions = [0, 1]
        if self.state == 0:
            actions.remove(0)
        if self.state == len(self.states) - 1:
            actions.remove(1)
        return actions

    def update_next_state(self, action):
        # Trying to move off either end keeps the state and gives reward -10
        if action == 0:
            if self.state == 0:
                return self.state, -10
            self.state -= 1
        if action == 1:
            if self.state == len(self.states) - 1:
                return self.state, -10
            self.state += 1
        reward = self.states[self.state]
        return self.state, reward


field = Field()
# One row per state, one column per action (0 => left, 1 => right)
q_table = np.zeros((len(field.states), 2))

alpha = .5    # learning rate
epsilon = .5  # probability of exploring
gamma = .5    # discount factor

for _ in range(10000):
    field = Field()
    while not field.done():
        # Epsilon-greedy: explore with probability epsilon, otherwise exploit
        actions = field.get_possible_actions()
        if random.uniform(0, 1) < epsilon:
            action = random.choice(actions)
        else:
            action = np.argmax(q_table[field.state])

        cur_state = field.state
        next_state, reward = field.update_next_state(action)

        # Q-learning update
        q_table[cur_state, action] = (1 - alpha)*q_table[cur_state, action] + alpha*(reward + gamma*np.max(q_table[next_state]))
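Once training has finished, it can help to look at what the Agent actually learned. The helper below is a hypothetical addition, not part of the lesson's code: it turns a Q-table into one symbol per cell, and it is shown here on a small hand-made table (not trained values) so that it runs on its own.

```python
def greedy_policy(q_table, states):
    """Return one symbol per cell: 'R'/'G' for boxes, '<' or '>' for the best move."""
    symbols = []
    for s, box in enumerate(states):
        if box != 0:
            # Terminal cell: red (negative) or green (positive) box
            symbols.append("G" if box > 0 else "R")
        else:
            # Column 0 is left, column 1 is right; ties go to left
            symbols.append("<" if q_table[s][0] >= q_table[s][1] else ">")
    return "".join(symbols)


# Hypothetical 5-cell field with made-up Q-values, for illustration only
states = [-1, 0, 0, 1, 0]
q = [[0, 0], [-1, 0.5], [0.25, 1.0], [0, 0], [1.0, -10]]
print(greedy_policy(q, states))  # R>>G<
```

Because the function only indexes rows and columns, it should also accept the NumPy table from the code above, for example `greedy_policy(q_table, field.states)`. In a well-trained table the arrows are expected to point towards the green box.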
Step 4: A more complex Example
Check out the video to see a more complex example.
Want to learn more?
In the next lesson you will learn Unsupervised Learning with k-Means Clustering.
This is part of a free 10-hour Machine Learning course with Python.
- 15 video lessons – which explain Machine Learning concepts, demonstrate models on real data, introduce projects and show a solution (YouTube playlist).
- 30 Jupyter Notebooks – with the full code and explanations from the lectures and projects (GitHub).
- 15 projects – with step-by-step guides to help you structure your solutions, and solutions explained at the end of the video lessons (GitHub).
