What will we cover?
- Understand how Reinforcement Learning works
- Learn about the Agent and the Environment
- See how the Agent iterates and gets Rewards based on its Actions
- See how the Agent keeps learning from new experience
- Create your own Reinforcement Learning Agent from scratch
Step 1: Reinforcement Learning simply explained
Reinforcement Learning is like training a dog. You and the dog speak different languages, which makes it difficult to explain to the dog what you want.
A common way to train a dog works like Reinforcement Learning: when the dog does something good, it gets a reward. This teaches the dog that you want it to repeat that behaviour.
To relate this to the illustration above: the Agent is the dog. The dog is exposed to an Environment, whose current situation is called the State. Based on this State, the Agent (the dog) takes an Action. Depending on whether you (the owner) like the Action, you Reward the Agent.
The goal of the Agent is to collect as much Reward as possible. This lets you, the owner, get the desired behaviour by adjusting the Reward according to the Actions.
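The loop between Agent, Action, and Reward can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the action names, `owner_reward`, and `train_dog` are hypothetical, there is a single State, and the Agent acts purely at random while we tally which Action earned Reward.

```python
import random

ACTIONS = ["sit", "bark"]  # hypothetical Actions the dog (Agent) can take


def owner_reward(action):
    """The owner rewards only the desired Action (assumed here to be 'sit')."""
    return 1 if action == "sit" else 0


def train_dog(episodes=200, seed=0):
    """Let the Agent try random Actions and record the Reward each one earned."""
    rng = random.Random(seed)
    total_reward = {a: 0 for a in ACTIONS}
    for _ in range(episodes):
        action = rng.choice(ACTIONS)   # the Agent takes an Action
        reward = owner_reward(action)  # the owner answers with a Reward
        total_reward[action] += reward
    return total_reward


totals = train_dog()
best_action = max(totals, key=totals.get)
print(best_action)  # the Action that collected the most Reward
```

An Agent that prefers the Action with the highest collected Reward ends up doing what the owner wants, which is the behaviour described above.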
Step 2: Markov Decision Process
A Markov Decision Process is a model for decision-making that represents the States (from the Environment), the Actions (from the Agent), and the Rewards.
Written a bit more mathematically:
- S is the set of States
- Actions(s) is the set of Actions when in state s
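- P(s' | s, a) is the transition model: the probability of ending up in state s' after taking action a in state s
- R(s, a, s') is the Reward function: the Reward for moving from state s to state s' by taking action a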
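To make the four ingredients concrete, here is a minimal sketch of a Markov Decision Process as plain Python. The three-cell world, the action names, and the reward values are assumptions chosen for illustration, and the transitions are deterministic, so every probability is 0 or 1.

```python
# Hypothetical three-cell line world: entering cell 0 loses, entering cell 2 wins.
S = [0, 1, 2]  # the set of States


def actions(s):
    """Actions(s): the terminal cells 0 and 2 offer no further Actions."""
    return [] if s in (0, 2) else ["left", "right"]


def P(s_next, s, a):
    """Transition model: probability of reaching s_next from s with action a."""
    target = s - 1 if a == "left" else s + 1
    return 1.0 if s_next == target else 0.0


def R(s, a, s_next):
    """Reward function: -1 for entering cell 0, +1 for entering cell 2, else 0."""
    return {0: -1, 2: 1}.get(s_next, 0)


# For every state-action pair the transition probabilities must sum to 1.
for s in S:
    for a in actions(s):
        assert sum(P(s2, s, a) for s2 in S) == 1.0
```

In a stochastic Environment, `P` would return values between 0 and 1 instead of only 0 or 1.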
Step 3: Q-Learning
Q-learning is a model-free reinforcement learning algorithm to learn the value of an action in a particular state. It does not require a model of the environment (hence “model-free”), and it can handle problems with stochastic transitions and rewards without requiring adaptations. (wiki)
This can be modeled by a learning function Q(s, a), which estimates the value of performing action a when in state s.
It works as follows:
- Start with Q(s, a) = 0 for all s, a
- Update Q(s, a) every time we take action a in state s, using the Reward received and the best estimated value of the next state
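The update step can be sketched as a small function. It blends the old estimate with the new information: Q(s, a) becomes (1 - alpha) * Q(s, a) + alpha * (reward + gamma * max over a' of Q(s', a')). The function name `q_update` and the list-of-lists table are assumptions made for this sketch; the default values of `alpha` (learning rate) and `gamma` (discount factor) mirror the ones used in the code example of this article.

```python
def q_update(q, s, a, reward, s_next, alpha=0.5, gamma=0.5):
    """Update q[s][a] in place after taking action a in state s."""
    best_next = max(q[s_next])  # value of the best action in the next state
    q[s][a] = (1 - alpha) * q[s][a] + alpha * (reward + gamma * best_next)
    return q[s][a]


# Two states with two actions each, all estimates starting at 0.
q = [[0.0, 0.0], [0.0, 0.0]]
first = q_update(q, s=0, a=1, reward=1, s_next=1)   # 0.5*0 + 0.5*(1 + 0) = 0.5
second = q_update(q, s=0, a=1, reward=1, s_next=1)  # 0.5*0.5 + 0.5*(1 + 0) = 0.75
print(first, second)
```

Repeating the same rewarded step moves the estimate closer to the true value each time rather than jumping there at once.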
The idea behind it is to either explore or exploit:
- With probability ϵ, explore: take a random move
- Otherwise, exploit: take the action a with maximum Q(s, a)
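This explore-or-exploit rule is often called epsilon-greedy. A minimal sketch, assuming the Q-values of the current state are given as a list `q_row` with one entry per action (the name `choose_action` is hypothetical):

```python
import random


def choose_action(q_row, epsilon, rng=random):
    """Pick an action index: random with probability epsilon, else the best one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))  # explore: any action
    return max(range(len(q_row)), key=lambda a: q_row[a])  # exploit: best action


greedy = choose_action([0.1, 0.9], epsilon=0.0)    # never explores, so picks index 1
explored = choose_action([0.1, 0.9], epsilon=1.0)  # always explores, so 0 or 1
print(greedy, explored)
```

A larger `epsilon` means more exploration. A smaller one makes the Agent rely more on what it has already learned.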
Let’s demonstrate it with code.
Step 3: Code Example
Assume we have the following Environment
- You start at a random point.
- You can either move left or right.
- You lose if you hit a red box
- You win if you hit the green box
Quite simple, but how can you program an Agent using Reinforcement Learning? And how can you do it from scratch?
A good way is to use an object representing the field (the Environment).
If needed, here are some background resources for the implementation.
- Libraries used
- Functionality and concepts used
- Object-Oriented Programming (OOP): Lecture on Object Oriented Programming
import random

import numpy as np


class Field:
    def __init__(self):
        # -1 marks the red (losing) box, 1 the green (winning) box, 0 an empty cell.
        self.states = [-1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
        # Start at a random point.
        self.state = random.randrange(0, len(self.states))

    def done(self):
        # The episode ends as soon as the Agent stands on a non-empty cell.
        return self.states[self.state] != 0

    # action: 0 => left
    # action: 1 => right
    def get_possible_actions(self):
        actions = [0, 1]
        if self.state == 0:
            actions.remove(0)
        if self.state == len(self.states) - 1:
            actions.remove(1)
        return actions

    def update_next_state(self, action):
        # Trying to step off either end keeps the state and gives a Reward of -10.
        if action == 0:
            if self.state == 0:
                return self.state, -10
            self.state -= 1
        if action == 1:
            if self.state == len(self.states) - 1:
                return self.state, -10
            self.state += 1
        reward = self.states[self.state]
        return self.state, reward


field = Field()
# One row per state, one column per action (0 => left, 1 => right).
q_table = np.zeros((len(field.states), 2))

alpha = .5    # learning rate
epsilon = .5  # probability of taking a random move
gamma = .5    # discount factor

for _ in range(10000):
    field = Field()
    while not field.done():
        actions = field.get_possible_actions()
        if random.uniform(0, 1) < epsilon:
            # Explore: take a random possible action.
            action = random.choice(actions)
        else:
            # Exploit: take the action with the highest Q-value.
            action = np.argmax(q_table[field.state])

        cur_state = field.state
        next_state, reward = field.update_next_state(action)

        q_table[cur_state, action] = (1 - alpha)*q_table[cur_state, action] + alpha*(reward + gamma*np.max(q_table[next_state]))
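Once a Q-table has been learned, the preferred move in each cell can be read off with an argmax per row. The helper below is a hypothetical addition, not part of the original code, and the small table it is demonstrated on contains made-up values for a four-cell field, not the output of the training run above.

```python
import numpy as np


def greedy_policy(q_table, states):
    """Return one symbol per cell: '-' red box, '+' green box, '<' or '>' for the best move."""
    symbols = []
    for s, cell in enumerate(states):
        if cell != 0:
            symbols.append("+" if cell > 0 else "-")  # terminal cells
        else:
            symbols.append("<>"[int(np.argmax(q_table[s]))])  # 0 => left, 1 => right
    return "".join(symbols)


# Made-up Q-values for illustration only.
demo_states = [-1, 0, 0, 1]
demo_q = np.array([[0.0, 0.0], [-1.0, 0.5], [0.25, 1.0], [0.0, 0.0]])
policy = greedy_policy(demo_q, demo_states)
print(policy)  # -> "->>+"
```

With the trained table from the example, the equivalent call would be `greedy_policy(q_table, field.states)`.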
Step 4: A more complex Example
Check out the video to see a more complex example.
Want to learn more?
This is part of a FREE 10h Machine Learning course with Python.
- 15 video lessons – which explain Machine Learning concepts, demonstrate models on real data, introduce projects and show a solution (YouTube playlist).
- 30 Jupyter Notebooks – with the full code and explanations from the lectures and projects (GitHub).
- 15 projects – with step guides to help you structure your solutions, and with the solutions explained at the end of the video lessons (GitHub).