What will we cover?
This tutorial explains what Machine Learning is by comparing it to classical programming. It then covers how Machine Learning works and its three main categories: Supervised, Unsupervised, and Reinforcement Learning.
Finally, we will explore a Supervised Machine Learning model, the 𝑘-Nearest-Neighbors (KNN) classifier, to build understanding through practical application.
- Understand the difference between Classical Computing and Machine Learning
- Know the 3 main categories of Machine Learning
- Dive into Supervised Learning
- Classification with 𝑘-Nearest-Neighbors Classifier (KNN)
- How to classify data
- What are the challenges with cleaning data
- Create a project on real data with the 𝑘-Nearest-Neighbors Classifier
Step 1: What is Machine Learning?
- In the classical computing model, everything is programmed into the algorithms.
- This has the limitation that all decision logic needs to be understood before use.
- And if things change, we need to modify the program.
- With the modern computing model (Machine Learning), this paradigm changes.
- We feed the algorithms (models) with data.
- Based on that data, the algorithms (models) make decisions in the program.
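The contrast can be made concrete with a minimal sketch in plain Python. The rain rule, the threshold of 70, and the toy readings are all hypothetical and chosen only for illustration. The classical version hard-codes the decision. The learning version picks the humidity threshold that makes the fewest mistakes on labeled examples.

```python
# Hypothetical toy data: (humidity, label) pairs, where 1 = rain and 0 = no rain
examples = [(30, 0), (45, 0), (55, 0), (65, 1), (80, 1), (92, 1)]

def classical_rule(humidity):
    # Classical programming: a human writes the decision logic by hand
    return 1 if humidity > 70 else 0

def learn_threshold(data):
    # "Learning": try each observed humidity as a threshold and keep the one
    # that misclassifies the fewest training examples
    best_t, best_errors = None, len(data) + 1
    for t, _ in data:
        errors = sum((1 if h > t else 0) != label for h, label in data)
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t

threshold = learn_threshold(examples)  # 55 for the toy data above

def learned_rule(humidity):
    # The decision logic now comes from the data, not from the programmer
    return 1 if humidity > threshold else 0

print(classical_rule(60), learned_rule(60))  # prints: 0 1
```

If the data changes, you rerun `learn_threshold` on the new examples instead of rewriting the rule by hand.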
Imagine you needed to teach your child how to ride a bicycle.
In the classical computing sense, you would instruct your child how to use a specific muscle in every case. That is, if you lose balance to the right, then activate the third muscle in your right leg. You would need instructions for all muscles in all situations.
That is a lot of instructions, and chances are you will forget some specific situations.
Machine Learning instead feeds the child data: it will fall and it will fail, but eventually it will figure it out itself, without instructions on how to use the specific muscles in its body.
Well, that is actually how most of us learn to ride a bike.
Step 2: How Machine Learning Works
On a high level, Machine Learning is divided into two phases.
- Learning phase: Where the algorithm (model) learns in a training environment. This is like supporting your child while they learn to ride the bike, for example catching them when they fall so they do not hit the ground too hard.
- Prediction phase: Where the algorithm (model) is applied on real data. This is when the child can bike on its own.
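The two phases map directly onto two method calls in scikit-learn. The sketch below is minimal and uses `KNeighborsClassifier`, which is introduced properly in Step 5, with `n_neighbors=1`. The readings are made up for illustration.

```python
from sklearn.neighbors import KNeighborsClassifier

# Learning phase: hypothetical labeled training data, [humidity, pressure] -> label
X_train = [[90, 1000.0], [85, 1003.0], [40, 1020.0], [35, 1018.0]]
y_train = ['Rain', 'Rain', 'No Rain', 'No Rain']

model = KNeighborsClassifier(n_neighbors=1)
model.fit(X_train, y_train)  # the model learns from the labeled examples

# Prediction phase: the trained model is applied to a new, unlabeled reading
todays_reading = [[88, 1001.0]]
prediction = model.predict(todays_reading)
print(prediction[0])  # prints: Rain
```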
The Learning Phase is often divided into a few steps.
- Get Data: Identify relevant data for the problem you want to solve. This data set should represent the type of data that the Machine Learning model will make predictions from in Phase 2 (prediction).
- Pre-processing: This step is about cleaning up data. While Machine Learning is awesome, it cannot figure out what good data looks like. You need to do the cleaning, as well as transform the data into the desired format.
- Train model: This is where the magic happens: the learning step. There are three main paradigms in Machine Learning.
  - Supervised: You tell the algorithm which category each data item belongs to. Each data item in the training set is tagged with the right answer.
  - Unsupervised: The learning algorithm is not told what the data means (there are no labels). It has to find structure in the data by itself.
  - Reinforcement: The machine learns by trial and error, based on the rewards its past actions received.
- Test model: Finally, the model is tested to see if it is good. Before training, the data is divided into a training set and a test set. The model is trained only on the training set. The test set is then used to see how well the model predicts on data it has not seen. If it does not perform well enough, a different model might be necessary.
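The Learning Phase steps above can be sketched end to end with pandas and scikit-learn. The data here is randomly generated so the sketch runs offline; it is an assumption for illustration, not real weather data. A toy rule labels a day as rain when humidity is above 65, and every tenth humidity reading is blanked out to simulate missing data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# 1. Get data: a hypothetical, randomly generated stand-in for real weather data
rng = np.random.default_rng(0)
n = 200
humidity = rng.uniform(20, 100, n)
pressure = rng.uniform(995, 1035, n)
rain = np.where(humidity > 65, 'Yes', 'No')  # toy labeling rule
df = pd.DataFrame({'Humidity': humidity, 'Pressure': pressure, 'Rain': rain})
df.loc[::10, 'Humidity'] = np.nan  # simulate missing readings (20 rows)

# 2. Pre-processing: drop incomplete rows and encode labels as 0/1
df = df.dropna()
X = df[['Humidity', 'Pressure']]
y = (df['Rain'] == 'Yes').astype(int)

# 3. Train model: hold back 25% of the rows as a test set, train on the rest
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
model = KNeighborsClassifier()
model.fit(X_train, y_train)

# 4. Test model: measure accuracy on data the model has not seen
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f'Test accuracy: {accuracy:.2f}')
```

The important design choice is that the test rows never reach `fit`. Otherwise the accuracy would measure memorization rather than prediction.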
The Prediction Phase can be illustrated as follows.
Step 3: Supervised Learning Explained with an Example
Supervised learning can be explained as follows.
Given a dataset of input-output pairs, learn a function to map inputs to outputs.
There are different tasks, but we start by focusing on Classification. Supervised classification is the task of learning a function that maps an input point to a discrete category.
Now, the best way to understand new things is to relate them to something we already understand.
Consider the following data.
Given the Humidity and Pressure for a given day, can we predict whether it will rain or not?
How will a Supervised Classification algorithm work?
Learning Phase: Give the model a set of historical data to train on, like the data above with rows of Humidity and Pressure labeled Rain or No Rain. Let the algorithm work with the data and figure out the relationship itself.
Note: we leave out pre-processing and testing the model here.
Prediction Phase: Give the algorithm new data. For example, in the morning you read the Humidity and Pressure and let the algorithm predict whether it will rain that day.
Written mathematically, the task is to find a function 𝑓 such that:
- 𝑓(93,999.7) = Rain
- 𝑓(49,1015.5) = No Rain
- 𝑓(79,1031.1) = No Rain
Goal: Approximate the function 𝑓. The approximation function is often denoted ℎ.
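One very simple choice of ℎ is sketched below as an illustration, not as the tutorial's final model. It agrees with 𝑓 on the three known observations. For any other day, it returns the label of the most similar known day, measured by Euclidean distance on the raw values. This is the nearest-neighbor idea that Step 5 builds on.

```python
import math

# The three labeled observations from the text: (humidity, pressure) -> label
observations = [((93, 999.7), 'Rain'),
                ((49, 1015.5), 'No Rain'),
                ((79, 1031.1), 'No Rain')]

def h(humidity, pressure):
    # Approximate f by returning the label of the closest known observation
    # (Euclidean distance on the raw values, an illustrative choice)
    _, label = min(observations,
                   key=lambda obs: math.dist(obs[0], (humidity, pressure)))
    return label

print(h(93, 999.7))  # prints: Rain (matches f on a known point)
print(h(90, 1001))   # prints: Rain (a new, similar day)
```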
Step 4: Visualize the data we want to fit
The data we want to work with can be downloaded from the URL below and stored locally, or you can read it directly as follows.
```python
import pandas as pd

file_dest = 'https://raw.githubusercontent.com/LearnPythonWithRune/MachineLearningWithPython/main/files/weather.csv'
data = pd.read_csv(file_dest, parse_dates=True, index_col=0)
```
First, let's visualize the data we want to work with.
```python
import matplotlib.pyplot as plt
import pandas as pd

file_dest = 'https://raw.githubusercontent.com/LearnPythonWithRune/MachineLearningWithPython/main/files/weather.csv'
data = pd.read_csv(file_dest, parse_dates=True, index_col=0)
dataset = data[['Humidity3pm', 'Pressure3pm', 'RainTomorrow']]

fig, ax = plt.subplots()
dataset[dataset['RainTomorrow'] == 'No'].plot.scatter(x='Humidity3pm', y='Pressure3pm', c='b', alpha=.25, ax=ax)
dataset[dataset['RainTomorrow'] == 'Yes'].plot.scatter(x='Humidity3pm', y='Pressure3pm', c='r', alpha=.25, ax=ax)
plt.show()
```
The goal is to make a model that can predict whether a point is blue (no rain tomorrow) or red (rain tomorrow).
Step 5: The k-Nearest-Neighbors Classifier
- Nearest-Neighbor Classification: Given an input, choose the class of the nearest data point.
- 𝑘-Nearest-Neighbors Classification: Given an input, choose the most common class out of the 𝑘 nearest data points.
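Before using sklearn, it can help to see the 𝑘-nearest-neighbors rule written out directly. The sketch below is an illustrative from-scratch version using NumPy, not sklearn's implementation. The function name `knn_predict` and the example points are hypothetical. With `k=1` it reduces to plain nearest-neighbor classification.

```python
from collections import Counter
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Classify point x by majority vote among its k nearest training points."""
    X_train = np.asarray(X_train, dtype=float)
    # Euclidean distance from x to every training point
    distances = np.linalg.norm(X_train - np.asarray(x, dtype=float), axis=1)
    # Indices of the k smallest distances, closest first
    nearest = np.argsort(distances)[:k]
    # Most common label among those neighbors; on a tie, Counter keeps the
    # label encountered first, i.e. the one whose nearest vote is closest
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical example: two well-separated groups of points
points = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]
labels = ['blue', 'blue', 'blue', 'red', 'red', 'red']
print(knn_predict(points, labels, [2, 2], k=3))  # prints: blue
print(knn_predict(points, labels, [7, 7], k=3))  # prints: red
```

There is no real training step here. The "model" is just the stored training data, and all the work happens at prediction time. That is also why KNN predictions get slower as the training set grows.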
Let’s try to implement a model. We will use sklearn for that.
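```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Remove rows with missing values
dataset_clean = dataset.dropna()

# Features (inputs) and labels (outputs), with labels encoded as 0 = No, 1 = Yes
X = dataset_clean[['Humidity3pm', 'Pressure3pm']]
y = dataset_clean['RainTomorrow']
y = np.array([0 if value == 'No' else 1 for value in y])

# Split into training and test sets (random_state fixed for reproducibility)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Train the classifier (k=5 neighbors by default) and evaluate it on the test set
neigh = KNeighborsClassifier()
neigh.fit(X_train, y_train)
y_pred = neigh.predict(X_test)
print(accuracy_score(y_test, y_pred))
```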
That is all the code you need. Make sure the dataset DataFrame from the previous step is available.
To visualize the model's predictions, you can run the following.
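The classifier above uses sklearn's default of `n_neighbors=5`. A quick way to see how the choice of 𝑘 matters is to compare test accuracy for several values. This sketch uses synthetic data from `make_classification`, an assumption made so it runs offline, rather than the weather data. The same loop works with the training and test sets from the weather code.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Hypothetical synthetic two-feature dataset standing in for the weather data
X, y = make_classification(n_samples=500, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Compare a few values of k (number of neighbors that vote)
scores = {}
for k in [1, 3, 5, 11, 21]:
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    scores[k] = accuracy_score(y_test, model.predict(X_test))
    print(f'k={k:>2}: accuracy {scores[k]:.3f}')
```

Small 𝑘 follows individual points closely and can overfit. Large 𝑘 smooths the decision boundary. For a real project, 𝑘 is better chosen with a separate validation set or cross-validation than with the test set.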
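```python
# Assumption: X_map is a grid of points spanning the range of both features,
# so the colors show which class the model predicts in each region
hum = np.linspace(X['Humidity3pm'].min(), X['Humidity3pm'].max(), 100)
pres = np.linspace(X['Pressure3pm'].min(), X['Pressure3pm'].max(), 100)
hh, pp = np.meshgrid(hum, pres)
X_map = pd.DataFrame({'Humidity3pm': hh.ravel(), 'Pressure3pm': pp.ravel()})

fig, ax = plt.subplots()
y_map = neigh.predict(X_map)
ax.scatter(x=X_map['Humidity3pm'], y=X_map['Pressure3pm'], c=y_map, alpha=.25)
plt.show()
```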
Want more help?
Check out this video explaining all steps in more depth. Also, it includes a guideline for making your first project with Machine Learning along with a solution for it.
This is part of a FREE 10h Machine Learning course with Python.
- 15 video lessons – which explain Machine Learning concepts, demonstrate models on real data, introduce projects and show a solution (YouTube playlist).
- 30 Jupyter Notebooks – with the full code and explanation from the lectures and projects (GitHub).
- 15 projects – with step guides to help you structure your solutions, and solutions explained at the end of the video lessons (GitHub).