    What is Machine Learning? Exemplified with the k-Nearest-Neighbors (KNN) Classifier to Predict the Weather

    What will we cover?

    This tutorial will explain what Machine Learning is by comparing it to classical programming. Then we will look at how Machine Learning works and the three main categories of Machine Learning: Supervised, Unsupervised, and Reinforcement Learning.

    Finally, we will explore a Supervised Machine Learning model called k-Nearest-Neighbors (KNN) classifier to get an understanding through practical application.

    Why is it great to master Machine Learning and Classification with the k-Nearest-Neighbors (KNN) Classifier?

    1. Harnessing the power of data: Machine Learning enables you to extract valuable insights and patterns from vast amounts of data, unlocking the potential for data-driven decision making.
    2. Automation and efficiency: Machine Learning algorithms can automate complex tasks and processes, leading to increased efficiency and productivity in various domains.
    3. Versatility: Machine Learning techniques can be applied across diverse industries and problem domains, ranging from healthcare and finance to marketing and autonomous systems.
    4. Enhanced decision-making capabilities: By mastering Machine Learning and classification, you gain the ability to make accurate predictions and classify new data points, empowering you to make informed decisions.
    5. Scalability and adaptability: Machine Learning models can scale to handle large datasets and adapt to evolving patterns, making them suitable for dynamic and evolving environments.

    In this tutorial, we will cover the following:

    • Understanding the difference between Classical Computing and Machine Learning: Learn about the fundamental distinctions between traditional programming approaches and Machine Learning algorithms, highlighting the paradigm shift towards data-driven decision making.
    • Exploring the three main categories of Machine Learning: Gain an overview of Supervised Learning, Unsupervised Learning, and Reinforcement Learning, understanding their underlying principles and use cases.
    • Diving into Supervised Learning: Focus on the Supervised Learning paradigm, where models learn from labeled training data to make predictions or classifications.
    • Classification with the k-Nearest-Neighbors Classifier (KNN): Deep dive into the KNN algorithm, a popular Supervised Learning technique for classification tasks.
    • Understanding the process of classifying data: Explore the workflow of classifying new data points based on their similarity to existing labeled examples.
    • Addressing challenges with data cleaning: Learn about the importance of data cleaning and preprocessing in preparing the dataset for the classification task.
    • Creating a project on real data with the k-Nearest-Neighbor Classifier: Apply the KNN algorithm to a practical project using real-world data, gaining hands-on experience in classification tasks.

    Step 1: What is Machine Learning?

    Classical Computing vs Machine Learning
    • In the classical computing model everything is programmed into the algorithm.
      • This has the limitation that all decision logic needs to be understood before use.
      • And if things change, we need to modify the program.
    • With the modern computing model (Machine Learning) this paradigm changes.
      • We feed the algorithms (models) with data.
      • Based on that data, the algorithms (models) make decisions in the program (a short code sketch of this contrast follows after this list).
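
    To make the contrast concrete, here is a minimal, illustrative sketch using the weather example that runs through this tutorial. The thresholds in the hand-written rule, the extra reading passed to predict, and the 1 = rain / 0 = no rain encoding are made up for illustration; the learned model is the k-Nearest-Neighbors classifier introduced later.

    # Classical computing: the programmer writes the decision logic by hand
    def will_it_rain(humidity, pressure):
        return humidity > 80 and pressure < 1010   # made-up thresholds that must be revised when things change

    # Machine Learning: the decision logic is learned from data
    from sklearn.neighbors import KNeighborsClassifier

    measurements = [[93, 999.7], [49, 1015.5], [79, 1031.1]]   # (humidity, pressure) samples
    labels = [1, 0, 0]                                         # 1 = rain, 0 = no rain
    model = KNeighborsClassifier(n_neighbors=1).fit(measurements, labels)
    print(model.predict([[85, 1005.0]]))                       # the decision comes from the data, not hand-written rules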

    Imagine you needed to teach your child how to ride a bicycle.

    In the classical computing sense, you would instruct your child how to use each specific muscle in every situation. That is, if you lose balance to the right, then activate the third muscle in your right leg. You need instructions for all muscles in all situations.

    That is a lot of instructions, and chances are you will forget some specific situations.

    Machine Learning instead feeds the child data: it will fall and it will fail, but eventually it will figure it out by itself, without instructions on how to use the specific muscles in the body.

    Well, that is actually how most of us learn to ride a bike.

    Step 2: How Machine Learning Works

    On a high level, Machine Learning is divided into two phases.

    • Learning phase: Where the algorithm (model) learns in a training environment. This is like when you support your child learning to ride the bike, for example by catching the child when it falls so it does not hit the ground too hard.
    • Prediction phase: Where the algorithm (model) is applied to real data. This is when the child can ride the bike on its own.

    The Learning Phase is often divided into a few steps.

    Phase 1: Learning
    • Get Data: Identify relevant data for the problem you want to solve. This dataset should represent the type of data that the Machine Learning model will use to make predictions in Phase 2 (prediction).
    • Pre-processing: This step is about cleaning up the data. While Machine Learning is powerful, it cannot figure out on its own what good data looks like. You need to do the cleaning, as well as transform the data into the desired format.
    • Train model: This is where the magic happens: the learning step (Train model). There are three main paradigms in Machine Learning (the first two are contrasted in a short sketch after this list).
      • Supervised: where you tell the algorithm what category each data item is in. Each data item from the training set is tagged with the right answer.
      • Unsupervised: where the algorithm is not given any labels and has to find the structure in the data by itself.
      • Reinforcement: teaches the machine to think for itself based on rewards for past actions.
    • Test model: Finally, testing is done to see if the model is good. The data was divided into a training set and a test set. The test set is used to check whether the model can predict well on data it has not seen during training. If not, a new model might be necessary.
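
    The Supervised case is exactly what the weather project below does: every training row comes with the right answer. For contrast, here is a minimal, hypothetical sketch of the Unsupervised case, where no labels are given and the algorithm has to group the data on its own (Reinforcement Learning is not covered further in this tutorial). The data points are made up for illustration.

    from sklearn.cluster import KMeans

    # (humidity, pressure) samples, but this time without any rain / no rain labels
    X = [[93, 999.7], [49, 1015.5], [79, 1031.1], [88, 1002.3]]

    # The algorithm groups the points into 2 clusters by itself
    clusters = KMeans(n_clusters=2, n_init=10).fit_predict(X)
    print(clusters)   # prints the cluster id assigned to each point; what each cluster means is up to us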

    The Prediction Phase can be illustrated as follows.

    Phase 2: Prediction

    Step 3: Supervised Learning explained with Example

    Supervised learning can be explained as follows.

    Given a dataset of input-output pairs, learn a function to map inputs to outputs.

    There are different tasks, but we start by focusing on Classification, where supervised classification is the task of learning a function that maps an input point to a discrete category.

    Now, the best way to understand something new is to relate it to something we already understand.

    Consider the following data.

    Given the Humidity and Pressure for a given day, can we predict whether it will rain or not?

    How will a Supervised Classification algorithm work?

    Learning Phase: Give the model a set of historical data to train on, like the data above: rows of Humidity and Pressure together with the label Rain or No Rain. Let the algorithm work with the data and figure it out.

    Note: we leave out pre-processing and testing the model here.

    Prediction Phase: Give the algorithm new data. For example, in the morning you read the Humidity and Pressure and let the algorithm predict whether it will rain that day.

    Written mathematically, it is the task of finding a function f as follows.

    Ideally: f(humidity, pressure)

    Examples:

    • f(93, 999.7) = Rain
    • f(49, 1015.5) = No Rain
    • f(79, 1031.1) = No Rain

    Goal: Approximate the function f. The approximation function is often denoted h.

    Step 4: Visualize the data we want to fit

    We will use pandas to work with data, which is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.

    The data we want to work with can be downloaded from the URL used below and stored locally, or you can access it directly as follows.

    import pandas as pd
    file_dest = 'https://raw.githubusercontent.com/LearnPythonWithRune/MachineLearningWithPython/main/files/weather.csv'
    data = pd.read_csv(file_dest, parse_dates=True, index_col=0)
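
    Before plotting, it can be helpful to take a quick look at what was loaded. The columns used in this tutorial are Humidity3pm, Pressure3pm, and RainTomorrow; this optional check reuses the data DataFrame from the block above.

    print(data.shape)                                                  # number of rows and columns
    print(data[['Humidity3pm', 'Pressure3pm', 'RainTomorrow']].head())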
    

    First, let's visualize the data we want to work with.

    import matplotlib.pyplot as plt
    import pandas as pd
    file_dest = 'https://raw.githubusercontent.com/LearnPythonWithRune/MachineLearningWithPython/main/files/weather.csv'
    data = pd.read_csv(file_dest, parse_dates=True, index_col=0)
    dataset = data[['Humidity3pm', 'Pressure3pm', 'RainTomorrow']]
    fig, ax = plt.subplots()
    dataset[dataset['RainTomorrow'] == 'No'].plot.scatter(x='Humidity3pm', y='Pressure3pm', c='b', alpha=.25, ax=ax)
    dataset[dataset['RainTomorrow'] == 'Yes'].plot.scatter(x='Humidity3pm', y='Pressure3pm', c='r', alpha=.25, ax=ax)
    plt.show()
    

    Resulting in the following plot.

    Blue dots are no rain, red dots are rain.

    The goal is to make a model which can predict blue or red dots.
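
    Note that the dataset also contains missing values, which the classifier in the next step cannot handle. Here is a quick optional check on the dataset DataFrame from above (this is what motivates the dropna() call in Step 5).

    print(dataset.isna().sum())                    # missing values per column
    print(dataset['RainTomorrow'].value_counts())  # how many rainy vs dry days the data contains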

    Step 5: The k-Nearest-Neighbors Classifier

    Given an input, choose the class of the nearest data point.

    k-Nearest-Neighbors Classification

    • Given an input, choose the most common class out of the k nearest data points (a small from-scratch sketch of the idea follows below).
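
    This sketch shows the idea in plain NumPy. It is not the implementation used for the real project (we will use sklearn below), and the extra data points and the query point are made up for illustration.

    import numpy as np
    from collections import Counter

    def knn_predict(X_train, y_train, x_new, k=5):
        # Euclidean distance from the new point to every training point
        distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
        # Labels of the k nearest training points
        nearest_labels = y_train[np.argsort(distances)[:k]]
        # Majority vote: the most common label wins
        return Counter(nearest_labels).most_common(1)[0][0]

    # Tiny example: (humidity, pressure) points, 1 = rain, 0 = no rain
    X_train = np.array([[93, 999.7], [49, 1015.5], [79, 1031.1], [88, 1002.3], [95, 998.0]])
    y_train = np.array([1, 0, 0, 1, 1])
    print(knn_predict(X_train, y_train, np.array([90, 1001.0]), k=3))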

    Let’s try to implement a model. We will use sklearn for that.

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.metrics import accuracy_score

    # Clean the data: KNN cannot handle missing values
    dataset_clean = dataset.dropna()
    X = dataset_clean[['Humidity3pm', 'Pressure3pm']]
    y = dataset_clean['RainTomorrow']
    y = np.array([0 if value == 'No' else 1 for value in y])

    # Split into a training set and a test set
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    # Train the model and measure accuracy on the held-out test set
    neigh = KNeighborsClassifier()
    neigh.fit(X_train, y_train)
    y_pred = neigh.predict(X_test)
    accuracy_score(y_test, y_pred)
    

    This actually covers what you need. Make sure the dataset DataFrame from the previous step is available here.
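
    The classifier above uses sklearn's default of 5 neighbors (n_neighbors=5). A natural next step is to try a few different values of k and compare the accuracy on the test set. Here is a short sketch, reusing X_train, X_test, y_train, and y_test from the block above.

    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.metrics import accuracy_score

    for k in (1, 3, 5, 11, 21):
        model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
        print(k, accuracy_score(y_test, model.predict(X_test)))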

    To visualize how the model classifies the whole feature space, you can run the following, where X_map is a grid of points spanning the ranges of the two features.

    # Build a grid of (humidity, pressure) points covering the data range
    xx, yy = np.meshgrid(np.linspace(X['Humidity3pm'].min(), X['Humidity3pm'].max(), 100),
                         np.linspace(X['Pressure3pm'].min(), X['Pressure3pm'].max(), 100))
    X_map = np.c_[xx.ravel(), yy.ravel()]
    fig, ax = plt.subplots()
    y_map = neigh.predict(X_map)  # predicted class for every grid point
    ax.scatter(x=X_map[:, 0], y=X_map[:, 1], c=y_map, alpha=.25)
    plt.show()
    

    Want more help?

    Check out this video explaining all steps in more depth. Also, it includes a guideline for making your first project with Machine Learning along with a solution for it.

    In the next lesson, Linear Classifier From Scratch Explained on Real Project, you will learn how to build a linear classifier from scratch.

    This is part of a FREE 10h Machine Learning course with Python.

    • 15 video lessons – which explain Machine Learning concepts, demonstrate models on real data, introduce projects and show a solution (YouTube playlist).
    • 30 Jupyter Notebooks – with the full code and explanation from the lectures and projects (GitHub).
    • 15 projects – with step guides to help you structure your solutions, with the solution explained at the end of the video lessons (GitHub).
