
    Support-Vector Machine: Classify using Sklearn

    Why is it great to master the concept of separation and Support Vector Machines (SVM)?

    1. Versatile and powerful classification: SVM offers a robust algorithm for solving classification problems effectively, making it a valuable tool in machine learning.
    2. Noise tolerance and overfitting prevention: With a soft margin, SVM tolerates some noisy or mislabeled points, and maximizing the margin helps reduce overfitting.
    3. Ability to handle linear and nonlinear tasks: SVM is not limited to linear separation; it can handle nonlinear classification tasks through kernel functions, expanding its applicability.
    4. Clear decision boundaries and interpretability: SVM produces clear decision boundaries between classes, making it easier to interpret and understand the model’s predictions.
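    5. Efficiency on small and medium-sized datasets: SVM trains quickly on small and medium-sized datasets. Training time for kernel SVMs grows at least quadratically with the number of samples, so for very large datasets a linear variant such as Sklearn's LinearSVC is usually the better choice.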

    In this tutorial, we will cover the following:

    • Understanding the problem of separation: Learn about the significance of separating different classes of data in classification tasks.
    • Maximizing the distance: Explore the concept of maximizing the distance between classes to achieve better separation.
    • Working with examples: Engage with demonstrations and examples that illustrate the challenges and implications of the separation problem.
    • Using the SVM model: Gain practical knowledge and step-by-step guidance on applying the SVM model for data classification.
    • Exploring SVM results: Analyze and interpret the results obtained from SVM classification on various datasets.

    By mastering the concept of separation and becoming proficient in SVM, you will enhance your machine learning skills and be well-equipped to tackle classification problems with confidence and accuracy.


    Step 1: What is a Maximum Margin Separator?

    The boundary that maximizes the distance to the closest data points of each class (Wiki).

    The problem can be illustrated as follows.

    In the image to the left, a line separates all the red dots from the blue dots. This separation is perfect. But the line might not be ideal when more dots are added. Imagine another blue dot is added (right image).

    Could we have chosen a better line of separation?

    As you see above, there was a better line to choose from the start: the one that lies farthest from the nearest points of both classes.
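
    The idea of a "better" line can be made concrete by measuring the margin, that is, the distance from a line to the closest point. The following is a minimal sketch using NumPy. The points and the two candidate lines are made up for illustration and are not taken from the images above, and `margin` is a hypothetical helper, not part of any library.

```python
import numpy as np

# Hypothetical toy data: two classes in 2D (made up for illustration)
red = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.5]])
blue = np.array([[4.0, 4.0], [5.0, 3.5], [4.5, 5.0]])
points = np.vstack([red, blue])


def margin(w, b, points):
    """Smallest distance from any point to the line w.x + b = 0."""
    w = np.asarray(w, dtype=float)
    return np.min(np.abs(points @ w + b)) / np.linalg.norm(w)


# Two candidate lines that both separate red from blue perfectly
line_a = ([1.0, 0.0], -3.8)  # vertical line x = 3.8, very close to a blue point
line_b = ([1.0, 1.0], -6.0)  # diagonal line x + y = 6, between the two groups

print("margin of line A:", margin(*line_a, points))
print("margin of line B:", margin(*line_b, points))
```

    Both lines classify every point correctly, but line B keeps a much larger distance to the nearest points, so a new dot that lands slightly off is less likely to end up on the wrong side.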

    Step 2: What is a Support Vector Machine (SVM)?

    The Support Vector Machine solves the separation problem stated above.

    In machine learning, support-vector machines (SVMs, also support-vector networks) are supervised learning models with associated learning algorithms that analyze data for classification and regression analysis (source: wiki).

    But what do we use SVM for?

    • Classifying data
    • Face detection
    • Classification of images
    • Handwriting recognition
    • Inverse geosounding problem
    • Facial expression recognition
    • Text classification

    Among other things.

    But basically, it is all about classifying data. That is, given a collection of data and a set of categories for this data, the model helps classify the data into the correct categories.

    For example, for facial expressions you might have the categories happy, sad, surprised, and angry. Given an image of a face, the model assigns it to one of these categories.

    How does it do it?

    Well, you need training data with correct labels.
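
    As a rough sketch of what "training data with correct labels" means in practice, the example below fits Sklearn's SVC on a handful of labeled points and then classifies two new ones. The numbers are hypothetical and chosen only so that the two classes are easy to tell apart.

```python
import numpy as np
from sklearn import svm

# Hypothetical labeled training data: each row is a point, each label its class
X_train = np.array([[1, 1], [2, 1], [1, 2], [5, 5], [6, 5], [5, 6]])
y_train = np.array([0, 0, 0, 1, 1, 1])

# Fit a linear SVM on the labeled examples
clf = svm.SVC(kernel='linear')
clf.fit(X_train, y_train)

# Classify points the model has not seen during training
new_points = np.array([[1.5, 1.5], [5.5, 5.5]])
print(clf.predict(new_points))

# The training points closest to the boundary (the support vectors)
print(clf.support_vectors_)
```

    Only the support vectors determine where the boundary ends up. The remaining training points could be moved around, as long as they stay farther from the boundary, without changing the model.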

    In this tutorial we will make a gentle introduction to classification based on simple data.

    Step 3: Gender classification based on height and hair length

    Let’s consider a list of measured heights and hair lengths with the given gender.

    import pandas as pd
    url = 'https://raw.githubusercontent.com/LearnPythonWithRune/MachineLearningWithPython/main/files/gender.csv'
    data = pd.read_csv(url)
    print(data.head())
    

    Resulting in this.

       Height  Hair length Gender
    0     151           99      F
    1     193            8      M
    2     150          123      F
    3     176            0      M
    4     188           11      M
    

    Step 4: Visualize the data

    You can visualize the data as follows.

    import pandas as pd
    import matplotlib.pyplot as plt
    url = 'https://raw.githubusercontent.com/LearnPythonWithRune/MachineLearningWithPython/main/files/gender.csv'
    data = pd.read_csv(url)
    data['Class'] = data['Gender'].apply(lambda x: 'r' if x == 'F' else 'b')
    data = data.iloc[:25]
    fig, ax = plt.subplots()
    ax.scatter(x=data['Height'], y=data['Hair length'], c=data['Class'])
    plt.show()
    

    Here we only keep the first 25 points to simplify the plot. Red dots are F and blue dots are M.

    Step 5: Creating an SVC model

    We will use Sklearn's SVC (Support Vector Classification (docs)) model to fit the data.

    import pandas as pd
    import numpy as np
    from sklearn import svm
    url = 'https://raw.githubusercontent.com/LearnPythonWithRune/MachineLearningWithPython/main/files/gender.csv'
    data = pd.read_csv(url)
    data['Class'] = data['Gender'].apply(lambda x: 'r' if x == 'F' else 'b')
    X = data[['Height', 'Hair length']]
    y = data['Gender']
    y = np.array([0 if gender == 'M' else 1 for gender in y])
    clf = svm.SVC(kernel='linear')
    clf.fit(X, y)
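
    The code above fits the model on all rows, so it says nothing yet about how well the model classifies data it has not seen. One common way to check this is to hold out part of the data, sketched below. To keep the example self-contained it uses a synthetic stand-in for gender.csv: the numbers are randomly generated assumptions and are not the real file.

```python
import numpy as np
import pandas as pd
from sklearn import svm
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for gender.csv (made-up numbers, NOT the real data)
rng = np.random.default_rng(0)
n = 100
data = pd.DataFrame({
    'Height': np.concatenate([rng.normal(165, 7, n), rng.normal(180, 7, n)]),
    'Hair length': np.concatenate([
        rng.normal(80, 20, n),
        np.clip(rng.normal(15, 10, n), 0, None),  # hair length cannot be negative
    ]),
    'Gender': ['F'] * n + ['M'] * n,
})

X = data[['Height', 'Hair length']]
y = np.array([0 if gender == 'M' else 1 for gender in data['Gender']])

# Hold out 25% of the rows so the score reflects unseen data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

clf = svm.SVC(kernel='linear')
clf.fit(X_train, y_train)

accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"Held-out accuracy: {accuracy:.2f}")
```

    With the real dataset, the same split can be applied to the X and y built in the tutorial code.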
    

    Step 6: Visualize the model

    We fill a “box” that covers the range of the data with random points and color each point by the model’s prediction.

    import pandas as pd
    import numpy as np
    from sklearn import svm
    import matplotlib.pyplot as plt
    url = 'https://raw.githubusercontent.com/LearnPythonWithRune/MachineLearningWithPython/main/files/gender.csv'
    data = pd.read_csv(url)
    data['Class'] = data['Gender'].apply(lambda x: 'r' if x == 'F' else 'b')
    X = data[['Height', 'Hair length']]
    y = data['Gender']
    y = np.array([0 if gender == 'M' else 1 for gender in y])
    clf = svm.SVC(kernel='linear')
    clf.fit(X, y)
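    # Sample 10,000 random points with heights in 140-210 and hair lengths in 0-140
    X_test = np.random.rand(10000, 2)
    X_test = X_test*(70, 140) + (140, 0)
    # Use the training column names so predict() does not warn about feature names
    X_test = pd.DataFrame(X_test, columns=X.columns)
    y_pred = clf.predict(X_test)
    fig, ax = plt.subplots()
    # Background: the predicted class of each random point (0 = M, 1 = F)
    ax.scatter(x=X_test['Height'], y=X_test['Hair length'], c=y_pred, alpha=.25)
    # Foreground: the actual data, colored as in Step 4 (F = red, M = blue)
    y_color = ['b' if value == 0 else 'r' for value in y]
    ax.scatter(x=X['Height'], y=X['Hair length'], c=y_color)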
    plt.show()
    

    Resulting in.
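
    For a linear kernel, the border between the two predicted regions is a straight line, and it can be read directly from the fitted model through its `coef_` and `intercept_` attributes. The sketch below shows one way to do that. It again uses randomly generated stand-in data (an assumption, not the real gender.csv), so the printed numbers will differ from what the real dataset gives.

```python
import numpy as np
from sklearn import svm

# Synthetic stand-in for the tutorial data (made-up numbers, NOT the real file)
rng = np.random.default_rng(1)
n = 100
height = np.concatenate([rng.normal(165, 7, n), rng.normal(180, 7, n)])
hair = np.concatenate([
    rng.normal(80, 20, n),
    np.clip(rng.normal(15, 10, n), 0, None),
])
X = np.column_stack([height, hair])
y = np.array([1] * n + [0] * n)  # 1 = F, 0 = M, as in the tutorial

clf = svm.SVC(kernel='linear')
clf.fit(X, y)

# For a linear kernel the boundary is: w0 * height + w1 * hair + b = 0
w0, w1 = clf.coef_[0]
b = clf.intercept_[0]

# Solve for hair length at a few heights to get points on the boundary
heights = np.linspace(140, 210, 5)
boundary_hair = -(w0 * heights + b) / w1

for h, hl in zip(heights, boundary_hair):
    print(f"height {h:.0f} -> boundary at hair length {hl:.1f}")
```

    The pairs in `heights` and `boundary_hair` can be passed to `ax.plot` to draw the separating line on top of the scatter plot.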

    Want to learn more?

    In the next lesson you will learn how to use Multiple Linear Regression.

    This is part of a FREE 10h Machine Learning course with Python.

    • 15 video lessons – which explain Machine Learning concepts, demonstrate models on real data, introduce projects and show a solution (YouTube playlist).
    • 30 Jupyter Notebooks – with the full code and explanation from the lectures and projects (GitHub).
    • 15 projects – with step guides to help you structure your solutions, and solutions explained at the end of the video lessons (GitHub).
