Support-Vector Machine: Classify using Sklearn

What will we cover?

In this tutorial we will cover the following.

  • Learn about the problem of separation
  • See why maximizing the distance to the data points gives a better boundary
  • Work with examples to demonstrate the issue
  • Use the Support Vector Machine (SVM) model on data
  • Explore the result of SVM on classification data

Step 1: What is Maximum Margin Separator?

A maximum margin separator is the boundary that maximizes the distance to the nearest data points on either side of it (Wiki).

The problem can be illustrated as follows.

Looking at the image to the left, a line separates all the red dots from the blue dots. This separation is perfect, but the line might not be ideal when more dots arrive. Imagine another blue dot is added (right image).

Could we have chosen a better line of separation?

As you see above, there is a better line to choose from the start: the one that is farthest from the nearest points on both sides.
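To make the idea concrete, here is a minimal sketch that compares two separating lines by the distance to their nearest point. The toy points and the helper `margin` are made up for illustration and are not part of the tutorial's dataset.

```python
import numpy as np

# Hypothetical toy points (not from the tutorial's dataset).
red = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.5]])
blue = np.array([[4.0, 4.0], [5.0, 3.5], [4.5, 5.0]])

def margin(w, b, points_a, points_b):
    """Smallest distance from any point to the line w·x + b = 0,
    or None if the line does not separate the two groups."""
    w = np.asarray(w, dtype=float)
    side_a = points_a @ w + b
    side_b = points_b @ w + b
    separates = (np.all(side_a < 0) and np.all(side_b > 0)) or \
                (np.all(side_a > 0) and np.all(side_b < 0))
    if not separates:
        return None
    distances = np.abs(np.concatenate([side_a, side_b])) / np.linalg.norm(w)
    return distances.min()

# Two candidate lines that both separate the groups perfectly.
close_line = margin([1.0, 0.0], -2.2, red, blue)  # vertical line x = 2.2
wide_line = margin([1.0, 1.0], -6.0, red, blue)   # line x + y = 6

print('nearest point to x = 2.2:  ', close_line)
print('nearest point to x + y = 6:', wide_line)
```

Both lines separate the points perfectly, but the second one keeps a larger distance to its nearest point, so a new dot close to the existing ones is less likely to end up on the wrong side.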

Step 2: What is Support Vector Machine (SVM)?

The Support Vector Machine solves the separation problem stated above.

In machine learning, support-vector machines (SVMs, also support-vector networks) are supervised learning models with associated learning algorithms that analyze data for classification and regression analysis (source: wiki).

But what do we use SVM for?

  • Classifying data
  • Face detection
  • Classification of images
  • Handwriting recognition
  • Inverse geosounding problem
  • Facial expression classification
  • Text classification

Among other things.

But basically, it is all about classifying data. That is, given a collection of data and a set of categories for this data, the model helps classify the data into the correct categories.

For example, with facial expressions you might have the categories happy, sad, surprised, and angry. Then, given an image of a face, the model can assign it to one of the categories.

How does it do it?

Well, you need training data with correct labels.

In this tutorial we will make a gentle introduction to classification based on simple data.

Step 3: Gender classification based on height and hair length

Let’s consider a list of measured heights and hair lengths with the given gender.

import pandas as pd
url = 'https://raw.githubusercontent.com/LearnPythonWithRune/MachineLearningWithPython/main/files/gender.csv'
data = pd.read_csv(url)
print(data.head())

This results in the following output.

   Height  Hair length Gender
0     151           99      F
1     193            8      M
2     150          123      F
3     176            0      M
4     188           11      M

Step 4: Visualize the data

You can visualize the data as follows.

import pandas as pd
import matplotlib.pyplot as plt

url = 'https://raw.githubusercontent.com/LearnPythonWithRune/MachineLearningWithPython/main/files/gender.csv'
data = pd.read_csv(url)
# Color each point by gender: red for F, blue for M
data['Class'] = data['Gender'].apply(lambda x: 'r' if x == 'F' else 'b')
# Keep only the first 25 rows to simplify the plot
data = data.iloc[:25]
fig, ax = plt.subplots()
ax.scatter(x=data['Height'], y=data['Hair length'], c=data['Class'])
ax.set_xlabel('Height')
ax.set_ylabel('Hair length')
plt.show()

Here we only keep the first 25 points to simplify the plot.

Step 5: Creating an SVC model

We will use Sklearn’s SVC (Support Vector Classification (docs)) model to fit the data.

import pandas as pd
import numpy as np
from sklearn import svm

url = 'https://raw.githubusercontent.com/LearnPythonWithRune/MachineLearningWithPython/main/files/gender.csv'
data = pd.read_csv(url)
# Features: the two measurements
X = data[['Height', 'Hair length']]
# Labels encoded as numbers: M -> 0, F -> 1
y = np.array([0 if gender == 'M' else 1 for gender in data['Gender']])
# A linear kernel separates the two classes with a straight line
clf = svm.SVC(kernel='linear')
clf.fit(X, y)
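Once the model is fitted, it can classify new measurements. The sketch below is a small offline variant of the step above: it trains on only the five rows printed in Step 3 (typed in by hand, so no download is needed), and the two new people are made-up values for illustration. The names `sample`, `clf_small` and `new_people` are not part of the tutorial's code.

```python
import pandas as pd
from sklearn import svm

# The five rows printed in Step 3, typed in by hand so the sketch runs offline.
sample = pd.DataFrame({
    'Height': [151, 193, 150, 176, 188],
    'Hair length': [99, 8, 123, 0, 11],
    'Gender': ['F', 'M', 'F', 'M', 'M'],
})

X_small = sample[['Height', 'Hair length']]
# Same encoding as in the step above: M -> 0, F -> 1
y_small = (sample['Gender'] == 'F').astype(int)

clf_small = svm.SVC(kernel='linear')
clf_small.fit(X_small, y_small)

# Hypothetical new measurements (made up for illustration).
new_people = pd.DataFrame({'Height': [160, 185], 'Hair length': [90, 5]})
predictions = clf_small.predict(new_people)
labels = ['F' if p == 1 else 'M' for p in predictions]
print(labels)

# The training points that define the boundary (the "support vectors").
print(clf_small.support_vectors_)
```

Only the support vectors determine where the line ends up; the remaining training points could be removed without changing the model.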

Step 6: Visualize the model

We create a “box” of random points covering heights from 140 to 210 and hair lengths from 0 to 140, and color each point by the model’s prediction.

import pandas as pd
import numpy as np
from sklearn import svm
import matplotlib.pyplot as plt

url = 'https://raw.githubusercontent.com/LearnPythonWithRune/MachineLearningWithPython/main/files/gender.csv'
data = pd.read_csv(url)
X = data[['Height', 'Hair length']]
# Labels encoded as numbers: M -> 0, F -> 1
y = np.array([0 if gender == 'M' else 1 for gender in data['Gender']])
clf = svm.SVC(kernel='linear')
clf.fit(X, y)

# The "box": 10,000 random points with height in [140, 210) and hair length in [0, 140)
X_test = np.random.rand(10000, 2)
X_test = X_test*(70, 140) + (140, 0)
# Wrap the points in a DataFrame so the column names match the training data
y_pred = clf.predict(pd.DataFrame(X_test, columns=X.columns))

fig, ax = plt.subplots()
# Background: the predicted class of each random point
ax.scatter(x=X_test[:,0], y=X_test[:,1], c=y_pred, alpha=.25)
# Foreground: the training points, red for F and blue for M as in Step 4
y_color = ['b' if value == 0 else 'r' for value in y]
ax.scatter(x=X['Height'], y=X['Hair length'], c=y_color)
plt.show()

The resulting plot shows the training points on top of the two regions predicted by the model.
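With a linear kernel, the boundary between the two regions is a straight line that can be read directly from the fitted model through its `coef_` and `intercept_` attributes. The following is a minimal sketch, assuming only the five rows printed in Step 3 as training data (so the numbers will differ from a model fitted on the full file); the names `clf_small`, `heights` and `boundary_hair` are chosen here for illustration.

```python
import numpy as np
from sklearn import svm

# The five rows printed in Step 3 (Height, Hair length), typed in by hand.
X_small = np.array([[151, 99], [193, 8], [150, 123], [176, 0], [188, 11]], dtype=float)
y_small = np.array([1, 0, 1, 0, 0])  # same encoding as above: M -> 0, F -> 1

clf_small = svm.SVC(kernel='linear')
clf_small.fit(X_small, y_small)

# For a linear kernel the boundary is w[0]*height + w[1]*hair_length + b = 0.
w = clf_small.coef_[0]
b = clf_small.intercept_[0]

# Distance between the two margin lines on either side of the boundary.
margin_width = 2 / np.linalg.norm(w)

# Two points on the boundary: solve the equation for hair length at two heights.
heights = np.array([140.0, 210.0])
boundary_hair = -(w[0] * heights + b) / w[1]

print('w =', w, 'b =', b)
print('margin width =', margin_width)
print('boundary passes through', list(zip(heights, boundary_hair)))
```

Passing `heights` and `boundary_hair` to `ax.plot` would draw the line on top of the scatter plot. The margin width is the quantity the model maximizes, which ties back to the maximum margin separator from Step 1.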

Want to learn more?

This is part of a FREE 10h Machine Learning course with Python.

  • 15 video lessons – which explain Machine Learning concepts, demonstrate models on real data, introduce projects and show a solution (YouTube playlist).
  • 30 Jupyter Notebooks – with the full code and explanation from the lectures and projects (GitHub).
  • 15 projects – with step guides to help you structure your solutions, and solutions explained at the end of the video lessons (GitHub).
