Master Unsupervised Learning with k-Means Clustering

Why master Unsupervised Learning?

Mastering Unsupervised Learning offers several benefits and expands your skills in the field of machine learning:

  • Broaden your understanding: Unsupervised Learning takes you beyond the boundaries of traditional supervised learning. By mastering this approach, you can explore and analyze large amounts of unlabeled data, uncovering hidden patterns and structures that may not be evident through other methods.
  • Organize data without prior knowledge: Unsupervised Learning enables you to organize and make sense of data without relying on predefined labels or prior knowledge. This capability is particularly valuable when working with unstructured or unlabeled datasets, providing insights into the underlying structure and relationships within the data.
  • Learn k-Means Clustering: In this lesson, you will gain a solid understanding of k-Means Clustering, a popular unsupervised learning algorithm. You will learn how it works and how to apply it to cluster data points based on their similarity, allowing you to identify distinct groups or clusters within your dataset.
  • Train a k-Means Clustering model: Through practical exercises, you will train your own k-Means Clustering model. This hands-on experience will deepen your understanding of the algorithm and equip you with the skills to apply it to real-world datasets.

By mastering Unsupervised Learning and specifically k-Means Clustering, you will unlock new possibilities for data exploration, pattern discovery, and knowledge extraction from unlabeled data.

Step 1: What is Unsupervised Learning?

Machine Learning is often divided into 3 main categories.

  • Supervised: you tell the algorithm which category each data item belongs to. Every data item in the training set is tagged with the right answer.
  • Unsupervised: the learning algorithm is given no labels and has to find the structure in the data by itself.
  • Reinforcement: the machine learns which actions to take based on the rewards its past actions received.

As the list shows, Unsupervised Learning is one of the three main categories.
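To make the difference between the first two categories concrete, here is a minimal sketch using scikit-learn. The tiny dataset and the choice of KNeighborsClassifier as the supervised model are assumptions made only for illustration; they are not part of the lesson. Notice that the supervised model is fitted on points and labels, while the unsupervised model is fitted on the points alone.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

# Tiny made-up dataset (an assumption for illustration): two groups of 2-D points.
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.9, 8.1]])
y = np.array([0, 0, 0, 1, 1, 1])  # labels, only available in the supervised case

# Supervised: the model is trained on the points AND their labels.
classifier = KNeighborsClassifier(n_neighbors=1).fit(X, y)
predicted = classifier.predict([[1.1, 1.0]])
print("Supervised prediction for a new point:", predicted)

# Unsupervised: the model sees only the points and must find the groups itself.
clusterer = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("Unsupervised cluster assignments:", clusterer.labels_)
```

The cluster numbers found by the unsupervised model are arbitrary: it can tell that there are two groups, but it has no way of knowing what we would have called them.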

Unsupervised learning is a type of algorithm that learns patterns from untagged data. The hope is that through mimicry, which is an important mode of learning in people, the machine is forced to build a compact internal representation of its world and then generate imaginative content from it. In contrast to supervised learning where data is tagged by an expert, e.g. as a “ball” or “fish”, unsupervised methods exhibit self-organization that captures patterns as probability densities…

https://en.wikipedia.org/wiki/Unsupervised_learning

Step 2: k-Means Clustering

What is clustering?

Clustering organizes a set of objects into groups in such a way that similar objects tend to end up in the same group.

What is k-Means Clustering?

k-Means Clustering is an algorithm that clusters data by repeatedly assigning points to clusters and updating those clusters’ centers.

Example of how it works, step by step.
  • First we choose random cluster centroids (hollow points), then assign each point to its nearest centroid.
  • Then we update each centroid to be the center (the mean) of the points assigned to it.
  • Repeat the assignment and update steps.

This can be repeated a fixed number of times or until the centroid positions change only slightly.
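The steps above can be sketched in a few lines of NumPy. This is a simplified illustration, not how scikit-learn implements it: the function name kmeans_sketch, its parameters, and the two-blob test data are all hypothetical, and the initial centroids are assumed to be picked among the data points.

```python
import numpy as np


def assign(points, centroids):
    """Return, for every point, the index of its nearest centroid."""
    distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1)


def kmeans_sketch(points, k, n_iter=10, seed=0):
    """Toy k-means (hypothetical helper): returns (centroids, labels)."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 1: choose k distinct data points as the initial centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 2: assign every point to its nearest centroid.
        labels = assign(points, centroids)
        # Step 3: move each centroid to the mean of its assigned points.
        new_centroids = centroids.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                new_centroids[j] = members.mean(axis=0)
        # Stop early once the centroids no longer move noticeably.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment so the labels match the returned centroids.
    return centroids, assign(points, centroids)


# Usage on two well-separated blobs of made-up data.
rng = np.random.default_rng(1)
blob_a = rng.normal(loc=(0, 0), scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=(10, 10), scale=0.5, size=(50, 2))
points = np.vstack([blob_a, blob_b])

centroids, labels = kmeans_sketch(points, k=2)
print(centroids)
```

The early exit corresponds to the "until the centroids change only slightly" stopping rule, while n_iter caps the number of repetitions.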

Step 3: Create an Example

Let’s create some random data to demonstrate it.

import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

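# Generate 400 random 2-D points, then shift each block of 100 to its own center
data = np.random.randn(400,2)
data[:100] += 5, 5        # cluster around (5, 5)
data[100:200] += 10, 10   # cluster around (10, 10)
data[200:300] += 10, 5    # cluster around (10, 5)
data[300:] += 5, 10       # cluster around (5, 10)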

fig, ax = plt.subplots()

ax.scatter(x=data[:,0], y=data[:,1])
plt.show()

This shows some random data in 4 clusters.

The following code demonstrates how the algorithm works. Set max_iter to the number of iterations to run; try 1, 2, 3, and so on.

model = KMeans(n_clusters=4, init='random', random_state=42, max_iter=10, n_init=1)

model.fit(data)

y_pred = model.predict(data)

fig, ax = plt.subplots()
ax.scatter(x=data[:,0], y=data[:,1], c=y_pred)
ax.scatter(x=model.cluster_centers_[:,0], y=model.cluster_centers_[:,1], c='r')
plt.show()

After the 1st iteration, the cluster centers are not yet optimal.
After 10 iterations, the centers have settled in place.
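If you prefer numbers to plots, one way to follow the same progression is to fit the model with several max_iter values and print the inertia, which is the sum of squared distances from each point to its assigned center. This is a sketch under a few assumptions: the data is regenerated with a fixed seed so the run is repeatable, and the list of max_iter values is arbitrary. scikit-learn may also warn that the model did not converge for the small values, which is expected here.

```python
import numpy as np
from sklearn.cluster import KMeans

# Seeded stand-in for the four-blob data from Step 3 (the seed is an assumption).
rng = np.random.default_rng(0)
data = rng.standard_normal((400, 2))
data[:100] += 5, 5
data[100:200] += 10, 10
data[200:300] += 10, 5
data[300:] += 5, 10

inertias = {}
for max_iter in (1, 2, 3, 10):
    # Same settings as above; only the iteration cap changes.
    model = KMeans(n_clusters=4, init='random', random_state=42,
                   max_iter=max_iter, n_init=1)
    model.fit(data)
    inertias[max_iter] = model.inertia_
    print(f"max_iter={max_iter:2d}  iterations run={model.n_iter_}  "
          f"inertia={model.inertia_:.1f}")
```

Because random_state fixes the starting centroids, each run continues the same sequence of updates, so a lower inertia at a higher max_iter reflects the centers moving closer to their points.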

Want to learn more?

In the next lesson you will learn about Artificial Neural Networks.

This is part of a FREE 10h Machine Learning course with Python.

  • 15 video lessons – which explain Machine Learning concepts, demonstrate models on real data, introduce projects and show a solution (YouTube playlist).
  • 30 Jupyter Notebooks – with the full code and explanation from the lectures and projects (GitHub).
  • 15 projects – with step-by-step guides to help you structure your solutions, and solutions explained at the end of the video lessons (GitHub).
Rune
