Step 1: Learn what unsupervised machine learning is
An unsupervised machine learning model takes unlabelled (or uncategorised) data and lets the algorithm determine the answer for us.
The model takes data without any apparent structure and tries to identify patterns in it on its own in order to create categories.
Step 2: Understand the main types of unsupervised machine learning
There are two main types of unsupervised machine learning.
- Clustering: Used for grouping data into categories without knowing any labels beforehand.
- Association: A rule-based method for discovering interesting relations between variables in large databases.
In clustering, the main algorithms used are K-means, hierarchical clustering, and the hidden Markov model.
In association, the main algorithms used are Apriori and FP-growth.
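To make the idea of an association rule a little more concrete, here is a minimal sketch of the two measures such rules are usually scored by, support and confidence. It is not Apriori or FP-growth themselves, only the quantities they work with. The shopping baskets and the function names `support` and `confidence` are made up for illustration.

```python
# Hypothetical shopping baskets, invented for this illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions containing antecedent, the fraction that also contain consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Score the rule "bread -> milk".
print(support({"bread", "milk"}, transactions))
print(confidence({"bread"}, {"milk"}, transactions))
```

A rule is typically considered interesting when both numbers are above thresholds you choose yourself.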
Step 3: How does K-means work?
K-means works in iterative steps.
The k-means clustering problem is NP-hard, which means no efficient algorithm is known that solves it exactly in the general case. For this problem there are heuristic algorithms that converge quickly to a local optimum: you find a good solution fast, and although it might not be the best one, it is often good enough in practice.
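A small sketch can show what "local optimum" means here. The heuristic stops once every point is already closest to the mean of its own cluster, and more than one grouping can satisfy that. The four points and the helper names `cluster_means`, `cost` and `is_stable` are assumptions made for this illustration.

```python
import numpy as np

# Four hypothetical points on the corners of a wide rectangle.
points = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 0.0], [4.0, 1.0]])


def cluster_means(points, labels, k):
    """Mean of the points in each of the k clusters."""
    return np.array([points[labels == j].mean(axis=0) for j in range(k)])


def cost(points, labels, k):
    """Sum of squared distances from each point to the mean of its cluster."""
    means = cluster_means(points, labels, k)
    return float(((points - means[labels]) ** 2).sum())


def is_stable(points, labels, k):
    """True if every point is already assigned to its nearest mean."""
    means = cluster_means(points, labels, k)
    distances = np.linalg.norm(points[:, None, :] - means[None, :, :], axis=2)
    return bool((distances.argmin(axis=1) == labels).all())


left_right = np.array([0, 0, 1, 1])  # left pair versus right pair
bottom_top = np.array([0, 1, 0, 1])  # bottom pair versus top pair

for name, labels in [("left/right", left_right), ("bottom/top", bottom_top)]:
    print(name, cost(points, labels, 2), is_stable(points, labels, 2))
```

Both groupings are stable, but the left/right split has a cost of 1.0 while the bottom/top split has a cost of 16.0. A heuristic that ends up in the second one has found a local optimum that is not the best one.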
How does the algorithm work?
- Step 1: Start with a set of k means. These can be chosen by taking k random points from the dataset (called the Forgy initialisation method).
- Step 2: Group each data point into the cluster of the nearest mean. Hence, each data point will be assigned to exactly one cluster.
- Step 3: Recalculate the means (also called centroids) as the average of the points in each cluster, which moves them towards a local optimum.
Steps 2 and 3 are repeated until the grouping in Step 2 does not change any more.
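The three steps above can be sketched directly with NumPy. This is a minimal illustration rather than a production implementation. The function name `k_means`, the `seed` and `max_iter` parameters, and the choice to keep the old mean when a cluster becomes empty are all assumptions made for this sketch.

```python
import numpy as np


def k_means(points, k, seed=0, max_iter=100):
    """Minimal k-means sketch: returns (labels, means)."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct data points as the initial means.
    means = points[rng.choice(len(points), size=k, replace=False)]
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each point to the cluster of the nearest mean.
        distances = np.linalg.norm(points[:, None, :] - means[None, :, :], axis=2)
        new_labels = distances.argmin(axis=1)
        # Stop when the grouping does not change any more.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 3: recalculate the means (assumption: an empty cluster keeps its old mean).
        means = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else means[j]
            for j in range(k)
        ])
    return labels, means


# Seven sample points forming two visible groups.
x = [1, 2, 0.3, 9.2, 2.4, 9, 12]
y = [2, 4, 2.5, 8.5, 0.3, 11, 10]
points = np.column_stack([x, y])

labels, means = k_means(points, k=2)
print(labels)
print(means)
```

Which cluster gets the label 0 and which gets 1 depends on the random starting points, so only the grouping itself is meaningful.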
Step 4: A simple Python example with the k-means algorithm
In this example we assume you already know how to install the needed libraries. If not, then see the following article.
First off, you need to import the needed libraries.
    import numpy as np
    import matplotlib.pyplot as plt
    from matplotlib import style
    from sklearn.cluster import KMeans
In the first basic example we are only going to plot some points on a graph.
    style.use('ggplot')
    x = [1, 2, 0.3, 9.2, 2.4, 9, 12]
    y = [2, 4, 2.5, 8.5, 0.3, 11, 10]
    plt.scatter(x, y)
    plt.show()
The first line sets the style of the graph. Then we have the coordinates in the lists x and y, which is the format plt.scatter expects.
An advantage of plotting the points first is that it helps you figure out how many clusters you want to use. Here it looks like there are two “groups” of points, which translates into using two clusters.
To continue, we want to use the k-means algorithm with two clusters.
    import numpy as np
    import matplotlib.pyplot as plt
    from matplotlib import style
    from sklearn.cluster import KMeans

    style.use('ggplot')
    x = [1, 2, 0.3, 9.2, 2.4, 9, 12]
    y = [2, 4, 2.5, 8.5, 0.3, 11, 10]

    # Transform the input coordinates into the (n_samples, 2) array that KMeans expects
    X = []
    for i in range(len(x)):
        X.append([x[i], y[i]])
    X = np.array(X)

    # The number of clusters
    kmeans = KMeans(n_clusters=2)
    kmeans.fit(X)
    labels = kmeans.labels_
    # The fitted cluster centres (the means)
    centroids = kmeans.cluster_centers_

    # Then we want to have different colors for each cluster.
    colors = ['g.', 'r.']
    for i in range(len(X)):
        # And plot the points one at a time
        plt.plot(X[i][0], X[i][1], colors[labels[i]], markersize=10)

    # Plot the centres (or means)
    plt.scatter(centroids[:, 0], centroids[:, 1], marker="x", s=150, linewidths=5, zorder=10)
    plt.show()
This produces the following plot.
Considerations when using K-Means algorithm
We could have changed to use 3 clusters. That would have resulted in the following output.
Three clusters is not optimal for this dataset, but that could be hard to tell without a visual representation of the data.
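When the data cannot easily be plotted, one common heuristic (not part of the example above, so treat it as an optional extra) is to fit the model for several values of k and compare the sum of squared distances to the nearest centroid, which scikit-learn exposes as `inertia_`. A sharp drop followed by small improvements suggests where to stop. The sketch below assumes the same seven points. The range of k values and the `n_init` and `random_state` settings are arbitrary choices made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

x = [1, 2, 0.3, 9.2, 2.4, 9, 12]
y = [2, 4, 2.5, 8.5, 0.3, 11, 10]
X = np.column_stack([x, y])

# Fit k-means for k = 1..4 and record the inertia of each fit.
inertias = {}
for k in range(1, 5):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_
    print(k, round(model.inertia_, 2))
```

For this dataset the inertia falls steeply from one cluster to two and only slightly after that, which agrees with what the scatter plot shows.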
Uses of K-Means algorithm
Here are some interesting uses of the K-means algorithm:
- Personalised marketing to users
- Identifying fake news
- Spam filter in your inbox