Master the Power of Machine Learning Classification: Unlock Insights through Categorization
Understanding the fundamentals of machine learning and classification is essential for leveraging the full potential of data-driven decision making. By mastering the concepts of machine learning and classification, and applying them using popular libraries like pandas and scikit-learn (Sklearn), you can effectively analyze and categorize data, enabling valuable insights and informed decision-making.
Why It’s Great to Master Machine Learning Classification:
- Enhanced Decision Making: Classification algorithms enable accurate categorization of data, empowering you to make informed decisions based on patterns and relationships.
- Versatile Applications: Classification techniques find applications across diverse fields, including finance, healthcare, marketing, and more, offering endless opportunities for data-driven solutions.
- Automation and Efficiency: By automating the classification process, you can efficiently analyze large datasets and generate predictions at scale.
- Predictive Power: Classification algorithms allow you to predict the category or class labels of unseen data, facilitating accurate forecasting and proactive planning.
- Feature Importance: Through classification, you gain insights into the relative importance of different features in determining the class labels, aiding in feature selection and interpretation.
Topics Covered in This Guide
- Introduction to Machine Learning: Explore the foundations of machine learning, its underlying principles, and the various techniques used for data analysis and prediction.
- Understanding Classification: Gain a comprehensive understanding of classification, its goals, and how it enables the categorization of data into distinct classes.
- Classification Algorithms: Dive into popular classification algorithms, including Decision Trees, Logistic Regression, Support Vector Machines (SVM), and k-Nearest Neighbors (k-NN).
- Hands-on Examples: Apply classification techniques using pandas and Sklearn to real-world datasets, demonstrating how to preprocess data, train models, and make predictions.
Unlock the potential of machine learning classification and embark on a journey to extract meaningful insights from your data.
Step 1: What is Machine Learning?

- In the classical computing model, everything is programmed into the algorithms.
- This has the limitation that all decision logic needs to be understood before use.
- And if things change, we need to modify the program.
- With the modern computing model (Machine Learning), this paradigm changes.
- We feed the algorithms (models) with data.
- Based on that data, the algorithms (models) make decisions in the program.
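To make the contrast concrete, here is a minimal sketch in plain Python. The measurements, the labels "small" and "large", the hard-coded threshold 2.5, and the function names are all hypothetical and chosen only for illustration. The first function encodes the decision logic by hand; the second derives it from labeled data.

```python
def classify_by_rule(length):
    """Classical model: the threshold 2.5 is hard-coded by the programmer."""
    return "large" if length > 2.5 else "small"


def learn_threshold(samples):
    """Machine Learning model (very simplified): derive the threshold from data.

    The threshold is the midpoint between the mean of each labeled group.
    """
    small = [x for x, label in samples if label == "small"]
    large = [x for x, label in samples if label == "large"]
    return (sum(small) / len(small) + sum(large) / len(large)) / 2


# Hypothetical labeled training data: (measurement, label)
training_data = [
    (1.0, "small"), (1.4, "small"), (1.6, "small"),
    (4.0, "large"), (4.6, "large"), (5.0, "large"),
]

threshold = learn_threshold(training_data)


def classify_learned(length):
    """Decide with the threshold that was learned from the data."""
    return "large" if length > threshold else "small"


print(classify_by_rule(3.5), classify_learned(3.5))
```

If the data changes, `learn_threshold` can simply be run again, while the hand-written rule would have to be edited.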
Step 2: How Machine Learning works
Machine learning is divided into two phases.
Phase 1: Learning

- Get Data: Identify relevant data for the problem you want to solve. This dataset should represent the type of data that the Machine Learning model will predict from in Phase 2 (prediction).
- Pre-processing: This step is about cleaning up data. While Machine Learning is powerful, it cannot figure out what good data looks like. You need to clean the data and transform it into the desired format.
- Train model: This is where the magic happens, the learning step. There are three main paradigms in machine learning.
  - Supervised: you tell the algorithm which category each data item belongs to. Each data item in the training set is tagged with the right answer.
  - Unsupervised: the learning algorithm is not given the answers and has to find the structure in the data itself.
  - Reinforcement: the algorithm learns which actions to take from the rewards of its past actions.
- Test model: Finally, the model is tested to see if it is good. The data was divided into a training set and a test set. The test set, which the model did not see during training, is used to check whether the model predicts correctly. If not, a new model might be necessary.
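The two phases can be sketched with scikit-learn as follows. This is only an illustration under a few assumptions: it uses the Iris data bundled with scikit-learn (the dataset this guide turns to in Step 4) so that it runs without a download, it picks KNeighborsClassifier as the model, and the values in `new_sample` are hypothetical measurements.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Phase 1: Learning
# Get data (assumption: scikit-learn's bundled Iris data as a stand-in)
X, y = load_iris(return_X_y=True)

# Pre-processing: this dataset is already numeric and has no missing values,
# so there is nothing to clean in this sketch.

# Divide the data into a training set and a test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train model (supervised: y_train holds the right answer for every row)
model = KNeighborsClassifier()
model.fit(X_train, y_train)

# Test model on data it did not see during training
test_accuracy = accuracy_score(y_test, model.predict(X_test))
print("Test accuracy:", test_accuracy)

# Phase 2: Prediction on a new, unlabeled item (hypothetical measurements)
new_sample = [[5.1, 3.5, 1.4, 0.2]]
predicted_class = model.predict(new_sample)
print("Predicted class index:", predicted_class[0])
```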
Phase 2: Prediction

Step 3: What is Supervised Learning
Supervised Learning
- Given a dataset of input-output pairs, learn a function to map inputs to outputs
- There are different tasks, but we start by focusing on Classification
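As a rough illustration of "learning a function from input-output pairs", the sketch below builds a 1-nearest-neighbor classifier in plain Python. The points, the labels "A" and "B", and the function name `fit_nearest_neighbor` are hypothetical; real projects would normally use a library model instead.

```python
import math


def fit_nearest_neighbor(pairs):
    """Return a function that maps an input to the output of its closest training input."""

    def predict(point):
        # Find the training pair whose input is closest (Euclidean distance)
        _, nearest_output = min(pairs, key=lambda pair: math.dist(pair[0], point))
        return nearest_output

    return predict


# Hypothetical dataset of input-output pairs: ((x1, x2), label)
pairs = [
    ((1.0, 1.0), "A"),
    ((1.5, 2.0), "A"),
    ((5.0, 5.0), "B"),
    ((6.0, 5.5), "B"),
]

f = fit_nearest_neighbor(pairs)

print(f((1.2, 1.5)))
print(f((5.5, 5.0)))
```

The learned function `f` was never given an explicit rule for "A" or "B"; it only uses the labeled examples.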
Classification
Step 4: Example with Iris Flower Dataset
The Iris Flower dataset is one of the datasets everyone has to work with.
- Kaggle Iris Flower Dataset
- Consists of three classes: Iris-setosa, Iris-versicolor, and Iris-virginica
- Given the independent features, can we predict the class?
import pandas as pd
data = pd.read_csv('https://raw.githubusercontent.com/LearnPythonWithRune/DataScienceWithPython/main/files/iris.csv', index_col=0)
print(data.head())
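Before modelling, it can help to check the size of the dataset, the class balance, and missing values. The sketch below does not download the CSV; as an assumption, it rebuilds a similar DataFrame from the Iris data bundled with scikit-learn. The column names for the sepal measurements are assumed to follow the Kaggle naming, and the `Iris-` prefix is added to match the class names listed above.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Assumption: a stand-in for the CSV, built from scikit-learn's bundled data
iris = load_iris()
columns = ['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm']
data = pd.DataFrame(iris.data, columns=columns)
data['Species'] = ['Iris-' + iris.target_names[i] for i in iris.target]

# Number of rows and columns
print(data.shape)

# How many samples there are of each class
class_counts = data['Species'].value_counts()
print(class_counts)

# Number of missing values per column
missing = data.isna().sum()
print(missing)
```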

Step 5: Create a Machine Learning Model
- A Few Machine Learning Models
- SVC: C-Support Vector Classification.
- KNeighborsClassifier: Classifier implementing the k-nearest neighbors vote.
The Machine Learning process is divided into a few steps, including dividing the data into a train and a test dataset. The train dataset is used to train the model, while the test dataset is used to check the accuracy of the model.
- Steps
- Step 1: Assign independent features (those predicting) to X
- Step 2: Assign classes (labels/dependent features) to y
- Step 3: Divide into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
- Step 4: Create the model
svc = SVC()
- Step 5: Fit the model
svc.fit(X_train, y_train)
- Step 6: Predict with the model
y_pred = svc.predict(X_test)
- Step 7: Test the accuracy
accuracy_score(y_test, y_pred)
Code example here.
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
X = data.drop('Species', axis=1)
y = data['Species']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
svc = SVC()
svc.fit(X_train, y_train)
y_pred = svc.predict(X_test)
accuracy_score(y_test, y_pred)
The score is the fraction of test samples that were classified correctly. Here it is high, so this gives an accurate model.
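To see what accuracy_score measures, the sketch below computes the same number by hand on a small set of hypothetical labels (these are made-up values, not results from the model above) and compares it with scikit-learn's result.

```python
from sklearn.metrics import accuracy_score

# Hypothetical true labels and predictions, for illustration only
y_true = ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica', 'Iris-versicolor', 'Iris-setosa']
y_hat = ['Iris-setosa', 'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor', 'Iris-setosa']

# Accuracy is the number of correct predictions divided by the number of predictions
correct = sum(t == p for t, p in zip(y_true, y_hat))
manual_accuracy = correct / len(y_true)

# scikit-learn computes the same fraction
sklearn_accuracy = accuracy_score(y_true, y_hat)

print(manual_accuracy, sklearn_accuracy)
```

Four of the five hypothetical predictions match, so both values are 4/5.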
You can do the same with KNeighborsClassifier.
from sklearn.neighbors import KNeighborsClassifier
kn = KNeighborsClassifier()
kn.fit(X_train, y_train)
y_pred = kn.predict(X_test)
accuracy_score(y_test, y_pred)
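Since both classifiers share the same fit and predict interface, they can be compared in a loop on the same split. The sketch below assumes the Iris data bundled with scikit-learn instead of the CSV, so that it runs on its own; the dictionary names `models` and `scores` are only illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Assumption: scikit-learn's bundled Iris data as a stand-in for the CSV
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Both models are trained and tested on exactly the same split
models = {'SVC': SVC(), 'KNeighborsClassifier': KNeighborsClassifier()}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

for name, score in scores.items():
    print(f'{name}: {score:.3f}')
```

A single split of 30 test samples is a small basis for choosing between models, so close scores should not be over-interpreted.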
Step 6: Find the most important features
- permutation_importance: Permutation importance for feature evaluation.
- Use permutation_importance to calculate it.
perm_importance = permutation_importance(svc, X_test, y_test)
- The results will be found in perm_importance.importances_mean
from sklearn.inspection import permutation_importance
perm_importance = permutation_importance(svc, X_test, y_test)
perm_importance.importances_mean
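Permutation importance shuffles one feature at a time and measures how much the score drops, so the result depends on the random shuffles. As a hedged sketch, the example below fixes `random_state` and sets `n_repeats` explicitly (both are existing parameters of permutation_importance; the values 10 and 0 are arbitrary choices) and reports the standard deviation next to the mean. It assumes the Iris data bundled with scikit-learn, whose column names differ from the CSV used above.

```python
from sklearn.datasets import load_iris
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Assumption: scikit-learn's bundled Iris data, loaded as a DataFrame
X, y = load_iris(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
svc = SVC().fit(X_train, y_train)

# Shuffle each feature 10 times; random_state makes the result repeatable
result = permutation_importance(svc, X_test, y_test, n_repeats=10, random_state=0)

# Pair each feature name with its mean importance and standard deviation,
# sorted from most to least important
ranking = sorted(
    zip(X_test.columns, result.importances_mean, result.importances_std),
    key=lambda row: row[1],
    reverse=True,
)
for name, mean, std in ranking:
    print(f'{name}: {mean:.3f} +/- {std:.3f}')
```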
Visualize the features by importance
- The features sorted by importance (from least to most important) are given by perm_importance.importances_mean.argsort()
- HINT: assign it to sorted_idx
- To visualize it we can create a DataFrame
pd.DataFrame(perm_importance.importances_mean[sorted_idx], X_test.columns[sorted_idx], columns=['Value'])
- Then make a barh plot (use figsize)
sorted_idx = perm_importance.importances_mean.argsort()
df = pd.DataFrame(perm_importance.importances_mean[sorted_idx], X_test.columns[sorted_idx], columns=['Value'])
df.plot.barh()

color_map = {'Iris-setosa': 'b', 'Iris-versicolor': 'r', 'Iris-virginica': 'y'}
colors = data['Species'].apply(lambda x: color_map[x])
data.plot.scatter(x='PetalLengthCm', y='PetalWidthCm', c=colors)

Want to learn more?
Want to learn more about Data Science to become a successful Data Scientist?
In the next lesson you will learn How to make Feature Scaling with pandas DataFrames in this Data Science course.
This is one lesson of a 15 part Expert Data Science Blueprint course with the following resources.
- 15 video lessons that cover the Data Science Workflow and concepts, demonstrate everything on real data, introduce projects and show a solution (YouTube video).
- 30 Jupyter Notebooks with the full code and explanation from the lectures and projects (GitHub).
- 15 projects structured with the Data Science Workflow, with a solution explained at the end of the video lessons (GitHub).