    What is Classification – an Introduction to Machine Learning with pandas

    Master the Power of Machine Learning Classification: Unlock Insights through Categorization

    Understanding the fundamentals of machine learning and classification is essential for leveraging the full potential of data-driven decision making. By mastering the concepts of machine learning and classification, and applying them using popular libraries like pandas and scikit-learn (Sklearn), you can effectively analyze and categorize data, enabling valuable insights and informed decision-making.

    Why It’s Great to Master Machine Learning Classification:

    • Enhanced Decision Making: Classification algorithms enable accurate categorization of data, empowering you to make informed decisions based on patterns and relationships.
    • Versatile Applications: Classification techniques find applications across diverse fields, including finance, healthcare, marketing, and more, offering endless opportunities for data-driven solutions.
    • Automation and Efficiency: By automating the classification process, you can efficiently analyze large datasets and generate predictions at scale.
    • Predictive Power: Classification algorithms allow you to predict the category or class labels of unseen data, facilitating accurate forecasting and proactive planning.
    • Feature Importance: Through classification, you gain insights into the relative importance of different features in determining the class labels, aiding in feature selection and interpretation.

    Topics Covered in This Guide

    • Introduction to Machine Learning: Explore the foundations of machine learning, its underlying principles, and the various techniques used for data analysis and prediction.
    • Understanding Classification: Gain a comprehensive understanding of classification, its goals, and how it enables the categorization of data into distinct classes.
    • Classification Algorithms: Dive into popular classification algorithms, including Decision Trees, Logistic Regression, Support Vector Machines (SVM), and k-Nearest Neighbors (k-NN).
    • Hands-on Examples: Apply classification techniques using pandas and Sklearn to real-world datasets, demonstrating how to preprocess data, train models, and make predictions.

    Unlock the potential of machine learning classification and embark on a journey to extract meaningful insights from your data.

    Step 1: What is Machine Learning?

    • In the classical computing model, everything is programmed into the algorithms.
      • This has the limitation that all decision logic needs to be understood before use.
      • And if things change, we need to modify the program.
    • With the modern computing model (Machine Learning), this paradigm changes.
      • We feed the algorithms (models) with data.
      • Based on that data, the algorithms (models) make decisions in the program.
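
    To make the contrast concrete, here is a minimal sketch. The function rule_based, its threshold, and the toy data are made up for illustration; the learned model uses scikit-learn's DecisionTreeClassifier, which is just one possible choice.

```python
from sklearn.tree import DecisionTreeClassifier

# Classical model: the decision logic is written by hand.
# The threshold 2.5 is a hypothetical rule chosen for this toy example.
def rule_based(petal_length):
    return "small" if petal_length < 2.5 else "large"

# Machine Learning model: the decision logic is learned from labeled data.
# Toy data (made up): one feature per sample, two classes.
X = [[1.0], [1.5], [2.0], [4.0], [4.5], [5.0]]
y = ["small", "small", "small", "large", "large", "large"]

model = DecisionTreeClassifier(random_state=0)
model.fit(X, y)

rule_pred = rule_based(1.2)
learned_pred = model.predict([[1.2]])[0]
print(rule_pred, learned_pred)
```

    If the data changes, the hand-written rule has to be rewritten, while the model can be retrained on the new data.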

    Step 2: How Machine Learning works

    Machine learning is divided into two phases.

    Phase 1: Learning

    • Get Data: Identify relevant data for the problem you want to solve. This data set should represent the type of data that the Machine Learning model will predict from in Phase 2 (prediction).
    • Pre-processing: This step is about cleaning up data. While Machine Learning is awesome, it cannot figure out what good data looks like. You need to clean the data and transform it into the desired format.
    • Train model: This is where the magic happens, the learning step. There are three main paradigms in machine learning.
      • Supervised: you tell the algorithm which category each data item is in. Each data item in the training set is tagged with the right answer.
      • Unsupervised: the learning algorithm is not given the answers and must find the structure in the data itself.
      • Reinforcement: the machine learns from the rewards it received for its past actions.
    • Test model: Finally, the model is tested to see if it is good. The data was divided into a training set and a test set. The test set is used to see how well the model predicts on data it was not trained on. If it does not predict well, a new model might be necessary.
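
    As an illustration of the pre-processing step, here is a minimal sketch with pandas. The DataFrame raw and its column names are made up; real data will need its own cleaning rules.

```python
import pandas as pd

# Made-up raw data with typical problems: a missing value,
# numbers stored as text, and inconsistent label spelling.
raw = pd.DataFrame({
    "length": ["5.1", "4.9", None, "6.3"],
    "width": [3.5, 3.0, 3.2, 3.3],
    "label": ["setosa", "Setosa ", "setosa", "virginica"],
})

clean = raw.dropna().copy()                               # drop rows with missing values
clean["length"] = clean["length"].astype(float)           # convert text to numbers
clean["label"] = clean["label"].str.strip().str.lower()   # normalize the labels

print(clean)
```

    Dropping rows is only one way to handle missing values; whether it is appropriate depends on the data.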

    Phase 2: Prediction
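
    Once a model has been trained in Phase 1, prediction means passing it new, unlabeled data. A minimal sketch, assuming scikit-learn's bundled copy of the Iris data and a made-up measurement new_flower; KNeighborsClassifier is used only as an example model.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

# Phase 1 (learning), compressed: train on scikit-learn's bundled Iris data.
iris = load_iris()
model = KNeighborsClassifier()
model.fit(iris.data, iris.target)

# Phase 2 (prediction): a new, unlabeled measurement (values made up),
# given as sepal length, sepal width, petal length, petal width in cm.
new_flower = [[5.0, 3.4, 1.5, 0.2]]
predicted = iris.target_names[model.predict(new_flower)[0]]
print(predicted)
```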

    Step 3: What is Supervised Learning

    Supervised Learning

    • Given a dataset of input-output pairs, learn a function to map inputs to outputs
    • There are different tasks, but we start by focusing on Classification

    Classification

    • Supervised learning: the task of learning a function mapping an input point to a discrete category
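
    The definition can be sketched in a few lines. The input-output pairs below are made up, and DecisionTreeClassifier stands in for any classifier; the point is that the learned function only ever returns one of the discrete categories it was trained on.

```python
from sklearn.tree import DecisionTreeClassifier

# Input-output pairs (made up): [hours of sunshine, rainfall in mm] -> weather label.
inputs = [[9, 0], [8, 1], [2, 12], [1, 20], [7, 0], [3, 15]]
outputs = ["sunny", "sunny", "rainy", "rainy", "sunny", "rainy"]

# Learn a function f: input -> category from the pairs.
f = DecisionTreeClassifier(random_state=0).fit(inputs, outputs)

# The categories the function can output are the ones seen in training.
print(f.classes_)

# Apply the learned function to two new input points.
predictions = f.predict([[6, 2], [0, 30]])
print(predictions)
```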

    Step 4: Example with Iris Flower Dataset

    The Iris Flower dataset is one of the datasets everyone has to work with.

    • Kaggle Iris Flower Dataset
    • Consists of three classes: Iris-setosa, Iris-versicolor, and Iris-virginica
    • Given the independent features, can we predict the class?
    import pandas as pd
    data = pd.read_csv('https://raw.githubusercontent.com/LearnPythonWithRune/DataScienceWithPython/main/files/iris.csv', index_col=0)
    print(data.head())
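
    Before modelling, it can help to check the class balance and the per-class averages. To stay runnable offline, this sketch builds a similar DataFrame (named iris_df here) from scikit-learn's bundled copy of Iris rather than from the CSV above; the column names are assumptions chosen to mirror the Kaggle file.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Build a DataFrame shaped like the Kaggle CSV (column names are assumed).
iris = load_iris()
iris_df = pd.DataFrame(
    iris.data,
    columns=['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm'],
)
iris_df['Species'] = ['Iris-' + iris.target_names[i] for i in iris.target]

print(iris_df.shape)                      # number of rows and columns
print(iris_df['Species'].value_counts())  # samples per class
print(iris_df.groupby('Species').mean())  # average measurements per class
```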
    

    Step 5: Create a Machine Learning Model

    • A Few Machine Learning Models

    Machine Learning is divided into a few steps, including dividing the data into a train and a test dataset. The train dataset is used to train the model, while the test dataset is used to check the accuracy of the model.

    • Steps
      • Step 1: Assign independent features (those predicting) to X
      • Step 2: Assign classes (labels/dependent features) to y
      • Step 3: Divide into training and test sets: X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
      • Step 4: Create the model: svc = SVC()
      • Step 5: Fit the model: svc.fit(X_train, y_train)
      • Step 6: Predict with the model: y_pred = svc.predict(X_test)
      • Step 7: Test the accuracy: accuracy_score(y_test, y_pred)

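    The steps combined into one code example.

    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC
    from sklearn.metrics import accuracy_score
    # Steps 1-2: features (X) and class labels (y)
    X = data.drop('Species', axis=1)
    y = data['Species']
    # Step 3: hold out 20% of the rows for testing
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    # Steps 4-5: create and fit the model
    svc = SVC()
    svc.fit(X_train, y_train)
    # Steps 6-7: predict on the test set and score the predictions
    y_pred = svc.predict(X_test)
    accuracy_score(y_test, y_pred)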
    

    This gives an accurate model: accuracy_score returns the fraction of test samples that were classified correctly.

    You can do the same with KNeighborsClassifier.

    from sklearn.neighbors import KNeighborsClassifier
    kn = KNeighborsClassifier()
    kn.fit(X_train, y_train)
    y_pred = kn.predict(X_test)
    accuracy_score(y_test, y_pred)
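
    The same pattern works for the other classifiers mentioned in this guide. A minimal sketch that loops over several of them, assuming scikit-learn's bundled copy of Iris instead of the CSV so that it runs on its own; the dictionary names and the max_iter setting are choices made for this example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Every scikit-learn classifier shares the same fit/predict interface.
models = {
    'SVC': SVC(),
    'KNeighborsClassifier': KNeighborsClassifier(),
    'DecisionTreeClassifier': DecisionTreeClassifier(random_state=42),
    'LogisticRegression': LogisticRegression(max_iter=1000),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

for name, score in scores.items():
    print(f'{name}: {score:.3f}')
```

    With only 30 test samples, small differences between the scores should not be over-interpreted.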
    

    Step 6: Find the most important features

    • permutation_importance: permutation importance for feature evaluation.
    • Use permutation_importance to calculate it: perm_importance = permutation_importance(svc, X_test, y_test)
    • The results are found in perm_importance.importances_mean
    from sklearn.inspection import permutation_importance
    perm_importance = permutation_importance(svc, X_test, y_test)
    perm_importance.importances_mean
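
    To see the idea behind permutation importance, it can be sketched by hand: shuffle one feature column at a time and measure how much the accuracy drops. This is a simplified illustration, not scikit-learn's implementation (which repeats the shuffling several times and averages); it assumes scikit-learn's bundled copy of Iris, and the names baseline and drops are made up for this example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = SVC().fit(X_train, y_train)
baseline = accuracy_score(y_test, model.predict(X_test))

rng = np.random.default_rng(0)
drops = []
for col in range(X_test.shape[1]):
    X_shuffled = X_test.copy()
    # Shuffle one column, which breaks its link to the labels.
    X_shuffled[:, col] = rng.permutation(X_shuffled[:, col])
    shuffled_score = accuracy_score(y_test, model.predict(X_shuffled))
    # A large drop means the model relied heavily on this feature.
    drops.append(baseline - shuffled_score)

print(baseline, drops)
```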
    

    Visualize the features by importance

    • The indices of the features, sorted from least to most important, are given by perm_importance.importances_mean.argsort()
      • HINT: assign it to sorted_idx
    • To visualize it we can create a DataFrame: pd.DataFrame(perm_importance.importances_mean[sorted_idx], X_test.columns[sorted_idx], columns=['Value'])
    • Then make a barh plot (use figsize)
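    # Indices that sort the features from least to most important
    sorted_idx = perm_importance.importances_mean.argsort()
    df = pd.DataFrame(perm_importance.importances_mean[sorted_idx], X_test.columns[sorted_idx], columns=['Value'])
    # Horizontal bar plot; figsize is (width, height) in inches
    df.plot.barh(figsize=(10, 5))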
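
    The argsort step can be seen in isolation with a few made-up importance values. The numbers and the feature names are assumptions for illustration only, not results from the model.

```python
import numpy as np

# Made-up importance values, one per feature (not real results).
importances = np.array([0.02, 0.00, 0.35, 0.18])
names = np.array(['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm'])

# argsort returns the positions that would sort the values in ascending order.
sorted_idx = importances.argsort()
print(sorted_idx)
print(names[sorted_idx])
```

    Since a barh plot draws the first row at the bottom, ascending order puts the most important feature at the top of the chart.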
    
    # Scatter plot of the two petal features, colored by species
    color_map = {'Iris-setosa': 'b', 'Iris-versicolor': 'r', 'Iris-virginica': 'y'}
    colors = data['Species'].map(color_map)
    data.plot.scatter(x='PetalLengthCm', y='PetalWidthCm', c=colors)
    

    Want to learn more?

    Want to learn more about Data Science to become a successful Data Scientist?

    In the next lesson of this Data Science course you will learn how to do Feature Scaling with pandas DataFrames.

    This is one lesson of a 15-part Expert Data Science Blueprint course with the following resources.

    • 15 video lessons – cover the Data Science Workflow and concepts, demonstrate everything on real data, introduce projects and show a solution (YouTube video).
    • 30 Jupyter Notebooks – with the full code and explanation from the lectures and projects (GitHub).
    • 15 projects – structured with the Data Science Workflow, with a solution explained at the end of the video lessons (GitHub).
