
    Naive Bayes’ Rule for Sentiment Classification with Full Explanation

    Why is it great to master Text Categorization?

    Mastering Text Categorization offers several advantages in the field of natural language processing and text analysis:

    1. Efficient classification: Text Categorization techniques allow for the automatic categorization and organization of large volumes of text data, making it easier to manage and retrieve information based on predefined categories.
    2. Information extraction: By accurately categorizing text documents, Text Categorization enables the extraction of valuable insights and knowledge from unstructured data, facilitating decision-making processes and data-driven insights.
    3. Personalized recommendations: Text Categorization can be used to develop recommendation systems that provide personalized content or suggestions based on the categorization of user preferences and interests.
    4. Streamlined information retrieval: Effective categorization helps in building efficient search systems, enabling users to quickly find relevant documents or information based on predefined categories.

    What will be covered in this tutorial?

    In this tutorial on Text Categorization, we will cover the following topics:

    • Understanding Text Categorization: Exploring the concept and significance of Text Categorization in organizing and classifying text data based on predefined categories.
    • The Bag-of-Words Model: Learning about the Bag-of-Words representation, a commonly used model in Text Categorization that treats each document as a collection of words without considering word order.
    • Naive Bayes’ Rule: Understanding the principles of Naive Bayes’ Rule, a probabilistic classifier used in Text Categorization to assign documents to specific categories based on the conditional probability of words appearing in each category.
    • Using Naive Bayes’ Rule for sentiment classification: Applying Naive Bayes’ Rule specifically for sentiment classification, which involves categorizing text based on positive, negative, or neutral sentiments expressed.
    • Problem smoothing: Exploring the concept of problem smoothing in Text Categorization, which helps address issues such as zero probabilities and improves the accuracy of classification models.

    By mastering these concepts and techniques, you will gain valuable skills in efficiently categorizing and organizing text data, enabling better information retrieval, knowledge extraction, and personalized recommendations based on predefined categories.

    Watch tutorial

    Step 1: What is Text Categorization?

    Text categorization (a.k.a. text classification) is the task of assigning predefined categories to free-text documents. It can provide conceptual views of document collections and has important applications in the real world.


    Examples of Text Categorization include:

    • Inbox vs Spam
    • Product review: Positive vs Negative review

    Step 2: What is the Bag-of-Words model?

    We have already learned from Context-Free Grammars, that understanding the full structure of language is not efficient or even possible for Natural Language Processing. One approach was to look at trigrams (3 consecutive words), which can be used to learn about the language and even generate sentences.

    Another approach is the Bag-of-Words model.

    The Bag-of-Words model is a simplifying representation used in natural language processing and information retrieval (IR). In this model, a text (such as a sentence or a document) is represented as the bag (multiset) of its words, disregarding grammar and even word order but keeping multiplicity.


    What does that all mean?

    • The structure of the text is not important
    • Works well for classification
    • Examples could be:
      • I love this product.
      • This product feels cheap.
      • This is the best product ever.
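    To make the idea concrete, here is a small sketch (not from the tutorial) that builds a bag-of-words representation with only the Python standard library; word order is discarded, but multiplicity is kept:

```python
from collections import Counter

def bag_of_words(text):
    # Lowercase, split on whitespace, and strip simple punctuation
    words = [w.strip('.,!?').lower() for w in text.split()]
    return Counter(w for w in words if w)

bag = bag_of_words("This product is the best product ever.")
print(bag)
# The word order is gone; only the words and their counts remain,
# e.g. 'product' appears with count 2
```

    Note that `Counter` is exactly a multiset: the "bag" the definition above refers to.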

    Step 3: What is Naive Bayes’ Classifier?

    Naive Bayes classifiers are a family of simple “probabilistic classifiers” based on applying Bayes’ theorem with strong (naïve) independence assumptions between the features (wiki).

    Bayes’ Rule Theorem

    Describes the probability of an event, based on prior knowledge of conditions that might be related to the event (wiki).

    𝑃(𝑏|π‘Ž) = 𝑃(π‘Ž|𝑏)𝑃(𝑏) / 𝑃(π‘Ž)

    Explained with Example

    What is the probability that the sentiment is positive given the sentence “I love this product”? This can be expressed as follows.

    𝑃(positive|”I love this product”)=𝑃(positive|”I”, “love”, “this”, “product”)

    Bayes’s Rule implies it is equal to

    𝑃(“I”, “love”, “this”, “product”|positive)𝑃(positive) / 𝑃(“I”, “love”, “this”, “product”)

    Or proportional to

    𝑃(“I”, “love”, “this”, “product”|positive)𝑃(positive)

    The ‘naive’ part is the assumption that the words appear independently of each other. This lets us simplify the expression to a product of per-word probabilities:

    𝑃(positive)𝑃(“I”|positive)𝑃(“love”|positive)𝑃(“this”|positive)𝑃(“product”|positive)

    Each factor can then be estimated from the training data:

    𝑃(positive) = number of positive samples / number of samples

    𝑃(“love”|positive) = number of positive samples with “love” / number of positive samples

    Let’s try a more concrete example.

    𝑃(positive)𝑃(“I”|positive)𝑃(“love”|positive)𝑃(“this”|positive)𝑃(“product”|positive) = 0.47 ∗ 0.30 ∗ 0.40 ∗ 0.28 ∗ 0.25 = 0.003948

    𝑃(negative)𝑃(“I”|negative)𝑃(“love”|negative)𝑃(“this”|negative)𝑃(“product”|negative) = 0.53 ∗ 0.20 ∗ 0.05 ∗ 0.42 ∗ 0.28 = 0.00062328

    Calculate the likelihood

    “I love this product” is positive: 0.003948 / (0.003948 + 0.00062328) = 86.4%

    “I love this product” is negative: 0.00062328 / (0.003948 + 0.00062328) = 13.6%
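    The arithmetic above can be verified with a short Python sketch; the per-word probabilities are the hypothetical values from the example, not values learned from data:

```python
# Hypothetical probabilities from the example above
p_pos = 0.47 * 0.30 * 0.40 * 0.28 * 0.25  # P(positive) * product of P(word|positive)
p_neg = 0.53 * 0.20 * 0.05 * 0.42 * 0.28  # P(negative) * product of P(word|negative)

# Normalize the two scores so they sum to 1
total = p_pos + p_neg
print(f"positive: {p_pos / total:.1%}")  # → positive: 86.4%
print(f"negative: {p_neg / total:.1%}")  # → negative: 13.6%
```

    Normalizing this way is why we only need the numerator of Bayes’ rule: the denominator 𝑃(“I”, “love”, “this”, “product”) is the same for both classes and cancels out.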

    Step 4: The Problem with Naive Bayes’ Classifier?


    If a word never shows up in the training data for a class, its estimated probability is zero. Say, in the example above, the word “product” never appeared in a positive sentence. That would imply P(“product” | positive) = 0, which in turn makes the whole product for “I love this product” being positive equal to 0, no matter how strong the other factors are.

    There are different approaches to deal with this problem.

    Additive Smoothing

    Additive smoothing adds a small value to every count in the distribution. This is straightforward, and it ensures that even if the word “product” never showed up in a positive sentence, its probability will not become 0.

    Laplace smoothing

    Laplace smoothing is additive smoothing with the value 1: add 1 to every count in the distribution.
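    A minimal sketch of Laplace (add-one) smoothing for word probabilities, using the standard formula for word counts; the counts and vocabulary size here are made up for illustration:

```python
def smoothed_prob(word_count, class_count, vocab_size, alpha=1):
    # Additive smoothing: add alpha to every count so no probability is zero.
    # The denominator grows by alpha * vocab_size so probabilities still sum to 1.
    # alpha = 1 is Laplace smoothing.
    return (word_count + alpha) / (class_count + alpha * vocab_size)

vocab_size = 1000        # hypothetical vocabulary size
positive_samples = 500   # hypothetical number of positive samples

# "product" never appeared in a positive sample: raw count = 0
p = smoothed_prob(0, positive_samples, vocab_size)
print(p)  # small, but not zero
```

    Without smoothing (alpha = 0) the same call would return 0 and wipe out the whole product of probabilities.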

    Step 5: Use NLTK to classify sentiment

    We already introduced the NLTK, which we will use here.

    import nltk
    import pandas as pd

    data = pd.read_csv('https://raw.githubusercontent.com/LearnPythonWithRune/MachineLearningWithPython/main/files/sentiment.csv')

    def extract_words(document):
        # Lowercased tokens that contain at least one letter
        return set(
            word.lower() for word in nltk.word_tokenize(document)
            if any(c.isalpha() for c in word)
        )

    # Build the vocabulary from all documents
    words = set()
    for line in data['Text'].to_list():
        words.update(extract_words(line))

    # One (feature dict, label) pair per document
    features = []
    for _, row in data.iterrows():
        features.append(({word: (word in row['Text']) for word in words}, row['Label']))

    classifier = nltk.NaiveBayesClassifier.train(features)

    This creates a classifier (based on a small dataset, don’t expect magic).

    To use it, try the following code.

    s = input()
    feature = {word: (word in extract_words(s)) for word in words}
    result = classifier.prob_classify(feature)
    for key in result.samples():
        print(key, result.prob(key))

    For example, if you input “this was great”:

    this was great
     Negative 0.10747100603951745
     Positive 0.8925289939604821

    Want to learn more?

    If you follow the video, you will also be introduced to a project where we create a sentiment classifier on a large Twitter corpus.

    In the next lesson you will learn how to Implement a Term Frequency by Inverse Document Frequency (TF-IDF) with NLTK.

    This is part of a FREE 10h Machine Learning course with Python.

    • 15 video lessons – which explain Machine Learning concepts, demonstrate models on real data, introduce projects and show a solution (YouTube playlist).
    • 30 Jupyter Notebooks – with the full code and explanation from the lectures and projects (GitHub).
    • 15 projects – with step guides to help you structure your solutions, and the solutions explained at the end of the video lessons (GitHub).

    Python for Finance: Unlock Financial Freedom and Build Your Dream Life

    Discover the key to financial freedom and secure your dream life with Python for Finance!

    Say goodbye to financial anxiety and embrace a future filled with confidence and success. If you’re tired of struggling to pay bills and longing for a life of leisure, it’s time to take action.

    Imagine breaking free from that dead-end job and opening doors to endless opportunities. With Python for Finance, you can acquire the invaluable skill of financial analysis that will revolutionize your life.

    Make informed investment decisions, unlock the secrets of business financial performance, and maximize your money like never before. Gain the knowledge sought after by companies worldwide and become an indispensable asset in today’s competitive market.

    Don’t let your dreams slip away. Master Python for Finance and pave your way to a profitable and fulfilling career. Start building the future you deserve today!

    Python for Finance is a 21-hour course that teaches investing with Python.

    Learn pandas, NumPy, Matplotlib for Financial Analysis & learn how to Automate Value Investing.

    “Excellent course for anyone trying to learn coding and investing.” – Lorenzo B.
