# Naive Bayes’ Rule for Sentiment Classification with Full Explanation

## What will we cover?

• What is Text Categorization
• Learn about the Bag-of-Words Model
• Understand Naive Bayes’ Rule
• How to use Naive Bayes’ Rule for sentiment classification (text categorization)
• What problem smoothing solves

## Step 1: What is Text Categorization?

Text categorization (a.k.a. text classification) is the task of assigning predefined categories to free-text documents. It can provide conceptual views of document collections and has important applications in the real world.

http://www.scholarpedia.org/article/Text_categorization

Examples of text categorization include:

• Inbox vs Spam
• Product reviews: positive vs negative

## Step 2: What is the Bag-of-Words model?

We have already learned from Context-Free Grammars that parsing the full structure of language is not efficient, or even always possible, for Natural Language Processing. One approach was to look at trigrams (3 consecutive words), which can be used to learn about the language and even generate sentences.

Another approach is the Bag-of-Words model.

The Bag-of-Words model is a simplifying representation used in natural language processing and information retrieval (IR). In this model, a text (such as a sentence or a document) is represented as the bag (multiset) of its words, disregarding grammar and even word order but keeping multiplicity.

https://en.wikipedia.org/wiki/Bag-of-words_model

What does that all mean?

• The structure of the text is not important
• It works well for classification
• Examples could be:
  • I love this product.
  • This product feels cheap.
  • This is the best product ever.
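A minimal sketch of the Bag-of-Words idea: lowercase the text, throw away order and grammar, and keep only word counts. The tokenizer here is a simple regular expression chosen for illustration, not the one used later in the document.

```python
import re
from collections import Counter

def bag_of_words(text):
    # Lowercase and split on non-letter characters;
    # word order is discarded, but multiplicity is kept
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

print(bag_of_words("This is the best product ever. This product is great."))
# "this", "is", and "product" each appear twice; the order is gone
```

Two sentences with the same words in a different order produce exactly the same bag, which is why this representation ignores structure but still works well for classification.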

## Step 3: What is Naive Bayes’ Classifier?

Naive Bayes classifiers are a family of simple “probabilistic classifiers” based on applying Bayes’ theorem with strong (naïve) independence assumptions between the features (wiki).

### Bayes’ Rule Theorem

Describes the probability of an event, based on prior knowledge of conditions that might be related to the event (wiki).

P(b|a) = P(a|b) P(b) / P(a)

### Explained with Example

What is the probability that the sentiment is positive given the sentence “I love this product”? This can be expressed as follows.

P(positive | “I love this product”) = P(positive | “I”, “love”, “this”, “product”)

Bayes’ Rule implies this is equal to

P(“I”, “love”, “this”, “product” | positive) P(positive) / P(“I”, “love”, “this”, “product”)

Or proportional to

P(“I”, “love”, “this”, “product” | positive) P(positive)

The ‘naive’ independence assumption between words lets us simplify this to

P(positive) P(“I” | positive) P(“love” | positive) P(“this” | positive) P(“product” | positive)

And the individual terms are estimated from counts:

P(positive) = (number of positive samples) / (number of samples)

P(“love” | positive) = (number of positive samples containing “love”) / (number of positive samples)

Let’s try a more concrete example.

P(positive) P(“I” | positive) P(“love” | positive) P(“this” | positive) P(“product” | positive) = 0.47 × 0.30 × 0.40 × 0.28 × 0.25 = 0.003948

P(negative) P(“I” | negative) P(“love” | negative) P(“this” | negative) P(“product” | negative) = 0.53 × 0.20 × 0.05 × 0.42 × 0.28 = 0.00062328

Normalize to get the likelihood of each class:

“I love this product” is positive: 0.003948 / (0.003948 + 0.00062328) ≈ 86.4%

“I love this product” is negative: 0.00062328 / (0.003948 + 0.00062328) ≈ 13.6%
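The arithmetic above is easy to check in a few lines. The individual probabilities (0.47, 0.30, and so on) are the hypothetical values from the worked example, not values estimated from real data.

```python
# Hypothetical probabilities from the worked example:
# P(class) times the per-word likelihoods for "I love this product"
p_pos = 0.47 * 0.30 * 0.40 * 0.28 * 0.25
p_neg = 0.53 * 0.20 * 0.05 * 0.42 * 0.28

# Normalize so the two scores sum to 1
total = p_pos + p_neg
print(f"positive: {p_pos / total:.1%}")  # → positive: 86.4%
print(f"negative: {p_neg / total:.1%}")  # → negative: 13.6%
```

Note that we never need P(“I”, “love”, “this”, “product”) itself: dividing by the sum of the two unnormalized scores normalizes it away.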

## Step 4: The Problem with the Naive Bayes’ Classifier

### Problem

If a word never appears together with a class in the training data, its estimated probability for that class is zero. Say, in the example above, the word “product” never appeared in a positive sample. This would imply that P(“product” | positive) = 0, which would force the entire calculated score for “I love this product” being positive to 0, no matter how positive the other words are.

There are different approaches to dealing with this problem. A common one is additive smoothing: add a small value to each count in the distribution. This is straightforward and ensures that even if the word “product” never appeared in a positive sample, its probability will not become 0.

### Laplace smoothing

Laplace (add-one) smoothing is the concrete case of additive smoothing where 1 is added to each count.
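A short sketch of Laplace smoothing for one word likelihood. The counts (0 occurrences of “product”, 40 positive samples, a vocabulary of 100 words) are made up for illustration; adding `alpha * vocab_size` to the denominator keeps the smoothed probabilities summing to 1.

```python
def smoothed_likelihood(word_count, class_count, vocab_size, alpha=1):
    # Laplace (add-one) smoothing: add alpha to every word's count,
    # and alpha * vocab_size to the denominator so the distribution
    # over the vocabulary still sums to 1
    return (word_count + alpha) / (class_count + alpha * vocab_size)

# "product" never appeared in a positive sample (count 0), yet the
# smoothed estimate is small but non-zero: 1 / 140
print(smoothed_likelihood(0, 40, 100))
```

Without smoothing this estimate would be 0/40 = 0 and would zero out the whole product of probabilities; with it, the unseen word merely contributes a small factor.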

## Step 5: Use NLTK to classify sentiment

We already introduced the NLTK, which we will use here.

```python
import nltk
import pandas as pd

# Assumes `data` is a DataFrame with a 'Text' and a 'Label' column,
# e.g. data = pd.read_csv('reviews.csv')
# (nltk.word_tokenize may require: nltk.download('punkt'))

def extract_words(document):
    return set(
        word.lower() for word in nltk.word_tokenize(document)
        if any(c.isalpha() for c in word)
    )

# Build the vocabulary from all documents
words = set()
for line in data['Text'].to_list():
    words.update(extract_words(line))

# One boolean feature per vocabulary word: does the document contain it?
# (tokenize each row rather than substring-matching the raw text)
features = []
for _, row in data.iterrows():
    row_words = extract_words(row['Text'])
    features.append(({word: (word in row_words) for word in words}, row['Label']))

classifier = nltk.NaiveBayesClassifier.train(features)
```

This creates a classifier (based on a small dataset, don’t expect magic).

To use it, try the following code.

```python
s = input()
feature = {word: (word in extract_words(s)) for word in words}
result = classifier.prob_classify(feature)
for key in result.samples():
    print(key, result.prob(key))
```

For example, if you input “this was great”:

```
this was great
Negative 0.10747100603951745
Positive 0.8925289939604821
```