Understanding the Naïve Bayes Classifier

Ivan Muhammad Siegfried
4 min read · Apr 15, 2021

Statistical Basis

The Naïve Bayes classifier is built on probability theory as its statistical basis.

For example, suppose we have a data frame of labeled texts, where each text is tagged as being about sports or not.

Then we are given one test sentence, “A very close game”, and the question is which class this sentence falls into.

So, mathematically, we want P(sports | a very close game), which reads as the probability of the sports tag given the sentence “a very close game”.

Naïve Bayes is based on Bayes’ theorem. In general, the theorem is written as

P(A | B) = P(B | A) × P(A) / P(B)

So if we apply it to our problem, the probability becomes

P(sports | a very close game) = P(a very close game | sports) × P(sports) / P(a very close game)

Then, since we only want to know which tag gives the bigger probability, we can drop the divisor P(a very close game), which is the same for both tags, and simply compare

P(a very close game | sports) × P(sports)

with

P(a very close game | not sports) × P(not sports)

Dropping the divisor also avoids a real problem: the exact sentence “a very close game” never appears in the training data, so its estimated probability P(a very close game) is 0, and dividing by zero is undefined.

The “naïve” part appears when we assume that each word in the sentence “a very close game” is independent of the others, so that we can write

P(a very close game) = P(a) × P(very) × P(close) × P(game)

By applying this assumption to the conditional probability, we get

P(a very close game | sports) = P(a | sports) × P(very | sports) × P(close | sports) × P(game | sports)
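As an illustration, here is a minimal Python sketch of this factorization. The word counts are hypothetical stand-ins for the training table described above, not the actual numbers:

```python
# Hypothetical per-word counts for the “sports” tag (illustrative only).
sports_counts = {"a": 2, "great": 1, "game": 2, "very": 1, "clean": 2, "match": 1}
total_sports_words = sum(sports_counts.values())

def word_prob(word, counts, total):
    """Unsmoothed estimate of P(word | tag): count / total words in the tag."""
    return counts.get(word, 0) / total

# Naive factorization: multiply the per-word probabilities.
likelihood = 1.0
for word in "a very close game".split():
    likelihood *= word_prob(word, sports_counts, total_sports_words)

# “close” never appears under the sports tag, so the whole product collapses to 0.
print(likelihood)  # 0.0
```

This collapse to zero is exactly the problem the next paragraphs address.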

Each factor can now be estimated from counts. For example, P(game | sports) is the number of times the word “game” appears in sentences tagged “sports”, divided by the total number of words in those sentences. Doing the same for P(close | sports), however, gives 0.

The value 0 appears because the word “close” never occurs in any sentence tagged “sports”. The solution is Laplace smoothing, which keeps every probability above zero: add 1 to each word count, and add the number of possible words in the dataset (14 here) to the denominator so the extra counts do not inflate the estimates:

P(word | sports) = (count of word in sports + 1) / (total words in sports + 14)
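In code, the smoothed estimate might look like the sketch below; the counts and the vocabulary size of 14 are illustrative assumptions:

```python
# Hypothetical counts again (same stand-ins as the previous sketch).
sports_counts = {"a": 2, "great": 1, "game": 2, "very": 1, "clean": 2, "match": 1}
total_sports_words = sum(sports_counts.values())

def smoothed_word_prob(word, counts, total, vocab_size=14):
    """Laplace-smoothed P(word | tag): add 1 to every count and the
    vocabulary size to the denominator so no probability is exactly 0."""
    return (counts.get(word, 0) + 1) / (total + vocab_size)

# The unseen word “close” now gets a small non-zero probability.
print(smoothed_word_prob("close", sports_counts, total_sports_words))  # ≈ 0.043
```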

Applying this smoothing to every word of the test sentence, under both tags, gives the full set of word probabilities.

Multiplying all the word probabilities for each tag, together with the prior P(tag), the product for “sports” comes out larger than the product for “not sports”.

So “a very close game” is classified under the “sports” tag.
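Putting the pieces together, a minimal from-scratch classifier could look like the sketch below. The training sentences are hypothetical examples in the spirit of the data frame described at the start, not the article’s actual table:

```python
from collections import Counter

# Hypothetical labeled sentences in the spirit of the article's data frame.
train = [
    ("a great game", "sports"),
    ("the election was over", "not sports"),
    ("very clean match", "sports"),
    ("a clean but forgettable game", "sports"),
    ("it was a close election", "not sports"),
]

tags = {tag for _, tag in train}
priors = {tag: sum(t == tag for _, t in train) / len(train) for tag in tags}

# Per-tag word counts and the number of distinct words overall.
counts = {tag: Counter() for tag in tags}
for text, tag in train:
    counts[tag].update(text.split())
vocab_size = len({word for text, _ in train for word in text.split()})

def score(sentence, tag):
    """P(tag) times the product of Laplace-smoothed P(word | tag)."""
    total = sum(counts[tag].values())
    s = priors[tag]
    for word in sentence.split():
        s *= (counts[tag][word] + 1) / (total + vocab_size)
    return s

print(max(tags, key=lambda tag: score("a very close game", tag)))  # sports
```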

Deficiencies:

1. If the independence assumption does not hold in the data, performance degrades.
2. If no zero probabilities occur, smoothing should be skipped, because it distorts the estimated probabilities.
3. Multiplying many probabilities close to zero is prone to floating-point underflow, so the raw product is unreliable for long texts (see the log-space sketch after this list).
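A standard remedy for the underflow problem in point 3 is to add log-probabilities instead of multiplying raw ones; a minimal sketch:

```python
import math

def log_score(word_probs, prior):
    """Sum of log-probabilities: same ordering as the raw product,
    but it cannot underflow to zero for long documents."""
    return math.log(prior) + sum(math.log(p) for p in word_probs)

# Multiplying a thousand tiny probabilities underflows to 0.0,
# while the log-space score stays finite and comparable across tags.
probs = [1e-4] * 1000
print(math.prod(probs))       # 0.0 (underflow)
print(log_score(probs, 0.5))  # ≈ -9211.0
```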

Advantages:

1. When the independence assumption roughly holds, this classifier yields good accuracy.
2. Easy to implement.
3. Works well for text-classification problems.

Required data conditions:

• The presence of a feature in a class must be independent of the presence of the other features.

Impact on business:

A concrete business example is classifying hotel reviews with high precision and recall at a low computational cost. The model can give reliable results even from a limited sample of data, and its output can also provide insight into how a hotel or tourist spot could improve, for example in staff professionalism or in interactions with visitors.
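As a sketch of such a pipeline using scikit-learn’s MultinomialNB (the reviews and labels below are invented placeholders; a real project would load an actual labeled dataset):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.metrics import classification_report

# Invented placeholder reviews; stand-ins for a real labeled dataset.
reviews = [
    "friendly staff and a clean room",
    "terrible service, the staff ignored us",
    "great location, very professional reception",
    "dirty bathroom and rude employees",
]
labels = ["positive", "negative", "positive", "negative"]

# Bag-of-words counts feeding a multinomial Naive Bayes model.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(reviews, labels)

preds = model.predict(reviews)
print(classification_report(labels, preds))  # precision/recall per class
```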

Potential for misuse:

This model relies heavily on the target classes assigned in the training data. An error in those target labels propagates directly into the estimated probabilities, and therefore into the predictions.
