Multinomial Naive Bayes

1. Predict China (c) or Japan (j)

Set	Doc	Words	Class
Training	1	Chinese Beijing Chinese	c
	2	Chinese Chinese Shanghai	c
	3	Chinese Macao	c
	4	Tokyo Japan Chinese	j
Test	5	Chinese Chinese Chinese Tokyo Japan	?

Vocabulary

beijing
chinese
japan
macao
shanghai
tokyo
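
The hand calculation for Example 1 can be sketched in plain Python. This is a minimal sketch, not a prescribed solution: it assumes add-one (Laplace) smoothing and lowercased, whitespace-split tokens, neither of which is stated in the exercise. The names `train`, `test_doc`, `scores`, and `posterior` are illustrative only.

```python
from collections import Counter
from fractions import Fraction

# Training documents and test document 5 from the table above
train = [
    ("Chinese Beijing Chinese", "c"),
    ("Chinese Chinese Shanghai", "c"),
    ("Chinese Macao", "c"),
    ("Tokyo Japan Chinese", "j"),
]
test_doc = "Chinese Chinese Chinese Tokyo Japan"

# Vocabulary: every distinct lowercased word in the training set (6 words)
vocab = sorted({w.lower() for text, _ in train for w in text.split()})

scores = {}
for label in ("c", "j"):
    docs = [text for text, cls in train if cls == label]
    prior = Fraction(len(docs), len(train))           # P(class)
    counts = Counter(w.lower() for text in docs for w in text.split())
    total = sum(counts.values())                      # tokens in this class
    score = prior
    for w in test_doc.lower().split():
        # Assumption: add-one (Laplace) smoothing over the vocabulary
        score *= Fraction(counts[w] + 1, total + len(vocab))
    scores[label] = score

# Normalise the unnormalised scores into posterior probabilities
posterior = {k: v / sum(scores.values()) for k, v in scores.items()}
prediction = max(scores, key=scores.get)

print("Scores", scores)
print("Posterior", {k: round(float(v), 4) for k, v in posterior.items()})
print("Predict", prediction)
```

Under these assumptions the score for class c is 3/4 · (3/7)³ · (1/14)² and the score for class j is 1/4 · (2/9)⁵, so document 5 is assigned to class c.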

2. Predict spam or ham (non-spam)

Category	Document
Spam	send us your password
Spam	review us
Spam	send us your account
Spam	send your password
Non-spam	password review
Non-spam	send us your review
?	review us now
?	review account
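
One way to work Example 2 is with a small reusable scorer that adds log-probabilities instead of multiplying raw probabilities. The sketch below is illustrative only: it assumes add-one smoothing (`alpha=1.0`), whitespace tokenisation, and that words never seen in training (such as "now") are simply skipped. The helper names `train_nb` and `log_scores` are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """Count documents per class and words per class; collect the vocabulary."""
    class_docs = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    for text, label in docs:
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_docs, word_counts, vocab

def log_scores(text, class_docs, word_counts, vocab, alpha=1.0):
    """Return log P(class) + sum of log P(word | class) for each class."""
    n_docs = sum(class_docs.values())
    scores = {}
    for label, n in class_docs.items():
        total = sum(word_counts[label].values())
        score = math.log(n / n_docs)
        for w in text.lower().split():
            if w not in vocab:
                continue  # assumption: ignore words unseen in training
            score += math.log(
                (word_counts[label][w] + alpha) / (total + alpha * len(vocab))
            )
        scores[label] = score
    return scores

# Training documents from the table above
train = [
    ("send us your password", "Spam"),
    ("review us", "Spam"),
    ("send us your account", "Spam"),
    ("send your password", "Spam"),
    ("password review", "Non-spam"),
    ("send us your review", "Non-spam"),
]
class_docs, word_counts, vocab = train_nb(train)

predictions = {}
for query in ["review us now", "review account"]:
    scores = log_scores(query, class_docs, word_counts, vocab)
    predictions[query] = max(scores, key=scores.get)
    print(query, "->", predictions[query])
```

With these assumptions both queries come out as Spam, but only narrowly (about 0.52 against 0.48 after normalising), so a different smoothing value or tokeniser could change the answer.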

3. Predict sentiment

Set	Document ID	Keywords in the document	Class
Training Set	1	Love Happy Joy Joy Happy	Yes
	2	Happy Love Kick Joy Happy	Yes
	3	Love Move Joy Good	Yes
	4	Love Happy Joy Love Pain	Yes
	5	Joy Love Pain Kick Pain	No
	6	Pain Pain Love Kick	No
Testing Set	7	Love Pain Joy Love Kick	?
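
Example 3 can also be checked with scikit-learn. The following is a sketch under stated assumptions: `CountVectorizer` with its default settings (which lowercases the text, so "kick" and "Kick" count as the same word) and `MultinomialNB` with `alpha=1.0`. The variable names are illustrative.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Training documents 1-6 and their classes from the table above
train_docs = [
    "Love Happy Joy Joy Happy",
    "Happy Love Kick Joy Happy",
    "Love Move Joy Good",
    "Love Happy Joy Love Pain",
    "Joy Love Pain Kick Pain",
    "Pain Pain Love kick",
]
train_labels = ["Yes", "Yes", "Yes", "Yes", "No", "No"]
test_doc = ["Love Pain Joy Love Kick"]  # document 7

# Word counts per document; default settings lowercase every token
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_docs)

# Assumption: add-one (Laplace) smoothing, which is alpha=1.0
model = MultinomialNB(alpha=1.0)
model.fit(X_train, train_labels)

X_test = vectorizer.transform(test_doc)
prediction = model.predict(X_test)[0]
proba = dict(zip(model.classes_, model.predict_proba(X_test)[0]))

print("Predict", prediction)
print("Probability", proba)
```

Although four of the six training documents are labelled Yes, the words Pain and Kick are much more likely under No, so under these assumptions document 7 is classified as No.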

4. Predict Tech or Non Tech

No.	Sentence	Category
1	AI is transforming industries.	Tech
2	Quantum computing is the future.	Tech
3	New smartphone released today.	Tech
4	Football match was exciting.	Non Tech
5	New movie breaking records.	Non Tech
6	Cooking shows are popular.	Non Tech

Implement Example 1 in Python

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Training documents from Example 1
df = pd.DataFrame([
    [1, 'Chinese Beijing Chinese', 'c'],
    [2, 'Chinese Chinese Shanghai', 'c'],
    [3, 'Chinese Macao', 'c'],
    [4, 'Tokyo Japan Chinese', 'j'],
], columns=['Doc', 'Text', 'Class'])

# Turn each document into a vector of word counts (text is lowercased by default)
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(df['Text'])
y = df['Class']
print('Vocabulary', vectorizer.vocabulary_)

# Fit multinomial Naive Bayes (default alpha=1.0, i.e. add-one smoothing)
nb = MultinomialNB()
nb.fit(X, y)

# Classify test document 5 using the vocabulary learned from the training set
text = ['Chinese Chinese Chinese Tokyo Japan']
X_test = vectorizer.transform(text)
print('Predict', nb.predict(X_test), 'Probability', nb.predict_proba(X_test))