AI

Multinomial Naive Bayes

1. Predict China or Japan

Set       Doc  Words                                Class
Training  1    Chinese Beijing Chinese              c
Training  2    Chinese Chinese Shanghai             c
Training  3    Chinese Macao                        c
Training  4    Tokyo Japan Chinese                  j
Test      5    Chinese Chinese Chinese Tokyo Japan  ?

Vocabulary: beijing, chinese, japan, macao, shanghai, tokyo
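The table above can be worked through by hand. The sketch below does so with exact fractions, assuming add-one (Laplace) smoothing and lowercase whitespace tokenization; neither assumption is stated in the exercise itself, and the helper names (`tokenize`, `score`) are illustrative only.

```python
from collections import Counter
from fractions import Fraction

# Training documents and classes copied from the table.
train = [
    ("Chinese Beijing Chinese", "c"),
    ("Chinese Chinese Shanghai", "c"),
    ("Chinese Macao", "c"),
    ("Tokyo Japan Chinese", "j"),
]
test_doc = "Chinese Chinese Chinese Tokyo Japan"

def tokenize(text):
    # Assumption: lowercase, split on whitespace.
    return text.lower().split()

vocabulary = sorted({w for text, _ in train for w in tokenize(text)})
doc_counts = Counter(label for _, label in train)
word_counts = {label: Counter() for label in doc_counts}
for text, label in train:
    word_counts[label].update(tokenize(text))

def score(doc, label):
    """Class prior times the add-one smoothed likelihood of each token."""
    total = sum(word_counts[label].values())
    result = Fraction(doc_counts[label], len(train))
    for word in tokenize(doc):
        result *= Fraction(word_counts[label][word] + 1, total + len(vocabulary))
    return result

scores = {label: score(test_doc, label) for label in doc_counts}
# Normalize the unnormalized scores so they sum to 1.
posterior = {label: float(s / sum(scores.values())) for label, s in scores.items()}
prediction = max(scores, key=scores.get)

print("Vocabulary:", vocabulary)
print("Scores:", scores)
print("Prediction:", prediction, posterior)
```

Under these assumptions the score for class c is 3/4 · (3/7)^3 · (1/14)^2 and the score for class j is 1/4 · (2/9)^5, so class c wins with a normalized probability of about 0.69.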

2. Predict SPAM or HAM

Category  Document
Spam      send us your password
Spam      review us
Spam      send us your account
Spam      send your password
Non-spam  password review
Non-spam  send us your review
?         review us now
?         review account
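The first unlabeled message contains "now", which never appears in the training messages. The sketch below shows one common way to deal with that: it works in log space and simply skips out-of-vocabulary words. Both choices, along with add-one smoothing and the helper names `log_score` and `classify`, are assumptions made for illustration rather than part of the exercise.

```python
import math
from collections import Counter

# Labeled messages copied from the table.
train = [
    ("send us your password", "Spam"),
    ("review us", "Spam"),
    ("send us your account", "Spam"),
    ("send your password", "Spam"),
    ("password review", "Non-spam"),
    ("send us your review", "Non-spam"),
]
queries = ["review us now", "review account"]

vocabulary = {w for text, _ in train for w in text.split()}
doc_counts = Counter(label for _, label in train)
word_counts = {label: Counter() for label in doc_counts}
for text, label in train:
    word_counts[label].update(text.split())

def log_score(doc, label):
    """Log prior plus the summed log likelihoods (add-one smoothing)."""
    total = sum(word_counts[label].values())
    result = math.log(doc_counts[label] / len(train))
    for word in doc.split():
        if word not in vocabulary:
            continue  # assumption: ignore words never seen in training
        result += math.log((word_counts[label][word] + 1) / (total + len(vocabulary)))
    return result

def classify(doc):
    # Pick the class with the highest log score.
    return max(doc_counts, key=lambda label: log_score(doc, label))

for q in queries:
    print(q, "->", classify(q))
```

Summing logarithms instead of multiplying probabilities avoids numeric underflow on longer documents. With these assumptions both messages come out as Spam, though only narrowly: the unnormalized scores are 16/1083 against 1/72 for the first message and 8/1083 against 1/144 for the second.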

3. Predict sentiment

Set           Document ID  Keywords in the document  Class
Training Set  1            Love Happy Joy Joy Happy  Yes
Training Set  2            Happy Love Kick Joy Happy Yes
Training Set  3            Love Move Joy Good        Yes
Training Set  4            Love Happy Joy Love Pain  Yes
Training Set  5            Joy Love Pain Kick Pain   No
Training Set  6            Pain Pain Love Kick       No
Testing Set   7            Love Pain Joy Love Kick   ?
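One possible hand calculation for document 7 is sketched below. It assumes add-one (Laplace) smoothing and case-insensitive matching (so "kick" and "Kick" are the same word); the exercise does not state either. The vocabulary then has 7 words (Love, Happy, Joy, Kick, Move, Good, Pain), the Yes documents contain 19 words in total and the No documents contain 9.

```latex
\begin{align*}
P(\text{Yes}) &= \tfrac{4}{6}, \qquad P(\text{No}) = \tfrac{2}{6}, \qquad |V| = 7 \\[4pt]
P(w \mid c) &= \frac{\operatorname{count}(w, c) + 1}{\operatorname{count}(c) + |V|} \\[4pt]
P(\text{Yes} \mid d_7) &\propto \tfrac{4}{6}
  \cdot \left(\tfrac{6}{26}\right)^{2}   % Love (x2)
  \cdot \tfrac{2}{26}                    % Pain
  \cdot \tfrac{6}{26}                    % Joy
  \cdot \tfrac{2}{26}                    % Kick
  \approx 4.85 \times 10^{-5} \\[4pt]
P(\text{No} \mid d_7) &\propto \tfrac{2}{6}
  \cdot \left(\tfrac{3}{16}\right)^{2}   % Love (x2)
  \cdot \tfrac{5}{16}                    % Pain
  \cdot \tfrac{2}{16}                    % Joy
  \cdot \tfrac{3}{16}                    % Kick
  \approx 8.58 \times 10^{-5}
\end{align*}
```

Under these assumptions the No score is larger, so document 7 would be labeled No, with a normalized probability of roughly 0.64.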

4. Predict Tech or Non Tech

No.  Sentence                          Category
1    AI is transforming industries.    Tech
2    Quantum computing is the future.  Tech
3    New smartphone released today.    Tech
4    Football match was exciting.      Non Tech
5    New movie breaking records.       Non Tech
6    Cooking shows are popular.        Non Tech
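Unlike the earlier exercises, these sentences contain capital letters and punctuation, so they need normalizing before any counting. The sketch below shows one possible preprocessing step: lowercase everything, keep only runs of letters and digits, then build the vocabulary and per-class word counts that the Naive Bayes formulas need. The tokenization rule and the `tokenize` helper are assumptions, not something the exercise prescribes.

```python
import re
from collections import Counter

# Sentences and categories copied from the table.
sentences = [
    ("AI is transforming industries.", "Tech"),
    ("Quantum computing is the future.", "Tech"),
    ("New smartphone released today.", "Tech"),
    ("Football match was exciting.", "Non Tech"),
    ("New movie breaking records.", "Non Tech"),
    ("Cooking shows are popular.", "Non Tech"),
]

def tokenize(text):
    # Assumption: lowercase, then keep runs of letters/digits (drops punctuation).
    return re.findall(r"[a-z0-9]+", text.lower())

vocabulary = sorted({w for text, _ in sentences for w in tokenize(text)})

# Word counts per class, the raw material for P(word | class).
class_counts = {}
for text, label in sentences:
    class_counts.setdefault(label, Counter()).update(tokenize(text))

print(len(vocabulary), "vocabulary words:", vocabulary)
for label, counts in class_counts.items():
    print(label, "total words:", sum(counts.values()), dict(counts))
```

With this tokenization the vocabulary has 23 words, the Tech class has 13 word occurrences and the Non Tech class has 12. The only word shared by both classes is "new", so any test sentence will be decided mostly by the smoothing term and the words it shares with a single class.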

Implement Example 1 with Python

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Training documents from Example 1 (classes c and j)
df = pd.DataFrame([
    [1, 'Chinese Beijing Chinese', 'c'],
    [2, 'Chinese Chinese Shanghai', 'c'],
    [3, 'Chinese Macao', 'c'],
    [4, 'Tokyo Japan Chinese', 'j'],
], columns=['Doc', 'Text', 'Class'])

# Convert each document into a vector of word counts (tokens are lowercased)
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(df['Text'])
y = df['Class']
print('Vocabulary', vectorizer.vocabulary_)

# Fit the classifier; the default alpha=1.0 applies add-one (Laplace) smoothing
nb = MultinomialNB()
nb.fit(X, y)

# Vectorize the test document with the vocabulary learned from the training set
text = ['Chinese Chinese Chinese Tokyo Japan']
X_test = vectorizer.transform(text)
print('Predict', nb.predict(X_test), 'Probability', nb.predict_proba(X_test))