Page 46 - Data Science Algorithms in a Week
Naive Bayes
Let us count the number of rows in the table with all known values to determine the
individual probabilities.
P(Play=Yes)=6/10=3/5, since there are 10 rows with complete data and 6 of them have the
value Yes for the attribute Play.
P(Temperature=Warm|Play=Yes)=3/6=1/2, since there are 6 rows with the value Yes for the
attribute Play and, out of them, 3 have the value Warm for the attribute Temperature.
Similarly, we have the following:
P(Wind=Strong|Play=Yes)=1/6
P(Sunshine=Sunny|Play=Yes)=3/6=1/2
P(Play=No)=4/10=2/5
P(Temperature=Warm|Play=No)=1/4
P(Wind=Strong|Play=No)=2/4=1/2
P(Sunshine=Sunny|Play=No)=1/4
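Each of these conditional probabilities is a ratio of two counts, which can be sketched in a few lines of Python. The function name conditional_probability and the rows in toy_rows are hypothetical, chosen only for illustration; they are not the table from the text and not the program from the book.

```python
from fractions import Fraction


def conditional_probability(rows, attribute, value, class_attribute, class_value):
    """Estimate P(attribute=value | class_attribute=class_value) by counting rows."""
    # Keep only the rows that belong to the requested class.
    in_class = [row for row in rows if row[class_attribute] == class_value]
    if not in_class:
        raise ValueError("no rows with the requested class value")
    # Among those rows, count the ones with the requested attribute value.
    matching = sum(1 for row in in_class if row[attribute] == value)
    return Fraction(matching, len(in_class))


# Hypothetical toy rows for illustration only, NOT the table from the text.
toy_rows = [
    {"Temperature": "Warm", "Play": "Yes"},
    {"Temperature": "Cold", "Play": "Yes"},
    {"Temperature": "Warm", "Play": "No"},
    {"Temperature": "Cold", "Play": "No"},
    {"Temperature": "Cold", "Play": "No"},
]

p = conditional_probability(toy_rows, "Temperature", "Warm", "Play", "Yes")
print(p)  # 1/2
```

Using Fraction rather than floating-point numbers keeps the results in the same exact form as the hand calculation.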
Thus R=(1/2)*(1/6)*(1/2)*(3/5)=1/40 and ~R=(1/4)*(1/2)*(1/4)*(2/5)=1/80. Therefore, we have the
following:
P(Play=Yes|Temperature=Warm,Wind=Strong,Sunshine=Sunny)=R/(R+~R)=(1/40)/(3/80)=2/3≈67%
Therefore, our friend is likely to be happy to play chess with us in the park in the stated
weather conditions with a probability of about 67%. Since this probability is greater than
50%, we could classify the data vector (Temperature=Warm,Wind=Strong,Sunshine=Sunny) to be
in the class Play=Yes.
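The arithmetic above can be checked with a short Python sketch that uses exact fractions. The function name naive_bayes_posterior and its parameter names are our own assumptions for illustration; this is not the program from the book, only a check of the numbers stated in the text.

```python
from fractions import Fraction as F


def naive_bayes_posterior(prior_yes, likelihoods_yes, prior_no, likelihoods_no):
    """Return (R, ~R, R/(R+~R)) for a two-class naive Bayes calculation."""
    # R is the prior of the class multiplied by each conditional probability.
    r = prior_yes
    for p in likelihoods_yes:
        r *= p
    # ~R is the same product for the complementary class.
    not_r = prior_no
    for p in likelihoods_no:
        not_r *= p
    # Normalizing by R + ~R gives the posterior probability of the class.
    return r, not_r, r / (r + not_r)


# Probabilities as stated in the text for Temperature=Warm, Wind=Strong, Sunshine=Sunny.
r, not_r, posterior = naive_bayes_posterior(
    F(3, 5), [F(1, 2), F(1, 6), F(1, 2)],  # P(Play=Yes) and its conditionals
    F(2, 5), [F(1, 4), F(1, 2), F(1, 4)],  # P(Play=No) and its conditionals
)
print(r, not_r, posterior)  # 1/40 1/80 2/3
```

The normalization by R+~R is what turns the two products into probabilities that sum to 1 over the two classes.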
Implementation of naive Bayes classifier
We implement a program calculating the probability of a data item belonging to a certain
class using Bayes' theorem:
# source_code/2/naive_bayes.py
# A program that reads the CSV file with the data and returns
# the Bayesian probability for the unknown value denoted by ? to
# belong to a certain class.
# An input CSV file should be of the following format: