Page 46 - Data Science Algorithms in a Week

Naive Bayes


            To determine the individual probabilities, we count the columns of the table in which all the
            values are known.

            P(Play=Yes)=6/10=3/5 since there are 10 columns with complete data and 6 of them have the
            value Yes for the attribute Play.
            P(Temperature=Warm|Play=Yes)=3/6=1/2 since there are 6 columns with the value Yes for the
            attribute Play and, out of them, 3 have the value Warm for the attribute Temperature.
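The counting described above can be sketched in Python. This is a minimal illustration, not the book's naive_bayes.py. The helper names complete_rows, prior and conditional are hypothetical. Each data item is assumed to be a dict from attribute name to value, with '?' marking an unknown value. The sample rows are made up only to demonstrate the calls and are not the table used in this chapter.

```python
from fractions import Fraction

def complete_rows(rows):
    """Keep only the data items whose values are all known (no '?')."""
    return [row for row in rows if '?' not in row.values()]

def prior(rows, class_attr, class_value):
    """Estimate P(class_attr=class_value) as a relative frequency."""
    known = complete_rows(rows)
    hits = sum(1 for row in known if row[class_attr] == class_value)
    return Fraction(hits, len(known))

def conditional(rows, attr, value, class_attr, class_value):
    """Estimate P(attr=value | class_attr=class_value) by counting."""
    in_class = [row for row in complete_rows(rows) if row[class_attr] == class_value]
    hits = sum(1 for row in in_class if row[attr] == value)
    return Fraction(hits, len(in_class))

# Hypothetical sample data, made up for illustration (not the chapter's table).
rows = [
    {'Temperature': 'Warm', 'Play': 'Yes'},
    {'Temperature': 'Cold', 'Play': 'Yes'},
    {'Temperature': 'Warm', 'Play': 'No'},
    {'Temperature': 'Warm', 'Play': '?'},  # unknown class: excluded from the counts
]

print(prior(rows, 'Play', 'Yes'))                               # 2/3
print(conditional(rows, 'Temperature', 'Warm', 'Play', 'Yes'))  # 1/2
```

Using Fraction keeps the estimates exact, so they can be compared directly with hand-computed values such as 3/5 or 1/2.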
            Similarly, we have the following:


            P(Wind=Strong|Play=Yes)=1/6

            P(Sunshine=Sunny|Play=Yes)=3/6=1/2

            P(Play=No)=4/10=2/5

            P(Temperature=Warm|Play=No)=1/4

            P(Wind=Strong|Play=No)=2/4=1/2


            P(Sunshine=Sunny|Play=No)=1/4

            Thus R=(1/2)*(1/6)*(1/2)*(3/5)=1/40 and ~R=(1/4)*(1/2)*(1/4)*(2/5)=1/80. Therefore, we have the
            following:

            P(Play=Yes|Temperature=Warm,Wind=Strong,Sunshine=Sunny) = R/(R+~R) = 2/3 ≈ 67%
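The division by R+~R works because, under the naive independence assumption, the probability of the observed weather is the sum of the two class terms, P(data) = R + ~R, since Play takes only the values Yes and No. The arithmetic can be written out step by step as follows, abbreviating Temperature=Warm, Wind=Strong, Sunshine=Sunny, Play=Yes and Play=No to Warm, Strong, Sunny, Yes and No:

```latex
\begin{aligned}
R &= P(\mathrm{Warm}\mid\mathrm{Yes})\,P(\mathrm{Strong}\mid\mathrm{Yes})\,P(\mathrm{Sunny}\mid\mathrm{Yes})\,P(\mathrm{Yes})
   = \tfrac{1}{2}\cdot\tfrac{1}{6}\cdot\tfrac{1}{2}\cdot\tfrac{3}{5} = \tfrac{3}{120} = \tfrac{1}{40},\\
\neg R &= P(\mathrm{Warm}\mid\mathrm{No})\,P(\mathrm{Strong}\mid\mathrm{No})\,P(\mathrm{Sunny}\mid\mathrm{No})\,P(\mathrm{No})
   = \tfrac{1}{4}\cdot\tfrac{1}{2}\cdot\tfrac{1}{4}\cdot\tfrac{2}{5} = \tfrac{2}{160} = \tfrac{1}{80},\\
P(\mathrm{Yes}\mid\mathrm{Warm},\mathrm{Strong},\mathrm{Sunny}) &= \frac{R}{R+\neg R}
   = \frac{2/80}{2/80 + 1/80} = \frac{2}{3} \approx 67\%.
\end{aligned}
```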

            Therefore, in the stated weather conditions, our friend would be happy to play chess with us
            in the park with a probability of about 67%. Since this probability is greater than 50%, we
            classify the data vector (Temperature=Warm, Wind=Strong, Sunshine=Sunny) into the class
            Play=Yes.
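The same decision rule extends to any number of classes: multiply each class prior by the conditional probabilities of the observed values, normalize the scores so they sum to 1, and pick the class with the largest posterior. The following is a minimal sketch of that rule, not the book's implementation. The function name classify and its dict-based inputs are hypothetical; the probabilities in the usage example are the ones computed in the text.

```python
from fractions import Fraction
from math import prod

def classify(priors, likelihoods, observation):
    """Return (best_class, posteriors) under the naive Bayes assumption.

    priors:      {class_value: P(class)}
    likelihoods: {class_value: {(attr, value): P(attr=value | class)}}
    observation: {attr: value} for the data item to classify
    """
    # Unnormalized score per class: P(class) times the product of P(attr=value | class).
    scores = {
        c: priors[c] * prod(likelihoods[c][(a, v)] for a, v in observation.items())
        for c in priors
    }
    total = sum(scores.values())  # equals P(observation), e.g. R + ~R for two classes
    posteriors = {c: s / total for c, s in scores.items()}
    best = max(posteriors, key=posteriors.get)
    return best, posteriors

# Probabilities taken from the worked example above.
priors = {'Yes': Fraction(3, 5), 'No': Fraction(2, 5)}
likelihoods = {
    'Yes': {('Temperature', 'Warm'): Fraction(1, 2),
            ('Wind', 'Strong'): Fraction(1, 6),
            ('Sunshine', 'Sunny'): Fraction(1, 2)},
    'No': {('Temperature', 'Warm'): Fraction(1, 4),
           ('Wind', 'Strong'): Fraction(1, 2),
           ('Sunshine', 'Sunny'): Fraction(1, 4)},
}
observation = {'Temperature': 'Warm', 'Wind': 'Strong', 'Sunshine': 'Sunny'}

best, posteriors = classify(priors, likelihoods, observation)
print(best, posteriors['Yes'])  # Yes 2/3
```

With only two classes, picking the largest posterior is the same as checking whether P(Play=Yes|...) exceeds 50%.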




            Implementation of a naive Bayes classifier

            We implement a program that uses Bayes' theorem to calculate the probability that a data item
            belongs to a certain class:

                # source_code/2/naive_bayes.py
                # A program that reads the CSV file with the data and returns
                # the Bayesian probability for the unknown value denoted by ? to
                # belong to a certain class.
                # An input CSV file should be of the following format:

