Page 131 - Data Science Algorithms in a Week

Clustering into K Clusters


            Now the red cluster contains only Peter and a non-owner. This clustering suggests that
            Peter is more likely a non-owner as well. However, according to the previous clustering,
            Peter would more likely be a house owner. Therefore, it is not clear whether Peter owns
            a house or not. Collecting more data would improve our analysis and should be done
            before making a definite classification in this problem.

            Our analysis shows that a different number of clusters can lead to a different
            classification, because the membership of an individual cluster can change. After
            collecting more data, we should perform cross-validation to determine the number of
            clusters that classifies the data with the highest accuracy.
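
The cross-validation idea above can be sketched as follows. This is only one possible reading of it, not the book's own listing: all points are clustered with a plain k-means, each labeled point is then hidden in turn and predicted from the majority label of the other members of its cluster, and the fraction of correct predictions is compared across values of k. The data points, the labels, and the names `kmeans` and `loo_accuracy` are invented for illustration; the real (scaled) data would take their place.

```python
import math
from collections import Counter


def kmeans(points, k, iterations=100):
    """Plain k-means; returns a cluster index for every point.

    Assumption: the first k points serve as initial centroids, which
    keeps the sketch deterministic. Other initializations are possible.
    """
    centroids = [points[i] for i in range(k)]
    assignment = [None] * len(points)
    for _ in range(iterations):
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed its cluster
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assignment


def loo_accuracy(points, labels, k):
    """Leave-one-out accuracy of classifying by the cluster's majority label."""
    assignment = kmeans(points, k)  # the clustering itself ignores the labels
    correct = 0
    for i, label in enumerate(labels):
        # Votes come from the other members of the same cluster only.
        votes = Counter(
            labels[j]
            for j in range(len(points))
            if j != i and assignment[j] == assignment[i]
        )
        ranked = votes.most_common(2)
        # A cluster with no other members, or a tied vote, gives no prediction.
        clear = ranked and (len(ranked) == 1 or ranked[0][1] > ranked[1][1])
        if clear and ranked[0][0] == label:
            correct += 1
    return correct / len(points)


# Hypothetical, already scaled data: NOT the data from the example above.
points = [(0.1, 0.1), (0.15, 0.2), (0.2, 0.1), (0.1, 0.2), (0.8, 0.9), (0.9, 0.8)]
labels = ["owner", "owner", "owner", "owner", "non-owner", "non-owner"]

accuracies = {k: loo_accuracy(points, labels, k) for k in range(1, 5)}
best_k = max(accuracies, key=accuracies.get)
print(accuracies, best_k)
```

With so few points the accuracies are very coarse, which is one more reason to collect additional data before settling on a value of k.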




            Document clustering – understanding the number of clusters k in a semantic context

            We are given the relative frequencies (in %) of the words money and god(s) in the
            following 17 books from Project Gutenberg:

             Book number Book name                                         Money in %  God(s) in %

             1           The Vedanta-Sutras with the Commentary by         0           0.07
                         Ramanuja, by Trans. George Thibaut

             2           The Mahabharata of Krishna-Dwaipayana Vyasa       0           0.17
                         - Adi Parva, by Kisari Mohan Ganguli
             3           The Mahabharata of Krishna-Dwaipayana             0.01        0.10
                         Vyasa, Part 2, by Krishna-Dwaipayana Vyasa
             4           Mahabharata of Krishna-Dwaipayana Vyasa Bk.       0           0.32
                         3 Pt. 1, by Krishna-Dwaipayana Vyasa
             5           The Mahabharata of Krishna-Dwaipayana Vyasa       0           0.06
                         Bk. 4, by Kisari Mohan Ganguli
             6           The Mahabharata of Krishna-Dwaipayana Vyasa       0           0.27
                         Bk. 3 Pt. 2, by Translated by Kisari Mohan Ganguli
             7           The Vedanta-Sutras with the Commentary by         0           0.06
                         Sankaracarya
             8           The King James Bible                              0.02        0.59



