Page 136 - Data Science Algorithms in a Week

Clustering into K Clusters

            This time, the algorithm separated The Koran from the other religious books into a green
            cluster. This is because the word god is, in fact, the fifth most frequent word in The Koran.
            Here, the clustering happens to divide the books according to their writing style.
            Clustering into 4 clusters also moves one book with a relatively high frequency of the
            word money out of the red cluster of non-religious books and into a separate cluster.
            Let us look at the clustering into 5 clusters.
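The script k-means_clustering.py is not reproduced on this page. To illustrate what each entry in its history records, here is a minimal sketch of k-means in Python. It assumes Euclidean distance and centroids initialized from points supplied by the caller. The helper names (closest_centroid, compute_centroids, k_means) are hypothetical and are not taken from the book's script. Each step pairs every (money, god) point with the index of its nearest centroid, in the same ((x, y), cluster) shape as the point_groups lines in the 5-cluster output.

```python
import math


def closest_centroid(point, centroids):
    """Return the index of the centroid nearest to point (Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: math.dist(point, centroids[i]))


def compute_centroids(point_groups, k, old_centroids):
    """Average the points in each group; keep the old centroid if a group is empty."""
    centroids = []
    for i in range(k):
        members = [p for p, g in point_groups if g == i]
        if members:
            centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            centroids.append(old_centroids[i])
    return centroids


def k_means(points, initial_centroids, max_steps=100):
    """Hypothetical k-means loop that records point_groups at every step.

    Stops when an assignment step leaves every point in the same group.
    """
    centroids = list(initial_centroids)
    k = len(centroids)
    history = []
    previous_groups = None
    for _ in range(max_steps):
        # Assignment step: pair each point with its nearest centroid's index.
        point_groups = [(p, closest_centroid(p, centroids)) for p in points]
        history.append(point_groups)
        if point_groups == previous_groups:
            break
        previous_groups = point_groups
        # Update step: move each centroid to the mean of its group.
        centroids = compute_centroids(point_groups, k, centroids)
    return history, centroids


# Sample input: five scaled (money, god) points copied from the step-0 listing
# of the 5-cluster run; the choice of initial centroids is an assumption.
points = [(0.0, 0.0406976744), (0.0, 0.0988372093), (0.125, 0.0581395349),
          (0.25, 0.3430232558), (0.25, 0.261627907)]
history, centroids = k_means(points, initial_centroids=[points[0], points[3]])
print([g for _, g in history[-1]])  # [0, 0, 0, 1, 1]
print(len(history))  # 2
```

With these two starting centroids, the three points with a low god frequency end up in one group and the two with a higher god frequency in the other. This mirrors, on a small scale, how a high frequency of god pulls a book into its own cluster.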

            Output for 5 clusters:

                $ python k-means_clustering.py word_frequencies_money_god_scaled.csv 5 last
                The total number of steps: 2
                The history of the algorithm:
                Step number 0: point_groups = [((0.0, 0.0406976744), 0), ((0.0,
                0.0988372093), 0), ((0.125, 0.0581395349), 0), ((0.0, 0.1860465116), 0),
                ((0.0, 0.0348837209), 0), ((0.0, 0.1569767442), 0), ((0.0, 0.0348837209),
                0), ((0.25, 0.3430232558), 4), ((0.25, 0.261627907), 4), ((0.125,