Page 135 - Data Science Algorithms in a Week

Clustering into K Clusters


            We can observe that clustering into 2 clusters divides the books into religious ones
            (the blue cluster) and non-religious ones (the red cluster). Let us now cluster the
            books into 3 clusters to observe how the algorithm divides the data.

            Output for 3 clusters:

                $ python k-means_clustering.py \
                    document_clustering/word_frequencies_money_god_scaled.csv 3 last
                The total number of steps: 3
                The history of the algorithm:
                Step number 0: point_groups = [((0.0, 0.0406976744), 0), ((0.0,
                0.0988372093), 0), ((0.125, 0.0581395349), 0), ((0.0, 0.1860465116), 0),
                ((0.0, 0.0348837209), 0), ((0.0, 0.1569767442), 0), ((0.0, 0.0348837209),
                0), ((0.25, 0.3430232558), 0), ((0.25, 0.261627907), 0), ((0.125,
                0.4011627907), 0), ((0.125, 1.0), 2), ((0.625, 0.0058139535), 1), ((1.0,
                0.0), 1), ((0.5, 0.0058139535), 1), ((0.375, 0.0174418605), 0), ((0.5,
                0.0174418605), 1), ((0.75, 0.0174418605), 1)]
                centroids = [(0.0, 0.0406976744), (1.0, 0.0), (0.125, 1.0)]
                Step number 1: point_groups = [((0.0, 0.0406976744), 0), ((0.0,
                0.0988372093), 0), ((0.125, 0.0581395349), 0), ((0.0, 0.1860465116), 0),
                ((0.0, 0.0348837209), 0), ((0.0, 0.1569767442), 0), ((0.0, 0.0348837209),
                0), ((0.25, 0.3430232558), 0), ((0.25, 0.261627907), 0), ((0.125,
                0.4011627907), 0), ((0.125, 1.0), 2), ((0.625, 0.0058139535), 1), ((1.0,
                0.0), 1), ((0.5, 0.0058139535), 1), ((0.375, 0.0174418605), 1), ((0.5,
                0.0174418605), 1), ((0.75, 0.0174418605), 1)]
                centroids = [(0.10227272727272728, 0.14852008456363636), (0.675,
                0.0093023256), (0.125, 1.0)]
                Step number 2: point_groups = [((0.0, 0.0406976744), 0), ((0.0,
                0.0988372093), 0), ((0.125, 0.0581395349), 0), ((0.0, 0.1860465116), 0),
                ((0.0, 0.0348837209), 0), ((0.0, 0.1569767442), 0), ((0.0, 0.0348837209),
                0), ((0.25, 0.3430232558), 0), ((0.25, 0.261627907), 0), ((0.125,
                0.4011627907), 0), ((0.125, 1.0), 2), ((0.625, 0.0058139535), 1), ((1.0,
                0.0), 1), ((0.5, 0.0058139535), 1), ((0.375, 0.0174418605), 1), ((0.5,
                0.0174418605), 1), ((0.75, 0.0174418605), 1)]
                centroids = [(0.075, 0.16162790697), (0.625, 0.01065891475), (0.125, 1.0)]
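
The history above can be replayed by hand to see why the algorithm stops after 3 steps. The following is a minimal sketch, not the code of k-means_clustering.py: it assumes Euclidean distance, takes the 17 points and the initial centroids directly from step 0 of the output, and uses the hypothetical helper names `assign` and `update`.

```python
import math

# The 17 scaled points, copied from the output above.
points = [
    (0.0, 0.0406976744), (0.0, 0.0988372093), (0.125, 0.0581395349),
    (0.0, 0.1860465116), (0.0, 0.0348837209), (0.0, 0.1569767442),
    (0.0, 0.0348837209), (0.25, 0.3430232558), (0.25, 0.261627907),
    (0.125, 0.4011627907), (0.125, 1.0), (0.625, 0.0058139535),
    (1.0, 0.0), (0.5, 0.0058139535), (0.375, 0.0174418605),
    (0.5, 0.0174418605), (0.75, 0.0174418605),
]

def assign(points, centroids):
    """Label each point with the index of its nearest centroid (assumed Euclidean)."""
    return [
        min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
        for p in points
    ]

def update(points, groups, centroids):
    """Move each centroid to the mean of its members; keep it if the group is empty."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, g in zip(points, groups) if g == k]
        if members:
            new_centroids.append(
                tuple(sum(coord) / len(members) for coord in zip(*members))
            )
        else:
            new_centroids.append(old)
    return new_centroids

# Initial centroids as printed in step 0 of the output.
centroids = [(0.0, 0.0406976744), (1.0, 0.0), (0.125, 1.0)]
groups = assign(points, centroids)
history = [(groups, centroids)]

# Repeat update + reassignment until no point changes its group.
while True:
    centroids = update(points, groups, centroids)
    new_groups = assign(points, centroids)
    history.append((new_groups, centroids))
    if new_groups == groups:
        break
    groups = new_groups

for step, (g, c) in enumerate(history):
    print("Step number", step, "groups =", g, "centroids =", c)
```

Under these assumptions the sketch records three steps, as in the output above. The only reassignment is the point (0.375, 0.0174418605), which moves from group 0 to group 1 once the centroids are recomputed after step 0. Group 2 contains the single point (0.125, 1.0) throughout, so its centroid never moves.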
