Page 133 - Data Science Algorithms in a Week

Clustering into K Clusters


             10            0.125         0.4011627907
             11            0.125         1
             12            0.625         0.0058139535
             13            1             0
             14            0.5           0.0058139535
             15            0.375         0.0174418605
             16            0.5           0.0174418605
             17            0.75          0.0174418605
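The rescaled values look like per-column min-max rescaling, x' = (x - min) / (max - min). In the data, every value in the first column is a multiple of 1/8, every value in the second column is a multiple of 1/172, and each column runs from exactly 0 to exactly 1. The sketch below shows that transformation under this assumption. The function name min_max_rescale and the raw counts are hypothetical illustrations and do not come from the book.

```python
def min_max_rescale(rows):
    """Rescale each column so its minimum maps to 0 and its maximum to 1.

    Assumption: this mirrors the rescaling behind the table above; it is an
    illustrative sketch, not the book's own code.
    """
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    rescaled = []
    for row in rows:
        rescaled.append(tuple(
            # A constant column would divide by zero, so map it to 0.0 instead.
            (value - lo) / (hi - lo) if hi != lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        ))
    return rescaled


# Hypothetical raw word counts (first word, second word) for four documents.
raw = [(0, 7), (2, 17), (8, 1), (4, 173)]
print(min_max_rescale(raw))
```

Min-max rescaling keeps the relative spacing of the values within each column. It puts both words on the same 0 to 1 scale, so neither one dominates the Euclidean distances that k-means relies on.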

            Now that we have rescaled the data, let us apply the k-means clustering algorithm,
            trying to divide the data into different numbers of clusters.
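The sketch below shows the assignment and update steps of k-means in plain Python. It is a minimal version of Lloyd's algorithm, not the book's k-means_clustering.py, and it rests on these assumptions:

- The function name k_means is hypothetical.
- The first k points serve as the initial centroids, which keeps the run deterministic.
- Iteration stops once no point changes cluster.

The data are the 17 rescaled points from the input file.

```python
import math

# The 17 rescaled (x, y) points from word_frequencies_money_god_scaled.csv.
POINTS = [
    (0, 0.0406976744), (0, 0.0988372093), (0.125, 0.0581395349),
    (0, 0.1860465116), (0, 0.0348837209), (0, 0.1569767442),
    (0, 0.0348837209), (0.25, 0.3430232558), (0.25, 0.261627907),
    (0.125, 0.4011627907), (0.125, 1), (0.625, 0.0058139535),
    (1, 0), (0.5, 0.0058139535), (0.375, 0.0174418605),
    (0.5, 0.0174418605), (0.75, 0.0174418605),
]


def k_means(points, k, max_steps=100):
    """Minimal Lloyd's k-means; returns (centroids, labels).

    Assumption: the first k points are the initial centroids. This is an
    illustrative choice, not necessarily what k-means_clustering.py does.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_steps):
        # Assignment step: attach every point to its nearest centroid (Euclidean).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # No point changed cluster, so the algorithm has converged.
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # Keep the previous centroid if a cluster is empty.
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels


centroids, labels = k_means(POINTS, 2)
print(labels)
```

Calling k_means(POINTS, 3) tries three clusters instead. Each cluster number is just the index of the point that seeded it. Because k-means only finds a local optimum, a different initialization can produce a different grouping.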

            Input:

                source_code/5/document_clustering/word_frequencies_money_god_scaled.csv
                0,0.0406976744
                0,0.0988372093
                0.125,0.0581395349
                0,0.1860465116
                0,0.0348837209
                0,0.1569767442
                0,0.0348837209
                0.25,0.3430232558
                0.25,0.261627907
                0.125,0.4011627907
                0.125,1
                0.625,0.0058139535
                1,0
                0.5,0.0058139535
                0.375,0.0174418605
                0.5,0.0174418605
                0.75,0.0174418605
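The input file is a header-less CSV with two numeric values per line, one line per document. The sketch below loads it into a list of float tuples using Python's standard csv module. The names parse_points and load_points are hypothetical and do not come from the book's code.

```python
import csv


def parse_points(lines):
    """Turn an iterable of 'x,y' text lines into a list of float tuples."""
    # csv.reader accepts any iterable of strings; blank lines yield [] and are skipped.
    return [tuple(float(value) for value in row) for row in csv.reader(lines) if row]


def load_points(path):
    """Read the header-less CSV file at `path` into a list of float tuples."""
    with open(path, newline="") as handle:
        return parse_points(handle)


sample = ["0,0.0406976744", "0.125,1", "1,0"]
print(parse_points(sample))
```

Separating parsing from file access lets the same function work on an open file or on an in-memory list of lines, for example load_points("source_code/5/document_clustering/word_frequencies_money_god_scaled.csv").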
            Output for 2 clusters:

                $ python k-means_clustering.py document_clustering/word_frequencies_money_god_scaled.csv 2 last
                The total number of steps: 3
                The history of the algorithm:
                Step number 0: point_groups = [((0.0, 0.0406976744), 0),
                ((0.0, 0.0988372093), 0), ((0.125, 0.0581395349), 0), ((0.0, 0.1860465116), 0),
                ((0.0, 0.0348837209), 0), ((0.0, 0.1569767442), 0), ((0.0, 0.0348837209), 0),
                ((0.25, 0.3430232558), 0), ((0.25, 0.261627907), 0), ((0.125,

