Page 141 - Data Science Algorithms in a Week

Clustering into K Clusters


                             0.0), 1), ((12.0, 0.0), 1), ((11.0, 0.0), 1)]
                             centroids = [(0.0, 0.0), (12.0, 0.0), (5.0, 0.0)]
                             Step number 1: point_groups = [((0.0, 0.0), 0), ((2.0, 0.0),
                             0), ((5.0, 0.0), 2), ((4.0, 0.0), 2), ((8.0, 0.0), 2), ((10.0,
                             0.0), 1), ((12.0, 0.0), 1), ((11.0, 0.0), 1)]
                             centroids = [(1.0, 0.0), (11.0, 0.0), (5.666666666666667, 0.0)]
                     Output for 4 clusters:

                             $ python k-means_clustering.py problem5_2.csv 4 last
                             The total number of steps: 2
                             The history of the algorithm:
                             Step number 0: point_groups = [((0.0, 0.0), 0), ((2.0, 0.0),
                             0), ((5.0, 0.0), 2), ((4.0, 0.0), 2), ((8.0, 0.0), 3), ((10.0,
                             0.0), 1), ((12.0, 0.0), 1), ((11.0, 0.0), 1)]
                             centroids = [(0.0, 0.0), (12.0, 0.0), (5.0, 0.0), (8.0, 0.0)]
                             Step number 1: point_groups = [((0.0, 0.0), 0), ((2.0, 0.0),
                             0), ((5.0, 0.0), 2), ((4.0, 0.0), 2), ((8.0, 0.0), 3), ((10.0,
                             0.0), 1), ((12.0, 0.0), 1), ((11.0, 0.0), 1)]
                             centroids = [(1.0, 0.0), (11.0, 0.0), (4.5, 0.0), (8.0, 0.0)]
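
The centroids reported at step 1 can be checked by hand: each one should be the mean of the points assigned to its group. The following is a minimal sketch of that check, not the code of k-means_clustering.py; the helper name recompute_centroids is hypothetical, and the grouping is copied from the 4-cluster output above.

```python
# Grouping copied from step 1 of the 4-cluster output: (point, group index).
point_groups = [
    ((0.0, 0.0), 0), ((2.0, 0.0), 0), ((5.0, 0.0), 2), ((4.0, 0.0), 2),
    ((8.0, 0.0), 3), ((10.0, 0.0), 1), ((12.0, 0.0), 1), ((11.0, 0.0), 1),
]


def recompute_centroids(point_groups, k):
    """Return the mean of the points assigned to each of the k groups.

    Hypothetical helper for illustration only.
    """
    sums = [[0.0, 0.0] for _ in range(k)]
    counts = [0] * k
    for (x, y), group in point_groups:
        sums[group][0] += x
        sums[group][1] += y
        counts[group] += 1
    # Assumes every group is non-empty, which holds for the data above.
    return [(sx / n, sy / n) for (sx, sy), n in zip(sums, counts)]


centroids = recompute_centroids(point_groups, 4)
print(centroids)
```

The printed list should agree with the centroids line of step 1. Moving the point (8.0, 0.0) into group 2 and calling the helper with k = 3 gives the 3-cluster centroids in the same way.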

                     b) We run the same k-means implementation on the data for part b.
                          Input:

                             # source_code/5/problem5_2b.csv
                             2,2
                             2,5
                             10,4
                             3,5
                             7,3
                             5,9
                             2,8
                             4,10
                             7,4
                             4,4
                             5,8
                             9,3
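
The first assignment step for this input can be reproduced by hand. The sketch below assumes Euclidean distance and takes the initial centroids (2.0, 2.0) and (10.0, 4.0) from step 0 of the program output for 2 clusters; the helper name closest_centroid is hypothetical and is not taken from k-means_clustering.py.

```python
import math

# Points from problem5_2b.csv, in file order.
points = [(2, 2), (2, 5), (10, 4), (3, 5), (7, 3), (5, 9),
          (2, 8), (4, 10), (7, 4), (4, 4), (5, 8), (9, 3)]

# Initial centroids as reported at step 0 of the 2-cluster run.
centroids = [(2.0, 2.0), (10.0, 4.0)]


def closest_centroid(point, centroids):
    """Index of the centroid nearest to point (Euclidean distance assumed)."""
    px, py = point
    return min(range(len(centroids)),
               key=lambda i: math.hypot(px - centroids[i][0],
                                        py - centroids[i][1]))


groups = [closest_centroid(p, centroids) for p in points]
print(groups)
```

The group indices, read in file order, can be compared with the second element of each pair in point_groups at step 0.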

                          Output for 2 clusters:

                             $ python k-means_clustering.py problem5_2b.csv 2 last
                             The total number of steps: 3
                             The history of the algorithm:
                             Step number 0: point_groups = [((2.0, 2.0), 0), ((2.0, 5.0),
                             0), ((10.0, 4.0), 1), ((3.0, 5.0), 0), ((7.0, 3.0), 1), ((5.0,
                             9.0), 1), ((2.0, 8.0), 0), ((4.0, 10.0), 0), ((7.0, 4.0), 1),
                             ((4.0, 4.0), 0), ((5.0, 8.0), 1), ((9.0, 3.0), 1)]
                             centroids = [(2.0, 2.0), (10.0, 4.0)]
