Page 140 - Data Science Algorithms in a Week

Clustering into K Clusters


            Analysis:

                   1. a) (1/3)*(2+3+4)=3
                      b) (1/3)*(100$+400$+1000$)=500$
                      c) ((10+40+0)/3,(20+60+40)/3)=(50/3,120/3)=(50/3,40)
                      d) ((200$+300$+500$+250$)/4,(40km+60km+100km+200km)/4)
                         =(1250$/4,400km/4)=(312.5$,100km)
                      e) ((1+0+10+4+5)/5,(2+0+20+8+0)/5,(4+3+5+2+1)/5)=(4,6,3)
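
Each answer above is a coordinate-wise mean, so the arithmetic can be checked with a few lines of Python. This is a minimal sketch: the helper name `centroid` is hypothetical and is not taken from the book's source code, and the input points are read off the sums shown in parts c), d) and e).

```python
def centroid(points):
    """Return the coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    # zip(*points) groups the first coordinates together, then the second, etc.
    return tuple(sum(coords) / n for coords in zip(*points))

# Part c): three two-dimensional points.
c = centroid([(10, 20), (40, 60), (0, 40)])
# Part d): four (price, distance) points; units are dropped for the arithmetic.
d = centroid([(200, 40), (300, 60), (500, 100), (250, 200)])
# Part e): five three-dimensional points.
e = centroid([(1, 2, 4), (0, 0, 3), (10, 20, 5), (4, 8, 2), (5, 0, 1)])

print(c, d, e)
```

Parts d) and e) come out as (312.5, 100.0) and (4.0, 6.0, 3.0); the first coordinate of part c) is 50/3, which prints as a rounded floating-point value.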


                   2. a) The features are one-dimensional, so we add a second coordinate
                      and set it to 0 for every feature. The distances between the features
                      do not change, so we can use the clustering algorithm we implemented
                      earlier in this chapter.
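
The claim that padding with a zero coordinate leaves distances unchanged can be checked directly. The sketch below assumes Euclidean distance and uses the eight values from problem5_2.csv; the names `values`, `padded` and `euclidean` are illustrative only.

```python
import math

# The one-dimensional values from problem5_2.csv.
values = [0, 2, 5, 4, 8, 10, 12, 11]

# Add a second coordinate that is 0 for every point.
padded = [(float(v), 0.0) for v in values]

def euclidean(p, q):
    """Euclidean distance between two points of the same dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Compare every 2-D distance with the original 1-D distance |a - b|.
distances_preserved = all(
    euclidean(padded[i], padded[j]) == abs(values[i] - values[j])
    for i in range(len(values))
    for j in range(len(values))
)
print(distances_preserved)
```

The added coordinate contributes (0 - 0)^2 = 0 to every squared distance, which is why the comparison holds for all pairs.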

            Input:

                             # source_code/5/problem5_2.csv
                             0,0
                             2,0
                             5,0
                             4,0
                             8,0
                             10,0
                             12,0
                             11,0

                     For 2 clusters:
                             $ python k-means_clustering.py problem5_2.csv 2 last
                             The total number of steps: 2
                             The history of the algorithm:
                             Step number 0: point_groups = [((0.0, 0.0), 0), ((2.0, 0.0),
                             0), ((5.0, 0.0), 0), ((4.0, 0.0), 0), ((8.0, 0.0), 1), ((10.0,
                             0.0), 1), ((12.0, 0.0), 1), ((11.0, 0.0), 1)]
                             centroids = [(0.0, 0.0), (12.0, 0.0)]
                             Step number 1: point_groups = [((0.0, 0.0), 0), ((2.0, 0.0),
                             0), ((5.0, 0.0), 0), ((4.0, 0.0), 0), ((8.0, 0.0), 1), ((10.0,
                             0.0), 1), ((12.0, 0.0), 1), ((11.0, 0.0), 1)]
                             centroids = [(2.75, 0.0), (10.25, 0.0)]
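
The two-cluster run above can be reproduced with a short, self-contained sketch. This is not the book's k-means_clustering.py script; it is a minimal version of the same idea, assuming Euclidean distance and starting from the centroids reported at step 0, (0.0, 0.0) and (12.0, 0.0). The function names `assign`, `update` and `k_means` are hypothetical.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points of the same dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def assign(points, centroids):
    """For each point, return the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: euclidean(p, centroids[k]))
        for p in points
    ]

def update(points, labels, centroids):
    """Recompute each centroid as the mean of the points assigned to it."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            # Keep the previous centroid if a cluster ends up empty.
            new_centroids.append(old)
            continue
        new_centroids.append(
            tuple(sum(coords) / len(members) for coords in zip(*members))
        )
    return new_centroids

def k_means(points, centroids):
    """Alternate update and assignment until the grouping stops changing."""
    labels = assign(points, centroids)
    while True:
        centroids = update(points, labels, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:
            return labels, centroids
        labels = new_labels

points = [(0.0, 0.0), (2.0, 0.0), (5.0, 0.0), (4.0, 0.0),
          (8.0, 0.0), (10.0, 0.0), (12.0, 0.0), (11.0, 0.0)]
labels, centroids = k_means(points, [(0.0, 0.0), (12.0, 0.0)])
print(labels)
print(centroids)
```

With these starting centroids the sketch puts the first four points in cluster 0 and the last four in cluster 1, and ends with the centroids (2.75, 0.0) and (10.25, 0.0), in agreement with the output above.
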
                     For 3 clusters:

                             $ python k-means_clustering.py problem5_2.csv 3 last
                             The total number of steps: 2
                             The history of the algorithm:
                             Step number 0: point_groups = [((0.0, 0.0), 0), ((2.0, 0.0),
                             0), ((5.0, 0.0), 2), ((4.0, 0.0), 2), ((8.0, 0.0), 2), ((10.0,
