Page 126 - Data Science Algorithms in a Week

Clustering into K Clusters


             52   1          30000                  0                    non-owner
             28   0.25       95000                  0.65                 owner
             25   0.15625    78000                  0.48                 non-owner
             35   0.46875    130000                 1                    owner
             32   0.375      105000                 0.75                 owner
             20   0          100000                 0.7                  non-owner
             40   0.625      60000                  0.3                  owner
             50   0.9375     80000                  0.5                  ?
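The scaled columns in the rows above are consistent with min-max scaling, where each value is mapped to (value - min) / (max - min). The following is a minimal sketch of that calculation, not the book's own code. It assumes that the first and third columns are ages and annual incomes, and that the minimum and maximum over the rows shown here (20 and 52, 30000 and 130000) are also the extremes of the full table. The function name `min_max_scale` is hypothetical.

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# First and third columns of the table rows shown above
# (assumed to be age and annual income).
ages = [52, 28, 25, 35, 32, 20, 40, 50]
incomes = [30000, 95000, 78000, 130000, 105000, 100000, 60000, 80000]

scaled_ages = min_max_scale(ages)
scaled_incomes = min_max_scale(incomes)

for age, s_age, income, s_income in zip(ages, scaled_ages, incomes, scaled_incomes):
    print(age, s_age, income, s_income)
```

Scaling both features to the range from 0 to 1 keeps the income, which is numerically far larger than the age, from dominating the distance computation during clustering.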

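            Given the table, we produce the input file for the algorithm, with one line per table
            row holding its two scaled values (the second and fourth columns), and execute the
            algorithm to group these data points into two clusters.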
            Input:

                # source_code/5/house_ownership2.csv
                0.09375,0.2
                0.53125,0.04
                0.875,0.1
                1,0
                0.25,0.65
                0.15625,0.48
                0.46875,1
                0.375,0.75
                0,0.7
                0.625,0.3
                0.9375,0.5
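To make the clustering step concrete, here is a minimal k-means sketch over the eleven points of the input file. It is not the `k-means_clustering.py` program used in this section. The function names (`assign`, `update`, `k_means`) are hypothetical, and the two starting centroids are an assumption copied from the `centroids` line at Step number 0 of the program output in this section. The sketch also assumes that no cluster ever becomes empty, which holds for this data.

```python
# Scaled points from house_ownership2.csv, in file order.
points = [
    (0.09375, 0.2), (0.53125, 0.04), (0.875, 0.1), (1.0, 0.0),
    (0.25, 0.65), (0.15625, 0.48), (0.46875, 1.0), (0.375, 0.75),
    (0.0, 0.7), (0.625, 0.3), (0.9375, 0.5),
]


def squared_distance(p, q):
    """Squared Euclidean distance between two 2D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: squared_distance(p, centroids[i]))
        for p in points
    ]


def update(points, groups, k):
    """Return the mean of the points in each group as the new centroids."""
    centroids = []
    for g in range(k):
        members = [p for p, grp in zip(points, groups) if grp == g]
        # Assumes every group has at least one member.
        centroids.append((
            sum(x for x, _ in members) / len(members),
            sum(y for _, y in members) / len(members),
        ))
    return centroids


def k_means(points, centroids):
    """Alternate update and assignment until the grouping stops changing."""
    groups = assign(points, centroids)
    while True:
        centroids = update(points, groups, len(centroids))
        new_groups = assign(points, centroids)
        if new_groups == groups:
            return groups, centroids
        groups = new_groups


# Starting centroids taken from Step number 0 of the program output (assumption).
groups, centroids = k_means(points, [(0.09375, 0.2), (1.0, 0.0)])
print(groups)     # [0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
print(centroids)
```

With these starting centroids, the first assignment places (0.53125, 0.04) in group 0, and it moves to group 1 once the centroids are recomputed, which agrees with the grouping listed at Step number 1 of the program output.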
            Output for two clusters:

                $ python k-means_clustering.py house_ownership2.csv 2 last
                The total number of steps: 3
                The history of the algorithm:
                Step number 0: point_groups = [((0.09375, 0.2), 0), ((0.53125, 0.04), 0),
                ((0.875, 0.1), 1), ((1.0, 0.0), 1), ((0.25, 0.65), 0), ((0.15625, 0.48),
                0), ((0.46875, 1.0), 0), ((0.375, 0.75), 0), ((0.0, 0.7), 0), ((0.625,
                0.3), 1), ((0.9375, 0.5), 1)]
                centroids = [(0.09375, 0.2), (1.0, 0.0)]
                Step number 1: point_groups = [((0.09375, 0.2), 0), ((0.53125, 0.04), 1),
                ((0.875, 0.1), 1), ((1.0, 0.0), 1), ((0.25, 0.65), 0), ((0.15625, 0.48),
                0), ((0.46875, 1.0), 0), ((0.375, 0.75), 0), ((0.0, 0.7), 0), ((0.625,
                0.3), 1), ((0.9375, 0.5), 1)]
                centroids = [(0.26785714285714285, 0.5457142857142857), (0.859375, 0.225)]
                Step number 2: point_groups = [((0.09375, 0.2), 0), ((0.53125, 0.04), 1),
                ((0.875, 0.1), 1), ((1.0, 0.0), 1), ((0.25, 0.65), 0), ((0.15625, 0.48),
