Page 124 - Data Science Algorithms in a Week

Clustering into K Clusters


            Input data from gender classification

            We save the data from the gender classification example in a CSV file, each row holding
            one person's height and weight:

                # source_code/5/persons_by_height_and_weight.csv
                180,75
                174,71
                184,83
                168,63
                178,70
                170,59
                164,53
                155,46
                162,52
                166,55
                172,60
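            As a minimal sketch of how such a file could be read into a list of points, the
            following uses Python's standard csv module. The helper name load_points and the
            inline CSV_TEXT copy of the rows are assumptions made for illustration; they are
            not taken from the book's program.

```python
import csv
import io

# Inline copy of the rows listed above, so the sketch runs without the file.
CSV_TEXT = """180,75
174,71
184,83
168,63
178,70
170,59
164,53
155,46
162,52
166,55
172,60
"""


def load_points(lines):
    """Parse an iterable of 'height,weight' lines into (float, float) tuples.

    load_points is a hypothetical helper name used only in this sketch.
    """
    return [(float(row[0]), float(row[1]))
            for row in csv.reader(lines)
            if row]  # skip empty rows, e.g. a trailing blank line


points = load_points(io.StringIO(CSV_TEXT))
print(len(points), points[0])
```

            To read from disk instead, an open file object can be passed to load_points in
            place of the io.StringIO wrapper.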


            Program output for gender classification data

            We run the program implementing the k-means clustering algorithm on the data from the
            gender classification example. The numerical argument 2 specifies the number of
            clusters we want:

                $ python k-means_clustering.py persons_by_height_and_weight.csv 2 last
                The total number of steps: 2
                The history of the algorithm:
                Step number 0: point_groups = [((180.0, 75.0), 0), ((174.0, 71.0), 0),
                ((184.0, 83.0), 0), ((168.0, 63.0), 0), ((178.0, 70.0), 0), ((170.0, 59.0),
                0), ((164.0, 53.0), 1), ((155.0, 46.0), 1), ((162.0, 52.0), 1), ((166.0,
                55.0), 1), ((172.0, 60.0), 0)]
                centroids = [(180.0, 75.0), (155.0, 46.0)]
                Step number 1: point_groups = [((180.0, 75.0), 0), ((174.0, 71.0), 0),
                ((184.0, 83.0), 0), ((168.0, 63.0), 0), ((178.0, 70.0), 0), ((170.0, 59.0),
                0), ((164.0, 53.0), 1), ((155.0, 46.0), 1), ((162.0, 52.0), 1), ((166.0,
                55.0), 1), ((172.0, 60.0), 0)]
                centroids = [(175.14285714285714, 68.71428571428571), (161.75, 51.5)]
            The program also outputs a graph, shown in Image 5.2. The parameter last tells the
            program to run the clustering through to its final step. To display only the first
            step (step 0), we replace last with 0:

                $ python k-means_clustering.py persons_by_height_and_weight.csv 2 0
            After running the program, we get the graph of the clusters and their centroids at
            the initial step, as in Image 5.1.
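            The two steps in the history above can be reproduced with a short sketch of the
            k-means loop. This is not the book's k-means_clustering.py: the function names are
            hypothetical, the initial centroids are simply copied from the step 0 output rather
            than chosen by any particular rule, and the sketch assumes that no cluster ever
            becomes empty.

```python
# The eleven (height, weight) points from the CSV file above.
points = [
    (180.0, 75.0), (174.0, 71.0), (184.0, 83.0), (168.0, 63.0),
    (178.0, 70.0), (170.0, 59.0), (164.0, 53.0), (155.0, 46.0),
    (162.0, 52.0), (166.0, 55.0), (172.0, 60.0),
]

# Assumption: initial centroids copied from the step 0 program output.
initial_centroids = [(180.0, 75.0), (155.0, 46.0)]


def squared_distance(p, q):
    """Squared Euclidean distance; enough for comparing which centroid is nearer."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)),
            key=lambda i: squared_distance(point, centroids[i]))
        for point in points
    ]


def update(points, groups, k):
    """Recompute each centroid as the mean of the points assigned to it."""
    centroids = []
    for g in range(k):
        members = [p for p, label in zip(points, groups) if label == g]
        # Assumes members is non-empty (no cluster loses all of its points).
        centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return centroids


def k_means(points, centroids):
    """Run until the assignment stops changing; return [(groups, centroids), ...]."""
    groups = assign(points, centroids)
    history = [(groups, centroids)]
    while True:
        centroids = update(points, groups, len(centroids))
        new_groups = assign(points, centroids)
        history.append((new_groups, centroids))
        if new_groups == groups:
            return history
        groups = new_groups


history = k_means(points, initial_centroids)
for step, (groups, centroids) in enumerate(history):
    print("Step number %d: groups = %s" % (step, groups))
    print("centroids =", centroids)
```

            On this data the assignment is already stable after the first centroid update, so
            the history has two entries, matching the two steps reported by the program.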


