
Clustering into K Clusters


                0), ((0.46875, 1.0), 0), ((0.375, 0.75), 0), ((0.0, 0.7), 0),
                ((0.625, 0.3), 1), ((0.9375, 0.5), 1)]
                centroids = [(0.22395833333333334, 0.63), (0.79375, 0.188)]
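
            Each reported centroid is just the coordinate-wise mean of the scaled
            points assigned to that cluster. The following minimal Python sketch
            verifies this; the variable and function names are illustrative, not
            taken from the book's code:

                # Verify that each centroid is the coordinate-wise mean of the
                # scaled points in its cluster.
                blue = [(0.09375, 0.2), (0.25, 0.65), (0.15625, 0.48),
                        (0.46875, 1.0), (0.375, 0.75), (0.0, 0.7)]
                red = [(0.53125, 0.04), (0.875, 0.1), (1.0, 0.0),
                       (0.625, 0.3), (0.9375, 0.5)]

                def centroid(points):
                    # Average each coordinate over all points in the cluster.
                    return tuple(sum(c) / len(points) for c in zip(*points))

                print(centroid(blue))  # approximately (0.22395833333333334, 0.63)
                print(centroid(red))   # approximately (0.79375, 0.188)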

            [Figure: scatter plot of the scaled features, with cluster 0 shown in
            blue and cluster 1 shown in red.]

            The blue cluster contains the scaled features (0.09375,0.2), (0.25,0.65),
            (0.15625,0.48), (0.46875,1), (0.375,0.75), (0,0.7), that is, the unscaled
            (age, income) pairs (23,50000), (28,95000), (25,78000), (35,130000),
            (32,105000), (20,100000). The red cluster contains the scaled features
            (0.53125,0.04), (0.875,0.1), (1,0), (0.625,0.3), (0.9375,0.5), that is,
            the unscaled pairs (37,34000), (48,40000), (52,30000), (40,60000),
            (50,80000).
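
            To see how the scaled and unscaled pairs relate, the following minimal
            Python sketch applies min-max rescaling of each coordinate into the
            interval [0, 1]; the minima and maxima (age 20 to 52, income 30,000 to
            130,000) are read off the eleven data points above:

                # Min-max rescaling: map each (age, income) pair into [0, 1] x [0, 1].
                unscaled = [(23, 50000), (28, 95000), (25, 78000), (35, 130000),
                            (32, 105000), (20, 100000), (37, 34000), (48, 40000),
                            (52, 30000), (40, 60000), (50, 80000)]

                age_min, age_max = 20, 52          # youngest and oldest person
                inc_min, inc_max = 30000, 130000   # lowest and highest income

                scaled = [((age - age_min) / (age_max - age_min),
                           (inc - inc_min) / (inc_max - inc_min))
                          for age, inc in unscaled]
                print(scaled)  # (23, 50000) -> (0.09375, 0.2), and so on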

            So Peter belongs to the red cluster. What is the proportion of house
            owners in the red cluster, not counting Peter? 2/4, or 1/2, of the
            remaining people in the red cluster are house owners. Thus the red
            cluster to which Peter belongs does not seem to have high predictive
            power for determining whether Peter would be a house owner. We may try
            to cluster the data into more clusters in the hope of obtaining a purer
            cluster that would be more reliable for predicting house ownership for
            Peter. Let us therefore try to cluster the data into three clusters.
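
            As a sketch of that next step, the eleven scaled points can be
            re-clustered with k = 3. The book develops its own k-means
            implementation; the snippet below substitutes scikit-learn's KMeans
            for it, so the printed assignments and centroids may differ from the
            book's run depending on initialisation:

                # Re-cluster the scaled (age, income) features into k = 3 clusters.
                from sklearn.cluster import KMeans

                scaled = [(0.09375, 0.2), (0.25, 0.65), (0.15625, 0.48),
                          (0.46875, 1.0), (0.375, 0.75), (0.0, 0.7),
                          (0.53125, 0.04), (0.875, 0.1), (1.0, 0.0),
                          (0.625, 0.3), (0.9375, 0.5)]

                kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(scaled)
                for point, label in zip(scaled, kmeans.labels_):
                    print(point, '-> cluster', label)
                print('centroids =', kmeans.cluster_centers_)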








